A blog post by Paolo Manghi and Sandro La Bruzzo (OpenAIRE)
Sharing links between the published literature and datasets is crucial to achieving the full potential of research data publishing. This article presents the coordination and implementation efforts of the ICSU-WDS–RDA Data Publishing Services Working Group (DPS-WG) and the OpenAIRE infrastructure towards realizing and operating an open and universal data-literature interlinking service (DLI Service). The service is the result of an open collaboration between major stakeholders in the field of data publishing. It provides access to a graph of dataset–literature and dataset–dataset links collected from a variety of major data centres, publishers, and research organizations. On the basis of feedback from content providers and consumers, the service will also enable the incremental refinement of an interlinking data model and exchange format, working towards a universal, cross-platform, cross-discipline solution for sharing dataset–literature links.
Introduction and vision
Challenges to realizing the full potential of research data exist at different levels, from cultural aspects, such as proper rewards and incentives, to policy and funding, and to technology. The challenges are interconnected and affect a diversity of stakeholders in the research data landscape, including researchers, research organizations, funding bodies, data centres, and publishers. To make progress in overcoming barriers and building a stronger research data infrastructure, it is essential that the different stakeholders work together to address common issues and move forward on a common path. Alongside other organizations, the ICSU World Data System (ICSU-WDS), the Research Data Alliance (RDA), and OpenAIRE provide useful forums for such collaborations. In particular, they are today working in synergy on an initiative that brings together different parties in the research data landscape with the objective of creating the Data Literature Interlinking Service (DLI Service), namely, 'an open, freely accessible, web-based service that enables its users to identify datasets that are associated with a given article, and vice versa'. At the time of writing, members of the initiative include: the ICSU-WDS–RDA DPS-WG, OpenAIRE, RDA, ICSU-WDS, STM, CrossRef, DataCite, ORCID, the Australian National Data Service, and the RMap project. The vision is to move away from the many bilateral arrangements that characterize the research ecosystem today, towards common standards and tools that sit in the middle and interact with all parties (see Figure). Such a transition would facilitate interoperability between platforms and systems operated by the different parties, reduce systemic inefficiencies in the ecosystem, and ultimately enable new tools and functionalities to the benefit of researchers.
The DLI Service populates and provides access to a graph of 'authoritative' dataset–literature links collected and aggregated from a variety of major data centres, publishers, and research organizations. It is intended to offer facilities for the following classes of actors:
– End users: Searching and browsing the graph of links via the Prototype PORTAL
– Third-party service developers: Accessing publications and datasets in the graph via programmatic APIs
– Content providers: Feeding high-quality authoritative links between publications and datasets, or between datasets, to the service (complete list of content providers).
Note: Formal data acquisition policies, SLAs, and data provider registration procedures will be produced at a later stage; currently each 'application' is processed independently with bilateral agreements. On the basis of feedback from content providers and consumers, the DLI Service will refine its underlying interlinking data model and exchange format to make it a universal, cross-platform, cross-discipline solution for collecting and sharing dataset–literature links, balancing the information that can be shared across content providers against the information needed by its consumers.
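To make the idea of a link graph more concrete, here is a minimal Python sketch of how dataset–literature and dataset–dataset links could be held in memory and queried in both directions ("which datasets are associated with this article, and vice versa"). It is an illustration only and does not describe the DLI Service's actual data model or API: the names `Entity` and `LinkGraph`, the `provider` field, and all identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A publication or dataset identified by a persistent identifier (PID)."""
    pid: str   # made-up identifiers are used in the example below
    kind: str  # "publication" or "dataset"


class LinkGraph:
    """Undirected graph of links between publications and datasets."""

    def __init__(self):
        self._neighbours = defaultdict(set)

    def add_link(self, source, target, provider):
        # Store the link in both directions so lookups work either way.
        # `provider` records which (hypothetical) content provider asserted it.
        self._neighbours[source].add((target, provider))
        self._neighbours[target].add((source, provider))

    def linked(self, entity, kind=None):
        """Return entities linked to `entity`, optionally filtered by kind."""
        found = {other for other, _ in self._neighbours.get(entity, ())}
        if kind is not None:
            found = {e for e in found if e.kind == kind}
        return sorted(found, key=lambda e: e.pid)


# Example with invented identifiers.
article = Entity("10.1234/article-1", "publication")
dataset_a = Entity("10.1234/dataset-a", "dataset")
dataset_b = Entity("10.1234/dataset-b", "dataset")

graph = LinkGraph()
graph.add_link(article, dataset_a, provider="example-publisher")
graph.add_link(dataset_a, dataset_b, provider="example-data-centre")

datasets_for_article = [e.pid for e in graph.linked(article, kind="dataset")]
articles_for_dataset = [e.pid for e in graph.linked(dataset_a, kind="publication")]
print(datasets_for_article)
print(articles_for_dataset)
```

Keeping the asserting provider on every link is one way to preserve where an 'authoritative' link came from; a real exchange format would likely carry more metadata than this sketch does.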
In the forthcoming months, further work will be carried out towards the delivery of a production service that is fully reliable in terms of QoS and quality of content. The following actions will be undertaken:
- Definition of a content acquisition policy: minimal quality requirements to be met by content providers in order for their publications, datasets, and the relationships between them to be aggregated by the system;
- Definition of SLAs for content providers: make sure content providers are aware of, and agree on, how their content (metadata) will be made openly accessible via the service;
- Technical enhancements: data harmonization (e.g. cross-PID deduplication), data programmatic access (e.g. high-throughput resolver), data scalability (e.g. moving away from open source databases);
- Deployment as an OpenAIRE infrastructure operational service: deploying the service on the OpenAIRE hardware infrastructure.
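As an illustration of what cross-PID deduplication might involve, the Python sketch below merges harvested records that share at least one persistent identifier, so that the same object described under, say, a DOI by one provider and a handle by another ends up as a single node. This is an assumption about the general technique (a union-find grouping), not a description of how the DLI Service implements it; the function name `deduplicate` and all identifiers are invented.

```python
def deduplicate(records):
    """Group records that share at least one PID.

    `records` is a list of PID sets, one set per harvested record.
    Returns a list of merged PID sets, one per distinct object.
    """
    parent = {}

    def find(pid):
        # Register unseen PIDs as their own root, then walk up to the root.
        parent.setdefault(pid, pid)
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]  # path halving
            pid = parent[pid]
        return pid

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for pids in records:
        if not pids:
            continue  # a record without identifiers cannot be matched
        first, *rest = sorted(pids)
        find(first)  # make sure single-PID records are registered too
        for other in rest:
            union(first, other)

    groups = {}
    for pid in list(parent):
        groups.setdefault(find(pid), set()).add(pid)
    return sorted(groups.values(), key=min)


# Example with invented identifiers: the first two records describe the
# same object because they share the handle "hdl:1234/abc".
records = [
    {"doi:10.1234/ds-1", "hdl:1234/abc"},
    {"hdl:1234/abc", "accession:XYZ001"},
    {"doi:10.1234/ds-2"},
]
merged = deduplicate(records)
print(len(merged))
```

Matching on shared identifiers alone is the simplest possible rule; a production system would probably also need to normalize identifier syntax before comparing.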