Just returned from a useful trip to visit my collaborators working with the 'Chinese plant trait database' at the Northwestern Agricultural and Forestry University in Yangling, China. We now have information from several hundred sites across China, and this will allow us to make detailed analyses of trait–climate relationships. But trips like these remind me that data are the bitcoin of Chinese science. It requires management of complex social networks to put together a dataset this large, but there are always people outside these networks who nevertheless could contribute. And then, once the science is done, what happens to the data? There is an international database for plant trait data, but where is the trusted repository for such data in China? My young collaborators are keen to share openly with other scientists and we need to make this easier. China is one of the few countries that has a national group supporting WDS activities – guess what I am going to be talking to them about next? Sandy H.
During the last International Polar Year (IPY) 2007–2008, a wide range of research topics were addressed, from glaciology to biology, from biochemistry to biophysics, from oceanography to physiology, from atmospheric to social sciences. Despite the vast amounts of data collected, there was no central archive for IPY-related data. Instead, the data have been spread widely, with much of it published only in research articles.
To enhance the availability and visibility of publication-related IPY data, a concerted effort among PANGAEA – Data Publisher for Earth and Environmental Science, the ICSU World Data System, and the International Council for Scientific and Technical Information (ICSTI) was undertaken to extract data resulting from IPY publications for long-term preservation. A list of 1380 references was provided by ICSTI, and this bibliography served as a basis for me to filter out journal articles containing extractable data—either from the articles themselves (in the form of tables) or from supplementary materials supplied with the publication.
Ultimately, data and their associated metadata were extracted from 450 IPY articles. These data can now be accessed from here, and individual parts can be searched using the PANGAEA search engine and adding +project:ipy.
For more information, see also Driemel et al. (2015), The IPY 2007–2008 data legacy – creating open data from IPY publications. Earth Syst. Sci. Data, 7, 239–244, doi:10.5194/essd-7-239-2015.
Christine Borgman—a Professor in Information Studies at University of California, Los Angeles—has been given a three-year research grant by the Alfred P. Sloan Foundation to analyze how data are handled in four different research projects with the aim of simplifying data practices and challenging assumptions about the value of sharing data.
The following article on Professor Borgman's work and on the complexities of data sharing by Tiffany Esmailian was first published on 25 September 2015 on phys.org. We hope that the WDS community will find it of interest.
A Blog post by Paolo Manghi and Sandro La Bruzzo (OpenAIRE)
Sharing links between the published literature and datasets is crucial to achieve the full potential of research data publishing. This article presents the coordination and implementation efforts of the ICSU-WDS–RDA Data Publishing Services Working Group (DPS-WG) and the OpenAIRE infrastructure towards realizing and operating an open and universal data-literature interlinking service (DLI Service). The service is the result of an open collaboration between major stakeholders in the field of data publishing. It provides access to a graph of dataset–literature and dataset–dataset links collected from a variety of major data centres, publishers, and research organizations. On the basis of feedback from content providers and consumers, the service will also enable the incremental refinement of an interlinking data model and exchange format, towards shaping up a universal, cross-platform, cross-discipline solution for sharing dataset–literature links.
Introduction and vision
Challenges to realizing the full potential of research data exist at different levels, from cultural aspects, such as proper rewards and incentives, to policy and funding, and to technology. The challenges are interconnected and affect a diversity of stakeholders in the research data landscape, including researchers, research organizations, funding bodies, data centres, and publishers. To make progress in overcoming barriers and building a stronger research data infrastructure, it is essential that the different stakeholders work together to address common issues and move forward on a common path. Alongside other organizations, the ICSU World Data System (ICSU-WDS), the Research Data Alliance (RDA), and OpenAIRE provide useful forums for such collaborations. In particular, they are today working in synergy on an initiative that brings together different parties in the research data landscape with the objective of creating the Data Literature Interlinking Service (DLI Service), namely, 'an open, freely accessible, web-based service that enables its users to identify datasets that are associated with a given article, and vice versa'. At the time of writing, members of the initiative include: the ICSU-WDS–RDA DPS-WG, OpenAIRE, RDA, ICSU-WDS, STM, CrossRef, DataCite, ORCID, the Australian National Data Service, and the RMap project. The vision is to move away from the many bilateral arrangements that characterize the research ecosystem today, towards common standards and tools that sit in the middle and interact with all parties (see Figure). Such a transition would facilitate interoperability between platforms and systems operated by the different parties, reduce systemic inefficiencies in the ecosystem, and ultimately enable new tools and functionalities to the benefit of researchers.
The DLI Service populates and provides access to a graph of 'authoritative' dataset–literature links collected and aggregated from a variety of major data centres, publishers, and research organizations. It is intended to offer facilities for the following classes of actors:
– End users: Searching and browsing the graph of links via the Prototype PORTAL
– Third-party service developers: Accessing publications and datasets in the graph via programmatic APIs
– Content providers: Willing to feed high-quality authoritative links between publications and datasets or between datasets to the service (complete list of content providers).
Note: Formal data acquisition policies, SLAs, and data provider registration procedures will be produced at a later stage; currently each 'application' is processed independently with bilateral agreements.
On the basis of feedback from content providers and consumers, the DLI Service will refine its underlying interlinking data model and exchange format to make it a universal, cross-platform, cross-discipline solution for collecting and sharing dataset–literature links, balancing the information that can be shared across content providers against the information needed by its consumers.
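To make the idea of a graph of links more concrete, here is a minimal Python sketch of how dataset–literature and dataset–dataset links could be stored and queried in both directions. It is an illustration only, not the DLI Service's data model or API: the `Link` fields, the `LinkGraph` class, and all identifiers and provider names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link between two objects (hypothetical fields, not the DLI format)."""
    source: str      # persistent identifier of the first object
    target: str      # persistent identifier of the second object
    relation: str    # e.g. "references" or "isSupplementTo"
    provider: str    # who contributed the link


class LinkGraph:
    """Stores links so that either end can be used as the lookup key."""

    def __init__(self):
        self._links = defaultdict(set)

    def add(self, link):
        # Index the link under both ends so lookups work in either direction.
        self._links[link.source].add(link)
        self._links[link.target].add(link)

    def related(self, pid):
        """Return the sorted identifiers linked to `pid`, in either direction."""
        found = set()
        for link in self._links.get(pid, ()):
            found.add(link.target if link.source == pid else link.source)
        return sorted(found)


# Made-up identifiers, used only to exercise the sketch.
graph = LinkGraph()
graph.add(Link("doi:10.1234/article-a", "doi:10.5678/dataset-x",
               "references", "publisher-1"))
graph.add(Link("doi:10.5678/dataset-x", "doi:10.5678/dataset-y",
               "isDerivedFrom", "datacentre-1"))
graph.add(Link("doi:10.5678/dataset-y", "doi:10.1234/article-a",
               "isSupplementTo", "datacentre-1"))

datasets_for_article = graph.related("doi:10.1234/article-a")
objects_for_dataset = graph.related("doi:10.5678/dataset-x")
```

Indexing each link under both of its ends is what allows the 'and vice versa' part of the service definition: the same lookup answers "which datasets belong to this article?" and "which articles or datasets refer to this dataset?".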
In the forthcoming months, further work will be carried out towards the delivery of a production service that is fully reliable in terms of QoS and quality of content. The following actions will be undertaken:
Definition of a content acquisition policy: minimal quality requirements to be respected by content providers in order for their publications, datasets and relative relationships to be aggregated by the system;
Definition of SLAs for content providers: make sure content providers are aware and agree on how their content (metadata) will be made openly accessible via the service;
Technical enhancements: data harmonization (e.g. cross-PID deduplication), data programmatic access (e.g. high-throughput resolver), data scalability (e.g. moving away from open source databases).
Deployment as an OpenAIRE infrastructure operational service: deploying the service on the OpenAIRE hardware infrastructure.
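As an illustration of the data harmonization item above, the following Python sketch shows one possible reading of cross-PID deduplication: records contributed by different providers are merged whenever they share at least one persistent identifier (PID). This interpretation, the `deduplicate` function, and the sample identifiers are assumptions made for the example, not a description of how the service implements it.

```python
def deduplicate(records):
    """Merge records that share any PID.

    `records` is a list of sets of PID strings, one set per contributed record.
    Returns a sorted list of sorted PID lists, one per distinct object.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # PID -> index of the first record that carried it
    for idx, pids in enumerate(records):
        for pid in pids:
            if pid in first_seen:
                # Same PID seen before: both records describe the same object.
                parent[find(idx)] = find(first_seen[pid])
            else:
                first_seen[pid] = idx

    merged = {}
    for idx, pids in enumerate(records):
        merged.setdefault(find(idx), set()).update(pids)
    return sorted(sorted(group) for group in merged.values())


# Made-up identifiers: the first two records overlap on a handle.
records = [
    {"doi:10.5678/dataset-x", "handle:1234/x"},
    {"handle:1234/x", "url:https://example.org/x"},
    {"doi:10.5678/dataset-z"},
]
unique_objects = deduplicate(records)
```

In this sketch the first two records collapse into one object carrying three identifiers, while the third stays separate. A production system would also need rules for conflicting metadata, which are outside the scope of this example.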
NSF states that accepted manuscripts or versions of record must be publicly available in an approved repository within 12 months of publication. Availability signifies that any user can download, read, and analyze the data free of charge. This will apply to new awards resulting from proposals submitted, or due, on or after the effective date of the Proposal & Award Policies & Procedures Guide that will be issued in January 2016.
The responses of the three organizations can be accessed below:
The NASA Land Atmosphere Near real-time Capability for EOS (LANCE) AMSR2 Processing Center at the Global Hydrology Resource Center (WDS Regular Member) in Huntsville, Alabama would like to announce the availability of its first AMSR2 near real-time dataset, NRT AMSR2 L2B Global Swath GSFC Profiling Algorithm 2010: Surface Precipitation, Wind Speed over Ocean, Water Vapor over Ocean and Cloud Liquid Water over Ocean. These LANCE AMSR2 near real-time products, with noted limitations, are generated and available to registered users via HTTPS with an average latency of less than 3 hours. More information about LANCE AMSR2 near real-time data is available here.