Data Publication Working Group

In the empirical sciences, data have traditionally been an integral part of scholarly publishing. However, rapid technical developments over recent decades, such as digital data and high-throughput techniques, have dramatically changed the scholarly publishing paradigm. New approaches are therefore needed to ensure the availability and usability of scientific data. Existing approaches to this issue are mostly technology-driven and have had limited success because they do not provide sufficient benefit to data producers, the wider community, and society. As part of scholarly communication, and building on both new and proven technologies, the concept of data publication is undergoing a renaissance. Publishing data is a strong new incentive for scientists to share their data and has positive effects on data quality. Recent bibliometric studies of science articles that provide access to their underlying data also show an impact on citation rates.

Data Publication

In late 2012, the first WDS Working Group on Data Publication was launched. In 2013, the Research Data Alliance (RDA) endorsed it as the joint RDA–WDS Interest Group (IG) on Publishing Data. The aim of this initiative is to identify and define best practices for publishing data and to test their implementation by involving the core stakeholders: researchers, institutions, data centres, scholarly publishers, and funders. Data publishing currently faces fundamental issues that are best resolved now, in its early days, while habits and customs are still flexible. The WDS concept addresses the essential problems and practical issues involved in enabling the publication of research data as part of the scholarly record, as well as their implications for the different stakeholders.

The issues were recognized from the outset as interlinked, and addressing them separately would not be efficient. It was therefore decided to establish four Working Groups (WGs) under one coordination umbrella: Workflows, Bibliometrics, Services, and Cost Recovery. The four WGs are closely interlinked. Bibliometrics for published data depends on the way data are published and cited, which in turn strongly influences how services supporting data publication can be set up. Any proposed solution will have financial implications, raising the question of how the resources and costs of publishing data can be identified and covered. Interaction and exchange of results between the groups are monitored and guided through a common management structure that ensures the involvement of the main stakeholders at all levels. The WG thus takes a holistic approach aimed at promoting and establishing data publishing among stakeholders.

Bibliometrics for published data

Although bibliometrics is one of the major means of measuring research productivity, bibliometrics for research data practically does not exist. Data are referenced inconsistently and, aggravating the situation, citing data is not common practice in scholarly publishing. Instead, data centres try to keep track of the literature that uses their data, or supply other metrics such as download statistics. The WG will investigate approaches and develop solutions that allow proper analysis of content and citations.

Cost recovery for Data Centres

General-purpose services for publishing data are not yet available. The resources required by the editorial process that produces quality-assured, efficiently usable data also still need to be quantified. At present, there is an imbalance between the capacity and functionality of existing data centres and data repositories on the one hand and the global production of scientific data on the other. Data centre budgets generally cover a narrowly defined scope, mostly the data produced by the host institution. The WG will provide cost estimates and develop a business model to cover the additional costs of publishing data in an open access environment.

Services for publishing data

To date, services for publishing data have concentrated on the registration of data entities. Services to cross-reference research data and literature, or to publish data, have emerged only recently and are limited in scope and functionality. The WG will investigate content and interoperability requirements for data centres and academic publishers. Building on existing components, the WG will concentrate on designing and implementing a single, one-for-all cross-referencing service.

Objectives

Publishing data can follow the good practices of conventional journal publication, which include online submission, quality checks, peer review, editorial decisions, and an equivalent of ‘page proofs’. Indeed, depositing data in public repositories and being able to reference datasets are becoming increasingly important. In some fields of research, such as molecular sciences and ecology, this is already mandatory for the acceptance of peer-reviewed publications.

Nevertheless, data publication as a generally accepted new publication type, either self-standing or supplementary to the literature, is not without controversy. For data centres, science publishers, and service providers, data publication is a challenge in terms of organization, technical development, and funding. Compared with science articles, data generally have a higher economic value, but they also require more resources for production, processing, long-term archiving, and publication. If published data are to be usable and as reliable as peer-reviewed science articles, they must meet not only scientific requirements but also archival and longevity requirements. Archiving and publishing procedures for data must be transparent and accepted as part of the scientific culture. Moreover, published data should conform to generally accepted content and interoperability standards, allowing efficient use and integration of data from various sources.

The overall objectives of the WGs under the umbrella of the WDS–RDA Publishing Data IG are to incentivize and enable researchers to publish data. To this end, the WGs aim to:

  • Promote and establish the data publication concept among data centres: What are possible workflows for publishing data? What experience has been gained so far? Are there generic models applicable to the various data centres? What should distinguish publicly accessible data from published data? What is the role of QA/QC and peer review? What is the role of certification? What are the costs? Could WDS serve as an umbrella organization for data publishers?
  • Promote and establish the data publication concept among science publishers and bibliometric service providers: What experience has been gained so far? Is there a common workflow for the editorial handling of data and articles? What are the implications for journal editors and publishers in general? What are the implications for peer review? What is the role and perspective of supplementary materials compared with published data related to an article? What are the benefits? What are the costs?
  • Establish data publishing services as part of scholarly publishing: Who are the relevant stakeholders? Which services are needed, on the part of data centres, science publishers, and bibliometric service providers, to embed data publications into the current framework of scholarly publishing? What are the organizational and technical requirements for the different stakeholders? What are the relevant standards for content and interoperability? How do data publication services fit into the globally evolving data infrastructures? Is there a common model for a service infrastructure? What are the benefits? What are the costs?

Co-chairs:

  • Michael Diepenbroek (Germany, PANGAEA, WDS-SC)
  • Eefke Smit (The Netherlands, STM)
  • Jonathan Tedds (UK, University of Leicester)
  • Mustapha Mokrane (Ex officio, WDS-IPO)

Members:

  • David Anderson (US, NOAA's NCDC–WDS Regular Member)
  • Geoffrey Bilder (UK, CrossRef)
  • Sarah Callaghan (UK, BADC)
  • Ross Cameron (UK, Scopus)
  • David Carlson (UK, ESSD)
  • Cyndy Chandler (US, Woods Hole Oceanographic Institution)
  • Merce Crosas (US, Harvard University)
  • Mikael Karstensen Elbæk (Denmark, OpenAIRE/Technical University of Denmark)
  • Janine Felden (Germany, MARUM–WDS Regular Member)
  • Bettina Görner (Germany, Springer)
  • John Helly (US, UCSD)
  • Francisco Hernandez (Belgium, VLIZ Data Centre–WDS Regular Member)
  • Simon Hodson (UK, CODATA)
  • Hylke Koers (The Netherlands, Elsevier)
  • Rebecca Lawrence (UK, F1000 Research Ltd.)
  • Paolo Manghi (Italy, OpenAIRE, CNR)
  • Caroline Martin (France, IRSTEA)
  • Jo McEntyre (UK, EBI)
  • Ingeborg Meijer (The Netherlands, Leiden University)
  • Fiona Murphy (UK, Wiley-Blackwell–WDS Associate Member)
  • Fiona Nielsen (UK, DNAdigest.org)
  • Amy Nurnberger (US, Columbia University Libraries)
  • Lyubomir Penev (Bulgaria, Pensoft Publishers)
  • Lisa Raymond (US, Woods Hole Oceanographic Institution Library)
  • Nigel Robinson (UK, Thomson Reuters)
  • Sergio Ruiz/Jan Brase (Germany, DataCite–WDS Associate Member)
  • Jochen Schirrwagen (Germany, OpenAIRE)
  • Johanna Schwarz (Germany, Springer)
  • Barbara Sierman/Marcel Ras (The Netherlands, Koninklijke Bibliotheek)
  • Mark Thorley (UK, NERC)
  • Frank Toussaint (Germany, DKRZ-WDC Climate–WDS Regular Member)
  • Mary Vardigan (US, ICPSR–WDS Regular Member)
  • Anita de Waard (The Netherlands, Elsevier–WDS Associate Member)
  • Angus Whyte (UK, Digital Curation Centre, University of Edinburgh)
  • Juanle Wang (China, WDC RRE–WDS Regular Member)
  • Eva Zanzerkia (US, NSF)