The term of the current Scientific Committee will expire in June 2018 and several seats will be up for renewal. The International Council for Science and the World Data System call on their respective members, as well as partner organizations concerned with scientific data stewardship, to nominate new members of the Scientific Committee. The Scientific Committee is the governing body of WDS as ...
Deadline for proposals is extended to midnight UTC on 19 February 2018: Proposals are invited for sessions at SciDataCon 2018: The Digital Frontiers of Global Science. SciDataCon 2018 will take place as part of International Data Week (IDW), 5–8 November 2018 [NEW DATE], in Gaborone, Botswana. Session proposals should be made at: http://www.scidatacon.org/IDW2018/ .
International Data Week 2018, coming for the first time to Africa, will take place on 5–8 November 2018, in Gaborone, Botswana. This decision has been taken in order to avoid clashes with the UN World Data Forum (22–24 October) and the Plenary Meeting of the Group on Earth Observations (31 Oct–1 Nov). For this data conference taking place in Africa and which will contain as one of its ...
The OECD Global Science Forum (GSF) and the World Data System partnered on a project to inform policies to promote open data for science focused on internationally coordinated data networks. The overall aim of this project was to identify principles and policy actions that can enable the establishment and maintenance of effective international data networks that are necessary to support a ...
DOIs and Licensing for Geomagnetic Data & Products: Current Status
A Blog post by Aude Chambodut (WDS Scientific Committee Member)
The longest time series of geomagnetic data are certainly the ones acquired by magnetic observatories (Fig. 1), some of which reach a century of uninterrupted measurements.
There are currently about 200 open magnetic observatories worldwide. In each of them, absolute vector observations of the Earth's magnetic field are recorded accurately and continuously, with a time resolution of one minute or less, over a long period of time. Magnetic observatory data are 'primary data' that are extensively used in the derivation of data products ('secondary data') such as: International Geomagnetic Reference Field models, geomagnetic indices, space weather applications…
Figure 1. Paris declination series: annual means of declination corrected and adjusted to actual French National Magnetic Observatory - CLF (Mandea and LeMouël, 2016).
The whole community of geomagnetic observatories is particularly well organized and federated under the auspices of the International Association of Geomagnetism and Aeronomy (IAGA), one of the associations of the International Union of Geodesy and Geophysics [WDS Partner Member].
Since the beginning of the 1960s (the World Data Centre system, established in 1957, provided archives for the observational data resulting from the 'International Geophysical Year'), magnetic observatory data have been mostly publicly available (Fig. 2). Getting access to a network of stations is much more interesting than having access to just one isolated observatory.
Figure 2. Location of magnetic observatories (all periods) having at least one datum ingested into the Geomagnetism Data Portal of WDC – Geomagnetism, Edinburgh [WDS Regular Member].
The cooperative spirit within the geomagnetic community thus has a fairly long history, one that has had to cope with successive technological revolutions in data recording (e.g., analogue to digital; Fig. 3) and in the way data are made available (from yearbooks, via isolated recording media, up to connected data repositories). Throughout, the community's practices were based on fair play and goodwill recognition of data sources/providers. Such practices worked, and would have continued to work for many more decades had new challenges not arisen from the changing requirements of users and stakeholders.
Indeed, in our increasingly connected world, it is ever more important to follow developments in data management closely. Some aspects were previously not sufficiently taken into account, such as the discovery, citation, and reuse of geomagnetic data. Nowadays, it no longer appears possible to reserve sources of data for 'informed people' only, and the existing licensing conditions for the distribution of geomagnetic data and data products are (in part) not sufficiently developed to address this change and need to be improved.
Figure 3. Analogue magnetogram from Vladivostok; 24 September 1934 (through ICSU grant-2003 by WDC – Solar–Terrestrial Physics, Moscow [WDS Regular Member]).
IAGA has thus agreed to set up Task Forces on the abovementioned aspects. Consensus has already been reached that data/data-product licensing and Digital Object Identifier (DOI) minting should aim to:
– Provide recognition and acknowledgement.
– Enable creation of new data products from primary data (e.g., geomagnetic indices) or in combination with other data sources (e.g., global models of geomagnetic field).
– Prevent the change and/or appropriation of data by a third party.
– Enable reuse of data in a reproducible way.
– Supply metadata that enable unique identification of a dataset, as well as providing relevant information to the user.
– Use machine-readable and widely used licenses.
– Enable easy online access to research data for discovery.
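As a rough illustration of how several of these aims fit together, the sketch below models a dataset record that carries a DOI, a machine-readable license identifier, and a link from a data product back to its primary data. It is only a sketch under stated assumptions: the `DatasetRecord` class, its field names, and every value in the example (including the DOI prefix) are hypothetical and do not represent an IAGA or WDS schema.

```python
import json
import re
from dataclasses import dataclass, asdict

# Loose syntactic check only: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical minimal metadata for one citable dataset."""
    doi: str
    title: str
    creators: tuple
    publication_year: int
    license_id: str            # machine-readable identifier, e.g. "CC-BY-4.0"
    derived_from: tuple = ()   # DOIs of primary data used to build a product

    def __post_init__(self):
        if not DOI_PATTERN.match(self.doi):
            raise ValueError(f"not a syntactically valid DOI: {self.doi!r}")

    def citation(self) -> str:
        """Human-readable citation that credits the data provider."""
        names = "; ".join(self.creators)
        return (f"{names} ({self.publication_year}). {self.title}. "
                f"https://doi.org/{self.doi}")

    def to_json(self) -> str:
        """Machine-readable form that a catalogue could harvest."""
        return json.dumps(asdict(self), indent=2)


# All values below are placeholders, not real datasets or DOIs.
minute_data = DatasetRecord(
    doi="10.1234/example.observatory.minute.2017",
    title="Example Observatory one-minute data",
    creators=("Example Observatory",),
    publication_year=2017,
    license_id="CC-BY-4.0",
)
index_product = DatasetRecord(
    doi="10.1234/example.index.2017",
    title="Example geomagnetic index",
    creators=("Example Index Service",),
    publication_year=2017,
    license_id="CC-BY-4.0",
    derived_from=(minute_data.doi,),
)
print(index_product.citation())
print(index_product.to_json())
```

Keeping the record immutable (`frozen=True`) mirrors the aim that a dataset, once identified, should not be changed silently by a third party; a revised dataset would get a new record and a new identifier.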
The work is in progress so that it meets the state of the art in applying licenses and minting DOIs for geomagnetic data and data products, with the goal of ensuring that the fruits of the tremendous efforts made by generations of geomagnetic observers throughout the world remain available into the 21st century.
Ensuring Scientific Data Remain a Global Public Good
A Blog post by Sandy Harrison (WDS-SC Chair)
At the end of October last year, the members of the International Council for Science and the International Social Science Council voted overwhelmingly for a merger of the two organizations. The new organization, which will be called the International Science Council and which should come into being in summer 2018, will serve as a single, global voice for science and will help to provide the evidence base for, and coordinate action on, issues of public concern.
The importance of data for enabling science and for providing the necessary evidence base for action was necessarily both a major concern and discussion point during the meeting at which this historic vote was taken. Access to high-quality data from multiple disciplines is needed to be able to understand and address the complex issues facing our global society. New pluridisciplinary approaches to analyzing and modelling data will be required. And the data upon which decision-making and management of our planet rests must be open access, freely available, and subject to public scrutiny.
So far, so good. However, recognition of the importance of free and open access to data is only the beginning. The new 'voice of science' in the 21st century will need to champion the infrastructure required to ensure free and open access to data. Data stewardship cannot be achieved through pious statements or international accords; it requires the existence of data stewards: organizations that are funded and supported to provide professional support for data archiving, data analysis, and data sharing.
The mission of the World Data System is, of course, to provide an umbrella for data stewards worldwide and to champion new and better ways of ensuring the continuance of our data infrastructure. But there is still a long way to go to ensure both the continued funding for the many organizations that are part of this landscape and that these organizations continue to adopt and promote best data practices.
Too much of the data compilation is currently being done by individual scientists or science teams on short-term funding; too much of the work of data stewardship is currently being done pro bono. Neither of these situations is sustainable. Thus, we must hope that the new International Science Council will make the practical issues of data stewardship in the 21st century a major focus of its work. And then we really will have something to celebrate next summer!
The What, Why, and How of Data Management Planning
A Blog post by Ingrid Dillo (WDS-SC Vice-chair)
Whether your research is performed in a lab, in the field, or at the office, and with a large or small team, it inevitably involves research information, or data. These data are valuable, and deserve to be properly managed. Over the last few years, the notion that good data management is an important part of scientific practice has increasingly found widespread acceptance.
Data management planning is a structured way of thinking about the research data you are going to collect. What type of research data will the research project produce? What format will you use? How will you store them and how can they be accessed? By thinking about these questions at an early stage and documenting your answers, you will avert future problems as a researcher.
One of the ways to think about the data collecting process is by using a format: a Data Management Plan (DMP). These formats come in a variety of shapes and sizes, depending on the research discipline, requirements from the research funder, and local initiatives.
A DMP can be a separate document. It helps the researcher identify and list the risks with regard to management of research data during the entire research process. Because not everything is known from the outset, it is recommended to treat the DMP as a 'living document', which can be revised and detailed periodically.
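To make the idea of a 'living document' concrete, here is a minimal sketch of a DMP held as structured data, in which answers can be revised over time and unanswered questions stay visible. The question keys paraphrase the questions above; the class, method, and field names are assumptions for illustration and do not follow any funder's template.

```python
from dataclasses import dataclass, field
from datetime import date

# Question keys paraphrased from the text; not an official DMP template.
QUESTIONS = ("data_types", "formats", "storage", "access")


@dataclass
class DataManagementPlan:
    """Hypothetical DMP that records answers and every revision made."""
    project: str
    answers: dict = field(default_factory=dict)
    revisions: list = field(default_factory=list)

    def answer(self, question: str, text: str, when: date) -> None:
        """Set or revise an answer and log when it was changed."""
        if question not in QUESTIONS:
            raise KeyError(f"unknown question: {question}")
        self.answers[question] = text
        self.revisions.append((when, question))

    def open_questions(self) -> list:
        """Questions still unanswered, in their original order."""
        return [q for q in QUESTIONS if q not in self.answers]


plan = DataManagementPlan(project="Example project")
plan.answer("formats", "CSV files with a README", date(2018, 1, 15))
plan.answer("formats", "CSV and NetCDF, with a README", date(2018, 3, 1))
print(plan.open_questions())
print(len(plan.revisions))
```

The revision log is the part that matters here: it lets the plan be detailed periodically without losing track of when each decision changed.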
Research funders nowadays often require that a DMP is included in the project proposal. Typically, the research proposal either contains a data section or a separate DMP is incorporated as an annex. For research funders, the reason behind it is to promote open access to research data: in their opinion, research data produced in the context of a publicly funded research project should be freely made available for reuse and verification. Recent cases of data manipulation and fraud emphasize the importance of access to the original data.
By increasing awareness of research data management across the board—from funders, to researchers, to support staff—we can ensure that research data are handled properly, both in the present and future. More data will be available for reuse, and more data will be reused. In the end, reuse of any form will help contribute to the solutions of today's grand challenges.
More Information on Data Management Planning
- In the five-minute video below, Research Data Netherlands explains what a DMP is and what its advantages are, and gives an example of a format and a completed DMP.
- The Essentials 4 Data Support is an introductory course for those who provide support to researchers in storing, managing, archiving and sharing their research data (data support staff). With this course, Research Data Netherlands aims to contribute to the professional development of, and coordination among, data support staff. The course covers the basic building blocks of the discipline and revolves around online material. Research Data Netherlands has placed the online learning materials at everyone's disposal free of charge, based on the idea of open access and sharing knowledge. You are free to take the online course anytime at your own initiative.
- Research Data Management: An overview of recent developments in the Netherlands by Marjan Grootveld (DANS) and Marnix van Berchum (Huygens ING), 2017.
Thoughts on Future Trust
A Blog post by Wim Hugo (WDS-SC Vice-chair)
The ICSU World Data System (ICSU-WDS) and the Data Seal of Approval have recently collaborated on the alignment of their respective sets of criteria for certification as a Trusted Digital Repository, and are in the process of establishing a joint certification authority, the CoreTrustSeal, to manage the associated certification process. This activity contributes to a significant future focus on the trust that can be placed in elements of a distributed global research infrastructure, and the increased automation of its verification. However, it is only the tip of the iceberg.
The WDS Knowledge Network defines many of the components of research activity for which some form of trusted service or infrastructure component is required: ranging from the obvious need to reliably refer to research outputs, researchers, institutions, artefacts, projects, and the like, through the more complex aspects of trusted repositories, registries, vocabulary, and ontology services, to the assigning of levels of maturity, sustainability, or quality to these.
The trust that is required for research infrastructure to function properly is somewhat different to the trust that can be placed in the content that is curated by the research infrastructure—although one has to recognize that the two aspects are interrelated and, in some instances, inseparable. Furthermore, the trust that can be placed in content should ideally also distinguish between the significance and usability of that content, and its quality. These facets are not necessarily the same, but again are conflated to some extent in discussions about fitness-for-use, quality metrics, and the like.
Let's work through these distinctions with the help of some examples.
The main aim of a scholarly publication is to assert a claim in respect of a novel finding, and to expose that claim to peer review for the purpose of correction, as required [1]. One needs to distinguish the rules (criteria for trust) associated with the process of science and the value of the content. The latter is largely judged by significance, and measured, with varying degrees of usefulness, through citation indices and impact factors.
There are arguments that this stream of self-correcting progress is broken, especially in some disciplines, and this is strongly related to the criteria for trust. Such criteria are largely stated informally and implemented with varying degrees of diligence in research institutions, and are mostly delegated to peer review to determine if the result is trustworthy. Peer review purports to determine originality (not easily automated, and essentially linked to end-user value), quality (certainly possible to automate) and validity (can be partly automated).
One could (and in my view, should) argue that processes can be verified objectively and preferably automatically, and that our aim should be to certify their veracity using measurable criteria. Such validity and quality criteria could be extended to feasibility of reproduction, access to supporting datasets, and the like. References to widely used protocols and methods, standards, samples, and research patterns, increasingly linked to persistent identifiers, also increase the verifiable level of trust in the process.
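As a small thought experiment on what 'measurable criteria' might look like in practice, the sketch below runs a few automatic checks against a description of a publication. Everything in it is an assumption made for illustration: the criteria names, the record fields, and the example values are hypothetical and are not CoreTrustSeal or WDS requirements.

```python
def check_process_criteria(record: dict) -> dict:
    """Return a pass/fail result for each (hypothetical) process criterion."""
    datasets = record.get("datasets", [])
    return {
        # Every supporting dataset is cited with a persistent identifier.
        "datasets_have_identifiers": bool(datasets)
        and all(bool(d.get("doi")) for d in datasets),
        # Methods or protocols are referenced rather than only described.
        "methods_referenced": bool(record.get("method_references")),
        # Material needed to attempt a reproduction is reachable.
        "reproduction_materials_available": bool(record.get("code_url")),
    }


# Placeholder publication record; identifiers and URLs are not real.
example = {
    "datasets": [{"doi": "10.1234/example.dataset"}, {"doi": ""}],
    "method_references": ["10.1234/example.protocol"],
    "code_url": "",
}
results = check_process_criteria(example)
for criterion, passed in results.items():
    print(f"{criterion}: {'pass' if passed else 'fail'}")
```

Checks of this kind say nothing about significance or originality; they only test whether the process leaves a verifiable trail, which is the part argued above to be open to automation.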
Vocabulary (name) services play an increasingly important role in research infrastructures for a variety of reasons. Firstly, vocabularies and name services are critical to the realization of the semantic web and Linked Open Data: in essence, reducing ambiguity by referring precisely to a concept, entity, relationship, and/or characteristic of either. Secondly, these services are used to enhance the experiences of users and the value of knowledge by navigating the relationships that exist among them, which is conceptually captured in the WDS Knowledge Network and is increasingly implemented, for example, in projects such as Scholix. Again, one should not confuse the acceptability of the vocabulary or service content (e.g., whether all taxonomists in the world agree that a taxon is correct) with the quality of the service provided by the infrastructure component. For the first case, there may never be agreement (especially with taxonomists!); but, for the latter, it is a relatively simple matter to determine what constitutes a well-defined, standardized vocabulary or name service, and community efforts are underway to document and define these criteria. In addition to such operational requirements, one should include the need for sustainability and continued access into the reasonable future.
In general, one can distinguish—for all of the elements of the WDS Knowledge Network—a clear separation between judgements about value (significance, originality, inclusiveness, consensus, etc.) and the quality of the process (sustainability, standards compliance, reproducibility, and similar concerns). And, extrapolating this into the future, I suspect that we need to get ready for the following:
- Significant broadening of services and infrastructure that cover all aspects of the WDS Knowledge Network, as well as a parallel rise in the need for certification of these services and infrastructure. Already, there is a perceived need for the certification of repositories of open source code and of vocabulary services, to name but two.
- Increased automation of the certification of processes that is in tune with an expected, rapid upturn in artificial intelligence and machine learning. This will be needed because I have no doubt that the scientific method will be increasingly automated within the next decade or so. We are already overwhelmed by volumes of data and numbers of publications, and science cannot scale any further as it is limited by human capacity.
On the basis of the above, and with science increasingly reliant on trust in a wider context, ICSU-WDS should start focussing on defining trust criteria beyond data repositories and services, and on how to automate its assessment: this being the only really scalable solution to a problem of rapidly growing scope.
[1] There is a parallel focus on review and consolidation or synthesis based on existing knowledge.