News Archive

Change of Dates: International Data Week 2018

International Data Week 2018, coming for the first time to Africa, will take place on 5–8 November, in Gaborone, Botswana. This decision has been taken in order to avoid clashes with the UN World Data Forum (22–24 October) and the Plenary Meeting of the Group on Earth Observations (31 Oct–1 Nov). For this data conference taking place in Africa and which will contain as one of its major ...

Coordination and Support of International Research Data Networks: Final Report Published

The OECD Global Science Forum (GSF) and the World Data System partnered on a project to inform policies to promote open data for science focused on internationally coordinated data networks. The overall aim of this project was to identify principles and policy actions that can enable the establishment and maintenance of effective international data networks that are necessary to support a ...

Call for Session Proposals: SciDataCon 2018

Proposals are invited for sessions at SciDataCon 2018: The Digital Frontiers of Global Science. SciDataCon 2018 will take place as an integrated part of International Data Week, 5–8 November 2018 [NEW DATE], in Gaborone, Botswana. Session proposals should be made via: http://www.scidatacon.org/IDW2018/ The deadline for proposals is midnight UTC on 2 February 2018. If you have not ...

WDS Blog

The What, Why, and How of Data Management Planning

A Blog post by Ingrid Dillo (WDS-SC Vice-chair)

Whether your research is performed in a lab, in the field, or at the office, and with a large or small team, it inevitably involves research information, or data. These data are valuable, and deserve to be properly managed. Over the last few years, the notion that good data management is an important part of scientific practice has increasingly found widespread acceptance.

Data management planning is the structured way of thinking about the research data you are going to collect. What type of research data will the research project produce? What format will you use? How will you store them and how can they be accessed? By thinking about these questions at an early stage and documenting your answers you will avert future problems as a researcher.

One of the ways to think about the data collecting process is by using a format: a Data Management Plan (DMP). These formats come in a variety of shapes and sizes, depending on the research discipline, requirements from the research funder, and local initiatives.

A DMP can be a separate document. It helps the researcher identify and list the risks with regard to management of research data during the entire research process. Because not everything is known from the outset, it is recommended to treat the DMP as a 'living document', which can be revised and detailed periodically.
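As a rough illustration of the 'living document' idea, the sketch below records answers to the planning questions above and keeps every earlier version when the plan is revised. It is only one possible way to capture a DMP in machine-readable form: the class names, field names, and example values are hypothetical and do not come from any funder's template.

```python
from dataclasses import dataclass, field, replace
from datetime import date
from typing import List


@dataclass(frozen=True)
class DataManagementPlan:
    """One snapshot of the plan; field names are illustrative assumptions."""
    data_types: str      # What type of research data will the project produce?
    file_formats: str    # What format will you use?
    storage: str         # How will you store the data?
    access: str          # How can the data be accessed?
    revised_on: date     # When this snapshot was written


@dataclass
class LivingDMP:
    """Holds the current plan plus all earlier snapshots."""
    current: DataManagementPlan
    history: List[DataManagementPlan] = field(default_factory=list)

    def revise(self, revised_on: date, **changes: str) -> None:
        """Archive the current snapshot and apply the changed answers."""
        self.history.append(self.current)
        self.current = replace(self.current, revised_on=revised_on, **changes)


# Hypothetical example values, for illustration only.
plan = LivingDMP(
    DataManagementPlan(
        data_types="survey responses",
        file_formats="CSV",
        storage="institutional network drive",
        access="project team only",
        revised_on=date(2017, 1, 15),
    )
)
plan.revise(date(2017, 9, 1), storage="certified repository", access="open after embargo")
```

Keeping the earlier snapshots means a later reader can see what was decided at each stage of the project and when it changed.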

Research funders nowadays often require that a DMP is included in the project proposal. Typically, the research proposal either contains a data section or a separate DMP is incorporated as an annex. For research funders, the reason behind it is to promote open access to research data: in their opinion, research data produced in the context of a publicly funded research project should be freely made available for reuse and verification. Recent cases of data manipulation and fraud emphasize the importance of access to the original data.

By increasing awareness of research data management across the board—from funders, to researchers, to support staff—we can ensure that research data are handled properly, both in the present and future. More data will be available for reuse, and more data will be reused. In the end, reuse of any form will help contribute to the solutions of today's grand challenges.

More Information on Data Management Planning

  1. In the five-minute video below, Research Data Netherlands explains what a DMP is and what its advantages are, and gives an example of a format and a completed DMP.
    https://youtu.be/gYDb-GP1CA4
  2. The Essentials 4 Data Support is an introductory course for those who provide support to researchers in storing, managing, archiving and sharing their research data (data support staff). With this course, Research Data Netherlands aims to contribute to the professional development of, and coordination among, data support staff. The course covers the basic building blocks of the discipline and revolves around online material. Research Data Netherlands has placed the online learning materials at everyone's disposal free of charge, based on the idea of open access and sharing knowledge. You are free to take the online course anytime at your own initiative.
    http://datasupport.researchdata.nl/en/about-the-course/
  3. Research Data Management: An overview of recent developments in the Netherlands by Marjan Grootveld (DANS) and Marnix van Berchum (Huygens ING), 2017.
    https://dans.knaw.nl/en/about/organisation-and-policy/information-material?set_language=en

Thoughts on Future Trust

A Blog post by Wim Hugo (WDS-SC Vice-chair)

The ICSU World Data System (ICSU-WDS) and the Data Seal of Approval have recently collaborated on the alignment of their respective sets of criteria for certification as a Trusted Digital Repository, and are in the process of establishing a joint certification authority, the CoreTrustSeal, to manage the associated certification process. This activity contributes to a significant future focus on the trust that can be placed in elements of a distributed global research infrastructure, and the increased automation of its verification. However, it is only the tip of the iceberg.

The WDS Knowledge Network defines many of the components of research activity for which there is some form of trusted service or infrastructure component required: ranging from the obvious need to reliably refer to research outputs, researchers, institutions, artefacts, projects, and the like, through the more complex aspects of trusted repositories, registries, vocabulary, and ontology services, to the assigning of levels of maturity, sustainability, or quality to these.

The Knowledge Network

The trust that is required for research infrastructure to function properly is somewhat different to the trust that can be placed in the content that is curated by the research infrastructure—although one has to recognize that the two aspects are interrelated and, in some instances, inseparable. Furthermore, the trust that can be placed in content should ideally also distinguish between the significance and usability of that content, and its quality. These facets are not necessarily the same, but again are conflated to some extent in discussions about fitness-for-use, quality metrics, and the like. 

Let’s work through these distinctions with the help of some examples.

Scholarly Publication

The main aim of a scholarly publication is to assert a claim in respect of a novel finding, and to expose that claim to peer review for the purpose of correction, as required¹. One needs to distinguish the rules (criteria for trust) associated with the process of science from the value of the content. The latter is largely judged by significance, and measured, with varying degrees of usefulness, through citation indices and impact factors.

There are arguments that this stream of self-correcting progress is broken, especially in some disciplines, and this is strongly related to the criteria for trust. Such criteria are largely stated informally and implemented with varying degrees of diligence in research institutions, and are mostly delegated to peer review to determine if the result is trustworthy. Peer review purports to determine originality (not easily automated, and essentially linked to end-user value), quality (certainly possible to automate) and validity (can be partly automated).

One could, and in my view should, argue that processes can be verified objectively and preferably automatically, and that our aim should be to certify their veracity using measurable criteria. Such validity and quality criteria could be extended to feasibility of reproduction, access to supporting datasets, and the like. References to widely used protocols and methods, standards, samples, and research patterns, all increasingly linked to persistent identifiers, also increase the verifiable level of trust in the process.

Vocabulary Services

Vocabulary (name) services play an increasingly important role in research infrastructures for a variety of reasons. Firstly, vocabularies and name services are critical to the realization of the semantic web and Linked Open Data: in essence, reducing ambiguity by referring precisely to a concept, entity, relationship, and/or characteristic of either. Secondly, these services are used to enhance the experiences of users and the value of knowledge by navigating the relationships that exist among them, which is conceptually captured in the WDS Knowledge Network and is increasingly implemented, for example, in projects such as Scholix. Again, one should not confuse the acceptability of the vocabulary or service content (e.g., whether all taxonomists in the world agree that a taxon is correct) with the quality of the service provided by the infrastructure component. For the former, there may never be agreement (especially with taxonomists!); but, for the latter, it is a relatively simple matter to determine what constitutes a well-defined, standardized vocabulary or name service, and community efforts are underway to document and define these criteria. In addition to such operational requirements, one should include the need for sustainability and continued access into the reasonable future.

Conclusion

In general, one can distinguish—for all of the elements of the WDS Knowledge Network—a clear separation between judgements about value (significance, originality, inclusiveness, consensus, etc.) and the quality of the process (sustainability, standards compliance, reproducibility, and similar concerns). And, extrapolating this into the future, I suspect that we need to get ready for the following:

  1. Significant broadening of services and infrastructure that cover all aspects of the WDS Knowledge Network, as well as a parallel rise in the need for certification of these services and infrastructure. Already, there is a perceived need for the certification of repositories of open source code and of vocabulary services, to name but two.
  2. Increased automation of the certification of processes that is in tune with an expected, rapid upturn in artificial intelligence and machine learning. This will be needed because I have no doubt that the scientific method will be increasingly automated within the next decade or so. We are already overwhelmed by volumes of data and numbers of publications, and science cannot scale any further as it is limited by human capacity.

On the basis of the above, and with science increasingly reliant on trust in a wider context, ICSU-WDS should start focussing on defining trust criteria beyond data repositories and services, and on how to automate its assessment: this being the only really scalable solution to a problem of rapidly growing scope.


1 There is a parallel focus on review and consolidation or synthesis based on existing knowledge.

From People to Pixels: Integrating Data Across the NASA DAACs

A Blog post by Lindsey M. Harriman (SGT, Inc. Contractor to USGS EROS Center/LP DAAC) and Alex de Sherbinin (WDS Scientific Committee member)

Socioeconomic and Earth Sciences researchers in search of pertinent data can now reap the benefits of a recent collaboration between two Regular Members of the ICSU World Data System.

Today, our planet supports about 7.6 billion people, with a projected increase to nearly 10 billion by 2050, and more than 11 billion by 2100. These 7.6 billion people are using land and water resources to meet their basic needs. As the population increases, their use of, and their impact on, Earth’s resources is going to change. Researchers who study the dynamics between such human–land interactions and their changes over time will look at a range of variables, such as surface temperature, vegetation health, forest cover extent, and change in land cover and habitat, as well as impacts of natural disasters, and climate trends and extremes.

Research questions that often ask about such dynamics include:

  • What is the proximity between populated areas and fire occurrences over time?
  • What is the correlation between the increase of population and land surface temperature in urban areas?
  • How has population affected land-cover change and vegetation growth over time in urban sprawl areas?
  • How will land-cover changes affect flood and drought risk around rural and urban settlements?

To answer these types of questions, researchers need to integrate census data with Earth observation data, including data collected by NASA’s Earth Science Division Operating Missions. Recently, two NASA Distributed Active Archive Centers (DAACs)—the Land Processes DAAC (LP DAAC; WDS Regular Member) and the Socioeconomic Data and Applications Center (SEDAC; WDS Regular Member)—collaborated to make that integration much easier. LP DAAC and SEDAC worked together to provide access to georeferenced population data alongside land remote sensing data in the Application for Extracting and Exploring Analysis Ready Samples (AppEEARS). SEDAC’s Gridded Population of the World version 4 (GPWv4) aggregates census data from around the world into a globally consistent grid with 30 arc-second resolution (1 kilometer at the equator) for population density and counts. Soon researchers will also have access to age and sex distribution grids. LP DAAC disseminates land remote sensing data collected by several NASA missions—including from the popular Moderate Resolution Imaging Spectroradiometer (MODIS) sensor onboard Terra and Aqua—and provides access to a selection of these datasets through AppEEARS.
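As a quick check on the "30 arc-second (1 kilometer at the equator)" figure, the sketch below converts an angular cell size into an approximate ground distance. It assumes a spherical Earth with the WGS84 equatorial radius, so the numbers are approximations, and the function name is ours rather than anything provided by GPWv4 or AppEEARS.

```python
import math

# Semi-major axis of the WGS84 ellipsoid, used here as the radius of a sphere.
WGS84_EQUATORIAL_RADIUS_KM = 6378.137


def cell_width_km(arc_seconds: float, latitude_deg: float = 0.0) -> float:
    """Approximate east-west width of a grid cell at a given latitude.

    Treats Earth as a sphere, so the width shrinks with the cosine of latitude.
    """
    degrees = arc_seconds / 3600.0
    km_per_degree_at_equator = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_KM / 360.0
    return degrees * km_per_degree_at_equator * math.cos(math.radians(latitude_deg))


equator_width = cell_width_km(30)         # 30 arc-second cell at the equator
mid_latitude_width = cell_width_km(30, 36)  # same cell at 36 degrees north
```

Under these assumptions a 30 arc-second cell is roughly 0.93 km wide at the equator and about 0.75 km wide at 36°N, which is why the 1-kilometer figure is quoted for the equator specifically.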

Figure 1. Daily land surface temperature in Kelvin (K) and population trend, 2010–2017 for rural and urban points in North Carolina (based on MODIS MOD11A1 daily 1-km data and GPWv4, UN-Adjusted)
(a) Farm northwest of Nashville, North Carolina, USA. The red pin represents the location 36°N, 78°W. Image: Google Maps. Time series plots: output from AppEEARS.
(b) Suburban area of Charlotte, North Carolina, USA, experiencing rapid population growth. The red pin represents the approximate location 35°N, 81°W. Image: Google Maps. Time series plots: output from AppEEARS.

Figure 1 provides examples of time series plots of population growth and daily land surface temperature produced with the Point Sample function in AppEEARS. Users can interact with these visualizations within the application and also download the data values in comma-separated values (CSV) format.
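Once the values have been downloaded, one of the research questions listed earlier (the correlation between population and land surface temperature) could be explored along the following lines. This is a minimal sketch using only the Python standard library. The column names (date, lst_kelvin, population) and the sample rows are made up for illustration and do not reflect the actual layout of an AppEEARS export, which should be checked against the downloaded file.

```python
import csv
import io
import math
from collections import defaultdict

# Made-up illustrative values, NOT real AppEEARS output.
SAMPLE_CSV = """date,lst_kelvin,population
2010-07-01,300.0,1000
2010-07-02,302.0,1000
2011-07-01,301.0,1100
2011-07-02,,1100
2012-07-01,304.0,1200
"""


def annual_means(csv_text, value_column):
    """Average a daily column by calendar year, skipping empty cells."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[value_column].strip()
        if not raw:
            continue  # e.g. a day with no valid observation
        year = int(row["date"][:4])
        sums[year] += float(raw)
        counts[year] += 1
    return {year: sums[year] / counts[year] for year in sorted(sums)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


lst_by_year = annual_means(SAMPLE_CSV, "lst_kelvin")
pop_by_year = annual_means(SAMPLE_CSV, "population")
years = sorted(set(lst_by_year) & set(pop_by_year))
r = pearson([pop_by_year[y] for y in years], [lst_by_year[y] for y in years])
```

For the made-up sample above, r comes out at about 0.87. With real data, a correlation computed from a handful of annual values at a single point says little on its own, so it is best treated as a starting point for a fuller analysis.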

Additionally, LP DAAC has collaborated with a third DAAC, the National Snow and Ice Data Center DAAC (NSIDC DAAC; WDS Regular Member), to provide MODIS snow-cover data from its archive for access through AppEEARS as an additional variable describing land dimension. SEDAC, LP DAAC, and NSIDC DAAC are all part of NASA’s Earth Observing System Data and Information System, and through their collaborations, AppEEARS now provides access to more than 100 data products from the three data centers in a single place, at no cost to the user. Many possible combinations of data can be extracted from AppEEARS for use in analyses of the dynamics between populations and ecosystems over time.

AppEEARS also provides benefits during the data preparation process. When performing a sample request, users drastically reduce the amount of data they ultimately need to download to perform their analysis. AppEEARS enables users to subset data based on geographic and temporal parameters, as well as by specific data layer. Since users can reformat the data and reproject within the application, the amount of post-processing required is reduced. Furthermore, AppEEARS not only provides data values, but also quality data values and their descriptions, when applicable. Lastly, users can visualize plots of the data values (point sample) or summary statistics (area samples) from the sample request within the application.

The collaboration around AppEEARS represents an initial step away from the idea that users need to download large amounts of data for local filtering, processing, integration, and analysis, and moves towards a model where analysis-ready data can be more immediately accessed. Coordinated tools and application development on the substantial holdings of all 12 DAACs is an important strategic direction for NASA’s Earth Science Data and Information System Project (WDS Network Member).

So, what’s your use case for AppEEARS?

Additional information about the DAACs mentioned above can be found here:
 – SEDAC: http://sedac.ciesin.columbia.edu/
 – LP DAAC: https://lpdaac.usgs.gov/
 – NSIDC DAAC: https://nsidc.org/daac

Announcements

Global Glacier Change Bulletin No. 2 (2014–2015)

The World Glacier Monitoring Service (WGMS; WDS Regular Member) has announced the publication of the second issue of the Global Glacier Change Bulletin series. The full report can be downloaded in PDF format from the WGMS website: http://wgms.ch/ggcb/. In connection with the release of the second Bulletin, Mauri Pelto (Director of the North Cascades Glacier Climate Project) has written a ...

Call for Proposals: Göttingen–CODATA Symposium on RDM in Research Institutions

Abstracts are invited for the Göttingen–CODATA Research Data Management (RDM) Symposium 2018 on ‘The critical role of university RDM infrastructure in transforming data to knowledge’. The symposium is a collaboration between the University of Göttingen and CODATA, and will take place on 18–20 March 2018 in Göttingen, Germany as a precursor event to the 11th Plenary Meeting of the Research ...

New SEDAC Dataset

A new dataset has been released by the NASA Socioeconomic Data and Applications Center (SEDAC; WDS Regular Member). India Annual Winter Cropped Area, 2001–2016 consists of annual winter cropped areas for most of India (excluding the northeastern states) from 2000–2001 to 2015–2016. The data can be used in land-cover and land-use change studies, agricultural applications, and to assist with ...

WDS-related

WDC – RSER Transfers Data Holdings to WDC – Meteorology, Obninsk

The All-Russia Research Institute of Hydrometeorological Information – World Data Centre (RIHMI-WDC) has announced to the WDS Scientific Committee (WDS-SC) that it has discontinued the existence of WDC – Rockets, Satellites and Earth Rotation (WDC – RSER) since the topics are no longer its priorities. However, the WDS-SC is extremely pleased to learn that the data holdings of WDC – RSER will ...

Integration of the Ukrainian science into the World Data System

Zgurovsky et al. in Cybernetics and Systems Analysis (Volume 46, Issue 2). Abstract: Creating the World Data Center for Geoinformatics and Sustainable Development (WDC-Ukraine), its certification and integration into the World Data System are described. The main principles of the WDC and its research priorities are considered. Main projects carried out by the WDC are reviewed. One of them is ...

Collaboration between ICSU World Data System and SCOSTEP/VarSITI

Takashi Watanabe and Rorie Edmunds in VarSITI Newsletter, Volume 3. The International Council for Science (ICSU) has a long history of collaborating internationally on the archiving and provision of scientific data. The World Data Centres (WDCs) and the Federation of Geophysical and Astrophysical Data Services were established by ICSU during the International Geophysical Year (IGY). Building ...
