Call for Papers for SciDataCon 2016 Extended Until 30 May!

In response to the numerous requests to have a full call for posters, we are delighted to inform you that the deadline for submitting abstracts for papers and posters to SciDataCon 2016 has been extended until Monday, 30 May. Please note, however, that there will be no further extension of this deadline. Submit your abstract to any of the 48 accepted sessions or to the general submission ...

WDS Data Stewardship Award 2016: Call for Nominations (Deadline Extended!)

The Call for Nominations for the 2016 WDS Data Stewardship Award has been extended until Monday, 30 May 2016. This annual prize is directed towards early career researchers, and the 2016 winner will be presented with their Award alongside the 2015 awardee, Dr Yaxing Wei (ORNL DAAC; WDS Regular Member), at SciDataCon 2016 (11–13 September 2016; Denver, Colorado). Details of the Call * , ...

Webinar #9: Big Data, Little Data, No Data – Who is in Charge of Data Quality?

On Monday, 9 May at 11:00 UTC, Christine Borgman and Andrea Scharnhorst explored the issues surrounding responsibilities for data quality in "Big Data, Little Data, No Data – Who is in Charge of Data Quality?", the 9th presentation in the WDS Webinar series. Data quality is a consideration throughout the research process. To what extent should responsibility for assuring data quality be the ...

World Data System Marks Fifth Anniversary of International Programme Office

A ceremony was held yesterday to mark five years of successful collaboration between the Japanese National Institute of Information and Communications Technology (NICT) and the International Council for Science (ICSU) for the hosting and supporting of the World Data System – International Programme Office (WDS-IPO). The WDS-IPO was established in April 2011 and formally inaugurated in May ...

Yet Another Paradigm Shift…

A Blog post by Wim Hugo (WDS Scientific Committee member)

At the recently completed European Geosciences Union General Assembly 2016, I was one of the participants in a double session called "20 years of persistent identifiers – where do we go next?". Apart from reviewing the obvious elements, issues, and benefits of persistent identification—and agreeing on the success of the Research Data Alliance (RDA) Working Group on Data Citation and their excellent set of 14 guidelines for implementation—we also had a number of robust discussions; not least because Vienna was an airport too far for some of the presenters, leaving us with free time.

Firstly, most of us agreed that being able to reproduce the result of queries (and potentially other transformations or processes) applied to data or subsets of the data was the hardest of the guidelines to implement.

One can deal with this by keeping archived copies of all such query and transformation results (painless to implement, but potentially devastating from a storage provisioning perspective), or one could opt to store the query and transformation instructions themselves, with a view to reproducing the query or transformation result at some point in the future.

This second option equates to always starting with base ingredients (egg yolks, lemon juice, butter, and maybe mustard or cayenne) and storing them with a recipe (in this case for Hollandaise Sauce). This option is also painless to implement until there is a change in the underlying database schema, code, or both. In that case one will have to maintain backward compatibility (potentially almost ad infinitum) so that historical operations continue to work, or maintain working copies of all historical releases for the purpose of reproducing a query or transformation result at some point in the future. Clearly this is not very practical.
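To make the "recipe" option concrete, here is a minimal Python sketch. It is an illustration only, not something presented at the session: the table `obs`, the helper names, and the idea of keeping a SHA-256 fingerprint of the result next to the stored query are all assumptions. The fingerprint lets one check later whether re-running the recipe still yields the original result, without archiving the result itself.

```python
import hashlib
import json
import sqlite3


def fingerprint(rows):
    """Return a SHA-256 hex digest of query rows in a canonical JSON form."""
    canonical = json.dumps([list(r) for r in rows], separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_recipe(conn, sql, params=()):
    """Run a query; keep the instructions plus a fingerprint of the result."""
    rows = conn.execute(sql, params).fetchall()
    return {"sql": sql, "params": list(params), "result_sha256": fingerprint(rows)}


def reproduces(conn, recipe):
    """Re-run the stored query and report whether the result is unchanged."""
    rows = conn.execute(recipe["sql"], recipe["params"]).fetchall()
    return fingerprint(rows) == recipe["result_sha256"]


# Hypothetical data standing in for a research database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (site TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?, ?)",
    [("A", 2014, 1.5), ("A", 2015, 1.7), ("B", 2015, 2.1)],
)

recipe = make_recipe(
    conn, "SELECT site, value FROM obs WHERE year = ? ORDER BY site", (2015,)
)
unchanged_before = reproduces(conn, recipe)

# A later correction to the underlying data silently changes the result.
conn.execute("UPDATE obs SET value = 2.2 WHERE site = 'B'")
unchanged_after = reproduces(conn, recipe)

print(unchanged_before, unchanged_after)
```

The sketch only detects that a stored recipe no longer reproduces its result; it does nothing to restore the earlier state of the data, schema, or code, which is exactly the maintenance burden described above.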

By the way, there were some excellent ideas on how to record recipes systematically: Lesley Wyborn presented work on defining an ontology whereby queries and transformations could be documented as an automated script, and Edzer Pebesma and colleagues are conceiving an algebra for spatial operations with much the same objective in mind.

This approach, of course, requires an additional consensus: at what point do we store results as a new dataset instead of executing a potentially longer and longer list of processes on original data? There must be some value to buying Hollandaise Sauce off the shelf for our Eggs Benedict—at least some of the time.

Secondly, all of this trouble is required to achieve either one or both of two objectives: reliably finding the data referenced by a citation (via a digital object identifier or other persistent identifier), and supporting reproducibility in science. This last point was enthusiastically agreed on by most (one or two abstained, and there was one dissenter):

"Science Isn’t Science If It Isn’t Reproducible".

This assertion set me thinking about the process of reproducing results in the new world of data-intensive science, a world in which code and systems are increasingly distributed and reliant on external vocabularies, lookups, services, and libraries (that may be themselves referenced by persistent identifiers). None of these resources, which may have a significant effect on the result of a process should they change, are under the control of the code running in my environment. Which brings us to Claerbout’s Principle:

"The scholarship does not only consist of theorems and proofs but also (and perhaps even more important) of data, computer code and a runtime environment which provides readers with the possibility to reproduce all tables and figures in an article."

Easier said than done. We can, of course (as we should in a world of formal systems engineering) insist on proper configuration control and versioning of all components, internal and external, but I am not convinced that the research community is ready for this level of maturity—typically reserved for moon rockets and defense procurement, with orders of magnitude in additional costs. Perhaps more importantly, the scientists writing code are not going to invest time and effort to document, version, and package their code to a standard that supports reproducibility. Hence, the code that we use to transform our data, whether we like it or not, will not automatically produce the same result at some unspecified point in the future, and much more so if it has external web-based dependencies (which, in turn, may also have external dependencies). There is some utility in packaging entire runtime environments (much in the way that one could persist the result of a query or transformation), but this does not solve the problem of external dependencies.
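As a rough illustration of the lightest form of versioning, the Python sketch below records an environment manifest alongside a result. The function name and the package list are hypothetical, and the manifest covers only the local interpreter and installed packages; it says nothing about external web services, which is the gap noted above.

```python
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Record interpreter, platform, and the versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


# The package names here are placeholders; list whatever the analysis imports.
manifest = environment_manifest(["pip", "definitely-not-a-real-package-xyz"])
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing such a manifest with each published result costs very little, but it documents the environment rather than preserving it.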

Which raises an interesting dilemma: in the world of linked open data, the semantic web, and open distributed processing, the state of the web at any point in time cannot be reproduced ever again—which may create significant issues for reproducible science if it uses any form of distributed code.

Not only that! As we rely more and more on processing enormous volumes of data by digital means, we will depend more and more on artificial intelligence, machine learning, and automated research. As the body of knowledge available to automated agents changes, so, presumably, will their conclusions and inferences.

So...we need a new consensus on what science means in the era of data-intensive, increasingly automated science: our rules, notions, and paradigms will soon be outdated.

Fitting subject for an RDA Interest Group, I would think.

Some interesting additional reading:
http://www.nature.com/news/reproducibility-1.17552
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/MattShotwell/MSRetreat2013Slides.pdf

ASTER Data Made Freely Available by LP DAAC

On 1 April 2016, NASA's Land Processes Distributed Active Archive Center (LP DAAC; WDS Regular Member) began distributing ASTER Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data products over the entire globe at no charge. More than 2.95 million scenes of archived ASTER data are now available for direct download through the LP DAAC Data Pool and for search and download through NASA's Earthdata Search Client and USGS' GloVis. New scenes will be added as they are acquired and archived.

The AST_L1T product provides a quick turn-around of consistent GIS-ready data as a multifile product, including: a Hierarchical Data Format – Earth Observing System datafile, full-resolution composite GeoTIFFs, and associated metadata files. In addition, each AST_L1T granule contains related products such as a low-resolution browse and, when applicable, a Quality Assurance browse and text report.

Learn more about ASTER data and the change in policy here.

Surveying User Satisfaction: The NASA DAAC Experience

A Blog post by Alex de Sherbinin (WDS Scientific Committee member)

It is often said—disparagingly—that America’s culture is a consumer culture. Although it may be true that America’s consumerism is problematic, not least for the planet, the flip side is how consumer culture drives a service mentality in businesses and government. The old adage that “the customer is king” does motivate US government agencies and government-supported centers, including NASA’s Distributed Active Archive Centers (DAACs), to innovate and improve services in response to user feedback and evolving user needs.

Since 2004, NASA’s Earth Science Data and Information System (ESDIS) Project [WDS Network Member] has commissioned the CFI Group to conduct an annual customer satisfaction survey of users of Earth Observing System Data and Information System (EOSDIS) data and services available through the twelve DAACs. The American Customer Satisfaction Index (ACSI) is a uniform, cross-industry measure of satisfaction with goods and services available to US consumers, including both the private and public sectors. The ACSI represents an important source of information on user satisfaction and needs that feeds into DAAC operations and evolution. This may hold some lessons for WDS data services more broadly as they seek feedback from their users, and endeavor to expand their user bases and justify funding support.

The ACSI survey invitation is sent to anyone who has registered to download data from the NASA DAACs. In the past, registration was ad hoc, and each DAAC had its own system. In early 2015, ESDIS began implementing a uniform user registration system called EarthData Login that requires that users establish a free account before they can access datasets. Accounts are associated with a given DAAC, but they allow access to data across all the DAACs. All those who register are sent invitations to fill out the ACSI survey. Response rates vary from a few percent among most DAACs, to as high as 38% for the Land Processes DAAC [WDS Regular Member] (which also has the highest number of respondents at just over 2,000).

In 2015, the overall EOSDIS ACSI was 77 out of 100, which is better than the overall government and National ACSI scores for 2015 (64 and 74, respectively), but lower than the National Weather Service (80). This score is based on users’ overall assessment of satisfaction with each data center based on expectations and comparison with an “ideal” data center. The ACSI model provided by the CFI Group also assesses specific “drivers” of user satisfaction (customer support, product search, product selection and order, product documentation, product quality, and data delivery) and their relative importance to the overall ACSI score. This allows the DAACs to identify areas where improvement is needed and should have the most impact on overall satisfaction.

The ACSI enables the EOSDIS to assess changes from year to year. For example, from 2014 to 2015 customer support went from 89 to 86, with drops in professionalism, technical knowledge, helpfulness in correcting a problem, and timeliness of response (all statistically significant). Many changes likely reflect the fact that the pool of survey respondents changes over time, as do their expectations, rather than actual drops in service provision. But for individual DAACs, declining scores in certain areas, in combination with free-text responses to open-ended questions, can help to flag issues that are in need of attention.

For example, the ACSI scores and free-text responses to open-ended questions helped our DAAC—the Socioeconomic Data and Applications Center (SEDAC) [WDS Regular Member]—in undertaking a major website overhaul in 2011. From a disparate set of pages with different designs, we created a coherent site with consistent navigation. The resulting site was evaluated very favorably by Blink UX, a user experience evaluation firm that reviewed all of the DAAC websites. Deficiencies in data documentation for selected datasets have also been pointed out by survey respondents, and we are now reviewing our guidelines for documentation to ensure that all datasets meet a minimum standard. Some users indicated difficulty in finding the latest dataset releases, so we are developing an email alert system for new data releases.

At the Alaska Satellite Facility (ASF) DAAC [WDS Regular Member], the ACSI results have been very helpful in getting a sense of how people are using ASF DAAC data and services. The free-text responses to questions regarding new data, services, search capabilities, and data formats are particularly informative. For example, one user suggested that it would be useful to have quick access to Synthetic Aperture Radar data for specific regions in the world for disaster response. A data feed was developed after the recent Nepal earthquake that notified users of any new Sentinel-1A data received at ASF DAAC for that specific area. This data feed quickly provided additional data for disaster responders and researchers studying this event. Data feeds are now available for several seismically active areas of the world that have been designated by the scientific community (i.e., Supersites).

Overall, the strong EOSDIS ACSI scores have been important in objectively demonstrating and documenting the continuing value of EOSDIS and the individual DAACs to the broad user community. The annual score is reported as one of NASA’s annual performance metrics, supporting NASA’s goal to provide results-driven management focused on optimizing value to the American public.

Although surveys can be costly, and the response rates low, WDS Members would do well to consider periodic surveys of users. We find that highly motivated users do respond and provide really useful suggestions, especially if they find that their responses actually lead to tangible changes in their user experience. While annual surveys may be more than is needed, surveys every 2–3 years could provide your data service with valuable feedback on its content and services. And of course, none of this should supplant other mechanisms for gathering user feedback, such as help desk software (e.g., UserVoice used by SEDAC or Kayako used by NASA’s EarthData), email, and telephone helplines. Through these multiple mechanisms, our user communities can help drive significant improvements in the services offered by WDS Members and the successful use of our valuable data by growing numbers of users.

WGMS Glacier App – Worldwide Glacier Information System To Go!

A Blog post by Nico Mölg (Glaciology and Geomorphodynamics Group, WGMS)

Which glaciers are still advancing?
How many are melting?
Which glaciers are being monitored in your country?

A new smartphone application from the World Glacier Monitoring Service (WGMS; WDS Regular Member) shows how glaciers have evolved around the globe. It provides easy and public access to glacier observation data and photographs of more than 3700 glaciers. The wgms Glacier App, recently launched at a side-event of COP21, is based on a comprehensive research database and aims at bringing corresponding facts and figures to decision makers, outdoor enthusiasts, researchers, and anybody interested in the topic, in order to provide information and raise awareness of ongoing climatic changes.

The wgms Glacier App shows all observed glaciers on a satellite map. Basic information is provided for each glacier, including photographs and general information on size and elevation. A text search allows users to filter the glaciers by name, country, region, and measurement type. For example, one can find out which glaciers have gained or lost ice over the past decade. A compass shows the closest observed glaciers in all directions from the user’s current position, and a 'card game' (Glacier Top Trumps) enables users to compare the best observed glaciers in the world and compete against the computer. In addition, graphs with observation data illustrate the glacier's development, along with information on local investigators, and detailed explanations of measurement types. WGMS wants to increase the visibility of the hundreds of glacier observers around the globe whose work documents the impact of climate change on glaciers. 

Jointly developed by the WGMS and Ubique – Apps and Technology, the app is available free of charge for Android and iOS in English, German, Russian, and Spanish.
