The ICSU World Data System and the Data Seal of Approval (DSA) both offer core certification standards for trustworthy digital repositories across all of the Sciences. However, whilst the catalogues of criteria and review procedures of both organizations are based on the same principles, the two standards have evolved and operated independently. Since 2012, a Working Group consisting of representatives from both organizations has explored and developed a DSA–WDS Partnership with the objectives of realizing efficiencies, simplifying assessment options, stimulating more certifications, and increasing impact on the community. Its outputs have been a set of harmonized Common Requirements for core certification of digital repositories that is drawn from the DSA and WDS catalogues of criteria, as well as Common Procedures for their implementation. The new Common Certification Standard is to be adopted by the two organizations, and integrated into their tools and systems, before the end of 2016, and a joint announcement of this is planned during International Data Week 2016 (11–17 September 2016; Denver, Colorado). In this Webinar, Ingrid Dillo (WDS-SC Vice-chair) will give a background and introduction to the Common Requirements; in particular, WDS Regular Members will transition to the new DSA–WDS Common Requirements as they renew their certifications from October.
The more value that is placed on research data as a commodity to be shared, sustained, and reused, the greater the need to assure the quality of those data. Data repositories, whether domain-specific or generic across domains, are essential gatekeepers of data sustainability. Data quality is a consideration throughout the research process. To what extent should assuring data quality be the responsibility of the investigators; of publishers, editors, and peer reviewers; of data repositories; of data librarians or data scientists; or of later reusers of those data? Considerations for data quality vary throughout the lifecycle of data handling. These questions have neither simple nor generic answers. In this Webinar, Prof Christine Borgman (UCLA), author of 'Big Data, Little Data, No Data: Scholarship in the Networked World' (MIT Press, 2015), will explore these issues of responsibility for data quality in conversation with Dr Andrea Scharnhorst, head of the research and innovation group at DANS, an institute of the Royal Netherlands Academy of Arts and Sciences.
Volunteered Geographic Information (VGI) is becoming increasingly important in a number of scientific and development domains, and OpenStreetMap (OSM) represents the largest effort to date to harness the power of the internet for crowd-sourced spatial data generation. OpenStreetMap was launched in 2004, initially focusing on mapping the United Kingdom. At that time, Ordnance Surveys and National Mapping Agencies developed roads data sets but did not freely distribute them, and openly available roads data such as VMAP0 were of poor quality. In April 2006, the OpenStreetMap Foundation was established to encourage the growth, development and distribution of free geospatial data and provide geospatial data for anybody to use and share. OSM now has more than 2.3 million registered users, and has mapped more than 34 million km of roads in all countries. From the early days there has been a need to ensure the quality and accuracy of member contributions. In his presentation, Mikel Maron will describe the evolution of approaches to provide guidance to members and of applying QA/QC to the data set, including recent work by MapBox. In a second portion, Paola Kim-Blanco will summarize the literature on independent validation of OSM and describe CIESIN’s efforts to validate the data in low income countries.
The youth of seismology as a science, compared to the typical duration of seismic cycles, results in a relative scarcity of records of large earthquakes available for processing by modern analytical techniques, which in turn makes archived datasets of historical seismograms extremely valuable in order to enhance our understanding of the occurrence of large, destructive earthquakes. Unfortunately, the value of these datasets is not always perceived adequately by decision-making administrators, which has resulted in the destruction (or last-minute salvage) of irreplaceable datasets. We present a quick review of the nature of the datasets of seismological archives, and of specific algorithms allowing their use for the modern retrieval of the source characteristics of the relevant earthquakes. We then describe protocols for the transfer of analog datasets to digital support, including by contact-less photography when the poor physical state of the records prevents the use of mechanical scanners. Finally, we give some worldwide examples of existing collections, and of successful programs of digital archiving of these valuable datasets.
Over the past fifteen years, my perspective on tackling information interoperability problems for web-based scholarship has evolved significantly. My initial work, including OAI-PMH and OpenURL, started from a repository-centric approach. It took into account the existence of the Web, but merely piggybacked on it. Starting with OAI-ORE, the approach became Web-centric and started to fundamentally embrace the Architecture of the World Wide Web and related technologies. This shift is characterized by an approach that consists of first translating a problem related to scholarly information interoperability to a problem for the Web at large, next devising a Web-centric solution at that level, and then bringing it back to the scholarly Web. I have come to consider this Web-centric approach not just as a design choice but rather as an essential component for sustainable Web-based scholarly infrastructure. In this webinar, I will illustrate this shift by means of design patterns from various interoperability efforts, including Open Annotation, Memento, ResourceSync, Signposting the Scholarly Web, and Robust Links.
Scientific data management has requirements that derive from the fundamental nature of the scientific method. These include the need for reproducibility of results and comparative analysis within and between scientific disciplines. While these are fundamental to scientific progress, there is also an evolution toward scholarly data publication that is being realized by multiple organizations around the world, some with open-access motivations and some with commercial motivations. The technologies that are employed to satisfy these requirements and achieve the goals of improving scientific progress through data-sharing and re-use have implications that should be considered in any reasoned implementation of a scientific data management strategy. Intellectual property rights, cyberinfrastructure interoperability and portability also have to be considered in light of long-term sustainability and open-access goals.
Webinar #4: Combining High Performance Computing with High Performance Data to enable Data Intensive Science (April 2015)
This webinar presented experiences and lessons learned at the National Computational Infrastructure (NCI) of the Australian National University (ANU) in managing major research data collections and making them discoverable and interoperable. Use of international standards for discovery and interoperability allows complex interactions within and between the collections. Efficiently scaling and adapting data and software systems to petascale infrastructures requires the collaborative development of an architecture that is designed, programmed and operated to enable users to interactively invoke different forms of in-situ computation over complex and large scale data collections. This design facilitates a transdisciplinary approach to research and enables a shift from small scale, ‘stove-piped’ science efforts to large scale, collaborative systems science.
This free webinar was organized for ICSU World Data System Members and interested Data Repositories and Services on the Publishing Data initiatives undertaken by ICSU World Data System and Research Data Alliance. The presentations cover how data centres, publishers, and research institutes are working together to make Publishing Data an integral part of the scholarly record: establishing better workflows, defining appropriate bibliometrics, designing an infrastructure to link data and publications, and defining cost recovery and business models.
This webinar was organized primarily for members of STM and their colleagues on the new initiatives undertaken within and in close collaboration with WDS (ICSU World Data System) and RDA (Research Data Alliance). The presentations covered how publishers, data centers and research institutes are working together on Data Publishing initiatives, from establishing better workflows and business models to designing an infrastructure to link data and publications wherever they are.
The Peer REview for Publication & Accreditation of Research Data in the Earth sciences (PREPARDE) project aimed to investigate and provide guidance on the policies and procedures required for the formal publication of research data, ranging from ingestion into a data repository through to formal publication in a data journal. It also addressed key issues arising in the data publication paradigm, such as how to peer-review a dataset, what criteria a repository must meet to be considered objectively trustworthy, and how datasets and journal publications can be effectively cross-linked for the benefit of the wider research community.