Giving Credit for Data with Claiming Services

Researchers demand credit for the work that they do. While there are well-established practices and services in place to give credit for traditional publications, these are sorely lacking for the full range of research artefacts, including data and software.

THOR partners have been busy developing data claiming services. The results are published in our latest report, ‘Services that Support Claiming of Datasets in Multiple Workflows’ (10.5281/zenodo.290649), where you can read about the successful implementation of claiming services in the databases and services of disciplinary repositories as well as PID infrastructures of several THOR partners.

The report summarises progress on enabling researchers and other contributors to associate research artefacts with their ORCID record, a process known as claiming. The dataset claiming process involves creating, maintaining, and sharing information about the relationship between researchers and datasets.

We describe our experience implementing claiming workflows at five organisations, identifying some of the shared challenges as well as the unique issues each organisation faced in developing the claiming process and deploying it successfully in a live production system.

This is an important advance in enabling unambiguous attribution and credit for research.

While technical challenges remain, such as synchronisation of claims, technical capabilities have substantially improved. The human and social challenges are now coming to the fore: we must ensure that claiming services are widely adopted and used across the research communities.

Identifying Interpretation

Scientific research infrastructures collect large quantities of values. Values are typically numbers that result from observation, experiment, or computing activities. For instance, plant scientists collect values that result from observing fluxes of carbon dioxide on the leaf-atmosphere boundary; high-energy physicists collect values that result from observing collisions of atomic particles; social scientists observe the interactions of human populations and individuals, collecting values both qualitative and quantitative.

The interpretation of values is central to research investigations. With interpretation, values are given meaning in the context of investigations. The result of interpretation activities is information, and research infrastructures integrate information into existing bodies of knowledge. Therefore, research infrastructures are knowledge infrastructures or “robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds” (Borgman, 2015).

At the International Workshop on Reproducible Science, we presented the possibility of aggregating machine readable information in Research Objects. We have proposed to extend a Research Object Model (Belhajjame et al., 2012) with a new Resource called Interpretation. Existing Resource types include Dataset, Software, and Paper. In our proposal, machine readable interpretations are additional research artefacts created in scientific investigations. Research Objects thus also capture the interpretations given to observational, experimental, or computational values in research investigations.

Just as with other research artefacts, interpretations could be unambiguously and persistently identified in a global way. DataCite digital object identifiers (DOIs) could be used to enable unambiguous reference to interpretations, and the resolution to human and machine readable interpretation descriptions. The approach would also enable the citation of interpretations, and thus the recognition of contributions toward interpretations. Cross-linking of interpretations with the ORCID iD of contributors would enable unambiguous attribution.

Between the numerical values and the abstract high-level information reported in scientific articles, the primary information obtained by interpretation is generally refined into secondary and tertiary information. For example, primary information about individual events occurring in the environment, such as event date, location, and duration, may be refined into secondary information about the seasonal mean event duration, and into tertiary information about the statistical significance of differences in seasonal mean duration. Curating such information and its provenance arguably supports the reproducibility of scientific investigations, from numerical values to the natural language text in scientific articles. Advanced knowledge infrastructures may increasingly capture, curate, and provide access to such information, in standardised and unambiguously identified form.
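
The refinement from primary to secondary and tertiary information can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the event records are invented, the season boundaries follow the northern-hemisphere meteorological convention, and Welch's t statistic stands in for whichever significance test an investigation would actually use. It stops at the test statistic and does not compute a p-value.

```python
from collections import defaultdict
from datetime import date
from math import sqrt
from statistics import mean, variance

# Hypothetical primary information: one record per observed event.
events = [
    {"date": date(2016, 1, 10), "duration_h": 2.0},
    {"date": date(2016, 1, 25), "duration_h": 4.0},
    {"date": date(2016, 2, 5), "duration_h": 3.0},
    {"date": date(2016, 7, 3), "duration_h": 6.0},
    {"date": date(2016, 7, 19), "duration_h": 8.0},
    {"date": date(2016, 8, 2), "duration_h": 7.0},
]

SEASONS = ["winter", "spring", "summer", "autumn"]


def season_of(day: date) -> str:
    """Map a date to a meteorological season (Dec-Feb is winter)."""
    return SEASONS[(day.month % 12) // 3]


# Secondary information: mean event duration per season.
durations = defaultdict(list)
for event in events:
    durations[season_of(event["date"])].append(event["duration_h"])
seasonal_mean = {season: mean(values) for season, values in durations.items()}


def welch_t(a: list, b: list) -> float:
    """Welch's t statistic for the difference between two sample means."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


# Tertiary information: a test statistic for the summer/winter difference.
t_statistic = welch_t(durations["summer"], durations["winter"])

print(seasonal_mean)
print(round(t_statistic, 3))
```

Each step consumes only the output of the previous one, which is what makes the provenance chain from values to reported findings traceable.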

Belhajjame, K., et al. (2012). Workflow-Centric Research Objects: A First Class Citizen in the Scholarly Discourse. In Proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web (SePublica2012), Heraklion, Greece.
Borgman, C.L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA: MIT Press. ISBN 978-0-262-02856-1

ORCID Integration Series: PANGAEA

This is the first in a series of posts describing how THOR partners have recently integrated ORCID in their disciplinary data repositories. This post describes ORCID integration in PANGAEA, the Data Publisher for Earth & Environmental Science.

PANGAEA is rolling out a new version of its website. Developers and designers are currently ironing out a few remaining open issues. The release is expected in autumn 2016. Among major improvements in search, design, and usability, a key new feature is the integration of ORCID.

The new feature enables existing PANGAEA users to connect their PANGAEA profile with their ORCID iD, as demonstrated in the video below. 

With this connection, PANGAEA obtains the validated ORCID iD of its users from ORCID. By connecting their ORCID iD, users can also choose to sign in to PANGAEA using ORCID, as an alternative to signing in using PANGAEA user credentials. This can be handy when a user is already signed in to ORCID, or it is quicker to recall ORCID credentials.

Obtaining the validated ORCID iDs of its users is significant for PANGAEA because, unlike a researcher’s name, the iD is unambiguous: two researchers with the same name can be distinguished by their respective iDs. The iD also persists through changes in how a person’s name is written: a researcher may change their name on marriage, or their name may appear in different forms, in full or with initials for the first name, and with or without a middle name or initial. Furthermore, the iD is actionable and can be used to discover information about the researcher.
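
Part of what makes the iD machine-checkable is that its final character is a check digit, computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below is a minimal illustration of that check and not PANGAEA's implementation; the function names are our own. The first sample value, 0000-0002-1825-0097, is the iD of the fictitious researcher used in ORCID's documentation.

```python
import re

# Four blocks of four characters; only the last character may be "X".
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and check digit of a hyphenated ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A passing check only shows that the string is well formed. It does not show that the iD is registered or that it belongs to the person presenting it, which is why obtaining the iD from ORCID through an authenticated connection matters.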

For researchers, the greatest advantage of connecting their ORCID iD to their PANGAEA profile is that PANGAEA can then record the relationships between dataset publication DOIs and contributor ORCID iDs. This information is then shared with the global network of PID infrastructures, and researchers benefit from automated updates to their ORCID Record for data published at PANGAEA, gaining unambiguous attribution for published datasets and benefiting from greater credit for sharing data early.
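
The relationship being recorded here is, at its core, a pair of identifiers: a dataset DOI and a contributor ORCID iD. As a rough sketch of that idea, and not a description of how PANGAEA stores or shares this information, the Python below keeps such pairs in memory. The class names, the DOIs, and the choice to normalise DOIs to lower case are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A link between one dataset DOI and one contributor ORCID iD."""
    doi: str
    orcid: str


class ClaimRegistry:
    """Hypothetical in-memory store of dataset-contributor links."""

    def __init__(self) -> None:
        self._claims: set[Claim] = set()

    def add(self, doi: str, orcid: str) -> bool:
        """Record a claim; return False if it was already recorded."""
        # DOIs are case-insensitive, so compare them in lower case.
        claim = Claim(doi=doi.lower(), orcid=orcid.upper())
        if claim in self._claims:
            return False
        self._claims.add(claim)
        return True

    def datasets_for(self, orcid: str) -> list[str]:
        """Return the DOIs claimed by one contributor, sorted."""
        return sorted(c.doi for c in self._claims if c.orcid == orcid.upper())


# Placeholder identifiers, not real datasets.
registry = ClaimRegistry()
registry.add("10.1234/example.1", "0000-0002-1825-0097")
registry.add("10.1234/EXAMPLE.2", "0000-0002-1825-0097")
print(registry.datasets_for("0000-0002-1825-0097"))
```

Storing each claim as an immutable pair makes duplicates easy to detect, which matters when the same link can arrive from more than one service.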

Let’s take a look at how the ORCID integration in PANGAEA is making a difference to Dr Alice Lefebvre, GLOMAR Associate Scientist at the MARUM Center for Marine Environmental Sciences of the University of Bremen.

Alice has recently joined ORCID and decided to claim the 14 data publications deposited at PANGAEA that she has authored. As a consequence, Alice gains a more complete ORCID Record, one that includes not just her journal article publications but also her authorship of data publications, and that better reflects her true contribution to the scientific record. Alice was also surprised to learn about DataCite and the overview DataCite provides of her contributions.

The upcoming release of the PANGAEA website automates the sharing of information with the global network of PID infrastructures. Authors of datasets published at PANGAEA who have connected their ORCID iD, like Alice, will benefit from a workflow that ensures information appears automatically and accurately on their ORCID Record.

This shows how far the integration between disciplinary repositories and the global network of PID infrastructures has come over the past few years. It also shows how the persistent identification of contributors and research artefacts, together with infrastructures that aggregate, process, and share information about persistently identified resources, is driving and shaping 21st-century attribution, credit, communication, and measurement of scholarly activity.

Want to Know More?
Readers interested in performing an ORCID integration in their own disciplinary repository can find more information in our recent report, ‘Demonstration of Services to Integrate ORCIDs into Data Records and Database Systems’.

ORCID Integration in Disciplinary Data Repositories

Researchers need to be linked to their data. Within THOR, we’ve been busy developing approaches to support the inclusion of ORCID iDs in disciplinary data repositories and data publication workflows.

The results are published in our latest report, ‘Demonstration of Services to Integrate ORCIDs into Data Records and Database Systems’ (10.5281/zenodo.58971), where you can read about the successful integration of ORCID in the databases and services of three THOR partners, each serving a distinct discipline: PANGAEA for Earth and Environmental Sciences, EMBL-EBI for Life Sciences, and CERN for High-Energy Physics.

These integrations were applied to live and operational production systems. This means that researchers in these disciplines are already benefiting from automated persistent identifier linking and linkage-information sharing within the global network of persistent identifier infrastructures.

The report describes the common experiences and challenges as well as the specific concerns each institution faced. These case studies can therefore serve as models for other institutions looking at integrating ORCID in their own systems and workflows.

As a companion to the report, over the next month PANGAEA, CERN, and EMBL-EBI will contribute to a series of posts on the THOR blog that summarise their recent advancements with ORCID integration. We will demonstrate the benefits of ORCID integration, and offer a practical guide to performing your own integrations. 

If you have any questions, please email for more information.