Giving Credit for Data with Claiming Services

Researchers demand credit for the work that they do. While there are well-established practices and services in place to give credit for traditional publications, these are sorely lacking for the full range of research artefacts, including data and software.

THOR partners have been busy developing data claiming services. The results are published in our latest report, ‘Services that Support Claiming of Datasets in Multiple Workflows’ (10.5281/zenodo.290649), where you can read about the successful implementation of claiming services in the databases and services of disciplinary repositories as well as PID infrastructures of several THOR partners.

The report summarises progress on helping researchers and other contributors to associate research artefacts with their ORCID record, a process known as claiming. The dataset claiming process involves creating, maintaining, and sharing information about the relationship between researchers and datasets.
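As a rough illustration of what such relationship information might look like, the Python sketch below models a single claim as a small record linking an ORCID iD to a dataset DOI. The `Claim` class, its field names and the `make_claim` helper are hypothetical and are not taken from the report or from any THOR partner system. The checksum check follows the ISO 7064 MOD 11-2 scheme that ORCID documents for the final character of an iD.

```python
import re
from dataclasses import dataclass


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the shape and the check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


@dataclass(frozen=True)
class Claim:
    """Hypothetical record of one researcher-dataset relationship."""
    orcid: str   # who is making the claim
    doi: str     # which dataset is claimed
    role: str    # relationship, e.g. "creator" (assumed vocabulary)
    source: str  # system that recorded the claim (assumed field)


def make_claim(orcid: str, doi: str, role: str = "creator",
               source: str = "example-repository") -> Claim:
    """Create a claim after basic identifier checks."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid!r}")
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return Claim(orcid=orcid, doi=doi, role=role, source=source)


# 0000-0002-1825-0097 is ORCID's documented example iD; pairing it with
# the DOI of the report is purely illustrative.
claim = make_claim("0000-0002-1825-0097", "10.5281/zenodo.290649")
print(claim)
```

A production claiming service would also need to keep such records synchronised between the repository and the ORCID record, which this sketch does not attempt.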

We describe our experience implementing claiming workflows at five organisations, identifying some of the shared challenges as well as the unique issues each organisation faced in developing the claiming process and deploying it successfully to a live production system.

This is an important advance in enabling unambiguous attribution and credit for research.

While technical challenges remain, such as synchronisation of claims, technical capabilities have substantially improved. The human and social challenges are now coming to the fore: we must ensure that claiming services are widely adopted and used across the research communities.

ORCID Integrations in Environmental Research Infrastructures

A THOR-ENVRIplus Bootcamp

Are you working in a technical or leading role within an Environmental Research Infrastructure? Join us at Aalto University in Finland on March 28-29, 2017 to learn more about ORCID integrations and discuss best practices with colleagues from other Environmental Research Infrastructures.

Project THOR supports seamless integration between articles, data and researchers across the research lifecycle. ENVRIplus brings together Environmental and Earth System Research Infrastructures, projects and networks to create an interoperable cluster of Environmental Research Infrastructures across Europe. These two H2020 projects are joining forces by organising a bootcamp focused on ORCID integrations in Environmental Research Infrastructures.

The two-day event offers a unique opportunity for knowledge exchange between persistent identification experts from THOR partner organisations (in particular ORCID and DataCite) and the managers and developers of Environmental Research Infrastructures, in particular ENVRIplus partners.

The bootcamp has a strong emphasis on ORCID integrations and will touch upon the specific challenges Environmental Research Infrastructures face with regard to such integrations. The bootcamp also focuses on the technical aspects of implementing ORCID integrations. In addition to infrastructure managers, we therefore strongly encourage developers to participate.

On March 28, we will give an introduction to ORCID and concepts. We will also demonstrate various types of integrations in systems. The second day (March 29) is structured in two separate tracks: Research Infrastructure Developer and Research Infrastructure Manager. We still encourage participants to provide us with input on bootcamp topics. You can enter suggestions when you register for the bootcamp. The preliminary agenda topics are included below:

Tuesday March 28

  • Introductions to ORCID and concepts
  • ORCID Integrations in THOR partner systems
  • ORCID Integrations in environmental research infrastructures
  • Challenges and opportunities at the different research infrastructures
  • Q & A and discussion: ORCID integration, Metadata, identification of co-authors, crosslinking PIDs etc.

Wednesday March 29

Parallel track 1: Hands-on exercises for RI Developers

  • Coding ORCID integrations
  • Mining ORCID data dump
  • ORCID iDs in research infrastructure metadata (e.g. SensorML)
  • PID linking and link information exchange
  • ORCID and the ENVRI Reference Model

Parallel track 2: Discussion/presentation sessions for RI Managers

  • Crosslinking data, author, publication
  • PIDs for instruments, platforms, deployments
  • Dynamic data identification
  • Cost of integrations
  • PIDs in workflows involving research infrastructures and e-infrastructures

The THOR-ENVRIplus team is looking forward to seeing you in Finland!


Hasta la Vista, THOR Bootcamp

With local support from THOR ambassador Eva Mendez, the first edition of the THOR Bootcamp was successfully carried out in Madrid, at Universidad Carlos III de Madrid on November 16-18. The Bootcamp is part of THOR’s outreach effort to engage and train local scholarly communication communities to further adoption of PID services. The full set of slides used can be found on the THOR Knowledge Hub.

THOR colleagues from different partner organizations and guest speakers from local research organizations joined forces to present a full curriculum on PID topics, from existing tools and services to technical and policy implementation. The event attracted more than 130 registrants in total and yielded valuable experience for both the attendees and the THOR project.

The Bootcamp consisted of three modules, each tailored to a different audience. The first half-day was organized as an integral part of research training for Ph.D. students and other young researchers at UC3M, focusing on Open Science recommendations and the incorporation of PIDs in existing research workflows. Students came from different disciplinary backgrounds and brought with them distinct questions. The Bootcamp gave us a great opportunity to engage the young researchers’ community and address their concerns directly.

“I consider the instruments presented along with the seminar an extremely powerful way to collect, share, exploit and advertise the work of a researcher in a way which is mostly new and free from older constraints. The value of the research itself is so, enhanced and collaborations are made way easier in benefits of the results.”

— Rocco Bombardieri, Ph.D. student at UC3M

Ph.D. Students attending THOR Bootcamp at UC3M

The second day was reserved for local information professionals and research data service stakeholders (librarians, researchers, research administrators and policy makers). Their day followed an intense schedule consisting of talks and a mini panel with service implementation experience by ORCID, DataCite and CERN.

Local information professionals at THOR Bootcamp General Day at UC3M

The final half-day offered a more technical tutorial. The self-contained programming module enabled participants to build a metrics dashboard that visualized data interactively, based on the technology used in the THOR dashboard. The hands-on session was designed for non-technical and technically savvy attendees alike, and it was great to see how people from a variety of technical backgrounds approached the tutorial and contributed to the ensuing discussion.

Instructors of the Hands-on Day, Ioannis Tsanaktsidis (left) and Kristian Garza (right).

We aim to establish ties with research organizations and institutions by providing tailored PID content via the Bootcamp series — two more Bootcamps will be held in March and May next year (2017). Stay tuned to find out if we are coming to your neighborhood soon! Or better yet, if you want to organize your own Bootcamp, sign up to be an ambassador and we will provide all the materials that are ready to be reused, plus event planning tips for bringing your local community up to speed with PIDs.

THOR at PIDapalooza

If November taught us anything, it’s that open identifiers clearly do deserve their own festival. On 9th and 10th November 2016, people from all over the world gathered in Reykjavik to share PID stories, demos, use cases, victories, horror stories, and new frontiers at PIDapalooza, the first conference dedicated to PIDs. The THOR team travelled to the country of glaciers and volcanoes to talk about project identifiers, persistent identifiers for instruments, PIDagogy and measuring PID adoption.

PIDs for Projects

Martin Fenner (DataCite) and Tom Demeranville (ORCID) presented their work on project identifiers to a full house. They proposed that project IDs should be used to link participants, outputs and funding. But the most suitable identifiers to describe projects? That was left open for discussion – a discussion that quickly turned heated. What, even, is the exact definition of a project? What would persist if the project ends? Would researchers be willing to share the information needed for the project ID? How would we describe the metadata, given that a project does not have a publication date? Clearly more research needs to be done to answer these important questions. Keep an eye out for the announcement of a THOR webinar on project identifiers, to be held in early 2017, in which we will resume this discussion.


Tom Demeranville leading the discussion on PIDs for projects

Persistent Identification of Instruments

Markus Stocker (PANGAEA) continued to explore new frontiers with a presentation on PIDs for instruments, instrument platforms and their deployments. Beyond enabling the unambiguous identification of these entities as well as reference to them in articles and other research artefacts, Markus suggested that preserving metadata about these entities is critical for researchers to judge the fitness of observation data for reuse. He presented two examples of systems that already assign DOIs to deployments and platforms. A key challenge for the community is to decide which metadata must be preserved.


Twitter activity during Markus Stocker’s presentation on PIDs for instruments

The Human Perspective

Building the technical infrastructure for open research was a clear theme at the conference, but how do we move from infrastructure to adoption? How do you teach, learn, persuade, discuss and grow the uptake of PIDs in everyday research practice? My presentation showcased the contribution that the THOR ambassador network is making to the human infrastructure around PIDs. By organising training activities within their own communities and sharing training materials, THOR ambassadors are helping to overcome the cultural barriers to PID adoption. These forms of collaboration are not only critical between THOR partners and ambassadors, but need to extend to other organisations and projects in order to integrate PIDagogy within the Research Data Science Curriculum. The importance of communication was also reiterated in other sessions on PIDagogy, in which participants designed infographics to promote and explain PIDs to different stakeholder groups. These materials will be developed further and made available for the community to (re-)use.


PIDapalooza crowd developing videos, infographics and quizzes for PID adoption

Challenges of Measuring PID Adoption

Salvatore Mele (CERN) discussed the challenges of measuring PID adoption. THOR has already developed a comprehensive dashboard, which shows ORCID and DOI uptake over time. But the ways in which we evaluate and interpret the results remain open for discussion. Salvatore explained that it is difficult not to get philosophical when talking about measurement of PID uptake. What information is missing? What do we not (yet) know? And what further steps can we take to know the unknowable?


Salvatore Mele explaining the THOR Dashboard

PIDapalooza definitely generated as many questions for THOR as we brought to the table. Participating and presenting at this event was a great opportunity for the team to discuss ideas and generate more thought for further research and future collaboration, complementing the PID frontiers already being explored by other organisations. And yes, THOR definitely believes identifiers deserve their own festival and is looking forward to PIDapalooza 2017!

Identifying Interpretation

Scientific research infrastructures collect large quantities of values. Values are typically numbers that result from observation, experiment, or computing activities. For instance, plant scientists collect values that result from observing fluxes of carbon dioxide on the leaf-atmosphere boundary; high-energy physicists collect values that result from observing collisions of atomic particles; social scientists observe the interactions of human populations and individuals, collecting values both qualitative and quantitative.

The interpretation of values is central to research investigations. With interpretation, values are given meaning in the context of investigations. The result of interpretation activities is information, and research infrastructures integrate information into existing bodies of knowledge. Therefore, research infrastructures are knowledge infrastructures or “robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds” (Borgman, 2015).

At the International Workshop on Reproducible Science, we presented the possibility of aggregating machine readable information in Research Objects. We have proposed to extend a Research Object Model (Belhajjame et al., 2012) with a new Resource called Interpretation. Existing Resource types include Dataset, Software, and Paper. In our proposal, machine readable interpretations are additional research artefacts created in scientific investigations. Research Objects thus also capture the interpretations given to observational, experimental, or computational values in research investigations.

Just as with other research artefacts, interpretations could be unambiguously and persistently identified in a global way. DataCite digital object identifiers (DOIs) could be used to enable unambiguous reference to interpretations, and the resolution to human and machine readable interpretation descriptions. The approach would also enable the citation of interpretations, and thus the recognition of contributions toward interpretations. Cross-linking of interpretations with the ORCID iD of contributors would enable unambiguous attribution.

Between the numerical values and the abstract high-level information reported in scientific articles, the primary information obtained by interpretation is generally refined into secondary and tertiary information. For example, primary information about individual events occurring in the environment, such as event date, location and duration, may be refined into secondary information about the seasonal mean event duration, and into tertiary information about the statistical significance of differences in seasonal mean duration. Curating such information and its provenance arguably supports the reproducibility of scientific investigations, from numerical values to the natural language text in scientific articles. Advanced knowledge infrastructure may increasingly capture, curate, and provide access to such information, in standardised and unambiguously identified form.
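The refinement from primary to secondary to tertiary information can be sketched in a few lines of Python. The event records, the `duration_h` field, the season mapping and the choice of an exact permutation test are all assumptions made for illustration; the text does not prescribe a particular statistical method or data model.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations
from statistics import mean

# Primary information: individual events (hypothetical values).
events = [
    {"date": date(2016, 1, 12), "duration_h": 2.0},
    {"date": date(2016, 2, 3), "duration_h": 2.5},
    {"date": date(2016, 12, 20), "duration_h": 3.0},
    {"date": date(2016, 6, 5), "duration_h": 5.0},
    {"date": date(2016, 7, 14), "duration_h": 5.5},
    {"date": date(2016, 8, 22), "duration_h": 6.0},
]


def season_of(d: date) -> str:
    """Map a date to a northern-hemisphere meteorological season."""
    return ("winter", "spring", "summer", "autumn")[(d.month % 12) // 3]


def seasonal_durations(records):
    """Group event durations by season."""
    groups = defaultdict(list)
    for record in records:
        groups[season_of(record["date"])].append(record["duration_h"])
    return dict(groups)


def permutation_p_value(a, b):
    """Exact two-sided permutation test on the difference of two means."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(pooled)
    extreme = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point noise.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


groups = seasonal_durations(events)

# Secondary information: seasonal mean event duration.
seasonal_means = {season: mean(values) for season, values in groups.items()}

# Tertiary information: significance of the winter/summer difference.
p_value = permutation_p_value(groups["winter"], groups["summer"])

print(seasonal_means, p_value)
```

Each of the three layers could in principle be stored as its own identified record with a link back to the layer it was derived from, which is the provenance the paragraph above argues for.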

Belhajjame, K., et al. (2012). Workflow-Centric Research Objects: A First Class Citizen in the Scholarly Discourse. In Proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web (SePublica2012), Heraklion, Greece.
Borgman, C.L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA: MIT Press. ISBN 978-0-262-02856-1

Persistent Identifier Services for the Humanities

Persistent identifiers (PIDs) are increasingly embedded in the services that researchers use every day, enabling unambiguous attribution of the full range of scholarly outputs. This makes it easier for data producers and researchers to get credit for their contributions; for data centres, universities and funders to track the impact of the research they facilitate; for publishers to incorporate data into scholarly writing; and for researchers to discover and cite data through clear provenance of information and ideas. In short, they support an entirely new research infrastructure.

Within THOR we are working to realise this vision by improving interoperability and integration of PID services, and addressing the cultural barriers to adoption. Now over a year into the project, we have found that uptake in the humanities, in particular, lags behind other disciplines. In response to this, we will be running a series of workshops through which we hope to better understand the potential for persistent identifier services in the humanities, identifying requirements for and barriers to uptake, and creating a roadmap to guide future development.

The first workshop will take place at the British Library on Friday 9 December 2016, in which we will have a focused discussion around the role of PIDs in research using historical sources – fields in which digital data has taken on an increasingly important role. The workshop is by invitation only. However, we’re especially keen to hear from humanities researchers who are working with research data products. If you’re making data available or reusing historical data and are interested in attending, please contact us for more information.

THOR and the EC Catalogue of Services Framework

In November 2015 the “eInfrastructure” Unit at the European Commission Directorate General for Communication Networks, Content and Technology asked several e-Infrastructure providers to develop a framework for a service portfolio to describe services developed with funding from the directorate. The THOR project participated in the definition of the concepts underlying such a portfolio. The resulting framework can be found here.


One of the goals of THOR is to ensure access to the scholarly record. Research support services are an important component in the overall production of research outputs. They should be preserved, cited, credited, reused and validated just like the other pieces of the research landscape. A service portfolio can play an important role in this.

A central or distributed shared service portfolio can also:

  • assist users by
    • making services easier to discover and compare;
    • making it possible to determine the services’ relevance; and
    • identifying overlapping efforts or gaps in the catalogued service landscape. This is particularly true as the portfolio is to be linked to Key Performance Indicators (KPIs) that enable some evaluation of the services.
  • enable funding bodies and commercial providers to
    • understand needs for and availability, quality and impact of tools;
    • improve the visibility of their investments; and
    • improve their uptake.
  • assist service providers, such as THOR partners, by
    • providing a common interoperable language for our own service descriptions to be shared with others, and, in turn,
    • offering a competitive advantage by being able to showcase our products and services together with other EC-funded service providers.

Together with EGI, EUDAT, GEANT, OpenAIRE, and BlueBRIDGE, we have organised two workshops at which we presented the framework. Our workshop at the EGI annual conference in April 2016 was aimed at sharing our current practices, discussing how to harmonise them, and how they and our framework fit with the FitSM standard for IT service management. At DI4R 2016 in September, we continued the discussion by gathering current user experience and requirements for future portfolio development from different communities. This resulted in a set of recommendations to help shape future activities. The workshop at DI4R also enabled us to explore synergies with the MERIL project, which aims to develop a catalogue of openly accessible European research infrastructures (RIs) across disciplines and countries, and tools to analyse the described resources.

The Catalogue of Services framework can feed into the newly funded eInfraCentral H2020 project. eInfraCentral will develop an implementation of a common service catalogue, not just aimed at researchers, but also at industry, government, educators, and citizens; develop access and monitoring tools; and draw policy lessons.

Science today is “Open Science” − a global collaboration across institutions, borders and disciplines, underpinned by sharing scientific artefacts and resources at a scale hitherto inconceivable. Shared digital services are crucial to its success. They amount to a huge investment which must be responsibly developed. A service portfolio will be an important tool in improving the effectiveness and efficiency of service development and uptake.

THOR Ambassador Update

On October 13, we held a webinar for our ambassadors to update them on recent THOR activities. Tom Demeranville explained more about ORCID integrations at the THOR disciplinary partners, ORCID Work Identifier types and other recent technological developments, such as DataCite’s Event Data and ORCID Auto Update. A preview of what THOR is planning for the remainder of 2016 and in 2017 was given by Josh Brown. Plans include:

  • Further integration of PIDs in production services
  • Improvement of data citation
  • Continuing the research into the best solutions for missing PIDs.

The webinar slides can be found on the THOR Knowledge Hub.

We rely on our ambassadors to help facilitate and spread discussion about recent PID developments. Some of our ambassadors are very active on Twitter and increase THOR’s visibility by (re-)tweeting. Others spread the word about PIDs amongst their networks in person, promoting their benefits at conferences and workshops. And we are also organising our first bootcamp with one of our ambassadors in Spain. In return, we help you keep up-to-date with recent PID developments through email, webinars and newsletters.

It’s great to see how our ambassadors are contributing to achieving THOR’s mission: connecting people, places and things. And the number of ambassadors is still growing, not just in Europe but on other continents as well. This week, we welcomed another ambassador in Australia. Click here for an overview of our ambassadors. If you’d like to be part of this community, please get in touch. We’ll be organising an informal get-together over lunch on Thursday 10 November at PIDapalooza. If you’ll be attending and would like to find out more about our ambassador programme, please join us!


THOR at Digital Infrastructures for Research

In the last week of September 2016, several THOR partners headed to the city of churches, Krakow, to participate in the Digital Infrastructures for Research conference (DI4R). DI4R was organised by Europe’s leading e-infrastructures, EGI, EUDAT, GÉANT, OpenAIRE and the Research Data Alliance (RDA) Europe. There, researchers, developers and service providers brainstormed, discussed the adoption of digital infrastructure services and promoted user-driven innovation. Adam Farquhar (British Library), Josh Brown (ORCID), Robin Dasler (CERN) and I, Kristian Garza (DataCite), closed the first day of activities with a talk emphasising that PIDs are a set of tools and systems to be integrated and promoted in infrastructures and services for researchers.

Our session was divided into short presentations that showcased how ORCID iDs and DataCite DOIs are integrated into research systems and connected with other platforms. After that, we presented PID integration at CERN as a case study, which showed how PIDs enabled linking, attribution, claiming and citation of contributors and datasets.

The session was followed by a discussion on nationwide ORCID use cases and the need to improve compliance in capturing DOI metadata. Finally, the DI4R audience put a provocative series of questions to the THOR panel. For example:

    – “How should we deal with credit attribution of collections of datasets? When in some areas data collections are created by a contributor but each item in the collection has a different producer.”  

    – “Do we need PIDs for machines and instrumentation?”

    – “What about PIDs for projects?”

Certainly, some of those questions need further thought and exploration by the THOR members and the community at large. Join us at PIDapalooza if you want to be part of this discussion.

Overall, the THOR session at DI4R highlighted the project’s work (specifically DataCite’s Event Data and ORCID’s Auto Update) and ended with a good discussion about future lines of work.


THOR Bootcamp in Madrid

Bootcamp de THOR en Madrid (Spanish version, translated into English, below)

Want to learn more about Persistent Identifiers (PIDs) and how to harness their potential to advance your work? THOR is launching its first Bootcamp in Madrid at UC3M on 17-18 November. We will look at topics like PID service integration, research data management, and research output compliance/impact tracking – bring your questions and let’s crack them together.

THOR is an EC-funded project that sets out to investigate and push the interoperability of scholarly infrastructure through PIDs. By bringing together PID stakeholders from all sides – PID issuers, research organizations, data centers, and researchers – THOR leads the development of PID solutions and gains first-hand experience of local integration through early adopters.

The bootcamp aims to transfer this knowledge to the wider community – funders, publishers, librarians, tool builders, researchers, etc. – and fuel PID-related strategic planning and technical integration with practical guidance. The two-day event will include both talks on the current development of PID hot topics and a hands-on tool-building component. We will take the community response to our pre-event questionnaire into consideration, and tailor the content for you – prepare to advance your PID agenda by the end of the bootcamp!

Registration for the bootcamp is open. Let us know what you most want to learn about PIDs during the registration process, and see you in Madrid!

Want to know more about persistent identifiers (PIDs) and how to make the most of their potential? THOR is launching the first in a series of international Bootcamps in Madrid this coming November. We will explore identifiers, systems integration, research data management, standards compliance and impact tracking. Come with your questions and we will look for answers together.

THOR is a project funded by the European Commission, designed to investigate interoperability needs and to drive new developments that consolidate scholarly and research infrastructures through the use of persistent identifiers. THOR brings together all parties – service providers, research organizations, data centers and researchers – to develop a solid, shared infrastructure that generates experience and supports new integrators.

This Bootcamp aims to share THOR’s experience with a wider community – libraries, research services, publishers, evaluation bodies and researchers – to encourage strategic planning and to feed the development of new services in Spain, particularly those related to research data. The two-day event will include informative talks as well as discussions and practical work. We will use attendees’ preferences to shape an agenda adapted to the needs of the community, one that lets everyone get the most out of the session.

Registration for the first THOR Bootcamp in Madrid is now open! Fill in the form and let us know your interests so that we can finalise the agenda. See you in November!