Register now for THOR Final Event in Rome, 15 November 2017!

Come and join us at the Università degli Studi di Roma “La Sapienza” in Rome, Italy, on November 15th 2017 to learn about how we have advanced the state of the art in persistent identifiers. You will find out about how THOR partners have developed new tools to connect identifiers across systems, to link people, datasets, samples, reactions and more.

Keynote speeches by Fiona Murphy and Herbert Van de Sompel will show the importance of persistent identifiers in the broader scholarly communications infrastructure. A mix of demos, expert talks and discussion panels will highlight THOR’s achievements, PID integrations and services.

If you would like to hear everything about the latest PID developments, join us in Rome and register here! The detailed programme will be included soon!

Keynote speakers:

Fiona Murphy is an independent research data and publishing consultant advising institutions, learned societies and commercial companies. She is also an Associate Fellow at the University of Reading and a Board Member for the Dryad Data Repository, and has written and presented widely on data publishing, open data and open science. She is a past and current member of several research projects including PREPARDE (Peer Review of Research Data in the Earth Sciences), data2paper (a cloud-based app for automating the data article submission process) and the Scholarly Commons Working Group (a FORCE11 project devising principles and practices for open science systems).


Herbert Van de Sompel is an Information Scientist at the Los Alamos National Laboratory where he leads the Digital Library Research & Prototyping Team. The Team does research regarding various aspects of scholarly communication in the digital age, including information infrastructure, interoperability, and digital preservation. Herbert has played a major role in creating OAI-PMH, OAI-ORE, OpenURL, the SFX linking server, the bX scholarly recommender service, info URI, Web Annotation, ResourceSync, Memento “time travel for the Web”, and, more recently, Robust Links and Signposting the Scholarly Web. He graduated in Mathematics and Computer Science at Ghent University, Belgium, and holds a Ph.D. in Communication Science (2000) from the same university.

Photo of Herbert Van de Sompel by Elena Giglia

Putting PIDs to the test at Bootcamp Budapest

The third and final edition of the THOR Bootcamp will start in 3 weeks in the beautiful city of Budapest. We have gathered PID experts from THOR project partner organizations, the Library and Information Centre of the Hungarian Academy of Sciences, the Centre for Social Sciences of the Hungarian Academy of Sciences and the University of Debrecen to bring a full agenda to our participants.


There is still time to sign up for the event! Join us to learn about PIDs from the best in Budapest!

Save the date! THOR Final Event: 15 November 2017

The THOR project team are delighted to announce that the project final event will take place at the Università degli Studi di Roma “La Sapienza” in Rome, Italy, on November 15th 2017.

Come and join us in the Eternal City to learn about how we have advanced the state of the art in persistent identifiers. You will find out about how we have developed new tools to connect identifiers across systems, to link people, datasets, samples, reactions and more. You will see how identifiers are being used in access and authentication. You will see what we have done to drive new developments in the identifier world, and learn of our research into how identifiers are being used and by whom.

This event will be presented in partnership with our hosts and local supporters La Sapienza and Cineca, and will include keynotes from data publishing expert, open science champion and THOR ambassador Fiona Murphy and another expert in the field, as well as presentations from the THOR partners.

If you care about how identifiers can be used to make research more open, FAIR and effective, then you should not miss this conference.

More information about the speakers, programme and registration details will be announced soon!

ORCID at Leiden University

This guest blog post was written by Mieneke van der Salm, Information Specialist at Leiden University Libraries in the Netherlands, and has been cross-posted from The Connected Leiden Researcher.

On June 22, Leiden University Libraries (UBL) organised an information session on the Open Researcher and Contributor Identifier (ORCID) and its benefits and uses. The event was held to kick off the campaign to get as many Leiden researchers as possible to claim their ORCID iD and to add it to our local research information system, LUCRIS.

The speakers were Maaike Duine, THOR Training and events officer at ORCID, Matthew Buys, ORCID’s Regional Director EMEA, Peter Verhaar, the UBL’s ORCID implementation project team leader, and Mieneke van der Salm, ORCID implementation project team member and author of this blog post. We had about 18 people in attendance and what was particularly encouraging was that those present represented a broad range of academia, including staff from the Faculties of Law, Social and Behavioural Sciences, Medicine, Sciences, and Humanities.

The afternoon started off with Peter giving an introduction to ORCID and the current possibilities in Leiden. He explained briefly why ORCID was necessary and how it has become an increasingly standard pillar of the scholarly infrastructure. ORCID is steadily being integrated into the information flows of funders, publishers, and research institutions. Working from the credo “Enter Once, Reuse Often”, ORCID can be used to link an author to their work from manuscript submission to registration in their institution’s research information system. An example of how such a workflow might look can be seen in Peter’s slide on the Crossref auto-updater. Peter also explained how linking your ORCID iD to the Leiden University research information system (LUCRIS) will allow you to easily update the CRIS with newly published works through ORCID.

This last point is also one of the main objectives of the ORCID implementation project at Leiden University. We want to encourage authors to claim their ORCID iD and to register it in LUCRIS. Besides making an important contribution to the central goals of Open Science, this will also give Leiden University more complete coverage of Leiden research and publications in our CRIS. Our project is currently in the awareness-raising phase, and together with our student assistant we hope to reach out to all of our Leiden researchers and help them claim an ORCID iD and link it to the CRIS. An additional benefit of linking your ORCID iD to LUCRIS is that it will allow you to log in to ORCID with your institutional account, which means you don’t have to remember yet another password.

After Peter concluded his introduction, Maaike Duine explained more about persistent identifiers (PIDs) in general and the work of Project THOR in particular. THOR stands for Technical and Human Infrastructure for Open Research. It aims to make PIDs more pervasive in the research cycle and familiarise researchers with their uses. Funded under Horizon2020, THOR has numerous partners, both non-profit and commercial. One of the products they’ve created is the THOR Knowledge Hub, which gives a great overview of all available PIDs and their possible uses. They’ve also created a THOR Dashboard, which tracks the adoption of the various PIDs during the project’s existence and shows a steady increase in their usage. It also shows, however, that growth across global regions is uneven and that not all disciplines are represented equally.

After Maaike had given us an overview of Project THOR, Matthew Buys went into more depth about ORCID, its properties and applications. He broke down the problems ORCID aims to solve and explained how. He showed how ORCID makes information more dependable through authentication, and that the system collects, displays, connects, and synchronises data across different information systems.

After Matthew’s in-depth presentation on ORCID, it was my turn to conclude the session with a discussion where I turned the tables and asked our audience a question: how would they suggest we would best be able to reach their colleagues and convince them to claim their ORCID iD if they haven’t yet? The suggestions ranged from organising lunch meetings, to visiting research groups, giving practical suggestions on how to fill your ORCID quickly with all your publications instead of having to manually enter all of them, to just being where the researchers are.

The questions that were asked of us often had to do with privacy and what happens to all the data that ORCID collects. As far as privacy is concerned, the answer is simple: it is all in your control. From your settings you can control who can view your information and who can update your record, and you can rescind that permission at any time.

All in all, it was an interesting gathering that provided us with a lot of food for thought on how to better serve our researchers. While we already offer support in the form of practical help with registration, whether via email, phone or in person, we certainly hope to be able to extend the integration of ORCID into our systems in the coming months.

If you’d like to contact us for more information you can reach us via email at

Release Note for the THOR Dashboard

Following the release of the THOR dashboard last year, conversations with internal users, stakeholders and even the project’s reviewers have highlighted additional data and features that might serve to make the dashboard more comprehensive for the wider PID community. At the same time, available resources have changed, and THOR’s own interests have evolved as the project has progressed. In light of all this, we’ve performed a series of updates to the dashboard during the second year of THOR.

A report released today outlines these updates, as well as our considerations during the process and how others can potentially benefit from the dashboard work. The most visible update is the inclusion of some basic measures from Crossref, but a number of operational improvements were also made behind the scenes to improve the scalability and robustness of the dashboard. The report also highlights challenges and lessons learnt throughout the process of developing and maintaining the dashboard. These include design limitations and our experiences of designing tutorials to instruct others in creating similar services, as well as the common issue of scope creep.

It is our hope that others can follow our work on the THOR dashboard to develop other useful services for the PID community. Our experience with the dashboard indicates that there is still quite a way to go in pursuit of comprehensive PID metrics, but THOR is making big strides along this path.

The dashboard is available to view on the THOR project website.

Fostering Synergies Among Research Projects: THOR and ENVRIplus

Earlier this year, the H2020 projects THOR and ENVRIplus joined forces and organised a bootcamp focusing on integrating ORCID in environmental research infrastructures. The bootcamp was inspired by the ORCID integrations carried out by THOR partners earlier in the project.

Persistent identifiers, including ORCID iDs, are important elements of any research infrastructure. They are used to identify data, people and other entities, such as instruments, platforms, and deployments, persistently and unambiguously. Most of the 20+ infrastructures represented at the bootcamp indicated that pursuing ORCID integration is urgent.

This led to great news for THOR: the topic of ORCID integrations in environmental research infrastructures and, specifically, ORCID integrations guided by the ENVRI Reference Model, was selected as a highlight in the recent ENVRIplus Mid-Term Review.

The Reference Model describes the ‘archetypical’ environmental research infrastructure from key viewpoints and serves as a common framework and specification for the description and characterisation of infrastructures. It specifies ORCID’s role in relation to research infrastructures, the required computational components and how they integrate with ORCID, and clarifies that the ORCID iD is an information object manipulated by infrastructures.  

The highlight was a direct result of the THOR-ENVRIplus bootcamp, drawing on discussions raised during the bootcamp and ongoing developments since. We think this result underscores the potential of knowledge exchange between H2020 projects. By developing PID infrastructure, THOR results offer state-of-the-art technology to research infrastructures as well as expertise on techniques and approaches. Conversely, ENVRIplus and partner infrastructures can present platforms with challenging real-world problems in which THOR, and the PID community more generally, can ground their research and development results.

We look forward to continued active exploration and exploitation of synergies between the PID community and research infrastructures, be it in earth and environmental sciences or other domains, and between THOR, ENVRIplus and other projects.

THOR Bootcamp in Budapest


We are happy to announce that the THOR Bootcamp tour bus is on its way to Budapest!

On September 28th and 29th, the next and final event of the series will be hosted in collaboration with our friends at the Library of the Hungarian Academy of Sciences (DataCite member and ORCID integrator), taking place on its campus on the bank of the river Danube, overlooking the magnificent Széchenyi Chain Bridge.

We are excited to bring our work and expertise on interoperable research to the scholarly community in Hungary. The Bootcamp format is designed to bring a mixed audience together and provide an open forum where librarians, research funders and publishers, service providers and developers can discuss the role of PIDs in scholarly communication, consolidate PID use cases, share PID service adoption experiences and explore integration and application possibilities.

Through the Bootcamp series, THOR aims to transfer knowledge, boost adoption and support PID-related strategic planning and technical integration with practical guidance. As usual, we will take the community response to our registration questionnaire into consideration, and tailor the content for you – we intend to have you armed to the teeth with the PID knowledge and tools you need by the end of the Bootcamp!

Sign up here, see you there!

Persistent Identifier Services for the Humanities

THOR Disciplinary Workshop Series, part IV

As part of the THOR project, we have been monitoring persistent identifier adoption across the research landscape. Preliminary analysis suggests that the humanities lag behind other fields. To explore the reasons why, we invited a group composed mostly of historians to a workshop at the British Library.

We found that the humanities shared use cases with other disciplines, including life and physical sciences. For example, humanities researchers need to evaluate and report on the impact and influence of the outputs of their research; they need to update their institutions and funders about new publications; and maintain up-to-date research profiles online.

Some of the issues raised were also familiar: Who can provide help in choosing the right identifier system? What should PIDs be assigned to, and at what level of granularity? How can the investment in time be justified to employers and colleagues?

Other discussions revealed requirements more specific to the humanities.

Perhaps most important is that researchers in the humanities seldom use the term ‘data’ to describe the evidence that underpins their research. Like many, they tend to think of data as tabular and numerically rich; in contrast, their data may include interview transcripts, field recordings, or curated digital repositories of archival source materials. As a result, these researchers do not look to methods or services that are developed for data.

Unique physical objects are an extreme case. It can feel like a stretch to consider them as data, but they clearly benefit from precise, well-managed persistent identifiers. This is a pattern that we observe in other disciplines with physical objects, such as ice cores or fossils.

Persistent identifiers allow relations between things to be precisely specified (for example, to record that a researcher authored an article, or that an article cites a dataset). These relationships are defined through controlled vocabularies; for example, an important contribution may be RightsHolder, a person or organisation that owns or manages rights in the item, or ContactPerson, a person with knowledge of how to access, troubleshoot, or otherwise field issues related to the ‘resource’.
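As a minimal sketch of how such typed relations might be recorded, the Python below links a contributor identifier to a resource identifier and checks the relation against a small controlled vocabulary. Only RightsHolder and ContactPerson come from the text above; the `Contribution` class, the `ALLOWED_TYPES` set and the identifier values are hypothetical illustrations, not the schema of any real service.

```python
from dataclasses import dataclass

# Controlled vocabulary: only the two contribution types named in the text.
ALLOWED_TYPES = frozenset({"RightsHolder", "ContactPerson"})


@dataclass(frozen=True)
class Contribution:
    """A typed link between a contributor PID and a resource PID."""
    contributor_id: str     # e.g. an ORCID iD (placeholder value below)
    resource_id: str        # e.g. a DOI (placeholder value below)
    contribution_type: str  # must come from the controlled vocabulary

    def __post_init__(self):
        # Reject free-text roles so every relation stays machine-comparable.
        if self.contribution_type not in ALLOWED_TYPES:
            raise ValueError(
                f"unknown contribution type: {self.contribution_type!r}"
            )


# Placeholder identifiers, for illustration only.
link = Contribution("0000-0000-0000-0000", "10.0000/example", "ContactPerson")
```

A closed vocabulary keeps relations comparable across systems, at the cost of rejecting any role it does not anticipate.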

We observed recent work in the humanities that stretches the notion of contribution that connects people to articles and data. Crowd-sourced or volunteer contributions provide one such example.

At the workshop, Louise Seaward (UCL) discussed the approach taken by the Transcribe Bentham project to crowd-source transcription of unpublished manuscripts of the English philosopher Jeremy Bentham. In such projects, the controlled vocabularies that are currently in use to describe contributions do not adequately distinguish and credit the type and extent of the contribution made by researchers and volunteers.

In addition to identifying entities in the scholarly record, such as authors, articles and data, PIDs present a unique solution to describing the objects of historical research itself. This might mean identifying each individual copy of a historic text, held across different libraries and archives. Scholars study each item’s condition, its annotations, the context of the collection that it forms a part of, and the relationships and differences between different copies. New identifier types may be required to adequately meet these requirements.

The researchers that we spoke to also emphasised the importance of being able to capture metadata relating to objects of uncertain identity or authorship, such as historic texts, and historic or fictional personages. This means that identification systems should not force researchers to adopt a universal authority. Instead, they must support recording provenance of assertions around an intellectual entity as well as competing statements about it. Systems must effectively identify objects or individuals whose existence may be disputed!
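One way to picture this requirement is a record that keeps every assertion about an entity together with its source, instead of a single authoritative value. The sketch below is illustrative only: the `Assertion` and `EntityRecord` names, the example identifier and the sample claims are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """One statement about an entity, together with its provenance."""
    prop: str    # what is being asserted, e.g. "author"
    value: str   # the asserted value
    source: str  # who or what made the assertion


@dataclass
class EntityRecord:
    """An identified entity that keeps all assertions, even conflicting ones."""
    identifier: str
    assertions: list = field(default_factory=list)

    def add_assertion(self, prop, value, source):
        # Nothing is overwritten: each statement is stored with its source.
        self.assertions.append(Assertion(prop, value, source))

    def competing(self, prop):
        """Group the sources by the value they assert for one property."""
        by_value = defaultdict(list)
        for a in self.assertions:
            if a.prop == prop:
                by_value[a.value].append(a.source)
        return dict(by_value)


text = EntityRecord("example:text-1")  # hypothetical identifier
text.add_assertion("author", "Person A", "Catalogue X")
text.add_assertion("author", "Person B", "Scholar Y")
text.add_assertion("author", "Person A", "Scholar Z")
views = text.competing("author")
```

Here `views` lists two rival attributions side by side, each traceable to the sources that made it, and the record does not have to pick a winner.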

Within this community, current citation practices range from citation of archival sources and creative works (as described by Jonathan Blaney, British History Online), to the bespoke entity ID systems that are now being developed as part of digital humanities projects. Faith Lawrence from King’s College London demonstrated how projects such as SNAP:DRGN have gone one step further to bring together existing identification services that link texts, places and people. SNAP:DRGN also builds a virtual authority list for ancient people by linking related data from many collaborating projects.

These examples show linking across specific projects, but do not connect digital resources via a globally interoperable infrastructure. This signals that issues of identity are important in the humanities, but not yet addressed in a standardised or interoperable way. Persistent identifiers can help here.

Ultimately we found that humanities researchers identify with the use cases for PIDs in the scholarly communications context, but have difficulty identifying with examples from other academic fields that are further along in adoption.

How do we change this? Demonstrations of the use of PIDs to automate transfer of information between systems to save researchers time filling out forms, such as reporting to funders, were seen to be a strong selling point. Realising the potential of persistent identifiers will only happen when they are seamlessly integrated into the workflows and systems of the research environment, publishing environments, and library and archival institutions. Existing implementation support for PIDs in humanities publishing systems, in particular, would provide real leverage, making it easier to link publications, funding and data and to make those connections easier to follow.

We are now seeking to have focused conversations with infrastructure providers of humanities data services to study PID adoption not only from the researcher, but also from the provider perspective.

Credit for Data with ORCID: Raising Awareness in Earth and Environmental Sciences

THOR Disciplinary Workshop Series, part III

On June 12, PANGAEA held a short workshop on ORCID and its integration in PANGAEA at MARUM, the Center for Marine Environmental Science at the University of Bremen. As a Data Publisher in Earth and Environmental Sciences, PANGAEA operates an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. Enabled by the THOR Project, PANGAEA integrated ORCID in 2016. Users can link their PANGAEA account to their ORCID iD and effectively claim their datasets published with PANGAEA to their ORCID records.

While the ORCID integration in PANGAEA is complete, raising awareness and supporting the linking of ORCID iDs in PANGAEA is a continuing effort. The DFG-funded pilot ORCID DE is fostering ORCID adoption in Germany. As an ORCID DE member, the University of Bremen is officially encouraging researchers to obtain and use ORCID iDs. Within the University of Bremen, PANGAEA is actively raising awareness about ORCID and supporting its adoption, in particular at MARUM and among its researchers who publish data with PANGAEA.

The short workshop we held on June 12 consisted of a lecture that introduced ORCID and its integration in PANGAEA. The lecture was followed by a live demo of the integration, roughly aligned with our video on the integration. Finally, we provided laptops for on-site ORCID registration and iD linking in PANGAEA. Among the participants, about 40% already had an ORCID iD and only about half of them had already linked their iD with their PANGAEA account. None of the participants raised reservations against getting an iD or linking it in PANGAEA. Getting more researchers to link their iD in PANGAEA thus seems to be a matter of raising awareness about the possibility and explaining why it matters.

PANGAEA is doing fairly well with its ORCID statistics. Since monitoring of the trends started in mid-October 2016, the share of users with an account linked to ORCID has increased by 7.4% and is currently at 9%. During the same period, the share of datasets with at least one author linked to ORCID has increased by 3.6%, and recently surpassed 20%. Naturally, we aim for datasets where all authors are unambiguous but, as the numbers show, this is much harder: 7.9% of datasets have all authors linked to ORCID, up 0.8% since October 2016.
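The figures above do not say whether each increase is measured in percentage points or as a relative change. As a rough sketch, the Python below derives the October 2016 baseline implied by each reading for the user figure (currently 9%, up 7.4%). The function names are hypothetical, and the results are arithmetic consequences of each assumption, not reported statistics.

```python
def baseline_if_points(current, increase):
    """Implied baseline if the increase is in percentage points."""
    return round(current - increase, 2)


def baseline_if_relative(current, increase):
    """Implied baseline if the increase is a relative change in percent."""
    return round(current / (1 + increase / 100), 2)


# Share of users with an account linked to ORCID: currently 9%, up 7.4%.
users_points = baseline_if_points(9.0, 7.4)
users_relative = baseline_if_relative(9.0, 7.4)
```

The two readings imply very different starting points, so it matters which one a reader assumes when judging how fast adoption is growing.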

Overall, these numbers are encouraging but there is a long way to go for a majority of datasets with unambiguous authors. This shows that continued efforts aimed at raising awareness among researchers in Earth and Environmental Sciences are important.

Data Claims to ORCID: an EMBL-EBI Perspective

THOR Disciplinary Workshop Series, part II

The European Bioinformatics Institute (EMBL-EBI) is a centre for research and services in bioinformatics. It performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.

EMBL-EBI is part of EMBL, Europe’s flagship laboratory for the life sciences, and houses a diverse range of data repositories. All of these databases link to each other and to the scientific literature, providing a deeply integrated ecosystem that reflects the natural connectivity of the life sciences. Each repository uses custom identifier systems, known in bioinformatics as accession numbers, most of which predate the use of DOIs for data, in some cases by decades, and are instantly recognisable to the researchers that use them.

Existing Use of ORCID at EMBL-EBI

EMBL-EBI has been proactive in promoting ORCID iDs within the organisation. In 2013, we introduced the requirement that all staff register with ORCID and allow Europe PMC to update their ORCID records with their publications. The ORCID account quickly became the central place where staff tracked their publication record. In return, EMBL-EBI made sure that this information then flowed to EMBL-EBI staff pages automatically.

During the THOR project, the successful ORCID integration in Europe PMC was followed by the development of the EBI ORCID Hub, which enables EBI databases to add ORCID authentication and data claiming to ORCID to their user interfaces. The Hub has since provided a solid technical foundation for persuading those who manage the many diverse databases housed by EMBL-EBI to adopt ORCID iDs within their systems.

The challenge of associating a public database record with an ORCID iD has centred more on user interaction than on technology. This, too, was simplified by the EBI ORCID Hub, which provides standardised claiming and reporting APIs, JavaScript libraries, and a centralised infrastructure for storing associations between ORCID iDs and data records. As a result, various prototype services for data claiming to ORCID are now being developed across EMBL-EBI, such as the ‘Claim this study to ORCID’ button within the MetaboLights database.
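To make the idea of a central store of associations more concrete, here is a toy in-memory sketch in Python. It is not the EBI ORCID Hub’s API: the `ClaimStore` class, its method names, the ORCID iDs and the accession are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


class ClaimStore:
    """Toy in-memory store of ORCID iD <-> data record associations."""

    def __init__(self):
        self._by_orcid = defaultdict(set)
        self._by_record = defaultdict(set)

    def claim(self, orcid_id, database, accession):
        """Record that a researcher claims a record; repeat claims are no-ops."""
        record = (database, accession)
        self._by_orcid[orcid_id].add(record)
        self._by_record[record].add(orcid_id)

    def records_for(self, orcid_id):
        """Reporting view: all records claimed by one researcher."""
        return sorted(self._by_orcid.get(orcid_id, ()))

    def claimants_of(self, database, accession):
        """Reporting view: all researchers who claimed one record."""
        return sorted(self._by_record.get((database, accession), ()))


store = ClaimStore()
# Placeholder iDs and accession, for illustration only.
store.claim("0000-0000-0000-0001", "MetaboLights", "STUDY-1")
store.claim("0000-0000-0000-0001", "MetaboLights", "STUDY-1")  # duplicate
store.claim("0000-0000-0000-0002", "MetaboLights", "STUDY-1")
```

Indexing the same association in both directions lets a single store answer "what has this researcher claimed?" and "who has claimed this record?" for every participating database.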

Ongoing Engagement with EMBL-EBI Databases

We are now consulting with the data resources hosted at EMBL-EBI to find out more about their requirements for adopting ‘data claiming to ORCID’. Currently, this focuses more on technical assistance and problem resolution in the implementation of claiming functionality. But we still have a long way to go to convince EMBL-EBI resources and their users about the long-term value of linking data to ORCID iDs.

We need a critical mass of data claims to ORCID across multiple resources that are linked to other information, such as affiliations and funders. This semantic network of relationships could then be made available to both EMBL-EBI resources and external users for searching, along with visualisation interfaces that showcase the potential of such a network for funders, institutes and individual researchers to answer important questions about the research they produce or enable.

We are now developing a prototype retrospective batch claiming interface, which allows a given researcher to claim all their datasets identified in a search, such as in the image below, via one button click.

We plan to use EMBL-EBI’s BioStudies database as a test bed for trying out interesting search scenarios based on data collated during the THOR project. BioStudies holds descriptions of biological studies, links to data from these studies in other databases (be it within EMBL-EBI or externally), as well as data that do not fit in the structured archives at EMBL-EBI. The repository enables manuscript authors to submit supplementary information and link to it from the publication. It is this link to the publication that allows BioStudies to retrieve the funder information from Europe PMC, and in turn link it with the datasets associated with that publication.

Drawing on that connectivity, we plan to implement a faceted search by funder in BioStudies that will enable the user to find all data sets associated with a Europe PMC publication that has been tagged with that funder. Once a sufficiently large number of data claims to ORCID have been accumulated, we plan to allow the BioStudies user to search its data by ORCID iD, returning a results page faceted by funder.
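The planned searches can be sketched in a few lines of Python under simplifying assumptions: each dataset links to one publication, and funder tags are available per publication. All identifiers, funder names and function names below are hypothetical placeholders and do not reflect the BioStudies or Europe PMC data models.

```python
from collections import Counter

# Hypothetical sample data; identifiers and funder names are placeholders.
publication_funders = {
    "PUB-1": ["Funder A"],
    "PUB-2": ["Funder A", "Funder B"],
}
dataset_publication = {
    "DATA-1": "PUB-1",
    "DATA-2": "PUB-2",
    "DATA-3": "PUB-2",
}
dataset_claims = {  # ORCID iD -> datasets claimed by that researcher
    "0000-0000-0000-0001": ["DATA-1", "DATA-2"],
}


def funders_of(dataset):
    """Follow the chain dataset -> publication -> funders."""
    return publication_funders.get(dataset_publication.get(dataset), [])


def datasets_by_funder(funder):
    """Search by funder: datasets whose publication carries that funder tag."""
    return sorted(d for d in dataset_publication if funder in funders_of(d))


def funder_facets(orcid_id):
    """Search by ORCID iD: count the claimed datasets per funder."""
    counts = Counter()
    for dataset in dataset_claims.get(orcid_id, []):
        counts.update(funders_of(dataset))
    return dict(counts)
```

For example, `datasets_by_funder("Funder B")` finds the two datasets attached to the second publication, and `funder_facets("0000-0000-0000-0001")` returns per-funder counts that a results page could show as facets.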

Finally, with enough data-claims data available, we also plan to generate a number of Tableau visualisations offering captivating insights into the correlations between researchers, datasets (and potentially metadata), funders and possibly timelines. Our hope is that the prototypes and visualisations will act as catalysts for generation of new analytics scenarios from our users that will stimulate future PID-related development work at EMBL-EBI.


Outstanding Challenges – Managing Complexity…

To date, we have not attempted to solve a number of semantic challenges in the interpretation of data claims to ORCID. For example, does a data claim imply more than just a participative role in the generation of the data? Should specific roles be distinguished by parameters such as the original researcher, the data submitter or the subsequent data curator? Some EMBL-EBI resources, such as ArrayExpress, accept experimental data submissions described by metadata that is curated and harmonised using ontology mapping tools, whereas submissions to other resources, such as Rfam, are themselves results of curation, making role distinctions even more complicated.

Furthermore, databases such as the European Nucleotide Archive (ENA), PRoteomics IDEntifications (PRIDE) and UniProt may contain secondary data (protein identifications, translated reads, genome references) that are derived from existing, finer-grained original data, such as sequencing reads or peptides, that may have been submitted previously and independently. A claim to ORCID with regard to the derived data may be made, but can the claim to the original data be legitimately extended to the derived data? The submitters of the original data may also have a legitimate reason to question the ethics of extending the derived data claim to the original data. In such cases, the difficulty of transferring claims from papers to data lies in recognising where those sometimes subtle and often database-specific timelines and inter-dependencies of data exist.

We are excited about the work at EMBL-EBI that will yield tangible results towards the end of THOR, but recognise that our prospecting for the riches of insight made available by data claims to ORCID is only just beginning.