Highlights Workshop: Identifiers – Infrastructure, Impact and Innovation

On Thursday 7 July 2016, the THOR project organised the workshop "Identifiers – Infrastructure, Impact and Innovation" to showcase the research and work done by all THOR partners during the project’s first year. The event in Amsterdam attracted a mixed audience of representatives from publishing companies, universities and research institutions.

After an introduction to the THOR project by Adam Farquhar (British Library), the day was divided into three sessions. The first one focused on persistent identifier linking, the next session on data publishing and the last one on THOR services. Slides of all presentations can be found on the THOR Knowledge Hub.


Photo: Introduction to Project THOR and persistent identifiers

Persistent Identifier Linking

During the first session on persistent identifier linking, Martin Fenner (DataCite), Laura Rueda (DataCite) and Tom Demeranville (ORCID) explained the challenges of linking data sets to other data sets, handling dynamic data and identifying multiple versions of the same data set. They also discussed the complexities of cross-linking databases and how to establish a fully interoperable system. Good-quality metadata is crucial, and a lack of standards and low adoption complicate matters even more. Despite these challenges, the THOR team has achieved a lot during the project’s first year. For example, THOR partners have contributed to the ORCID auto-update functionality and DataCite event data.

The ORCID auto-update functionality enables researchers to easily search for their works via DataCite search and link them to their ORCID records. With DataCite’s event data, it is possible to collect events around DataCite DOIs, e.g. data citations in journal articles. These are great achievements, and the THOR project will carry out more research to address the remaining challenges.


Photo: Tom Demeranville, Martin Fenner and Laura Rueda presenting on persistent identifier linking

Data Publishing

The second session of the day focused on data publishing: Catriona MacCallum (PLOS), Michaela Torkar (F1000) and Hylke Koers (Elsevier) presented on data policies in their respective publishing companies. A lot of the data that is generated is not being published, because most authors focus only on article publication. A cultural change is needed because, by the time a paper is submitted to a journal, it is generally too late.

In his presentation, Martin Fenner (DataCite) agreed that it is challenging to make the underlying data of a publication publicly available. Even if the data is made available, it is often not very accessible, for example because it is hidden in a file format like PDF. Other challenges for data-article linking are, again, the lack of good-quality metadata and the wide range of data submission systems. Integrating persistent identifiers into the data publishing workflow might overcome these problems, provided that globally unique identifiers are used instead of local identifiers. Challenges for a centralized infrastructure include authentication and ownership of the management of the data infrastructure.


Photo: Josh Brown (ORCID) and Paul Groth introducing the publishing panel

After the presentations, Paul Groth (Elsevier Labs) led a panel discussion on the challenges and opportunities of data publishing. Key questions included: Should publishers be responsible for data publishing, or should data repositories? These questions are not easily answered, but all panellists agreed that publishers should work together with researchers and other stakeholders to establish community standards for good-quality data. Persistent accessibility of data is a stamp of approval, so good-quality metadata and the use of persistent identifiers are crucial.

In addition to these technical infrastructure requirements, a human infrastructure evidently needs to be in place as well. Another question arose: Would authors commit to having their data accessible forever? According to the panel, incentives and a cultural change are needed for researchers to publish their data. To make this change and to achieve a shared infrastructure that advances data publishing, more research and workshop discussions between the different stakeholders should take place. The THOR team will continue these discussions in the coming months of the project.


Photo: Panel discussion on Data Publishing

THOR Services

In the final session of the day, Florian Graef (EMBL-EBI), Markus Stocker (PANGAEA), Robin Dasler (CERN) and Laura Rueda (DataCite) presented on THOR Services. They demonstrated ORCID integration in the data submission systems of their respective repositories in the biological and medical sciences, earth and environmental sciences, and high-energy physics.

The demonstrations of ORCID integration within data set claiming services and workflows showed clear advantages. At EMBL-EBI, for example, there is a wide variety of databases, and maintaining one single service is a lot easier. See also the demonstration of ORCID integration within PANGAEA. Two next steps have been identified for the continued implementation of persistent identifiers within the research cycle across the different disciplines: claiming services for previously published data and alignment of identifiers.


Photo: Markus Stocker explaining more about ORCID integration at PANGAEA

Of course, a lot more was discussed during the workshop, so check out the presentations, and please get in touch if you were unable to join us and have any questions! In the coming year we will keep you up to date with further achievements of the THOR project through our blog posts and website. Thanks to everybody for their outstanding contributions to the valuable discussions in Amsterdam. We look forward to welcoming you at one of our next events!