Persistent Identifier Services for the Humanities

THOR Disciplinary Workshop Series, part IV

As part of the THOR project, we have been monitoring persistent identifier adoption across the research landscape. Preliminary analysis suggests that the humanities lag behind other fields. To explore the reasons why, we invited a group composed mostly of historians to a workshop at the British Library.

We found that the humanities shared use cases with other disciplines, including the life and physical sciences. For example, humanities researchers need to evaluate and report on the impact and influence of their research outputs; they need to update their institutions and funders about new publications; and they need to maintain up-to-date research profiles online.

Some of the issues raised were also familiar: Who can provide help in choosing the right identifier system? What should PIDs be assigned to, and at what level of granularity? How can the investment in time be justified to employers and colleagues?

Other discussions revealed requirements more specific to the humanities.

Perhaps most important is that researchers in the humanities seldom use the term ‘data’ to describe the evidence that underpins their research. Like many, they tend to think of data as tabular and numerically rich; in contrast, their data may include interview transcripts, field recordings, or curated digital repositories of archival source materials. As a result, these researchers do not look to methods or services that are developed for data.

Unique physical objects are an extreme case. It can feel like a stretch to consider them as data, but they clearly benefit from precise, well-managed persistent identifiers. This is a pattern that we observe in other disciplines with physical objects, such as ice cores or fossils.

Persistent identifiers allow relations between things to be precisely specified (for example, to record that a researcher authored an article, or that an article cites a dataset). These relationships are defined through controlled vocabularies; for example, an important contributor type is RightsHolder, a person or organisation that owns or manages rights in the item, or ContactPerson, a person who knows how to access, troubleshoot, or otherwise field issues related to the resource.
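In DataCite-style metadata, such relationships appear as typed entries in the record. The sketch below illustrates the idea with an entirely hypothetical dataset: the DOI, names and ORCID iD are invented, but the contributorType values (RightsHolder, ContactPerson) come from the DataCite controlled vocabulary.

```python
# Sketch of DataCite-style metadata for a hypothetical archival dataset.
# The DOI, names and ORCID iD are invented for illustration; the
# contributorType values are from the DataCite controlled vocabulary.
metadata = {
    "doi": "10.1234/example-dataset",               # hypothetical DOI
    "creators": [
        {"name": "Example, Alice",
         "nameIdentifier": "0000-0001-2345-6789",   # hypothetical ORCID iD
         "nameIdentifierScheme": "ORCID"}
    ],
    "contributors": [
        {"contributorType": "RightsHolder",         # owns/manages rights
         "name": "Example Archive"},
        {"contributorType": "ContactPerson",        # fields access questions
         "name": "Example, Bob"}
    ],
}

def contributors_of_type(record, contributor_type):
    """Return contributor names carrying the given controlled-vocabulary type."""
    return [c["name"] for c in record["contributors"]
            if c["contributorType"] == contributor_type]

print(contributors_of_type(metadata, "RightsHolder"))  # ['Example Archive']
```

Because the roles are drawn from a shared vocabulary rather than free text, services on either end of the link can interpret them the same way.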

We observed recent work in the humanities that stretches the notion of contribution that connects people to articles and data. Crowd-sourced or volunteer contributions provide one such example.

At the workshop, Louise Seaward (UCL) discussed the approach taken by the Transcribe Bentham project to crowd-source transcription of unpublished manuscripts of the English philosopher Jeremy Bentham. In such projects, the controlled vocabularies that are currently in use to describe contributions do not adequately distinguish and credit the type and extent of the contribution made by researchers and volunteers.

In addition to identifying entities in the scholarly record, such as authors, articles and data, PIDs present a unique solution to describing the objects of historical research themselves. This might mean identifying each individual copy of a historic text, held across different libraries and archives. Scholars study each item’s condition, its annotations, the context of the collection that it forms a part of, and the relationships and differences between different copies. New identifier types may be required to adequately meet these requirements.

The researchers that we spoke to also emphasised the importance of being able to capture metadata relating to objects of uncertain identity or authorship, such as historic texts, and historic or fictional personages. This means that identification systems should not force researchers to adopt a universal authority. Instead, they must support recording provenance of assertions around an intellectual entity as well as competing statements about it. Systems must effectively identify objects or individuals whose existence may be disputed!

Within this community, current citation practices range from citation of archival sources and creative works (as described by Jonathan Blaney, British History Online), to the bespoke entity ID systems that are now being developed as part of digital humanities projects. Faith Lawrence from King’s College London demonstrated how projects such as SNAP:DRGN have gone one step further to bring together existing identification services that link texts, places and people. SNAP:DRGN also builds a virtual authority list for ancient people by linking related data from many collaborating projects.

These examples show linking across specific projects, but do not connect digital resources via a globally interoperable infrastructure. This signals that issues of identity are important in the humanities, but not yet addressed in a standardised or interoperable way. Persistent identifiers can help here.

Ultimately we found that humanities researchers identify with the use cases for PIDs in the scholarly communications context, but have difficulty identifying with examples from other academic fields that are further along in adoption.

How do we change this? Demonstrations of the use of PIDs to automate transfer of information between systems to save researchers time filling out forms, such as reporting to funders, were seen to be a strong selling point. Realising the potential of persistent identifiers will only happen when they are seamlessly integrated into the workflows and systems of the research environment, publishing environments, and library and archival institutions. Implementation support for PIDs in humanities publishing systems, in particular, would provide real leverage, making it easier to link publications, funding and data, and making those connections easier to follow.

We are now seeking to have focused conversations with infrastructure providers of humanities data services to study PID adoption not only from the researcher’s perspective, but also from the provider’s.

Credit for Data with ORCID: Raising Awareness in Earth and Environmental Sciences

THOR Disciplinary Workshop Series, part III

On June 12, PANGAEA held a short workshop on ORCID and its integration in PANGAEA at MARUM, the Center for Marine Environmental Science at the University of Bremen. As a Data Publisher in Earth and Environmental Sciences, PANGAEA operates an Open Access library aimed at archiving, publishing and distributing georeferenced data from earth system research. Enabled by the THOR project, PANGAEA integrated ORCID in 2016. Users can link their PANGAEA account to their ORCID iD and effectively claim the datasets they publish with PANGAEA to their ORCID records.

While the ORCID integration in PANGAEA is complete, raising awareness and supporting the linking of ORCID iDs in PANGAEA is a continuing effort. The DFG-funded pilot ORCID DE is fostering ORCID adoption in Germany. As an ORCID DE member, the University of Bremen is officially encouraging researchers to obtain and use ORCID iDs. Within the University of Bremen, PANGAEA is actively raising awareness about ORCID and supporting its adoption, in particular at MARUM and among its researchers who publish data with PANGAEA.

The short workshop we held on June 12 consisted of a lecture that introduced ORCID and its integration in PANGAEA. The lecture was followed by a live demo of the integration, roughly aligned with our video on the integration. Finally, we provided laptops for on-site ORCID registration and iD linking in PANGAEA. Among the participants, about 40% already had an ORCID iD, and only about half of them had already linked their iD with their PANGAEA account. None of the participants raised reservations about getting an iD or linking it in PANGAEA. Getting more researchers to link their iD in PANGAEA thus seems to be a matter of raising awareness about the possibility and explaining why it matters.

PANGAEA is doing fairly well with its ORCID statistics. Since we began monitoring the trends in mid-October 2016, the number of users with accounts linked to ORCID has increased by 7.4% and is currently at 9%. During the same period, the number of datasets with at least one author linked to ORCID has increased by 3.6%, and recently surpassed 20%. Naturally, we aim at datasets where all authors are unambiguous but, as the numbers show, this is much harder: 7.9% of datasets have all authors linked to ORCID, up 0.8% since October 2016.
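The two dataset-level indicators can be computed directly from a repository snapshot. The sketch below uses invented counts purely for illustration: each dataset is represented by the set of its authors, and `linked` holds the authors whose accounts are connected to an ORCID iD.

```python
# Sketch of computing PANGAEA-style linking indicators from a snapshot.
# All data below are invented for illustration.
datasets = [
    {"alice", "bob"},
    {"alice"},
    {"carol", "dave"},
    {"bob", "carol"},
]
linked = {"alice", "bob"}  # authors with an ORCID-linked account

# Datasets with at least one ORCID-linked author (set intersection non-empty).
at_least_one = sum(1 for authors in datasets if authors & linked)
# Datasets where every author is linked (author set is a subset of `linked`).
all_linked = sum(1 for authors in datasets if authors <= linked)

share_any = at_least_one / len(datasets)
share_all = all_linked / len(datasets)
print(f"any author linked: {share_any:.0%}, all authors linked: {share_all:.0%}")
# any author linked: 75%, all authors linked: 50%
```

As in PANGAEA's real numbers, the "all authors linked" share is necessarily the harder bar: it can never exceed the "at least one author linked" share.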

Overall, these numbers are encouraging but there is a long way to go for a majority of datasets with unambiguous authors. This shows that continued efforts aimed at raising awareness among researchers in Earth and Environmental Sciences are important.

Data Claims to ORCID: an EMBL-EBI Perspective

THOR Disciplinary Workshop Series, part II

The European Bioinformatics Institute (EMBL-EBI) is a centre for research and services in bioinformatics. It performs basic research in computational biology and offers an extensive user training programme, supporting researchers in academia and industry.

EMBL-EBI is part of EMBL, Europe’s flagship laboratory for the life sciences, and houses a diverse range of data repositories. All of these databases link to each other and to the scientific literature, providing a deeply integrated ecosystem that reflects the natural connectivity of the life sciences. Each repository uses custom identifier systems, known in bioinformatics as accession numbers, most of which predate the use of DOIs for data, in some cases by decades, and are instantly recognisable to the researchers who use them.

Existing Use of ORCID at EMBL-EBI

EMBL-EBI has been proactive in promoting ORCID iDs within the organisation. In 2013, we introduced the requirement that all staff register with orcid.org, and that they allow Europe PMC to update their ORCID records with their publications. The ORCID account quickly became the central place where staff tracked their publication record. In return, EMBL-EBI made sure that this information then flowed to EMBL-EBI staff pages automatically.

During the THOR project, the successful ORCID integration in Europe PMC was followed by the development of the EBI ORCID Hub, which enables EBI databases to add ORCID authentication and data claiming to ORCID to their user interfaces. The Hub has since provided a solid technical foundation for persuading those who manage the many diverse databases housed by EMBL-EBI to adopt ORCID iDs within their systems.

The challenge of associating a public database record with an ORCID iD has, in practice, centred less on the technology and more on user interaction. This, too, was simplified by the EBI ORCID Hub, which provides standardised claiming and reporting APIs and JavaScript libraries, and a centralised infrastructure for storing associations between ORCID iDs and data records. As a result, various prototype services for data claiming to ORCID are now being developed across EMBL-EBI, such as the ‘Claim this study to ORCID’ button within the MetaboLights database.
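To make the claiming step concrete, here is a hypothetical sketch of what a ‘Claim this study to ORCID’ action could assemble before handing off to a claiming service such as the Hub. The field names and the helper function are assumptions for illustration, not the Hub's actual interface.

```python
# Hypothetical sketch of assembling a data claim. The field names and the
# helper below are illustrative assumptions, not the EBI ORCID Hub's API.
def build_claim(orcid_id, accession, database):
    """Associate an authenticated ORCID iD with a database record."""
    if not orcid_id or not accession:
        raise ValueError("both an ORCID iD and an accession are required")
    return {
        "orcid": orcid_id,                    # iD obtained via ORCID sign-in
        "record": f"{database}:{accession}",  # e.g. a MetaboLights study
        "claim_type": "data",
    }

claim = build_claim("0000-0001-2345-6789", "MTBLS1", "metabolights")
print(claim["record"])  # metabolights:MTBLS1
```

The essential point is that the user only authenticates with ORCID and confirms the record; the database already knows the accession, so the association itself requires no manual metadata entry.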

Ongoing Engagement with EMBL-EBI Databases

We are now consulting with the data resources hosted at EMBL-EBI to find out more about their requirements for adopting ‘data claiming to ORCID’. Currently, this focuses more on technical assistance and problem resolution in the implementation of claiming functionality. But we still have a long way to go to convince EMBL-EBI resources and their users about the long-term value of linking data to ORCID iDs.

We need a critical mass of data claims to ORCID across multiple resources that are linked to other information, such as affiliations and funders. This semantic network of relationships could then be made available to both EMBL-EBI resources and external users for searching, along with visualisation interfaces that showcase the potential of such a network for funders, institutes and individual researchers to answer important questions about the research they produce or enable.

We are now developing a prototype retrospective batch claiming interface, which allows a researcher to claim, with a single click, all of their datasets identified in a search.
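The core of batch claiming is straightforward: turn a page of search hits attributed to a researcher into one claim per distinct dataset. The sketch below is hypothetical (invented accessions and function names); a real implementation would also ask the researcher to confirm the matches before submitting.

```python
# Hypothetical sketch of retrospective batch claiming: one claim per
# distinct dataset found in a search. Data and names are invented.
def batch_claims(orcid_id, search_hits):
    """Turn search hits into claim requests, skipping duplicate accessions."""
    seen, claims = set(), []
    for hit in search_hits:
        if hit["accession"] in seen:
            continue
        seen.add(hit["accession"])
        claims.append({"orcid": orcid_id, "record": hit["accession"]})
    return claims

hits = [{"accession": "S-EXMP1"},
        {"accession": "S-EXMP2"},
        {"accession": "S-EXMP1"}]  # duplicate hit from the search
claims = batch_claims("0000-0001-2345-6789", hits)
print(len(claims))  # 2
```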

We plan to use EMBL-EBI’s BioStudies database as a test bed for trying out interesting search scenarios based on data collated during the THOR project. BioStudies holds descriptions of biological studies, links to data from these studies in other databases (be it within EMBL-EBI or externally), as well as data that do not fit in the structured archives at EMBL-EBI. The repository enables manuscript authors to submit supplementary information and link to it from the publication. It is this link to the publication that allows BioStudies to retrieve the funder information from Europe PMC, and in turn link it with the datasets associated with that publication.

Drawing on that connectivity, we plan to implement a faceted search by funder in BioStudies that will enable the user to find all data sets associated with a Europe PMC publication that has been tagged with that funder. Once a sufficiently large number of data claims to ORCID have been accumulated, we plan to allow the BioStudies user to search its data by ORCID iD, returning a results page faceted by funder.
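The faceting step described above amounts to grouping a researcher's datasets by the funder retrieved from the linked publication. A minimal sketch, with entirely invented records:

```python
# Sketch of faceting a researcher's datasets by funder. The records,
# accessions and ORCID iDs are invented for illustration.
from collections import defaultdict

records = [
    {"dataset": "S-EXMP1", "orcid": "0000-0001-2345-6789", "funder": "Funder A"},
    {"dataset": "S-EXMP2", "orcid": "0000-0001-2345-6789", "funder": "Funder B"},
    {"dataset": "S-EXMP3", "orcid": "0000-0002-9999-0000", "funder": "Funder A"},
]

def facet_by_funder(records, orcid_id):
    """Return the given researcher's datasets grouped into funder facets."""
    facets = defaultdict(list)
    for r in records:
        if r["orcid"] == orcid_id:
            facets[r["funder"]].append(r["dataset"])
    return dict(facets)

print(facet_by_funder(records, "0000-0001-2345-6789"))
# {'Funder A': ['S-EXMP1'], 'Funder B': ['S-EXMP2']}
```

Each facet then becomes a filter in the results page, so a funder can see at a glance which datasets its grants helped produce.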

Finally, with enough data-claims data available, we also plan to generate a number of Tableau visualisations offering captivating insights into the correlations between researchers, datasets (and potentially metadata), funders and possibly timelines. Our hope is that the prototypes and visualisations will act as catalysts for generation of new analytics scenarios from our users that will stimulate future PID-related development work at EMBL-EBI.

Outstanding Challenges – Managing Complexity…

To date, we have not attempted to solve a number of semantic challenges in the interpretation of data claims to ORCID. For example, does a data claim imply more than just a participative role in the generation of the data? Should specific roles be distinguished by parameters such as the original researcher, the data submitter or the subsequent data curator? Some EMBL-EBI resources, such as ArrayExpress, accept experimental data submissions described by metadata that is curated and harmonised using ontology-mapping tools, whereas submissions to other resources, such as Rfam, are themselves the results of curation, making role distinctions even more complicated.

Furthermore, databases such as the European Nucleotide Archive (ENA), PRIDE PRoteomics IDEntifications (PRIDE) and UniProt may contain secondary data (protein identifications, translated reads, genome references) derived from existing, finer-grained original data, such as sequencing reads or peptides, that may have been submitted previously and independently. A claim to ORCID with regard to the derived data may be made, but can the claim to the original data be legitimately extended to the derived data? The submitters of the original data may also have a legitimate reason to question the ethics of extending the derived data claim to the original data. In such cases, the difficulty of transferring claims from papers to data lies in recognising where those sometimes subtle and often database-specific timelines and inter-dependencies of data exist.

We are excited about the work at EMBL-EBI that will yield tangible results towards the end of THOR, but recognise that our prospecting for the insight riches made available by data claims to ORCID is only just beginning.

High-Energy Physics: how special are we?

THOR Disciplinary Workshop Series, part I

The High-Energy Physics (HEP) community is one of the four disciplinary communities that are the focus of THOR. When it comes to addressing specific challenges within scholarly communication and Open Science in these communities, disciplinary workshops have proven to be a very effective tool for agreeing on community-specific actions. By bringing together the key stakeholders in the community (i.e. researchers, service providers, publishers, editors, repository providers, librarians, data centres, and indexing services), these workshops ensure that all relevant parties can discuss and work on actual solutions and concrete actions to address some of the challenges they face.

The AAHEP workshop series brings together information providers from the High-Energy Physics, Astronomy and Astrophysics communities every 1.5 years. This year, the THOR project (Artemis Lavasa and Sünje Dallmeier-Tiessen, CERN) organised the 9th iteration of the workshop, which took place on the 4th and 5th of May 2017 in Hamburg and was hosted by the DESY laboratory.

Martin Fenner (DataCite) set the scene with a keynote about linking articles, data and software, the role of persistent identifiers, their future, and how they could help a community to improve services and practices.

To facilitate discussions, four working sessions were organised based on the participants’ interests: Data and Software (Publishing/Linking), Measuring and Visualizing Impact, ORCID (Integration and uptake), Role of Proceedings (including conference IDs). Each session was then further split into various breakout groups to allow in-depth discussions about specific topics of interest.

The working sessions highlighted some high-level challenges and possible solutions:

Data and Software

Challenges:

  • Data availability statements are an important tool to express access and links to data and code underlying the findings presented in a paper, but are not yet included.
  • Data citation: give credit where credit is due, make it human and machine readable.
  • Data deposition processes and workflows are unclear for researchers.

What can we do?

  • Bring together publishers of journals in the field to discuss what works for HEP and Astrophysics, and whether data availability statements should be made mandatory or recommended.
  • Review existing data availability sections to assess what “kind” works for this community.
  • Review progress on data and software citation: share lessons learnt with the community about what works and what does not.
  • Identify major conferences and organise joint events to educate the community on data and software sharing and citation.
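The "human and machine readable" point above is worth making concrete: a data citation can carry the same information as a structured record and as a rendered reference string. The sketch below uses an invented dataset and DOI; the rendered style is illustrative, not a prescribed format.

```python
# Sketch of one data citation kept in both machine-readable (dict) and
# human-readable (string) form. The dataset, authors and DOI are invented.
citation = {
    "authors": ["Example, Alice", "Example, Bob"],
    "year": 2017,
    "title": "Hypothetical collision event sample",
    "repository": "Example Open Data Portal",
    "doi": "10.5072/example.0001",   # hypothetical DOI (DataCite test prefix)
}

def render(c):
    """Produce the human-readable reference; machines read the dict directly."""
    return (f"{'; '.join(c['authors'])} ({c['year']}): {c['title']}. "
            f"{c['repository']}. https://doi.org/{c['doi']}")

print(render(citation))
```

Because the DOI is present in both forms, citation-tracking services can resolve the machine-readable record while readers follow the same link from the reference list.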

Role of Proceedings

Challenges:

  • Ineffective proceedings publishing workflows – cumbersome, expensive, with quality issues. But authors and contributors need the reward of such a publication/contribution.

What can we do?

  • Raise community awareness around challenges, such as quality of conference metadata and duplicate submissions.
  • Drive the community towards modern publishing outlets to share slides (including DOI and slide citation).
  • Index “new publishing outlets” in community platforms like INSPIRE (HEP information platform) and ADS (Astrophysics Data System) to give contributors credit.

Metrics

Challenges:

  • The widespread application of “impact factor”. We need to make the picture more comprehensive.

What can we do?

  • Investigate visibility of data/software on INSPIRE/ADS to “seed” data citation.
  • Share best practices: what altmetrics indicators could work for which community?

ORCID adoption and integration

Challenges:

  • Adoption is on the rise, but little information is available in community “hubs” about benefits or incentives around ORCID.
  • Information exchange across platforms needs to improve to help researchers put ORCID into good and efficient use.

What can we do?

  • Improve the benefit from collected ORCID iDs by connecting them: three concrete projects have been identified to address this.

This AAHEP workshop showed that more work needs to be done to address the above-mentioned challenges within the High-Energy Physics, Astronomy and Astrophysics community. Some concrete ideas and projects have been started through the workshop, in particular with regard to data citation. Other initiatives include the projects mentioned above and the organisation of a follow-up workshop.

THOR Webinar: PID use across regions and disciplines


Date: June 20, 2017
Time: 16.00-17.00 CET
Register here.


THOR’s goal is to connect researchers, data and articles through persistent identifier services. Who is already using these services? And where? What are the factors that affect uptake? If we have better information about gaps and successes in PID adoption, we will be able to create better services. That’s why THOR partners have been conducting a comprehensive study into the disciplinary coverage and geographic distribution of ORCID iDs and DataCite DOIs.

This webinar will give you insight into the challenges of measuring PID adoption and describe the uptake of ORCID iDs and DOIs across disciplines and regions. How did we decide to measure PID uptake across disciplines? Which disciplines are under- or overrepresented? In which regions has PID uptake increased over the past years? And what do these metrics tell us?

Join us on June 20 to find out! 

The webinar will be recorded and made available on the THOR YouTube channel.


Interim business plan for sustaining the THOR federated PID infrastructure and services

THOR’s approach to sustainability relies on our partners taking on the THOR outputs as part of their regular business activities. This means that these outputs will need to be folded into their operations and sustained by their regular business models. To facilitate this process, and to provide some food for thought for other PID service providers, THOR’s sustainability team was tasked with considering existing business models and our own plans for sustainability.

While we will be considering these things throughout the project, as the first half of the project wound down, our initial progress was collected into an interim business planning document. This first-stage effort outlines our approach to sustainability and discusses the factors that can influence the sustainability of PID programs and services generally. In this document we also raised several open questions that we thought were central to the issue of long term sustainability for PID services. As we progress through the project, we will develop answers to these open questions and present them as part of the final business planning document, to be released at the end of the project.

As always, our reports are archived in Zenodo and available on the THOR website.

Getting the message to academics: promoting ORCID on a university campus

This is a guest blog post by THOR ambassador Patricia Herterich. Patricia is the Research Repository Advisor at the University of Birmingham Library and a PhD candidate at the Berlin School of Library and Information Science. As part of her role at the University of Birmingham, she provides training and advice on research data management and co-ordinates the development of institutional repository services.

When I started my position as Research Repository Advisor at the University of Birmingham last year, signing up as THOR ambassador was one of the first things I did. With a background in library and information science, I love persistent identifiers (PIDs). The UK ORCID consortium had just launched, and I had a campus full of researchers to work with, two ePrints-based repositories, and a Current Research Information System (CRIS) on my hands. The perfect environment to promote the awesomeness of linking persistent identifiers and score some easy wins, you’d think, but well… not quite.

The ORCID integration in our CRIS (we use PURE) was not fully enabled (you could sign up for an ORCID iD through the CRIS, but not push information to the ORCID record); resource to develop the institutional repository existed more on paper than in reality and the institutional policy encouraging our researchers to see the benefit of signing up for an ORCID iD had only minimal practical support.

So, what to do?

First of all, join a community! Attending UK ORCID consortium meetings made me realise that we were not the only institution struggling to promote the advantages of linking PIDs while enabling features were lacking in our systems. Once our CRIS implementation was enabled to push information to ORCID records, and more and more journals were requiring ORCID iDs upon manuscript submission, it was time to get the word about ORCID’s benefits out to our researchers.


Equipped with some goodies from ORCID, we scheduled three one-hour lunchtime sessions across campus throughout March 2017 (unfortunately, we just missed out on pi day…). The sessions provided an overview of the advantages of signing up for an ORCID iD and of the systems that use it so far, followed by demonstrations of how to link an ORCID record to a profile in PURE, export publications to ORCID, and use other features for curating one’s ORCID record. Afterwards, staff stayed on to help academics with creating and linking their ORCID records through PURE.

The sessions were attended by about 40 people in total: administrators who will support their researchers in getting their ORCID records curated; librarians who will include ORCID in upcoming training and consultation sessions, and researchers – from PhD students just starting their career to senior academics. As of late May 2017, we have 177 ORCID iDs registered in PURE, 41 of them pushed information to their ORCID records following our sessions: a total of 1328 works were connected to ORCID records and thus linked to one of their authors for the first time! (And this is only the public information as some of our researchers restricted visibility of the information pushed to ORCID to trusted parties only.)

What’s next?

We plan to keep on improving PID integrations in our systems through:

  • Engaging with our CRIS provider for further integrations via the UK PURE User group,
  • Becoming a DataCite member via the British Library, which will allow us to assign DOIs to datasets, working papers, and dissertations produced by our researchers,
  • Introducing new features to our ePrints repositories that allow us to collect ORCID iDs and link them to DOIs for our material upon minting,
  • Continuing to communicate the benefits of PIDs to our academics by embedding information and raising awareness in existing training sessions,
  • Engaging with the UK community via the UK ORCID consortium.
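The DOI-minting step in the list above boils down to attaching the depositing researcher's ORCID iD to the new DOI's creator metadata. A minimal sketch, assuming DataCite-style fields; the DOI (on the DataCite test prefix), title, name and iD are all hypothetical:

```python
# Sketch of repository-side minting: link a new DOI to the depositor's
# ORCID iD via DataCite-style creator metadata. All values are invented.
def mint_metadata(doi, title, creator_name, orcid_id):
    """Build minimal creator metadata connecting a new DOI to an ORCID iD."""
    return {
        "doi": doi,
        "title": title,
        "creators": [{
            "name": creator_name,
            "nameIdentifier": orcid_id,
            "nameIdentifierScheme": "ORCID",
        }],
    }

record = mint_metadata("10.5072/bham.0001", "Example working paper",
                       "Example, Alice", "0000-0001-2345-6789")
print(record["creators"][0]["nameIdentifierScheme"])  # ORCID
```

Once the iD travels with the DOI metadata, downstream services can attribute the output to the right researcher without any name disambiguation.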

As we’re just getting started with joining the dots on campus, it will be crucial for us to keep an eye on THOR results and outputs and learn from the project’s work whenever we can.

Reuse of Research Data: The Shape of Things to Come

Open Science needs incentives. Tracking data and software citation can be one of them.

To facilitate that, THOR strives to improve the usage of persistent identifiers for all scholarly objects. Tracking citations has proven difficult since there has been little reuse so far, but a first step in that direction can be improving the visibility and findability of data and software.

The CERN Open Data portal publishes data and code, and assigns DOIs to them. The boundary conditions for that publishing process are a bit “special”, as the data and code can be fairly complex and large (e.g. the last big release was about 320 TB). Recently, this activity reached an important milestone with the first independent research paper based on the released data, ‘Exposing the QCD Splitting Function with CMS Open Data‘. The publication of this paper highlights the potential of sharing data publicly and the benefits of enabling reproducible research.

The article reference list includes the recommended data citation along with the DOI, so that community portals can track the connections and impact of the data and software. Hint: check reference 72!

It is very important to create more milestones like this: share data and code, encourage reuse, citation and attribution of credit for the hard work that goes into data production and software development. We are only at the beginning of an interesting path.

PIDs in Poland: let’s link research!

The ongoing drive within the THOR project to identify and connect the research landscape reached Warsaw on Monday 24 April. Organised in collaboration with Crossref, the workshop focused on the ways in which persistent identifier (PID) services, such as those provided by members of the THOR consortium and Crossref, can represent ‘much more than infrastructure’ by ‘working together to connect research’. Hosted by the Digital Humanities Centre at the prestigious Institute of Literary Research of the Polish Academy of Sciences, the workshop brought together a packed audience of publishers, data managers, researchers, librarians and administrators for a day of knowledge-sharing and discussion centred on increasing access to research output.

Professor Łukasz Szumowski (Under Secretary of State with the Ministry of Science and Higher Education) opened the event with a recognition of just how quickly the digital world is changing. He stressed the need for developing new mechanisms in bibliometrics to enable objective evaluation that can guide public funding of research.

Professor Paweł Rowiński also extended a welcome as Vice President of the Polish Academy of Sciences, home to 69 institutions spanning a multitude of disciplines. Introducing a thread that ran throughout the day, Rowiński highlighted the fact that persistent identifiers not only make research more accessible, they can provide an incentive for scientists to share data, safe in the knowledge that their achievements will be more visible and attributed to them.

In the morning sessions, Rachael Lammey (Crossref), Ginny Hendricks (Crossref), Josh Brown (ORCID), Laura Rueda (DataCite), Rachael Kotarski (British Library) and I (Ade Deane-Pratt, ORCID EU) gave an overview of the persistent identifier landscape and the services that are being developed to support it as it evolves. Crossref recently gained 26 new members from Poland alone, making Poland one of its fastest-growing countries.

Some common themes emerged from the presentations and discussion: achieving persistence is a process, one that involves constant evaluation and adaptation. The challenges can be highly domain specific, and a number of questions also remain unresolved.

But permissions and privacy are key. Services such as EThOS, the British Library’s repository of doctoral theses, can make it easier to track career paths, but at the same time throw up the challenge of claims for legacy theses.

And more broadly: is it possible to enact a cultural shift away from the citation of physical objects to their digital representations? When is it appropriate to do so?

The afternoon sessions saw some interesting case studies from Polish industry and academia, and some robust discussion, with contributions from Dr Eng. Jakub Koperwas (Warsaw University of Technology), Marcin Werla (Poznań Supercomputing and Networking Centre) and Dr Marta Hoffman-Sommer (Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, RepOD Repository for Open Data, OpenAIRE NOAD for Poland).

We heard about the effort to build from scratch a university knowledge base with clear and consistent metadata that semantically links the full spectrum of academic activity, encompassing conception and funding, the research process, publications, implementations, practical applications, patents and results. The motivation was that it should be possible, for example, as a researcher, manager, funder, administrator or librarian, to interrogate the system to find experts in a field. We also heard about the work and challenges involved in providing an infrastructure (the PIONIER Network in this case) to support research via PID uptake. An outstanding question is how to prevent the duplication of DOIs assigned to the same object.

During the discussion that closed the day, we heard from both panel and audience on what the future of research communication should look like. With the event coming hot on the heels of the deadline for contributions to a new Polish national research evaluation exercise, the topics of making research communication more effective, and capturing and sharing information were naturally of real significance to the room. One thing was abundantly clear: persistent identifiers are integral to that future.

This timely meeting was just one strand of ongoing work to improve scholarly infrastructure and make the research landscape fit for 21st-century purpose. You can download slides from the day here: https://zenodo.org/communities/2017-24-04-warsawmeeting/?page=1&size=20. And you can keep abreast of future events at our website.

Webinar Series: Persistent Identifiers: What, Why and How?

Please join the THOR Project for a series of three webinars focusing on applications of persistent identifiers (PIDs). The first webinar will explain what PIDs are and why they are important. The second will dive a bit deeper, giving more insight into how to use PIDs and what services can be built on top of identifier systems. The series will end with an introduction to the PID (s)election guide: how to determine the most appropriate identifier for your needs.

Following these three webinars you will be fully up to date on PID systems; each webinar will offer ample time for Q&A. You can register here:

  • Webinar 1: Overview of PID systems, Jonathan Clark, International DOI Foundation
  • Webinar 2: Persistent identifiers: current features and future properties, Juha Hakala, National Library of Finland
  • Webinar 3: Persistent Identifier (s)election guide, Marcel Ras, Netherlands Coalition for Digital Preservation (NCDD) and Remco van Veenendaal, Dutch National Archives

All three webinars will be recorded and made available on the THOR YouTube channel. Full descriptions of the webinars can be found below. 


Webinar 1: Overview of PID systems

Presenter: Jonathan Clark
Date and time: 18 May, 16.00–17.00 CET
Register here


Are you intrigued, interested or simply a bit confused by persistent identifiers and would like to know more? Then this introductory level webinar is for you! The webinar will be especially interesting if you are working with digital archives and digital collections. We hope you will get a clear understanding of what persistent identifiers are, why they are important and how trustworthy they are. We will also discuss how you can determine the most appropriate identifier for your needs. There will be plenty of time to ask questions. Note that this will not be a deeply technical webinar.

Topics that will be covered include:

  • What are persistent identifiers?
  • The case for PIDs – knowing what’s what and who’s who
  • The data architecture of PIDs
  • What is social infrastructure and why is it important?
  • Review of current identifier systems
  • How to choose a PID System
  • Case studies in documents, data, video

Jonathan Clark is the Managing Agent for the International DOI Foundation (IDF) which is a not-for-profit membership organisation that manages the DOI (Digital Object Identifier) and is the registration authority for the ISO 26324 standard. Jonathan also works as an independent advisor on strategy and innovation. Prior to this he was at Elsevier for 20 years in various positions in publishing, marketing and technology. He holds a BSc and PhD in Chemical Engineering from the University of Newcastle-upon-Tyne. He lives in the Netherlands and when not working he can most often be found refereeing rugby matches.


Webinar 2: Persistent identifiers: current features and future properties

Presenter: Juha Hakala
Date and time: 1 June, 15.00–16.00 CET
Register here


You should attend this webinar if you know what persistent identifiers are but are interested in knowing much more about what you can actually do with them. In other words, what are the services that are being built on top of identifier systems that could be useful to the digital preservation community? We will cover topics such as party identification, interoperability and (metadata) services such as multiple resolution. Following on from that, we will explain more about the next generation of resolvers and work on extensions, such as specification of the URN r-component semantics.
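Multiple resolution, one of the services mentioned above, means that a single identifier can point to several typed locations rather than a single URL. A minimal sketch of the idea, with purely illustrative identifiers and URLs:

```python
from typing import Optional

# One persistent identifier maps to several typed locations
# (landing page, full text, metadata record). The identifier and
# URLs below are illustrative, not real registry records.
RESOLUTION_TABLE = {
    "urn:nbn:fi:example-123": [
        {"type": "landing_page", "url": "https://repo.example.org/item/123"},
        {"type": "fulltext_pdf", "url": "https://repo.example.org/item/123.pdf"},
        {"type": "metadata",     "url": "https://repo.example.org/item/123/mods.xml"},
    ],
}


def resolve(pid: str, preferred_type: str = "landing_page") -> Optional[str]:
    """Return the URL matching the preferred type, falling back to the
    first registered location if that type is absent."""
    locations = RESOLUTION_TABLE.get(pid)
    if not locations:
        return None  # unknown identifier
    for loc in locations:
        if loc["type"] == preferred_type:
            return loc["url"]
    return locations[0]["url"]
```

A production resolver would of course serve this over HTTP and keep the table in a registry database; the point is simply that the identifier, not the location, is what stays stable.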

Juha Hakala is a senior advisor at the National Library of Finland. He obtained a degree in library and information science from Tampere University and has held various positions in the National Library since 1987. From the beginning he has concentrated on library automation, including standardisation. His involvement with persistent identifiers began more than twenty years ago, when the URN syntax was established in the Internet Engineering Task Force (IETF). He is closely involved with the revision of the URN syntax, various other ongoing URN-related efforts in the IETF, and the maintenance of standard identifiers in ISO TC 46.


Webinar 3: Persistent Identifier (s)election guide

Presenters: Marcel Ras and Remco van Veenendaal
Date and time: 13 June, 16.00–17.00 CET
Register here


You should attend this webinar if you want to learn about how to choose the most suitable identifier system for your needs, and how to implement persistent identifiers in your own organisation.

Cultural heritage organisations, regardless of size, are often hesitant to implement PIDs. They lack knowledge of what PIDs are, don't know about their capabilities and benefits, and fear a complex and costly implementation process as well as the maintenance costs of a sustained service. The Digital Heritage Network and the Dutch Coalition on Digital Preservation address these issues in three ways:

  1. By raising awareness of (the importance of) PIDs in cultural heritage organisations.
  2. By increasing the knowledge regarding the use of PIDs within cultural heritage.
  3. By supporting the technical implementation of PIDs in cultural heritage collection management systems.

The webinar will explain how this was done on a nationwide scale.

There are multiple PID systems. But which system is most suited to your situation: Archival Resource Keys (ARKs), Digital Object Identifiers (DOIs), Handle, OpenURL, Persistent Uniform Resource Locators (PURLs) or Uniform Resource Names (URNs)? Each system has its own particular properties, strengths and weaknesses. The PID Guide from the Digital Heritage Network’s Persistent Identifier project helps you learn and think about important PID subjects, and guides your first steps towards selecting a PID system.
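To make the differences between these systems a little more concrete, the characteristic shapes of the identifiers can be sketched with rough patterns. These regexes are illustrative only and far looser than real validation; real systems have stricter syntax rules:

```python
import re

# Rough, illustrative patterns for guessing which PID system an
# identifier string belongs to, based on its characteristic prefix.
# Order matters: DOIs are themselves Handles, so DOI is tested first.
PID_PATTERNS = [
    ("DOI",    re.compile(r"^(doi:|https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("ARK",    re.compile(r"^(https?://\S+/)?ark:/?\d{5,}/\S+$", re.I)),
    ("URN",    re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.I)),
    ("Handle", re.compile(r"^(hdl:|https?://hdl\.handle\.net/)?\d+(\.\d+)*/\S+$", re.I)),
]


def guess_pid_system(identifier: str):
    """Return the name of the first matching PID system, or None if
    the string matches none of the sketched patterns."""
    for name, pattern in PID_PATTERNS:
        if pattern.match(identifier.strip()):
            return name
    return None
```

Recognising the string is of course the easy part; the PID Guide's real value lies in weighing organisational questions such as cost, governance and longevity.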

Marcel Ras is the Program Manager for the Netherlands Coalition for Digital Preservation (NCDD). The NCDD was established in 2008 to promote national collaboration to ensure the long-term availability of digital information in The Netherlands. Marcel has been NCDD's Program Manager since January 2014 and has many years of experience in digital preservation. He started his digital preservation career at the National Library of The Netherlands (Koninklijke Bibliotheek), where he set up a web archiving program. From 2007 to 2011 he was manager of the e-Depot department at the KB, responsible for the acquisition, ingest and long-term storage of digital publications in the library. In 2011, as program manager for the International e-Depot, he took responsibility for developing the KB's international e-journals archiving program.

Remco van Veenendaal is a Preservation Advisor for the Dutch National Archives (Nationaal Archief) in The Hague. He contributes to the (digital) preservation policies of the Nationaal Archief, and to the development and implementation of the e-Depot. In March 2015 he established Veenentaal, which advises on applications at the intersection of language and computers, and on how those applications can create opportunities for organisations. Remco has more than fifteen years' experience at the intersection of language and computers. Before joining the Dutch National Archives he was project manager of the Flemish-Dutch Human Language Technology Agency (HLT Agency, or TST-Centrale), a repository for digital Dutch language resources.