Persistent Identifier Services for the Humanities

THOR Disciplinary Workshop Series, part IV

As part of the THOR project, we have been monitoring persistent identifier adoption across the research landscape. Preliminary analysis suggests that the humanities lag behind other fields. To explore the reasons why, we invited a group composed mostly of historians to a workshop at the British Library.

We found that the humanities shared use cases with other disciplines, including the life and physical sciences. For example, humanities researchers need to evaluate and report on the impact and influence of their research outputs, update their institutions and funders about new publications, and maintain up-to-date research profiles online.

Some of the issues raised were also familiar: Who can provide help in choosing the right identifier system? What should PIDs be assigned to, and at what level of granularity? How can the investment in time be justified to employers and colleagues?

Other discussions revealed requirements more specific to the humanities.

Perhaps most important is that researchers in the humanities seldom use the term ‘data’ to describe the evidence that underpins their research. Like many, they tend to think of data as tabular and numerically rich; in contrast, their data may include interview transcripts, field recordings, or curated digital repositories of archival source materials. As a result, these researchers do not look to methods or services that are developed for data.

Unique physical objects are an extreme case. It can feel like a stretch to consider them as data, but they clearly benefit from precise, well-managed persistent identifiers. This is a pattern that we observe in other disciplines with physical objects, such as ice cores or fossils.

Persistent identifiers allow relations between things to be precisely specified (for example, to record that a researcher authored an article, or that an article cites a dataset). These relationships are defined through controlled vocabularies. For example, a contribution may be recorded as RightsHolder, a person or organisation that owns or manages rights in the item, or as ContactPerson, a person with knowledge of how to access, troubleshoot, or otherwise field issues related to the ‘resource’.
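To make the idea of a controlled vocabulary concrete, here is a minimal sketch in Python. It is an illustration only, not the schema of any particular PID service: the class names, the `parse_role` helper and the identifier strings are all hypothetical, and the vocabulary contains only the two terms mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class ContributorType(Enum):
    """A tiny controlled vocabulary holding only the two terms named in the text."""
    RIGHTS_HOLDER = "RightsHolder"
    CONTACT_PERSON = "ContactPerson"


@dataclass(frozen=True)
class Contribution:
    """Links a contributor to a resource with a role from the vocabulary."""
    contributor_id: str  # hypothetical identifier for a person or organisation
    resource_id: str     # hypothetical identifier for the item
    role: ContributorType


def parse_role(term: str) -> ContributorType:
    """Return the vocabulary entry for a term, rejecting anything outside it."""
    try:
        return ContributorType(term)
    except ValueError:
        raise ValueError(f"{term!r} is not in the controlled vocabulary") from None


# Placeholder identifiers, invented for this example.
link = Contribution("person:example-1", "item:example-42", parse_role("RightsHolder"))
```

Because roles outside the list are rejected, a term such as "Transcriber" could not be recorded here until the vocabulary itself is extended.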

We observed recent work in the humanities that stretches the notion of contribution that connects people to articles and data. Crowd-sourced or volunteer contributions provide one such example.

At the workshop, Louise Seaward (UCL) discussed the approach taken by the Transcribe Bentham project to crowd-source transcription of unpublished manuscripts of the English philosopher Jeremy Bentham. In such projects, the controlled vocabularies that are currently in use to describe contributions do not adequately distinguish and credit the type and extent of the contribution made by researchers and volunteers.

In addition to identifying entities in the scholarly record, such as authors, articles and data, PIDs present a unique solution to describing the objects of historical research itself. This might mean identifying each individual copy of a historic text, held across different libraries and archives. Scholars study each item’s condition, its annotations, the context of the collection that it forms a part of, and the relationships and differences between different copies. New identifier types may be required to adequately meet these requirements.

The researchers that we spoke to also emphasised the importance of being able to capture metadata relating to objects of uncertain identity or authorship, such as historic texts, and historic or fictional personages. This means that identification systems should not force researchers to adopt a universal authority. Instead, they must support recording provenance of assertions around an intellectual entity as well as competing statements about it. Systems must effectively identify objects or individuals whose existence may be disputed.
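One way to picture this requirement is a record structure that stores each statement about an entity together with who made it and where, rather than a single authoritative value. The Python sketch below is a minimal illustration under that assumption. The `Assertion` and `AssertionStore` names, the identifier strings and the sample claims are all invented placeholders, not drawn from any existing system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One statement about an entity, kept together with its provenance."""
    entity_id: str    # hypothetical identifier for a text or person
    attribute: str    # the property being asserted, e.g. "author"
    value: str        # the asserted value
    asserted_by: str  # provenance: who makes the claim
    source: str       # provenance: where the claim is made


class AssertionStore:
    """Keeps every assertion; conflicting claims sit side by side."""

    def __init__(self) -> None:
        self._by_entity = defaultdict(list)

    def add(self, assertion: Assertion) -> None:
        self._by_entity[assertion.entity_id].append(assertion)

    def claims(self, entity_id: str, attribute: str) -> list:
        """Return all recorded claims for one attribute, without picking a winner."""
        return [a for a in self._by_entity.get(entity_id, []) if a.attribute == attribute]


# Placeholder data, invented for this example.
store = AssertionStore()
store.add(Assertion("text:example-1", "author", "Author X", "Scholar A", "Catalogue 1"))
store.add(Assertion("text:example-1", "author", "Author Y", "Scholar B", "Catalogue 2"))
competing = store.claims("text:example-1", "author")
```

Here the identifier refers to the text itself, while the two attributions remain separate, attributable statements that a reader can weigh against each other.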

Within this community, current citation practices range from citation of archival sources and creative works (as described by Jonathan Blaney, British History Online), to the bespoke entity ID systems that are now being developed as part of digital humanities projects. Faith Lawrence from King’s College London demonstrated how projects such as SNAP:DRGN have gone one step further to bring together existing identification services that link texts, places and people. SNAP:DRGN also builds a virtual authority list for ancient people by linking related data from many collaborating projects.

These examples show linking across specific projects, but do not connect digital resources via a globally interoperable infrastructure. This signals that issues of identity are important in the humanities, but not yet addressed in a standardised or interoperable way. Persistent identifiers can help here.

Ultimately we found that humanities researchers identify with the use cases for PIDs in the scholarly communications context, but have difficulty identifying with examples from other academic fields that are further along in adoption.

How do we change this? Demonstrations of how PIDs can automate the transfer of information between systems, saving researchers time spent filling out forms such as reports to funders, were seen as a strong selling point. The potential of persistent identifiers will only be realised when they are seamlessly integrated into the workflows and systems of research environments, publishing environments, and library and archival institutions. Implementation support for PIDs in humanities publishing systems, in particular, would provide real leverage, making it easier to link publications, funding and data and to follow those connections.

We are now seeking focused conversations with the infrastructure providers behind humanities data services, so that we can study PID adoption from the provider's perspective as well as the researcher's.