Persistent Identifier Services for the Humanities

THOR Disciplinary Workshop Series, part IV

As part of the THOR project, we have been monitoring persistent identifier adoption across the research landscape. Preliminary analysis suggests that the humanities lag behind other fields. To explore the reasons why, we invited a group composed mostly of historians to a workshop at the British Library.

We found that the humanities share use cases with other disciplines, including the life and physical sciences. For example, humanities researchers need to evaluate and report on the impact and influence of their research outputs; they need to update their institutions and funders about new publications; and they need to maintain up-to-date research profiles online.

Some of the issues raised were also familiar: Who can provide help in choosing the right identifier system? What should PIDs be assigned to, and at what level of granularity? How can the investment in time be justified to employers and colleagues?

Other discussions revealed requirements more specific to the humanities.

Perhaps most important is that researchers in the humanities seldom use the term ‘data’ to describe the evidence that underpins their research. Like many, they tend to think of data as tabular and numerically rich; in contrast, their data may include interview transcripts, field recordings, or curated digital repositories of archival source materials. As a result, these researchers do not look to methods or services that are developed for data.

Unique physical objects are an extreme case. It can feel like a stretch to consider them as data, but they clearly benefit from precise, well-managed persistent identifiers. This is a pattern that we observe in other disciplines with physical objects, such as ice cores or fossils.

Persistent identifiers allow relations between things to be precisely specified (for example, to record that a researcher authored an article, or that an article cites a dataset). These relationships are defined through controlled vocabularies; for example, a contributor’s role may be recorded as RightsHolder, a person or organisation that owns or manages rights in the item, or as ContactPerson, a person with knowledge of how to access, troubleshoot, or otherwise field issues related to the ‘resource’.
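To make the idea of a typed relationship concrete, here is a minimal sketch in Python. Only the two role names, RightsHolder and ContactPerson, come from the text above; the class names, the helper function and the identifiers are hypothetical placeholders, and the vocabulary is an illustrative subset rather than any service's actual schema.

```python
from dataclasses import dataclass
from enum import Enum


class ContributorType(Enum):
    """Illustrative subset of a controlled vocabulary of contribution roles."""
    RIGHTS_HOLDER = "RightsHolder"
    CONTACT_PERSON = "ContactPerson"


@dataclass(frozen=True)
class Contribution:
    """Links a contributor's PID to a resource's PID through a typed role."""
    contributor_pid: str
    resource_pid: str
    role: ContributorType


def parse_role(label: str) -> ContributorType:
    """Accept only labels in the vocabulary; raises ValueError otherwise."""
    return ContributorType(label)


# Placeholder identifiers (assumptions), not real records.
link = Contribution(
    contributor_pid="orcid:0000-0000-0000-0000",
    resource_pid="doi:10.0000/example",
    role=parse_role("RightsHolder"),
)
```

In this sketch a role label outside the vocabulary, such as a hypothetical "Transcriber", is rejected with a ValueError. That is the trade-off of a controlled vocabulary: relationships stay precise, but a kind of contribution can only be recorded once the vocabulary has a term for it.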

We observed recent work in the humanities that stretches the notion of contribution that connects people to articles and data. Crowd-sourced or volunteer contributions provide one such example.

At the workshop, Louise Seaward (UCL) discussed the approach taken by the Transcribe Bentham project to crowd-source transcription of unpublished manuscripts of the English philosopher Jeremy Bentham. In such projects, the controlled vocabularies that are currently in use to describe contributions do not adequately distinguish and credit the type and extent of the contribution made by researchers and volunteers.

In addition to identifying entities in the scholarly record, such as authors, articles and data, PIDs present a unique solution to describing the objects of historical research itself. This might mean identifying each individual copy of a historic text, held across different libraries and archives. Scholars study each item’s condition, its annotations, the context of the collection that it forms a part of, and the relationships and differences between different copies. New identifier types may be required to adequately meet these requirements.

The researchers that we spoke to also emphasised the importance of being able to capture metadata relating to objects of uncertain identity or authorship, such as historic texts, and historic or fictional personages. This means that identification systems should not force researchers to adopt a universal authority. Instead, they must support recording the provenance of assertions about an intellectual entity, as well as competing statements about it. Systems must also be able to identify objects or individuals whose existence may be disputed.
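One way to picture this requirement is a store that keeps every assertion about an entity together with who made it and on what evidence, instead of a single authoritative value. The Python sketch below is only an illustration under that assumption; the class names, field names, identifiers and sample claims are all hypothetical and do not describe any existing service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One claim about an entity, together with its provenance."""
    subject_pid: str   # identifier of the entity the claim is about
    prop: str          # what is being claimed, e.g. "author"
    value: str         # the claimed value
    asserted_by: str   # who makes the claim
    source: str        # evidence the claim rests on


class AssertionStore:
    """Keeps all claims side by side; nothing is overwritten."""

    def __init__(self):
        self._assertions = []

    def add(self, assertion):
        self._assertions.append(assertion)

    def claims(self, subject_pid, prop):
        """Return every recorded claim about one property of one entity."""
        return [
            a for a in self._assertions
            if a.subject_pid == subject_pid and a.prop == prop
        ]


# Hypothetical example: two scholars disagree about a text's authorship.
store = AssertionStore()
store.add(Assertion("example:text/1", "author", "Person A",
                    asserted_by="Scholar X", source="catalogue entry"))
store.add(Assertion("example:text/1", "author", "Person B",
                    asserted_by="Scholar Y", source="handwriting analysis"))

for claim in store.claims("example:text/1", "author"):
    print(f"{claim.value} (according to {claim.asserted_by}, {claim.source})")
```

The design choice here is that the identifier names the entity, while statements about it remain attributable and may conflict. The identifier stays usable even when authorship, or the entity's existence, is disputed.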

Within this community, current citation practices range from citation of archival sources and creative works (as described by Jonathan Blaney, British History Online), to the bespoke entity ID systems that are now being developed as part of digital humanities projects. Faith Lawrence from King’s College London demonstrated how projects such as SNAP:DRGN have gone one step further to bring together existing identification services that link texts, places and people. SNAP:DRGN also builds a virtual authority list for ancient people by linking related data from many collaborating projects.

These examples show linking across specific projects, but do not connect digital resources via a globally interoperable infrastructure. This signals that issues of identity are important in the humanities, but not yet addressed in a standardised or interoperable way. Persistent identifiers can help here.

Ultimately we found that humanities researchers identify with the use cases for PIDs in the scholarly communications context, but have difficulty identifying with examples from other academic fields that are further along in adoption.

How do we change this? Demonstrations of how PIDs can automate the transfer of information between systems, saving researchers time filling out forms such as reports to funders, were seen as a strong selling point. The potential of persistent identifiers will only be realised when they are seamlessly integrated into the workflows and systems of the research environment, publishing environments, and library and archival institutions. Existing implementation support for PIDs in humanities publishing systems, in particular, would provide real leverage, making it easier to link publications, funding and data, and to follow those connections.

We are now seeking focused conversations with infrastructure providers of humanities data services to study PID adoption not only from the researcher’s perspective but also from the provider’s.

Persistent Identifier Services for the Humanities

Persistent identifiers (PIDs) are increasingly embedded in the services that researchers use every day, enabling unambiguous attribution of the full range of scholarly outputs. This makes it easier for data producers and researchers to get credit for their contributions; for data centres, universities and funders to track the impact of the research they facilitate; for publishers to incorporate data into scholarly writing; and for researchers to discover and cite data through clear provenance of information and ideas. In short, they support an entirely new research infrastructure.

Within THOR we are working to realise this vision by improving interoperability and integration of PID services, and addressing the cultural barriers to adoption. Now over a year into the project, we have found that uptake in the humanities, in particular, lags behind other disciplines. In response to this, we will be running a series of workshops through which we hope to better understand the potential for persistent identifier services in the humanities, identifying requirements for and barriers to uptake, and creating a roadmap to guide future development.

The first workshop will take place at the British Library on Friday 9 December 2016. It will be a focused discussion of the role of PIDs in research using historical sources – fields in which digital data has taken on an increasingly important role. The workshop is by invitation only. However, we’re especially keen to hear from humanities researchers who are working with research data products. If you’re making data available or reusing historical data and are interested in attending, please contact us at events@project-thor.eu for more information.

THOR is Hiring

The THOR Project is looking for an early-career library science specialist to work with our team at CERN at the forefront of Open Science services for the High Energy Physics community, focusing on persistent identifiers and metadata requirements.

As a large-scale scientific laboratory, CERN produces research data in high volume and demands sophisticated data management and preservation efforts. Working at CERN is an opportunity to get involved in the Open Science movement from a unique disciplinary perspective, where tangible impact can be made within the community and beyond.

For more information on the THOR project, position eligibility requirements and application procedures, please take a look at the job posting. The application deadline is 20 June 2016.

Diamonds are forever. What about research data?

THOR’s Project Coordinator Adam Farquhar (British Library) recently delivered a keynote at Our Digital Future. The conference, which was held in Cambridge on 14–15 March 2016, addressed challenges in long-term preservation and archiving of digital data across a wide range of disciplines. His talk, titled “Diamonds are forever. What about research data?”, looked at some of the challenges to re-using data in the future and the fracture lines in the scholarly record between articles and data. Adam considered ways to close these gaps and identified recent technical developments that may help researchers without radically changing their workflows. As he proposed, the emerging THOR infrastructure promises to help researchers get appropriate credit for the additional work that they do to make data re-usable now and in the future.

The presentation is online at: https://sms.cam.ac.uk/media/2206135