Organisation IDs for scholarly communications: where next?

On April 17, as part of FORCE 2016 in Portland, Oregon, Crossref and THOR partners DataCite and ORCID convened a workshop to discuss the current state of the art in organisation identifiers. We discussed this issue previously in a post on the ORCID blog, and we’re pleased to report back to you all that the workshop was a big success. Since then, we’ve been pulling together our notes and thoughts on the issue of organisation identifiers, and we’d like to share the headlines with you.

The community represented at the meeting agreed strongly with our conclusion that there is no solution available today that meets all of the scholarly communications community’s needs. It is clear that the community needs a solution based on open data (for a community infrastructure such as this, CC-Zero is really the only appropriate license for the data). We need a robust, high-volume API if we are to build infrastructure around organisation identifiers. This infrastructure needs to have transparent, community-led governance, and a responsive, properly resourced entity to maintain all of this.

The workshop was underpinned by a discussion document which gathered together existing work undertaken by NISO, Jisc and CASRAI, and others. These outlined the shortcomings of current approaches, and set out core requirements which any solution aiming to provide organisation identifiers for scholarly communications should address. While we acknowledge that there are commercial and community-led initiatives that offer partial solutions to the problems we face, they are focussed, naturally enough, on the needs of their sub-section of the community. For them to broaden their offer or to change their practice might not make commercial sense, or might not be possible (thanks to a lack of staff or technical infrastructure for example). That said, whatever comes next will need to work alongside these providers as a partner facing similar challenges and, with a bit of luck, sharing solutions.

What emerged from the workshop was a consensus that a detailed, use-case-driven approach is a useful way to understand the core issues at work in identifying organisations and, more than this, a good way to spot common issues. By placing these use cases at the heart of a new organisation identifier infrastructure, we can create a service that helps to meet the needs of the widest possible section of our community.

We took a number of ideas away from the workshop:

  • There needs to be a collective action plan for the next three years to help to implement an organisation identifier solution for scholarly communications.
  • We need to think about the structure and governance (both for data and the parent organisation) that will best serve the community.
  • We need a solid, core ID technology for both the highest-level organisations and the hierarchies beneath them.
  • We need to define a robust, low-barrier, accessible mechanism for organisations to take ownership of their IDs, and to update them (each organisation needs to KNOW that they have an ID, to USE it, and to KEEP IT UP-TO-DATE for it to succeed).

We are starting this initiative by gathering as broad a sample as possible of the use cases you, the scholarly communications community, need organisation identifiers to address.

We invite interested members of the community to read our discussion document and to send their comments and use cases to us using our survey.

We’ll digest and analyse this information, and keep you all up to date. We’ll gather together and support the work of task groups where appropriate, to bring in expertise from the community. We’ll present reports, updates and proposals publicly to gather your feedback, and we will meet again at a co-convened persistent identifier-themed event in Reykjavik, Iceland, in the week beginning November 7. Hold the date, and watch this space!
Geoffrey Bilder, Josh Brown, and Patricia Cruse.

Knowledge hub set to soft-launch, join our webinar on January 28th!

We’re really excited to announce the upcoming soft-launch of the THOR knowledge hub. A skeleton resource at the moment, the knowledge hub will evolve into the first point of call for information about persistent identifiers.

We’d love to show you what we’ve built so far, share our plans for the hub and gather your feedback, suggestions and contributions.  We need to understand the direction people would like the hub to take and identify the pieces you’d like us to add. This resource will showcase the work we’ve done in the project, so you can suggest content or topics we should be thinking about. The hub will also point to external resources and useful tools, so if you can recommend resources that we should be pointing people to, we’ll be glad to get your input.

If you’d like to have your say and find out more, register for the webinar. It’s at 14:00 GMT / 15:00 CET on Thursday the 28th of January.

 

Differences between ORCID and DataCite Metadata

(cross-posted from the DataCite blog)

One of the first tasks for DataCite in the European Commission-funded THOR project that started in June was to contribute to a comparison of the ORCID and DataCite metadata standards. Together with ORCID, CERN, the British Library and Dryad we looked at how contributors, organizations and artefacts – and the relations between them – are described in the respective metadata schemata, and how they are implemented in two example data repositories, Archaeology Data Service and Dryad Digital Repository. The focus of our work was on identifying major gaps. Our report was finished last week and made publicly available via http://doi.org/10.5281/zenodo.30799. The key findings are summarized below:

  • Common Approach to Personal Names
  • Standardized Contributor Roles
  • Standardized Relation Types
  • Metadata for Organisations
  • Persistent Identifiers for Projects
  • Harmonization of ORCID and DataCite Metadata

Common Approach to Personal Names

While a single input field for contributor names is common, separate fields for given and family names are required for proper formatting of citations. As long as citations to scholarly content rely on properly formatted text rather than persistent identifiers, services holding bibliographic information have to support these separate fields. Further work is needed to help with the transition to separate input fields for given and family names, and to handle contributors that are organizations or groups of people.
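A minimal Python sketch of why the separate fields matter. The `Contributor` class, its field names and the `citation_name` function are hypothetical and not taken from the ORCID or DataCite schemata, and the "Family, G." output is only one common citation style:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contributor:
    """Hypothetical contributor record; field names are illustrative only."""
    family_name: Optional[str] = None
    given_name: Optional[str] = None
    # For organizations or groups, which have no given/family split.
    literal_name: Optional[str] = None


def citation_name(contributor: Contributor) -> str:
    """Format a person as 'Family, G. M.'; fall back to the literal name."""
    if contributor.family_name and contributor.given_name:
        initials = " ".join(
            part[0].upper() + "." for part in contributor.given_name.split()
        )
        return f"{contributor.family_name}, {initials}"
    if contributor.family_name:
        return contributor.family_name
    return contributor.literal_name or ""
```

With a single free-text name field, the formatter would have to guess where the family name starts. With separate fields, `citation_name(Contributor(family_name="Carberry", given_name="Josiah Stinkney"))` yields `Carberry, J. S.`, while a group contributor is passed through unchanged via `literal_name`.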

Standardized Contributor Roles

The currently existing vocabularies for contributor type (DataCite) and contributor role (ORCID) provide a high-level description, but fall short when trying to describe the author/creator contribution in more detail. Project CRediT is a multi-stakeholder initiative that has developed a common vocabulary with 14 different contributor roles, and this vocabulary can be used to provide this detail, e.g. who provided resources such as reagents or samples, who did the statistical analysis, or who contributed to the methodology of a study.

CRediT is complementary to existing contributor role vocabularies such as those by ORCID and DataCite. For contributor roles it is particularly important that the same vocabulary is used across stakeholders, so that the roles described in the data center can be forwarded first to DataCite, then to ORCID, and then also to other places such as institutional repositories.

Standardized Relation Types

Capturing relations between scholarly works such as datasets in a standardized way is important, as these relations are used for citations and thus the basis for many indicators of scholarly impact. Currently used vocabularies for relation types between scholarly works, e.g. by CrossRef and DataCite, only partly overlap. In addition we see differences in community practices, e.g. some scholars but not others reserve the term citation for links between two scholarly articles. The term data citation is sometimes used for all links from scholarly works to datasets, but other times reserved for formal citations appearing in reference lists.

Metadata for Organisations

Both ORCID and DataCite not only provide persistent identifiers for people and data, but they also collect metadata around these persistent identifiers, in particular links to other identifiers. The use of persistent identifiers for organisations lags behind the use of persistent identifiers for research outputs and people, and more work is needed.

Persistent Identifiers for Projects

Research projects are collaborative activities among contributors that may change over time. Projects have a start and end date and are often funded by a grant. The existing persistent identifier (PID) infrastructure does support artefacts, contributors and organisations, but there is no first-class PID support for projects. This creates a major gap that becomes obvious when we try to describe the relationships between funders, contributors and research outputs.

Both the ORCID and DataCite metadata support funding information, but only as direct links to contributors or research outputs, respectively. This not only makes it difficult to exchange funding information between DataCite and ORCID, but also fails to adequately model the sometimes complex relationships, e.g. when multiple funders and grants were involved in supporting a research output. We therefore not only need persistent identifiers for projects, but also infrastructure for collecting and aggregating links to contributors and artefacts.

Harmonization of ORCID and DataCite Metadata

We identified significant differences between the ORCID and DataCite metadata schema, and these differences hinder the flow of information between the two services. Several different approaches to overcome these differences are conceivable:

  1. only use a common subset, relying on linked persistent identifiers to get the full metadata
  2. harmonize the ORCID and DataCite metadata schemata
  3. common API exchange formats for metadata

The first approach is the linked open data approach, and was designed specifically for scenarios like this. One limitation is that it requires persistent identifiers for all relevant attributes (e.g. for every creator/contributor in the DataCite metadata). One major objective for THOR is therefore to increase the use of persistent identifiers, both by THOR partners, and by the community at large.

A common metadata schema between ORCID and DataCite is neither feasible nor necessarily needed. In addition, we also have to consider interoperability with other metadata standards (e.g. CASRAI, OpenAIRE, COAR), and with other artefacts, such as those having CrossRef DOIs. What is more realistic is harmonization across a limited set of essential metadata.

The third approach to improve interoperability uses a common API format that includes all the metadata that need to be exchanged, but doesn’t require the metadata schema itself to change. This approach was taken by DataCite and CrossRef a few years ago to provide metadata for DOIs in a consistent way despite significant differences in the CrossRef and DataCite metadata schema. Using HTTP content negotiation, metadata are provided in a variety of formats.
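As a rough illustration of content negotiation, the Python sketch below asks the DOI resolver for the same record in different formats by changing only the `Accept` header. The function names and the `FORMATS` table are hypothetical, and the media types listed are ones we assume the CrossRef and DataCite content negotiation services still serve:

```python
from urllib.request import Request, urlopen

# Media types assumed to be available via DOI content negotiation.
FORMATS = {
    "datacite_xml": "application/vnd.datacite.datacite+xml",
    "csl_json": "application/vnd.citationstyles.csl+json",
    "bibtex": "application/x-bibtex",
}


def build_metadata_request(doi: str, fmt: str = "csl_json") -> Request:
    """Build (but do not send) a content-negotiated request for a DOI."""
    return Request(f"https://doi.org/{doi}", headers={"Accept": FORMATS[fmt]})


def fetch_metadata(doi: str, fmt: str = "csl_json") -> str:
    """Send the request and return the response body; needs network access."""
    with urlopen(build_metadata_request(doi, fmt)) as response:
        return response.read().decode("utf-8")
```

For example, `fetch_metadata("10.5281/zenodo.30799", "bibtex")` would request a BibTeX entry for our report, while the same DOI with `"datacite_xml"` would request the DataCite XML. The URL stays the same in both cases; only the requested representation changes.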

The THOR Ambassador Programme is GO!

Would you like to help others enjoy the full power of persistent identifiers?  A core component of the THOR project is two-way community involvement and with that in mind we’re happy to be launching the THOR project Ambassador Programme.

THOR ambassadors will help shape the way we look at the persistent identifier landscape, influence new services, and help us understand where we are doing well and where we could do better.  We’re interested in requirements, use cases, and success stories from researchers through to funders, publishers, universities, data-centres and beyond.    We’re trying to get the message out there, encourage best practice and drive adoption.  A keen set of Ambassadors will be integral to achieving these goals.

Ambassadors will spread the word within their own communities, encourage adoption of PIDs and provide community-specific training and advice. They will help their peers understand the reasoning behind, and realise the full potential of, PIDs such as ORCIDs and DOIs. They could even help out with implementation where appropriate.

Ambassadors will form a community of like minds with the support and backing of the THOR project partners.  They will be provided with up to date materials to help them with their work and will be able to call on the expertise of the project members when necessary.  We’ll be organising events and workshops all over Europe, so there will be plenty of opportunities to meet the team and other ambassadors in your region.

Interested?  See what’s involved and what’s in it for you.

If you have any questions, get in touch.

Interactive API docs for ORCID

Warning: This is a technical post aimed at developers and integrators.  That said, even if you’re non technical, you’ll be able to use the docs and explore the API to see what’s possible.  They’re really easy to work with.

At first glance, most REST APIs are a bit confusing. REST is an architectural style, not a specification, and people do things differently. You can use things like HATEOAS, but the overhead is a bit much for many. So you head over to the documentation. You fire up a text editor. You code a few examples to try it out. It takes a while, you can’t remember the content types, you have no idea of the schema, you realise you’re doing OAuth wrong, etc etc etc. But you get there eventually.

To reduce the pain, we’ve put a swagger interface in front of the latest ORCID public and member V2 APIs (note: not the 1.x API).  Swagger presents the user with a simple interface that lets them try out the various API endpoints by simply clicking a couple of buttons.  It has example requests, responses and error codes.  It generates curl statements you can cut and paste into bash.  Put simply, it makes it easy to try out the API before writing a single line of code.

[Screenshot: the Swagger interface for the ORCID API]

If you head over there now and put your ORCID into the box, you can view your public record with the click of a button. Try it out!
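Once you have explored the docs, the same lookup can be sketched in a few lines of Python. The base URL, the `v2.0` version string and the `/record` path below are assumptions about the public API and may differ from the release you are using, and the function names are our own. The check-digit routine follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use:

```python
from urllib.request import Request

# Assumed public API base URL and version; check the Swagger docs for yours.
PUBLIC_API = "https://pub.orcid.org/v2.0"


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length and final check character of a hyphenated ORCID iD."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()


def build_record_request(orcid: str) -> Request:
    """Build (but do not send) a request for a public record as JSON."""
    if not is_valid_orcid(orcid):
        raise ValueError(f"not a valid ORCID iD: {orcid}")
    return Request(
        f"{PUBLIC_API}/{orcid}/record",
        headers={"Accept": "application/json"},
    )
```

Passing the result of `build_record_request("0000-0002-1825-0097")` to `urllib.request.urlopen` would send the request. Validating the iD first catches typos before any network call is made.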

This work is part of THOR outreach activities and is one of the first things we’ve done to improve documentation and drive adoption.  We’ll be utilising the swagger based docs in our upcoming workshops and bootcamps to speed up the learning process.  If you’re interested in learning how to use the API there’s a quickstart guide.  Or find Tom on twitter @tomdemeranville.

Identifiers and Linked data in the research space seminar

THOR was part of a presentation last week in Melbourne. The occasion was a seminar on Identifiers and Linked data in the research space, hosted by OCLC at La Trobe University.

As we all know, linked data is becoming a more important issue for libraries, research offices and institutions to manage. Researchers are also having to deal with greater accountability for their research and having to register for different systems to track and manage their profiles, research and publications. The seminar focussed on the role of  identifiers in the research space. A number of experts discussed the work being done to help manage identification of researchers within large databases and how they are making these systems connect to information and research management systems globally.

The presentation from Andrew Treloar looked at the way in which the Australian National Data Service is working with identifiers and taking part in international projects such as ODIN and THOR.

You can watch the presentation online (requires the WebEx plugin).

Knowledge Exchange Workshop

Researcher Identifiers – National approaches to ORCID and ISNI implementation

Members of the THOR team spent Monday and Tuesday attending the “Researcher Identifiers – National approaches to ORCID and ISNI implementation” workshop in London. The workshop focused on person identifiers, with the bulk of discussion about the use of ORCID, CRIS systems, ISNI, federated identity and organisational identifiers. Alongside learning about the challenges people face and the ways they’ve overcome them, we presented an update on ORCID adoption, and an introduction to the THOR project. The meeting was hosted by Jisc on behalf of Knowledge Exchange.

Outcomes

We’ve taken away some valuable feedback about user experiences that we can feed into the project development programme and improved our understanding of expectations and use-cases. Different approaches towards the role of IDs in federated identity, with a great example from CSC in Finland of how they are using them,  are now being actively discussed with the national federations and PID providers. We’ve also raised awareness of THOR and clarified what we’re attempting to achieve.

There was a lot of excitement about THOR putting some of the essential building blocks of PID infrastructure in place. All in all it was a valuable meeting and we’d like to thank all the attendees for their insights.

About Knowledge Exchange

KE is a group of five organisations from around Europe that are concerned with higher education and research infrastructure.  They meet regularly to swap views and produce a shared vision of interoperability.  You can find out more about them and their mission at http://www.knowledge-exchange.info. The members are:

  • CSC – IT Center for Science in Finland
  • Denmark’s Electronic Research Library (DEFF) in Denmark
  • German Research Foundation (DFG) in Germany
  • Jisc in the United Kingdom
  • SURF in the Netherlands

Slides