Dynamic Data Citation Webinar

This blog post by Martin Fenner has been cross-posted from the DataCite blog.

On July 12, 2016, DataCite invited Andreas Rauber to present the RDA Data Citation Working Group's recommendations for dynamic data citation in a webinar.


Andreas is one of the co-chairs of the RDA working group, and he gave a thorough overview of the recommendations and the thinking that went into them. The final recommendations have been available since last fall, and the working group is currently focusing on helping with implementations.

The recommendations have to be implemented by data centers, but DataCite is happy to help coordinate the work and to provide feedback to Andreas and the rest of the working group where needed. Of particular importance from a DataCite perspective is recommendation 8:

Query PID: Assign a new PID to the query if either the query is new or if the result set returned from an earlier identical query is different due to changes in the data. Otherwise, return the existing PID.
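The decision rule in recommendation 8 can be sketched in a few lines of Python. This is a minimal, hypothetical illustration and not part of the RDA recommendations or of any DataCite service: the class name `QueryStore`, the JSON-based query normalization, the SHA-256 fingerprint of the result set, and the placeholder PID format `example-pid/N` are all assumptions made for the example. A real data center would mint registered identifiers (for example DataCite DOIs) and keep its query store in persistent storage.

```python
import hashlib
import itertools
import json


def normalize_query(query: dict) -> str:
    # Assumption: a query is a dict of parameters. Serializing with sorted
    # keys makes semantically identical queries produce the same string.
    return json.dumps(query, sort_keys=True, separators=(",", ":"))


def fingerprint(rows: list) -> str:
    # Assumption: the result set is a list of JSON-serializable rows.
    # Rows are serialized individually and sorted, so the fingerprint
    # does not depend on the order in which the rows were returned.
    serialized = sorted(json.dumps(row, sort_keys=True) for row in rows)
    return hashlib.sha256("\n".join(serialized).encode("utf-8")).hexdigest()


class QueryStore:
    """Hypothetical query store applying the Query PID rule."""

    def __init__(self, prefix: str = "example-pid"):
        self._prefix = prefix  # placeholder, not a real DOI prefix
        self._counter = itertools.count(1)
        self._pids = {}  # (normalized query, result fingerprint) -> PID

    def _mint(self) -> str:
        # Stand-in for registering a real persistent identifier.
        return f"{self._prefix}/{next(self._counter)}"

    def pid_for(self, query: dict, rows: list) -> str:
        key = (normalize_query(query), fingerprint(rows))
        # A new query, or a known query whose result set has changed,
        # produces an unseen key, so a new PID is assigned.
        if key not in self._pids:
            self._pids[key] = self._mint()
        # Otherwise the existing PID is returned.
        return self._pids[key]
```

With this sketch, calling `pid_for` twice with the same parameters in a different order, and the same rows in a different order, returns the same PID, while changing a single value in the result set yields a new one. The order-insensitive fingerprint is a design choice of the example; where row order is significant, one would instead hash the rows in whatever stable sort order the data center guarantees.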

Assigning a persistent identifier not only when a dataset is originally generated, but also when it is about to be cited, is central to the working group recommendations for dynamic data citation and crucial for other data citation use cases as well. Data exist at different levels, from raw data possibly generated by a machine to highly processed data used in a publication. The figure below, presented by Robin Dasler from CERN at the THOR Workshop on July 7 in Amsterdam, demonstrates this for high-energy physics (HEP):

[Figure: pyramid of data levels in high-energy physics (HEP)]

DataCite DOIs are intended as citation identifiers. They are persistent identifiers and provide standardized metadata, including links to associated publications, contributors and funders. They thus focus on the data in the top section of the pyramid. While we can also use DataCite DOIs for the other levels of the pyramid, other identifiers are sometimes more appropriate for raw, non-persistent data generated by machines. Dynamic data citation can be seen as a variant of the process that this pyramid describes.

If you could not attend last week or want to review the session, the recording of the webinar is available.

The THOR project will work with interested data centers on dynamic data citation in the coming 12 months, hopefully leading to important feedback and a few more implementations of the RDA working group recommendations. Please contact us if you work for a data center and are interested in participating.

Highlights Workshop: Identifiers – Infrastructure, Impact and Innovation

On Thursday, July 7, 2016, Project THOR organised the workshop “Identifiers – Infrastructure, Impact and Innovation” to showcase the research and work done by all THOR partners during the project’s first year. The event in Amsterdam attracted a mixed audience of representatives from publishing companies, universities and research institutions.

After an introduction to the THOR project by Adam Farquhar (British Library), the day was divided into three sessions. The first one focused on persistent identifier linking, the next session on data publishing and the last one on THOR services. Slides of all presentations can be found on the THOR Knowledge Hub.


Photo: Introduction to Project THOR and persistent identifiers

Persistent Identifier Linking

During the first session on persistent identifier linking, Martin Fenner (DataCite), Laura Rueda (DataCite) and Tom Demeranville (ORCID) explained the challenges of linking data sets to other data sets, of dynamic data, and of identifying multiple versions of the same data set. The complexities involved in cross-linking databases and in establishing a fully interoperable system were discussed as well. Good-quality metadata is crucial, and the lack of standards and low adoption complicate matters even more. Despite these challenges, the THOR team has achieved a lot during the project’s first year. For example, THOR partners have contributed to the ORCID auto-update functionality and to DataCite event data.

The ORCID auto-update functionality enables researchers to easily search for their works via DataCite search and link them to their ORCID records, and DataCite’s event data makes it possible to collect events around DataCite DOIs, e.g. data citations in journal articles. These are great achievements, and the THOR project will continue its research to address the remaining challenges.


Photo: Tom Demeranville, Martin Fenner and Laura Rueda presenting on persistent identifier linking

Data Publishing

The second session of the day focused on data publishing: Catriona MacCallum (PLOS), Michaela Torkar (F1000) and Hylke Koers (Elsevier) presented the data policies of their respective publishing companies. A lot of the data that is generated is never published, because most authors focus only on article publication. A cultural change is needed, as by the time a paper is submitted to a journal it is generally too late.

In his presentation, Martin Fenner (DataCite) agreed that it is challenging to make the data underlying a publication publicly available, and that even when the data is made available it is often not very accessible, for example because it is hidden in a file format like PDF. Other challenges for data-article linking are, again, the lack of good-quality metadata and the wide range of data submission systems. Integrating persistent identifiers into the data publishing workflow might overcome these problems, but globally unique identifiers should then be used instead of local identifiers. Challenges for a centralized infrastructure include authentication and ownership of the data infrastructure management.


Photo: Josh Brown (ORCID) and Paul Groth introducing the publishing panel

After the presentations, Paul Groth (Elsevier Labs) led a panel discussion on the challenges and opportunities of data publishing. Key questions included: Should publishers be responsible for data publishing? Or are data repositories responsible? These questions are not easily answered, but all panellists agreed that publishers should work together with researchers and other stakeholders to establish community standards for good-quality data. Persistent data accessibility is a stamp of approval; therefore good-quality metadata and the use of persistent identifiers are crucial.

Besides these technical infrastructure requirements, a human infrastructure clearly needs to be in place as well. Another question arose: would authors commit to keeping their data accessible forever? According to the panel, incentives and a cultural change are needed for researchers to publish their data. To bring about this change and to achieve a shared infrastructure that pushes data publishing forward, more research and more workshop discussions between the different stakeholders should take place. The THOR team will continue these discussions in the coming months of the project.


Photo: Panel discussion on Data Publishing

THOR Services

In the final session of the day, Florian Graef (EMBL-EBI), Markus Stocker (PANGAEA), Robin Dasler (CERN) and Laura Rueda (DataCite) presented THOR Services. They demonstrated ORCID integration in the data submission systems of their respective repositories in biological and medical sciences, earth and environmental sciences, and high-energy physics.

The demonstrations of ORCID integration within data set claiming services and workflows showed clear advantages. At EBI, for example, there is a wide variety of databases, and maintaining one single service is a lot easier. See also the demonstration of ORCID integration at PANGAEA. Next steps for the continued implementation of persistent identifiers within the research cycle across the different disciplines have been identified: claiming services for previously published data and alignment of identifiers.


Photo: Markus Stocker explaining ORCID integration at PANGAEA

Of course, a lot more was discussed during the workshop, so check out the presentations, and please get in touch if you were unable to join us and have any questions! Over the coming year, we will keep you up to date with further achievements of the THOR project through our blog posts and website. Thanks to everybody for their outstanding contributions to valuable discussions in Amsterdam, and we hope to welcome you at one of our next events!

July 7, 2016 THOR Workshop: Identifiers – Infrastructure, Impact and Innovation

Would you like to learn more about persistent identifiers, data-article linking, integration services, dynamic data citation, and much more? And how these interoperable research services will lead to opportunities for innovation and foster open science?

Join us on July 7, 2016 for our Workshop: Identifiers – Infrastructure, Impact and Innovation in Amsterdam!

Since its start in July 2015, Project THOR (Technical and Human infrastructure for Open Research) has been working towards seamless integration between articles, data, and researchers across the research lifecycle, and a lot has been accomplished. During this one-day workshop, the THOR team will demonstrate tangible outputs of the past year, how these achievements are benefiting the research community, and where we are going next.

Through demonstrations and expert talks, the THOR project partners British Library, ORCID, DataCite, Elsevier Labs, PLOS, EBI, CERN, and PANGAEA will showcase the research and concrete work done in the different disciplines. Challenges and opportunities for linking persistent identifiers such as ORCID iDs and DOIs will be discussed using concrete examples from pilot communities (High-Energy Physics, Biological and Medical Sciences, Geoscientific and Environmental Sciences). Progress on DataCite DOI and ORCID iD auto-update, data publishing workflows, and dynamic data identification and citation will be presented during a mixed programme of presentations and discussions.

The workshop is kindly hosted by Elsevier Labs, and is a great opportunity for research organisations, data scientists and publishers to get in touch with experts from different disciplines, share ideas and state-of-the-art practice and help to shape exciting new developments.

Register here!

The preliminary program:

9.00 Registration
9.30 Welcome to the day and Introduction to Project THOR

Adam Farquhar (The British Library)

10.00 THOR Research, persistent identifiers linking, ORCID auto update, presentations and Q&A

Martin Fenner (DataCite), Laura Rueda (DataCite), Tom Demeranville (ORCID EU)

11.00 Coffee break
11.30 Data Publishing Workflows, presentations followed by panel discussion

Paul Groth (Elsevier Labs), Catriona MacCallum (PLOS), Michaela Torkar (F1000), Martin Fenner (DataCite)

13.00 Lunch
14.00 THOR Services, presentations, demonstrations and Q&A

Markus Stocker (PANGAEA), Florian Graef (EMBL-EBI), Robin Dasler (CERN)

15.30 Concluding remarks and networking

Register soon, as spaces are limited! We hope to see you on July 7 in Amsterdam! Follow us on Twitter for updated announcements regarding the workshop. If you have any questions regarding this workshop, please get in touch!

EU Projects Collaboration Meeting – Events and Communications

A range of European and global initiatives is under way to support research practices and sharing in the digital era. To best serve diverse research communities with e-infrastructures and overarching services, Project THOR, in partnership with AARC and OpenAIRE, organised a one-day meeting on Thursday, June 2, at DANS in The Hague to coordinate collaboration with other EU Horizon 2020 projects and to identify opportunities for co-working and cooperation. The meeting focused on the communication and outreach activities of the projects, and the following projects and organisations were represented: LEARN, INDIGO DataCloud, EGI, EDISON, EuroCRIS, FOSTER, READ, EUDAT, OpenDreamKit, PRACE, OpenMinTed, and LIBER.

It was a great opportunity to get so many representatives of EU H2020 projects together in one room and to discuss how to multiply the impact of all project efforts. Of course, these projects have different missions and objectives. For example, Project THOR focuses on using persistent identifiers to enable an interoperable research infrastructure, OpenMinTed focuses on text and data mining, OpenDreamKit delivers an open digital research environment toolkit for the advancement of mathematics, and there are many more differences. However, all these projects contribute to the advancement of science by further developing and promoting wider use of research infrastructures throughout Europe and globally, and the target audiences of most of these projects overlap.

Some of the projects are already joining forces and organising webinars, workshops and conferences together. During the meeting, all projects marked their planned workshops and events on a timeline; this gave a clear picture of opportunities for collaboration and helped identify any gaps, for example research communities that are underserved.


Photo: Project representatives adding their workshops, webinars and events to the timeline.

Some of the upcoming events in June and July you don’t want to miss:

Besides event planning, other topics were discussed, for example the use of social media to promote events, the evaluation of events, the sustainability of the projects, and how to deliver good-quality training for all the different research disciplines. In addition, the projects will share each other’s resources to make sure available training materials are more widely disseminated and accessible to everybody. Some of the projects have a wide range of online training materials available; see for example the PRACE Training Portal and the EUDAT Training Programme. Keep an eye on all our websites for more training materials!

At the Digital Infrastructures for Research Conference in Krakow, the EU H2020 projects will continue discussions on co-organising events to best serve the entire research community and analyse any gaps. During this meeting, we will also discuss where the projects can support each other on a technical level. We will focus on research collaboration opportunities to make existing infrastructures better connected and interoperable and discuss possibilities of integrating services. By adding functionalities, we will strengthen what’s already there to further advance European research and innovation.


Photo: Project representatives of the various EU H2020 projects.

If you have any suggestions or questions, please get in touch during one of the events or contact us online. Keep an eye on our websites and social media accounts for announcements of more interesting workshops and events!

Organisation IDs for scholarly communications: where next?

On April 17, as part of FORCE 2016 in Portland, Oregon, Crossref and THOR partners DataCite and ORCID convened a workshop to discuss the current state of the art in organisation identifiers. We discussed this issue previously in a post on the ORCID blog, and we’re pleased to report that the workshop was a big success. Since then, we’ve been pulling together our notes and thoughts on the issue of organisation identifiers, and we’d like to share the headlines with you.

The community represented at the meeting agreed strongly with our conclusions that there is no solution available today that meets all the scholarly communications community’s needs. It is clear that the community needs a solution based on open data (for a community infrastructure such as this, CC-Zero is really the only appropriate license for the data). We need a robust, high-volume API if we are to build infrastructure around organisation identifiers. This infrastructure needs to have transparent, community-led governance, and a responsive, properly resourced entity to maintain all of this.

The workshop was underpinned by a discussion document which gathered together existing work undertaken by NISO, Jisc and CASRAI, and others. These outlined the shortcomings of current approaches, and set out core requirements which any solution aiming to provide organisation identifiers for scholarly communications should address. While we acknowledge that there are commercial and community-led initiatives that offer partial solutions to the problems we face, they are focussed, naturally enough, on the needs of their sub-section of the community. For them to broaden their offer or to change their practice might not make commercial sense, or might not be possible (thanks to a lack of staff or technical infrastructure for example). That said, whatever comes next will need to work alongside these providers as a partner facing similar challenges and, with a bit of luck, sharing solutions.

What emerged from the workshop was a consensus that a detailed, use-case-driven approach is a useful way to understand the core issues involved in identifying organisations, and that it also provides a good way to spot common issues. By placing these at the heart of a new organisation identifier infrastructure, we can help to create a service that meets the needs of the widest possible section of our community.

We took a number of ideas away from the workshop:

  • There needs to be a collective action plan for the next three years to help implement an organisation identifier solution for scholarly communications.
  • We need to think about the structure and governance (both for the data and for the parent organisation) that will best serve the community.
  • We need a solid, core ID technology for both the highest-level organisations and the hierarchies beneath them.
  • We need to define a robust, low-barrier, accessible mechanism for organisations to take ownership of their IDs and to update them (for this to succeed, each organisation needs to KNOW that it has an ID, to USE it, and to KEEP IT UP-TO-DATE).

We are starting this initiative by gathering as broad a sample as possible of the use cases you, the scholarly communications community, need organisation identifiers to address.

We invite interested members of the community to read our discussion document and to send their comments and use cases to us using our survey.

We’ll digest and analyse this information, and keep you all up to date. We’ll gather together and support the work of task groups where appropriate, to bring in expertise from the community. We’ll present reports, updates and proposals publicly to gather your feedback, and we will meet again at a co-convened persistent identifier-themed event in Reykjavik, Iceland, in the week beginning November 7. Hold the date, and watch this space!
Geoffrey Bilder, Josh Brown, and Patricia Cruse.

THOR is Hiring

The THOR Project is looking for an early-career library science specialist to work with our team at CERN on the forefront of Open Science services for the High Energy Physics community, focusing on persistent identifiers and metadata requirements.

As a large-scale scientific laboratory, CERN produces research data in high volume, which demands sophisticated data management and preservation efforts. Working at CERN is an opportunity to get involved in the Open Science movement from a unique disciplinary perspective, where tangible impact can be made within the community and beyond.

For more information on the THOR project, position eligibility requirements and application procedures, please take a look at the job posting. The application deadline is 20 June 2016.

Growing number of THOR ambassadors! Join us!

Key to achieving THOR’s mission is building capacity to increase the use of persistent identifiers. To establish a sustainable e-infrastructure, we need to reach as many people as possible, so that the entire research community will see the advantages of integrated and interoperable PID systems. Not only are THOR project partners contributing to building this human infrastructure; we are also receiving help from ambassadors throughout the research community, from different countries and organisations.

The following people have already joined our group to promote THOR activities and to achieve our mission:

    • Chris Baars, DANS, The Netherlands (ORCID: 0000-0002-5228-1970)
    • Bruce Becker, CSIR Meraka Institute, South Africa (ORCID: 0000-0002-6607-7145)
    • Stephen Grace, London South Bank University, UK (ORCID: 0000-0001-8874-2671)
    • Jord Hanus, University of Antwerp, Belgium (ORCID: 0000-0002-0905-9864)
    • Patricia Herterich, University of Birmingham, UK (ORCID: 0000-0002-4542-9906)
    • Dave Lyons (ORCID: 0000-0002-2695-318X)
    • David McElroy, Birkbeck College (University of London), UK (ORCID: 0000-0002-0966-8862)
    • Eva Méndez, Universidad Carlos III de Madrid, Spain (ORCID: 0000-0002-5337-4722)
    • Fiona Murphy (ORCID: 0000-0003-1693-1240)
    • Eduardo Olmedo, Universidad de Valladolid, Spain (ORCID: 0000-0001-7966-1423)
    • Stian Soiland-Reyes, University of Manchester, UK (ORCID: 0000-0001-9842-9718)
    • Maurice Vanderfeesten, Vrije Universiteit Amsterdam, The Netherlands (ORCID: 0000-0001-6397-4759)

THOR is delighted to have the support of these enthusiastic volunteers who can leverage their own networks to effectively communicate THOR’s mission. THOR ambassadors support and extend THOR’s outreach activities through presentations, networking, social media, conference attendance, and other methods of communication.

Discussions via e-mail on a variety of THOR topics take place regularly. This week Bruce Becker initiated a discussion on sorting and ranking works within a researcher’s ORCID record. Maurice Vanderfeesten commented: “I like the discussion that came out of this one e-mail. And it basically shows the enthusiasm and passion we have for the subject. I learned a lot. …Thank you ambassadors for meeting you all, and being a source of inspiration.”

Would you like to join our ambassador team and help promote THOR’s mission within your organisation and network? Don’t hesitate any longer and get in touch!

Diamonds are forever. What about research data?

THOR’s Project Coordinator Adam Farquhar (British Library) recently delivered a keynote at Our Digital Future. The conference, which was held in Cambridge on 14–15 March 2016, addressed challenges in long-term preservation and archiving of digital data across a wide range of disciplines. His talk, titled “Diamonds are forever. What about research data?”, looked at some of the challenges to re-using data in the future and the fracture lines in the scholarly record between articles and data. In this talk, Adam considered ways to close these gaps and identified recent technical developments that may help researchers without radically changing their workflows. As Adam proposed, the emerging THOR infrastructure promises to help researchers get appropriate credit for the additional work that they do to make data re-usable now and in the future.

The presentation is online at: https://sms.cam.ac.uk/media/2206135

THOR Workshop at the European Bioinformatics Institute, UK

On Thursday, February 25, Project THOR organised a workshop about the use of persistent identifiers (PIDs) within the biosciences community at the European Bioinformatics Institute (EMBL-EBI) in Hinxton, UK. EMBL-EBI is a THOR partner, and the event drew a capacity crowd. Team members from ORCID EU and EBI gave presentations on the scope and future of ORCID and on the use of ORCID iDs and Research Object Identifiers (ROIs) at EBI. The workshop included technical sessions on ORCID dataflows in Europe PubMed Central, data submission forms and data claiming services. The workshop participants discussed integration possibilities between articles, data, and researchers across the research lifecycle, which is THOR’s main objective.

The first presentation introduced ORCID and described some of the latest developments in ORCID uptake and integration. It explained how users can search Europe PMC using an ORCID iD, and why it is important to link your articles to your ORCID iD to get credit for your work. The introduction to ORCID led to a variety of general questions, for example how ORCID prevents and deals with duplicate iDs.

Questions about other benefits of ORCID iDs, such as our peer review functionality, were discussed. Can reviewers still remain anonymous? Yes they can: publishers can, for example, publish ‘summary’ annual updates to ORCID records for their reviewers without linking them to the actual review report or manuscript. This way, reviewers of particular manuscripts remain anonymous but they do receive credit for the work they have done.


Photo: Audience at EBI listening to Josh Brown (ORCID Regional Director Europe) explain ORCID iDs

The complexity of linking data sets and the published article was then discussed. Who is responsible for linking them? Would you like this to happen automatically? At some institutes and within other disciplines this is already happening, but within the biomedical sciences the links between multiple datasets can sometimes make this process more complex. Other connections, for example to funding agencies, can be made as well, but they can also make the integration process more challenging.

Linking people to organisation identifiers can be challenging too as these identifiers vary in scope and actionability across different providers. On the other hand, it seems some types of identifiers are lacking, for example project identifiers. Other problems include attribution for non-traditional roles, such as data curation.


Photo: Jo McEntyre (Head of Literature Services, EMBL-EBI, on the right) leading discussions on the use of persistent identifiers

Even though there are challenges to be tackled, the technical sessions of the workshop demonstrated that there are already solutions in place. A demonstration of the ORCID dataflows into Europe PMC showed that continuous bidirectional updates using the notifications API increase discoverability in search. When researchers use their ORCID iD within data and article submission workflows, name ambiguity can be overcome automatically. A sample submission shown during the workshop demonstrated that it is not complicated for researchers to submit data via submission forms and automatically link the data to their ORCID iD.

THOR will continue to bring infrastructure providers, publishers, repository managers, institutions and researchers from different disciplines together. As the discussions during this workshop demonstrated, the use of identifiers will differ for each scientific discipline. Would you like to organise an event like this within your discipline, at your institute or at your organisation? Please do get in touch with us and we can discuss the possibilities.

Webinar Data – Article linking services

On Thursday, March 17, 3–4 pm CET, OpenAIRE, THOR, DataCite, and ORCID will organise a one-hour webinar on data-article linking services. Project OpenAIRE aims to support the implementation of Open Science in Europe and will demonstrate during the webinar how it collaborates with DataCite and CrossRef to establish a sustainable e-infrastructure for researchers. Linking article publications with datasets increases research visibility and searchability, which will lead to a better organised international research infrastructure with less duplication and more opportunities for innovation.

Creating a sustainable international e-infrastructure by establishing seamless integration between articles, data, and researchers is also key within THOR’s mission. Both THOR and OpenAIRE are funded by the European Commission under the Horizon 2020 programme and during the webinar we will also discuss the differences between the projects and offer you the possibility to provide feedback on how to improve the work that is being done within these projects.

The webinar is open to anyone interested in learning more about data-literature linking, Open Access, and how the different EU projects are collaborating to establish a sustainable e-infrastructure.

Please register here.