Reuse of Research Data: The Shape of Things to Come

Open Science needs incentives. Tracking data and software citation can be one of them.

To facilitate that, THOR strives to improve the use of persistent identifiers for all scholarly objects. Tracking citations has proven difficult, since there has been little reuse so far, but a first step in that direction is to improve the visibility and findability of data and software.

The CERN Open Data portal publishes data and code, and assigns DOIs to them. The boundary conditions for that publishing process are somewhat special, as the data and code can be large and complex (the last big release, for example, was about 320 TB). Recently, this activity reached an important milestone with the first independent research paper based on the opened data, ‘Exposing the QCD Splitting Function with CMS Open Data’. The publication of this paper highlights the potential of sharing data publicly, and the benefits of enabling reproducible research.

The article reference list includes the recommended data citation along with the DOI, so that community portals can track the connections and impact of the data and software. Hint: check reference 72!

It is very important to create more milestones like this: share data and code, encourage reuse and citation, and attribute credit for the hard work that goes into data production and software development. We are only at the beginning of an interesting path.

PIDs in Poland: let’s link research!

The ongoing drive within the THOR project to identify and connect the research landscape reached Warsaw on Monday 24 April. Organised in collaboration with Crossref, the workshop focused on the ways in which persistent identifier (PID) services, such as those provided by members of the THOR consortium and Crossref, can represent ‘much more than infrastructure’ by ‘working together to connect research’. Hosted by the Digital Humanities Centre at the prestigious Institute of Literary Research of the Polish Academy of Sciences, the workshop brought together a packed audience of publishers, data managers, researchers, librarians and administrators for a day of knowledge-sharing and discussion centred on increasing access to research output.

Professor Łukasz Szumowski (Under Secretary of State with the Ministry of Science and Higher Education) opened the event with a recognition of just how quickly the digital world is changing. He stressed the need for developing new mechanisms in bibliometrics to enable objective evaluation that can guide public funding of research.

Professor Paweł Rowiński also extended a welcome as Vice President of the Polish Academy of Sciences, home to 69 institutions spanning a multitude of disciplines. Introducing a thread that ran throughout the day, Rowiński highlighted the fact that persistent identifiers not only make research more accessible, they can provide an incentive for scientists to share data, safe in the knowledge that their achievements will be more visible and attributed to them.

In the morning sessions, Rachael Lammey (Crossref), Ginny Hendricks (Crossref), Josh Brown (ORCID), Laura Rueda (DataCite), Rachael Kotarski (British Library) and I (Ade Deane-Pratt, ORCID EU) gave an overview of the persistent identifier landscape and the services being developed to support it as it evolves. Crossref recently gained 26 new members from Poland alone, making Poland one of Crossref’s fastest-growing member countries.

Some common themes emerged from the presentations and discussion: achieving persistence is a process, one that involves constant evaluation and adaptation. The challenges can be highly domain specific, and a number of questions also remain unresolved.

But permissions and privacy are key. Services such as EThOS, the British Library’s repository of doctoral theses, can make it easier to track career paths, but at the same time raise the challenge of handling claims for legacy theses.

And more broadly: is it possible to enact a cultural shift away from the citation of physical objects to their digital representations? When is it appropriate to do so?

The afternoon sessions saw some interesting case studies from Polish industry and academia, and some robust discussion, with contributions from Dr Eng. Jakub Koperwas (Warsaw University of Technology), Marcin Werla (Poznań Supercomputing and Networking Centre) and Dr Marta Hoffman-Sommer (Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, RepOD Repository for Open Data, OpenAIRE NOAD for Poland).

We heard about the effort to build from scratch a university knowledge base with clear and consistent metadata that semantically links the full spectrum of academic activity, encompassing conception and funding, the research process, publications, implementations, practical applications, patents and results. The motivation was that it should be possible, for example, as a researcher, manager, funder, administrator or librarian, to interrogate the system to find experts in a field. We also heard about the work and challenges involved in providing an infrastructure − the PIONIER Network in this case − to support research via PID uptake. An outstanding question is how to prevent the duplication of DOIs assigned to the same object.

During the discussion that closed the day, we heard from both panel and audience on what the future of research communication should look like. With the event coming hot on the heels of the deadline for contributions to a new Polish national research evaluation exercise, the topics of making research communication more effective, and capturing and sharing information were naturally of real significance to the room. One thing was abundantly clear: persistent identifiers are integral to that future.

This timely meeting was just one strand of ongoing work to improve scholarly infrastructure and make the research landscape fit for 21st-century purpose. You can download slides from the day here: https://zenodo.org/communities/2017-24-04-warsawmeeting/?page=1&size=20. And you can keep abreast of future events at our website.

Webinar Series: Persistent Identifiers: What, Why and How?

Please join the THOR Project for a series of three webinars focusing on applications of persistent identifiers (PIDs). The first webinar will explain what PIDs are and why they are important. The second will dive a bit deeper, giving more insight into how to use PIDs and what services can be built on top of identifier systems. The series will end with an introduction to the PID (s)election guide: how to determine the most appropriate identifier for your needs.

Following these three webinars you will be fully up to date on PID systems; all webinars will offer enough time for Q&A. You can register here:

  • Webinar 1: Overview of PID systems, Jonathan Clark, International DOI Foundation
  • Webinar 2: Persistent identifiers: current features and future properties, Juha Hakala, National Library of Finland
  • Webinar 3: Persistent Identifier (s)election guide, Marcel Ras, Netherlands Coalition for Digital Preservation (NCDD) and Remco van Veenendaal, Dutch National Archives

All three webinars will be recorded and made available on the THOR YouTube channel. Full descriptions of the webinars can be found below. 


Webinar 1: Overview of PID systems

Presenter: Jonathan Clark
Date and time: 18 May, 16.00-17.00 CET
Register here


Are you intrigued, interested or simply a bit confused by persistent identifiers and would like to know more? Then this introductory level webinar is for you! The webinar will be especially interesting if you are working with digital archives and digital collections. We hope you will get a clear understanding of what persistent identifiers are, why they are important and how trustworthy they are. We will also discuss how you can determine the most appropriate identifier for your needs. There will be plenty of time to ask questions. Note that this will not be a deeply technical webinar.

Topics that will be covered include:

  • What are persistent identifiers?
  • The case for PIDs – knowing what’s what and who’s who
  • The data architecture of PIDs
  • What is social infrastructure and why is it important?
  • Review of current identifier systems
  • How to choose a PID System
  • Case studies in documents, data, video

Jonathan Clark is the Managing Agent for the International DOI Foundation (IDF) which is a not-for-profit membership organisation that manages the DOI (Digital Object Identifier) and is the registration authority for the ISO 26324 standard. Jonathan also works as an independent advisor on strategy and innovation. Prior to this he was at Elsevier for 20 years in various positions in publishing, marketing and technology. He holds a BSc and PhD in Chemical Engineering from the University of Newcastle-upon-Tyne. He lives in the Netherlands and when not working he can most often be found refereeing rugby matches.


Webinar 2: Persistent identifiers: current features and future properties

Presenter: Juha Hakala
Date and time: 1 June, 15.00-16.00 CET
Register here


You should attend this webinar if you know what persistent identifiers are but are interested in knowing much more about what you can actually do with them. In other words, what are the services that are being built on top of identifier systems that could be useful to the digital preservation community? We will cover topics such as party identification, interoperability and (metadata) services such as multiple resolution. Following on from that, we will explain more about the next generation of resolvers and work on extensions, such as specification of the URN r-component semantics.

Juha Hakala is a senior advisor at the National Library of Finland. After obtaining a degree in library and information science from Tampere University, he has held various positions in the National Library since 1987. From the beginning he has concentrated on library automation, including standardisation. His involvement with persistent identifiers started more than twenty years ago, when URN syntax was established in the Internet Engineering Task Force (IETF). He is closely involved with the revision of URN syntax and various other ongoing URN-related efforts in the IETF, and with the maintenance of standard identifiers in ISO TC 46.


Webinar 3: Persistent Identifier (s)election guide

Presenters: Marcel Ras and Remco van Veenendaal
Date and time: 13 June, 16.00-17.00 CET
Register here


You should attend this webinar if you want to learn about how to choose the most suitable identifier system for your needs, and how to implement persistent identifiers in your own organisation.

Cultural heritage organisations, regardless of size, are often hesitant to implement PIDs. They lack knowledge of what PIDs are, are unaware of their capabilities and benefits, and fear a complex and costly implementation process as well as the maintenance costs of a sustained service. The Digital Heritage Network and the Netherlands Coalition for Digital Preservation (NCDD) address these issues in three ways:

  1. By raising awareness of (the importance of) PIDs in cultural heritage organisations.
  2. By increasing the knowledge regarding the use of PIDs within cultural heritage.
  3. By supporting the technical implementation of PIDs in cultural heritage collection management systems.

How this was done on a nationwide scale will be explained in the webinar.

There are multiple PID systems. But which system is most suited to your situation: Archival Resource Keys (ARKs), Digital Object Identifiers (DOIs), Handle, OpenURL, Persistent Uniform Resource Locators (PURLs) or Uniform Resource Names (URNs)? Each system has its own particular properties, strengths and weaknesses. The PID Guide from the Digital Heritage Network’s Persistent Identifier project helps you learn and think about important PID subjects, and guides your first steps towards selecting a PID system.

Marcel Ras is the Program Manager for the Netherlands Coalition for Digital Preservation (NCDD). The NCDD was established in 2008 to promote national collaboration to ensure the long-term availability of digital information in The Netherlands. Marcel has been NCDD’s Program Manager since January 2014 and has many years of experience in digital preservation. He started his digital preservation career at the National Library of The Netherlands (Koninklijke Bibliotheek), where he set up a web archiving program. From 2007 to 2011 he was manager of the e-Depot department at the KB, responsible for the acquisition, ingest and long-term storage of digital publications in the library. From 2011, as program manager for the International e-Depot, he was responsible for developing the KB’s international e-journals archiving program.

Remco van Veenendaal is a Preservation Advisor for the Dutch National Archives (Nationaal Archief) in The Hague. He contributes to the (digital) preservation policies of the Nationaal Archief and to the development and implementation of the e-Depot. In March 2015 he established Veenentaal, which advises organisations on applications at the intersection of language and computers and on how those applications can improve their opportunities, a field in which Remco has more than fifteen years’ experience. Before joining the Dutch National Archives he was project manager of the Flemish-Dutch Human Language Technology Agency (HLT Agency or TST-Centrale), a repository for digital Dutch language resources.

#ENVRiD: Integrating ORCID iDs in Environmental Research Infrastructures

THOR – ENVRIplus Bootcamp


On March 28 and 29, representatives from over twenty environmental research infrastructures gathered at Aalto University, Finland to discuss ORCID integrations and more.

Tweet: Starting #ENVRiD!

After introductions to the organising projects (THOR and ENVRIplus) and a general introduction to ORCID, Markus Stocker (PANGAEA) kicked off the series of presentations on ORCID integrations with a live demo on how to connect your PANGAEA account with ORCID and log in with your ORCID iD. This demo immediately showcased one of the key benefits of integrating ORCID within your infrastructure: through linking with ORCID, PANGAEA automatically receives the information you have given ORCID permission to share, in particular your ORCID iD. This enables automated cross-linking of data DOIs and contributor ORCID iDs and sharing of such link information with PID infrastructure, specifically ORCID. Xiaoli Chen’s (CERN) presentation on ORCID integration at CERN also showed the benefits of integrating ORCID within the high energy physics community − for example: how to deal with a publication with no less than 2853 authors!
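The “log in with your ORCID iD” flow shown in the demo is built on ORCID’s standard OAuth 2.0 support: the integration sends the user to ORCID’s authorization endpoint, and after sign-in ORCID redirects back with a short-lived code that is exchanged for the user’s authenticated iD. A rough sketch of the first step (the client ID and redirect URI below are placeholders a real integration would register with ORCID):

```python
from urllib.parse import urlencode

# Placeholder credentials -- a real integration registers its own
# client ID and redirect URI with ORCID.
CLIENT_ID = "APP-XXXXXXXXXXXXXXXX"
REDIRECT_URI = "https://repository.example.org/orcid/callback"

def orcid_authorize_url(client_id, redirect_uri, scope="/authenticate"):
    """Build the URL that sends a user to ORCID to grant permission.
    The /authenticate scope yields the user's validated ORCID iD."""
    params = {
        "client_id": client_id,
        "response_type": "code",
        "scope": scope,
        "redirect_uri": redirect_uri,
    }
    return "https://orcid.org/oauth/authorize?" + urlencode(params)

print(orcid_authorize_url(CLIENT_ID, REDIRECT_URI))
```

Because the iD comes back from ORCID itself rather than being typed in by the user, the repository receives a validated identifier it can safely attach to deposits.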

Photo: Markus Stocker welcoming participants in Helsinki

The ORCID integration talks continued with representatives from two environmental research infrastructures, namely ICOS and Argo, as well as the EGI e-Infrastructure. While these infrastructures have not started fully integrating ORCID within their systems, the talks gave an overview of their current plans.

  • At ICOS, ORCID has been integrated into Carbon Portal user profiles and the team is working to implement the integration following best practice (i.e. obtaining validated iDs from ORCID). As only a few people currently have ORCID iDs, the main challenge is to motivate people to create an ORCID account and link their user profiles.
  • After instructions on how to cite data were included in Argo’s user manual, more people have started to assign DOIs to their datasets. DOIs make citing much more efficient, but at present the DOIs used do not provide credit to the individual contributors, since Argo is listed as the single author of datasets. Argo has identified ORCID iDs as a tool to list and credit individual contributors. Argo’s metadata describing the different roles of contributors to the dataset will be pushed to DataCite. DataCite will then push the information to ORCID records automatically.
  • At EGI, users with ORCID iDs can use their iD to login to the EGI Checkin service, which enables them to get authenticated access to EGI resources and tools. Further plans for integration, which are already in development, include linking to articles and datasets.
Slide: Argo: Auto-update ORCID record through DataCite DOIs
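The auto-update mechanism Argo plans to use works because the DataCite metadata schema can carry contributor ORCID iDs alongside the DOI. A minimal sketch of building such a metadata fragment (the name and iD below are placeholders, using ORCID’s well-known example iD rather than a real Argo contributor):

```python
import xml.etree.ElementTree as ET

def creator_with_orcid(name, orcid_id):
    """Build a DataCite-style <creator> element that carries an
    ORCID iD, so the person travels with the DOI metadata."""
    creator = ET.Element("creator")
    ET.SubElement(creator, "creatorName").text = name
    name_id = ET.SubElement(creator, "nameIdentifier", {
        "schemeURI": "https://orcid.org",
        "nameIdentifierScheme": "ORCID",
    })
    name_id.text = orcid_id
    return creator

# Placeholder person: ORCID's documentation example iD
elem = creator_with_orcid("Carberry, Josiah", "0000-0002-1825-0097")
print(ET.tostring(elem, encoding="unicode"))
```

Because the iD is embedded in the DOI metadata, DataCite can notify ORCID and the dataset can appear on the contributor’s record without any manual claiming step.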

After lunch, Tom Demeranville (ORCID) explained more about the ORCID API and ORCID’s collect and connect program. Laura Rueda (DataCite) stressed the importance of complete and interoperable metadata − it even got its own slide! (see below) – and Kristian Garza (DataCite) showed the importance of complete metadata in his demo of claiming published datasets to ORCID retroactively. Other topics discussed in the afternoon included the Scholix framework and DataCite event data.

Slide: Metadata!

Day one ended with a discussion on the motivations for research infrastructures to integrate ORCID iDs in their workflows. The main motivations are attribution and disambiguation. Other reasons mentioned by the participants were the benefits of automated workflows and interoperable research systems, whereby information is pushed to and linked within different systems and repositories automatically, and the fact that some publishers now require ORCID iDs. The biggest challenges to integration, however, were identified as social rather than technical: getting people to register, and ensuring they actually use their iDs, was cited as the largest barrier. For journal articles ORCID iDs are increasingly accepted as common practice, but people also need to be encouraged to use their iD when uploading datasets. Funding and time constraints on building the integration itself pose a further challenge.

Photo: Hands-on with the ORCID API

On day two, attendees split into two breakout sessions to attend either the infrastructure developers’ or managers’ track. In the developers’ room, Tom Demeranville took participants through a hands-on session on the ORCID API. In the managers’ track there were more general presentations on new developments within the PID community, such as the Organisation Identifier project, dynamic data citation, and PIDs for instruments. As these are all new initiatives, more work and discussion are needed to take into account the requirements of the different stakeholder communities. For example, within the environmental research community, more discussion is needed on how to describe instruments. Should DOIs be used? Or is it better to use serial numbers for physical objects? And what happens when organisations use the same instruments? One suggested solution was to adopt a form of ISO standard recognised by different countries. For dynamic data citation, no standard solution is in place yet.

The same applies to the environmental research infrastructures that want to take their ORCID integrations forward. A short exercise showed that most RIs consider pursuing ORCID integration urgent. And as the closing summary of the participating infrastructures’ intentions shows, most RIs are either thinking about ORCID integration or definitely going for it this year. Much work remains to be done, but we are confident that at #ENVRiD Part Two we will see progress toward such integrations!

Much more than infrastructure: working together to connect research

Crossref/THOR Outreach Meeting, Warsaw, Poland

Monday, 24 April 2017

Digital Humanities Centre at the Institute of Literary Research, Polish Academy of Sciences


This outreach meeting aims to explore how the research community can work together to help connect research and improve discoverability of content – publications, data and more.

Representatives from Crossref and Project THOR partners (ORCID, DataCite and the British Library) will introduce and provide updates on their initiatives and services (and how they work together). We will also have panelists from Polish institutions join us to discuss how the research landscape looks for them, and how they might work with some of the services discussed.

The day aims to provide a deeper understanding of foundational scholarly infrastructure, but also to have the opportunity to discuss how that can be used in publisher and researcher workflows.

We welcome editors, publishers, librarians, researchers, funders and the wider community to come share their thoughts and ideas. There will be lots of time for discussion and questions, so please join us and register here.


Agenda

08.30-9.00 Registration & coffee
09:00-09:10 Welcome from organisers
09:10-9.30 Opening remarks

Professor Łukasz Szumowski, Under-Secretary of State, Ministry of Science and Higher Education

Professor Paweł Rowiński, Vice-President of the Polish Academy of Sciences

9:30-11.00 Introduction to Persistent Identifiers

Crossref, ORCID, DataCite, Project THOR, PID Interoperability

11:00-11:20 Coffee
11:20-12:50 Persistent Identifier Services

Crossref services, THOR partner services, PID Integrations. Discussion

12.50-13.50 Lunch
13:50-15:20 What’s happening: Plans & Applications

  • Industry initiatives and how to get involved with Project THOR partners
  • Polish case-studies
15.20-15.40 Coffee
15:40-17:00 Panel Discussion: Let’s Link Research! Persistent Identifiers for Polish Scholarship.

Moderator: Dr. Maciej Maryl, Digital Humanities Centre at the Institute of Literary Research, Polish Academy of Sciences
Dr. Marta Hoffman-Sommer, Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, RepOD Repository for Open Data, OpenAIRE NOAD for Poland

Dr. Eng. Przemysław Korytkowski, West Pomeranian University of Technology in Szczecin, member of the Committee for Evaluation of Scientific Units at the Ministry of Science and Higher Education

Dr. Habil. Emanuel Kulczycki  Adam Mickiewicz University in Poznań, the chairman of the Specialist Team for the Evaluation of Scientific Journals at the Ministry of Science and Higher Education

Rachael Lammey, Member & Community Outreach, Crossref

Laura Rueda, Communications Director, DataCite

Josh Brown, Director of Partnerships, ORCID

17.00 Closing remarks


Challenges of Measuring PID Adoption

This has been cross-posted on the ORCID blog.

The THOR team is hard at work helping forge the path to sustainable persistent identifier (PID) services. As with any long-term goal, a bit of self-reflection is helpful for tracking your progress, considering your successes, and psyching yourself up to tackle challenges along the way. In the case of a project like THOR, we can help this self-reflection along by developing a structure to help us properly measure our success as we go. But this is often tougher than you might think.

In the early days of PID services, it was fine to be concerned only with uptake, since the priority was to get the word out. While we still have some work to do there, PID services have now matured to the point that we can no longer be satisfied solely with simply “getting the numbers up.” We need to tailor our messages in order to drive further innovation towards the interoperable future that THOR and our partners dream of. Having better information about underlying motivations for adopting PIDs and about who might be ready to do so will help us drive the creation of services that will make the whole system better. To further this warm and friendly mission, we need cold hard facts. So how do we go about finding those facts? And how do we turn them into something useful and, quite frankly, a bit less prickly?

What can be measured?

The first step in evaluating our progress was to set objectives that are actionable and measurable. Though it’s tempting to set strict performance targets, that is just setting yourself up for failure. If you define success as selling 50 widgets and you only sell 48, then by your own definition you’ve failed. In THOR’s case, our driving purpose is infrastructure improvement, so we’re more interested in observable trends than in concrete targets. Developing key performance indicators (KPIs) is helpful here. Remember that an indicator is just a way to consider trends (e.g. “number of widgets sold”); it isn’t itself a target (e.g. 50 widgets).
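The widget example can be made concrete with a quick sketch (the figures are invented): an indicator describes how a trend is moving from period to period, which you then interpret, rather than acting as a pass/fail threshold:

```python
# Invented example figures: widgets sold per quarter
widgets_sold = {"Q1": 40, "Q2": 44, "Q3": 48, "Q4": 53}

def quarterly_growth(series):
    """Indicator: percentage change from one period to the next.
    It characterises a trend; it is not a target like 'sell 50 widgets'."""
    values = list(series.values())
    return [round(100 * (b - a) / a, 1) for a, b in zip(values, values[1:])]

print(quarterly_growth(widgets_sold))  # steady growth, no fixed target to miss
```

Here every quarter shows roughly 10% growth: a healthy trend, even though no single quarter ever hit an arbitrary “50 widgets” target.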

How should it be measured? (With which indicators?)

The next step was to determine how to measure what we want to measure. The goal here is to select indicators that are valuable as well as meaningful. “Valuable” means that knowing the indicator’s status will help us to make a decision. “Meaningful” means that we understand what the indicator is actually tracking. If the trend line associated with our chosen indicator goes up, will we know what that means for us, and will we know how to react?

Part of the difficulty of selecting indicators in this way is that the most meaningful and valuable information for you might not be immediately available. When THOR first started down the indicator path, we just wanted easily gatherable quantitative measures; we weren’t looking to take on any complex user studies. However, some of the information we wanted wasn’t available, either because it wasn’t being tracked on a regular basis or because gathering it ourselves would have been a manual process we weren’t yet willing to take on.

How should it be measured? (Tool or no tool?)

Once you know what your objectives are and which indicators will help you track your progress towards those objectives, you need a convenient way to monitor it all. Fancy tools may not be necessary; in fact, most of the time they probably aren’t, depending on which indicators are important to your particular flavour of success. But we wanted to demonstrate some of the possibilities of having PID measures ready to aggregate — and if we’re honest, we do like fancy — so we developed a dashboard to keep everything in one place. (Read more about our process in our report.) Creating the dashboard was a good exercise in establishing what could be measured and how. It also gave us a chance to explore what meaningful metrics might be. For instance, we can see that PID uptake is on the rise, and we can see some information about the metadata associated with those PIDs, but this doesn’t actually give us any insight into causal relationships, or let us know precisely why this trend is happening or even exactly who is involved.

Because we’re all about meaningful data, these adventures in measurement have led the THOR team to identify gaps in the available metrics surrounding PID service adoption and to consider which additional indicators might be useful for future work in the PID research space. We’ve now embarked on a more detailed gap analysis that will lead to a study of some of these missing measures. Since our goal is to drive PID service adoption, we’ve identified disciplinary coverage and geographic distribution as our most promising themes to pursue. We are now collecting the data we need to analyze PID adoption in X disciplines and Y countries – a full report will be available later this year.

Moving forward

So what have we learned throughout this process? First and foremost, not everything is as concrete as you might want it to be. When you’re dealing with humans and human behaviours, things get squishy. Second, since we’re only monitoring existing trends based on factors we don’t necessarily control, some information available to us will remain just “good enough” until others can do more detailed work to either improve the data or flesh it out. Our job for the remainder of the THOR project is to point out what would be most useful to know about interoperability, so that it can be studied.

The PID field is still evolving and has a lot of growth and changes left in store. Some potentially valuable information requires further study to tease out. Our service adoption study, beginning with the gap analysis, will help us make a start on that research, and we hope to gather some useful information that can set the stage for future work. We’ll also need help from the wider PID user and integrator community to improve existing metadata and to help us consider meaningful metrics.

As always, if you have questions or comments about THOR, please get in touch.

Giving Credit for Data with Claiming Services

Researchers demand credit for the work that they do. While there are well established practices and services in place to give credit for traditional publications, these are sorely lacking for the full range of research artefacts, including data and software.

THOR partners have been busy developing data claiming services. The results are published in our latest report, ‘Services that Support Claiming of Datasets in Multiple Workflows’ (10.5281/zenodo.290649), where you can read about the successful implementation of claiming services in the databases and services of disciplinary repositories as well as PID infrastructures of several THOR partners.

The report summarises progress on enabling researchers and other contributors to associate research artefacts with their ORCID record, a process known as claiming. The dataset claiming process involves creating, maintaining, and sharing information about the relationship between researchers and datasets.

We describe our experience implementing claiming workflows at five organisations, identifying some of the shared challenges as well as the unique issues each organisation faced in developing the claiming process and deploying it into a live production system.

This is an important advance in enabling unambiguous attribution and credit for research.

While technical challenges remain, such as synchronisation of claims, technical capabilities have substantially improved. The human and social challenges are now coming to the fore: we must ensure that claiming services are widely adopted and used across the research communities.

ORCID Integrations in Environmental Research Infrastructures

A THOR-ENVRIplus Bootcamp

Are you working in a technical or leading role within an Environmental Research Infrastructure? Join us at Aalto University in Finland on March 28-29, 2017 to learn more about ORCID integrations and discuss best practices with colleagues from other Environmental Research Infrastructures.

Project THOR supports seamless integration between articles, data and researchers across the research lifecycle. ENVRIplus brings together Environmental and Earth System Research Infrastructures, projects and networks to create an interoperable cluster of Environmental Research Infrastructures across Europe. These two H2020 projects are joining forces by organising a bootcamp focused on ORCID integrations in Environmental Research Infrastructures.

The two day event offers a unique opportunity for knowledge exchange between persistent identification experts from THOR partner organisations (in particular ORCID and DataCite) and the managers, as well as developers, of Environmental Research Infrastructures, in particular ENVRIplus partners.

The bootcamp has a strong emphasis on ORCID integrations and will touch upon the specific challenges Environmental Research Infrastructures face with regard to such integrations. It also focuses on the technical aspects of implementing ORCID integrations, so in addition to infrastructure managers, we strongly encourage developers to participate.

On March 28, we will give an introduction to ORCID and its concepts, and demonstrate various types of integrations in live systems. The second day (March 29) is structured in two separate tracks: Research Infrastructure Developer and Research Infrastructure Manager. We still encourage participants to provide us with input on bootcamp topics; you can enter suggestions when you register for the bootcamp here. The preliminary agenda topics are included below:

Tuesday March 28

  • Introductions to ORCID and concepts
  • ORCID Integrations in THOR partner systems
  • ORCID Integrations in environmental research infrastructures
  • Challenges and opportunities at the different research infrastructures
  • Q & A and discussion: ORCID integration, Metadata, identification of co-authors, crosslinking PIDs etc.

Wednesday March 29

Parallel track 1: Hands-on exercises for RI Developer

  • Coding ORCID integrations
  • Mining ORCID data dump
  • ORCID iDs in research infrastructure metadata (e.g. SensorML)
  • PID linking and link information exchange
  • ORCID and the ENVRI Reference Model
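
The developer-track topics above lend themselves to small warm-up exercises. As an illustration of the kind of work covered under "Mining ORCID data dump", the sketch below validates the ISO 7064 MOD 11-2 check digit that every ORCID iD carries, which is a useful first filter when harvesting iDs from infrastructure metadata. The function names are ours for illustration, not part of any ORCID library; the iD shown is the commonly used example iD from ORCID's own documentation.

```python
def orcid_checksum(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Return True if a 16-character iD (with or without hyphens) passes the check."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_checksum(digits[:15]) == digits[15].upper()
```

For example, the documentation iD `0000-0002-1825-0097` passes this check, while any single-digit corruption of it fails, so the validator catches most transcription errors before an iD is ever sent to the ORCID API.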

Parallel track 2: Discussion/presentation sessions for RI Manager

  • Crosslinking data, author, publication
  • PIDs for instruments, platforms, deployments
  • Dynamic data identification
  • Cost of integrations
  • PIDs in workflows involving research infrastructures and e-infrastructures
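
To make the manager-track topic of crosslinking data, author and publication concrete, the sketch below shows a minimal metadata record whose field names follow the DataCite schema (`nameIdentifier` for the creator's ORCID iD, `relatedIdentifiers` for typed links to other DOIs). The DOIs are placeholders invented for illustration, and the helper function is ours, not a DataCite API.

```python
# A minimal DataCite-style metadata sketch. Field names follow the DataCite
# schema; the DOIs below are placeholders, not real records.
dataset_record = {
    "doi": "10.1234/example-dataset",
    "creators": [{
        "name": "Doe, Jane",
        "nameIdentifiers": [{
            "nameIdentifier": "https://orcid.org/0000-0002-1825-0097",
            "nameIdentifierScheme": "ORCID",
        }],
    }],
    # relatedIdentifiers express the typed links that community portals harvest
    # to connect a dataset with the article it supplements.
    "relatedIdentifiers": [{
        "relatedIdentifier": "10.1234/example-article",
        "relatedIdentifierType": "DOI",
        "relationType": "IsSupplementTo",
    }],
}

def linked_pids(record: dict) -> set:
    """Collect every PID the record points at: its own DOI, creator iDs, related DOIs."""
    pids = {record["doi"]}
    for creator in record.get("creators", []):
        pids.update(n["nameIdentifier"] for n in creator.get("nameIdentifiers", []))
    pids.update(r["relatedIdentifier"] for r in record.get("relatedIdentifiers", []))
    return pids
```

Walking records like this one and collecting their linked PIDs is, in miniature, how data, author and publication end up crosslinked in the PID graph.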

The THOR-ENVRIplus team is looking forward to seeing you in Finland!


Hasta la Vista, THOR Bootcamp

With local support from THOR ambassador Eva Mendez, the first THOR Bootcamp was successfully held at Universidad Carlos III de Madrid (UC3M) on November 16-18. The Bootcamp is part of THOR's outreach effort to engage and train local scholarly communication communities and further the adoption of PID services. The full set of slides is available on the THOR Knowledge Hub.

THOR colleagues from different partner organizations and guest speakers from local research organizations joined forces to present a full curriculum on PID topics, from existing tools and services to technical and policy implementation. The event attracted more than 130 registrants in total and yielded valuable experience for both the attendees and the THOR project.

The Bootcamp consisted of three modules, each tailored to a different audience. The first half-day was organized as an integral part of research training for Ph.D. students and other young researchers at UC3M, focusing on Open Science recommendations and the incorporation of PIDs into existing research workflows. The students came from a range of disciplinary backgrounds and brought distinct questions with them, so the Bootcamp provided a great opportunity for us to engage the young researchers' community and address their concerns directly.

“I consider the instruments presented along with the seminar an extremely powerful way to collect, share, exploit and advertise the work of a researcher in a way which is mostly new and free from older constraints. The value of the research itself is so, enhanced and collaborations are made way easier in benefits of the results.”

— Rocco Bombardieri, Ph.D. student at UC3M

bootcamp1
Ph.D. Students attending THOR Bootcamp at UC3M

The second day was reserved for local information professionals and research data service stakeholders (librarians, researchers, research administrators and policy makers). It followed an intense schedule of talks and a mini-panel in which ORCID, DataCite and CERN shared their service implementation experience.

bootcamp2
Local information professionals at THOR Bootcamp General Day at UC3M

The final half-day offered a more technical tutorial. This self-contained programming module enabled participants to build a metrics dashboard that visualized data interactively, based on the technology used in the THOR dashboard. The hands-on session was designed for technical and non-technical attendees alike, and it was great to see how people from a variety of backgrounds approached the tutorial and contributed to the ensuing discussion.

bootcamp3
Instructors of the Hands-On Day, Ioannis Tsanaktsidis (left) and Kristian Garza (right).

We aim to establish ties with research organizations and institutions by providing tailored PID content via the Bootcamp series — two more Bootcamps will be held in March and May next year (2017). Stay tuned to find out if we are coming to your neighborhood soon! Or better yet, if you want to organize your own Bootcamp, sign up to be an ambassador and we will provide all the materials that are ready to be reused, plus event planning tips for bringing your local community up to speed with PIDs.

THOR at PIDapalooza

If November taught us anything, it’s that open identifiers clearly do deserve their own festival. On 9th and 10th November 2016, people from all over the world gathered in Reykjavik to share PID stories, demos, use cases, victories, horror stories, and new frontiers at PIDapalooza, the first conference dedicated to PIDs. The THOR team travelled to the country of glaciers and volcanoes to talk about project identifiers, persistent identifiers for instruments, PIDagogy and measuring PID adoption.

PIDs for Projects

Martin Fenner (DataCite) and Tom Demeranville (ORCID) presented their work on project identifiers to a full house. They proposed that project IDs should be used to link participants, outputs and funding. But which identifiers are best suited to describing projects? That was left open for discussion – a discussion that quickly turned heated. What, even, is the exact definition of a project? What would persist once a project ends? Would researchers be willing to share the information needed for the project ID? How would we describe the metadata, given that a project does not have a publication date? Clearly more research is needed to answer these important questions. Keep an eye out for the announcement of a THOR webinar on project identifiers in early 2017, in which we will resume this discussion.

tom

Tom Demeranville leading the discussion on PIDs for projects

Persistent Identification of Instruments

Markus Stocker (PANGAEA) continued to explore new frontiers with a presentation on PIDs for instruments, instrument platforms and their deployments. Beyond enabling unambiguous identification of these entities, and reference to them in articles and other research artefacts, Markus argued that preserving metadata about them is critical if researchers are to judge the fitness of observation data for reuse. He presented two examples of systems that already assign DOIs to deployments and platforms. A key challenge for the community is deciding on the metadata required for preservation.

markus-2

Twitter activity during Markus Stocker’s presentation on PIDs for instruments

The Human Perspective

Building the technical infrastructure for open research was a clear theme at the conference, but how do we move from infrastructure to adoption? How do you teach, learn, persuade, discuss and grow the uptake of PIDs in everyday research practice? My presentation showcased the contribution that the THOR ambassador network is making to the human infrastructure around PIDs. By organising training activities within their own communities and sharing training materials, THOR ambassadors are helping to overcome the cultural barriers to PID adoption. These forms of collaboration are not only critical between THOR partners and ambassadors, but need to extend to other organisations and projects in order to integrate PIDagogy within the Research Data Science Curriculum. The importance of communication was also reiterated in other sessions on PIDagogy, in which participants designed infographics to promote and explain PIDs to different stakeholder groups. These materials will be developed further and made available for the community to (re-)use.

discussion-2

PIDapalooza crowd developing videos, infographics and quizzes for PID adoption

Challenges of Measuring PID Adoption

Salvatore Mele (CERN) discussed the challenges of measuring PID adoption. THOR has already developed a comprehensive dashboard, which shows ORCID and DOI uptake over time. But the ways in which we evaluate and interpret the results remain open for discussion. Salvatore explained that it is difficult not to get philosophical when talking about measurement of PID uptake. What information is missing? What do we not (yet) know? And what further steps can we take to know the unknowable?

img_0798

Salvatore Mele explaining the THOR Dashboard

PIDapalooza definitely generated as many questions for THOR as we brought to the table. Participating and presenting at this event was a great opportunity for the team to discuss ideas and spark further research and future collaboration, complementing the PID frontiers already being explored by other organisations. And yes, THOR definitely believes identifiers deserve their own festival and is looking forward to PIDapalooza 2017!

Want to know more about PIDapalooza?