Reuse of Research Data: The Shape of Things to Come

Open Science needs incentives. Tracking data and software citation can be one of them.

To facilitate that, THOR strives to improve the usage of persistent identifiers for all scholarly objects. Tracking citations has proven difficult since there has been little reuse so far, but a first step in that direction can be improving the visibility and findability of data and software.

The CERN Open Data portal publishes data and code and assigns DOIs to them. The boundary conditions for that publishing process are a bit “special”, as the data and code can be fairly complex and large (e.g. the last big release was about 320 TB). Recently, this activity reached an important milestone with the first independent research paper based on the opened data, ‘Exposing the QCD Splitting Function with CMS Open Data’. The publication of this paper highlights the potential of sharing data publicly and the benefits of enabling reproducible research.

The article reference list includes the recommended data citation along with the DOI, so that community portals can track the connections and impact of the data and software. Hint: check reference 72!
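To illustrate how a DOI in a reference list makes this kind of tracking possible, here is a minimal Python sketch. It is not how any particular portal works: the function names, the sample references and the `10.1234/...` DOIs are all hypothetical, and the regular expression is a loose heuristic for spotting DOIs rather than a validator.

```python
import re

# Loose heuristic for DOIs ("10.", a registrant prefix, "/", a suffix).
# This is an assumption for illustration, not an official DOI grammar.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(reference):
    """Return the DOIs found in one reference string.

    DOIs are case-insensitive, so they are lowercased for comparison,
    and trailing sentence punctuation is stripped.
    """
    return [match.rstrip(".,;)").lower() for match in DOI_PATTERN.findall(reference)]


def find_data_citations(references, tracked_dois):
    """Map reference number (1-based) to the tracked DOIs cited there."""
    tracked = {doi.lower() for doi in tracked_dois}
    hits = {}
    for number, reference in enumerate(references, start=1):
        matched = [doi for doi in extract_dois(reference) if doi in tracked]
        if matched:
            hits[number] = matched
    return hits


# Hypothetical reference list and dataset DOI, invented for this example.
references = [
    "A. Author, An example journal article, J. Example 1 (2017) 1, doi:10.1234/example.article.",
    "Example Collaboration, Example open dataset, Example Portal (2016), DOI:10.1234/EXAMPLE.DATASET.0001.",
]
tracked_dois = ["10.1234/example.dataset.0001"]

hits = find_data_citations(references, tracked_dois)
print(hits)
```

Because the match is made on the identifier rather than on the title or author string, the link between paper and dataset survives differences in citation style, which is the practical argument for always including the DOI in the recommended citation.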

It is very important to create more milestones like this: share data and code, and encourage reuse, citation and attribution of credit for the hard work that goes into data production and software development. We are only at the beginning of an interesting path.