Identifying Interpretation

Scientific research infrastructures collect large quantities of values. Values are typically numbers that result from observation, experiment, or computation. For instance, plant scientists collect values that result from observing fluxes of carbon dioxide at the leaf-atmosphere boundary; high-energy physicists collect values that result from observing collisions of subatomic particles; social scientists observe the interactions of human populations and individuals, collecting both qualitative and quantitative values.

The interpretation of values is central to research investigations. Through interpretation, values are given meaning in the context of an investigation. The result of interpretation activities is information, and research infrastructures integrate this information into existing bodies of knowledge. Research infrastructures are therefore knowledge infrastructures, or “robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds” (Borgman, 2015).

At the International Workshop on Reproducible Science, we presented the possibility of aggregating machine-readable information in Research Objects. We proposed to extend a Research Object Model (Belhajjame et al., 2012) with a new Resource type called Interpretation. Existing Resource types include Dataset, Software, and Paper. In our proposal, machine-readable interpretations are additional research artefacts created in scientific investigations. Research Objects thus also capture the interpretations given to observational, experimental, or computational values in research investigations.
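The proposed extension can be pictured as a small data structure. The following is only a minimal sketch, not the Research Object Model itself: the class names `ResourceType`, `Resource`, and `ResearchObject`, the method names, and the example identifiers are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ResourceType(Enum):
    """Resource types named in the text, plus the proposed Interpretation."""
    DATASET = "Dataset"
    SOFTWARE = "Software"
    PAPER = "Paper"
    INTERPRETATION = "Interpretation"  # the proposed new type


@dataclass
class Resource:
    identifier: str              # hypothetical local identifier
    resource_type: ResourceType
    description: str = ""


@dataclass
class ResearchObject:
    identifier: str
    resources: List[Resource] = field(default_factory=list)

    def aggregate(self, resource: Resource) -> None:
        """Add a research artefact to this Research Object."""
        self.resources.append(resource)

    def of_type(self, resource_type: ResourceType) -> List[Resource]:
        """Return all aggregated resources of the given type."""
        return [r for r in self.resources if r.resource_type is resource_type]


# Usage: an investigation aggregating a dataset and an interpretation of it.
ro = ResearchObject("ro-example")
ro.aggregate(Resource("flux-values", ResourceType.DATASET,
                      "Observed carbon dioxide fluxes"))
ro.aggregate(Resource("flux-interpretation", ResourceType.INTERPRETATION,
                      "Meaning given to the observed fluxes"))
```

Treating Interpretation as one more value of the same type enumeration reflects the idea that interpretations are research artefacts alongside datasets, software, and papers.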

Just as with other research artefacts, interpretations could be identified unambiguously, persistently, and globally. DataCite digital object identifiers (DOIs) could be used to enable unambiguous reference to interpretations and resolution to human- and machine-readable interpretation descriptions. The approach would also enable the citation of interpretations, and thus the recognition of contributions toward interpretations. Cross-linking interpretations with the ORCID iDs of contributors would enable unambiguous attribution.
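As an illustration of such cross-linking, the sketch below pairs a DOI with contributor ORCID iDs and rejects malformed iDs using the ISO 7064 MOD 11-2 check character that ORCID iDs carry. The `InterpretationRecord` class and its fields are hypothetical, the DOI is a placeholder, and no DataCite or ORCID service is contacted.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Four hyphen-separated blocks; the last character may be the check value X.
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and the check character of an ORCID iD."""
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_character(digits[:15]) == digits[15]


@dataclass
class InterpretationRecord:
    """Hypothetical description of an identified interpretation."""
    doi: str
    title: str
    contributor_orcids: List[str] = field(default_factory=list)

    def add_contributor(self, orcid: str) -> None:
        if not is_valid_orcid(orcid):
            raise ValueError("not a valid ORCID iD: " + orcid)
        self.contributor_orcids.append(orcid)

    def doi_url(self) -> str:
        """Resolvable form of the DOI via the doi.org resolver."""
        return "https://doi.org/" + self.doi


# Usage with a placeholder DOI and the iD ORCID uses for a fictitious researcher.
record = InterpretationRecord("10.5555/example-interpretation",
                              "Interpretation of observed flux values")
record.add_contributor("0000-0002-1825-0097")
```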

Between the numerical values and the abstract, high-level information reported in scientific articles, the primary information obtained by interpretation is generally refined into secondary and tertiary information. For example, primary information about individual events occurring in the environment, such as event date, location, and duration, may be refined into secondary information about the seasonal mean event duration, and then into tertiary information about the statistical significance of differences in seasonal mean duration. Curating such information and its provenance arguably supports the reproducibility of scientific investigations, from numerical values to the natural language text in scientific articles. Advanced knowledge infrastructures may increasingly capture, curate, and provide access to such information in standardised and unambiguously identified form.
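The refinement chain in the example can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a method taken from the work described here: the `Event` type, the meteorological northern-hemisphere season mapping, the made-up event data, and the choice of an exact permutation test for significance are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from statistics import mean


@dataclass(frozen=True)
class Event:
    """Primary information about one event (hypothetical structure)."""
    day: date
    location: str
    duration_h: float


def season_of(day: date) -> str:
    # Assumption: meteorological seasons, northern hemisphere.
    seasons = ["winter", "spring", "summer", "autumn"]
    return seasons[(day.month % 12) // 3]


def seasonal_mean_duration(events):
    """Secondary information: mean event duration per season."""
    by_season = {}
    for e in events:
        by_season.setdefault(season_of(e.day), []).append(e.duration_h)
    return {season: mean(values) for season, values in by_season.items()}


def permutation_p_value(a, b):
    """Tertiary information: exact two-sided permutation test on the
    difference of means between two groups of durations."""
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n = len(pooled)
    extreme = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Made-up primary information for illustration only.
events = [
    Event(date(2015, 4, 3), "site-A", 2.0),
    Event(date(2015, 4, 20), "site-A", 3.0),
    Event(date(2015, 5, 11), "site-A", 2.5),
    Event(date(2015, 7, 2), "site-A", 5.0),
    Event(date(2015, 7, 19), "site-A", 6.0),
    Event(date(2015, 8, 8), "site-A", 5.5),
]
means = seasonal_mean_duration(events)
spring = [e.duration_h for e in events if season_of(e.day) == "spring"]
summer = [e.duration_h for e in events if season_of(e.day) == "summer"]
p_value = permutation_p_value(spring, summer)
```

Keeping each refinement step as a separate function mirrors the point about provenance: every piece of secondary or tertiary information can be traced back to the primary information, and the procedure, that produced it.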

Belhajjame, K., et al. (2012). Workflow-Centric Research Objects: A First Class Citizen in the Scholarly Discourse. In Proceedings of the ESWC2012 Workshop on the Future of Scholarly Communication in the Semantic Web (SePublica2012), Heraklion, Greece.
Borgman, C.L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA: MIT Press. ISBN 978-0-262-02856-1.