Preserving research data by the end of a project

Archiving and publishing research data

When reaching the end of a research project, an important activity is to preserve research data for the long term. Increasingly, archiving and publishing research data is a requirement from research funders and national governments.

For instance, The Research Council of Norway, the European Research Council, and the Norwegian Ministry of Education and Research expect open access to research data under the principle “as open as possible, as closed as necessary”.

Aligned with these expectations, the guidelines for the management of research data at Nord University state that, as a general rule, researchers shall make research data openly accessible for further use for all relevant users, except when there are legal, ethical, security-related or commercial reasons for not doing so. When it is possible to preserve your data, an important decision concerns where you will keep your data. Data preservation can be achieved in two main ways:

  • Via journals' supplementary material services

  • Via different types of data repositories (e.g., domain-specific, institutional, general purpose). Using data repositories is generally a better strategy than using journals’ supplementary material services, as journals may claim copyright over the data, and may keep the data closed behind a subscription wall. Data repositories facilitate access to your data and allow data to be linked to publications through a data citation.

Source: CESSDA Data Management Expert Guide

  • Consider these options when choosing a data repository. If in doubt, contact research-data@nord.no for guidance on how to choose a suitable repository for your data.

    • Research funders may require a specific repository for your data. In such cases, the indicated repository must be used. For instance, for projects within the social sciences, humanities, medicine and health, The Research Council of Norway may require researchers to archive data in the repository from Sikt.

    • Consider a trusted domain-specific data repository. If a trusted domain-specific data repository is already established in your research field, it is a good strategy to make use of it. This type of repository is likely to concentrate much of the research data being produced in a field, and data deposited in this type of repository is likely to have an increased impact. This guide from Springer Nature may help you find a suitable repository in your field.

    • As an employee at Nord University, you can use the institutional repository Nord University's collection in DataverseNO. This is a multi-disciplinary, CoreTrustSeal certified repository for open research data. Submitted data are curated, and published data will adhere to the best practices regarding e.g. data documentation and preferred file formats.

    • If the research data contain information that can identify individuals, the data can be archived at Sikt. Two important notes: (1) When processing personal data, GDPR must always be followed. Personal data cannot be stored longer than is necessary to achieve the purposes for which they are processed. Therefore, all personal data must be deleted or anonymized at the end of the project. In special circumstances, e.g. where the data have high re-use value, it may be possible to preserve research data containing personal data even after the project has ended. However, this requires consent from study participants. In addition, you must state this when you report the project to Sikt so that they can provide their assessment. If the project requires approval from REK, the approval must also cover the archiving of personal data beyond the project period. (2) GDPR does not apply to anonymous data. This means that when a data set is completely anonymized, it can be preserved for the long term in open archives (e.g. in NORD Open Research Data). A research ethics assessment must still be carried out in this case.

    • Uninett Sigma2 Research Data Archive (NIRD) may be suitable for large data sets produced through Uninett Sigma2's services for heavy computing.

    • Consider general-purpose repositories such as Zenodo or Figshare. Remember to check their terms of use and whether they have data publishing charges. If charges are involved, plan ahead how your project will cover such expenses.

    • If none of the options above apply, search re3data.org to discover other data repositories. You can also contact research-data@nord.no for guidance on how to choose a suitable repository for your data.

    Further information on how to choose a repository can also be found on OpenAIRE and Open Science Toolbox.

  • According to the guidelines for management of research data at Nord University, research data that is free of legal, ethical, security-related or commercial issues shall be made available as early as possible. Further, the guidelines state that:

    • Data providing the basis for scientific publications shall be made available no later than at the time of publication;

    • Other data that may be of interest to other research projects should be made available within a reasonable period of time, and never later than three years after the completion of a project.
  • For data that is free of legal, ethical, security-related or commercial issues, there is still the question of which portion of the data will be selected for preservation. It is good practice to set criteria for data preservation early in a research project. The Digital Curation Centre (DCC) offers some pointers in this respect:

    • When it comes to data underpinning a scientific publication, it is important to enable others to understand and reproduce the process of how your research results have been achieved on the basis of your data. In this case, the data to be preserved are the data used in the study, along with proper documentation.

    • For other data that may be of interest to others, a general recommendation is to be as inclusive as possible. An inclusive strategy is appropriate because there are many potential future uses for research data, and some of these uses may not be clear during a project. Consider, for instance, the following potential uses for research data:

      • Potential for further analysis: opportunities for further analysis of the data, e.g. using new methods or integration with other sources of data.
      • Potential for further research: data can lead to new research questions, complementing the original study from which they originated.
      • Research community development: data of value to a known user group.
      • Learning and teaching: research data can also be valuable for learning/teaching purposes and/or for public engagement.
      • Potential for collaboration: research data that is discoverable can also foster collaboration with various stakeholders in future projects, e.g. other research groups, industrial partners, and public organizations.
  • When you make use of a data repository, you need to choose a license for your data. A license specifies what users are allowed to do with the data. A good practice is to choose a standard license. Creative Commons licenses are generally recommended because they are widely adopted, human- and machine-readable, and easy to use and understand.

    Following scientific community practices for research data, Nord University’s institutional repository, Nord University's collection in DataverseNO, uses the CC0 "Public Domain Dedication" license by default. CC0 allows data to be distributed, copied, re-formatted, and integrated into new research, without legal impediments.

    “CC0 enables scientists, educators, artists and other creators and owners of copyright- or database-protected content to waive those interests in their works and thereby place them as completely as possible in the public domain, so that others may freely build upon, enhance and reuse the works for any purposes without restriction under copyright or database law.” Source: Creative Commons.

    In addition, CC0 avoids attribution stacking, that is, the difficulty of attributing datasets correctly when combining data from many different sources released under different licenses.

    It is important to note that although the CC0 license does not legally require attribution to the author, research ethics guidelines and norms still apply. In the same way as with academic literature, researchers must always acknowledge their data sources and cite data in their research outputs. If CC0 is not applicable to your data, contact research-data@nord.no for guidance on using other Creative Commons licenses.
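The timing rules from the Nord University guidelines described earlier in this list (data underpinning a publication: no later than the time of publication; other data: never later than three years after project completion) can be expressed as a small date calculation. The following is a minimal sketch in Python. The function name and its parameters are hypothetical and not part of any Nord University or repository tool, and the sketch models only the outer limits, not the expectation to share "as early as possible" or "within a reasonable period of time".

```python
from datetime import date
from typing import Optional


def latest_release_date(project_end: date,
                        publication_date: Optional[date] = None) -> date:
    """Return the latest date by which data should be made available.

    Assumption: if the data underpin a publication, pass its publication
    date; otherwise the three-year limit after project completion applies.
    """
    if publication_date is not None:
        # Data behind a publication: no later than the time of publication.
        return publication_date
    try:
        # Other data: never later than three years after project completion.
        return project_end.replace(year=project_end.year + 3)
    except ValueError:
        # A project ending on 29 February has no matching day three years
        # later, so fall back to 28 February.
        return project_end.replace(year=project_end.year + 3, day=28)


# Example with made-up dates: a project ending on 30 June 2024.
print(latest_release_date(date(2024, 6, 30)))
```

A funder or repository may set stricter deadlines than these, in which case the stricter deadline applies.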

Why preserve research data?

By archiving and publishing research data, others can find and access your data later on for various purposes (e.g., consultation, verification, and re-use). In this respect, a best practice is to archive the data in a trustworthy repository with adequate documentation and in suitable file formats.

Also, data publication increases the recognition of scholarly work. Open research data contributes to higher visibility of researchers’ work, and to a higher research impact. The positive impact of data publication is illustrated in Colavizza et al. (2020), who found that scientific publications for which the underlying data is available receive more citations on average.

Metadata

Metadata is information that describes and gives meaning to data. It helps others find, understand, and re-use a dataset. Examples of metadata include the origin, time, location, creator(s), terms of use, and access conditions of a dataset. For more information on metadata, see this guide from CESSDA.
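To make the idea concrete, the example elements listed above can be written down as a simple record and checked for gaps before submission. The following is a minimal sketch in Python. The field names, the example values, and the helper function are hypothetical illustrations and do not follow any particular repository's metadata schema.

```python
from typing import Dict, List

# Illustrative field names based on the examples above; a real repository
# defines its own schema and required fields.
REQUIRED_FIELDS = (
    "origin",
    "time",
    "location",
    "creators",
    "terms_of_use",
    "access_conditions",
)


def missing_fields(record: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up example record with one field left empty.
example_record = {
    "origin": "Questionnaire responses collected in a hypothetical project",
    "time": "2023-01 to 2023-06",
    "location": "Norway",
    "creators": ["Example Researcher"],
    "terms_of_use": "CC0 1.0",
    "access_conditions": "",
}

print(missing_fields(example_record))  # prints ['access_conditions']
```

A check like this only confirms that each element has been filled in; whether the descriptions are adequate for others to find, understand, and re-use the dataset still requires human judgment.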

When you submit your data to a repository, you will be requested to provide information about your data in a standard format (i.e., metadata), often together with a README file that serves as a guide to your dataset.

When using Nord University's collection in DataverseNO, you will be asked to provide as much information as possible about your data; see this guide on metadata. Additionally, you will be asked to generate a README file with detailed information on your dataset, with the purpose of guiding others in interpreting, understanding, and re-using your data; see this guide on README files.

TIP: If you have used a README file template from DataverseNO for data documentation throughout your research project, your data archiving process will be easier and faster.