Management of research data over the course of a project

Storage and sharing of active research data

Decisions regarding the storage and sharing of active research data (i.e., data being collected and analyzed over the course of a project) are important and should be taken in the initial phases of your research project. It is also important to reflect on the risks and potential consequences of data loss and unauthorized data access and to take steps to avoid these. Keep in mind, however, that there may be a need to reconsider these decisions as you progress with your project.

If the research data includes personal information, the routine for privacy in research must be followed.

The IT department has guidelines on how to store and share active research data:

- See «Retningslinjer for klassifisering av informasjon ved Nord universitet» for information on how to classify data based on their sensitivity level and criticality.
- See also «Veiledning klassifisering av filer og epost» for a step-by-step guide on how to classify and protect your files and e-mails.
- Contact the Information Security Adviser (Per Gustav Gården: per.garden@nord.no) and the Data Protection Officer (Toril Irene Kringen: toril.i.kringen@nord.no) at Nord University for help with the classification of your research data.
- «Lagring av forskningsdata ved Nord universitet» offers guidance on where to store your active research data according to their sensitivity level and with whom the data are shared (i.e., internal or external sharing).
- «Lagringsguide for Nord Universitet» describes which tools and storage platforms can be used for a given sensitivity level of the information.
- TSD is a platform where sensitive research data can be collected, stored and analyzed in a secure environment.
- Nord has an agreement for the use of TSD, which is delivered by the University of Oslo (UiO). TSD meets the requirements for the processing and storage of sensitive research data and offers a full set of services, from data collection to analysis, processing, and storage, all in a secure environment.
- TSD is used for data classified as Strictly confidential information (black data) or for large amounts of Confidential information (red data), which may cause significant harm to public interests, the university, individuals or others if the information becomes known to unauthorized persons.
- UiO offers support and a user guide for TSD.
- TSD has an annual price per project. See TSD prices here.
- Internal course material about TSD is available here.

Documentation of research data

Consistent data documentation throughout a research project is good practice. The main goal is that your future self and others can understand your data. Here are some basic recommendations regarding data documentation:

- Start early and use your data management plan to get started.
- Consider which information is necessary to understand your data, and document consistently throughout your project.
- Create a documentation file for each of your datasets, and provide enough information about your data so that they are understandable. Remember to clearly identify your documentation files so that there is a clear connection between them and the corresponding datasets. Consider one of these README file templates from DataverseNO for data documentation.
- Consider your archiving options from the start of your project.

Source: based on CESSDA
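
One way to keep a clear connection between a dataset and its documentation file is to derive the documentation file name from the dataset name. The sketch below assumes a hypothetical "_README.txt" suffix and a handful of placeholder headings; it does not reproduce the DataverseNO templates, so replace the headings with those of the template you choose.

```python
from pathlib import Path

# Placeholder headings (assumption); replace them with the fields of the
# README template you decide to use.
STUB_HEADINGS = [
    "Title",
    "Creator",
    "Date of collection",
    "Description of files",
    "Variables and units",
]


def readme_path_for(dataset: Path) -> Path:
    """Return the documentation file path that belongs to a dataset file."""
    return dataset.with_name(f"{dataset.stem}_README.txt")


def write_readme_stub(dataset: Path) -> Path:
    """Create an empty documentation file next to the dataset."""
    readme = readme_path_for(dataset)
    lines = [f"README for {dataset.name}", ""]
    lines += [f"{heading}:" for heading in STUB_HEADINGS]
    # Mode "x" refuses to overwrite documentation that already exists.
    with open(readme, "x", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return readme
```

Calling write_readme_stub(Path("survey.csv")) would create survey_README.txt in the same folder as the dataset, ready to be filled in.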

Manage copies and versions of your data

Good practices include the following:

- Keep your raw data untouched, stored in an appropriate location, and clearly identified as raw data. Use copies of the raw data to run your analyses. An important reason for keeping your raw data untouched is that you may need to share them with a journal in a peer-review process. Reviewers may ask for your raw data so that they can reproduce your analyses and confirm your results.
- Select one place for your master copies. Copies made in other places should be temporary and should be moved back to, or synchronized with, the master copies regularly.
- Set up a strategy for version control to keep track of changes in your data. A simple strategy is to include a version number at the end of your file names, for example V1.0, V1.1, V2.0, and V3.0, where the number before the decimal point indicates major changes and the number after it indicates minor changes. What constitutes a major or minor change can vary according to the nature of a dataset.

Source: based on the guide "Storing and preserving data" by Utrecht University
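
The version numbering strategy described above can be sketched as a small helper that works out the next file name. This is only an illustration: the function name bump_version and the "_V" separator before the version number are assumptions, and the helper expects a plain file name with an extension (such as .csv).

```python
import re
from pathlib import Path

# Assumed convention: a "_V<major>.<minor>" suffix just before the extension.
VERSION_SUFFIX = re.compile(r"_V(\d+)\.(\d+)$")


def bump_version(file_name: str, major_change: bool = False) -> str:
    """Return the file name with its version number raised by one step."""
    path = Path(file_name)
    match = VERSION_SUFFIX.search(path.stem)
    if match is None:
        # No version suffix yet: start numbering at V1.0.
        return f"{path.stem}_V1.0{path.suffix}"
    major, minor = int(match.group(1)), int(match.group(2))
    if major_change:
        # A major change raises the first number and resets the second.
        major, minor = major + 1, 0
    else:
        minor += 1
    base = path.stem[: match.start()]
    return f"{base}_V{major}.{minor}{path.suffix}"


print(bump_version("survey_V1.0.csv"))                     # survey_V1.1.csv
print(bump_version("survey_V1.1.csv", major_change=True))  # survey_V2.0.csv
```

The helper only computes a name. Saving the new version under that name, and leaving earlier versions in place, remains a manual decision.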

Folder structure and file name

Regarding folder structure, two general recommendations are:

- Find a balance between a flat and a deep structure. In a flat structure with very few levels, you will likely have files with little in common in the same place. In a deep structure with too many levels, you will likely end up with fragmented information and will use more time than needed to find your files.
- Revisit your folder structure throughout your project in order to fine-tune it. It is likely that your project will have changing needs in this respect. Therefore, plan ahead to have a flexible and scalable structure so that you can rearrange or expand it without having to redesign it completely.

Source: based on the guide "Storing and preserving data" by Utrecht University
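
When revisiting a folder structure, two simple numbers can indicate whether it leans towards too deep or too flat: the deepest folder level and the largest number of files in a single folder. The sketch below computes both; the function name folder_stats is hypothetical, and what counts as "too many" levels or files is left to your own judgment.

```python
import os
from pathlib import Path


def folder_stats(root: Path) -> tuple[int, int]:
    """Return (deepest level below root, most files found in one folder)."""
    max_depth = 0
    max_files = 0
    for current, _dirs, files in os.walk(root):
        # Depth is the number of folder levels between root and this folder.
        depth = len(Path(current).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        max_files = max(max_files, len(files))
    return max_depth, max_files
```

A high first number suggests a deep structure where information may be fragmented; a high second number suggests a flat structure where unrelated files share one folder.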

Regarding file naming, some fundamental recommendations are:

- Files must be named consistently.
- File names must be descriptive, but short (< 25 characters).
- Do not use spaces. Instead, use underscores (e.g. first_study), hyphens (e.g. first-study) or camel case (e.g. FirstStudy).
- Avoid characters like \ / ? : * " > < | # % { } ^ [ ] ` ~ æÆ øØ åÅ äÄ öÖ …
- Use the international dating convention YYYY-MM-DD.

Source: DataverseNO
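
The file naming recommendations above can be checked automatically before files are shared or archived. The following is a minimal sketch under stated assumptions: the function name check_file_name is hypothetical, the length is counted including the file extension, only unaccented letters, digits, underscores, hyphens and dots are accepted (stricter than the list of characters to avoid), and the date check only catches dates written as DD-MM-YYYY or DD.MM.YYYY.

```python
import re
import string

MAX_LENGTH = 25  # the recommendation is fewer than 25 characters
# Assumption: accept only these characters, which is stricter than the
# list of characters to avoid given above.
ALLOWED_CHARS = set(string.ascii_letters + string.digits + "_-.")
# Rough pattern for dates written day first (DD-MM-YYYY or DD.MM.YYYY).
DAY_FIRST_DATE = re.compile(r"(?<!\d)\d{2}[-.]\d{2}[-.]\d{4}(?!\d)")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("contains spaces")
    unusual = sorted({ch for ch in name if ch not in ALLOWED_CHARS and ch != " "})
    if unusual:
        problems.append("contains characters to avoid: " + " ".join(unusual))
    if DAY_FIRST_DATE.search(name):
        problems.append("date is not written as YYYY-MM-DD")
    return problems


print(check_file_name("2024-03-15_survey.csv"))  # []
print(check_file_name("first study.csv"))        # ['contains spaces']
```

Consistency across files cannot be judged from a single name, so that recommendation still needs a manual review.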

File formats

Throughout your project, you will likely use file formats such as Word (.docx) and Excel (.xlsx). However, as you approach project completion and plan to archive your data, you may want to create a copy of your data files in a format with the following characteristics:

- non-proprietary;
- open, with documented international standards;
- using standard character encoding, preferably Unicode (e.g. UTF-8);
- uncompressed (space permitting).

Doing so will enable continued access to your data, by yourself and others, as file formats with the above-mentioned characteristics are more likely to allow long-term readability. For a full list of preferred file formats, see DataverseNO deposit guidelines.
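
For plain-text data files (such as CSV) that were saved in an older character encoding, an archive copy in UTF-8 can be created as sketched below. This is an illustration only: the function name save_utf8_copy is hypothetical, and you need to know the encoding of the original file (the usage example assumes Windows-1252, written "cp1252" in Python). The original file is left unchanged.

```python
from pathlib import Path


def save_utf8_copy(source: Path, target: Path, source_encoding: str) -> None:
    """Write a UTF-8 encoded copy of a text file, keeping the original."""
    # newline="" keeps the line endings exactly as they are in the source.
    with open(source, encoding=source_encoding, newline="") as src:
        text = src.read()
    # Mode "x" refuses to overwrite an existing target file.
    with open(target, "x", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

A call such as save_utf8_copy(Path("results.csv"), Path("results_utf8.csv"), "cp1252") would produce the UTF-8 copy. If the stated source encoding is wrong, the copy may contain garbled characters, so it is worth opening the result and checking letters such as æ, ø and å.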