
A Guide to Working with Integrity as a CBS Student: Research Data Management

Research Data Management

Research data management (RDM) is the practice of organizing and managing your data throughout a research project. As a graduate student you may wonder why RDM is relevant for you. The answer is simple: RDM is an essential element of the craft of research that you are introduced to during your studies at CBS. Whether you are doing qualitative or quantitative research, writing a literature review or a theory discussion, inevitably you will be creating digital files that need to be organized and managed. If you do not adhere to basic data management principles, you risk doing double work, wasting time searching for the latest version of a file or even losing all your work. 

 

 

 

RDM ensures that your research data are well organized, documented, backed up, and securely stored throughout their entire lifecycle. 

Research Data Lifecycle

 

#Planning phase:
In this phase you plan the research design and how to manage your data. You create an overview of the kinds of datasets you will be generating, investigate existing data sources, and consider how to collect, process, analyze, and share data during and possibly after the project. You think about a useful folder structure for your project, a file naming strategy, and how to store and back up your data.

    
#Active phase:
In this phase you process, check, validate, and clean your data. You transcribe audio files and analyze the transcripts, pseudonymizing or anonymizing them whenever relevant (see the section on "Personal data & GDPR"). It is important to clearly describe and document all data processing steps. Safe storage and backup are paramount.

 
#Results sharing phase:
When you are about to wrap up your research project, you need to consider what data to delete (and when to delete them) and whether to publish the data, in full or in part. Published data can be re-used by others, for instance for follow-up research, research reviews, or for training and teaching purposes.

 

Basic Research Data Management

There are plenty of guides on good RDM practices written for experienced researchers. However, as you are still in the middle of your academic training, you only need to focus on the most important RDM practices. These involve:

1) Organizing your data
2) Saving your raw data
3) Documenting your data
4) Storing and backing up your data
5) Securing your data
 

#1 Organizing your data

Do any of these questions sound familiar?

A few simple techniques can help you overcome these challenges, so that you never have to redo an analysis just because you have misplaced a file.
 

Folder structure

  • Think about the folder structure for your project upfront. If you have different projects, create different folders for each project and create relevant subfolders.
  • The folder structure should be logical and intuitive for you and your project group. 
  • Avoid very deep folder structures since they can be difficult to navigate (more than five levels are considered a deep folder structure).
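
The points above can be sketched in a few lines of Python. This is only an illustration, not a CBS standard: the subfolder names and the functions `create_project_folders` and `too_deep` are hypothetical, and the five-level limit is simply taken from the guideline above.

```python
from pathlib import Path

# Hypothetical subfolder names; adapt them to your own project.
SUBFOLDERS = ["01_raw_data", "02_working_data", "03_analysis", "04_writing"]

# The guideline above treats more than five levels as a deep structure.
MAX_DEPTH = 5


def create_project_folders(root: Path, subfolders=SUBFOLDERS) -> list[Path]:
    """Create the project folder with its subfolders and return their paths."""
    created = []
    for name in subfolders:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def too_deep(root: Path, max_depth: int = MAX_DEPTH) -> list[Path]:
    """Return all folders nested more than max_depth levels below root."""
    return [
        path
        for path in root.rglob("*")
        if path.is_dir() and len(path.relative_to(root).parts) > max_depth
    ]
```

Calling `create_project_folders(Path("thesis_project"))` sets up the skeleton once, and `too_deep(Path("thesis_project"))` can be run now and then to spot folders that have become hard to navigate.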


File name

  • Create a system for naming files and folders. Start simple and create a couple of rules for naming your files. Keep in mind that there is no general set of rules that will work for everyone. What matters is that you have rules that you will be able to stick to in a consistent way.  
  • Keep file and folder names as short as possible but still meaningful. A maximum of about 25 characters is a common rule of thumb.
  • Formatting dates as YYYYMMDD or YYYY-MM-DD allows you to sort and search your files.
  • Use underscores _ instead of spaces.
  • Avoid using special characters such as % & / \ : ; * . ? < > ^ ! " ( ) and Scandinavian letters such as æ, ø, and å.
  • Be very careful NEVER to use personal data when naming files and folders.
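
If you want to apply such rules consistently, a small helper can build the file names for you. The sketch below is one possible set of rules, not the only valid one: the function `make_file_name` is hypothetical, it applies the 25-character guideline to the name without the extension, and it keeps only unaccented letters, digits, and underscores.

```python
import re
from datetime import date

MAX_LENGTH = 25  # guideline from the list above, applied to the name without extension


def make_file_name(description: str, day: date, extension: str) -> str:
    """Build a name such as 20240315_interview_notes.txt."""
    # Replace runs of spaces with a single underscore.
    cleaned = re.sub(r"\s+", "_", description.strip())
    # Drop special characters and non-ASCII letters (e.g. æ, ø, å).
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", cleaned)
    # Tidy up underscores left behind by removed characters.
    cleaned = re.sub(r"_+", "_", cleaned).strip("_")
    if not cleaned:
        raise ValueError("The description contains no usable characters")
    stem = f"{day.strftime('%Y%m%d')}_{cleaned}"
    if len(stem) > MAX_LENGTH:
        raise ValueError(f"'{stem}' is longer than {MAX_LENGTH} characters")
    return f"{stem}.{extension.lstrip('.')}"


# make_file_name("interview notes", date(2024, 3, 15), "txt")
# returns "20240315_interview_notes.txt"
```

Note that no script can tell whether a description contains personal data; that check remains your own responsibility.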

 
Versioning

  • Indicate version numbers by using 'V' or 'version' followed by three digits (or four digits, if you have a very large file portfolio), e.g. 001, 002, ... 201, 202 (NOT: 1, 2, 21). The leading zeros ensure that the files sort in the correct order.
  • Keep backups of different versions to make it easy for you to revert to a previous version if relevant.
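
The following sketch shows why the leading zeros matter and how the next version number could be generated. It assumes one particular convention (a `_V` suffix with three digits at the end of the file name, without the extension); the function `next_version` is hypothetical.

```python
import re

# Assumed convention: the name ends in _V plus exactly three digits.
VERSION_PATTERN = re.compile(r"_V(\d{3})$")


def next_version(stem: str) -> str:
    """Return the name with its version raised by one, or add _V001 if it has none."""
    match = VERSION_PATTERN.search(stem)
    if match is None:
        return f"{stem}_V001"
    number = int(match.group(1)) + 1
    return f"{stem[:match.start()]}_V{number:03d}"


# Zero-padded versions sort in the order you expect ...
padded = sorted(["analysis_V010", "analysis_V002", "analysis_V001"])
# ... whereas unpadded ones do not: V10 ends up before V2.
unpadded = sorted(["analysis_V10", "analysis_V2", "analysis_V1"])
```

Here `next_version("analysis_V009")` gives `"analysis_V010"`. With three digits the scheme covers versions up to 999; beyond that the pattern would need a fourth digit.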
     

Documenting decisions

  • Document all your decisions regarding folder structure, file naming and versioning in a README file in the uppermost project folder. A README file is a simple txt-file (README.txt) that explains the organizing logic of the project folder, the logic of the file names, the abbreviations used and the versioning logic. README files are useful to help your future self - as well as current and future collaborators - understand what you are doing. README files are essential when working in a project group.
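
As a starting point, a README skeleton with the four topics mentioned above could be generated like this. The headings and the function `write_readme` are hypothetical suggestions; you can just as well write the file by hand in any text editor.

```python
from pathlib import Path

# Hypothetical skeleton covering the topics named in the text above.
README_TEMPLATE = """\
README for project: {title}

1. Folder structure
   (explain the organizing logic of the project folder)

2. File names
   (explain the naming rules)

3. Abbreviations
   (list the abbreviations used)

4. Versioning
   (explain the versioning logic)
"""


def write_readme(folder: Path, title: str) -> Path:
    """Write README.txt into folder, refusing to overwrite an existing one."""
    path = folder / "README.txt"
    if path.exists():
        raise FileExistsError(f"{path} already exists and was left untouched")
    path.write_text(README_TEMPLATE.format(title=title), encoding="utf-8")
    return path
```

The function refuses to overwrite an existing README so that notes you have already written are not lost by accident.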

 
#2 Saving your raw data
 

  • Save your raw data read-only, in its original format, in a separate folder. That way you can always check your processed data against it and keep track of how your project has progressed.
  • Include a short README file with relevant descriptions of the data (What? When? Where? Who? For what purpose? How?) in the folder with the raw data.
  • Check that the original data content is preserved when data is exported from one system, format, or location to another, e.g. from a survey program to a statistics package or from a voice recorder to transcription software.
  • Create a working copy of your raw data for processing and analysis.
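
The steps above can be sketched as follows. This is a minimal example under stated assumptions: the functions `sha256_of` and `protect_and_copy` are hypothetical, and the checksum comparison only shows that a copy is byte-for-byte identical to the original. When data is converted to another format, the bytes change by design, so you still need to compare the content itself.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_and_copy(raw_file: Path, working_dir: Path) -> Path:
    """Make a verified working copy of raw_file, then mark the original read-only."""
    working_dir.mkdir(parents=True, exist_ok=True)
    working_copy = working_dir / raw_file.name
    # copyfile copies the content only, so the copy stays writable.
    shutil.copyfile(raw_file, working_copy)
    if sha256_of(working_copy) != sha256_of(raw_file):
        raise OSError(f"{working_copy} differs from {raw_file}")
    # Remove write permission from the original.
    raw_file.chmod(stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)
    return working_copy
```

All processing and analysis then happen on the returned working copy, while the raw file stays untouched as a reference.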

 


#3 Documenting your data
 
Documenting data means creating information which enables the correct interpretation of the data - independently of its creator. Data documentation includes a variety of documents that describe all data used in a project, including the nature of the project data, how it has been collected, what variables have been used, what the abbreviations mean, and how the data has been modified. It is a good idea to document how the data is processed and modified while you are still doing it - going back to fill in the gaps afterwards is complicated and often impossible.

  • Describe your data in detail and give your variables clear names: Data without good metadata is useless.
  • Document each processing step in a README file that is stored together with the data. The README provides important information about the data files to ensure they are interpreted correctly.
  • For any kind of data processing always use CBS-licensed software, e.g. Qualtrics, NVivo, and Stata. Whenever CBS signs up with a new software provider, that provider needs to sign a data protection agreement that makes it their responsibility to ensure that no data ends up in the hands of an unauthorized third party.

 

#4 Storing and backing up your data
 
Researchers often share horror stories of data loss. Data loss is not uncommon, especially if you do not back up your data, and data recovery can be expensive, slow, or even impossible. Reliable backups are therefore an important part of data management.

  • Ideally, you follow the 3-2-1 backup method: Create three copies of the data (one active copy and two backups) on two different media with one copy off-site. 
    • Storing locally + on an external medium + in the cloud (three copies):
      • Local hard drive
      • External hard drive (at home, at least password-protected or encrypted)
      • MS OneDrive
  • If you find this difficult to manage, create at least two copies on two different media: one active, local copy and one backup in the cloud.
  • Create a backup routine that works for you and stick to it, e.g. every evening, every other day, once a week, once a month. Balance the inconvenience of having to do the backup and the inconvenience of losing your work between backups.
  • NEVER store personal or confidential information on your personal account in a commercial cloud solution (e.g. Dropbox or Google Drive).
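
A backup routine along the lines of the 3-2-1 method could look like the sketch below. It rests on several assumptions: the functions `file_hashes` and `back_up` are hypothetical, the two destinations stand for an external hard drive and a locally synced MS OneDrive folder whose paths you supply yourself, and files are small enough to be read into memory in one go.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def file_hashes(folder: Path) -> dict[str, str]:
    """Map each file's path relative to folder to its SHA-256 checksum."""
    return {
        path.relative_to(folder).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def back_up(project: Path, destinations: list[Path], day: date) -> list[Path]:
    """Copy project into a dated folder under each destination and verify each copy."""
    expected = file_hashes(project)
    backups = []
    for destination in destinations:
        target = destination / f"{day.strftime('%Y%m%d')}_{project.name}"
        # copytree raises an error if a backup with this date already exists.
        shutil.copytree(project, target)
        if file_hashes(target) != expected:
            raise OSError(f"Backup in {target} does not match the project folder")
        backups.append(target)
    return backups
```

With the active copy on your local hard drive and the two destinations on different media, one of them off-site, this gives you three copies in total. The script does not encrypt anything, so the advice on encrypting external drives still applies.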


#5 Securing your data
 
  • Consider password-protecting your data folders and files - especially when working with personal data.
  • Encrypt your backup - especially when you use an external hard drive.
  • Use CBS’ VPN client whenever using Wi-Fi off campus, especially when travelling with your data to non-EU countries.

Section Author

Mareike Buss - mabu.lib@cbs.dk

CBS Library, Solbjerg Plads 3, DK-2000 Frederiksberg, Denmark
