Research data management (RDM) is the practice of organizing and managing your data throughout a research project. As a graduate student, you may wonder why RDM is relevant for you. The answer is simple: RDM is an essential element of the craft of research that you are introduced to during your studies at CBS. Whether you are doing qualitative or quantitative research, writing a literature review or a theory discussion, you will inevitably create digital files that need to be organized and managed. If you do not adhere to basic data management principles, you risk doing the same work twice, wasting time searching for the latest version of a file, or even losing all your work.
RDM ensures that your research data are well organized, documented, backed up, and securely stored throughout their entire lifecycle.
#Planning phase:
In this phase you plan the research design and how to manage your data. You create an overview of the kinds of datasets you will generate, investigate existing data sources, and consider how to collect, process, analyze, and share data during the project and possibly after it. You think about a useful folder structure for your project, a file naming strategy, and how to store and back up your data.
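As an illustration of what a folder structure and a file naming strategy can look like in practice, here is a minimal Python sketch. The folder names, the function names, and the naming pattern (date, short description, version number) are assumptions chosen for this example, not a CBS standard; adapt them to your own project.

```python
from datetime import date
from pathlib import Path

# Hypothetical folder layout; rename or extend it to fit your project.
PROJECT_FOLDERS = [
    "01_raw_data",
    "02_processed_data",
    "03_analysis",
    "04_writing",
    "05_documentation",
]


def create_project_skeleton(root: Path) -> list[Path]:
    """Create the project folders under root and return their paths."""
    created = []
    for name in PROJECT_FOLDERS:
        folder = root / name
        # exist_ok=True makes it safe to run the function more than once.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


def build_file_name(description: str, version: int, extension: str, day: date) -> str:
    """Build a name such as 2024-03-15_interview-notes_v02.docx."""
    # Lowercase the description and join its words with hyphens.
    slug = "-".join(description.lower().split())
    return f"{day.isoformat()}_{slug}_v{version:02d}.{extension}"
```

Starting the name with an ISO date (year-month-day) means that files sort chronologically in any file browser, and the two-digit version number keeps v02 ahead of v10.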
#Active phase:
In this phase you process, check, validate, and clean your data. If you work with interviews, you transcribe the audio files and analyze the transcripts, pseudonymizing or anonymizing them whenever relevant (see the section on "Personal data & GDPR"). It is important to clearly describe and document all data processing steps. Safe storage and backup are paramount.
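To make the idea of pseudonymizing a transcript more concrete, here is a minimal Python sketch that replaces a list of known names with participant codes. The function name and the code format (P01, P02, ...) are assumptions made for this example. Note that replacing names alone does not make a transcript anonymous: other details can still identify a person, and the key that links names to codes must be stored separately and securely.

```python
import re


def pseudonymize(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known name with a code and return the new text and the key."""
    # Codes follow the order of the list: first name -> P01, second -> P02, ...
    key = {name: f"P{index:02d}" for index, name in enumerate(names, start=1)}
    # Replace longer names first so that "Anna Smith" is handled before "Anna".
    for name in sorted(key, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"  # whole words only
        text = re.sub(pattern, key[name], text)
    return text, key
```

This sketch only catches exact spellings of the names you list, so nicknames, misspellings, and indirect references still need to be checked by hand.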
#Results sharing phase:
When you are about to wrap up your research project, you need to consider what data to delete (and when to delete them) and whether to publish the data, in full or in part. Published data can be re-used by others, for instance for follow-up research, research reviews, or training and teaching purposes.
There are plenty of guides on good RDM practices for experienced researchers. However, as you are still in the middle of your academic training, you should focus only on the most important RDM practices. These involve:
1) Organizing your data
2) Saving your raw data
3) Documenting your data
4) Storing and backing up your data
5) Securing your data
#1 Organizing your data
Do any of these questions sound familiar?
There are techniques that can help you overcome these challenges, so that you will never have to go back and redo your analysis just because you have misplaced a file.
Folder structure
File name
Versioning
Documenting decisions
#2 Saving your raw data
#3 Documenting your data
Documenting data means creating information that enables the correct interpretation of the data, independently of their creator. Data documentation includes a variety of documents that describe all data used in a project: the nature of the project data, how they were collected, what variables were used, what the abbreviations mean, and how the data were modified. It is a good idea to document how the data were processed and modified while you are still doing it - having to go back and fill in the gaps afterwards is complicated, and often impossible!
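As an illustration, documentation of this kind can be as simple as a small table of variables (a "data dictionary") and a running log of processing steps. The Python sketch below shows one possible way to write both; the column names, file layout, and function names are assumptions for this example, not a prescribed format.

```python
import csv
from datetime import datetime
from pathlib import Path

# Hypothetical columns for a simple data dictionary.
FIELDS = ["variable", "description", "unit_or_codes"]


def write_data_dictionary(path: Path, rows: list[dict]) -> None:
    """Write one row per variable, explaining its meaning and its units or codes."""
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


def log_processing_step(log_path: Path, description: str, when: datetime) -> str:
    """Append one timestamped line to the processing log and return it."""
    entry = f"{when.isoformat(timespec='minutes')} | {description}"
    # Mode "a" appends, so earlier entries are never overwritten.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")
    return entry
```

Calling `log_processing_step` each time you change the data gives you a dated record to draw on later, instead of having to reconstruct the steps from memory.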
#4 Storing and backing up your data
Researchers often share horror stories of data loss. Data loss is not uncommon, especially if you do not back up your data. Data recovery can be expensive, slow, or even impossible, so reliable backups are an important part of data management.
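A backup is only reliable if the copy matches the original. As a minimal sketch of that idea, the Python example below copies a project folder to a backup location and then compares checksums of every file. The function names are assumptions for this example, and the sketch does not replace a proper backup routine or an institutional storage service.

```python
import hashlib
import shutil
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_folder(source: Path, destination: Path) -> list[Path]:
    """Copy source to destination; return files whose copy is missing or differs."""
    # dirs_exist_ok=True (Python 3.8+) allows refreshing an existing backup.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    mismatches = []
    for original in source.rglob("*"):
        if original.is_file():
            copy = destination / original.relative_to(source)
            if not copy.is_file() or file_checksum(copy) != file_checksum(original):
                mismatches.append(original)
    return mismatches
```

An empty list from `back_up_folder` means every file in the source has an identical copy in the backup. Files that you deleted from the source after an earlier backup stay in the backup folder, which is usually what you want.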
#5 Securing your data
• Consider password-protecting your data folders and files - especially when working with personal data.
• Encrypt your backup - especially when you use an external hard drive.
• Use CBS’ VPN client whenever using Wi-Fi off campus, especially when travelling with your data to non-EU countries.
Mareike Buss - mabu.lib@cbs.dk