Implementation
General guidance on handling research data throughout the research process. For specific questions, please feel free to contact our team.
A uniform, systematic naming system ensures the clarity and comprehensibility of your data. File, directory and variable names, as well as the names of table columns and rows, should be formulated so that they are as meaningful, unambiguous, and short as possible. Information that applies to all data in a folder can be expressed in the folder name and need not be repeated in the file name. Models for file naming:
- Date as YYYY-MM-DD or YYYYMMDD (ISO 8601)
- Initials or abbreviation of the person who processed the file
- Location (for on-site surveys; field research)
- Versioning (e.g. v01, v02; the leading zero keeps the files in the correct order in list view)
- Suitable for file names: A-Z a-z 0-9 Hyphen (-) Underscore (_) (Applies to naming in Latin script).
- Avoid umlauts, spaces, and special characters other than the ones mentioned above
To prefer | To avoid |
website-texts-2020-05-v15.docx | Website-Texts-May-finalFinal2_new.docx |
Digitized_XY-ZZ_E-1_M-296778.tiff, Digitized_XY-ZZ_E-2_HT-493887.tiff, Digitized_XY-ZZ_M-5_LS-345-c.tiff (Scheme: Digitized_ArchiveID_Callnumber_ArchivaliaID.tiff) | Image0001.tiff, Image0002.tiff, Image0003.tiff |
Survey_OpenAccess_Cleaned_v04.csv | OA-Results-clean-4.csv |
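The naming elements above can be combined programmatically. The following is a minimal sketch, not a prescribed scheme: the order of the elements, the helper names `build_name` and `is_safe_name`, and the example values are assumptions made for illustration.

```python
import re
from datetime import date

# Characters listed above as suitable: A-Z a-z 0-9 hyphen underscore.
# Allowing exactly one dot before the extension is an assumption of this sketch.
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def is_safe_name(filename: str) -> bool:
    """Return True if the name contains only the recommended characters."""
    return SAFE_NAME.fullmatch(filename) is not None


def build_name(topic: str, day: date, initials: str, version: int, ext: str) -> str:
    """Join topic, ISO 8601 date, initials, and a zero-padded version number."""
    name = f"{topic}_{day.isoformat()}_{initials}_v{version:02d}.{ext}"
    if not is_safe_name(name):
        raise ValueError(f"unsuitable characters in file name: {name!r}")
    return name


# Hypothetical example values.
print(build_name("Survey-OpenAccess", date(2020, 5, 14), "JD", 4, "csv"))
# Survey-OpenAccess_2020-05-14_JD_v04.csv
```

Zero-padding the version (`v04` rather than `v4`) is what keeps `v10` from sorting before `v2` in a file listing.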
For long-term storage and use, "open" formats that can be read, opened, and used without restriction are particularly suitable.
Format recommendations for long-term archiving
File format | To prefer | To avoid |
Table | CSV, TSV, SPSS portable, XLSX | XLS |
Text | TXT, HTML, RTF, PDF/A, DOCX | DOC, PPT, PDF |
Multimedia | Container: MPEG4, MKV; Codec: Theora, Dirac, FLAC | QuickTime, Flash |
Images | TIFF, JPEG2000, PNG | GIF, JPG |
(from Dolzycka, Dominika, Biernacka, Katarzyna, Helbig, Kerstin, & Buchholz, Petra. (2019). Train-the-Trainer Konzept zum Thema Forschungsdatenmanagement (Version 2.0). Zenodo. http://doi.org/10.5281/zenodo.2581292 , p. 84)
Further information: Data Formats for Preservation on openaire.eu and Formate erhalten (German only) on forschungsdaten.info.
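A folder can be screened against the format recommendations with a short script. The sketch below is only a rough aid: the mapping from format names to file extensions is an assumption, the function name `classify` is hypothetical, and a file extension cannot reveal the codec inside a container or whether a `.pdf` file is PDF/A.

```python
from pathlib import Path

# Extension mapping is an assumption based on the format names in the table.
PREFERRED = {".csv", ".tsv", ".xlsx", ".txt", ".html", ".rtf", ".docx",
             ".mp4", ".mkv", ".tif", ".tiff", ".jp2", ".png"}
AVOID = {".xls", ".doc", ".ppt", ".mov", ".swf", ".gif", ".jpg", ".jpeg"}


def classify(filename: str) -> str:
    """Return 'prefer', 'avoid', 'check', or 'unknown' for a file name."""
    suffix = Path(filename).suffix.lower()
    if suffix == ".pdf":
        # PDF/A and ordinary PDF share one extension; inspect the file itself.
        return "check"
    if suffix in PREFERRED:
        return "prefer"
    if suffix in AVOID:
        return "avoid"
    return "unknown"


for name in ["Survey_OpenAccess_Cleaned_v04.csv", "results.xls", "report.pdf"]:
    print(name, "->", classify(name))
```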
Folder names should be concise and indicate the content. Information that applies to all the data in a folder (e.g., the year of the survey) can be expressed in the folder name in order to keep the individual file names short. Folders are commonly organized by survey period, location, file format, or processing status. A directory hierarchy with no more than three levels usually remains clear.
Metadata are additional, structured information that describes the specific (research) data. They are used to publish data, make it findable, and cite it. Metadata for research data usually contain technical, legal, and administrative information (such as data volume, data format, and licenses). They may also include descriptive and subject-specific information (such as data author, title, subject, short description, and keywords). The description can be based on controlled vocabularies and thesauri, which improve findability. Subject-specific vocabularies can be looked up on Bartoc.org.
Some disciplines have established specific documentation and metadata standards (see, for example, the overviews of the Research Data Alliance and the Digital Curation Centre). If these are not available in your field of research, generic standards, such as the DataCite Metadata Standard, are a good choice.
Refubium, the repository of Freie Universität Berlin, provides the following metadata fields for your research data, in addition to extensive optional information: author of the data, main title, year of publication, department/institution, language, resource type, abstract, Dewey Decimal Classification, and free keywords.
This structured data description is usually created during the research process, either as an additional file, in a table or database, or, where possible, embedded in the data to be described (e.g., TEI-XML, TIFF).
In addition to maintaining metadata in German, it is also advisable to produce English-language metadata so that the data can be found, understood, and reused internationally when published. If the data itself is available in another language, this language should also be used in the metadata.
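Metadata kept as an additional file could, for example, be stored as JSON next to the data. The sketch below uses the Refubium fields named above; the key names, the helper `missing_fields`, and all values are hypothetical placeholders, not the repository's actual schema.

```python
import json

# Key names are assumptions that mirror the fields listed above.
FIELDS = ["author", "title", "publication_year", "institution", "language",
          "resource_type", "abstract", "ddc", "keywords"]


def missing_fields(record: dict) -> list[str]:
    """Return the listed fields that are absent or empty in the record."""
    return [field for field in FIELDS if not record.get(field)]


# Placeholder values for illustration only.
record = {
    "author": "Doe, Jane",
    "title": "Open Access Survey (cleaned data)",
    "publication_year": 2020,
    "institution": "Example Department",
    "language": "en",
    "resource_type": "Dataset",
    "abstract": "Placeholder description of the data set.",
    "ddc": "020",
    "keywords": ["open access", "survey"],
}

print(missing_fields(record))  # []
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Checking for empty fields before deposit is a convenience only; the repository's own submission form remains the authoritative check.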
Related Links
- Metadata Standards Overview from the Research Data Alliance
- Metadata Standards Overview from the Digital Curation Centre
- DataCite Metadata Standard
Together with the metadata, the documentation provides all the information that ensures the traceability, interpretation, and possible reproduction of the research data or research results. It provides information about what was collected, gathered, or processed by whom, how, with what, why, when, where, and in which context.
The documentation includes the exact description of data generation, processing and indexing as well as the relevant methods and tools (software). This can be in the form of codebooks, (digital) lab books, descriptions of the research design, edition guidelines, or other descriptive documents.
A README file in Markdown or plain text can provide a quick and easy method for data documentation and should be stored together with the data.
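A README skeleton can be generated from the questions the documentation should answer (who, what, how, when, where). This is a minimal sketch; the function name `render_readme`, the section headings, and the example answers are assumptions for illustration.

```python
def render_readme(title: str, answers: dict[str, str]) -> str:
    """Render a Markdown README with one section per documentation question."""
    lines = [f"# {title}", ""]
    for question, answer in answers.items():
        lines.append(f"## {question}")
        lines.append(answer)
        lines.append("")
    return "\n".join(lines)


# Hypothetical example content.
readme = render_readme(
    "Open Access Survey",
    {
        "Who collected the data": "Jane Doe, Example Department",
        "What was collected": "Survey responses, cleaned (v04)",
        "How and with what": "Online questionnaire; cleaned with a script",
        "When and where": "May 2020, Berlin",
    },
)
print(readme)
```

The resulting text can be saved as `README.md` in the same folder as the data it describes.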
With the Sync & Share service, ZEDAT allows you to store data in the cloud (Box.FU) and share it with external parties via a link.
OnlyOffice, which is integrated into Box.FU, enables collaborative online editing of files such as text documents, spreadsheets, and presentations. An additional integrated rich text editor makes it possible to work on minutes and notes as a team.
Box.FU is not suitable for sensitive data or data requiring protection (see the IT Security Working Group's guideline on outsourcing data to the cloud).
Recommendations for the security of research data
- Use the user management function in your respective operating system.
- Lock your screen when you are absent.
- Keep the operating system and applications up to date.
- Use security measures such as antivirus programs and firewalls, if necessary.
- Use strong passwords.
- Use secure communication channels (e.g., via digital certificates for secure e-mail communication).
- Check the privacy policies of services used, especially when collecting and processing sensitive data.
- Pay attention to the server locations of international or commercial service providers (see also guideline on outsourcing data to the cloud).
- Check systems and storage devices for encryption (e.g., with tools such as VeraCrypt).
Further information: IT security policy of Freie Universität Berlin (in German)
To reduce the risk of data loss due to software or hardware failure, virus attack or hacking, and human error, regular data backups should be performed. The network drives offered by ZEDAT include automated procedures. In addition, special backup software and the backup options of the respective operating systems are available. Copies of data backups should be stored in multiple locations.
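Storing copies in multiple locations is only useful if the copies are intact. The sketch below copies a file to several target folders and compares SHA-256 checksums; the function names and the temporary demo folders are assumptions, and it is not a substitute for the automated ZEDAT procedures or dedicated backup software.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, target_dirs: list[Path]) -> list[Path]:
    """Copy source into every target folder and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for target_dir in target_dirs:
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, target_dir / source.name))
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies


# Demo in a temporary folder; real targets would be separate drives or locations.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data = root / "Survey_OpenAccess_Cleaned_v04.csv"
    data.write_text("id,answer\n1,yes\n", encoding="utf-8")
    for copy in back_up(data, [root / "backup_a", root / "backup_b"]):
        print("verified copy:", copy.name, "in", copy.parent.name)
```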
For collaborative work as well as for multilevel or iterative data generation, modification, and processing, the use of a version control system that saves changes to files with a timestamp, author information, and a change note is a good idea. Earlier versions can thus be easily restored, and change histories can be documented and visualized.
A standard system for version control is, for example, Git. For collaborative work in groups or in the context of a project, as well as for the central management of multiple projects, solutions such as GitLab or GitHub are available. Various departments at Freie Universität currently have their own GitLab sites.
This page was last edited on 10 February 2022. Unless otherwise noted, this work is licensed under a Creative Commons Attribution 4.0 International License.