Document your research data

Because data are rarely self-explanatory, all research data should be accompanied by metadata (information that describes the data according to community best practices). Metadata standards vary across disciplines, but generally state who created the data and when, how the data were created, their quality, accuracy, and precision, as well as other features necessary to facilitate data discovery, understanding, and reuse.

Careful attention to file naming and tracking is important both for sharing and reusing data and for keeping track of one’s own work. See the UK Data Archive’s guide for long-term preservation of data and the ANDS Guide on File Formats. Any restrictions on use of the data must be explained in the metadata, along with information on how to obtain approved access to the data, where possible.

A new version of a dataset may be created whenever the dataset changes, for example when data are updated or appended. The Australian National Data Service (ANDS) provides helpful tips for data versioning, including several suggestions for numbering systems and tools such as Git (and GitHub) and ArcGIS.
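As an illustration of a numbering system, here is a minimal sketch of a two-part "major.minor" version label. The scheme and the function name `next_version` are assumptions made for this example, not a convention prescribed by ANDS: here a "minor" change stands for data being corrected or appended, and a "major" change stands for a restructuring of the dataset.

```python
def next_version(current: str, change: str) -> str:
    """Return the next version label in a hypothetical 'major.minor' scheme.

    change: "minor" for corrected or appended data,
            "major" for a change to the dataset's structure.
    """
    major, minor = (int(part) for part in current.split("."))
    if change == "major":
        # A structural change starts a new major series.
        return f"{major + 1}.0"
    if change == "minor":
        # Updated or appended data keeps the major number.
        return f"{major}.{minor + 1}"
    raise ValueError(f"unknown change type: {change!r}")


# Example: data were appended to version 1.2 of a dataset.
new_label = next_version("1.2", "minor")
```

Whatever scheme is chosen, recording the rule alongside the data lets others tell which version they are working with.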

What documentation will be needed for the data to be read and interpreted correctly in the future?
  • Typically, good documentation includes information about the study, data-level descriptions, and any other contextual information required to make the data usable by other researchers. Other elements you should document, as applicable, include: research methodology used, variable definitions, vocabularies, classification systems, units of measurement, assumptions made, format and file type of the data, a description of the data capture and collection methods, an explanation of data coding and analysis performed (including syntax files), and details of who has worked on the project and who performed each task.
  • This is an example of what data documentation might look like for a codebook.
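To make the idea of variable definitions concrete, the following is a rough sketch of a small codebook held as structured data, together with a check for dataset columns that nobody has documented yet. The variable names, labels, codes, and the helper `undocumented_columns` are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook entry (a hypothetical, minimal layout)."""
    name: str                                  # column name in the data file
    label: str                                 # human-readable definition
    unit: str = ""                             # unit of measurement, if any
    codes: dict = field(default_factory=dict)  # coded value -> meaning


# Illustrative entries only; a real codebook would describe the actual study.
CODEBOOK = [
    Variable("resp_id", "Respondent identifier"),
    Variable("age", "Age at time of interview", unit="years"),
    Variable("sex", "Self-reported sex",
             codes={1: "Female", 2: "Male", 9: "Not stated"}),
]


def undocumented_columns(columns, codebook):
    """Return the columns that have no entry in the codebook."""
    documented = {variable.name for variable in codebook}
    return [column for column in columns if column not in documented]


# Example: 'income' is present in the data but not yet described.
missing = undocumented_columns(["resp_id", "age", "income"], CODEBOOK)
```

Running a check like this whenever new variables are added is one way to notice gaps in the documentation early.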
How will you make sure that documentation is created or captured consistently throughout your project?
  • Consider how you will capture this information and where it will be recorded, ideally in advance of data collection and analysis, to ensure accuracy, consistency, and completeness of the documentation. Often, resources you’ve already created (e.g. publications, websites, progress reports) can contribute to this. It is useful to consult regularly with members of the research team to capture potential changes in data collection or processing that need to be reflected in the documentation. Individual roles and workflows should include gathering data documentation as a key element.
If you are using a metadata standard and/or tools to document and describe your data, please list them here.
  • There are many general and domain-specific metadata standards. Dataset documentation should be provided in one of these standard, machine-readable, openly accessible formats to enable the effective exchange of information between users and systems. These standards are often based on language-independent data formats such as XML, RDF, and JSON.
  • Background from UBC and DCC.
  • York University Libraries provide best practices for metadata to accompany a digitization initiative.
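As a small illustration of machine-readable metadata, the sketch below builds a dataset description and serializes it to JSON. The field names are loosely borrowed from Dublin Core element names (title, creator, date, and so on); the choice of required fields, the helper `missing_fields`, and every value in the record are assumptions made for this example rather than the requirements of any particular standard.

```python
import json

# Fields this sketch treats as mandatory (an assumption, not a standard's rule).
REQUIRED_FIELDS = ("title", "creator", "date", "description", "format", "rights")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder values for illustration only.
record = {
    "title": "Example survey dataset",
    "creator": "Example Research Group",
    "date": "2020-01-31",
    "description": "Responses to a hypothetical questionnaire.",
    "format": "text/csv",
    "rights": "Access on request; contact the research group.",
}

if not missing_fields(record):
    # JSON is plain text, so the record can be read by other tools and systems.
    metadata_json = json.dumps(record, indent=2, sort_keys=True)
```

A discipline-specific standard will usually define its own required elements and controlled vocabularies, so a check like this should be adapted to the standard actually in use.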

Note: Much of the text on this page can be attributed to the DMP Assistant, licensed under a CC0 license.