Data storage, sharing, security, and preservation

Planning how research data will be stored and backed up throughout and beyond a research project is critical in ensuring data security and integrity. Appropriate storage and backup not only helps protect research data from catastrophic losses (due to hardware and software failures, viruses, hackers, natural disasters, human error, etc.), but also facilitates appropriate access by current and future researchers.

Data may be stored in a number of ways. Each storage method has benefits and drawbacks that should be considered when determining the most appropriate solution.

What are the anticipated storage requirements for your project, in terms of storage space (in megabytes, gigabytes, terabytes, etc.) and the length of time you will be storing the data?
  • Storage-space estimates should take into account requirements for file versioning, backups, and growth over time.
  • If you are collecting data over a long period (e.g. several months or years), your data storage and backup strategy should accommodate data growth. Similarly, a long-term storage plan is necessary if you intend to retain your data after the research project.
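As a rough illustration of how versioning, backups, and growth combine, the following Python sketch computes a back-of-the-envelope estimate. The function name, its parameters, and the example figures are hypothetical, and the model assumes linear growth with every retained version stored as a full copy; actual requirements will depend on your data and your storage provider.

```python
def estimate_storage_gb(initial_gb, monthly_growth_gb, months,
                        versions_kept=1, total_copies=3):
    """Return a rough storage estimate in gigabytes.

    Assumptions (illustrative only): data grows linearly, each retained
    version is a full copy, and every copy holds all versions.
    """
    if initial_gb < 0 or monthly_growth_gb < 0 or months < 0:
        raise ValueError("sizes and durations must be non-negative")
    if versions_kept < 1 or total_copies < 1:
        raise ValueError("need at least one version and one copy")

    # Size of a single version of the data at the end of the project.
    working_set = initial_gb + monthly_growth_gb * months
    # Allow for file versioning, then for every copy (original plus backups).
    return working_set * versions_kept * total_copies


# Hypothetical project: 50 GB at the start, 10 GB of new data per month
# for 24 months, two versions retained, three copies in total.
estimate = estimate_storage_gb(50, 10, 24, versions_kept=2, total_copies=3)
print(f"{estimate:.0f} GB")  # prints "1740 GB"
```

Running the same calculation with a longer retention period or more retained versions gives a quick sense of how sensitive the total is to each factor.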
How and where will your data be stored and backed up during your research project?
  • The risk of losing data due to human error, natural disasters, or other mishaps can be mitigated by following the 3-2-1 backup rule:
    1. Have at least three copies of your data.
    2. Store the copies on two different media.
    3. Keep one backup copy offsite.
  • Data may be stored on optical, magnetic, or solid-state media, which can be removable (e.g. DVDs and USB drives), fixed (e.g. desktop or laptop hard drives), or networked (e.g. networked drives or cloud-based servers).
  • UIT Storage (York University)
    On campus, UIT provides secure storage solutions that meet and exceed the 3-2-1 backup rule. Each file saved to UIT storage is copied three times, resulting in four separate copies. Three of these copies reside in two separate data centres on campus, and the fourth is stored at the University of Guelph.
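The 3-2-1 backup rule described above can be expressed as a simple checklist. The Python sketch below is one minimal way to test a planned set of copies against the three conditions; the `Copy` class, the `check_3_2_1` function, and the example plan are all hypothetical names chosen for illustration, not part of any UIT or Library tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data (hypothetical structure for illustration)."""
    label: str
    medium: str      # e.g. "laptop hard drive", "networked drive", "DVD"
    offsite: bool    # True if the copy is kept away from the main site


def check_3_2_1(copies):
    """Return a list of the 3-2-1 conditions that the plan does not meet."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than two different media")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


# Hypothetical backup plan for a small project.
plan = [
    Copy("working copy", "laptop hard drive", offsite=False),
    Copy("lab backup", "networked drive", offsite=False),
    Copy("remote backup", "networked drive", offsite=True),
]
print(check_3_2_1(plan))  # prints [] because all three conditions are met
```

An empty list means the plan satisfies the rule as stated; any strings returned name the conditions that still need attention.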
How will the research team and other collaborators access, modify, and contribute data throughout the project?
  • An ideal solution facilitates co-operation and ensures data security, yet can be adopted by users with minimal training. Transmitting data between locations or within research teams can be challenging for data management infrastructure. Relying on email for data transfer is not a robust or secure solution. Third-party commercial file-sharing services (such as Google Drive and Dropbox) facilitate file exchange, but they are not necessarily permanent or secure, and their servers are often located outside Canada. Please contact your Library to develop the best solution for your research project.
What are the options for long-term preservation?

Data preservation will depend on potential reuse value, whether there are obligations to either retain or destroy data, and the resources required to properly curate the data and ensure that it remains usable in the future. In some circumstances, it may be desirable to preserve all versions of the data (e.g. raw, processed, analyzed, final), but in others, it may be preferable to keep only selected or final data (e.g. transcripts instead of audio interviews).

  • Scholars Portal Dataverse is available to York researchers and can serve preservation needs where file sizes are less than 2 GB.
  • Researchers with larger projects will need to contact UIT to discuss the ongoing cost of long-term preservation. It is prudent to write these costs into the grant budget.
  • For other deposit options, including discipline-specific repositories, see the re3data.org directory.

Some data formats are better suited than others to long-term preservation. For example, non-proprietary file formats, such as plain text (‘.txt’) and comma-separated values (‘.csv’), are considered preservation-friendly. Converting files from one format to another may lose information (e.g. converting from an uncompressed TIFF file to a compressed JPG file), so changes to file formats should be documented.
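One possible way to document format changes is to keep a running log alongside the data. The Python sketch below appends one row per conversion to a CSV file, recording a SHA-256 checksum of both the source and the converted file so that either can later be verified. The function names, the log columns, and the file names in the usage note are assumptions made for illustration; this is not a prescribed procedure.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Column names are an illustrative choice, not a standard.
LOG_FIELDS = ["timestamp_utc", "source_file", "source_sha256",
              "converted_file", "converted_sha256", "note"]


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_conversion(log_path, source, converted, note=""):
    """Append one format-conversion record to a CSV log and return it."""
    log_path = Path(log_path)
    is_new_log = not log_path.exists()
    row = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "source_file": str(source),
        "source_sha256": sha256_of(source),
        "converted_file": str(converted),
        "converted_sha256": sha256_of(converted),
        "note": note,
    }
    with open(log_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if is_new_log:
            writer.writeheader()  # write the header only once
        writer.writerow(row)
    return row
```

For example, after converting a hypothetical scan.tif to scan.jpg, calling log_conversion("conversions.csv", "scan.tif", "scan.jpg", note="lossy JPEG compression") would record the change. Because the log is itself a plain ‘.csv’ file, it remains readable without specialised software.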

Note: Much of the text on this page can be attributed to the DMP Assistant, licensed under a CC0 license.