What is open data?

Open data is research data that is freely available on the Internet for anyone to download, modify, and distribute without any legal or financial restrictions.

Open data is:

  • Available: the data should be available as a whole and downloadable from the Internet, at no more than a reasonable reproduction cost
  • Accessible: the data should be provided in a convenient form that can be modified
  • Reusable: the data should be provided under clearly stated terms that permit reuse
  • Redistributable: the data can be redistributed and combined with data from other research
  • Unrestricted: everyone can use, modify, and share the data, regardless of how they use the data (e.g. for commercial, non-commercial, or educational purposes)

The Open Data Handbook provides the following definition of Open Data:

“Open data is data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and sharealike.”

Open data exists in many forms such as datasets, survey results, and metadata. Data should exist in a form that can be used to duplicate and verify research findings. Open data is structured and machine-readable, so open data policies often permit machines to access the data for extraction, modification, and analysis. Open data does not include personal data about individuals.
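To illustrate what "structured and machine-readable" can mean in practice, the following minimal sketch parses a small CSV dataset with Python's standard library and computes a summary. The column names and values are invented for illustration and do not come from any real dataset.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data, embedded as a string so the example is self-contained.
SAMPLE_CSV = """site,year,temperature_c
A,2019,11.5
A,2020,12.0
B,2019,9.5
B,2020,10.0
"""


def mean_by_site(csv_text):
    """Return the mean temperature per site from CSV text."""
    readings = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Each row is a dict keyed by the header line, so a program can
        # extract values by column name without human interpretation.
        readings.setdefault(row["site"], []).append(float(row["temperature_c"]))
    return {site: mean(values) for site, values in readings.items()}


if __name__ == "__main__":
    print(mean_by_site(SAMPLE_CSV))
```

Because the data has a header row and consistent columns, another researcher could rerun or extend this kind of analysis without retyping values from a PDF or image.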

A disciplinary view of open data

Open data can relate to various disciplines and uses. While data in the hard sciences includes computer modeling, simulations, and laboratory measurements, data generated by the social sciences may include demographics, economic indicators, and survey and interview results. Data can also take many general forms, including images, numerical measurements, software, and code.

Table source: MIT Data Strategies for Data Sharing and Storage presentation

Why is open data important?

A number of benefits can be realized through the open sharing of data:

  • Increases reproducibility of research
  • Promotes future research growth
  • Supports research integrity
  • Prevents duplication and loss of research
  • Provides opportunities for collaboration
  • Strengthens the economy
  • Is recognized as an important aspect of research across many research communities

Open Data can be used to support a number of disciplines and applications including science, culture, finance, statistics, weather, and the environment. It can strengthen and sustain any research field that produces data. This is especially true of (yet not limited to) science, which depends on the reuse and criticism of published research. The tasks and functions of many organizations depend on Open Data that is accessible and available for reuse.

Open Data supports interoperability, since different researchers and organizations can share and work together on datasets. Sharing data promotes openness and increases communication and interoperability among organizations, while increasing possibilities for further research.

Challenges

Challenges to making data openly available include:

  1. Labelling. Open data is only useful if it is clearly identified as open. Improper labelling limits its benefits. Labelling of data ensures that researchers are acknowledged when their data is reused and distributed.
  2. Licensing. Data on publishers’ websites and in data repositories should be clearly defined as available for free and unrestricted access, redistribution and reuse. A publisher or researcher may state either that all rights to the data are reserved or that the data is covered by an open knowledge license. See Open Definition for a list of licenses.
  3. Research ethics restrictions. When making data openly available one must consider the safeguarding of information, obtaining consent and the secondary use of identifiable information, and how identifiable information is handled during data linkage. The Tri-Council Policy Statement TCPS 2 (2014): Ethical Conduct for Research Involving Humans promotes best practices for research and the use of research data in Canada. The Statement outlines important recommendations about privacy and confidentiality.
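The labelling and licensing challenges above can be checked mechanically before a dataset is deposited. The following is a minimal sketch, assuming a dataset is described by a simple metadata record; the field names (`title`, `creators`, `license`) are hypothetical, and real repositories define their own metadata schemas.

```python
# Field names are hypothetical; real repositories define their own schemas.
REQUIRED_FIELDS = ("title", "creators", "license")


def labelling_problems(metadata):
    """Return a list of human-readable problems with a metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        # A missing key and an empty value are treated the same way:
        # either one leaves a reuser unsure who to credit or what is permitted.
        if not metadata.get(field):
            problems.append(f"missing or empty field: {field}")
    return problems


if __name__ == "__main__":
    record = {"title": "Example survey results", "creators": ["A. Researcher"]}
    for problem in labelling_problems(record):
        print(problem)
```

A check like this only confirms that a label is present. It does not address research ethics restrictions, which still require human review.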

How does one make open data available?

There are two steps to making one’s data open:

(1) Make the data publicly available through a data repository or publisher’s website.

Open data can be made available by posting it to a data repository. York University Libraries makes available two repositories for data deposit: Scholars Portal Dataverse and YorkSpace. Additional options for multi-disciplinary data repositories include Figshare and re3data.org, where searches can be filtered by a number of dimensions, including country and subject area.

(2) Assign an open data license. This is required even if the data is intended for the public domain.

To make data open, the data must be licensed. There are two types of license:

The Open Data Commons outlines the steps one must take to make data open, including how to select a license. Open Data Commons offers legal advice for the open knowledge community about open data, and is an Open Knowledge Foundation not-for-profit project run by its Advisory Council. The project introduced the first open data license in 2008: the Public Domain Dedication and License (PDDL).
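One way to make a license choice explicit is to store it in a small machine-readable file alongside the data. The sketch below is an illustration only, not a procedure prescribed by Open Data Commons or any repository: the file name, the field names, and the use of the SPDX-style identifier `PDDL-1.0` to refer to the PDDL are all assumptions.

```python
import json


def build_metadata(title, creators, license_id):
    """Build a metadata record that states the dataset's license explicitly."""
    return {
        "title": title,
        "creators": list(creators),
        # Assumption: the license is recorded as a short identifier string.
        "license": license_id,
    }


def write_metadata(path, metadata):
    """Write the metadata record as JSON next to the data files."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)


if __name__ == "__main__":
    record = build_metadata("Example survey results", ["A. Researcher"], "PDDL-1.0")
    # Hypothetical file name; a repository may expect a different one.
    write_metadata("dataset-metadata.json", record)
```

A data repository will usually also ask for the license through its deposit form, so a file like this complements that declaration rather than replacing it.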

The Panton Principles

The Panton Principles (Principles for Open Data in Science) offer useful considerations for making one’s data open. Developed by researchers and the Open Knowledge Foundation Working Group on Open Data in Science, these principles can be applied to many research fields:

  1. “When publishing data make an explicit and robust statement of your wishes.”
  2. “Use a recognized waiver or license that is appropriate for data.”
  3. “If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.”
  4. “Explicit dedication of data underlying published science into the public domain via PDDL (Public Domain and Dedication Licence) or CCZero (Creative Commons Zero Waiver) is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.”

Resources

  • Open Data Handbook is a guide on Open Data definitions and principles.
  • The Government of Canada: Open Data 101 contains an overview of the uses of Open Data in Canada, its benefits and principles.
  • Guide to Open Data Licensing provides advice on how to license data in Canada and other regions, and the types of rights to assign data.
  • Creative Commons is a global network of researchers and advocates that provides resources and support on approaches to copyright and intellectual property. CC provides free copyright licenses, in addition to information about Open Data and on using Creative Commons to share data.
  • Open Science was developed by Creative Commons as a guide to Open Data in science.
  • Open Knowledge International is a global not-for-profit organization advocating the use of Open Data. Its international network of members and organizations includes the Open Science Working Group. The group shares support and advice on the use of Open Data in science.
  • The Panton Principles are principles for Open Data in Science. The principles were developed by members of the scientific research community in 2009, and refined by members of the Open Knowledge Foundation Working Group in 2010.
  • Open Data Commons outlines the steps one must take to make data open, including how to select a license.
  • SPARC (the Scholarly Publishing and Academic Resources Coalition) advocates the open sharing of research in academic and research libraries in Canada and the U.S. The website contains resources about Open Data, and case studies on its uses.