Licensing Research Data

These guidelines are provided as best-effort, community-maintained guidance and do not constitute legal advice. Licensing requirements vary by institution, funder, and jurisdiction. If in doubt, consult your institution's research data or legal support team.

Why does my research data need a licence?

If you share data without a licence, the default position is "all rights reserved," which makes reuse legally uncertain and therefore unlikely. While individual facts are not protected by copyright, the structure, selection, and documentation of a dataset usually are, and separate database rights may also apply. Adding a clear licence tells others exactly what they are allowed to do with your data, enabling reuse, citation, and compliance with funder and journal open research requirements. In short, a licence turns your data from "visible but unusable" into a reusable research output.

Which licence should I use to share research data?

Research data: Use one of two Creative Commons licences. Use the CC BY (Attribution) licence as the default choice. This licence allows users to distribute, remix, adapt, and build upon a work, even commercially, as long as they give credit to the original creator. If you want any derivatives of the data to be shared under identical terms, i.e. the licence is "infectious", use the CC BY-SA (Attribution-ShareAlike) licence.

Databases: Use one of two Open Data Commons licences. Use the ODC-By (Attribution) licence as the default choice. This licence allows users to freely share, modify, and use a database, provided that they attribute the original database. If you want any derivative databases to be shared under identical terms, i.e. the licence is "infectious", use the ODC-ODbL (Open Database License).

Do I need a data licence or a database licence?

In most cases, you will license a dataset (e.g. a CSV file of experimental results, a collection of spectra, a table of model training parameters) using a standard data licence. You usually only need to consider a database licence if you are publishing a structured, searchable collection of data (e.g. a relational database, curated online resource, or API-driven platform). In practice, if you are depositing a fixed collection of files in a repository, treat it as a dataset. If you are releasing a maintained data platform or queryable system, treat it as a database to ensure the licence covers database rights as well.
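The two questions above (dataset or database, and whether derivatives must carry the same terms) can be sketched as a small lookup. This is only an illustration, not part of the guidance itself: the function name `suggest_licence` is hypothetical, and the SPDX-style identifiers include version numbers (4.0 and 1.0) that are an assumption, since the guidance names no specific versions.

```python
# Map (is_database, share_alike) to a licence identifier.
# The identifiers follow the SPDX licence list; the version numbers
# are an assumption, because the guidance above does not name versions.
LICENCES = {
    (False, False): "CC-BY-4.0",     # dataset, attribution only (default)
    (False, True): "CC-BY-SA-4.0",   # dataset, derivatives under identical terms
    (True, False): "ODC-By-1.0",     # database, attribution only (default)
    (True, True): "ODbL-1.0",        # database, derivatives under identical terms
}


def suggest_licence(is_database: bool, share_alike: bool = False) -> str:
    """Return a suggested licence identifier (hypothetical helper)."""
    return LICENCES[(is_database, share_alike)]


if __name__ == "__main__":
    # A fixed collection of CSV files deposited in a repository:
    print(suggest_licence(is_database=False))
    # A maintained, queryable platform whose derivatives must stay open:
    print(suggest_licence(is_database=True, share_alike=True))
```

Recording the identifier in a machine-readable form alongside the data (for example in the repository's licence field) makes the chosen terms easier for both people and harvesters to find.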

Where should I share research data?

A good repository should make your data discoverable, citable, and preserved over the long term.

General-purpose repositories: Research data archives like Zenodo are a good default choice. Every deposit gets a DOI, making your dataset easy to cite, and the platform commits to long-term preservation (currently tied to the lifetime of CERN, with at least a 20-year horizon) and supports metadata harvesting for discovery.

Discipline-specific repositories: Chemistry- or AI-specific repositories can be preferable because they maximise findability within the research field. For machine-learning-focused data, community platforms like Kaggle or Hugging Face Hub can offer useful tools and visibility for models and dataset exploration, but they don't inherently provide DOIs or the same long-term preservation guarantees as Zenodo. Use them in addition to a proper research archive rather than as your primary deposit.

Institutional repositories: Institutional research data repositories provide an alternative to larger general-purpose repositories. Depositing here usually ensures your dataset receives a DOI and benefits from institutional preservation and support. This is often the most straightforward choice if you do not have a suitable discipline-specific repository, or if you want your data formally linked to your university's research outputs and reporting systems.