Reference
Last updated on 2026-03-24
Glossary
- CAS Registry Number
- A unique numerical identifier assigned by the Chemical Abstracts Service (CAS) to each chemical substance recorded in its registry. Because the registry is proprietary, CAS numbers are not ideal as the main identifier for FAIR data.
- Chemotion
- An open-source electronic lab notebook (ELN) and repository ecosystem for chemistry, maintained by NFDI4Chem. The ELN supports structure drawing, reaction schemes, and direct export to the Chemotion Repository.
- CIF (Crystallographic Information File)
- A standard file format for representing crystallographic data, managed by the International Union of Crystallography.
- CML (Chemical Markup Language)
- An XML-based format for representing molecular and chemical data.
- Creative Commons (CC)
- A set of copyright licences that let creators grant permissions for reuse. Common variants include CC0, CC BY, CC BY-SA, and CC BY-NC.
- CSD (Cambridge Structural Database)
- A curated collection of over one million small-molecule crystal structures, managed by the Cambridge Crystallographic Data Centre (CCDC). Most crystallography journals require authors to deposit new structures in the CSD.
- Data Access Statement
- A statement included in a publication that describes how the supporting research data can be accessed. Required by UKRI and many publishers.
- Data Management Plan (DMP)
- A document that describes how data will be collected, organised, stored, shared, and preserved during and after a research project.
- DOI (Digital Object Identifier)
- A persistent identifier that uniquely identifies a dataset, publication, or other digital object and resolves to its current location. DOIs make data citable and discoverable.
- ELN (Electronic Lab Notebook)
- A digital replacement for a paper lab notebook, offering features such as searchability, automatic timestamping, and data linking.
- FAIR Principles
- A set of guiding principles for research data management: Findable, Accessible, Interoperable, and Reusable.
- InChI (International Chemical Identifier)
- A machine-readable, non-proprietary textual identifier for chemical substances, maintained by IUPAC.
- InChIKey
- A fixed-length (27-character) hash of an InChI string, designed for web searching and database lookups.
- ioChem-BD
- A computational chemistry repository that accepts DFT and molecular dynamics input/output files from codes including Gaussian, ORCA, CP2K, and VASP.
- IUPAC FAIRSpec
- An IUPAC specification defining metadata standards for FAIR management of spectroscopic data in chemistry.
- JCAMP-DX
- An open, plain-text standard file format for spectroscopic data (NMR, IR, MS, UV-Vis, Raman) that embeds metadata in a structured header.
- Metadata
- Data about data. Descriptive information that provides context for a dataset, making it findable, interpretable, and reusable.
- mzML
- An open standard file format for mass spectrometry data.
- NFDI4Chem
- Germany’s national initiative for chemistry research data infrastructure. Maintains the Chemotion ecosystem and a detailed knowledge base of research data management (RDM) best practice for chemistry at knowledgebase.nfdi4chem.de.
- NOMAD
- An open repository for computational materials science data, with strong metadata standards for DFT and molecular dynamics output.
- ODC-By / ODbL (Open Data Commons)
- Licences designed specifically for databases. ODC-By requires attribution; ODbL additionally requires adapted databases to be shared under the same licence (share-alike).
- PSDI (Physical Sciences Data Infrastructure)
- A UK initiative supporting researchers in chemistry and materials science with data management tools, guidance, and infrastructure.
- re3data
- A global registry of research data repositories, useful for finding domain-specific repositories. Available at re3data.org.
- SMILES (Simplified Molecular Input Line Entry System)
- A widely used line notation for representing molecular structures as text strings.
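The CAS Registry Number entry above describes the number only as an identifier. One documented property that helps catch transcription errors is its check digit: each digit before the last is multiplied by its position counted from the right, and the sum modulo 10 must equal the final digit. The sketch below assumes that scheme; `is_valid_cas` is a hypothetical helper name, and passing the check shows only that a number is well formed, not that it is actually assigned.

```python
import re

# Assumed layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas_rn: str) -> bool:
    """Return True if cas_rn is well formed and its check digit matches.

    Only syntax and checksum are tested; this cannot tell whether the
    number is really assigned in the (proprietary) CAS registry.
    """
    match = CAS_PATTERN.match(cas_rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the body digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


print(is_valid_cas("7732-18-5"))  # water: True
print(is_valid_cas("7732-18-4"))  # wrong check digit: False
```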
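The CML entry above describes an XML-based format, which means a CML file can be read with a general-purpose XML parser. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming the CML schema namespace `http://www.xml-cml.org/schema` and a simple `molecule` with `atomArray`/`bondArray` children; the water document and the `summarise_cml` name are illustrative only, and real CML files usually carry much more metadata.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Prefix mapping for the (assumed) CML schema namespace.
NS = {"cml": "http://www.xml-cml.org/schema"}

# Illustrative CML for water.
WATER_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""


def summarise_cml(xml_text: str) -> tuple[Counter, list[tuple[str, ...]]]:
    """Return element counts and bonded atom-id pairs from a CML molecule."""
    root = ET.fromstring(xml_text)
    # Count atoms by their elementType attribute.
    elements = Counter(atom.get("elementType")
                       for atom in root.iterfind(".//cml:atom", NS))
    # Each bond names its two atoms in a space-separated atomRefs2 attribute.
    bonds = [tuple(bond.get("atomRefs2").split())
             for bond in root.iterfind(".//cml:bond", NS)]
    return elements, bonds


elements, bonds = summarise_cml(WATER_CML)
print(dict(elements))  # {'O': 1, 'H': 2}
print(bonds)           # [('a1', 'a2'), ('a1', 'a3')]
```

Because the structure is plain XML, the same file stays readable with any XML toolkit, which is part of what makes CML interoperable.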
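The DOI entry above notes that DOIs make data citable; in practice a DOI is usually cited as a link through the `https://doi.org/` resolver. A minimal sketch that normalises common written forms (a `doi:` prefix, older `dx.doi.org` links) into that form. The syntax check is deliberately loose (`10.`, a numeric registrant code, a slash, then any non-space suffix), `doi_to_url` is a hypothetical helper name, and `10.1000/182` is used only as a sample string.

```python
import re
from urllib.parse import quote

# Loose DOI syntax: "10.", registrant code (optionally subdivided), "/", suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")

# Common ways a DOI is written that should all map to the same link.
PREFIXES = ("https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(raw: str) -> str:
    """Return the https://doi.org/ link for a DOI, or raise ValueError."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognisable DOI: {raw!r}")
    # Percent-encode characters that are unsafe in URLs, keeping the slash.
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi:10.1000/182"))  # https://doi.org/10.1000/182
```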
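Because the InChIKey entry above describes a fixed-length string, its blocks can be inspected without a chemistry toolkit. The sketch below assumes the published layout of an InChIKey: a 14-letter block hashing the molecular skeleton, a hyphen, 8 letters hashing the remaining layers followed by a standard/non-standard flag and a version letter, a hyphen, and a final protonation letter. It only splits and checks the format; computing the hash itself needs the InChI software. `split_inchikey` is a hypothetical helper name.

```python
import re

# 14 letters - 8 letters + flag (S/N) + version letter - protonation letter.
INCHIKEY_PATTERN = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key: str) -> dict:
    """Split an InChIKey into its named blocks, or raise ValueError."""
    match = INCHIKEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, other_layers, flag, version, protonation = match.groups()
    return {
        "skeleton_hash": skeleton,          # connectivity layer
        "other_layers_hash": other_layers,  # stereo, isotopes, etc.
        "standard": flag == "S",            # S = standard, N = non-standard
        "version": version,                 # 'A' for InChI version 1
        "protonation": protonation,         # 'N' = no protons added or removed
    }


# Water (InChI=1S/H2O/h1H2)
print(split_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))
```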
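The JCAMP-DX entry above mentions metadata embedded in a structured, plain-text header. In that header each labelled record is a line of the form `##LABEL=value`, and `$$` starts a comment. A minimal sketch that reads the header into a dictionary, assuming a single-block file and stopping at the first data-table label; the sample file, its values, and the `read_jcamp_header` name are illustrative only. Real files (compressed data forms, multi-block files, label normalisation rules) need a full JCAMP-DX parser.

```python
# Illustrative single-block file; values are not from a real measurement.
SAMPLE = """\
##TITLE=Example IR spectrum
##JCAMP-DX=4.24
##DATA TYPE=INFRARED SPECTRUM
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##NPOINTS=3
##XYDATA=(X++(Y..Y))
4000 95.1 94.8 94.2
##END=
"""

# Labels assumed to mark the start of the data table.
DATA_LABELS = {"XYDATA", "XYPOINTS", "PEAK TABLE"}


def read_jcamp_header(text: str) -> dict:
    """Collect ##LABEL=value records up to the first data-table label."""
    header = {}
    for line in text.splitlines():
        # Drop anything after a $$ comment marker.
        line = line.split("$$", 1)[0].strip()
        if not line.startswith("##"):
            continue  # blank lines or data lines
        label, _, value = line[2:].partition("=")
        label = label.strip().upper()
        if label in DATA_LABELS:
            break  # the data table starts here
        header[label] = value.strip()
    return header


print(read_jcamp_header(SAMPLE))
```

Keeping the metadata as labelled text lines is what lets a file like this be read by both humans and simple scripts.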