Why Does Research Data Management Matter?
- Research data includes everything needed to validate your findings — not just the final published dataset.
- Research data management covers the full lifecycle: planning, collecting, storing, sharing, and preserving data.
- Poor data management leads to data loss, retractions, wasted time, and irreproducible research.
- The reproducibility crisis is real and affects chemistry — but good RDM practices directly address its root causes.
- Good data management is not difficult; it requires some upfront planning and consistent habits.
Planning Your Data: DMPs, Budgeting, and Funder Policies
- A Data Management Plan describes how you will organise, store, share, and preserve your data. Write one at the start of every project.
- EPSRC requires metadata to be publicly available within 12 months of data generation, a data access statement in every publication, and data preserved for 10 years.
- EPSRC does not require a DMP at application stage, but all other UKRI councils do – check your specific funder’s requirements via DMPonline.
- RDM costs are allowable on most UK grants. Budget for storage, staff time, and repository costs from the start.
- It is always cheaper to manage data well from the start than to sort it out later.
FAIR Data Principles
- FAIR stands for Findable, Accessible, Interoperable, and Reusable. It is a framework for making data useful beyond its original context.
- FAIR is not the same as open: data can be FAIR with access restrictions, and open without being FAIR.
- Metadata should always be as open as possible, even when the data itself cannot be shared freely.
- Most researchers’ current data scores poorly against FAIR principles – the goal is incremental improvement, not perfection.
- The highest-impact steps are depositing data in a repository and applying a licence.
Data Storage, Security, and Organisation
- Use descriptive, consistent file names with ISO 8601 dates, no spaces, and explicit version numbers. Avoid “final”.
- Keep raw data in a dedicated, read-only folder, separate from processed data and analysis.
- A single copy is a single point of failure. Ensure at least one copy is stored offsite or in managed cloud storage.
- Use institutional research data storage as your primary backup: it is more reliable and funder-compliant than personal solutions.
- Portable storage and consumer cloud services are useful but are not substitutes for institutional storage.
- Sensitive data requires access controls and encryption. Check your institution’s policy.
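The file-naming advice above can be sketched in a few lines of Python. This is an illustration only: the pattern `<ISO date>_<project>_<description>_v<NN>.<ext>` and the helper names `slug`, `build_name`, and `is_valid_name` are assumptions made for the example, not a prescribed convention. Adapt the pattern to whatever your group has agreed on.

```python
import re
from datetime import date

# Hypothetical convention: 2024-03-05_project_description_v02.ext
NAME_PATTERN = re.compile(
    r"\d{4}-\d{2}-\d{2}_[a-z0-9-]+_[a-z0-9-]+_v\d{2}\.[a-z0-9]+"
)


def slug(text: str) -> str:
    """Lower-case the text and replace runs of other characters with a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_name(day: date, project: str, description: str, version: int, ext: str) -> str:
    """Assemble a file name with an ISO 8601 date, no spaces, and an explicit version."""
    extension = ext.lower().lstrip(".")
    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{extension}"


def is_valid_name(name: str) -> bool:
    """Check a name against the pattern, reject 'final', and require a real date."""
    if not NAME_PATTERN.fullmatch(name) or "final" in name:
        return False
    try:
        date.fromisoformat(name[:10])
    except ValueError:
        return False
    return True


example = build_name(date(2024, 3, 5), "Suzuki Coupling", "crude NMR", 2, "jdx")
print(example)
print(is_valid_name(example), is_valid_name("final report v2.docx"))
```

A check like `is_valid_name` could be run over a project folder before each backup to catch names that have drifted from the convention.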
Sharing, Preserving, and Licensing Your Data
- Share your data via a repository: it makes your work citable, discoverable, and preserved long-term.
- Choose domain-specific repositories first; Zenodo is a good general-purpose fallback.
- Always apply a licence. Without one, the default is “all rights reserved” and others cannot safely reuse your data. CC-BY is the recommended default for research datasets; use Open Data Commons licences (ODC-By) for structured databases.
- All UKRI-funded publications require a data access statement, even when there is no associated data.
- Prefer open file formats and include documentation to ensure data remains usable for the required ten-year preservation period.
The Reproducibility Crisis in Chemistry
- Chemistry is fundamentally a reproducible science, but inadequate methods reporting and poor data management undermine this in practice.
- Experimental details that seem obvious to the author – catalyst loading, solvent ratio, reagent purity – are often critical for reproduction and routinely absent from publications.
- The same problem applies in computational chemistry: functional, basis set, software version, and input files must be archived and shared, not discarded after publication.
- Publication pressure and selective reporting compound the problem: failure modes rarely appear in the literature.
- Good RDM practices – recording metadata systematically in an ELN, keeping raw data, documenting processing steps – directly address the most preventable causes of irreproducibility.
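For computational work, one low-effort way to keep the details listed above with the results is a small machine-readable sidecar file stored next to each calculation. The sketch below assumes a hypothetical `CalculationRecord` structure; the field names and the software name `ExampleQC` are illustrative and not part of any standard.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class CalculationRecord:
    """Minimum details needed to rerun a calculation (illustrative field names)."""
    software: str
    software_version: str
    functional: str
    basis_set: str
    input_files: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]


def to_json(record: CalculationRecord) -> str:
    """Serialise the record so it can be archived alongside the output files."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


record = CalculationRecord(
    software="ExampleQC",          # hypothetical program name
    software_version="1.0",
    functional="B3LYP",
    basis_set="6-31G(d)",
)
print(record.missing_fields())     # input_files has not been filled in yet
print(to_json(record))
```

Checking `missing_fields()` before a calculation is archived turns "we forgot to note the software version" into something that is caught at the time rather than at review.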
Electronic Lab Notebooks for Chemists
- Electronic lab notebooks improve searchability, backup, data linking, and collaboration compared to paper notebooks.
- For chemistry, look for ELNs with structure drawing, reaction scheme capture, stoichiometry tools, and analytical data integration.
- Data portability is critical – ensure you can export all records in a usable format before committing.
- Check what your institution already provides before evaluating tools independently.
- Chemotion ELN and eLabFTW are widely used open-source options with good chemistry support.
Metadata and Chemical Data Standards
- Metadata transforms raw data files into interpretable, reusable scientific records – it is the practical foundation of FAIR data in chemistry.
- Use InChI or InChIKey as the primary identifier for compounds in data records; SMILES is also widely supported. Avoid relying on CAS numbers alone.
- Record technique-specific metadata systematically: field strength and solvent for NMR, ionisation method for MS, radiation source for XRD.
- Export data to open formats (JCAMP-DX, CIF, mzML) for deposition and sharing alongside proprietary raw files.
- The IUPAC FAIRSpec standard defines a community-agreed metadata schema for spectroscopic data in chemistry.
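As a rough illustration of the identifier and metadata points above, the sketch below checks that a string has the shape of a standard InChIKey (14 letters, 10 letters, 1 letter, separated by hyphens) and that a record carries the technique-specific fields named in this section. The dictionary keys such as `field_strength` are assumptions chosen for the example, not a published schema, and the syntax check does not confirm that a key corresponds to a real structure.

```python
import re

# Shape of a standard InChIKey: 14 + 10 + 1 upper-case letters.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Illustrative minimum fields per technique, taken from the bullet list above.
REQUIRED_FIELDS = {
    "NMR": {"field_strength", "solvent"},
    "MS": {"ionisation_method"},
    "XRD": {"radiation_source"},
}


def looks_like_inchikey(value: str) -> bool:
    """Syntax check only; it does not verify the key against a structure."""
    return INCHIKEY_PATTERN.fullmatch(value) is not None


def missing_metadata(technique: str, metadata: dict) -> list[str]:
    """List required fields that are absent or empty for the given technique."""
    required = REQUIRED_FIELDS[technique]
    return sorted(name for name in required if not metadata.get(name))


# The InChIKey of ethanol passes; a CAS number does not.
print(looks_like_inchikey("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"))
print(looks_like_inchikey("64-17-5"))
print(missing_metadata("NMR", {"solvent": "CDCl3"}))
```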
Chemistry Data Repositories and Databases
- Domain-specific repositories provide better discoverability and community-standard validation than general-purpose alternatives – use them as your first choice.
- Key repositories for chemistry include Chemotion (synthetic/spectroscopic), CSD (crystal structures), NOMAD and ioChem-BD (computational), and Zenodo (general fallback).
- When a paper involves multiple data types, split the deposit across appropriate repositories – each with its own DOI – rather than pooling everything in a general-purpose archive. List all DOIs in the data access statement.
- Crystal structure deposition in the CSD is mandatory for most crystallography journals.
- re3data.org and FAIRsharing.org are the best starting points for finding a repository appropriate for your data type.
- NFDI4Chem provides freely accessible guidance and tools for chemistry data management regardless of where you are based.
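When a deposit is split across several repositories, assembling the data access statement from a single list of deposits helps ensure that no DOI is left out. The following is a minimal sketch: the function name `data_access_statement`, the sentence wording, and the DOIs are all placeholders, and the exact wording required should be taken from your funder or journal.

```python
def data_access_statement(deposits: list[tuple[str, str, str]]) -> str:
    """Build a statement from (description, repository, doi) tuples.

    The wording is an assumption for illustration; funders and journals
    may prescribe their own.
    """
    if not deposits:
        # A statement is still needed when there is no associated data.
        return "No new data were created or analysed in this study."
    items = [
        f"{description} in {repository} (https://doi.org/{doi})"
        for description, repository, doi in deposits
    ]
    return "Data supporting this study are openly available: " + "; ".join(items) + "."


# Placeholder DOIs, not real records.
deposits = [
    ("NMR spectra", "Chemotion", "10.1234/example-nmr"),
    ("crystal structures", "the CSD", "10.1234/example-xrd"),
]
print(data_access_statement(deposits))
print(data_access_statement([]))
```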
Managing Data from Common Chemistry Techniques
- Record metadata for every analytical measurement at the time of acquisition – it cannot be reliably reconstructed later.
- For NMR, always save the raw FID alongside the processed spectrum, and record field strength, nucleus, solvent, pulse sequence, and reference compound as a minimum.
- A README.md and data-dictionary.csv at the root of every project folder cost little time to write and save a great deal of confusion later.
- High-throughput and large-scale facilities require automated metadata capture and advance storage planning – the same principles apply, but the consequences of skipping them are much larger.
- Open format exports (JCAMP-DX for spectra, CIF for crystal structures) should accompany vendor format raw files for all deposited data.
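Creating the two documentation files at the start of a project can be scripted so that it happens every time. The sketch below assumes a hypothetical helper `write_project_docs`, a README with three placeholder headings, and a data dictionary with `column`, `description`, and `units` columns; none of these choices is mandated, so adjust them to your own template.

```python
import csv
from pathlib import Path

DICTIONARY_FIELDS = ["column", "description", "units"]  # illustrative layout


def write_project_docs(root: Path, title: str, columns: list[dict]) -> None:
    """Create README.md (if absent) and data-dictionary.csv in the project root."""
    root.mkdir(parents=True, exist_ok=True)

    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            f"# {title}\n\n## Contents\n\n## Methods\n\n## Contact\n",
            encoding="utf-8",
        )

    with (root / "data-dictionary.csv").open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=DICTIONARY_FIELDS)
        writer.writeheader()
        writer.writerows(columns)


if __name__ == "__main__":
    write_project_docs(
        Path("example-project"),
        "Example project",
        [{"column": "yield_pct", "description": "Isolated yield", "units": "%"}],
    )
```

The data dictionary is rewritten on each call so that it can be regenerated as columns are added, while an existing README is left untouched.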
PSDI and the Chemistry Data Landscape
- PSDI provides practical tools and guidance for chemistry and physical sciences data management in the UK, including format conversion, legacy data digitisation, and training resources.
- The chemistry data ecosystem is becoming more connected: PSDI, NFDI4Chem, and IUPAC are working towards compatible standards that connect instruments, notebooks, repositories, and publications.
- You do not need to adopt everything at once – picking one or two concrete improvements and building from there is more sustainable than an overhaul.
- re3data.org, the NFDI4Chem Knowledge Base, and PSDI Resources are all free starting points for further learning.