Why Does Research Data Management Matter?


  • Research data includes everything needed to validate your findings, not just the final published dataset.
  • Research data management covers the full lifecycle: planning, collecting, storing, sharing, and preserving data.
  • Poor data management leads to data loss, retractions, wasted time, and irreproducible research.
  • The reproducibility crisis is real and affects chemistry, but good RDM practices directly address its root causes.
  • Good data management is not difficult; it requires some upfront planning and consistent habits.

Planning Your Data: DMPs, Budgeting, and Funder Policies


  • A Data Management Plan describes how you will organise, store, share, and preserve your data. Write one at the start of every project.
  • EPSRC requires metadata to be publicly available within 12 months of data generation, a data access statement in every publication, and data preserved for 10 years.
  • EPSRC does not require a DMP at application stage, but all other UKRI councils do – check your specific funder’s requirements via DMPonline.
  • RDM costs are allowable on most UK grants. Budget for storage, staff time, and repository costs from the start.
  • It is always cheaper to manage data well from the start than to sort it out later.

FAIR Data Principles


  • FAIR stands for Findable, Accessible, Interoperable, and Reusable. It is a framework for making data useful beyond its original context.
  • FAIR is not the same as open: data can be FAIR with access restrictions, and open without being FAIR.
  • Metadata should always be as open as possible, even when the data itself cannot be shared freely.
  • Most researchers’ current data scores poorly against FAIR principles – the goal is incremental improvement, not perfection.
  • The highest-impact steps are depositing data in a repository and applying a licence.

Data Storage, Security, and Organisation


  • Use descriptive, consistent file names with ISO 8601 dates, no spaces, and explicit version numbers. Avoid “final”.
  • Keep raw data in a dedicated, read-only folder, separate from processed data and analysis.
  • A single copy is a single point of failure. Ensure at least one copy is stored offsite or in managed cloud storage.
  • Use institutional research data storage as your primary backup: it is more reliable and funder-compliant than personal solutions.
  • Portable storage and consumer cloud services are useful but are not substitutes for institutional storage.
  • Sensitive data requires access controls and encryption. Check your institution’s policy.
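
The file-naming advice above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed convention: the function name build_filename, the field order (date, project, description, version), and the example values are all assumptions chosen for the sketch.

```python
import re
from datetime import date


def build_filename(project: str, description: str, version: int,
                   ext: str, on: date) -> str:
    """Build a descriptive file name with an ISO 8601 date and explicit version.

    The layout date_project_description_vNN.ext is a hypothetical convention.
    """
    def slug(text: str) -> str:
        # Lowercase, then collapse every run of non-alphanumerics
        # (including spaces) into a single hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    if version < 1:
        raise ValueError("version numbers start at 1")

    # date.isoformat() yields YYYY-MM-DD, so names sort chronologically.
    return (f"{on.isoformat()}_{slug(project)}_{slug(description)}"
            f"_v{version:02d}.{ext.lstrip('.').lower()}")


name = build_filename("Suzuki Coupling", "NMR crude product", 2, ".csv",
                      date(2024, 3, 15))
print(name)  # 2024-03-15_suzuki-coupling_nmr-crude-product_v02.csv
```

Putting the date first means a plain alphabetical listing is also a chronological one, and the zero-padded version number replaces labels such as “final”.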

Sharing, Preserving, and Licensing Your Data


  • Share your data via a repository: it makes your work citable, discoverable, and preserved long-term.
  • Choose domain-specific repositories first; Zenodo is a good general-purpose fallback.
  • Always apply a licence. Without one, the default is “all rights reserved” and others cannot safely reuse your data. CC-BY is the recommended default for research datasets; use Open Data Commons licences (ODC-By) for structured databases.
  • All UKRI-funded publications require a data access statement, even when there is no associated data.
  • Prefer open file formats and include documentation to ensure data remains usable for the required ten-year preservation period.

The Reproducibility Crisis in Chemistry


  • Chemistry is fundamentally a reproducible science, but inadequate methods reporting and poor data management undermine this in practice.
  • Experimental details that seem obvious to the author – catalyst loading, solvent ratio, reagent purity – are often critical for reproduction and routinely absent from publications.
  • The same problem applies in computational chemistry: functional, basis set, software version, and input files must be archived and shared, not discarded after publication.
  • Publication pressure and selective reporting compound the problem: failure modes rarely appear in the literature.
  • Good RDM practices – recording metadata systematically in an ELN, keeping raw data, documenting processing steps – directly address the most preventable causes of irreproducibility.

Electronic Lab Notebooks for Chemists


  • Electronic lab notebooks improve searchability, backup, data linking, and collaboration compared to paper notebooks.
  • For chemistry, look for ELNs with structure drawing, reaction scheme capture, stoichiometry tools, and analytical data integration.
  • Data portability is critical – ensure you can export all records in a usable format before committing.
  • Check what your institution already provides before evaluating tools independently.
  • Chemotion ELN and eLabFTW are widely used open-source options with good chemistry support.

Metadata and Chemical Data Standards


  • Metadata transforms raw data files into interpretable, reusable scientific records – it is the practical foundation of FAIR data in chemistry.
  • Use InChI or InChIKey as the primary identifier for compounds in data records; SMILES is also widely supported. Avoid relying on CAS numbers alone.
  • Record technique-specific metadata systematically: field strength and solvent for NMR, ionisation method for MS, radiation source for XRD.
  • Export data to open formats (JCAMP-DX, CIF, mzML) for deposition and sharing alongside proprietary raw files.
  • The IUPAC FAIRSpec standard defines a community-agreed metadata schema for spectroscopic data in chemistry.
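
As a small illustration of using InChIKeys as identifiers, the sketch below checks whether a string has the layout of a standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, and one final letter). It only tests the shape of the string: it cannot confirm that the key corresponds to a real structure, and generating keys from structures needs a cheminformatics toolkit, which is outside this sketch. The function name looks_like_inchikey is an assumption.

```python
import re

# Layout of an InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_PATTERN = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")


def looks_like_inchikey(value: str) -> bool:
    """Return True if the string has the layout of an InChIKey.

    This is a format check only, useful for catching CAS numbers or
    trivial names entered in an identifier column by mistake.
    """
    return INCHIKEY_PATTERN.fullmatch(value.strip()) is not None


print(looks_like_inchikey("XLYOFNOQVPJJNP-UHFFFAOYSA-N"))  # water: True
print(looks_like_inchikey("7732-18-5"))                    # CAS number: False
```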

Chemistry Data Repositories and Databases


  • Domain-specific repositories provide better discoverability and community-standard validation than general-purpose alternatives – use them as your first choice.
  • Key repositories for chemistry include Chemotion (synthetic/spectroscopic), CSD (crystal structures), NOMAD and ioChem-BD (computational), and Zenodo (general fallback).
  • When a paper involves multiple data types, split the deposit across appropriate repositories – each with its own DOI – rather than pooling everything in a general-purpose archive. List all DOIs in the data access statement.
  • Most crystallography journals require crystal structures to be deposited in the CSD.
  • re3data.org and FAIRsharing.org are the best starting points for finding a repository appropriate for your data type.
  • NFDI4Chem provides freely accessible guidance and tools for chemistry data management regardless of where you are based.

Managing Data from Common Chemistry Techniques


  • Record metadata for every analytical measurement at the time of acquisition – it cannot be reliably reconstructed later.
  • For NMR, always save the raw FID alongside the processed spectrum, and record field strength, nucleus, solvent, pulse sequence, and reference compound as a minimum.
  • A README.md and data-dictionary.csv at the root of every project folder cost little time to write and save a great deal of confusion later.
  • High-throughput and large-scale facilities require automated metadata capture and advance storage planning – the same principles apply, but the consequences of skipping them are much larger.
  • Open format exports (JCAMP-DX for spectra, CIF for crystal structures) should accompany vendor format raw files for all deposited data.
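
One way to make the minimum NMR metadata hard to forget is to check each record against a required-field list at acquisition time. The sketch below assumes a simple dictionary record saved as a JSON sidecar next to the raw FID; the field names, the helper missing_nmr_fields, and the example values are illustrative assumptions, not part of any standard schema.

```python
import json

# Minimum NMR fields named above; the key names are hypothetical.
REQUIRED_NMR_FIELDS = (
    "field_strength_mhz",
    "nucleus",
    "solvent",
    "pulse_sequence",
    "reference_compound",
)


def missing_nmr_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_NMR_FIELDS
            if record.get(name) in (None, "")]


# Example values are illustrative only.
record = {
    "field_strength_mhz": 400,
    "nucleus": "1H",
    "solvent": "CDCl3",
    "pulse_sequence": "zg30",
    "reference_compound": "TMS",
}

problems = missing_nmr_fields(record)
if problems:
    print("Missing metadata:", ", ".join(problems))
else:
    # Text that could be saved as a sidecar file alongside the raw data.
    print(json.dumps(record, indent=2))
```

The same pattern extends to other techniques by swapping the required-field tuple, for example ionisation method for MS or radiation source for XRD.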

PSDI and the Chemistry Data Landscape


  • PSDI provides practical tools and guidance for chemistry and physical sciences data management in the UK, including format conversion, legacy data digitisation, and training resources.
  • The chemistry data ecosystem is becoming more connected: PSDI, NFDI4Chem, and IUPAC are working towards compatible standards that connect instruments, notebooks, repositories, and publications.
  • You do not need to adopt everything at once – picking one or two concrete improvements and building from there is more sustainable than a complete overhaul.
  • re3data.org, the NFDI4Chem Knowledge Base, and PSDI Resources are all free starting points for further learning.