✨✨ A curated collection of resources on artificial intelligence for spectral data analysis, covering computational methods for mass spectrometry (MS), NMR, IR, and XRD data.
Computational approaches for predicting mass spectra from molecular structures
AI methods for molecular identification and elucidation from mass spectra
Computational methods for predicting peptide mass spectra
AI approaches for peptide identification and quantification
Prediction of NMR spectra from molecular structures
| Paper Title & Link | Method Type | Data Source | Performance Metric | Notes |
|---|---|---|---|---|
| Prediction of chemical shift in NMR: A review | Empirical | - | Rule-based | Interpretable, less generalizable |
| iShiftML: Highly Accurate Prediction of NMR Chemical Shifts | Hybrid ML + QM | QM descriptors | < 0.2 ppm error | Fast inference, needs QM feature prep |
| NMR shift prediction from small data quantities | ML | NMRShiftDB2 | MAE (ppm) | Good scalability |
| NMR-spectrum prediction for dynamic molecules | ML-Dynamics | Simulated ensembles | Time-avg ppm | Accounts for flexible molecules |
| Machine learning in NMR spectroscopy | DL | NMRShiftDB2 | TBD | Multitask joint learning |
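
Several entries above report mean absolute error (MAE) in ppm. As a minimal sketch of how that metric is typically computed, assuming predicted and reference shifts are already matched atom by atom (the function name and the shift values below are hypothetical and not taken from any listed paper):

```python
def mean_absolute_error_ppm(predicted, reference):
    """Mean absolute error between predicted and reference shifts, in ppm."""
    if len(predicted) != len(reference):
        raise ValueError("shift lists must have the same length")
    if not predicted:
        raise ValueError("shift lists must not be empty")
    # Average the per-atom absolute deviations.
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical 13C shifts (ppm) for four carbons; illustrative values only.
predicted_shifts = [14.0, 22.5, 31.0, 60.5]
reference_shifts = [14.5, 22.5, 32.0, 60.0]

mae = mean_absolute_error_ppm(predicted_shifts, reference_shifts)
print(f"MAE: {mae:.2f} ppm")
```

Papers differ in whether they average per atom, per molecule, or per nucleus type, so reported values are only comparable when the averaging scheme matches.
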
| Paper Title & Link | Method Type | Input Data | Accuracy / Metric | Notes |
|---|---|---|---|---|
| A Bayesian approach to structural elucidation using crystalline-state solid‑state NMR and probabilistic inference (2019) | Bayesian | Solid‑state NMR | Top‑5 accuracy | Requires crystal information |
| Accurate and efficient structure elucidation from routine one‑dimensional NMR spectra using multitask machine learning (2024) | DL (CNN + Transformer) | 1D spectra | Top‑1 ~70% | No need for 2D spectra |
| Deep reinforcement learning and graph convolutional networks for molecular inverse problem of NMR (2022) | RL (MCTS + GCN) | Shift table | Top‑3 ~80% | Effective for small molecules |
| High‑resolution iterative Full Spin Analysis (HiFSA) for small molecules using PERCH (2015) | Spectral ID | Simulated spectra | — | Useful for detailed peak assignment |
| Automated mixture component identification via wavelet packet transform and optimization (2023) | Mixture ID (WPT + Optimization) | Mixtures | Component-level accuracy | Robust for complex sample spectra |
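
The Top-k figures in the table measure how often the correct structure appears among a model's k highest-ranked candidates. A minimal sketch of that calculation, assuming candidates and ground truth are given as canonicalized SMILES strings so that plain string comparison is valid (the function name and example data are hypothetical):

```python
def top_k_accuracy(ranked_candidates, true_structures, k):
    """Fraction of spectra whose true structure is among the top-k candidates."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_candidates) != len(true_structures):
        raise ValueError("need one candidate list per true structure")
    if not true_structures:
        raise ValueError("no examples to evaluate")
    hits = sum(
        1
        for candidates, truth in zip(ranked_candidates, true_structures)
        if truth in candidates[:k]  # candidates are sorted best-first
    )
    return hits / len(true_structures)


# Hypothetical ranked predictions for three spectra (best candidate first).
ranked = [
    ["CCO", "COC"],
    ["CC(=O)O", "OCC=O"],
    ["c1ccccc1", "C1CCCCC1"],
]
truth = ["CCO", "OCC=O", "CCN"]

top1 = top_k_accuracy(ranked, truth, k=1)
top2 = top_k_accuracy(ranked, truth, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

In practice the SMILES would first be canonicalized with a cheminformatics toolkit; that step is omitted here to keep the sketch dependency-free.
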
| Dataset Name & Link | Spectrum Count | Real / Simulated | Multi-modal Spectra | Labeled | Downloadable / Crawlable |
|---|---|---|---|---|---|
| NMRShiftDB2 | ~50,000 | Real | ¹H, ¹³C | ✅ Yes | ✅ Yes (open source) |
| BMRB | >13,000 biomolecules | Real | ¹H, ¹³C, ¹⁵N, ²H, ³¹P | ✅ Yes | ✅ Yes (FTP/STAR) |
| SDBS | ~14,000 | Real | ¹H, ¹³C, IR, MS, UV | ✅ Yes | ✅ Yes (Crawl Script Needed) |
| QM9-NMR (Simulated) | 130,000+ | Simulated (DFT) | ¹H, ¹³C | ✅ Yes | ✅ Yes (via DOI or GitHub) |
| 2DNMRGym (2024) | 22,000 2D HSQC | Simulated | HSQC (2D) | ✅ Yes | ✅ Yes (HuggingFace) |
| NMRMixDB | ~3,000 mixtures | Real | ¹H | ✅ Yes (with labels) | ✅ Yes |
| NMRPredBench | ~3,000 | Real + Simulated | ¹H, ¹³C | ✅ Yes | ✅ Yes (GitHub) |
| MolAid | ~840K+ | Experimental | Multi-property | ✅ Yes | ❌ No (API charged) |
| NIST WebBook | ~700K+ | Experimental | ¹H, ¹³C etc. | ✅ Yes | ✅ Yes (Need Search Key) |
| PubChem | ~100M+ | Experimental + Predicted | Full compound attributes | ✅ Yes | ✅ Yes (API) |
Infrared spectrum prediction from molecular structures
Molecular characterization from infrared spectra
Joint prediction of multiple spectral modalities from molecular structures
Multimodal integration for enhanced molecular identification
Prediction of XRD patterns from crystal structures
Crystal structure determination from XRD patterns
📄 This project is licensed under the MIT License — see the LICENSE file for details.
