QUARK: QUantum-informed Analysis for Recommendation of Kombinations

This repository contains the official implementation of "Multimodal Drug Recommendation with Quantum Chemical Molecular Representations".

Paper Overview

QUARK is a multimodal medication recommendation framework that integrates quantum-chemical molecular representations with longitudinal patient EHR data to generate safe and clinically relevant drug combinations.

Unlike prior approaches that rely only on molecular structure or predefined interaction graphs, QUARK encodes electron-level physicochemical properties using ELF (Electron Localization Function) and ESP (Electrostatic Potential) maps derived from density functional theory (DFT). These quantum-informed drug embeddings are fused with patient representations through a cross-attention mechanism, enabling the model to jointly capture:

  • molecular reactivity related to pharmacodynamic DDIs
  • patient-specific clinical context

Highlights

  • Quantum-informed Drug Representation: Utilizes DFT-derived ELF and ESP molecular maps to capture electron density, polarity, and intermolecular interaction patterns beyond atomic connectivity.
  • Cross-modal Molecular Fusion: Applies multi-head cross-attention between ELF and ESP embeddings to model complementary physicochemical properties.
  • Patient–Drug Context Matching: Computes drug relevance dynamically via compatibility between longitudinal patient states and quantum drug embeddings.
  • Substructure-aware Pharmacological Modeling: Extends the SafeDrug bipartite formulation with patient-conditioned substructure importance estimation.
  • Dual-level Evaluation Protocol: CID-level evaluation for DDI safety, and ATC3-level evaluation for therapeutic validity.
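The cross-modal fusion idea can be sketched with `torch.nn.MultiheadAttention`: ELF tokens attend over ESP tokens and vice versa, and the two attended summaries are projected into one drug embedding. Module names, dimensions, and wiring here are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class ELFESPFusion(nn.Module):
    """Illustrative cross-attention between ELF and ESP embeddings.
    Names and dimensions are assumptions, not QUARK's actual code."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # ELF tokens attend to ESP tokens, and vice versa
        self.elf_to_esp = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.esp_to_elf = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, elf, esp):  # each: (batch, tokens, dim)
        elf_ctx, _ = self.elf_to_esp(elf, esp, esp)  # query=ELF, key/value=ESP
        esp_ctx, _ = self.esp_to_elf(esp, elf, elf)  # query=ESP, key/value=ELF
        fused = torch.cat([elf_ctx.mean(1), esp_ctx.mean(1)], dim=-1)
        return self.proj(fused)  # (batch, dim) fused drug embedding

fusion = ELFESPFusion(dim=64, heads=4)
out = fusion(torch.randn(2, 16, 64), torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 64])
```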

Experimental Setup

We provide:

  • the full implementation of the QUARK architecture
  • preprocessing and training pipelines for EHR data and quantum molecular representations
  • evaluation scripts following the dual-level safety–effectiveness protocol

The overall data preprocessing and DDI construction pipeline follows our previous implementation in MMM for reproducibility and fair comparison.

Dataset

Experiments are conducted on MIMIC-III, containing:

  • 5,413 patients
  • 14,057 visits
  • 250 medications
  • 4,918 DDI pairs

We adopt the same cohort construction and medication filtering strategy as in MMM.

Molecular Data Generation (Offline)

Quantum molecular images are generated once per drug and reused during both training and inference.

Workflow:

  1. SMILES → 3D geometry
    Avogadro

  2. DFT calculation
    ORCA (B3LYP / def2-SVP)

  3. ELF & ESP map extraction
    Multiwfn

This preprocessing step is performed offline and does not affect inference-time latency.

The molecular preprocessing scripts are based on the MMM pipeline and extended to support dual-modality (ELF + ESP) inputs.
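The precompute-once/reuse-everywhere pattern can be sketched as a small cache keyed by SMILES: each drug's ELF and ESP maps are generated only if missing on disk and then reused by both training and inference. The file layout, naming scheme, and helper names below are illustrative assumptions; the real pipeline invokes Avogadro, ORCA, and Multiwfn offline:

```python
import hashlib
import os
import tempfile

def cached_map_path(smiles, modality, cache_dir):
    """Deterministic on-disk path for a drug's ELF or ESP map, keyed by SMILES.
    Layout and naming are illustrative assumptions about the offline pipeline."""
    key = hashlib.sha1(smiles.encode()).hexdigest()[:12]
    return os.path.join(cache_dir, f"{key}_{modality}.png")

def ensure_maps(smiles, cache_dir):
    """Run the SMILES -> DFT -> map chain only when outputs are missing,
    so training and inference reuse the same precomputed images."""
    paths = {m: cached_map_path(smiles, m, cache_dir) for m in ("elf", "esp")}
    for modality, path in paths.items():
        if not os.path.exists(path):
            # stand-in for the real Avogadro/ORCA/Multiwfn invocation (offline)
            with open(path, "wb") as f:
                f.write(b"")  # the real pipeline writes the rendered map here
    return paths

cache = tempfile.mkdtemp()
paths = ensure_maps("CC(=O)OC1=CC=CC=C1C(=O)O", cache)  # aspirin SMILES
print(sorted(paths))  # ['elf', 'esp']
```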

CNN Backbones

We use modality-specific pretrained image encoders:

  • ELF encoder: EfficientNet-V2-L
  • ESP encoder: ResNet-18

Training Environment

All experiments were conducted with:

Python 3.9
PyTorch 2.3.0
CUDA 11.8

Installation

To train the model, you need RDKit and several other packages. To avoid version conflicts among these packages, please follow the installation steps in the exact order below.

  • First, create and activate a new conda environment.
conda create -c conda-forge -n new_env python=3.9
conda activate new_env
  • Install RDKit
conda install -c conda-forge rdkit
  • If RDKit does not work after the above installation, try:
pip install rdkit-pypi
  • Install numpy, pandas, and scipy with specific versions to avoid conflicts:
pip install numpy==1.22.4 pandas==1.3.0 scipy==1.13.1
  • To install PyTorch 2.3.0 with CUDA 11.8 support and the matching torchvision 0.18.0, run:
pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

Data Preparation

Data paths and hyperparameters (such as the learning rate, target_ddi, etc.) are configured in main.py. Running the preprocessing script processing.py generates the required files in the output folder; the dataset paths in main.py must then be updated to point to these generated files.

  • Dataset Configuration

In main.py the paths for the following variables must be updated to correspond to the .pkl files generated within the output folder:

data_path = "[records_final.pkl]"
voc_path = "[voc_final.pkl]"
ddi_adj_path = "[ddi_A_final.pkl]"
ddi_mask_path = "[ddi_mask_H.pkl]"
molecule_path = "[cidtoSMILES.pkl]"
ddi_rate = ddi_rate_score("[ddi_A_final.pkl]")

These should be set to point to the corresponding .pkl files generated by preprocessing, typically located in the data/output folder.
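The `ddi_rate_score` call above follows the SafeDrug convention of measuring the fraction of prescribed drug pairs flagged in the DDI adjacency matrix. A minimal numpy sketch of that idea (function and variable names are illustrative, not the repository's implementation):

```python
import numpy as np

def ddi_rate(prescriptions, ddi_adj):
    """Fraction of prescribed drug pairs flagged in the DDI adjacency matrix.
    A sketch of the idea behind ddi_rate_score, not the repository's code."""
    pairs = interacting = 0
    for drugs in prescriptions:               # one medication set per visit
        for i in range(len(drugs)):
            for j in range(i + 1, len(drugs)):
                pairs += 1
                if ddi_adj[drugs[i], drugs[j]] or ddi_adj[drugs[j], drugs[i]]:
                    interacting += 1
    return interacting / pairs if pairs else 0.0

adj = np.zeros((4, 4), dtype=int)
adj[0, 1] = adj[1, 0] = 1                     # drugs 0 and 1 interact
print(ddi_rate([[0, 1, 2]], adj))             # 1 interacting pair of 3 total
```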

  • External data files required for preprocessing
    The following files are obtained from external sources and must be prepared in advance:
  • ndc2RXCUI.txt: NDC-to-RxCUI mapping file, adapted from ndc2rxnorm_mapping.csv in the GAMENet repository.
  • drug-DDI.csv: drug–drug interaction (DDI) information indexed by CID; download from Google Drive.
  • RXCUI2atc4.csv: RxCUI-to-ATC4 mapping file, adapted from ndc2atc_level4.csv in the GAMENet repository.

Training & Inference

Hyperparameters can be configured in main.py. These hyperparameters are set using the argparse module, allowing default values to be specified and overridden via command-line arguments:

hyperparameters = {
    "Test": [True or False],               
    "model_name": ["model_identifier"],   
    "resume_path": ["path/to/checkpoint"], 
    "lr": [learning_rate],              
    "target_ddi": [target_ddi],  
    "kp": [coefficient_of_P_signal],   
    "dim": [dimension_size],        
    "cuda": [cuda_device_index]
}
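Since these options are exposed through argparse, the setup follows the usual pattern below. The flag names mirror the table above, but the default values are illustrative assumptions, not the repository's actual settings:

```python
import argparse

# Illustrative argparse setup mirroring the hyperparameter table above;
# default values are assumptions, not the repository's actual settings.
parser = argparse.ArgumentParser()
parser.add_argument("--Test", action="store_true", help="run inference only")
parser.add_argument("--model_name", type=str, default="QUARK")
parser.add_argument("--resume_path", type=str, default="")
parser.add_argument("--lr", type=float, default=5e-4)
parser.add_argument("--target_ddi", type=float, default=0.06)
parser.add_argument("--kp", type=float, default=0.05)
parser.add_argument("--dim", type=int, default=64)
parser.add_argument("--cuda", type=int, default=0)

# Defaults apply when a flag is omitted; command-line values override them.
args = parser.parse_args(["--Test", "--resume_path", "ckpt/best.pt"])
print(args.Test, args.resume_path)  # True ckpt/best.pt
```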
  • Run the Code
python main.py
python main.py --Test --resume_path [best_epoch_path]

Citation

If you find this code useful for your work, please cite the following and consider starring this repository:

@inproceedings{kim2026quark,
  title     = {Multimodal Drug Recommendation with Quantum Chemical Molecular Representations},
  author    = {Yujin Kim and Seoeun Park and Chongmyung Kwon and Charmgil Hong},
  booktitle = {Proceedings of the 31st International Conference on Database Systems for Advanced Applications (DASFAA 2026)},
  year      = {2026},
  publisher = {Springer},
  note      = {To appear}
}

References

@inproceedings{yang2021safedrug,
  title     = {SafeDrug: Dual Molecular Graph Encoders for Safe Drug Recommendations},
  author    = {Yang, Chaoqi and Xiao, Cao and Ma, Fenglong and Glass, Lucas and Sun, Jimeng},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI} 2021},
  year      = {2021}
}
@inproceedings{kwon2025mmm,
  title     = {{MMM}: Quantum-Chemical Molecular Representation Learning for Personalized Drug Recommendation},
  author    = {Chongmyung Kwon and Yujin Kim and Seoeun Park and Yunji Lee and Charmgil Hong},
  booktitle = {PRedictive Intelligence in MEdicine},
  year      = {2025},
  organization = {Springer}
}

About

QUARK: QUantum-informed Analysis for Recommendation of Combinations (DASFAA-2026)
