QUARK: QUantum-informed Analysis for Recommendation of Kombinations

This repository contains the official implementation of "Multimodal Drug Recommendation with Quantum Chemical Molecular Representations".

Paper Overview

QUARK is a multimodal medication recommendation framework that integrates quantum-chemical molecular representations with longitudinal patient EHR data to generate safe and clinically relevant drug combinations.

Unlike prior approaches that rely only on molecular structure or predefined interaction graphs, QUARK encodes electron-level physicochemical properties using ELF (Electron Localization Function) and ESP (Electrostatic Potential) maps derived from density functional theory (DFT). These quantum-informed drug embeddings are fused with patient representations through a cross-attention mechanism, enabling the model to jointly capture:

  • molecular reactivity related to pharmacodynamic DDIs
  • patient-specific clinical context

Highlights

  • Quantum-informed Drug Representation: Utilizes DFT-derived ELF and ESP molecular maps to capture electron density, polarity, and intermolecular interaction patterns beyond atomic connectivity.
  • Cross-modal Molecular Fusion: Applies multi-head cross-attention between ELF and ESP embeddings to model complementary physicochemical properties.
  • Patient–Drug Context Matching: Computes drug relevance dynamically via compatibility between longitudinal patient states and quantum drug embeddings.
  • Substructure-aware Pharmacological Modeling: Extends the SafeDrug bipartite formulation with patient-conditioned substructure importance estimation.
  • Dual-level Evaluation Protocol: CID-level evaluation for DDI safety, and ATC3-level evaluation for therapeutic validity.
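The cross-modal fusion idea can be sketched with `torch.nn.MultiheadAttention`: ELF tokens attend over ESP tokens and vice versa, and the two attended summaries are projected into one drug embedding. Module names, dimensions, and wiring here are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class ELFESPFusion(nn.Module):
    """Illustrative cross-attention between ELF and ESP embeddings.
    Names and dimensions are assumptions, not QUARK's actual code."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        # ELF tokens attend to ESP tokens, and vice versa
        self.elf_to_esp = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.esp_to_elf = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, elf, esp):  # each: (batch, tokens, dim)
        elf_ctx, _ = self.elf_to_esp(elf, esp, esp)  # query=ELF, key/value=ESP
        esp_ctx, _ = self.esp_to_elf(esp, elf, elf)  # query=ESP, key/value=ELF
        fused = torch.cat([elf_ctx.mean(1), esp_ctx.mean(1)], dim=-1)
        return self.proj(fused)  # (batch, dim) fused drug embedding

fusion = ELFESPFusion(dim=64, heads=4)
out = fusion(torch.randn(2, 16, 64), torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 64])
```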

Experimental Setup

We provide:

  • the full implementation of the QUARK architecture
  • preprocessing and training pipelines for EHR data and quantum molecular representations
  • evaluation scripts following the dual-level safety–effectiveness protocol

The overall data preprocessing and DDI construction pipeline follows our previous implementation in MMM for reproducibility and fair comparison.

Dataset

Experiments are conducted on MIMIC-III, containing:

  • 5,413 patients
  • 14,057 visits
  • 250 medications
  • 4,918 DDI pairs

We adopt the same cohort construction and medication filtering strategy as in MMM.

Molecular Data Generation (Offline)

Quantum molecular images are generated once per drug and reused during both training and inference.

Workflow:

  1. SMILES → 3D geometry
    Avogadro

  2. DFT calculation
    ORCA (B3LYP / def2-SVP)

  3. ELF & ESP map extraction
    Multiwfn

This preprocessing step is performed offline and does not affect inference-time latency.

The molecular preprocessing scripts are based on the MMM pipeline and extended to support dual-modality (ELF + ESP) inputs.
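The precompute-once/reuse-everywhere pattern can be sketched as a small cache keyed by SMILES: each drug's ELF and ESP maps are generated only if missing on disk and then reused by both training and inference. The file layout, naming scheme, and helper names below are illustrative assumptions; the real pipeline invokes Avogadro, ORCA, and Multiwfn offline:

```python
import hashlib
import os
import tempfile

def cached_map_path(smiles, modality, cache_dir):
    """Deterministic on-disk path for a drug's ELF or ESP map, keyed by SMILES.
    Layout and naming are illustrative assumptions about the offline pipeline."""
    key = hashlib.sha1(smiles.encode()).hexdigest()[:12]
    return os.path.join(cache_dir, f"{key}_{modality}.png")

def ensure_maps(smiles, cache_dir):
    """Run the SMILES -> DFT -> map chain only when outputs are missing,
    so training and inference reuse the same precomputed images."""
    paths = {m: cached_map_path(smiles, m, cache_dir) for m in ("elf", "esp")}
    for modality, path in paths.items():
        if not os.path.exists(path):
            # stand-in for the real Avogadro/ORCA/Multiwfn invocation (offline)
            with open(path, "wb") as f:
                f.write(b"")  # the real pipeline writes the rendered map here
    return paths

cache = tempfile.mkdtemp()
paths = ensure_maps("CC(=O)OC1=CC=CC=C1C(=O)O", cache)  # aspirin SMILES
print(sorted(paths))  # ['elf', 'esp']
```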

CNN Backbones

We use modality-specific pretrained image encoders:

  • ELF encoder: EfficientNet-V2-L
  • ESP encoder: ResNet-18

Training Environment

All experiments were conducted with:

Python 3.9
PyTorch 2.3.0
CUDA 11.8

Installation

To train the model, you need RDKit and several other packages. To avoid version conflicts among these packages, please follow the installation steps in the exact order below.

  • First, create and activate a new conda environment.
conda create -c conda-forge -n new_env python=3.9
conda activate new_env
  • Install RDKit
conda install -c conda-forge rdkit
  • If RDKit does not work after the above installation, try:
pip install rdkit-pypi
  • Install numpy, pandas, and scipy with specific versions to avoid conflicts:
pip install numpy==1.22.4 pandas==1.3.0 scipy==1.13.1
  • To install PyTorch 2.3.0 with CUDA 11.8 support and the matching torchvision 0.18.0, run:
pip install torch==2.3.0+cu118 torchvision==0.18.0+cu118 torchaudio --extra-index-url https://download.pytorch.org/whl/cu118

Data Preparation

Data paths and hyperparameters (such as the learning rate, target_ddi, etc.) are configured in main.py. Running the preprocessing script processing.py generates the required files in the output folder; the dataset paths in main.py must then be updated to point to these generated files.

  • Dataset Configuration

In main.py the paths for the following variables must be updated to correspond to the .pkl files generated within the output folder:

data_path = "[records_final.pkl]"
voc_path = "[voc_final.pkl]"
ddi_adj_path = "[ddi_A_final.pkl]"
ddi_mask_path = "[ddi_mask_H.pkl]"
molecule_path = "[cidtoSMILES.pkl]"
ddi_rate = ddi_rate_score("[ddi_A_final.pkl]")

These should be set to point to the corresponding .pkl files generated by preprocessing, typically located in the data/output folder.
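The `ddi_rate_score` call above follows the SafeDrug convention of measuring the fraction of prescribed drug pairs flagged in the DDI adjacency matrix. A minimal numpy sketch of that idea (function and variable names are illustrative, not the repository's implementation):

```python
import numpy as np

def ddi_rate(prescriptions, ddi_adj):
    """Fraction of prescribed drug pairs flagged in the DDI adjacency matrix.
    A sketch of the idea behind ddi_rate_score, not the repository's code."""
    pairs = interacting = 0
    for drugs in prescriptions:               # one medication set per visit
        for i in range(len(drugs)):
            for j in range(i + 1, len(drugs)):
                pairs += 1
                if ddi_adj[drugs[i], drugs[j]] or ddi_adj[drugs[j], drugs[i]]:
                    interacting += 1
    return interacting / pairs if pairs else 0.0

adj = np.zeros((4, 4), dtype=int)
adj[0, 1] = adj[1, 0] = 1                     # drugs 0 and 1 interact
print(ddi_rate([[0, 1, 2]], adj))             # 1 interacting pair of 3 total
```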

  • External data files required for preprocessing
    The following files are obtained from external sources and must be prepared in advance:
  • ndc2RXCUI.txt: NDC-to-RxCUI mapping file, adapted from ndc2rxnorm_mapping.csv in the GAMENet repository.
  • drug-DDI.csv: drug–drug interaction (DDI) information indexed by CID; download from Google Drive.
  • RXCUI2atc4.csv: RxCUI-to-ATC4 mapping file, adapted from ndc2atc_level4.csv in the GAMENet repository.

Training & Inference

Hyperparameters can be configured in main.py. These hyperparameters are set using the argparse module, allowing default values to be specified and overridden via command-line arguments:

hyperparameters = {
    "Test": [True or False],               
    "model_name": ["model_identifier"],   
    "resume_path": ["path/to/checkpoint"], 
    "lr": [learning_rate],              
    "target_ddi": [target_ddi],  
    "kp": [coefficient_of_P_signal],   
    "dim": [dimension_size],        
    "cuda": [cuda_device_index]
}
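Since these options are exposed through argparse, the setup follows the usual pattern below. The flag names mirror the table above, but the default values are illustrative assumptions, not the repository's actual settings:

```python
import argparse

# Illustrative argparse setup mirroring the hyperparameter table above;
# default values are assumptions, not the repository's actual settings.
parser = argparse.ArgumentParser()
parser.add_argument("--Test", action="store_true", help="run inference only")
parser.add_argument("--model_name", type=str, default="QUARK")
parser.add_argument("--resume_path", type=str, default="")
parser.add_argument("--lr", type=float, default=5e-4)
parser.add_argument("--target_ddi", type=float, default=0.06)
parser.add_argument("--kp", type=float, default=0.05)
parser.add_argument("--dim", type=int, default=64)
parser.add_argument("--cuda", type=int, default=0)

# Defaults apply when a flag is omitted; command-line values override them.
args = parser.parse_args(["--Test", "--resume_path", "ckpt/best.pt"])
print(args.Test, args.resume_path)  # True ckpt/best.pt
```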
  • Run the Code
python main.py
python main.py --Test --resume_path [best_epoch_path]

Citation

If you find this code useful for your work, please cite the following and consider starring this repository:

@inproceedings{kim2026quark,
  title     = {Multimodal Drug Recommendation with Quantum Chemical Molecular Representations},
  author    = {Yujin Kim and Seoeun Park and Chongmyung Kwon and Charmgil Hong},
  booktitle = {Proceedings of the 31st International Conference on Database Systems for Advanced Applications (DASFAA 2026)},
  year      = {2026},
  publisher = {Springer},
  note      = {To appear}
}

References

@inproceedings{yang2021safedrug,
  title     = {SafeDrug: Dual Molecular Graph Encoders for Safe Drug Recommendations},
  author    = {Yang, Chaoqi and Xiao, Cao and Ma, Fenglong and Glass, Lucas and Sun, Jimeng},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI} 2021},
  year      = {2021}
}
@inproceedings{kwon2025mmm,
  title     = {{MMM}: Quantum-Chemical Molecular Representation Learning for Personalized Drug Recommendation},
  author    = {Chongmyung Kwon and Yujin Kim and Seoeun Park and Yunji Lee and Charmgil Hong},
  booktitle = {PRedictive Intelligence in MEdicine},
  year      = {2025},
  organization = {Springer}
}

About

QUARK: QUantum-informed Analysis for Recommendation of Combinations (DASFAA-2026)
