This repository contains the official source code for the paper "Anatomy-Aware Lymphoma Lesion Detection in Whole-Body PET/CT".
It provides a comprehensive overview of the project and detailed instructions to reproduce all experiments described in the paper, including data preparation, model training, and evaluation steps.
This project utilizes two publicly available, expertly annotated whole-body PET/CT datasets from The Cancer Imaging Archive (TCIA):
- FDG-PET-CT-Lesions
  This dataset, published by Gatidis and Kuestner (2022), includes whole-body [18F]FDG PET/CT scans from 900 patients, among them patients with lymphoma. Each scan features comprehensive, high-quality manual 3D lesion segmentations by clinical experts. The dataset also provides corresponding metadata, making it ideal for developing and evaluating algorithms for lesion detection and segmentation.
  - Details:
    - Modality: [18F]FDG PET/CT
    - Subjects: 900
    - Annotations: Manual segmentation of all lymphoma lesions
    - DOI: 10.7937/gkr0-xv29
- PSMA-PET-CT-Lesions
  Described by Jeblick et al. (2024), this resource contains whole-body [68Ga]PSMA-11 PET/CT scans from 365 prostate cancer patients, also with expert manual 3D lesion segmentations. Similar to the FDG dataset, this resource supports training and testing of deep learning models for lesion detection and segmentation across a different cancer type and tracer.
  - Details:
    - Modality: [68Ga]PSMA-11 PET/CT
    - Subjects: 365
    - Annotations: Manual segmentation of all tumor lesions
    - DOI: 10.7937/r7ep-3x37
Both datasets are freely available to the research community via TCIA, and their use in this project complies with the corresponding terms and conditions. Please cite the original dataset publications when using the data in your own research.
To simplify environment setup and ensure reproducibility, you can use the provided Docker image. This project is compatible with the nndetection environment:
docker pull ghcr.io/minnelab/nndetection:1.0

or, to use the same image with Singularity, you can use the following command:
singularity pull nndetection_1.0.sif docker://ghcr.io/minnelab/nndetection:1.0
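Once the image is available, commands can be run inside the container. The invocation below is an illustrative sketch only; the `/data` mount point and the use of `$DATA_DIR` are placeholders for your own paths:

```bash
# Illustrative only: open a shell in the container with GPU access and the data directory mounted.
docker run --gpus all -it --rm -v $DATA_DIR:/data ghcr.io/minnelab/nndetection:1.0 /bin/bash

# Equivalent with Singularity (--nv exposes the host GPUs):
singularity exec --nv --bind $DATA_DIR:/data nndetection_1.0.sif /bin/bash
```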
Alternatively, to set up the environment locally, you can create a virtual environment and install the dependencies using the following commands:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

The experimental workflow consists of the following main stages:
- Data Preparation
  - Download the required datasets from TCIA.
  - Convert the semantic lesion segmentations to instance segmentations.
  - Create the datalist for the AutoPET Decathlon dataset.
  - [Optional] Generate anatomical segmentations using TotalSegmentator (see the example after this overview). These anatomical masks can be incorporated as an additional input channel to the detection network if anatomy awareness is desired.
  - Run experiment planning.
  - Run data preprocessing.
- Model Training and Evaluation
  - [Optional] Train a self-supervised learning (SSL) model, such as the Swin Retina UneTR, to initialize weights for downstream tasks.
  - Train and evaluate lymphoma lesion detection models. Two main architectures are considered:
    - Retina U-Net: used as the baseline model, trained without anatomical (TotalSegmentator) input.
    - Swin Retina UneTR: can be trained either with or without the additional anatomical input channel (from TotalSegmentator masks).
  - For anatomy-aware experiments, both the Retina U-Net and Swin Retina UneTR models can be augmented with the anatomical channel; performance comparisons should be made to assess the impact of providing anatomical context.
  - Evaluate all models according to standardized metrics.
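For the optional anatomical segmentations mentioned above, TotalSegmentator can be run directly on the CT images. The snippet below is an illustrative sketch: the labelsAnatTr output folder is a placeholder, and the exact flags and output convention may differ depending on your TotalSegmentator version.

```bash
# Illustrative only: create a multi-label anatomical segmentation for each CT image.
# labelsAnatTr is a placeholder folder name; adapt it to your setup.
mkdir -p $DATA_DIR/labelsAnatTr
for ct in $DATA_DIR/imagesTr/*_0000.nii.gz; do
    id=$(basename "$ct" _0000.nii.gz)
    TotalSegmentator -i "$ct" -o $DATA_DIR/labelsAnatTr/${id}.nii.gz --ml
done
```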
To download the PSMA-FDG-PET-CT-Lesions dataset, you can use the following command:
wget https://fdat.uni-tuebingen.de/records/6gjsg-zcg93/files/psma-fdg-pet-ct-lesions.zip -O psma-fdg-pet-ct-lesions.zip
unzip psma-fdg-pet-ct-lesions.zip

The dataset has the following Decathlon structure:
psma-fdg-pet-ct-lesions/
├── dataset.json
├── imagesTr/
│   ├── <patient_id>_0000.nii.gz   # CT image
│   ├── <patient_id>_0001.nii.gz   # PET image
│   └── ...
└── labelsTr/
    ├── <patient_id>.nii.gz        # Semantic segmentation
    └── ...
For object detection, each individual lesion must be treated as a distinct object. To achieve this, the semantic segmentation masks, which label all lesions with the same value, need to be converted into instance segmentation masks, where each separate lesion is assigned a unique integer label. To convert the semantic segmentations to instance segmentations, you can use the following command:
PyMAIA_convert_semantic_to_instance_segmentation --data-folder $DATA_DIR/labelsTr --sem-seg-suffix .nii.gz --inst-seg-suffix .nii.gz --inst-seg-folder $DATA_DIR/labelsInstTr --output-json-path $DATA_DIR/Autopet_Inst-Seg.json --decathlon-format yes

A separate folder for the instance segmentations will be created:
psma-fdg-pet-ct-lesions/
├── labelsInstTr/
│ ├── <patient_id>.nii.gz # Instance segmentation
│ └── ...
Additionally, a JSON file (Autopet_Inst-Seg.json) will be created, containing the number of lesions for each patient:
{
"<patient_id>" : <number_of_lesions>
}
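For intuition, the conversion essentially amounts to connected-component labelling of the binary lesion mask. The snippet below is an illustrative sketch of that idea, not the actual PyMAIA implementation:

```python
# Illustrative sketch: turn a binary (semantic) lesion mask into an instance mask
# by labelling connected components. Not the actual PyMAIA implementation.
import nibabel as nib
import numpy as np
from scipy import ndimage

sem = nib.load("labelsTr/<patient_id>.nii.gz")              # semantic mask: 0 = background, 1 = lesion
instances, n_lesions = ndimage.label(sem.get_fdata() > 0)   # each connected lesion gets a unique integer label
nib.save(nib.Nifti1Image(instances.astype(np.uint16), sem.affine, sem.header),
         "labelsInstTr/<patient_id>.nii.gz")
print(n_lesions)                                            # number of lesions for this patient
```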
Before proceeding with the data preprocessing, we need to create the Decathlon datalist. This is a dictionary that maps the patient IDs to the corresponding image and segmentation files, following this structure:
{
"testing": [],
"training": [
{
"ct": "imagesTr/<patient_id>_0000.nii.gz",
"pet": "imagesTr/<patient_id>_0001.nii.gz",
"label": "labelsTr/<patient_id>.nii.gz",
"instance_seg": "labelsInstTr/<patient_id>.nii.gz",
"fold": 0
},
...
],
}

Each case is assigned to its fold according to the splits_final.json file, which is already provided with the dataset.
To create the datalist, you can use the following command:
python scripts/create_Autopet_Decathlon_Datalist.py --root-dir $DATA_DIR

The datalist is saved as Autopet_Decathlon.json in the root directory of the dataset.
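The provided script also handles the fold assignment from splits_final.json. For reference, the datalist structure itself can be assembled with a few lines of Python; the sketch below is illustrative only, and the fold value is a placeholder:

```python
# Illustrative sketch: build Autopet_Decathlon.json from the folder contents.
# The fold assignment below is a placeholder; the provided script reads it from splits_final.json.
import json
from pathlib import Path

root = Path("psma-fdg-pet-ct-lesions")
training = []
for label in sorted((root / "labelsTr").glob("*.nii.gz")):
    pid = label.name.replace(".nii.gz", "")
    training.append({
        "ct": f"imagesTr/{pid}_0000.nii.gz",
        "pet": f"imagesTr/{pid}_0001.nii.gz",
        "label": f"labelsTr/{pid}.nii.gz",
        "instance_seg": f"labelsInstTr/{pid}.nii.gz",
        "fold": 0,  # placeholder fold index
    })

with open(root / "Autopet_Decathlon.json", "w") as f:
    json.dump({"testing": [], "training": training}, f, indent=2)
```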
The experiment planning step creates a dataset fingerprint (including values such as voxel spacing, image size, and modalities) that is used to preprocess the data. To run the planning, you can use the following command:
export BUNDLE=Bundles/Detection
python -m monai.bundle run plan \
--bundle_root $BUNDLE \
--decathlon_data_list $DATA_DIR/Autopet_Decathlon.json \
--data_dir $DATA_DIR \
--config_file "['$BUNDLE/Detection/dataset.yaml','$BUNDLE/Detection/transforms.yaml', '$BUNDLE/Detection/plan.yaml','$BUNDLE/Detection/params.yaml','$BUNDLE/Detection/imports.yaml']" \
--meta_file $BUNDLE/Detection/metadata.json

The command will analyze the dataset and create a plan.json file in the dataset directory:
{
"<PATIENT_ID>": {
"ct":
{
"spacing": [x_spacing, y_spacing, z_spacing],
"size": [x_size, y_size, z_size],
"space": "RAS"
},
"pet":
{
"spacing": [x_spacing, y_spacing, z_spacing],
"size": [x_size, y_size, z_size],
"space": "RAS"
}
}
}
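For intuition, the fingerprint records, for each case and modality, the voxel spacing, image size, and orientation of the original images. The snippet below is an illustrative sketch of how such values can be gathered; it is not the bundle's actual planning code:

```python
# Illustrative sketch: collect per-case image geometry, similar in spirit to what
# the planning step stores in plan.json (not the bundle's actual planning code).
import json
import nibabel as nib
from pathlib import Path

root = Path("psma-fdg-pet-ct-lesions")                      # i.e. $DATA_DIR
datalist = json.load(open(root / "Autopet_Decathlon.json"))

plan = {}
for entry in datalist["training"]:
    pid = Path(entry["label"]).name.replace(".nii.gz", "")
    plan[pid] = {}
    for modality in ("ct", "pet"):
        img = nib.load(root / entry[modality])
        plan[pid][modality] = {
            "spacing": [float(s) for s in img.header.get_zooms()[:3]],
            "size": [int(s) for s in img.shape[:3]],
            "space": "".join(nib.aff2axcodes(img.affine)),  # e.g. "RAS"
        }
```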
The preprocessing step performs the following transformations (a minimal MONAI sketch is shown after the list):
- Loading the CT, PET, and instance segmentation images
- Orienting the images to the given orientation axes
- Resampling the images to the given voxel spacing
- Scaling each single-channel image to the range [0, 1]
- Concatenating the PET and CT images into a single multi-channel image
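These transformations roughly correspond to the MONAI transform chain sketched below. This is an illustrative sketch only; the actual chain is defined in the bundle's transforms.yaml and preprocess.yaml configs, and the target spacing is taken from plan.json rather than hard-coded:

```python
# Illustrative sketch of the preprocessing chain; the real transforms are defined
# in the bundle configs and may differ in detail.
from monai.transforms import (
    Compose, LoadImaged, EnsureChannelFirstd, Orientationd,
    Spacingd, ScaleIntensityd, ConcatItemsd,
)

preprocess = Compose([
    LoadImaged(keys=["ct", "pet", "instance_seg"]),
    EnsureChannelFirstd(keys=["ct", "pet", "instance_seg"]),
    Orientationd(keys=["ct", "pet", "instance_seg"], axcodes="RAS"),
    Spacingd(keys=["ct", "pet", "instance_seg"],
             pixdim=(2.0, 2.0, 3.0),                # example spacing; taken from plan.json in practice
             mode=("bilinear", "bilinear", "nearest")),
    ScaleIntensityd(keys=["ct", "pet"]),            # scale each single-channel image to [0, 1]
    ConcatItemsd(keys=["ct", "pet"], name="image"), # stack PET and CT into one multi-channel image
])
```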
To run the preprocessing, you can use the following command:
python -m monai.bundle run preprocess \
--bundle_root $BUNDLE \
--decathlon_data_list $DATA_DIR/Autopet_Decathlon.json \
--data_dir $DATA_DIR \
--config_file "['$BUNDLE/Detection/dataset.yaml','$BUNDLE/Detection/transforms.yaml', '$BUNDLE/Detection/preprocess.yaml','$BUNDLE/Detection/params.yaml','$BUNDLE/Detection/imports.yaml']" \
--meta_file $BUNDLE/Detection/metadata.json

You will get a folder with the preprocessed images in the <data_dir>/preprocessed/ directory:
psma-fdg-pet-ct-lesions/
├── preprocessed/
│   ├── <patient_id>/                    # Folder with the preprocessed images
│   │   ├── <patient_id>_0000.nii.gz     # Preprocessed CT image
│   │   ├── <patient_id>_0001.nii.gz     # Preprocessed PET image
│   │   ├── <patient_id>.json            # Object information
│   │   ├── <patient_id>_image.nii.gz    # Preprocessed concatenated image
│   │   └── <patient_id>.nii.gz          # Preprocessed instance segmentation
│   └── ...
The <patient_id>.json file contains the object information, including the paths to the input images, the bounding boxes, and the class labels:
{
"ct": <path_to_ct_image>,
"pet": <path_to_pet_image>,
"instance_seg": <path_to_instance_seg_image>,
"box": [
[BOX_0_X_MIN, BOX_0_Y_MIN, BOX_0_Z_MIN, BOX_0_X_MAX, BOX_0_Y_MAX, BOX_0_Z_MAX],
[BOX_1_X_MIN, BOX_1_Y_MIN, BOX_1_Z_MIN, BOX_1_X_MAX, BOX_1_Y_MAX, BOX_1_Z_MAX],
...
],
"label": [
<CLASS_0>,
<CLASS_1>,
...
]
}

Here, box is a list of bounding boxes (in image coordinates) and label is a list of class labels, one per object.
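For intuition, each bounding box is simply the minimal and maximal voxel index of one lesion instance along every axis. The snippet below is an illustrative sketch of that computation, not the bundle's actual code:

```python
# Illustrative sketch: derive [x_min, y_min, z_min, x_max, y_max, z_max] boxes
# from an instance segmentation; the bundle computes these during preprocessing.
import nibabel as nib
import numpy as np

inst = nib.load("labelsInstTr/<patient_id>.nii.gz").get_fdata().astype(int)
boxes, labels = [], []
for lesion_id in np.unique(inst)[1:]:            # skip background (0)
    xs, ys, zs = np.where(inst == lesion_id)
    boxes.append([int(xs.min()), int(ys.min()), int(zs.min()),
                  int(xs.max()), int(ys.max()), int(zs.max())])
    labels.append(0)                             # single foreground class: lesion
print(boxes, labels)
```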
To visualize the preprocessed images with the corresponding instance segmentations and the bounding boxes, you can follow the instructions in the Detection Notebook.
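Alternatively, a quick scripted check can be done with matplotlib. The sketch below is illustrative only; it follows the preprocessed folder layout shown above and assumes the box coordinates are expressed in the preprocessed image space:

```python
# Illustrative sketch: show one axial slice of the preprocessed PET together with
# the instance segmentation and the bounding boxes stored in <patient_id>.json.
import json
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
import nibabel as nib
import numpy as np

case = "psma-fdg-pet-ct-lesions/preprocessed/<patient_id>"
pet = nib.load(f"{case}/<patient_id>_0001.nii.gz").get_fdata()
seg = nib.load(f"{case}/<patient_id>.nii.gz").get_fdata()
boxes = json.load(open(f"{case}/<patient_id>.json"))["box"]

z = pet.shape[2] // 2                                   # middle axial slice
fig, ax = plt.subplots()
ax.imshow(pet[:, :, z].T, cmap="gray_r", origin="lower")
ax.imshow(np.ma.masked_where(seg[:, :, z].T == 0, seg[:, :, z].T),
          alpha=0.4, origin="lower")
for x0, y0, z0, x1, y1, z1 in boxes:
    if z0 <= z <= z1:                                   # draw only boxes crossing this slice
        ax.add_patch(Rectangle((x0, y0), x1 - x0, y1 - y0,
                               fill=False, edgecolor="red"))
plt.show()
```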
- Gatidis S, Kuestner T. A whole-body FDG-PET/CT dataset with manually annotated tumor lesions (FDG-PET-CT-Lesions) [Dataset]. The Cancer Imaging Archive, 2022. DOI: 10.7937/gkr0-xv29
- Jeblick K, Boning G, Battistel L, et al. A whole-body PSMA-PET/CT dataset with manually annotated tumor lesions (PSMA-PET-CT-Lesions) (Version 1) [Dataset]. The Cancer Imaging Archive, 2024. DOI: 10.7937/r7ep-3x37