The First Real-World Federated Learning Benchmark on Cardiovascular Disease Data
We recommend using FedCVD with the directory structure below:
- data: stores the raw and preprocessed data for each client;
- code: holds the `FedCVD` repository;
- output: stores results and log files.
The directory structure is shown below:
├── workspace
│ ├── data
│ │ ├── ECG
│ │ │ ├── raw
│ │ │ │ ├── SPH
│ │ │ │ ├── PTB
│ │ │ │ ├── SXPH
│ │ │ │ └── G12EC
│ │ │ └── preprocessed
│ │ │   ├── client1
│ │ │   ├── client2
│ │ │   ├── client3
│ │ │   └── client4
│ │ └── ECHO
│ │   ├── raw
│ │   │ ├── CAMUS
│ │   │ ├── ECHONET
│ │   │ └── HMCQU
│ │   └── preprocessed
│ │     ├── client1
│ │     ├── client2
│ │     └── client3
│ ├── output
│ └── code
│   └── FedCVD
You can run the commands below to create the directory structure:

```shell
mkdir workspace
cd workspace
mkdir data code output
cd data
mkdir -p ECG/raw ECG/preprocessed ECHO/raw ECHO/preprocessed
cd ECG/raw
mkdir SPH PTB SXPH G12EC
cd ../../ECHO/raw
mkdir CAMUS ECHONET HMCQU
cd ../../../
cd code
```

Datasets for Fed-ECG can be downloaded from the sources below:
- client1: SPH
- client2: PTB-XL
- client3: SXPH
- client4: G12EC

Datasets for Fed-ECHO can be downloaded from the sources below:
- client1: CAMUS
- client2: ECHONET-DYNAMIC
- client3: HMC-QU
Note that a Stanford AIMI account is required to access the ECHONET-DYNAMIC dataset and a Kaggle account is required to access the HMC-QU dataset.
Make sure the data is stored in the correct directory structure. The reference structure for each dataset is shown below, and a small sanity-check sketch follows the listings:
- The SPH directory must contain the following files or directories:
├── SPH
│ ├── ...
│ ├── metadata.csv
│ ├── records
│ │ ├── A00001.h5
│ │ ├── ...
│ │ └── A25770.h5
- The PTB directory must contain the following files or directories:
├── PTB
│ ├── ...
│ ├── ptbxl_database.csv
│ ├── records500
│ │ ├── 00000
│ │ │ ├── 00001_hr.dat
│ │ │ ├── 00001_hr.hea
│ │ │ └── ...
│ │ ├── ...
│ │ └── 21000
- The SXPH directory must contain the following files or directories:
├── SXPH
│ ├── ...
│ ├── WFDBRecords
│ │ ├── 01
│ │ │ ├── 010
│ │ │ │ ├── JS00001.hea
│ │ │ │ ├── JS00001.mat
│ │ │ │ └── ...
│ │ │ ├── ...
│ │ ├── ...
│ │ └── 46
- The G12EC directory must contain the following files or directories:
├── G12EC
│ ├── g1
│ │ ├── E00001.hea
│ │ ├── E00001.mat
│ │ └── ...
│ ├── ...
│ └── g11
- The CAMUS directory must contain the following files or directories:
├── CAMUS
│ ├── training
│ │ ├── patient0001
│ │ ├── ...
│ │ └── patient0450
│ ├── testing
│ │ ├── patient0001
│ │ ├── ...
│ │ └── patient0050
- The ECHONET directory must contain the following files or directories:
├── ECHONET
│ ├── ...
│ ├── FileList.csv
│ ├── VolumeTracings.csv
│ ├── Videos
│ │ ├── 0X1A0A263B22CCD966.avi
│ │ └── ...
- The HMCQU directory must contain the following files or directories:
├── HMCQU
│ ├── ...
│ ├── A4C.xlsx
│ ├── HMC-QU
│ │ ├── A4C
│ │ │ ├── ES0001 _4CH_1.avi
│ │ │ └── ...
│ ├── LV Ground-truth Segmentation Masks
│ │ ├── Mask_ES0001 _4CH_1.mat
│ │ └── ...
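If you want to check the layout programmatically, a minimal sketch along the lines below may help. It only spot-checks one representative entry per dataset, taken from the listings above; the `workspace/data` root is an assumption, so adjust it to your actual location:

```python
# sanity_check.py: spot-check the raw-data layout described above.
# DATA_ROOT is an assumption; point it at your own workspace/data.
from pathlib import Path

DATA_ROOT = Path("workspace/data")

# One representative file or directory per dataset, from the reference
# structures above.
EXPECTED = [
    "ECG/raw/SPH/metadata.csv",
    "ECG/raw/PTB/ptbxl_database.csv",
    "ECG/raw/SXPH/WFDBRecords",
    "ECG/raw/G12EC/g1",
    "ECHO/raw/CAMUS/training",
    "ECHO/raw/ECHONET/FileList.csv",
    "ECHO/raw/HMCQU/A4C.xlsx",
]

for rel in EXPECTED:
    path = DATA_ROOT / rel
    print(f"{'ok' if path.exists() else 'MISSING':7s} {path}")
```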
After downloading the data, you can run the following scripts to preprocess and split the data:

```shell
cd code/FedCVD/
bash scripts/preprocess/preprocess.sh
bash scripts/preprocess/split.sh
```

You can use the following scripts to set up the environment:

```shell
cd code/FedCVD/
conda create -n fedcvd_env python=3.12 -y
conda activate fedcvd_env
pip install -r requirements.txt
```

You can run the scripts in the `code/FedCVD/scripts` directory to reproduce our experiments and add new algorithms.
You can add new models in the `code/FedCVD/models` directory, then import and use the new model in the corresponding training script.
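As an illustration, a new model file might look like the sketch below; the file name, the network itself, and the 12-lead input shape are all hypothetical, not FedCVD's actual ones:

```python
# code/FedCVD/models/my_model.py: hypothetical example model. The
# input/output sizes are placeholders, not FedCVD's actual ones.
import torch
import torch.nn as nn


class MyECGNet(nn.Module):
    """A small 1-D CNN over multi-lead ECG signals."""

    def __init__(self, n_leads: int = 12, n_classes: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # collapse the time axis
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_leads, n_samples)
        return self.classifier(self.features(x).squeeze(-1))
```

In the training script you would then swap it in with something like `from models.my_model import MyECGNet` (the exact import path depends on how the scripts are launched).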
You don't need to change your dataset preprocessing logic; just make sure the training script receives the data as a list of `torch.utils.data.DataLoader` objects, one per client. Each element in the list corresponds to a client.
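A minimal sketch of that contract, with `TensorDataset` dummies standing in for whatever your own preprocessing produces:

```python
# Hypothetical wiring: one DataLoader per client, index-aligned with the
# client list. TensorDataset here stands in for your preprocessed data.
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in: 3 clients, each with 8 samples shaped (12, 1000).
client_datasets = [
    TensorDataset(torch.randn(8, 12, 1000), torch.randint(0, 2, (8,)))
    for _ in range(3)
]

# What the training script expects: element i is client i's loader.
client_loaders = [
    DataLoader(ds, batch_size=4, shuffle=True) for ds in client_datasets
]
```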
You can add new algorithms in the `code/FedCVD/algorithms` directory. You only need to extend the FedAvgServerHandler and FedAvgSerialClientTrainer classes in `code/FedCVD/algorithms/fedavg.py` and override three important methods: `global_update`, `local_update`, and `train`, to implement your own algorithm. Then just import and use the new algorithm in the corresponding training script.
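A minimal skeleton of such a subclass might look like the sketch below; the import path is an assumption about the repository layout, and `*args, **kwargs` is used because the exact method signatures should be taken from `fedavg.py`:

```python
# code/FedCVD/algorithms/my_algo.py: hypothetical skeleton. The import
# path and method signatures are assumptions; check fedavg.py for the
# real ones before filling in logic.
from algorithms.fedavg import FedAvgServerHandler, FedAvgSerialClientTrainer


class MyServerHandler(FedAvgServerHandler):
    def global_update(self, *args, **kwargs):
        # Replace FedAvg's server-side aggregation here; for now this
        # just falls back to the parent behaviour.
        return super().global_update(*args, **kwargs)


class MySerialClientTrainer(FedAvgSerialClientTrainer):
    def local_update(self, *args, **kwargs):
        # Customize a single client's local optimization step here.
        return super().local_update(*args, **kwargs)

    def train(self, *args, **kwargs):
        # Customize the per-round client training loop here.
        return super().train(*args, **kwargs)
```

The corresponding training script would then instantiate `MyServerHandler` and `MySerialClientTrainer` in place of the FedAvg classes.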
