ASCDomain: Domain Invariant Device-Adversarial Isotropic Knowledge Distillation Convolutional Neural Architecture
This repository is the official PyTorch implementation of ASCDomain (ICASSP 2025). It contains the code to reproduce the results of the Truchan_LUH submission to the DCASE24 Task 1 "Data-Efficient Low-Complexity Acoustic Scene Classification" challenge.
This repository builds on the baseline codebase for Task 1: here
ASCDomain integrates four key components: Preprocessing, Lightweight Isotropic Neural Network, Adversarial Domain Adaptation, and Ensemble Knowledge Distillation. The figure presents the ASCDomain workflow: solid lines indicate the training and validation phase, and dashed lines indicate the test phase.
- The inputs to the network are 1 s audio snippets, audio labels, and device labels.
- The audio snippets first pass through preprocessing, which transforms them into Mel spectrograms [256 x 65].
- The isotropic network extracts the features from the time-frequency representation.
- Adversarial Domain Adaptation encourages the embedding representation to become domain-invariant (generalization across different recording devices).
- Ensemble Knowledge Distillation improves learning efficiency by transferring knowledge from multiple high-performance teacher models to a compact student model.
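The preprocessing step can be sketched in pure PyTorch as follows. This is an illustrative sketch, not the repository's code; the sample rate (32 kHz), FFT size, and hop length are assumed values chosen so that a 1 s snippet yields a [256 x 65] log-Mel spectrogram:

```python
import math
import torch

def mel_filterbank(sr, n_fft, n_mels):
    # triangular filters spaced evenly on the mel scale
    n_freqs = n_fft // 2 + 1
    freqs = torch.linspace(0.0, sr / 2, n_freqs)
    mel_hi = 2595.0 * math.log10(1.0 + (sr / 2) / 700.0)
    mel_pts = torch.linspace(0.0, mel_hi, n_mels + 2)
    hz_pts = 700.0 * (10.0 ** (mel_pts / 2595.0) - 1.0)
    fb = torch.zeros(n_mels, n_freqs)
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rise = (freqs - lo) / (ctr - lo)
        fall = (hi - freqs) / (hi - ctr)
        fb[i] = torch.clamp(torch.min(rise, fall), min=0.0)
    return fb

def log_mel(audio, sr=32000, n_fft=4096, hop=500, n_mels=256):
    # power spectrogram -> mel projection -> log compression
    window = torch.hann_window(n_fft)
    spec = torch.stft(audio, n_fft, hop_length=hop, window=window,
                      return_complex=True).abs() ** 2
    mel = mel_filterbank(sr, n_fft, n_mels) @ spec
    return torch.log(mel + 1e-5)

x = torch.randn(32000)   # 1 s of audio at the assumed 32 kHz sample rate
m = log_mel(x)           # shape: [256, 65]
```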
Create a conda environment
conda create -n asc python=3.10
conda activate asc
Download the dataset from this location and extract the files.
There are a total of 5 architectures:
- Isotropic
- Siren
- Adverserial
- RSC
- ASC Domain
Only Isotropic, Siren, and RSC were submitted. Each experiment folder contains a dataset folder with a dcase24.py file, where the path to the dataset has to be specified:
dataset_dir = None
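For example (the path below is hypothetical; point it at wherever you extracted the dataset):

```python
# hypothetical example path; replace with your local extraction directory
dataset_dir = "/data/dcase24"
```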
All experiments accept a split argument, which selects the training split in percent: 5, 10, 25, 50, and 100 are available.
The device impulse response augmentation has shown great success in previous submissions and is also used in this submission. The device impulse responses are provided by MicIRP. All files are shared under a Creative Commons license. All credits go to MicIRP & Xaudia.com.
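Device impulse response augmentation convolves each waveform with the impulse response of a real recording device, simulating that device's coloration. A minimal sketch of the idea (assumed tensor shapes; not the repository's implementation):

```python
import torch
import torch.nn.functional as F

def apply_device_ir(audio, ir):
    """Convolve a waveform (T,) with a device impulse response (L,)."""
    a = audio.view(1, 1, -1)
    k = ir.flip(0).view(1, 1, -1)            # conv1d is cross-correlation; flip for true convolution
    out = F.conv1d(a, k, padding=ir.numel() - 1)
    return out.view(-1)[: audio.numel()]     # trim back to the original length

x = torch.randn(32000)                        # 1 s snippet
x_aug = apply_device_ir(x, torch.randn(512))  # 512-tap IR; length is preserved
```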
Run isotropic training
python run_isotropic_training.py
Run isotropic training with MixStyle from here
python run_isotropic_training.py --model=mix
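MixStyle regularizes for domain shift by mixing channel-wise feature statistics (mean and standard deviation) between random pairs of samples in a batch. A minimal sketch of the idea (the Beta parameter is an assumption; this is not the repository's implementation):

```python
import torch

def mixstyle(x, alpha=0.1, eps=1e-6):
    """x: (B, C, H, W) feature map; returns features with mixed per-channel statistics."""
    B = x.size(0)
    mu = x.mean(dim=(2, 3), keepdim=True)
    sig = (x.var(dim=(2, 3), keepdim=True) + eps).sqrt()
    x_norm = (x - mu) / sig                                   # remove instance statistics
    lam = torch.distributions.Beta(alpha, alpha).sample((B, 1, 1, 1))
    perm = torch.randperm(B)                                  # random pairing within the batch
    mu_mix = lam * mu + (1 - lam) * mu[perm]
    sig_mix = lam * sig + (1 - lam) * sig[perm]
    return x_norm * sig_mix + mu_mix                          # re-style with mixed statistics
```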
Run isotropic training without activation functions, motivated by here
python run_isotropic_training.py --model=noact
Navigate to Isotropic_HPO and run the isotropic hyperparameter optimization.
python run_isotropic_hpo.py
Isotropic Notebook Demonstrator
Run siren training
python run_siren_training.py
Previous domain generalization techniques have relied on augmentation to achieve generalization. For the next two architectures, we conduct representation learning experiments with the isotropic architecture as the backbone model. Two representation learning techniques from DeepDG were chosen:
- Domain Adversarial Neural Network (here called adverserial)
- Representation Self Challenging (RSC)
Run adversarial training
python run_adv_training.py
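Domain adversarial training is typically built around a gradient reversal layer between the feature extractor and a device-domain classifier: gradients from the domain loss are flipped before reaching the backbone, pushing the features toward device invariance. A minimal sketch (not the repository's exact code):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# toy check: the gradient through the layer is reversed and scaled
x = torch.ones(3, requires_grad=True)
grad_reverse(x, lambd=0.5).sum().backward()
print(x.grad)  # tensor([-0.5000, -0.5000, -0.5000])
```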
Run RSC training
python run_rsc_training.py
ASC Domain combines the adversarial approach with knowledge distillation. The training procedure and teacher models were taken from cpjku_dcase23 and EfficientAT. We train a total of 4 different architectures:
- MobileNet
- Dynamic MobileNet
- CP-ResNet
- PaSST

Each architecture is trained with different training setups and versions, leading to a total of 22 teacher models. Each teacher model is trained on the 5 splits, resulting in 110 models.
Run teacher model training
Run MobileNet training. Argument width=[0.4, 0.5, 1.0]
python run_mn_training.py --width=0.4
Run Dynamic MobileNet training. Argument width=[0.4, 1.0]
python run_dymn_training.py --width=0.4
Run PaSST training
python run_passt_training.py
Run CP-ResNet training
python run_cp-resnet_training.py
Run student training with a single teacher and Isotropic as the student
python run_convmixer_training.py --teacher=<teacher_name>
Example: Run student training with the PaSST teacher and Isotropic as the student
python run_convmixer_training.py --teacher=passt_dir_fms
Run ensemble teacher-student training with Isotropic as the student
python run_convmixer_training.py --teacher=best
Run ensemble teacher-student training with Siren as the student
python run_siren_training.py --teacher=best
Run ensemble teacher-student adversarial training with Isotropic as the student
python run_convmixer_adv_training.py --teacher=best
Run ensemble teacher-student adversarial training with Siren as the student
python run_siren_adv_training.py --teacher=best
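Teacher-student setups like the ones above typically combine a cross-entropy term on the labels with a temperature-scaled KL term against the averaged teacher logits. A hedged sketch of such a distillation objective (the temperature, weighting, and logit averaging are assumptions, not necessarily the repository's exact loss):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits_list, labels, T=2.0, alpha=0.5):
    # average the logits of the teacher ensemble
    teacher_logits = torch.stack(teacher_logits_list).mean(dim=0)
    # temperature-scaled KL between teacher and student distributions
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.log_softmax(teacher_logits / T, dim=-1),
                    log_target=True, reduction="batchmean") * T * T
    # standard cross-entropy on the ground-truth scene labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * hard + (1 - alpha) * soft

s = torch.randn(4, 10)                            # student logits (batch of 4, 10 scenes)
ts = [torch.randn(4, 10) for _ in range(3)]       # logits from 3 teachers
y = torch.randint(0, 10, (4,))
loss = kd_loss(s, ts, y)
```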
The ensemble is selected by a forward stepwise selection algorithm:
- Start with an empty ensemble
- Add the model that minimizes the ensemble validation loss
- Repeat the previous step until no improvement can be achieved
- Return the ensemble
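The steps above can be sketched as follows (illustrative only, with a toy validation loss):

```python
def forward_stepwise_selection(models, val_loss):
    """Greedy forward selection; val_loss(ensemble) returns the validation loss."""
    ensemble, best = [], float("inf")
    improved = True
    while improved:
        improved = False
        for m in (m for m in models if m not in ensemble):
            loss = val_loss(ensemble + [m])
            if loss < best:                       # strictly better than anything so far
                best, pick, improved = loss, m, True
        if improved:
            ensemble.append(pick)
    return ensemble

# toy example: each "model" predicts one number; the ensemble prediction is their mean
preds = {"a": 1.0, "b": 3.0, "c": 10.0}
def val_loss(ens):
    return (sum(preds[m] for m in ens) / len(ens) - 2.0) ** 2 if ens else float("inf")

print(forward_stepwise_selection(list(preds), val_loss))  # ['a', 'b']
```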
The implementation of the ensemble selection can be seen in ensemble_selection.ipynb.
If you find ASCDomain useful for your work, please consider citing us as follows:
@inproceedings{truchan2025ascdomain,
title={ASCDomain: Domain Invariant Device-Adversarial Isotropic Knowledge Distillation Convolutional Neural Architecture},
author={Truchan, Hubert and Ngo, Tien Hung and Ahmadi, Zahra},
booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={1--5},
year={2025},
organization={IEEE}
}