
UNMIXX is a novel framework for multiple singing voice separation (MSVS). While related to speech separation, MSVS presents unique challenges, namely data scarcity and the highly correlated nature of singing voices. To address these issues, we propose three key components: (1) a musically informed mixing strategy that constructs highly correlated training mixtures, (2) a reverse attention mechanism that drives the two outputs apart using cross-attention, and (3) a magnitude penalty loss that penalizes energy erroneously assigned to the other output. Experiments show that UNMIXX achieves substantial improvements, with gains of more than 2.2 dB SDRi over the prior method on the MedleyVox evaluation set. Audio samples are available on our demo page.
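To make the magnitude penalty idea concrete, here is a minimal sketch of one plausible formulation: measure how much of an estimate's spectrogram energy overlaps the *other* source's reference magnitude. This is an illustration only; the `magnitude_penalty` function and the `min`-based overlap are our assumptions, not the paper's exact loss.

```python
import numpy as np

def magnitude_penalty(est_mag, other_ref_mag, eps=1e-8):
    """Fraction of the estimate's spectrogram energy that overlaps the
    OTHER source's reference magnitude (hypothetical formulation)."""
    # Element-wise overlap: energy plausibly "belonging" to the other voice.
    leak = np.minimum(est_mag, other_ref_mag)
    return float(np.sum(leak ** 2) / (np.sum(est_mag ** 2) + eps))
```

The penalty is 0 when the two magnitude spectrograms are disjoint and approaches 1 when the estimate's energy lies entirely under the other source.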


Quickstart

1) Data Preprocessing

We use a total of 400 hours of singing data for training.
Download the datasets and follow the MedleyVox preprocessing steps.
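During training, highly correlated mixtures are constructed from the preprocessed stems. As a rough illustration of the idea (not the paper's actual recipe), one can mix two time-aligned vocal stems at a random relative gain; the function name and gain range below are our assumptions.

```python
import numpy as np

def make_correlated_mixture(voc_a, voc_b, gain_db_range=(-3.0, 3.0), rng=None):
    """Mix two time-aligned vocal stems at a random relative gain.
    A simplified stand-in for the musically informed mixing strategy."""
    rng = rng or np.random.default_rng()
    gain = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
    n = min(len(voc_a), len(voc_b))      # truncate to the shorter stem
    src_a, src_b = voc_a[:n], gain * voc_b[:n]
    # Return the mixture together with its ground-truth sources.
    return src_a + src_b, src_a, src_b
```

Keeping the stems time-aligned (rather than randomly offset, as in generic speech-separation mixing) is what makes the resulting mixtures harmonically correlated.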

2) Training

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python audio_train.py --conf_dir configs/unmixx.yml

3) Inference

python inference.py \
  --conf_path ckpt/conf.yml \
  --ckpt_path ckpt/best.ckpt \
  --audio_path sample_music/free_mixture.wav \
  --output_dir separated_audio

Outputs will be saved to separated_audio/.
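To inspect the separated files without extra dependencies, a 16-bit PCM WAV can be loaded with Python's stdlib `wave` module. The output filenames shown in the comment are hypothetical; check `separated_audio/` for the actual names.

```python
import wave

import numpy as np

def load_wav(path):
    """Read a 16-bit PCM WAV into a float32 array in [-1, 1] (stdlib only)."""
    with wave.open(path, "rb") as f:
        frames = f.readframes(f.getnframes())
        audio = np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32768.0
        return audio.reshape(-1, f.getnchannels()), f.getframerate()

# Hypothetical output name; check separated_audio/ for the actual files:
# voice, sr = load_wav("separated_audio/free_mixture_s0.wav")
```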


About

[ICASSP 2026] Official code for UNMIXX: Untangling Highly Correlated Singing Voices Mixtures
