This is the official repository of the IEEE SLT 2024 paper Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT.
Set up a Python environment:

```shell
conda create -y -n py310 python=3.10.14 pip=24.0
conda activate py310
pip install -r requirements/requirements.txt
sh scripts/setup.sh
```
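To sanity-check the environment, a quick sketch (assuming requirements.txt installs PyTorch and torchaudio, which the usage example below relies on):

```python
import torch
import torchaudio

print(torch.__version__, torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())  # the example below calls .cuda()
```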
Encode a waveform into pseudo-syllabic units with a pretrained model:

```python
import torchaudio

from src.s5hubert import S5HubertForSyllableDiscovery

wav_path = "/path/to/wav"

# download a pretrained model from the Hugging Face Hub
model = S5HubertForSyllableDiscovery.from_pretrained("ryota-komatsu/s5-hubert").cuda()

# load a waveform and resample it to 16 kHz
waveform, sr = torchaudio.load(wav_path)
waveform = torchaudio.functional.resample(waveform, sr, 16000)

# encode the waveform into pseudo-syllabic units
outputs = model(waveform.cuda())

# pseudo-syllabic units
units = outputs["units"]  # [3950, 67, ..., 503]
```
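The same API can be looped over a whole directory. A minimal sketch using only the calls shown above (the input directory, file extension, and output path are placeholders):

```python
import json
from pathlib import Path

import torchaudio

from src.s5hubert import S5HubertForSyllableDiscovery

model = S5HubertForSyllableDiscovery.from_pretrained("ryota-komatsu/s5-hubert").cuda()

units = {}
for wav_path in sorted(Path("/path/to/wav/dir").rglob("*.flac")):  # placeholder directory
    waveform, sr = torchaudio.load(wav_path)
    waveform = torchaudio.functional.resample(waveform, sr, 16000)
    outputs = model(waveform.cuda())
    units[wav_path.stem] = [int(u) for u in outputs["units"]]  # pseudo-syllabic units

# save the units to disk (placeholder output path)
with open("units.json", "w") as f:
    json.dump(units, f)
```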
A Google Colab demo is available here.
You can download a pretrained model from Hugging Face.
Other models can be downloaded from the old repository.
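If you prefer to fetch a checkpoint explicitly (e.g., for offline use), the huggingface_hub library can download the whole model repo; a sketch using the repo ID from the example above:

```python
from huggingface_hub import snapshot_download

# download the model repo into the local Hugging Face cache and return its path
local_dir = snapshot_download("ryota-komatsu/s5-hubert")
print(local_dir)

# from_pretrained typically also accepts a local path instead of a repo ID:
# model = S5HubertForSyllableDiscovery.from_pretrained(local_dir)
```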
If you already have LibriSpeech, you can use it by editing the config file:
```yaml
dataset:
  root: "/path/to/LibriSpeech/root"  # ${dataset.root}/LibriSpeech/train-clean-100, train-clean-360, ...
```
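For example, a minimal sketch that points the config at an existing copy (assuming the config is plain YAML; the LibriSpeech path is a placeholder):

```python
import yaml

with open("configs/default.yaml") as f:
    config = yaml.safe_load(f)

# point the dataset root at an existing LibriSpeech copy (placeholder path)
config["dataset"]["root"] = "/path/to/LibriSpeech/root"

# note: yaml.safe_dump drops comments; editing the file by hand preserves them
with open("configs/default.yaml", "w") as f:
    yaml.safe_dump(config, f)
```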
Otherwise, you can download a fresh copy under `dataset_root`:
```shell
dataset_root=data  # be consistent with dataset.root in the config file
sh scripts/download_librispeech.sh ${dataset_root}
```
Check the directory structure:

```
${dataset.root}  # dataset.root in the config file
└── LibriSpeech/
    ├── train-clean-100/
    ├── train-clean-360/
    ├── train-other-500/
    ├── dev-clean/
    ├── dev-other/
    ├── test-clean/
    ├── test-other/
    └── SPEAKERS.TXT
```
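A quick sanity check of this layout, as a sketch (adjust the root to match `dataset.root` in your config):

```python
from pathlib import Path

root = Path("/path/to/LibriSpeech/root")  # dataset.root in the config file
expected = [
    "train-clean-100",
    "train-clean-360",
    "train-other-500",
    "dev-clean",
    "dev-other",
    "test-clean",
    "test-other",
]
for split in expected:
    path = root / "LibriSpeech" / split
    print(f"{path}: {'ok' if path.is_dir() else 'missing'}")
```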
Train a model:

```shell
python main.py --config configs/default.yaml
```
If you use this repository, please cite the paper:

```bibtex
@inproceedings{Komatsu_Self-Supervised_Syllable_Discovery_2024,
  author    = {Komatsu, Ryota and Shinozaki, Takahiro},
  title     = {Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT},
  year      = {2024},
  month     = dec,
  booktitle = {IEEE Spoken Language Technology Workshop},
  pages     = {1131--1136},
  doi       = {10.1109/SLT61566.2024.10832325},
}
```