Owl Audio Gen

Latent diffusion for unconditional audio generation. The model is a diffusion transformer (DiT) operating on audio latents from any pre-trained audio VAE.

Setup

git clone https://github.com/Overworldai/owl-audio-gen.git
cd owl-audio-gen
pip install -e .

A .env in the repo root is read at startup — drop WANDB_API_KEY=... there if you don't want to set it in your shell.
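For illustration, here is a minimal sketch of what reading a .env file typically involves, using only the standard library. The helper name load_env_file is hypothetical, and the mechanism the repo actually uses is not documented here. The sketch also assumes that variables already set in your shell take precedence over the file.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Hypothetical helper: copy KEY=VALUE pairs from a .env file into os.environ."""
    env_path = Path(path)
    loaded = {}
    if not env_path.is_file():
        return loaded
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # Assumption: a variable already set in the shell wins over the file.
        if key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded
```

Calling load_env_file() from the repo root returns only the keys it actually set, which makes it easy to check whether WANDB_API_KEY came from the file or from your shell.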

Audio VAE

The training loop encodes raw audio on-the-fly with a frozen VAE. The baseline uses the VAE from Stable Audio Open 1.0. That model is gated, so to use it as-is you'll need to accept the license on HuggingFace and run huggingface-cli login. Support for additional and custom VAEs is a work in progress.

Data

The default loader (audio_dir_loader) walks a directory tree for .mp4 files and decodes random audio windows on-the-fly with PyAV. Point train.data_kwargs.source at any folder of mp4s (it accepts a directory, a list of directories, or a glob).
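The three accepted forms of source can be pictured with the sketch below. It is not the code in audio_dir_loader: resolve_sources and random_window are hypothetical helpers, and the PyAV decoding step is left out. It only shows one plausible way to expand a directory, a list of directories, or a glob into .mp4 paths, and to pick a random window inside a clip of known duration.

```python
import glob
import random
from pathlib import Path


def resolve_sources(source):
    """Hypothetical helper: expand a directory, list of directories, or glob into sorted .mp4 paths."""
    entries = source if isinstance(source, (list, tuple)) else [source]
    files = set()
    for entry in entries:
        entry = str(entry)
        if Path(entry).is_dir():
            # Walk the whole tree under the directory.
            files.update(str(p) for p in Path(entry).rglob("*.mp4"))
        else:
            # Treat anything that is not a directory as a glob pattern.
            files.update(p for p in glob.glob(entry, recursive=True) if p.endswith(".mp4"))
    return sorted(files)


def random_window(duration_s, window_s, rng=random):
    """Pick a random (start, end) window in seconds that fits inside the clip."""
    if duration_s <= window_s:
        # Clip is shorter than the window: use all of it.
        return 0.0, duration_s
    start = rng.uniform(0.0, duration_s - window_s)
    return start, start + window_s
```

A set is used so that a file reachable through two overlapping entries is only listed once, as long as both entries spell its path the same way.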

Training

Edit configs/audio_baseline.yml — at minimum, set:

train.data_kwargs.source → your audio dataset path
wandb.name → your wandb entity
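The dotted names above are read here as paths into the nested YAML mapping, which is an assumption based on the notation. A minimal Python sketch of that interpretation follows, with a hypothetical set_by_path helper and placeholder values. The real config file contains more keys than shown.

```python
def set_by_path(config, dotted_key, value):
    """Hypothetical helper: set a nested dict entry from a dotted key such as 'train.data_kwargs.source'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Create intermediate mappings when they are missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Placeholder values only; substitute your own path and entity.
config = {"train": {"data_kwargs": {}}, "wandb": {}}
set_by_path(config, "train.data_kwargs.source", "/path/to/audio")
set_by_path(config, "wandb.name", "your-wandb-entity")
```

In the YAML file this corresponds to source nested under data_kwargs under train, and name nested under wandb.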

Then launch:

# single GPU
python train.py --config_path configs/audio_baseline.yml

# multi-GPU (single node, 8 GPUs)
torchrun --nproc_per_node=8 train.py --config_path configs/audio_baseline.yml
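
The difference between the two launch modes is visible from inside a process: torchrun sets the RANK, LOCAL_RANK, and WORLD_SIZE environment variables for each worker, while a plain python launch sets none of them. The sketch below shows one way a script can read them. The dist_info helper is hypothetical, and how train.py itself handles this is not described here.

```python
import os


def dist_info(env=None):
    """Hypothetical helper: read the process layout that torchrun exports via environment variables."""
    env = os.environ if env is None else env
    return {
        # Defaults describe a plain single-process `python train.py` launch.
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


info = dist_info()
is_main_process = info["rank"] == 0
```

A common use of such a flag is to restrict checkpoint writing and wandb logging to the main process, so that eight workers do not all write the same files.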

Checkpoints land in train.checkpoint_dir, and samples are logged to wandb every train.sample_interval steps.
