Releases · FoxNoseTech/diarize

06 May 10:03

loookashow

v0.1.2

4f25d27

v0.1.2 Latest


What's Changed

diarize 0.1.2 focuses on diarization quality, reproducible benchmarks, and clearer accuracy documentation.

Improvements

Reduced short-lived speaker label switches by applying temporal smoothing during diarization assembly.
Improved automatic speaker-count selection by combining silhouette refinement with a small prior that favors larger k (more speakers).
Added scripts/benchmark_rttm.py for reproducible audio+RTTM benchmark runs across VoxConverse, AMI, and similar datasets.
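
The notes do not say which smoothing method is used. As a rough illustration of the general idea, the sketch below applies a sliding-window majority vote to per-frame speaker labels. The function name `smooth_labels`, the frame-level representation, and the window size are all assumptions made for this example, not diarize's actual implementation.

```python
from collections import Counter

def smooth_labels(labels, window=5):
    """Majority-vote smoothing over a centered sliding window (hypothetical helper)."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        # On ties, keep the original label so a frame is never flipped arbitrarily.
        if counts[labels[i]] == best:
            smoothed.append(labels[i])
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed

# One isolated "B" frame interrupts a run of "A" frames.
frames = ["A", "A", "A", "B", "A", "A", "A", "B", "B", "B"]
print(smooth_labels(frames))
```

The isolated "B" frame is absorbed into the surrounding "A" run, while the real change to "B" at the end is kept. A wider window removes longer blips but can also blur genuine short turns.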

Benchmarks and Docs

Updated VoxConverse dev benchmark numbers:
- Weighted DER: ~4.8%
- Speaker count: 125/216 exact, 178/216 within ±1
Added preliminary AMI Mix-Headset test validation:
- Weighted DER: 14.96%
- Speaker count: 4/16 exact, 8/16 within ±1
Documented known limitations around speaker-count errors and speaker label fragmentation.
Added a Changelog page to the documentation.
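
For readers unfamiliar with the metrics, here is a minimal sketch of how such figures are typically aggregated. It assumes "weighted" means weighting each file by its reference speech duration (total error time over total reference time), which the notes do not state explicitly. The helper names and the sample numbers are illustrative only.

```python
def weighted_der(files):
    """Total error time divided by total reference speech time.

    `files` is a list of (error_seconds, reference_seconds) pairs, where
    error_seconds = missed speech + false alarm + speaker confusion.
    """
    total_error = sum(err for err, _ in files)
    total_ref = sum(ref for _, ref in files)
    return total_error / total_ref

def speaker_count_accuracy(pairs):
    """Return (exact, within_one) tallies for (predicted, true) speaker counts."""
    exact = sum(1 for pred, true in pairs if pred == true)
    within_one = sum(1 for pred, true in pairs if abs(pred - true) <= 1)
    return exact, within_one

# Illustrative numbers only, not taken from any benchmark.
files = [(6.0, 100.0), (30.0, 200.0)]
print(f"weighted DER: {weighted_der(files):.1%}")
print(speaker_count_accuracy([(2, 2), (3, 4), (5, 2)]))
```

Under this definition longer files count for more: the two sample files have per-file DERs of 6% and 15%, but the weighted figure is 36 s / 300 s = 12%, not the plain mean of 10.5%. For scale, the speaker-count tallies reported above work out to roughly 58% exact and 82% within ±1 on VoxConverse dev, and 25% exact and 50% within ±1 on the AMI test set.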

Package

Synced package metadata and runtime diarize.__version__ to 0.1.2.


06 Mar 17:20

loookashow

v0.1.1

871d71f

v0.1.1

This patch release fixes dependency compatibility for audio loading.

Fixed

Pinned torch and torchaudio to a compatible range:
- torch>=1.13,<2.9
- torchaudio>=0.13,<2.9
Prevents failures where newer torchaudio requires torchcodec.
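
To check whether an existing environment falls inside the pinned range, something like the following stdlib-only sketch may help. The helpers `release_tuple` and `in_range` are hypothetical names for this example. They compare only the leading numeric release and ignore pre-release and local suffixes, so a proper version library is the safer choice for anything beyond a quick check.

```python
import re

def release_tuple(version):
    """Parse the leading numeric release, e.g. '2.8.0+cpu' -> (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple(int(part) for part in match.group().split("."))

def in_range(version, lower, upper):
    """True if lower <= version < upper, comparing release tuples."""
    return release_tuple(lower) <= release_tuple(version) < release_tuple(upper)

# Range from the pin above: >=1.13,<2.9
print(in_range("2.8.0+cpu", "1.13", "2.9"))
print(in_range("2.9.0", "1.13", "2.9"))
```

The first call reports a version inside the range and the second one outside it. The installed version string can be read with `importlib.metadata.version("torch")` when the package is present.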

Docs

Clarified that diarize now installs a compatible torch/torchaudio range automatically.

No API changes.


01 Mar 11:30

loookashow

v0.1.0

c7bc69a

v0.1.0 — Initial Release

diarize v0.1.0

Speaker diarization for Python: answers "who spoke when?" in any audio file. Runs entirely on CPU, with no GPU, API keys, or account signup required.

Highlights

~10.8% DER on VoxConverse dev set — lower than pyannote's free models (community-1 and 3.1 legacy, both ~11.2%)
~8x faster than real-time on CPU (RTF 0.12 vs pyannote community-1's 0.86)
Automatic speaker count detection via GMM BIC with silhouette refinement (1–7 speakers)
Zero setup friction — pip install diarize and you're done, no HuggingFace token or account needed
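
The speed claim follows from the real-time factor (RTF), which is processing time divided by audio duration. The short sketch below works through the arithmetic for the RTF values quoted above. The function names are chosen for illustration.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    return processing_seconds / audio_seconds

def speedup(rtf):
    """How many times faster than real time a given RTF is."""
    return 1.0 / rtf

# RTF values quoted in the highlights.
print(f"diarize:              {speedup(0.12):.1f}x real time")
print(f"pyannote community-1: {speedup(0.86):.1f}x real time")

# Processing time for a 60-minute recording at RTF 0.12, in minutes.
print(f"{0.12 * 60:.1f} min")
```

An RTF of 0.12 means each minute of audio takes about 7 seconds to process, which is where the "~8x faster than real-time" figure comes from (1 / 0.12 ≈ 8.3).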

Pipeline

Silero VAD → WeSpeaker ResNet34-LM (ONNX) → GMM BIC → Spectral Clustering

All four stages run on CPU. All components are open-source with permissive licenses.

Usage

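from diarize import diarize

# Run the full pipeline on an audio file.
result = diarize("meeting.wav")

# Each segment carries a start time, an end time (seconds), and a speaker label.
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")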
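
A common next step is totalling talk time per speaker. The sketch below relies only on the `start`, `end`, and `speaker` fields shown in the usage snippet. The `Segment` dataclass is a stand-in so the example runs without the package or an audio file, and the `SPEAKER_00` label format is an assumption rather than something the release notes specify.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Segment:
    """Stand-in for a diarize segment (hypothetical; only the three fields used above)."""
    start: float
    end: float
    speaker: str

def talk_time(segments):
    """Sum segment durations, in seconds, per speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)

# Made-up segments; label format is assumed for illustration.
segments = [
    Segment(0.0, 4.5, "SPEAKER_00"),
    Segment(4.5, 6.0, "SPEAKER_01"),
    Segment(6.0, 10.0, "SPEAKER_00"),
]
for speaker, seconds in sorted(talk_time(segments).items()):
    print(f"{speaker}: {seconds:.1f}s")
```

With real output, `talk_time(result.segments)` should work the same way as long as the segment objects expose those three attributes.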

Known Limitations

Benchmarked on a single dataset (VoxConverse). Cross-dataset validation is planned.
Speaker count estimation degrades for 8+ speakers — pass num_speakers explicitly when known.
Overlapping speech is not modeled — each segment is assigned to one speaker.
