Commit de9fd74

Committed Nov 19, 2023
Add documentation page (#209)
* Add initial docs
* Include README in docs page
* Improve README
* Update README
* Add docs requirements.txt
* Add readthedocs config file
* Fix links
* Add some docstrings
* Ignore private attrs in docs
* Add some docstrings. Effectively ignore __init__
* Blacken code
* Blacken code with good version
* Clean up some code
* Fix wrong html title
1 parent 2227f54 commit de9fd74

12 files changed: +245 -11 lines changed
 

‎.readthedocs.yaml

+16
@@ -0,0 +1,16 @@
+version: 2
+
+build:
+  os: "ubuntu-22.04"
+  tools:
+    python: "3.10"
+
+python:
+  install:
+    - requirements: docs/requirements.txt
+    # Install diart before building the docs
+    - method: pip
+      path: .
+
+sphinx:
+  configuration: docs/conf.py

‎README.md

+32-7
@@ -1,7 +1,11 @@
 <br/>
 
 <p align="center">
-<img width="50%" src="/logo.jpg" title="Logo" />
+<img width="50%" src="https://raw.githubusercontent.com/juanmc2005/diart/main/logo.jpg" title="Logo" />
+</p>
+
+<p align="center">
+<i>🌿 Build AI-powered real-time audio applications in a breeze 🌿</i>
 </p>
 
 <p align="center">
@@ -56,9 +60,21 @@
 <br/>
 
 <p align="center">
-<img width="100%" src="/demo.gif" title="Real-time diarization example" />
+<img width="100%" src="https://github.com/juanmc2005/diart/blob/main/demo.gif?raw=true" title="Real-time diarization example" />
 </p>
 
+## ⚡ Quick introduction
+
+Diart is a python framework to build AI-powered real-time audio applications. With diart you can
+create your own AI pipeline, benchmark it, tune its hyper-parameters, and even serve it on the web using websockets.
+
+**We provide pre-trained AI pipelines for:**
+
+- Speaker Diarization
+- Voice Activity Detection
+- Transcription (coming soon)
+- [Speaker-Aware Transcription](https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef) (coming soon)
+
 ## 💾 Installation
 
 1) Create environment:
@@ -289,13 +305,18 @@ prediction = inference()
 
 ## 🔬 Powered by research
 
-Diart is the official implementation of the paper *[Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation](/paper.pdf)* by [Juan Manuel Coria](https://juanmc2005.github.io/), [Hervé Bredin](https://herve.niderb.fr), [Sahar Ghannay](https://saharghannay.github.io/) and [Sophie Rosset](https://perso.limsi.fr/rosset/).
+Diart is the official implementation of the paper
+[Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation](https://github.com/juanmc2005/diart/blob/main/paper.pdf)
+by [Juan Manuel Coria](https://juanmc2005.github.io/),
+[Hervé Bredin](https://herve.niderb.fr),
+[Sahar Ghannay](https://saharghannay.github.io/)
+and [Sophie Rosset](https://perso.limsi.fr/rosset/).
 
 
 > We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms. Every single step of the proposed pipeline is designed to take full advantage of the strong ability of a recently proposed end-to-end overlap-aware segmentation to detect and separate overlapping speakers. In particular, we propose a modified version of the statistics pooling layer (initially introduced in the x-vector architecture) to give less weight to frames where the segmentation model predicts simultaneous speakers. Furthermore, we derive cannot-link constraints from the initial segmentation step to prevent two local speakers from being wrongfully merged during the incremental clustering step. Finally, we show how the latency of the proposed approach can be adjusted between 500ms and 5s to match the requirements of a particular use case, and we provide a systematic analysis of the influence of latency on the overall performance (on AMI, DIHARD and VoxConverse).
 
 <p align="center">
-<img height="400" src="/figure1.png" title="Visual explanation of the system" width="325" />
+<img height="400" src="https://github.com/juanmc2005/diart/blob/main/figure1.png?raw=true" title="Visual explanation of the system" width="325" />
 </p>
 
 ## 📗 Citation
@@ -315,7 +336,7 @@ If you found diart useful, please make sure to cite our paper:
 
 ## 👨‍💻 Reproducibility
 
-![Results table](/table1.png)
+![Results table](https://github.com/juanmc2005/diart/blob/main/table1.png?raw=true)
 
 Diart aims to be lightweight and capable of real-time streaming in practical scenarios.
 Its performance is very close to what is reported in the paper (and sometimes even a bit better).
@@ -367,9 +388,13 @@ if __name__ == "__main__": # Needed for multiprocessing
 This pre-calculates model outputs in batches, so it runs a lot faster.
 See `diart.benchmark -h` for more options.
 
-For convenience and to facilitate future comparisons, we also provide the [expected outputs](/expected_outputs) of the paper implementation in RTTM format for every entry of Table 1 and Figure 5. This includes the VBx offline topline as well as our proposed online approach with latencies 500ms, 1s, 2s, 3s, 4s, and 5s.
+For convenience and to facilitate future comparisons, we also provide the
+[expected outputs](https://github.com/juanmc2005/diart/tree/main/expected_outputs)
+of the paper implementation in RTTM format for every entry of Table 1 and Figure 5.
+This includes the VBx offline topline as well as our proposed online approach with
+latencies 500ms, 1s, 2s, 3s, 4s, and 5s.
 
-![Figure 5](/figure5.png)
+![Figure 5](https://github.com/juanmc2005/diart/blob/main/figure5.png?raw=true)
 
 ## 📑 License

‎docs/Makefile

+20
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS ?=
+SPHINXBUILD ?= sphinx-build
+SOURCEDIR = .
+BUILDDIR = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

‎docs/_static/logo.png

250 KB

‎docs/conf.py

+65
@@ -0,0 +1,65 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = "diart"
+copyright = "2023, Juan Manuel Coria"
+author = "Juan Manuel Coria"
+release = "v0.9"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    "autoapi.extension",
+    "sphinx.ext.coverage",
+    "sphinx.ext.napoleon",
+    "sphinx_mdinclude",
+]
+
+autoapi_dirs = ["../src/diart"]
+autoapi_options = [
+    "members",
+    "undoc-members",
+    "show-inheritance",
+    "show-module-summary",
+    "special-members",
+    "imported-members",
+]
+
+templates_path = ["_templates"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+
+# -- Options for autodoc ----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#configuration
+
+# Automatically extract typehints when specified and place them in
+# descriptions of the relevant function/method.
+autodoc_typehints = "description"
+
+# Don't show class signature with the class' name.
+autodoc_class_signature = "separated"
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "furo"
+html_static_path = ["_static"]
+html_logo = "_static/logo.png"
+html_title = "diart documentation"
+
+
+def skip_submodules(app, what, name, obj, skip, options):
+    return (
+        name.endswith("__init__")
+        or name.startswith("diart.console")
+        or name.startswith("diart.argdoc")
+    )
+
+
+def setup(sphinx):
+    sphinx.connect("autoapi-skip-member", skip_submodules)
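
The `skip_submodules` handler in `docs/conf.py` decides which documented names are hidden, using only the fully qualified name it receives. The sketch below copies that predicate and runs it outside Sphinx on a few names; the names in `candidates` are hypothetical examples chosen for illustration, not a list taken from the diart package.

```python
def skip_submodules(app, what, name, obj, skip, options):
    """Copy of the predicate from docs/conf.py: True means "skip this member"."""
    return (
        name.endswith("__init__")
        or name.startswith("diart.console")
        or name.startswith("diart.argdoc")
    )


# Hypothetical fully qualified names, for illustration only.
candidates = [
    "diart.blocks.base.Pipeline.__init__",
    "diart.console.stream",
    "diart.argdoc",
    "diart.blocks.base.Pipeline",
]

# The handler ignores every argument except `name`, so placeholders are enough here.
skipped = [
    name
    for name in candidates
    if skip_submodules(None, "unknown", name, None, False, {})
]
print(skipped)
```

Note that the handler returns `False`, not `None`, for every other name. As far as the sphinx-autoapi documentation describes the `autoapi-skip-member` event, returning `None` defers to the default skipping behaviour, so a handler that always returns a boolean takes the decision for all members.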

‎docs/index.rst

+11
@@ -0,0 +1,11 @@
+Get started with diart
+======================
+
+.. mdinclude:: ../README.md
+
+
+Useful Links
+============
+
+.. toctree::
+   :maxdepth: 1

‎docs/make.bat

+35
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd

‎docs/requirements.txt

+4
@@ -0,0 +1,4 @@
+sphinx==6.2.1
+sphinx-autoapi==3.0.0
+sphinx-mdinclude==0.5.3
+furo==2023.9.10

‎src/diart/blocks/base.py

+42
@@ -11,12 +11,28 @@
 
 @dataclass
 class HyperParameter:
+    """Represents a pipeline hyper-parameter that can be tuned by diart"""
+
     name: Text
+    """Name of the hyper-parameter (e.g. tau_active)"""
     low: float
+    """Lowest value that this parameter can take"""
     high: float
+    """Highest value that this parameter can take"""
 
     @staticmethod
     def from_name(name: Text) -> "HyperParameter":
+        """Create a HyperParameter object given its name.
+
+        Parameters
+        ----------
+        name: str
+            Name of the hyper-parameter
+
+        Returns
+        -------
+        HyperParameter
+        """
         if name == "tau_active":
             return TauActive
         if name == "rho_update":
@@ -32,24 +48,34 @@ def from_name(name: Text) -> "HyperParameter":
 
 
 class PipelineConfig(ABC):
+    """Configuration containing the required
+    parameters to build and run a pipeline"""
+
     @property
     @abstractmethod
     def duration(self) -> float:
+        """The duration of an input audio chunk (in seconds)"""
         pass
 
     @property
     @abstractmethod
     def step(self) -> float:
+        """The step between two consecutive input audio chunks (in seconds)"""
         pass
 
     @property
     @abstractmethod
     def latency(self) -> float:
+        """The algorithmic latency of the pipeline (in seconds).
+        At time `t` of the audio stream, the pipeline will
+        output predictions for time `t - latency`.
+        """
         pass
 
     @property
     @abstractmethod
     def sample_rate(self) -> int:
+        """The sample rate of the input audio stream"""
         pass
 
     def get_file_padding(self, filepath: FilePath) -> Tuple[float, float]:
@@ -60,6 +86,8 @@ def get_file_padding(self, filepath: FilePath) -> Tuple[float, float]:
 
 
 class Pipeline(ABC):
+    """Represents a streaming audio pipeline"""
+
     @staticmethod
     @abstractmethod
     def get_config_class() -> type:
@@ -92,4 +120,18 @@ def set_timestamp_shift(self, shift: float):
     def __call__(
         self, waveforms: Sequence[SlidingWindowFeature]
     ) -> Sequence[Tuple[Any, SlidingWindowFeature]]:
+        """Runs the next steps of the pipeline
+        given a list of consecutive audio chunks.
+
+        Parameters
+        ----------
+        waveforms: Sequence[SlidingWindowFeature]
+            Consecutive chunk waveforms for the pipeline to ingest
+
+        Returns
+        -------
+        Sequence[Tuple[Any, SlidingWindowFeature]]
+            For each input waveform, a tuple containing
+            the pipeline output and its respective audio
+        """
         pass
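
The new `PipelineConfig` docstrings define four quantities: chunk `duration`, `step`, algorithmic `latency` and `sample_rate`. The sketch below illustrates how they relate, using a stand-in abstract class so that it runs without diart installed. `ConfigSketch`, `FixedConfig`, `chunk_num_samples` and `predicted_time` are hypothetical names, and the numeric values are illustrative rather than defaults read from this commit.

```python
from abc import ABC, abstractmethod


class ConfigSketch(ABC):
    """Hypothetical stand-in exposing the four documented properties."""

    @property
    @abstractmethod
    def duration(self) -> float:
        """Duration of an input audio chunk (in seconds)."""

    @property
    @abstractmethod
    def step(self) -> float:
        """Step between two consecutive chunks (in seconds)."""

    @property
    @abstractmethod
    def latency(self) -> float:
        """Algorithmic latency (in seconds)."""

    @property
    @abstractmethod
    def sample_rate(self) -> int:
        """Sample rate of the input audio stream."""


class FixedConfig(ConfigSketch):
    """Concrete config holding constant values."""

    def __init__(self, duration: float, step: float, latency: float, sample_rate: int):
        self._duration = duration
        self._step = step
        self._latency = latency
        self._sample_rate = sample_rate

    @property
    def duration(self) -> float:
        return self._duration

    @property
    def step(self) -> float:
        return self._step

    @property
    def latency(self) -> float:
        return self._latency

    @property
    def sample_rate(self) -> int:
        return self._sample_rate


def chunk_num_samples(config: ConfigSketch) -> int:
    """Number of samples in one input chunk."""
    return int(round(config.duration * config.sample_rate))


def predicted_time(config: ConfigSketch, t: float) -> float:
    """Per the latency docstring: at stream time t, output covers t - latency."""
    return t - config.latency


# Illustrative values only.
config = FixedConfig(duration=5.0, step=0.5, latency=0.5, sample_rate=16000)
print(chunk_num_samples(config))     # 80000
print(predicted_time(config, 10.0))  # 9.5
```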

‎src/diart/blocks/diarization.py

+12
@@ -157,6 +157,18 @@ def reset(self):
     def __call__(
         self, waveforms: Sequence[SlidingWindowFeature]
     ) -> Sequence[tuple[Annotation, SlidingWindowFeature]]:
+        """Diarize the next audio chunks of an audio stream.
+
+        Parameters
+        ----------
+        waveforms: Sequence[SlidingWindowFeature]
+            A sequence of consecutive audio chunks from an audio stream.
+
+        Returns
+        -------
+        Sequence[tuple[Annotation, SlidingWindowFeature]]
+            Speaker diarization of each chunk alongside their corresponding audio.
+        """
         batch_size = len(waveforms)
         msg = "Pipeline expected at least 1 input"
         assert batch_size >= 1, msg

‎src/diart/blocks/embedding.py

+7-1
@@ -69,7 +69,13 @@ def __call__(
 
 
 class OverlappedSpeechPenalty:
-    """
+    """Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.
+
+    .. note::
+        For more information, see `"Overlap-Aware Low-Latency Online Speaker Diarization
+        based on End-to-End Local Segmentation" <https://github.com/juanmc2005/diart/blob/main/paper.pdf>`_
+        (Section 2.2.1 Segmentation-driven speaker embedding). This block implements Equation 2.
+
     Parameters
     ----------
     gamma: float, optional
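
The new docstring says the block penalizes overlapping speech and low-confidence regions, but this hunk does not show the computation. The sketch below assumes one plausible form of such a penalty: each speaker score is multiplied by a softmax taken across speakers, and the product is raised to the power `gamma`. The `beta` temperature, the `floor` value and all defaults are assumptions made for illustration; they are not read from this commit and may differ from the paper's Equation 2 and from the diart implementation.

```python
import math


def overlapped_speech_penalty(scores, gamma=3.0, beta=10.0, floor=1e-8):
    """Assumed penalty form: weight = (score * softmax(beta * scores))**gamma.

    scores: list of frames, each a list of per-speaker activations in [0, 1].
    gamma, beta, floor: illustrative defaults, not taken from the commit.
    """
    weights = []
    for frame in scores:
        # Numerically stable softmax across the speakers of this frame.
        peak = max(beta * s for s in frame)
        exps = [math.exp(beta * s - peak) for s in frame]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Down-weight frames that are shared between speakers or weakly active.
        weights.append(
            [max((s * p) ** gamma, floor) for s, p in zip(frame, probs)]
        )
    return weights


frames = [
    [0.9, 0.0],  # one confident speaker
    [0.9, 0.9],  # two overlapping speakers
    [0.1, 0.0],  # low-confidence speech
]
weights = overlapped_speech_penalty(frames)
for frame, frame_weights in zip(frames, weights):
    print(frame, [round(w, 4) for w in frame_weights])
```

Under these assumptions, the first speaker keeps a much larger weight in the single-speaker frame than in the overlapping or low-confidence frames, which matches the behaviour the docstring describes.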

‎src/diart/models.py

+1-3
@@ -12,9 +12,7 @@
 
 try:
     from pyannote.audio import Model
-    from pyannote.audio.pipelines.speaker_verification import (
-        PretrainedSpeakerEmbedding,
-    )
+    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
     from pyannote.audio.utils.powerset import Powerset
 
     _has_pyannote = True
