Add documentation page (#209)

juanmc2005 · juanmc2005 · commit de9fd74c9824 · 2023-11-19T11:57:01.000+01:00
* Add initial docs

* Include README in docs page

* Improve README

* Update README

* Add docs requirements.txt

* Add readthedocs config file

* Fix links

* Add some docstrings

* Ignore private attrs in docs

* Add some docstrings. Effectively ignore __init__

* Blacken code

* Blacken code with good version

* Clean up some code

* Fix wrong html title
diff --git a/.readthedocs.yaml b/.readthedocs.yaml
@@ -0,0 +1,16 @@
+version: 2
+
+build:
+  os: "ubuntu-22.04"
+  tools:
+    python: "3.10"
+
+python:
+  install:
+    - requirements: docs/requirements.txt
+    # Install diart before building the docs
+    - method: pip
+      path: .
+
+sphinx:
+  configuration: docs/conf.py
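This config asks Read the Docs to install the pinned docs requirements, then install diart from the repository root, then build with `docs/conf.py`. To try the same sequence locally, a small script can list and optionally run equivalent commands. This is an approximation, not the exact command line Read the Docs executes; the `build_steps` helper and the output directory `docs/_build/html` are assumptions.

```python
import subprocess
import sys
from typing import List


def build_steps(output_dir: str = "docs/_build/html") -> List[List[str]]:
    """Approximate the .readthedocs.yaml steps as local commands (hypothetical helper)."""
    python = sys.executable
    return [
        # python.install[0]: documentation dependencies
        [python, "-m", "pip", "install", "-r", "docs/requirements.txt"],
        # python.install[1]: install diart itself (method: pip, path: .)
        [python, "-m", "pip", "install", "."],
        # sphinx.configuration: build HTML from docs/, where conf.py lives
        [python, "-m", "sphinx", "-b", "html", "docs", output_dir],
    ]


def run(dry_run: bool = True) -> None:
    """Print each command and execute it unless dry_run is set."""
    for cmd in build_steps():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Pass --run to actually execute the commands
    run(dry_run="--run" not in sys.argv)
```

Keeping the steps as data makes it easy to eyeball them against the YAML before running anything.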
diff --git a/README.md b/README.md
@@ -1,7 +1,11 @@
 <br/>
 
 <p align="center">
-<img width="50%" src="/logo.jpg" title="Logo" />
+<img width="50%" src="https://raw.githubusercontent.com/juanmc2005/diart/main/logo.jpg" title="Logo" />
+</p>
+
+<p align="center">
+<i>🌿 Build AI-powered real-time audio applications in a breeze 🌿</i>
 </p>
 
 <p align="center">
@@ -56,9 +60,21 @@
 <br/>
 
 <p align="center">
-<img width="100%" src="/demo.gif" title="Real-time diarization example" />
+<img width="100%" src="https://github.com/juanmc2005/diart/blob/main/demo.gif?raw=true" title="Real-time diarization example" />
 </p>
 
+## ⚡ Quick introduction
+
+Diart is a Python framework to build AI-powered real-time audio applications. With diart, you can
+create your own AI pipeline, benchmark it, tune its hyper-parameters, and even serve it on the web using WebSockets.
+
+**We provide pre-trained AI pipelines for:**
+
+- Speaker Diarization
+- Voice Activity Detection
+- Transcription (coming soon)
+- [Speaker-Aware Transcription](https://betterprogramming.pub/color-your-captions-streamlining-live-transcriptions-with-diart-and-openais-whisper-6203350234ef) (coming soon)
+
 ## 💾 Installation
 
 1) Create environment:
@@ -289,13 +305,18 @@ prediction = inference()
 
 ## 🔬 Powered by research
 
-Diart is the official implementation of the paper *[Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation](/paper.pdf)* by [Juan Manuel Coria](https://juanmc2005.github.io/), [Hervé Bredin](https://herve.niderb.fr), [Sahar Ghannay](https://saharghannay.github.io/) and [Sophie Rosset](https://perso.limsi.fr/rosset/).
+Diart is the official implementation of the paper
+[Overlap-aware low-latency online speaker diarization based on end-to-end local segmentation](https://github.com/juanmc2005/diart/blob/main/paper.pdf)
+by [Juan Manuel Coria](https://juanmc2005.github.io/),
+[Hervé Bredin](https://herve.niderb.fr),
+[Sahar Ghannay](https://saharghannay.github.io/)
+and [Sophie Rosset](https://perso.limsi.fr/rosset/).
 
 
 > We propose to address online speaker diarization as a combination of incremental clustering and local diarization applied to a rolling buffer updated every 500ms. Every single step of the proposed pipeline is designed to take full advantage of the strong ability of a recently proposed end-to-end overlap-aware segmentation to detect and separate overlapping speakers. In particular, we propose a modified version of the statistics pooling layer (initially introduced in the x-vector architecture) to give less weight to frames where the segmentation model predicts simultaneous speakers. Furthermore, we derive cannot-link constraints from the initial segmentation step to prevent two local speakers from being wrongfully merged during the incremental clustering step. Finally, we show how the latency of the proposed approach can be adjusted between 500ms and 5s to match the requirements of a particular use case, and we provide a systematic analysis of the influence of latency on the overall performance (on AMI, DIHARD and VoxConverse).
 
 <p align="center">
-<img height="400" src="/figure1.png" title="Visual explanation of the system" width="325" />
+<img height="400" src="https://github.com/juanmc2005/diart/blob/main/figure1.png?raw=true" title="Visual explanation of the system" width="325" />
 </p>
 
 ## 📗 Citation
@@ -315,7 +336,7 @@ If you found diart useful, please make sure to cite our paper:
 
 ## 👨‍💻 Reproducibility
 
-![Results table](/table1.png)
+![Results table](https://github.com/juanmc2005/diart/blob/main/table1.png?raw=true)
 
 Diart aims to be lightweight and capable of real-time streaming in practical scenarios.
 Its performance is very close to what is reported in the paper (and sometimes even a bit better).
@@ -367,9 +388,13 @@ if __name__ == "__main__":  # Needed for multiprocessing
 This pre-calculates model outputs in batches, so it runs a lot faster.
 See `diart.benchmark -h` for more options.
 
-For convenience and to facilitate future comparisons, we also provide the [expected outputs](/expected_outputs) of the paper implementation in RTTM format for every entry of Table 1 and Figure 5. This includes the VBx offline topline as well as our proposed online approach with latencies 500ms, 1s, 2s, 3s, 4s, and 5s.
+For convenience and to facilitate future comparisons, we also provide the
+[expected outputs](https://github.com/juanmc2005/diart/tree/main/expected_outputs)
+of the paper implementation in RTTM format for every entry of Table 1 and Figure 5.
+This includes the VBx offline topline as well as our proposed online approach with
+latencies 500ms, 1s, 2s, 3s, 4s, and 5s.
 
-![Figure 5](/figure5.png)
+![Figure 5](https://github.com/juanmc2005/diart/blob/main/figure5.png?raw=true)
 
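The README provides the expected outputs of the paper implementation as RTTM files. As a quick sanity check when comparing against them, a few lines of Python can total each speaker's talk time. This sketch assumes the standard ten-column RTTM layout and only reads `SPEAKER` records; `speech_per_speaker` is a hypothetical helper and the sample lines are made up for illustration, not taken from the expected outputs.

```python
from collections import defaultdict
from typing import Dict, Iterable


def speech_per_speaker(rttm_lines: Iterable[str]) -> Dict[str, float]:
    """Sum the speaking time (in seconds) of each speaker in RTTM lines."""
    totals: Dict[str, float] = defaultdict(float)
    for line in rttm_lines:
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record
        if not fields or fields[0] != "SPEAKER":
            continue
        # Columns: type, file, channel, onset, duration, ortho, stype, name, conf, slat
        duration = float(fields[4])
        speaker = fields[7]
        totals[speaker] += duration
    return dict(totals)


example = [
    "SPEAKER example_file 1 0.50 2.00 <NA> <NA> speaker0 <NA> <NA>",
    "SPEAKER example_file 1 2.30 1.25 <NA> <NA> speaker1 <NA> <NA>",
    "SPEAKER example_file 1 4.00 0.75 <NA> <NA> speaker0 <NA> <NA>",
]
print(speech_per_speaker(example))  # {'speaker0': 2.75, 'speaker1': 1.25}
```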
 ## 📑 License
 
diff --git a/docs/Makefile b/docs/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
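The catch-all `%` rule forwards any target name to Sphinx's "make mode", so `make html` runs `sphinx-build -M html . _build`. The sketch below mirrors that routing in Python to show which command a given target produces; the function name is hypothetical and its defaults are copied from the Makefile variables.

```python
import shlex
from typing import List, Optional


def sphinx_make_mode(
    target: str = "help",
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = ".",
    builddir: str = "_build",
    sphinxopts: str = "",
    extra: Optional[str] = None,
) -> List[str]:
    """Return the argv that `make <target>` would run (hypothetical helper)."""
    argv = [sphinxbuild, "-M", target, sourcedir, builddir]
    # $(SPHINXOPTS) and $(O) are appended verbatim, split like a shell would
    argv += shlex.split(sphinxopts)
    if extra:
        argv += shlex.split(extra)
    return argv


print(sphinx_make_mode("html"))  # ['sphinx-build', '-M', 'html', '.', '_build']
print(sphinx_make_mode("html", extra="-W --keep-going"))
```

Running `make` with no target falls through to the `help` rule, which is why `help` is the default here.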
diff --git a/docs/_static/logo.png b/docs/_static/logo.png
diff --git a/docs/conf.py b/docs/conf.py
@@ -0,0 +1,65 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# For the full list of built-in configuration values, see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Project information -----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
+
+project = "diart"
+copyright = "2023, Juan Manuel Coria"
+author = "Juan Manuel Coria"
+release = "v0.9"
+
+# -- General configuration ---------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
+
+extensions = [
+    "autoapi.extension",
+    "sphinx.ext.coverage",
+    "sphinx.ext.napoleon",
+    "sphinx_mdinclude",
+]
+
+autoapi_dirs = ["../src/diart"]
+autoapi_options = [
+    "members",
+    "undoc-members",
+    "show-inheritance",
+    "show-module-summary",
+    "special-members",
+    "imported-members",
+]
+
+templates_path = ["_templates"]
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
+
+# -- Options for autodoc ----------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#configuration
+
+# Automatically extract typehints when specified and place them in
+# descriptions of the relevant function/method.
+autodoc_typehints = "description"
+
+# Display the class signature as a separate method instead of next to the class name.
+autodoc_class_signature = "separated"
+
+# -- Options for HTML output -------------------------------------------------
+# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
+
+html_theme = "furo"
+html_static_path = ["_static"]
+html_logo = "_static/logo.png"
+html_title = "diart documentation"
+
+
+def skip_submodules(app, what, name, obj, skip, options):
+    return skip or (
+        name.endswith("__init__")
+        or name.startswith("diart.console")
+        or name.startswith("diart.argdoc")
+    )
+
+
+def setup(sphinx):
+    sphinx.connect("autoapi-skip-member", skip_submodules)
diff --git a/docs/index.rst b/docs/index.rst
@@ -0,0 +1,11 @@
+Get started with diart
+======================
+
+.. mdinclude:: ../README.md
+
+
+Useful Links
+============
+
+.. toctree::
+   :maxdepth: 1
diff --git a/docs/make.bat b/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.https://www.sphinx-doc.org/
+	exit /b 1
+)
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
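`make.bat` detects a missing Sphinx by running `%SPHINXBUILD%` and checking for error level 9009, which Windows uses for "command not found". A cross-platform equivalent can look the executable up on `PATH` instead. This is a sketch: `find_sphinx_build` is a hypothetical name, and the lookup function is injectable only to make it testable.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def find_sphinx_build(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Resolve the sphinx-build command like make.bat does, or raise if missing."""
    # Same default as make.bat: use "sphinx-build" when SPHINXBUILD is unset or empty
    command = env.get("SPHINXBUILD") or "sphinx-build"
    path = which(command)
    if path is None:
        raise FileNotFoundError(
            f"The '{command}' command was not found. Install Sphinx or set "
            "SPHINXBUILD to the full path of the 'sphinx-build' executable."
        )
    return path
```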
diff --git a/docs/requirements.txt b/docs/requirements.txt
@@ -0,0 +1,4 @@
+sphinx==6.2.1
+sphinx-autoapi==3.0.0
+sphinx-mdinclude==0.5.3
+furo==2023.9.10
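Pinning exact versions keeps local builds consistent with the Read the Docs build. A small check can compare these pins against the installed packages. It is a sketch that only understands `name==version` lines; `parse_pins` and `check_pins` are hypothetical helpers, and the version lookup defaults to `importlib.metadata.version`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Iterable


def parse_pins(lines: Iterable[str]) -> Dict[str, str]:
    """Extract `name==version` pins, ignoring comments and blank lines."""
    pins: Dict[str, str] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def check_pins(pins: Dict[str, str], lookup: Callable[[str], str] = version) -> Dict[str, str]:
    """Return {package: problem} for every pin not satisfied by the environment."""
    problems: Dict[str, str] = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems


REQUIREMENTS = """\
sphinx==6.2.1
sphinx-autoapi==3.0.0
sphinx-mdinclude==0.5.3
furo==2023.9.10
"""
pins = parse_pins(REQUIREMENTS.splitlines())
print(check_pins(pins))  # an empty dict means the environment matches the pins
```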
diff --git a/src/diart/blocks/base.py b/src/diart/blocks/base.py
@@ -11,12 +11,28 @@
 
 @dataclass
 class HyperParameter:
+    """Represents a pipeline hyper-parameter that can be tuned by diart"""
+
     name: Text
+    """Name of the hyper-parameter (e.g. tau_active)"""
     low: float
+    """Lowest value that this parameter can take"""
     high: float
+    """Highest value that this parameter can take"""
 
     @staticmethod
     def from_name(name: Text) -> "HyperParameter":
+        """Create a HyperParameter object given its name.
+
+        Parameters
+        ----------
+        name: str
+            Name of the hyper-parameter
+
+        Returns
+        -------
+        HyperParameter
+        """
         if name == "tau_active":
             return TauActive
         if name == "rho_update":
@@ -32,24 +48,34 @@ def from_name(name: Text) -> "HyperParameter":
 
 
 class PipelineConfig(ABC):
+    """Configuration containing the required
+    parameters to build and run a pipeline"""
+
     @property
     @abstractmethod
     def duration(self) -> float:
+        """The duration of an input audio chunk (in seconds)"""
         pass
 
     @property
     @abstractmethod
     def step(self) -> float:
+        """The step between two consecutive input audio chunks (in seconds)"""
         pass
 
     @property
     @abstractmethod
     def latency(self) -> float:
+        """The algorithmic latency of the pipeline (in seconds).
+        At time `t` of the audio stream, the pipeline will
+        output predictions for time `t - latency`.
+        """
         pass
 
     @property
     @abstractmethod
     def sample_rate(self) -> int:
+        """The sample rate of the input audio stream"""
         pass
 
     def get_file_padding(self, filepath: FilePath) -> Tuple[float, float]:
@@ -60,6 +86,8 @@ def get_file_padding(self, filepath: FilePath) -> Tuple[float, float]:
 
 
 class Pipeline(ABC):
+    """Represents a streaming audio pipeline"""
+
     @staticmethod
     @abstractmethod
     def get_config_class() -> type:
@@ -92,4 +120,18 @@ def set_timestamp_shift(self, shift: float):
     def __call__(
         self, waveforms: Sequence[SlidingWindowFeature]
     ) -> Sequence[Tuple[Any, SlidingWindowFeature]]:
+        """Runs the next steps of the pipeline
+        given a list of consecutive audio chunks.
+
+        Parameters
+        ----------
+        waveforms: Sequence[SlidingWindowFeature]
+            Consecutive chunk waveforms for the pipeline to ingest
+
+        Returns
+        -------
+        Sequence[Tuple[Any, SlidingWindowFeature]]
+            For each input waveform, a tuple containing
+            the pipeline output and its respective audio
+        """
         pass
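To make the `PipelineConfig` contract concrete, here is a minimal stand-alone sketch of a config implementing the four abstract properties, together with the rule from the `latency` docstring (at stream time `t`, predictions are output for `t - latency`). The class name `FixedConfig`, its default values, the `prediction_time` helper, and the `step <= latency <= duration` check are assumptions for illustration, not diart's own API or defaults. The abstract base is re-declared locally so the snippet runs without diart installed.

```python
from abc import ABC, abstractmethod


class PipelineConfig(ABC):
    """Local stand-in for diart's abstract PipelineConfig (properties only)."""

    @property
    @abstractmethod
    def duration(self) -> float:
        """Duration of an input audio chunk (in seconds)."""

    @property
    @abstractmethod
    def step(self) -> float:
        """Step between two consecutive chunks (in seconds)."""

    @property
    @abstractmethod
    def latency(self) -> float:
        """Algorithmic latency (in seconds)."""

    @property
    @abstractmethod
    def sample_rate(self) -> int:
        """Sample rate of the input stream."""

    def prediction_time(self, t: float) -> float:
        """Hypothetical helper: at stream time `t`, output refers to `t - latency`."""
        return t - self.latency


class FixedConfig(PipelineConfig):
    """Hypothetical config with fixed, illustrative values."""

    def __init__(self, duration: float = 5.0, step: float = 0.5,
                 latency: float = 2.0, sample_rate: int = 16000):
        # Assumed constraint: latency spans at least one step and at most one chunk
        if not step <= latency <= duration:
            raise ValueError("expected step <= latency <= duration")
        self._duration = duration
        self._step = step
        self._latency = latency
        self._sample_rate = sample_rate

    @property
    def duration(self) -> float:
        return self._duration

    @property
    def step(self) -> float:
        return self._step

    @property
    def latency(self) -> float:
        return self._latency

    @property
    def sample_rate(self) -> int:
        return self._sample_rate


config = FixedConfig()
print(config.prediction_time(10.0))  # 8.0
```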
diff --git a/src/diart/blocks/diarization.py b/src/diart/blocks/diarization.py
@@ -157,6 +157,18 @@ def reset(self):
     def __call__(
         self, waveforms: Sequence[SlidingWindowFeature]
     ) -> Sequence[tuple[Annotation, SlidingWindowFeature]]:
+        """Diarize the next audio chunks of an audio stream.
+
+        Parameters
+        ----------
+        waveforms: Sequence[SlidingWindowFeature]
+            A sequence of consecutive audio chunks from an audio stream.
+
+        Returns
+        -------
+        Sequence[tuple[Annotation, SlidingWindowFeature]]
+            Speaker diarization of each chunk alongside its corresponding audio.
+        """
         batch_size = len(waveforms)
         msg = "Pipeline expected at least 1 input"
         assert batch_size >= 1, msg
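`__call__` expects a non-empty batch of consecutive chunks cut from the same stream. The sketch below shows one way such chunks could be produced from raw samples, using a sliding window of `duration` seconds with a hop of `step` seconds. It works on plain Python lists instead of `SlidingWindowFeature`, and `sliding_chunks` is a hypothetical helper, not diart's audio source.

```python
from typing import List, Sequence, Tuple


def sliding_chunks(
    samples: Sequence[float],
    sample_rate: int,
    duration: float,
    step: float,
) -> List[Tuple[float, List[float]]]:
    """Cut a stream into overlapping chunks, returning (start_time, chunk) pairs."""
    window = int(round(duration * sample_rate))
    hop = int(round(step * sample_rate))
    if hop <= 0 or window <= 0:
        raise ValueError("duration and step must cover at least one sample")
    chunks = []
    # Only full windows are emitted; a trailing partial window is dropped
    for start in range(0, len(samples) - window + 1, hop):
        chunks.append((start / sample_rate, list(samples[start:start + window])))
    return chunks


# Toy stream: 10 samples at 2 Hz (5 seconds), 2s chunks every 1s
stream = list(range(10))
for start, chunk in sliding_chunks(stream, sample_rate=2, duration=2.0, step=1.0):
    print(start, chunk)
```

Consecutive chunks overlap by `duration - step` seconds, which is what lets the pipeline refine its predictions for a region over several steps.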
diff --git a/src/diart/blocks/embedding.py b/src/diart/blocks/embedding.py
@@ -69,7 +69,13 @@ def __call__(
 
 
 class OverlappedSpeechPenalty:
-    """
+    """Applies a penalty on overlapping speech and low-confidence regions to speaker segmentation scores.
+
+    .. note::
+        For more information, see `"Overlap-Aware Low-Latency Online Speaker Diarization
+        based on End-to-End Local Segmentation" <https://github.com/juanmc2005/diart/blob/main/paper.pdf>`_
+        (Section 2.2.1 Segmentation-driven speaker embedding). This block implements Equation 2.
+
     Parameters
     ----------
     gamma: float, optional
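The penalty can be illustrated on a single frame of segmentation scores. This plain-Python sketch assumes the weight of speaker `k` takes the form `(s_k * softmax(beta * s)_k) ** gamma`, floored at a small positive value, and uses `gamma=3` and `beta=10` only as example values; check the exact formula against Equation 2 of the paper before relying on it. `overlap_penalty` is a hypothetical name.

```python
import math
from typing import List, Sequence


def overlap_penalty(frame: Sequence[float], gamma: float = 3.0, beta: float = 10.0,
                    floor: float = 1e-8) -> List[float]:
    """Weight each speaker's score in one frame, down-weighting overlap and low confidence."""
    # Softmax over speakers of the sharpened scores (max-subtracted for stability)
    scaled = [beta * s for s in frame]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Combine each raw activation with its relative share, then sharpen with gamma
    weights = [(s * p) ** gamma for s, p in zip(frame, probs)]
    # Keep weights strictly positive (assumed floor)
    return [max(w, floor) for w in weights]


print(overlap_penalty([1.0, 0.0]))  # one confident speaker: weight close to 1
print(overlap_penalty([1.0, 1.0]))  # two overlapping speakers: [0.125, 0.125]
print(overlap_penalty([0.3, 0.0]))  # low-confidence speech: small weight
```

Overlapping frames split the softmax mass between speakers, and `gamma` amplifies that split, so those frames contribute much less to each speaker's embedding.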
diff --git a/src/diart/models.py b/src/diart/models.py
@@ -12,9 +12,7 @@
 
 try:
     from pyannote.audio import Model
-    from pyannote.audio.pipelines.speaker_verification import (
-        PretrainedSpeakerEmbedding,
-    )
+    from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
     from pyannote.audio.utils.powerset import Powerset
 
     _has_pyannote = True
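The `try`/`except ImportError` guard lets diart be imported without pyannote.audio and records availability in `_has_pyannote` (the `except` branch is not shown in this diff). A generic version of the pattern, with a helper that fails with a clear message only when the missing dependency is actually used, might look like this; `optional_import`, `require`, and the module names are illustrative assumptions.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import a module if available, otherwise return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module: Optional[ModuleType], name: str) -> ModuleType:
    """Raise a helpful error at use time if an optional dependency is missing."""
    if module is None:
        raise ImportError(f"'{name}' is required for this feature, please install it")
    return module


json_mod = optional_import("json")  # stdlib, always available
missing = optional_import("surely_not_an_installed_module")
print(json_mod is not None, missing is None)  # True True
```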