Rebreda · Mar 7, 2026 · Mar 5, 2026
diff --git a/.dockerignore b/.dockerignore
@@ -0,0 +1,15 @@
+.git
+.venv
+.env
+__pycache__
+*.pyc
+*.pyo
+*.egg-info
+dist
+build
+.pytest_cache
+.mypy_cache
+.ruff_cache
+tests/
+*.md
+screenshot.png
diff --git a/.env.example b/.env.example
@@ -0,0 +1,39 @@
+# Copy this file to .env and edit to suit your system.
+# .env is gitignored — .env.example is the reference.
+
+# ── GPU ───────────────────────────────────────────────────────────────────────
+
+# Which GPU(s) ROCm should use. "0" = first GPU only (recommended for
+# multi-GPU systems with unmatched cards, avoids imbalance segfaults).
+# Set to "0,1" to use both GPUs if they are the same model.
+HIP_VISIBLE_DEVICES=0
+
+# Enables Flash Efficient and Memory Efficient attention on RDNA3+ GPUs (RX 7000 / 9000).
+# Set to empty string to disable (PyTorch will log a warning and use the slow path).
+TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
+
+# Set ONLY if your GPU's gfx version isn't natively supported by ROCm.
+# WARNING: Do NOT set this to an empty string — an empty string is not the
+# same as unset and will cause ROCm to fail. Only uncomment if needed.
+# Check your GPU: rocm-smi --showproductname
+#   RX 6000 series (RDNA2): 10.3.0
+#   RX 7000 series (RDNA3): 11.0.0
+# HSA_OVERRIDE_GFX_VERSION=10.3.0
+
+# ── paths (override if your data lives elsewhere) ─────────────────────────────
+
+# Host directory containing audio_clips/ and manifest.jsonl
+# Default: ~/.listenr
+# LISTENR_DATA=/home/you/.listenr
+
+# Host directory for train/dev/test dataset splits (written by build-dataset)
+# Default: ~/listenr_dataset
+# LISTENR_DATASET=/home/you/listenr_dataset
+
+# Host directory for LoRA adapter checkpoints (written by finetune)
+# Default: ~/listenr_finetune
+# LISTENR_FINETUNE=/home/you/listenr_finetune
+
+# HuggingFace model cache (shared with host to avoid re-downloads)
+# Default: ~/.cache/huggingface
+# HF_CACHE=/home/you/.cache/huggingface
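Reviewer note on the `HSA_OVERRIDE_GFX_VERSION` warning above: the distinction between an unset variable and one set to an empty string is easy to check before launching a container. The following is a minimal sketch, not part of listenr; the helper names `describe_env` and `scrub_empty` are hypothetical and only illustrate the check.

```python
import os
from collections.abc import Mapping, MutableMapping


def describe_env(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Classify a variable as 'unset', 'empty', or 'set'."""
    value = env.get(name)
    if value is None:
        return "unset"   # variable is absent from the environment
    if value == "":
        return "empty"   # present, but set to an empty string
    return "set"


def scrub_empty(name: str, env: MutableMapping[str, str] = os.environ) -> bool:
    """Delete `name` from `env` if it is an empty string; report whether it was removed."""
    if env.get(name) == "":
        del env[name]
        return True
    return False


if __name__ == "__main__":
    var = "HSA_OVERRIDE_GFX_VERSION"
    print(f"{var} is {describe_env(var)}")
    if scrub_empty(var):
        print(f"{var} was an empty string and has been removed")
```

Running this in the shell that will start the container shows whether the override is absent, empty, or carries a real value such as `10.3.0`.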
diff --git a/Dockerfile b/Dockerfile
@@ -0,0 +1,67 @@
+# Listenr — AMD ROCm fine-tuning image
+#
+# Base: official AMD-tested ROCm 7.2 + PyTorch 2.9.1 image (Python 3.12).
+# Ref:  https://rocm.docs.amd.com/en/latest/how_to/pytorch_install/pytorch_install.html
+#
+# Pull:  podman pull rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1
+# Build: podman build -t listenr-rocm .
+# Run:   see docs/finetune-amd.md
+#
+# NOTE: sounddevice (microphone capture) will not work inside this container.
+#       This image is intended for fine-tuning only (listenr-finetune /
+#       listenr-build-dataset), not real-time audio capture.
+#
+# NOTE: listenr requires Python >=3.13 for local installs; this image uses
+#       Python 3.12 (the AMD-tested version). --ignore-requires-python is safe
+#       here — the codebase uses no 3.13-only syntax.
+
+FROM rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1
+
+# ── system packages ──────────────────────────────────────────────────────────
+# libsndfile1   : required by soundfile (audio I/O in finetune data pipeline)
+# ffmpeg        : optional but useful for converting audio files
+#
+# IMPORTANT: We must not upgrade libdrm, mesa, or any ROCm library, since doing
+# so breaks the GPU stack that ships with the base image. --no-upgrade keeps
+# apt from upgrading the packages named below if they are already installed
+# (both are typically absent from the base image; deps like libdrm are present).
+RUN apt-get update && apt-get install -y --no-install-recommends --no-upgrade \
+        libsndfile1 \
+        ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+
+# ── project install ──────────────────────────────────────────────────────────
+WORKDIR /app
+COPY . /app
+
+# Freeze the ROCm-aware torch/torchvision/torchaudio/triton that ship in the
+# base image before installing finetune extras. Without this, pip resolves
+# transformers' torch dependency and pulls a CPU-only build from PyPI.
+# pip show works regardless of whether torch was installed via URL or wheel.
+RUN pip show torch torchvision torchaudio triton 2>/dev/null \
+    | awk '/^Name:/{name=$2} /^Version:/{print name "==" $2}' \
+    > /tmp/torch-constraints.txt \
+    && cat /tmp/torch-constraints.txt
+
+# Install core + finetune extras, pinning torch to the ROCm version above.
+# --ignore-requires-python: base image is Python 3.12; constraint is >=3.13.
+RUN pip install --no-cache-dir \
+        --ignore-requires-python \
+        --constraint /tmp/torch-constraints.txt \
+        -e ".[finetune]"
+
+# ── runtime defaults ─────────────────────────────────────────────────────────
+# Pin to GPU 0 by default to avoid imbalance crashes on multi-GPU systems.
+# Override at runtime: -e HIP_VISIBLE_DEVICES=0,1
+#
+# Do NOT set HSA_OVERRIDE_GFX_VERSION here — an empty string is not the same
+# as unset and causes ROCm to fail. Set it at runtime only if your GPU needs
+# it (e.g. -e HSA_OVERRIDE_GFX_VERSION=10.3.0 for RX 6000 series).
+#
+# TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL: enables Flash Efficient and
+# Memory Efficient attention on newer AMD GPUs (RDNA 3 / RDNA 4). Without
+# this, PyTorch logs a UserWarning and falls back to a slower implementation.
+ENV HIP_VISIBLE_DEVICES="0" \
+    TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL="1"
+
+CMD ["listenr-finetune", "--help"]
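Reviewer note on the constraints step in the Dockerfile above: the `pip show ... | awk ...` pipeline turns each `Name:`/`Version:` pair into a `name==version` pin. The sketch below mirrors that logic in Python for readers who want to test it outside the image. The function name `constraints_from_pip_show` is hypothetical, and the version strings in the usage example are illustrative, not taken from the base image.

```python
def constraints_from_pip_show(output: str) -> list[str]:
    """Convert `pip show` output into `name==version` constraint lines."""
    pins: list[str] = []
    name = None
    for line in output.splitlines():
        if line.startswith("Name:"):
            # Second whitespace-separated field, like awk's $2.
            name = line.split()[1]
        elif line.startswith("Version:") and name is not None:
            pins.append(f"{name}=={line.split()[1]}")
    return pins


if __name__ == "__main__":
    # Illustrative input only; real output comes from `pip show` in the image.
    sample = "\n".join([
        "Name: torch",
        "Version: 2.9.1",
        "Summary: example summary line",
        "---",
        "Name: triton",
        "Version: 3.5.0",
    ])
    for pin in constraints_from_pip_show(sample):
        print(pin)
```

Unlike the awk one-liner, this version skips a `Version:` line that appears before any `Name:` line; with normal `pip show` output the two behave the same.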
diff --git a/README.md b/README.md
@@ -11,193 +11,39 @@ Listenr is a privacy-first tool for collecting real-world audio and high-quality
 - **Open models.** Uses Whisper.cpp for transcription and any GGUF-compatible LLM for post-processing correction.
 - **Automatic correction pipeline.** A local LLM cleans up punctuation, grammar, and homophones — producing a higher-quality training corpus than raw Whisper output alone.
 - **Real-world data.** Collects natural, conversational speech in realistic environments.
-- **Dataset-ready output.** Every utterance is saved with its audio clip, a per-clip JSON, and appended to a single `manifest.jsonl`. One command builds train/dev/test splits.
+- **Dataset-ready output.** Every utterance is saved with its audio clip and appended to a single `manifest.jsonl`. One command builds train/dev/test splits.
 
 ## How It Works
 
-1. **Capture.** `listenr` streams your microphone to Lemonade's `/realtime` WebSocket in ~85 ms chunks. Audio is captured at the device's native rate and resampled to 16 kHz before sending.
-2. **VAD.** Lemonade's built-in server-side voice activity detection segments speech boundaries automatically.
-3. **Transcribe.** Lemonade runs Whisper.cpp on each speech segment and streams back interim and final transcripts.
-4. **Correct (optional).** The final transcript is sent to a local LLM via Lemonade's chat completions API. The LLM returns a cleaned transcript, an `is_improved` flag, and content `categories`.
-5. **Save.** Each utterance is saved as a `.wav` clip and appended to `manifest.jsonl`.
-6. **Build dataset.** `build_dataset.py` reads the manifest and writes train/dev/test CSV splits.
+1. **Capture** — `listenr` streams your microphone to Lemonade's `/realtime` WebSocket in ~85 ms chunks, resampled to 16 kHz.
+2. **VAD** — Lemonade's built-in voice activity detection segments speech boundaries automatically.
+3. **Transcribe** — Lemonade runs Whisper.cpp on each segment and streams back transcripts.
+4. **Correct (optional)** — a local LLM cleans the transcript and tags content categories.
+5. **Save** — each utterance is saved as a `.wav` clip and a line in `manifest.jsonl`.
+6. **Build dataset** — `listenr-build-dataset` writes train/dev/test splits from the manifest.
 
-
-
-## Requirements
-
-- [Lemonade Server](https://lemonade-server.ai) running on `localhost:8000`
-- Python 3.13+ with `uv` (recommended) or `pip`
-- A microphone accessible via PipeWire or ALSA
-
-## Installation
+## Quick Start
 
 ```bash
 git clone https://github.com/Rebreda/listenr
 cd listenr
 uv pip install -e .
-```
-
-Then run commands via `uv run` (no activation needed):
-
-```bash
+lemonade-server serve   # in another terminal
 uv run listenr
 ```
 
-Or activate the venv once per session:
+See [docs/setup.md](docs/setup.md) for full installation instructions.
 
-```bash
-source .venv/bin/activate
-listenr
-```
+## Documentation
 
-## Start Lemonade Server
-
-```bash
-lemonade-server serve
-```
-
-Listenr will automatically call `POST /api/v1/load` on startup to load the configured models. On first use, Lemonade will download them.
-
-## Usage
-
-### CLI — Real-Time Microphone Capture
-
-```bash
-# Record and save everything (default)
-uv run listenr
-
-# Don't save to disk — just print transcriptions
-uv run listenr --no-save
-
-# Also print the raw Whisper output before LLM correction
-uv run listenr --show-raw
-
-# Verbose debug output (WebSocket messages, mic RMS, etc.)
-uv run listenr --debug
-```
-
-Example output:
-
-```
-🎤 Listenr CLI — streaming to Lemonade
-   Model  : Whisper-Large-v3-Turbo
-   WS URL : ws://localhost:9000/realtime?model=Whisper-Large-v3-Turbo
-   LLM    : enabled (gpt-oss-20b-mxfp4-GGUF)
-   Save   : yes → ~/.listenr/audio_clips
-   Press Ctrl+C to stop.
-
-  [ASR] I'm going to the store to buy some milk.  [dictation]
-  [SAVED] ~/.listenr/audio_clips/audio/2026-02-28/clip_2026-02-28_abc123.wav (2.4s)
-```
-
-Press **Ctrl+C** to stop. Listenr will unload all models from Lemonade before exiting.
-
-### Build a Dataset
-
-After collecting recordings, generate train/dev/test splits from `manifest.jsonl`:
-
-```bash
-# Default: 80/10/10 CSV splits in ~/listenr_dataset/
-uv run listenr-build-dataset
-
-# Custom output directory and split ratio
-uv run listenr-build-dataset --output ~/my_dataset --split 90/5/5
-
-# Exclude very short clips
-uv run listenr-build-dataset --min-duration 1.0
-
-# HuggingFace datasets format
-uv run listenr-build-dataset --format hf
-
-# Preview stats without writing files
-uv run listenr-build-dataset --dry-run
-```
-
-Output CSV columns: `uuid`, `split`, `audio_path`, `raw_transcription`, `corrected_transcription`, `is_improved`, `categories`, `duration_s`, `sample_rate`, `whisper_model`, `llm_model`, `timestamp`.
-
-### Batch Transcription
-
-Transcribe a single audio file:
-
-```bash
-python -m listenr.unified_asr --audio path/to/audio.wav --whisper-model Whisper-Large-v3-Turbo
-
-# With LLM correction
-python -m listenr.unified_asr --llm --audio path/to/audio.wav
-```
-
-## Configuration
-
-Config is created with defaults at `~/.config/listenr/config.ini` on first run.
-
-
-### Finding your input device
-
-```bash
-python -c "import sounddevice as sd; [print(f'{i}: {d[\"name\"]}') for i, d in enumerate(sd.query_devices()) if d['max_input_channels'] > 0]"
-```
-
-Set `input_device` to the device name (partial match works) or its index number.
-
-### VAD Tuning
-
-| Goal | Setting |
+| Guide | Description |
 |---|---|
-| Shorter segments | Lower `silence_duration_ms` (e.g. `500`) |
-| Avoid cutting off speech | Raise `silence_duration_ms` (e.g. `1200`) |
-| Ignore background noise | Raise `threshold` (e.g. `0.05`) |
-| Capture quiet speech | Lower `threshold` (e.g. `0.005`) |
-
-
-### manifest.jsonl
-
-One JSON object per line — append-only, easy to query:
-
-```bash
-# All improved clips
-jq 'select(.is_improved == true)' ~/.listenr/audio_clips/manifest.jsonl
-
-# Clips tagged as commands
-jq 'select(.categories[] == "command")' ~/.listenr/audio_clips/manifest.jsonl
-
-# Load into pandas
-python -c "import pandas as pd; df = pd.read_json('~/.listenr/audio_clips/manifest.jsonl', lines=True); print(df.head())"
-```
-
-### manifest.jsonl
-
-**No transcriptions appear / `[SAVE SKIPPED] pcm_buffer is empty`**
-- Check that Lemonade is running: `curl http://localhost:8000/api/v1/health`
-- Run with `--debug` to see mic RMS values and WebSocket messages
-- If RMS stays near `0.000`, your `input_device` is wrong — list devices and update config (see above)
-- Lower `threshold` in `[VAD]` if your mic is quiet
-
-**LLM correction not working / model answers the transcription instead of fixing it**
-- Confirm `LLM.enabled = true` and the model name matches one loaded in Lemonade
-- Check `curl http://localhost:8000/api/v1/models` to see loaded models
-- LLM errors are non-fatal — the raw transcript is saved regardless
-
-**`Could not discover Lemonade websocket port`**
-Lemonade is not running or not reachable on port 8000. Run `lemonade-server serve` first.
-
-**Too many / too few segments**
-Adjust `[VAD] silence_duration_ms` and `threshold` in your config.
-
-## Available Models (via Lemonade)
-
-| Model | Type | Notes |
-|---|---|---|
-| `Whisper-Base` | ASR | Fast, lower accuracy |
-| `Whisper-Large-v3-Turbo` | ASR | Best accuracy |
-| `gpt-oss-20b-mxfp4-GGUF` | LLM | Good correction quality |
-| `Gemma-3-4b-it-GGUF` | LLM | Lighter alternative |
-| `DeepSeek-Qwen3-8B-GGUF` | LLM | Lighter alternative |
-
-List all models available on your Lemonade instance:
-```bash
-curl -s http://localhost:8000/api/v1/models | python3 -c "import sys,json; [print(m['id']) for m in json.load(sys.stdin)['data']]"
-```
+| [docs/setup.md](docs/setup.md) | Installation, Lemonade Server, microphone setup |
+| [docs/configuration.md](docs/configuration.md) | Full `config.ini` reference, VAD tuning, available models |
+| [docs/recording.md](docs/recording.md) | CLI usage, how recording works, batch transcription |
+| [docs/dataset.md](docs/dataset.md) | Building train/dev/test splits, CSV and HF formats |
+| [docs/finetune-amd.md](docs/finetune-amd.md) | Fine-tuning Whisper on AMD GPU via ROCm + Podman |
+| [docs/troubleshooting.md](docs/troubleshooting.md) | Common errors and fixes |
 
 ## License
 
@@ -208,4 +54,3 @@ Mozilla Public License Version 2.0 — see `LICENSE`.
 - [Lemonade Server](https://lemonade-server.ai) — unified local inference API
 - [whisper.cpp](https://github.com/ggerganov/whisper.cpp) — fast local ASR
 - [llama.cpp](https://github.com/ggerganov/llama.cpp) — fast local LLMs
-
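Reviewer note on the Capture step in the README hunk above: the "~85 ms chunks, resampled to 16 kHz" figure can be turned into a concrete chunk size. The sketch below does the arithmetic under the assumption of mono 16-bit PCM, which the README does not state; `chunk_size` is a hypothetical helper, not a listenr function.

```python
def chunk_size(duration_ms: float, sample_rate_hz: int,
               bytes_per_sample: int = 2) -> tuple[int, int]:
    """Return (samples, bytes) for one mono chunk of the given duration.

    bytes_per_sample=2 assumes 16-bit PCM, which is an assumption here.
    """
    samples = round(sample_rate_hz * duration_ms / 1000)
    return samples, samples * bytes_per_sample


if __name__ == "__main__":
    samples, size = chunk_size(85, 16_000)
    print(f"{samples} samples, {size} bytes per chunk")
```

At 16 kHz an 85 ms chunk is 1360 samples, so each WebSocket message would carry about 2.7 KB of audio under the 16-bit assumption.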