Offline voice dictation for Windows. Press a hotkey or connect your Bluetooth earbuds, speak naturally, and get cleaned-up text pasted into the active app. No cloud. No subscription. Just your voice and your machine.
- Parakeet TDT transcription - NVIDIA NeMo ASR, more accurate than Whisper with no hallucinations
- Bluetooth earbud auto-lifecycle - Auto-starts when earbuds connect, auto-stops on disconnect (privacy-first, never falls back to laptop mic)
- LLM text cleanup - Local Ollama fixes grammar, punctuation, removes filler words
- Hotkey-activated - Press Scroll Lock to start/stop recording
- Voice Activity Detection - Silero VAD with audio preprocessing (high-pass filter + gain normalization)
- Voice commands - "new paragraph", "send", "delete last", slash commands
- Self-healing runtime - Watchdog auto-restarts on crash with circuit breaker, health monitoring for ASR/LLM backends
- System tray - Runs quietly in the background
- 100% local - Your audio never leaves your machine
Pipeline overview:

    [Hotkey / BT Earbuds] -> Mic -> VAD + HPF + AGC -> Parakeet ASR -> LLM Cleanup -> Paste
                                                            |               |
                                                     (Docker, local) (Ollama, local)

All processing happens on your machine. Audio goes through a high-pass filter and gain normalization, then to a local Parakeet TDT container for transcription, then optionally through Ollama for text cleanup.
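The preprocessing stage described above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the 80 Hz cutoff, the 16 kHz sample rate, and the target RMS are all assumptions, and the gain step is a simple per-block normalization rather than a full AGC.

```python
import math

def high_pass(samples, sample_rate=16000, cutoff_hz=80.0):
    """One-pole RC high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def normalize_gain(samples, target_rms=0.1, max_gain=10.0):
    """Scale a block toward target_rms, capping the gain and clipping to [-1, 1]."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to amplify
    gain = min(target_rms / rms, max_gain)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

def preprocess(samples, sample_rate=16000):
    """Remove low-frequency rumble and DC offset, then even out the level."""
    return normalize_gain(high_pass(samples, sample_rate))
```

The gain cap keeps near-silent blocks from being amplified into loud noise before they reach the VAD.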
- Windows 10/11
- Python 3.11+
- Docker with an NVIDIA GPU
- Ollama (optional, for LLM text cleanup)
Parakeet TDT (recommended):

    docker compose -f spike/parakeet/docker-compose.parakeet.yml up -d

Or run directly:

    docker run -d --gpus all -p 9410:8000 \
      shinejh0528/parakeet-tdt-0.6b-v2:py3.12.10_torch-cu128

Alternative: faster-whisper (legacy)

    docker run -d --gpus all -p 10300:10300 \
      rhasspy/wyoming-whisper:latest \
      --model large-v3 --language en

Set `whisper.backend: wyoming` and `whisper.port: 10300` in `config.yaml`.
Ollama (optional, for LLM text cleanup):

    docker run -d --gpus all --name ollama -p 11434:11434 \
      -v ollama:/root/.ollama ollama/ollama
    docker exec ollama ollama pull qwen3:14b

Install:

    git clone https://github.com/cj-elevate/whisper-llm.git
    cd whisper-llm
    pip install -r requirements.txt

Configure:

    cp config.example.yaml config.yaml
    # Edit config.yaml to customize settings

Run:

    python src/main.py

For daily use, enable the watchdog via the system tray menu ("Run on Startup"). This launches a lightweight supervisor via Task Scheduler that auto-restarts the app after crashes and caps restart attempts to prevent restart loops.
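The "caps restart attempts" behavior is a circuit breaker. A minimal sketch of the idea follows, assuming a sliding time window; the class name `RestartBreaker`, the limits, and the `supervise` helper are hypothetical and not taken from `watchdog.pyw`.

```python
import subprocess
import time
from collections import deque

class RestartBreaker:
    """Allow at most max_restarts launches within any window_seconds span."""

    def __init__(self, max_restarts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._clock = clock
        self._launches = deque()  # timestamps of recent launches

    def allow_restart(self):
        now = self._clock()
        # Forget launches that have aged out of the window.
        while self._launches and now - self._launches[0] > self.window_seconds:
            self._launches.popleft()
        if len(self._launches) >= self.max_restarts:
            return False  # breaker is open: too many crashes recently
        self._launches.append(now)
        return True

def supervise(command, breaker):
    """Relaunch command after each crash until it exits cleanly or the breaker opens.

    Returns the number of launches. The first launch counts toward the limit.
    """
    launches = 0
    while breaker.allow_restart():
        launches += 1
        result = subprocess.run(command)
        if result.returncode == 0:
            break  # clean exit, not a crash
    return launches
```

A supervisor could call `supervise([sys.executable, "src/main.py"], RestartBreaker())`; once the breaker opens, it stops relaunching instead of looping on a crash that repeats at startup.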
Press Scroll Lock to start/stop recording. Speak naturally, and text appears in your active window.
Edit `config.yaml` to customize. Saving a change restarts the app automatically.
| Setting | Default | Description |
|---|---|---|
| `whisper.backend` | `parakeet` | ASR backend: `parakeet` or `wyoming` |
| `whisper.port` | `9410` | ASR server port (9410 for Parakeet, 10300 for Wyoming) |
| `hotkey` | `scroll lock` | Key to toggle recording |
| `audio.preferred_device_pattern` | none | Substring match for preferred mic (e.g., "Buds3 Pro") |
| `audio.bud_presence_lifecycle` | `true` | Auto start/stop on BT earbud connect/disconnect |
| `llm.enabled` | `true` | Enable LLM text cleanup |
| `llm.model` | `qwen3:14b` | Ollama model for cleanup |
| `output.method` | `auto` | Text insertion: `clipboard` / `sendinput` / `auto` |
| `corrections.enabled` | `true` | Post-transcription word corrections |

See config.example.yaml for all options.
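As an illustration of how the backend and port settings relate, here is a small sketch using a standard-library dataclass. The project itself uses Pydantic in `config.py`; the `AsrSettings` class and the idea of deriving the port from the backend when it is omitted are assumptions made for this example, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional

# Default ports from the settings table above.
DEFAULT_PORTS = {"parakeet": 9410, "wyoming": 10300}

@dataclass
class AsrSettings:
    """Hypothetical stand-in for the whisper.* section of config.yaml."""
    backend: str = "parakeet"
    port: Optional[int] = None

    def __post_init__(self):
        if self.backend not in DEFAULT_PORTS:
            raise ValueError(
                f"unknown backend {self.backend!r}; expected one of {sorted(DEFAULT_PORTS)}"
            )
        if self.port is None:
            # Assumption: fall back to the backend's conventional port.
            self.port = DEFAULT_PORTS[self.backend]
```

Validating the backend name early turns a typo in `config.yaml` into a clear startup error rather than a connection failure later.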
| Command | Effect |
|---|---|
| "send" | Paste text and press Enter |
| "new paragraph" | Insert blank line |
| "new line" | Insert line break |
| "period" / "comma" / "dash" | Insert punctuation |
| "slash [command]" | Insert slash command (e.g., "slash team" -> /team) |
| "delete last" | Undo last output (Ctrl+Z) |
Slash commands spoken alone auto-press Enter for hands-free CLI use.
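The table above can be read as a mapping from a whole utterance to an action. The sketch below covers only utterances that consist entirely of a command; the function name, the return shape, and the stripping of trailing punctuation are assumptions for illustration and do not reflect `commands.py`.

```python
# Spoken phrase -> text to insert (taken from the command table).
INSERTIONS = {
    "new paragraph": "\n\n",
    "new line": "\n",
    "period": ".",
    "comma": ",",
    "dash": "-",
}

def interpret_standalone(utterance):
    """Return (text_to_insert, press_enter) for a command utterance, else None."""
    # ASR output often carries capitalization and a trailing period.
    text = utterance.strip().lower().rstrip(".!?")
    if text in INSERTIONS:
        return (INSERTIONS[text], False)
    words = text.split()
    if len(words) == 2 and words[0] == "slash":
        # "slash team" -> "/team", followed by Enter for hands-free CLI use.
        return ("/" + words[1], True)
    return None  # ordinary dictation, not a command
```

For example, `interpret_standalone("Slash team.")` yields `("/team", True)`, while ordinary speech returns `None` and would go through the normal dictation path.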
ASR sometimes misrecognizes domain-specific words. Add corrections in config.yaml:

    corrections:
      enabled: true
      words:
        cloud: Claude
        cloud code: Claude Code

Corrections are case-insensitive, whole-word, and applied longest-first. Standalone command aliases (where the entire utterance matches) are also supported via the `standalone_commands` config section.
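One way to get case-insensitive, whole-word, longest-first matching is a single regular expression whose alternatives are sorted by length. This is a sketch of the technique under that assumption, not the project's implementation; `build_corrector` is a hypothetical name.

```python
import re

def build_corrector(words):
    """Return a function that applies the given {wrong: right} corrections."""
    if not words:
        return lambda text: text
    lookup = {wrong.lower(): right for wrong, right in words.items()}
    # Longest keys first, so "cloud code" wins over "cloud".
    keys = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )

    def correct(text):
        return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

    return correct

correct = build_corrector({"cloud": "Claude", "cloud code": "Claude Code"})
print(correct("Open cloud code and ask Cloud about cloudy skies"))
# prints: Open Claude Code and ask Claude about cloudy skies
```

The `\b` anchors keep "cloudy" untouched, and compiling one pattern means each correction is applied in a single pass, so a replacement is never re-corrected by a shorter rule.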
| Mode | Description |
|---|---|
| `raw` | No processing, direct transcription |
| `clean` | Fix grammar, punctuation, remove fillers (default) |

Switch modes via the system tray menu.
See TROUBLESHOOTING.md for common issues.

    src/
      main.py                  # Entry point, bootstrap sequence
      app.py                   # System tray, hotkey, lifecycle
      pipeline.py              # Async audio processing pipeline
      audio.py                 # Microphone capture, VAD, device monitoring
      transcriber_parakeet.py  # Parakeet TDT HTTP client
      transcriber.py           # Wyoming protocol client (faster-whisper)
      llm.py                   # Ollama integration
      output.py                # Text injection (clipboard/SendInput)
      config.py                # Pydantic configuration
      runtime.py               # Singleton, crash hooks, thread supervision
      watchdog.pyw             # Supervisor with circuit breaker
      health.py                # Backend health monitoring
      commands.py              # Voice command processing
      notify.py                # Desktop notifications

- No cloud services - All processing is local
- No telemetry - No data collection
- No external network calls - Only connects to containers on localhost
- Audio stays local - Never transmitted anywhere
- Privacy-first BT - Earbuds disconnect = immediate stop, no fallback to laptop mic
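
The no-fallback rule can be expressed as a device lookup that returns nothing when the preferred microphone is absent, instead of choosing another input. A minimal sketch, assuming the pattern is matched case-insensitively against device names; the function names are hypothetical and the device list would come from whatever audio library the app uses.

```python
from typing import Optional, Sequence

def find_preferred_device(device_names: Sequence[str], pattern: str) -> Optional[int]:
    """Return the index of the first device whose name contains pattern, else None."""
    needle = pattern.lower()
    for index, name in enumerate(device_names):
        if needle in name.lower():
            return index
    return None

def should_record(device_names: Sequence[str], pattern: str) -> bool:
    """Record only while the preferred device is present; never pick a substitute."""
    return find_preferred_device(device_names, pattern) is not None
```

With devices `["Microphone Array (Realtek)", "Headset (Buds3 Pro)"]` and pattern `"Buds3 Pro"`, the lookup returns index 1; once the earbuds disconnect it returns `None`, and recording stops rather than switching to the laptop microphone.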
Contributions welcome! Please fork, branch, and submit a pull request.
- NVIDIA NeMo / Parakeet TDT - Recommended ASR model
- faster-whisper - Legacy ASR backend
- Wyoming Protocol - Audio streaming protocol
- Ollama - Local LLM server
- Silero VAD - Voice activity detection
