whisper-llm

Offline voice dictation for Windows. Press a hotkey or connect your Bluetooth earbuds, speak naturally, and get cleaned-up text pasted into the active app. No cloud. No subscription. Just your voice and your machine.

Features

Parakeet TDT transcription - NVIDIA NeMo ASR, more accurate than Whisper with no hallucinations
Bluetooth earbud auto-lifecycle - Auto-starts when earbuds connect, auto-stops on disconnect (privacy-first, never falls back to laptop mic)
LLM text cleanup - Local Ollama fixes grammar, punctuation, removes filler words
Hotkey-activated - Press Scroll Lock to start/stop recording
Voice Activity Detection - Silero VAD with audio preprocessing (high-pass filter + gain normalization)
Voice commands - "new paragraph", "send", "delete last", slash commands
Self-healing runtime - Watchdog auto-restarts on crash with circuit breaker, health monitoring for ASR/LLM backends
System tray - Runs quietly in the background
100% local - Your audio never leaves your machine

How It Works

[Hotkey / BT Earbuds] -> Mic -> VAD + HPF + AGC -> Parakeet ASR -> LLM Cleanup -> Paste
                                                        |                |
                                                   (Docker, local)  (Ollama, local)

All processing happens on your machine. Audio goes through a high-pass filter and gain normalization, then to a local Parakeet TDT container for transcription, then optionally through Ollama for text cleanup.
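As a rough illustration of the preprocessing stage, the sketch below applies a first-order high-pass filter and then peak gain normalization to a list of float samples. The function names, the 80 Hz cutoff, the 0.9 target peak, and the gain cap are assumptions chosen for the example, not values taken from whisper-llm.

```python
import math

def high_pass(samples, sample_rate=16000, cutoff_hz=80.0):
    """First-order RC high-pass filter: removes DC offset and low rumble."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def normalize_gain(samples, target_peak=0.9, max_gain=20.0):
    """Scale so the loudest sample reaches target_peak, with a capped gain."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = min(target_peak / peak, max_gain)
    return [s * gain for s in samples]

def preprocess(samples, sample_rate=16000):
    """Filter first, then normalize, matching the order described above."""
    return normalize_gain(high_pass(samples, sample_rate))
```

The gain cap is there so that a near-silent buffer is not amplified into loud noise before it reaches the VAD.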

Requirements

Windows 10/11
Python 3.11+
Docker with NVIDIA GPU access (the containers below run with --gpus all)
Ollama (optional, for LLM text cleanup)

Quick Start

1. Start the ASR backend

Parakeet TDT (recommended):

docker compose -f spike/parakeet/docker-compose.parakeet.yml up -d

Or run directly:

docker run -d --gpus all -p 9410:8000 \
  shinejh0528/parakeet-tdt-0.6b-v2:py3.12.10_torch-cu128

Alternative: faster-whisper (legacy)

docker run -d --gpus all -p 10300:10300 \
  rhasspy/wyoming-whisper:latest \
  --model large-v3 --language en

Set whisper.backend: wyoming and whisper.port: 10300 in config.yaml.

2. Start Ollama (optional)

docker run -d --gpus all --name ollama -p 11434:11434 \
  -v ollama:/root/.ollama ollama/ollama

docker exec ollama ollama pull qwen3:14b
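Before moving on, it can help to confirm that both containers are listening. The following is a minimal pre-flight sketch that only tests whether the TCP ports from the commands above accept connections. The helper names are hypothetical and not part of whisper-llm, and a reachable port does not prove that the model has finished loading.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Host ports published by the docker run commands in steps 1 and 2.
BACKENDS = {"Parakeet ASR": 9410, "Ollama": 11434}

def check_backends(host="127.0.0.1"):
    """Map each backend name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in BACKENDS.items()}

if __name__ == "__main__":
    for name, ok in check_backends().items():
        print(f"{name}: {'reachable' if ok else 'not reachable'}")
```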

3. Install whisper-llm

git clone https://github.com/cj-elevate/whisper-llm.git
cd whisper-llm
pip install -r requirements.txt

4. Configure

cp config.example.yaml config.yaml
# Edit config.yaml to customize settings

5. Run

python src/main.py

For daily use, enable the watchdog via the system tray menu ("Run on Startup"). This launches a lightweight supervisor via Task Scheduler that auto-restarts the app after crashes and caps restart attempts to prevent loops.
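To make the restart cap concrete, here is a minimal sketch of a supervisor loop guarded by a sliding-window circuit breaker. It is not the code in watchdog.pyw. The class name, the limit of 5 restarts per 300 seconds, and the 2 second backoff are assumptions for illustration.

```python
import subprocess
import time
from collections import deque

class CircuitBreaker:
    """Refuses further restarts once max_restarts occur within window_s seconds."""

    def __init__(self, max_restarts=5, window_s=300.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.clock = clock
        self._restarts = deque()

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that have fallen out of the sliding window.
        while self._restarts and now - self._restarts[0] > self.window_s:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False
        self._restarts.append(now)
        return True

def supervise(cmd, breaker, backoff_s=2.0):
    """Run cmd, restarting after a non-zero exit until the breaker trips."""
    while True:
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return 0  # clean shutdown: do not restart
        if not breaker.allow_restart():
            return exit_code  # too many crashes in the window: give up
        time.sleep(backoff_s)
```

Under these assumptions, a call such as supervise([sys.executable, "src/main.py"], CircuitBreaker()) would keep the app alive through occasional crashes but stop retrying if it fails repeatedly in a short period.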

Press Scroll Lock to start/stop recording. Speak naturally, and text appears in your active window.

Configuration

Edit config.yaml to customize. The app restarts automatically when the file changes.

| Setting | Default | Description |
|---|---|---|
| `whisper.backend` | `parakeet` | ASR backend: `parakeet` or `wyoming` |
| `whisper.port` | `9410` | ASR server port (9410 for Parakeet, 10300 for Wyoming) |
| `hotkey` | `scroll lock` | Key to toggle recording |
| `audio.preferred_device_pattern` | none | Substring match for preferred mic (e.g., "Buds3 Pro") |
| `audio.bud_presence_lifecycle` | `true` | Auto start/stop on BT earbud connect/disconnect |
| `llm.enabled` | `true` | Enable LLM text cleanup |
| `llm.model` | `qwen3:14b` | Ollama model for cleanup |
| `output.method` | `auto` | Text insertion: clipboard / sendinput / auto |
| `corrections.enabled` | `true` | Post-transcription word corrections |

See config.example.yaml for all options.
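The substring matching behind audio.preferred_device_pattern can be pictured with the sketch below. The function name and the case-insensitive comparison are assumptions. Returning None when nothing matches stands in for the "never fall back to the laptop mic" behavior, and the device names in the usage example are made up.

```python
def pick_input_device(device_names, pattern):
    """Return the first device whose name contains pattern, or None if absent."""
    if not pattern:
        return None  # no pattern configured: the caller decides what to do
    needle = pattern.lower()
    for name in device_names:
        if needle in name.lower():
            return name
    return None  # preferred device missing: do not pick another mic

# Hypothetical device list for illustration.
devices = ["Microphone Array (Realtek)", "Headset (Galaxy Buds3 Pro)"]
print(pick_input_device(devices, "Buds3 Pro"))  # -> Headset (Galaxy Buds3 Pro)
```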

Voice Commands

| Command | Effect |
|---|---|
| "send" | Paste text and press Enter |
| "new paragraph" | Insert blank line |
| "new line" | Insert line break |
| "period" / "comma" / "dash" | Insert punctuation |
| "slash [command]" | Insert slash command (e.g., "slash team" -> `/team`) |
| "delete last" | Undo last output (Ctrl+Z) |

A slash command spoken on its own also presses Enter, for hands-free CLI use.
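The command table can be read as a mapping from an utterance to an output action. The sketch below handles only the case where the whole utterance is a command. The Action type and interpret function are hypothetical and do not reflect the actual logic in commands.py, and slash commands are limited to a single word for simplicity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    text: str = ""             # text to insert
    press_enter: bool = False  # press Enter after inserting
    undo: bool = False         # send Ctrl+Z instead of inserting

PUNCTUATION = {"period": ".", "comma": ",", "dash": "-"}

def interpret(utterance):
    """Map one whole utterance to an Action; unknown input is plain text."""
    # Drop trailing punctuation that the ASR may have appended.
    words = utterance.strip().lower().rstrip(".!?").split()
    phrase = " ".join(words)
    if phrase == "send":
        return Action(press_enter=True)
    if phrase == "new paragraph":
        return Action(text="\n\n")
    if phrase == "new line":
        return Action(text="\n")
    if phrase == "delete last":
        return Action(undo=True)
    if phrase in PUNCTUATION:
        return Action(text=PUNCTUATION[phrase])
    if len(words) == 2 and words[0] == "slash":
        # A slash command spoken alone is submitted immediately.
        return Action(text="/" + words[1], press_enter=True)
    return Action(text=utterance)

print(interpret("slash team"))
```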

Word Corrections

ASR sometimes misrecognizes domain-specific words. Add corrections in config.yaml:

corrections:
  enabled: true
  words:
    cloud: Claude
    cloud code: Claude Code

Corrections match case-insensitively on whole words, and longer phrases are tried before shorter ones. Standalone command aliases (entire utterance matches) are also supported via the standalone_commands config section.
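One way to get case-insensitive, whole-word, longest-first matching is a single regular expression whose alternatives are sorted by length. This is a sketch of the idea using the example mapping above, not the project's implementation, and build_corrector is a hypothetical name.

```python
import re

def build_corrector(words):
    """Compile a replacement function from a {heard: intended} mapping."""
    lookup = {heard.lower(): intended for heard, intended in words.items()}
    if not lookup:
        return lambda text: text
    # Longest phrases first so "cloud code" wins over "cloud".
    ordered = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in ordered) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(lambda m: lookup[m.group(0).lower()], text)

fix = build_corrector({"cloud": "Claude", "cloud code": "Claude Code"})
print(fix("Open Cloud Code and ask cloud about the cloudy forecast."))
# -> Open Claude Code and ask Claude about the cloudy forecast.
```

The word boundaries keep "cloudy" untouched, and the length ordering means "Cloud Code" becomes "Claude Code" rather than "Claude Code" being built from two separate replacements.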

LLM Modes

| Mode | Description |
|---|---|
| `raw` | No processing, direct transcription |
| `clean` | Fix grammar, punctuation, remove fillers (default) |

Switch modes via the system tray menu.
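The difference between the two modes can be sketched as follows: raw returns the transcript untouched, while clean sends it to Ollama's /api/generate endpoint on the port from the Quick Start. The prompt wording, the function names, and the fallback to the raw transcript when Ollama is unreachable are assumptions, not the behavior of llm.py.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"
# Hypothetical prompt; the project's real prompt is not shown in this README.
CLEAN_PROMPT = (
    "Fix grammar and punctuation and remove filler words. "
    "Return only the corrected text.\n\n"
)

def build_request(text, model="qwen3:14b"):
    """Build a non-streaming Ollama generate request for the cleanup prompt."""
    payload = {"model": model, "prompt": CLEAN_PROMPT + text, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def process(text, mode="clean", timeout=30.0):
    """Return text untouched in raw mode; otherwise ask Ollama to clean it."""
    if mode == "raw":
        return text
    try:
        with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
            return json.loads(resp.read())["response"].strip()
    except (OSError, KeyError, ValueError):
        # Assumed fallback: if Ollama fails, keep the raw transcript.
        return text
```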

Troubleshooting

See TROUBLESHOOTING.md for common issues.

Project Structure

src/
  main.py               # Entry point, bootstrap sequence
  app.py                # System tray, hotkey, lifecycle
  pipeline.py           # Async audio processing pipeline
  audio.py              # Microphone capture, VAD, device monitoring
  transcriber_parakeet.py  # Parakeet TDT HTTP client
  transcriber.py        # Wyoming protocol client (faster-whisper)
  llm.py                # Ollama integration
  output.py             # Text injection (clipboard/SendInput)
  config.py             # Pydantic configuration
  runtime.py            # Singleton, crash hooks, thread supervision
  watchdog.pyw          # Supervisor with circuit breaker
  health.py             # Backend health monitoring
  commands.py           # Voice command processing
  notify.py             # Desktop notifications

Privacy

No cloud services - All processing is local
No telemetry - No data collection
No external network calls - Only connects to localhost containers
Audio stays local - Never transmitted anywhere
Privacy-first BT - Earbuds disconnect = immediate stop, no fallback to laptop mic

Contributing

Contributions welcome! Please fork, branch, and submit a pull request.

License

MIT License

Acknowledgments

NVIDIA NeMo / Parakeet TDT - Recommended ASR model
faster-whisper - Legacy ASR backend
Wyoming Protocol - Audio streaming protocol
Ollama - Local LLM server
Silero VAD - Voice activity detection
