cj-elevate/whisper-llm

whisper-llm

Offline voice dictation for Windows. Press a hotkey or connect your Bluetooth earbuds, speak naturally, and get cleaned-up text pasted into the active app. No cloud. No subscription. Just your voice and your machine.

Features

  • Parakeet TDT transcription - NVIDIA NeMo ASR, more accurate than Whisper with no hallucinations
  • Bluetooth earbud auto-lifecycle - Auto-starts when earbuds connect, auto-stops on disconnect (privacy-first, never falls back to laptop mic)
  • LLM text cleanup - Local Ollama fixes grammar, punctuation, removes filler words
  • Hotkey-activated - Press Scroll Lock to start/stop recording
  • Voice Activity Detection - Silero VAD with audio preprocessing (high-pass filter + gain normalization)
  • Voice commands - "new paragraph", "send", "delete last", slash commands
  • Self-healing runtime - Watchdog auto-restarts on crash with circuit breaker, health monitoring for ASR/LLM backends
  • System tray - Runs quietly in the background
  • 100% local - Your audio never leaves your machine

How It Works

[Hotkey / BT Earbuds] -> Mic -> VAD + HPF + AGC -> Parakeet ASR -> LLM Cleanup -> Paste
                                                        |                |
                                                   (Docker, local)  (Ollama, local)

All processing happens on your machine. Audio goes through a high-pass filter and gain normalization, then to a local Parakeet TDT container for transcription, then optionally through Ollama for text cleanup.
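
The preprocessing stage described above can be sketched in plain Python. This is only an illustration of a high-pass filter followed by gain normalization, not the project's actual code: the function names, the 80 Hz cutoff, the target RMS level, and the gain cap are all assumptions chosen for the example.

```python
import math

def high_pass(samples, sample_rate=16000, cutoff_hz=80.0):
    """First-order RC high-pass filter: removes DC offset and low rumble.

    cutoff_hz and sample_rate are illustrative values, not project settings.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def normalize_gain(samples, target_rms=0.1, max_gain=10.0):
    """Scale a chunk toward target_rms, capping the gain and clipping to [-1, 1]."""
    if not samples:
        return []
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = min(target_rms / rms, max_gain)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

def preprocess(samples):
    """HPF first, then gain normalization, matching the order in the diagram."""
    return normalize_gain(high_pass(samples))
```

Capping the gain keeps near-silent chunks from being amplified into loud noise before they reach the VAD and the ASR backend.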

Requirements

  • Windows 10/11
  • Python 3.11+
  • Docker with an NVIDIA GPU
  • Ollama (optional, for LLM text cleanup)

Quick Start

1. Start the ASR backend

Parakeet TDT (recommended):

docker compose -f spike/parakeet/docker-compose.parakeet.yml up -d

Or run directly:

docker run -d --gpus all -p 9410:8000 \
  shinejh0528/parakeet-tdt-0.6b-v2:py3.12.10_torch-cu128

Alternative: faster-whisper (legacy):

docker run -d --gpus all -p 10300:10300 \
  rhasspy/wyoming-whisper:latest \
  --model large-v3 --language en

Set whisper.backend: wyoming and whisper.port: 10300 in config.yaml.

2. Start Ollama (optional)

docker run -d --gpus all --name ollama -p 11434:11434 \
  -v ollama:/root/.ollama ollama/ollama

docker exec ollama ollama pull qwen3:14b
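
Before moving on, it can help to confirm that the containers are accepting connections. The following is a minimal sketch using only the standard library; `port_open`, `check_backends`, and the `BACKENDS` mapping are hypothetical names for this example, and the ports are the ones published by the commands above (use 10300 instead of 9410 for the Wyoming backend).

```python
import socket

# Ports published by the docker commands in steps 1 and 2.
BACKENDS = {"Parakeet ASR": 9410, "Ollama": 11434}

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_backends(backends=BACKENDS, host="127.0.0.1"):
    """Map each backend name to whether its port is reachable."""
    return {name: port_open(host, port) for name, port in backends.items()}
```

Calling `check_backends()` returns a dictionary such as `{"Parakeet ASR": True, "Ollama": False}`. An open port only shows that the container is listening, not that the model has finished loading.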

3. Install whisper-llm

git clone https://github.com/cj-elevate/whisper-llm.git
cd whisper-llm
pip install -r requirements.txt

4. Configure

cp config.example.yaml config.yaml
# Edit config.yaml to customize settings

5. Run

python src/main.py

For daily use, enable the watchdog via the system tray menu ("Run on Startup"). This launches a lightweight supervisor via Task Scheduler that auto-restarts the app after crashes and caps restart attempts to prevent loops.
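
The restart cap can be pictured as a small circuit breaker. The sketch below is an assumption about how such a supervisor might work, not the contents of watchdog.pyw: the class name, the limit of 5 restarts, and the 300-second window are invented for illustration.

```python
import subprocess
import time

class RestartBreaker:
    """Allow at most max_restarts within window_s seconds, then refuse."""

    def __init__(self, max_restarts=5, window_s=300.0, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_s = window_s
        self.clock = clock
        self._restarts = []  # timestamps of recent restarts

    def allow_restart(self):
        now = self.clock()
        # Forget restarts that fell out of the sliding window.
        self._restarts = [t for t in self._restarts if now - t < self.window_s]
        if len(self._restarts) >= self.max_restarts:
            return False  # breaker is open: stop restarting
        self._restarts.append(now)
        return True

def supervise(cmd, breaker):
    """Run cmd, restarting after a crash until the breaker trips."""
    while True:
        code = subprocess.call(cmd)
        if code == 0:
            return 0  # clean exit, no restart needed
        if not breaker.allow_restart():
            return code  # too many crashes in the window: give up
```

A sliding window lets the app recover from an occasional crash while still stopping a tight crash loop, for example when the ASR container is down.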

Press Scroll Lock to start/stop recording. Speak naturally, and text appears in your active window.

Configuration

Edit config.yaml to customize. Saving a change restarts the app automatically.

| Setting | Default | Description |
|---|---|---|
| whisper.backend | parakeet | ASR backend: parakeet or wyoming |
| whisper.port | 9410 | ASR server port (9410 for Parakeet, 10300 for Wyoming) |
| hotkey | scroll lock | Key to toggle recording |
| audio.preferred_device_pattern | none | Substring match for preferred mic (e.g., "Buds3 Pro") |
| audio.bud_presence_lifecycle | true | Auto start/stop on BT earbud connect/disconnect |
| llm.enabled | true | Enable LLM text cleanup |
| llm.model | qwen3:14b | Ollama model for cleanup |
| output.method | auto | Text insertion: clipboard / sendinput / auto |
| corrections.enabled | true | Post-transcription word corrections |

See config.example.yaml for all options.
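
The dotted setting names are assumed here to map onto nested keys in config.yaml (for example, `whisper.port` is the `port` key under `whisper`). The project itself uses Pydantic for configuration; the sketch below is a standard-library stand-in that only illustrates how user overrides could be layered on the documented defaults. `DEFAULTS`, `merge`, and `get_setting` are hypothetical names.

```python
# Documented defaults from the settings listed above, as a nested mapping.
DEFAULTS = {
    "whisper": {"backend": "parakeet", "port": 9410},
    "hotkey": "scroll lock",
    "audio": {"preferred_device_pattern": None, "bud_presence_lifecycle": True},
    "llm": {"enabled": True, "model": "qwen3:14b"},
    "output": {"method": "auto"},
    "corrections": {"enabled": True},
}

def merge(defaults, overrides):
    """Return a new mapping with overrides applied recursively over defaults."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged

def get_setting(config, dotted_key):
    """Look up a value by dotted name, e.g. 'whisper.port'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node

# Example: switch to the legacy Wyoming backend, keep everything else.
config = merge(DEFAULTS, {"whisper": {"backend": "wyoming", "port": 10300}})
```

With this override, `get_setting(config, "whisper.port")` yields 10300 while `llm.model` keeps its default.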

Voice Commands

| Command | Effect |
|---|---|
| "send" | Paste text and press Enter |
| "new paragraph" | Insert blank line |
| "new line" | Insert line break |
| "period" / "comma" / "dash" | Insert punctuation |
| "slash [command]" | Insert slash command (e.g., "slash team" -> /team) |
| "delete last" | Undo last output (Ctrl+Z) |

A slash command spoken on its own is followed by an automatic Enter press, for hands-free CLI use.
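
As a rough illustration of the mapping above, the sketch below interprets an utterance that consists only of a command. It is not the logic in commands.py: the `interpret` function, the returned dictionary shape, and the single-word limit on slash commands are assumptions, and commands embedded in longer dictation (such as a trailing "send") are left out.

```python
PUNCTUATION = {"period": ".", "comma": ",", "dash": "-"}

def interpret(utterance):
    """Map a whole-utterance voice command to an action dictionary.

    Anything that is not a recognized command is inserted as plain text.
    """
    # Normalize case, surrounding punctuation the ASR may add, and spacing.
    phrase = " ".join(utterance.lower().strip(" .,!?").split())
    if phrase == "send":
        return {"action": "enter"}
    if phrase == "delete last":
        return {"action": "undo"}  # the app sends Ctrl+Z
    if phrase == "new paragraph":
        return {"action": "insert", "text": "\n\n"}
    if phrase == "new line":
        return {"action": "insert", "text": "\n"}
    if phrase in PUNCTUATION:
        return {"action": "insert", "text": PUNCTUATION[phrase]}
    parts = phrase.split()
    if len(parts) == 2 and parts[0] == "slash":
        # A slash command spoken alone is followed by Enter.
        return {"action": "insert", "text": "/" + parts[1], "enter": True}
    return {"action": "insert", "text": utterance}
```

For example, `interpret("Slash team")` yields an insert of `/team` with the Enter flag set, while ordinary dictation passes through unchanged.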

Word Corrections

ASR sometimes misrecognizes domain-specific words. Add corrections in config.yaml:

corrections:
  enabled: true
  words:
    cloud: Claude
    cloud code: Claude Code

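Corrections match case-insensitively on whole words, and longer phrases are applied before shorter ones, so "cloud code" becomes "Claude Code" before the shorter "cloud" rule can fire. Standalone command aliases (which apply only when the entire utterance matches) are also supported via the standalone_commands config section.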
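
One way to get case-insensitive, whole-word, longest-first matching is a single regular expression whose alternatives are sorted by length. This is a sketch of the technique rather than the project's implementation; `build_corrector` is a hypothetical helper.

```python
import re

def build_corrector(words):
    """Return a function that applies the corrections mapping to a string."""
    if not words:
        return lambda text: text
    # Regex alternation tries options left to right, so put longer phrases first.
    keys = sorted(words, key=len, reverse=True)
    lookup = {key.lower(): value for key, value in words.items()}
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(key) for key in keys) + r")\b",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(lambda m: lookup[m.group(0).lower()], text)

# The mapping from the config example above.
fix = build_corrector({"cloud": "Claude", "cloud code": "Claude Code"})
```

Here `fix("open Cloud code now")` produces "open Claude Code now", and a word such as "cloudy" is left alone because the `\b` anchors require whole-word matches.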

LLM Modes

| Mode | Description |
|---|---|
| raw | No processing, direct transcription |
| clean | Fix grammar, punctuation, remove fillers (default) |

Switch modes via the system tray menu.
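
The difference between the two modes can be sketched against Ollama's `/api/generate` endpoint. The prompt text and the function names below are assumptions made for this example; the actual prompt used by llm.py is not shown in this README.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical prompt, written for this sketch only.
CLEAN_PROMPT = (
    "Fix grammar and punctuation and remove filler words. "
    "Return only the corrected text.\n\n"
)

def build_request(text, mode="clean", model="qwen3:14b"):
    """Return the JSON payload for Ollama, or None when no cleanup is wanted."""
    if mode == "raw":
        return None
    if mode != "clean":
        raise ValueError(f"unknown mode: {mode}")
    return {"model": model, "prompt": CLEAN_PROMPT + text, "stream": False}

def cleanup(text, mode="clean", model="qwen3:14b", timeout=30.0):
    """Return the transcription unchanged in raw mode, cleaned otherwise."""
    payload = build_request(text, mode, model)
    if payload is None:
        return text
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With stream=False, Ollama returns one JSON object with a "response" field.
        return json.loads(response.read())["response"].strip()
```

In raw mode no request is sent at all, so dictation keeps working when Ollama is not running.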

Troubleshooting

See TROUBLESHOOTING.md for common issues.

Project Structure

src/
  main.py               # Entry point, bootstrap sequence
  app.py                # System tray, hotkey, lifecycle
  pipeline.py           # Async audio processing pipeline
  audio.py              # Microphone capture, VAD, device monitoring
  transcriber_parakeet.py  # Parakeet TDT HTTP client
  transcriber.py        # Wyoming protocol client (faster-whisper)
  llm.py                # Ollama integration
  output.py             # Text injection (clipboard/SendInput)
  config.py             # Pydantic configuration
  runtime.py            # Singleton, crash hooks, thread supervision
  watchdog.pyw          # Supervisor with circuit breaker
  health.py             # Backend health monitoring
  commands.py           # Voice command processing
  notify.py             # Desktop notifications

Privacy

  • No cloud services - All processing is local
  • No telemetry - No data collection
  • No external network calls - Connects only to containers on localhost
  • Audio stays local - Never transmitted anywhere
  • Privacy-first BT - Recording stops immediately when the earbuds disconnect, with no fallback to the laptop mic
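
The no-fallback rule can be illustrated with a device selector that returns nothing when the preferred microphone is absent, instead of picking another input. This is a sketch under assumptions: `select_input_device` is a hypothetical function, the device names are made up, and the case-insensitive comparison is a guess (the configuration only says "substring match").

```python
def select_input_device(device_names, pattern):
    """Return the first device whose name contains pattern, else None.

    Returning None (rather than a default device) is what keeps the
    laptop mic from being used when the earbuds are not connected.
    """
    if not pattern:
        return None
    needle = pattern.lower()
    for name in device_names:
        if needle in name.lower():
            return name
    return None

def should_record(device_names, pattern):
    """Recording is allowed only while the preferred device is present."""
    return select_input_device(device_names, pattern) is not None
```

A caller would re-run this check whenever the device list changes and stop capture as soon as `should_record` turns false.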

Contributing

Contributions welcome! Please fork, branch, and submit a pull request.

License

MIT License
