Claude Code Voice Bridge

Hands-free voice conversations with Claude Code. Speak commands, hear responses. All voice processing runs locally.

You --mic--> Whisper STT --text--> Claude Code --text--> Kokoro TTS --audio--> Speakers
              (local)               (CLI)                 (local)

Why this project

Claude Code has voice input (/voice) but no voice output. You can speak to it, but it can't speak back. This bridge closes the loop — full two-way voice conversation with Claude Code, hands-free.

This matters for:

Accessibility — developers with vision or motor disabilities who need hands-free coding
Workflow — architecture discussions, code review, planning — tasks that are naturally conversational
Multitasking — talk to Claude while your hands are on the keyboard, soldering iron, or whiteboard

Everything runs locally. Whisper transcribes your voice on-device. Kokoro speaks Claude's responses on-device. Only the Claude Code CLI touches the network. No API keys for voice, no cloud TTS, no data leaves your machine for speech processing.

Related: anthropics/claude-code#42226 — feature request for native bidirectional voice support.

Quick start

git clone https://github.com/Quill-AI-Assistant/claude-code-voice-bridge.git
cd claude-code-voice-bridge
./setup.sh
python3 bridge.py

Speak into your mic. Claude responds with voice.

The bridge auto-starts STT/TTS services if they're not running.

Usage

python3 bridge.py                          # new session, full voice
python3 bridge.py --voice af_sarah         # pick a voice
python3 bridge.py --speed 1.3              # speak faster
python3 bridge.py --resume abc123          # resume a previous session
python3 bridge.py --mic-off               # type input, hear output
python3 bridge.py --tts-off               # voice input, read output
python3 bridge.py --calibrate             # auto-detect mic threshold

Runtime commands

Type or say these anytime — even while the mic is listening:

Command	What it does
`/help`	Show all commands
`/voice NAME`	Change voice
`/voices`	List available voices
`/speed N`	Set speed (0.5-2.0)
`/mic on\|off`	Toggle microphone
`/tts on\|off`	Toggle voice output
`/calibrate`	Auto-set mic threshold
`/sessions`	List recent sessions
`/attach ID`	Switch to a session
`/session`	Show current status
`/exit`	Quit
`Esc`	Interrupt Claude mid-response
`speak faster`	Increase speed
`speak slower`	Decrease speed
`change voice to X`	Switch voice by speaking

How it works

A persistent claude -p --input-format stream-json subprocess stays alive for the session
Your voice is captured by a persistent audio stream with energy-based VAD
Speech is sent to a local Whisper server for transcription with confidence scoring
Transcribed text goes to Claude Code as a stream-json message — full tool access, file edits, everything
Claude's streaming response is chunked by sentence boundaries and queued for TTS
A background thread generates audio via Kokoro and plays it with interruptible polling
Speak during playback or press Esc to interrupt — playback stops within 50ms

Confidence scoring

Every voice input shows a colour-coded confidence score from Whisper:

Green [85%] — high confidence, sent as-is
Yellow [62%] — medium confidence, may need clarification
Red [30%] — low confidence, single-word fragments are filtered

Low-confidence single words are automatically rejected to prevent Whisper hallucinations from triggering commands.

Architecture

+----------------------------------------------+
|          Claude Code Voice Bridge             |
|                                               |
|  Mic --> STT --> +---------------+ --> TTS    |
|                  |  Claude Code  |            |
|  Keyboard -----> |  (persistent  | ---------> |
|                  |   subprocess) |            |
|                  +---------------+            |
+----------------------------------------------+

Single process: Claude Code runs once, stays alive across turns
Bidirectional stream-json: stdin/stdout as newline-delimited JSON
Interruptible: Voice or Esc stops Claude mid-sentence
Type anytime: Keyboard works even while mic is listening
Session portable: Resume in the bridge or in regular claude --resume

Prerequisites

Requirement	Auto-installed by setup.sh
Python 3.10+	No
Claude Code CLI	No
PortAudio	Yes (via Homebrew/apt)
FFmpeg	Yes (via Homebrew/apt)
VoiceMode (Whisper + Kokoro)	Yes

Custom STT/TTS

Any OpenAI-compatible endpoint works:

python3 bridge.py --stt-url http://localhost:9000/v1/audio/transcriptions \
                  --tts-url http://localhost:9000/v1/audio/speech

Tested with: Whisper.cpp, Deepgram, Kokoro, OpenAI TTS, ElevenLabs.

Voices

With Kokoro TTS, you get 60+ voices. Run /voices for the full list. Some favorites:

Voice	ID
Sky (default)	`af_sky`
Nova	`af_nova`
Sarah	`af_sarah`
Adam	`am_adam`
Emma	`bf_emma`
George	`bm_george`

Attribution

Built by @Quill-AI-Assistant (AI agent). Reviewed, tested, and verified by @ahuzmeza (human operator).

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
bridge.py		bridge.py
requirements.txt		requirements.txt
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Code Voice Bridge

Why this project

Quick start

Usage

Runtime commands

How it works

Confidence scoring

Architecture

Prerequisites

Custom STT/TTS

Voices

Attribution

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Claude Code Voice Bridge

Why this project

Quick start

Usage

Runtime commands

How it works

Confidence scoring

Architecture

Prerequisites

Custom STT/TTS

Voices

Attribution

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages