feat(conductor): add opt-in voice STT to Telegram bridge #309

Open
Abeansits wants to merge 3 commits into asheshgoplani:main from Abeansits:feat/voice-bridge-stt

Abeansits commented Mar 8, 2026

Summary

  • Opt-in voice STT: Telegram voice messages are transcribed locally using parakeet-mlx via a subprocess worker (stt_worker.py), crash-isolated from the bot event loop. No cloud API needed.
  • Disabled by default: Voice transcription only activates when BRIDGE_STT_ENABLED=true env var is set. Without it, voice messages are silently ignored.
  • No hardcoded paths: stt_worker.py uses shutil.which('parakeet-mlx') to find the CLI on PATH, with PARAKEET_CLI_PATH env var as an explicit override.
  • File logging: Bridge now logs to ~/.agent-deck/conductor/bridge.log in addition to stdout.
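The opt-in flag, CLI lookup, and dual logging described above could look roughly like the following. This is a minimal sketch, not the PR's actual code: the helper names `stt_enabled`, `find_parakeet_cli`, and `configure_logging` are hypothetical, the case-insensitive comparison of the flag is an assumption, and so is returning the override path without validating it.

```python
import logging
import os
import shutil
import sys
from pathlib import Path
from typing import Mapping, Optional

# Default log location named in the PR summary.
DEFAULT_LOG_PATH = Path.home() / ".agent-deck" / "conductor" / "bridge.log"


def stt_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when BRIDGE_STT_ENABLED is set to "true".

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    return env.get("BRIDGE_STT_ENABLED", "").strip().lower() == "true"


def find_parakeet_cli(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate the parakeet-mlx CLI without hardcoding a path.

    PARAKEET_CLI_PATH wins when set; otherwise fall back to a PATH lookup.
    Returns None when the CLI cannot be found.
    """
    override = env.get("PARAKEET_CLI_PATH")
    if override:
        # Assumption: the override is trusted as-is and not checked here.
        return override
    return shutil.which("parakeet-mlx")


def configure_logging(log_path: Path = DEFAULT_LOG_PATH) -> logging.Logger:
    """Log to stdout and to a file, creating the log directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("bridge-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        for handler in (logging.StreamHandler(sys.stdout),
                        logging.FileHandler(log_path)):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Passing the environment in as a mapping keeps both lookups easy to exercise in tests without touching the real process environment.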

Changes

  • conductor/bridge.py: transcribe_voice() downloads voice files and calls stt_worker via async subprocess (60s timeout). handle_message() now handles message.voice when BRIDGE_STT_ENABLED=true.
  • conductor/stt_worker.py: New standalone STT worker that finds and invokes the parakeet-mlx CLI, reads output text files, and prints transcription to stdout.
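The crash-isolated call with a 60s timeout can be sketched as below. This is an illustration under stated assumptions rather than the PR's `transcribe_voice()`: the function name `run_worker` and its command-list argument are hypothetical, and treating a non-zero exit code or empty output as a failed transcription is an assumption.

```python
import asyncio
import sys
from typing import Optional, Sequence

# Fallback text quoted in the PR's test plan.
FALLBACK = "[Could not transcribe voice message.]"


async def run_worker(cmd: Sequence[str], timeout: float = 60.0) -> Optional[str]:
    """Run a worker command in a child process and return its stdout text.

    Returns None on timeout, non-zero exit, or empty output, so a crash
    in the worker never propagates into the caller's event loop.
    """
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _stderr = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill the stuck worker and reap it so no zombie process is left.
        proc.kill()
        await proc.wait()
        return None
    if proc.returncode != 0:
        return None
    text = stdout.decode("utf-8", errors="replace").strip()
    return text or None


async def demo() -> str:
    # Stand-in command; the real bridge would invoke stt_worker.py here.
    text = await run_worker([sys.executable, "-c", "print('hello from worker')"])
    return text or FALLBACK


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the worker runs in its own process, a crash or hang in the STT stack surfaces as `None` here, and the caller can substitute the fallback message.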

Configuration

# Enable voice transcription
export BRIDGE_STT_ENABLED=true

# Optional: explicit path to parakeet-mlx CLI
export PARAKEET_CLI_PATH=/path/to/parakeet-mlx

Test plan

  • With BRIDGE_STT_ENABLED=true, send a voice message via Telegram and verify transcription
  • Verify that a voice message first produces a Transcribing... status and that the transcribed text is then forwarded to the conductor
  • Verify failed transcription returns [Could not transcribe voice message.]
  • With STT disabled (default), verify voice messages are silently ignored
  • Verify normal text message handling is unaffected
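The routing behaviour this test plan checks can be captured as a small pure function that is easy to unit-test. This is a hypothetical sketch: `IncomingMessage` and `plan_action` are illustrative names, not part of bridge.py, and the real `handle_message()` works on Telegram message objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IncomingMessage:
    """Hypothetical stand-in for the fields of a Telegram message used here."""
    text: Optional[str] = None
    has_voice: bool = False


def plan_action(msg: IncomingMessage, stt_enabled: bool) -> str:
    """Decide what the bridge should do with a message.

    Text is always forwarded; voice is transcribed only when STT is
    enabled and is otherwise silently ignored.
    """
    if msg.text is not None:
        return "forward_text"
    if msg.has_voice:
        return "transcribe" if stt_enabled else "ignore"
    return "ignore"


# Each case mirrors one bullet of the test plan.
assert plan_action(IncomingMessage(has_voice=True), stt_enabled=True) == "transcribe"
assert plan_action(IncomingMessage(has_voice=True), stt_enabled=False) == "ignore"
assert plan_action(IncomingMessage(text="hi"), stt_enabled=False) == "forward_text"
```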

Abeansits and others added 3 commits March 7, 2026 16:10
Replace Groq Whisper API with local parakeet-mlx (parakeet-tdt-0.6b-v3)
for voice message transcription. Add TTS voice replies using macOS say
+ ffmpeg (OGG/Opus output), toggled via BRIDGE_TTS_ENABLED env var.

- stt_worker.py: standalone subprocess worker that normalizes audio to
  mono 16kHz WAV and runs parakeet-mlx inference, crash-isolated from
  the bot event loop
- bridge.py: transcribe_voice() calls stt_worker via async subprocess
  (60s timeout), generate_voice_reply() chains say + ffmpeg via async
  subprocesses with per-step timeouts and proper kill/cleanup

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Switch from importing parakeet-mlx as a Python library to invoking
the parakeet-mlx CLI binary. This avoids import/dependency issues
and is cleaner for subprocess-based isolation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

Strip generate_voice_reply(), BRIDGE_TTS_ENABLED/BRIDGE_TTS_VOICE
config, bot.send_voice() TTS response block, say+ffmpeg pipeline,
and FSInputFile import. Voice-to-text (STT) remains intact.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

asheshgoplani changed the title from "feat(conductor): add local voice STT and TTS to Telegram bridge" to "feat(conductor): add opt-in voice STT to Telegram bridge" on Mar 17, 2026