
feat(voice): Claude decides when to send voice notes via MCP tool #7

Merged
paulpwo merged 6 commits into main from feat/voice-as-tool on Apr 13, 2026

@paulpwo paulpwo commented Apr 13, 2026

Owner

Summary

  • Before: the orchestrator counted the words in Claude's response and sent a voice note if the response was short. Claude had no role in the decision.
  • After: Claude has a tool, send_voice_reply(text), which it calls explicitly when the context warrants it (the user asks for audio, a short conversational reply, etc.). If Claude does not call it, the reply is always text.

The old design was fundamentally wrong: it ignored what the user said and substituted a len(words) <= 200 check. A user who said "don't reply with voice" still received audio because the response was short.
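
To make the failure mode concrete, here is a minimal sketch of a word-count gate like the one described above. The function name `should_send_voice_by_length` and its signature are hypothetical and are not the removed `_should_send_voice`; only the `len(words) <= 200` threshold comes from the description.

```python
def should_send_voice_by_length(response_text: str, max_words: int = 200) -> bool:
    """Hypothetical stand-in for the removed heuristic: voice if the reply is short."""
    words = response_text.split()
    return len(words) <= max_words


# The user's message never reaches the decision, so an explicit
# "don't reply with voice" cannot change the outcome.
user_message = "don't reply with voice"
claude_reply = "Sure, I'll keep it to text."
send_voice = should_send_voice_by_length(claude_reply)
```

Because the gate only sees the reply, the short answer above would still be delivered as audio, which is the behavior this PR removes.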

Changes

| File | Change |
| --- | --- |
| src/mcp/telegram_server.py | New tool send_voice_reply(text) |
| src/claude/sdk_integration.py | System prompt updated: describes the tool and when to use it |
| src/config/settings.py | Removed voice_reply_mode and voice_reply_max_words (they no longer exist) |
| src/bot/orchestrator.py | Stream callback intercepts voice tool calls; voice is delivered from mcp_voice_requests; removes _should_send_voice, _user_wants_voice, _VOICE_REQUEST_KEYWORDS |
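
The interception step in the orchestrator can be sketched roughly as follows. This is an illustration under assumptions, not the code in src/bot/orchestrator.py: the `ToolCallEvent` type, its fields, and the `mcp__telegram__` name prefix are hypothetical, and the real stream update type may look different. Only the tool name `send_voice_reply` and the `mcp_voice_requests` list come from the PR.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable

VOICE_TOOL_NAME = "send_voice_reply"


@dataclass
class ToolCallEvent:
    """Assumed event shape; the real stream update type may differ."""

    tool_name: str
    tool_input: dict[str, Any] = field(default_factory=dict)


def make_stream_callback(
    mcp_voice_requests: list[str],
) -> Callable[[ToolCallEvent], None]:
    """Return a callback that records the text of every voice tool call."""

    def on_event(event: ToolCallEvent) -> None:
        # Tool names may be namespaced (assumed form: "mcp__<server>__<tool>"),
        # so match on the final segment rather than the full name.
        if event.tool_name.split("__")[-1] != VOICE_TOOL_NAME:
            return
        text = str(event.tool_input.get("text", "")).strip()
        if text:
            mcp_voice_requests.append(text)

    return on_event


mcp_voice_requests: list[str] = []
callback = make_stream_callback(mcp_voice_requests)
callback(ToolCallEvent("mcp__telegram__send_voice_reply", {"text": "On my way!"}))
callback(ToolCallEvent("Read", {"file_path": "README.md"}))
```

After the run finishes, the caller would synthesize and send one voice note per entry in `mcp_voice_requests`; an empty list means a plain text reply.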

New behavior

  • /voice on (default): Claude may use the tool and decides based on context
  • /voice off: tool disabled, always text
  • /voice auto: alias for on (backwards compatibility)
  • ENABLE_MCP=true is required for the tool to be available
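
As a rough sketch of the mode handling listed above, the `/voice` argument could be normalized like this. The helper names `normalize_voice_mode` and `voice_tool_enabled` are hypothetical, and the sketch deliberately leaves the ENABLE_MCP setting out of scope.

```python
def normalize_voice_mode(arg: str) -> str:
    """Map a /voice argument to "on" or "off"; "auto" is kept as an alias for "on"."""
    mode = arg.strip().lower()
    if mode == "auto":
        return "on"
    if mode in ("on", "off"):
        return mode
    raise ValueError(f"unknown voice mode: {arg!r}")


def voice_tool_enabled(mode: str = "on") -> bool:
    """The voice tool is offered to Claude only when the mode resolves to "on"."""
    return normalize_voice_mode(mode) == "on"
```

Treating `auto` as an alias at parse time means the rest of the code only ever sees two states.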

Test plan

  • CI green
  • Tell the bot "don't reply with voice" → it replies in text even if the reply is short
  • Tell the bot "send me an audio of this" → it sends a voice note
  • /voice off → never voice, even if you ask for it
  • /voice on → Claude decides based on context

🤖 Generated with Claude Code

paulpwo and others added 6 commits April 13, 2026 11:07
Replace the post-process word-count heuristic with a proper MCP tool.
Claude now calls `send_voice_reply(text)` explicitly when the context
warrants it — user asks for audio, conversational short reply, etc.

Changes:
- src/mcp/telegram_server.py: add send_voice_reply(text) tool
- src/claude/sdk_integration.py: update system prompt — explains the
  tool and when to use it; removes the old "bot layer handles TTS" lie
- src/config/settings.py: remove voice_reply_mode + voice_reply_max_words
  settings and validator (word-count gate no longer exists)
- src/bot/orchestrator.py:
  * _make_stream_callback: intercept send_voice_reply MCP tool calls,
    collect requested texts in mcp_voice_requests list
  * agentic_text + agentic_voice: deliver voice from mcp_voice_requests
    instead of applying _should_send_voice post-hoc
  * Remove _VOICE_REQUEST_KEYWORDS, _user_wants_voice, _should_send_voice
  * /voice on|off (auto accepted as alias for on); update /voice
    BotCommand description
  * Remove verbose suppression that was coupled to voice mode

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eeded

- config/mcp.json: committed (no secrets) — removed from .gitignore
- settings.py: mcp_config_path defaults to config/mcp.json so MCP_CONFIG_PATH
  is no longer required in .env
- Remove the "mcp_config_path required" validator — default covers it
- Update test_mcp_config_validation accordingly

To enable MCP (and voice-as-tool): only ENABLE_MCP=true needed in .env.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simpler approach: point the default directly at the existing example file
instead of creating a redundant mcp.json. No new files needed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The internal telegram server (send_image_to_user, send_voice_reply) is
now wired unconditionally — it is part of the bot, not an opt-in feature.

ENABLE_MCP=true is now only needed when adding extra external MCP servers
via mcp_config_path. Those are merged on top of the telegram server.

Also: mcp_config_path defaults to config/mcp.example.json so no config
variable needed for the base case.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
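
A minimal sketch of the wiring described in the commit above, assuming server configs are plain dicts keyed by server name. The function name `build_mcp_servers`, the example server entries, and the choice that an external entry wins on a name collision are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any


def build_mcp_servers(
    internal: dict[str, Any],
    external: dict[str, Any],
    enable_mcp: bool,
) -> dict[str, Any]:
    """Always include the internal server; add external ones only if enabled."""
    servers = dict(internal)  # copy so the caller's dict is not mutated
    if enable_mcp:
        # Assumption: on a name collision the external entry replaces the internal one.
        servers.update(external)
    return servers


# Hypothetical config entries, for illustration only.
internal = {"telegram": {"command": "python", "args": ["-m", "telegram_server"]}}
external = {"docs": {"command": "docs-server", "args": []}}

base_case = build_mcp_servers(internal, external, enable_mcp=False)
with_extras = build_mcp_servers(internal, external, enable_mcp=True)
```

With this shape, `send_voice_reply` stays available in the base case and ENABLE_MCP only controls the extra servers.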
TestingConfig defaults voice/scheduler/project_threads to False, so CI
was registering only 10 commands instead of 13. Local .env was leaking
those flags and masking the failure. Now fixtures are explicit and
environment-independent.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
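
The leak described in the commit above can be illustrated with a small sketch. The `FeatureFlags` type, the `ENABLE_*` variable names, and both helper functions are hypothetical; only the three flag names and their False defaults come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FeatureFlags:
    voice: bool = False
    scheduler: bool = False
    project_threads: bool = False


def flags_from_env(env: Mapping[str, str]) -> FeatureFlags:
    """Environment-dependent: the result changes with whatever the env contains."""

    def is_on(name: str) -> bool:
        return env.get(name, "").strip().lower() == "true"

    return FeatureFlags(
        voice=is_on("ENABLE_VOICE"),
        scheduler=is_on("ENABLE_SCHEDULER"),
        project_threads=is_on("ENABLE_PROJECT_THREADS"),
    )


def explicit_test_flags(**overrides: bool) -> FeatureFlags:
    """Environment-independent: a test states exactly which flags it needs."""
    return FeatureFlags(**overrides)


local_env = {"ENABLE_VOICE": "true", "ENABLE_SCHEDULER": "true"}  # like a local .env
ci_env: dict[str, str] = {}  # CI has none of these set

leaky_local = flags_from_env(local_env)
leaky_ci = flags_from_env(ci_env)
explicit = explicit_test_flags(voice=True, scheduler=True, project_threads=True)
```

The env-derived flags differ between the two environments, while the explicit fixture yields the same flags everywhere, so a test that counts registered commands behaves identically locally and in CI.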
@paulpwo paulpwo merged commit 0b15037 into main Apr 13, 2026
2 checks passed
@paulpwo paulpwo deleted the feat/voice-as-tool branch April 13, 2026 16:42