diff --git a/README.de.md b/README.de.md
index 49fb922..c063c3c 100644
--- a/README.de.md
+++ b/README.de.md
@@ -38,7 +38,7 @@
 - ✅ Telegram zum Steuern von Codex / Copilot CLI verwenden
 - ✅ Antworten und geänderte Dateien bequem in Code-Blöcken prüfen
 - ✅ Folgefragen während eines laufenden Agentenlaufs in die Queue stellen
-- ✅ Unterstützt Text- und Bildeingaben
+- ✅ Akzeptiert ✏️ Text-, 🌄 Bild- und 🎙️ Sprachnachrichten sowie Audiodateien

 ## 🔁 Nahtlos zwischen Geräten und Sessions wechseln
@@ -99,6 +99,7 @@ Vor dem Start des Servers brauchst du:
 - Lokal installiertes Codex CLI und/oder Copilot CLI
   - [Codex CLI Installation](https://developers.openai.com/codex/cli)
   - [Copilot CLI Installation](https://github.com/features/copilot/cli)
+- [Optional] Whisper, ffmpeg
@@ -126,7 +127,7 @@ cd coding-agent-telegram
 ./startup.sh
 ```

-### Bot-Server starten
+### 🌐 Bot-Server starten
 ##### Beim ersten Start legt die App die Env-Datei an und sagt dir, welche Felder du ausfüllen musst.
 ##### Nach dem Bearbeiten der Env-Datei starte erneut:
 ```bash
@@ -137,6 +138,32 @@ coding-agent-telegram
 ./startup.sh
 ```

+## 🎙️ [Optional] Speech-to-Text-Funktion: lokale OpenAI-Whisper-Voraussetzungen vorbereiten
+
+Damit aktivierst du optional lokale Whisper-basierte Sprach-zu-Text-Unterstützung für Telegram-Sprachnotizen. Audiodateien sind auf maximal `20 MB` begrenzt.
+
+```bash
+# wenn du per pip oder per Einzeiler install.sh installiert hast
+coding-agent-telegram-stt-install
+
+# wenn du aus einem geklonten Repository startest
+./install-stt.sh
+```
+
+Empfohlene Env-Einstellungen:
+
+```text
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true
+OPENAI_WHISPER_MODEL=base
+OPENAI_WHISPER_TIMEOUT_SECONDS=120
+```
+
+Hinweise:
+
+- Whisper lädt das ausgewählte Modell beim ersten Aufruf automatisch nach `~/.cache/whisper` herunter.
+- Wenn du `OPENAI_WHISPER_MODEL=turbo` wählst, ist es wahrscheinlicher, dass die erste Sprachnachricht das Zeitlimit erreicht, während `large-v3-turbo.pt` noch heruntergeladen wird.
+- Nach der Transkription einer Sprachnachricht sendet der Bot das erkannte Transkript zuerst zurück an Telegram und gibt es danach an den Agenten weiter. So lassen sich Erkennungsfehler leichter prüfen.
+
 ## 🔑 Telegram-Einrichtung

 ### Bot-Token holen
@@ -175,6 +202,7 @@ Der Bot akzeptiert derzeit:
 - Textnachrichten
 - Fotos
+- Sprachnachrichten und Audiodateien, wenn `ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true` gesetzt ist und die lokalen Whisper-Voraussetzungen installiert sind
 - Codex und Copilot unterstützen aktuell nur Text und Bilder, kein Video.

 ## 🤖 Telegram-Befehle
@@ -329,6 +357,18 @@
 ENABLE_SECRET_SCRUB_FILTER | .env-Werte, Zertifikate und ähnliche geheime Ausgaben vor dem Senden an Telegram unkenntlich machen. Standard: true (dringend empfohlen). |
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT | Standard: false. Wenn true, werden Audionachrichten und Sprachdateien erkannt. Das System prüft die Voraussetzungen für benötigte Binärdateien oder Bibliotheken und fordert bei Bedarf zur Installation auf. |
+OPENAI_WHISPER_MODEL | Standard: base. tiny ca. 72 MB, base ca. 139 MB, large-v3-turbo ca. 1.5 GB. Empfohlen: base für den allgemeinen Einsatz. Für bessere Genauigkeit und Qualität kannst du turbo ausprobieren. |
+OPENAI_WHISPER_TIMEOUT_SECONDS | Standard: 120. Zeitlimit für den STT-Prozess. Normalerweise ist die Verarbeitung schnell genug. Wenn du jedoch turbo wählst, kann der erste Download je nach Internetgeschwindigkeit das Zeitlimit überschreiten. |
 SNAPSHOT_INCLUDE_PATH_GLOBS | .github/*,.profile.test,.profile.prod |

 /provider | Choose the provider for new sessions. The selection is stored per bot and chat until you change it. |
 /project <project_folder> | Set the current project folder. If the folder does not exist, the app creates it and marks it trusted. If it already exists and is still untrusted, the app asks you to trust it explicitly. |
 /branch <new_branch> | Prepare or switch a branch for the current project. If the branch already exists, the bot treats that branch as the source candidate. Otherwise it uses the repository default branch as the source candidate. |
 /branch <origin_branch> <new_branch> | Prepare or switch a branch using <origin_branch> as the source candidate. For both forms, the bot then offers the source choices that actually exist: local/<branch> and origin/<branch>. If only one of those exists, only that option is shown. If neither exists, the bot tells you the branch source is missing. |
 /current | Show the active session for the current bot and chat. |
 /new [session_name] | Create a new session for the current project. If you omit the name, the bot uses the real session ID. If provider, project, or branch is missing, the bot guides you through the missing step. |
 /switch | Show the latest sessions, newest first. The list includes both bot-managed sessions and local Codex/Copilot CLI sessions for the current project. |
 /switch page <number> | Show another page of stored sessions. |
 /switch <session_id> | Switch to a specific session by ID. If you choose a local CLI session, the bot imports it and continues from there. |
 /compact | Create a fresh compacted session from the active session and switch to it. |
 /commit <git commands> | Run validated git commit-related commands inside the active session project. Available only when ENABLE_COMMIT_COMMAND=true. Mutating git commands require a trusted project. |
 /push | Push origin <branch> for the current active session. The bot asks for confirmation before pushing. |
 /abort | Abort the current agent run for the current project. If queued questions are waiting, the bot asks whether to continue them. |
 WORKSPACE_ROOT | Parent folder that contains your project directories. |
 APP_LOCALE | UI locale for shared bot messages and command descriptions. Supported values: en, de, fr, ja, ko, nl, th, vi, zh-CN, zh-HK, zh-TW. |
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT | Default: false. If true, it enables the audio messages capability. The system checks the prerequisites regarding required binaries or libraries on startup. |
+OPENAI_WHISPER_MODEL | Model for the Whisper STT. Default: base. Available models: tiny about 72 MB, base about 139 MB, large-v3-turbo about 1.5 GB. Models are downloaded automatically on your first voice message. Recommended: base for general usage. If you want better accuracy and quality, you can try turbo. |
+OPENAI_WHISPER_TIMEOUT_SECONDS | Default: 120. Timeout for the STT process. Usually the STT processing is fast enough. If you choose turbo, however, the first download may exceed the timeout depending on your internet speed. |
 ENABLE_SECRET_SCRUB_FILTER | 在送往 Telegram 之前,對 tokens、keys、.env 值、certificates 及類似秘密輸出做遮罩。預設:true(強烈建議啟用)。 |
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT | 預設:false。如果為 true,就會啟用音訊訊息與語音檔案識別。系統會檢查所需的 binary 或 library 依賴,缺少時會提示用戶安裝。 |
+OPENAI_WHISPER_MODEL | Whisper STT 使用的模型。預設:base。可用模型:tiny 約 72 MB、base 約 139 MB、large-v3-turbo 約 1.5 GB。模型會在你第一次傳送語音訊息時自動下載。建議一般使用選 base。如果你想要更好的準確率與品質,可以嘗試 turbo。 |
+OPENAI_WHISPER_TIMEOUT_SECONDS | 預設:120。STT 進程的逾時時間。一般來說處理速度已足夠快,但如果你選擇 turbo,首次下載可能會視乎網速而超出逾時限制。 |
 SNAPSHOT_INCLUDE_PATH_GLOBS | 強制把符合條件的 path 納入 diff。例子:.github/*,.profile.test,.profile.prod |
diff --git a/README.zh-TW.md b/README.zh-TW.md
index 689d0b0..6161ac6 100644
--- a/README.zh-TW.md
+++ b/README.zh-TW.md
@@ -38,7 +38,7 @@
- ✅ 使用 Telegram 控制 Codex / Copilot CLI
- ✅ 可以在 code block 中輕鬆檢視 agent 回覆與改動檔案
- ✅ agent 執行期間也能把後續問題排入佇列
- - ✅ 支援文字與圖片輸入
+ - ✅ 支援 ✏️ 文字、🌄 圖片、🎙️ 語音訊息以及音訊檔案
## 🔁 裝置與工作階段無縫切換
@@ -99,6 +99,7 @@ curl -fsSL https://raw.githubusercontent.com/daocha/coding-agent-telegram/main/i
- 已在本機安裝 Codex CLI 及/或 Copilot CLI
- [安裝 Codex CLI](https://developers.openai.com/codex/cli)
- [安裝 Copilot CLI](https://github.com/features/copilot/cli)
+- [可選] Whisper、ffmpeg
 ENABLE_SECRET_SCRUB_FILTER | 在送往 Telegram 之前,對 tokens、keys、.env 值、certificates 及類似秘密輸出做遮罩。預設:true(強烈建議啟用)。 |
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT | 預設:false。如果為 true,就會啟用音訊訊息與語音檔案識別。系統會檢查所需的 binary 或 library 依賴,缺少時會提示使用者安裝。 |
+OPENAI_WHISPER_MODEL | Whisper STT 使用的模型。預設:base。可用模型:tiny 約 72 MB、base 約 139 MB、large-v3-turbo 約 1.5 GB。模型會在你第一次傳送語音訊息時自動下載。建議一般使用選 base。如果你想要更好的準確率與品質,可以嘗試 turbo。 |
+OPENAI_WHISPER_TIMEOUT_SECONDS | 預設:120。STT 進程的逾時時間。一般來說處理速度已足夠快,但如果你選擇 turbo,首次下載可能會視乎網速而超出逾時限制。 |
 SNAPSHOT_INCLUDE_PATH_GLOBS | 強制把符合條件的 path 納入 diff。例子:.github/*,.profile.test,.profile.prod |
diff --git a/install-stt.sh b/install-stt.sh
new file mode 100755
index 0000000..a74dba7
--- /dev/null
+++ b/install-stt.sh
@@ -0,0 +1,37 @@
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+cd "$SCRIPT_DIR"
+
+PYTHON_BIN="${PYTHON_BIN:-python3}"
+VENV_DIR="${VENV_DIR:-.venv}"
+ENV_FILE="${ENV_FILE:-}"
+LOCAL_PRETEND_VERSION="${SETUPTOOLS_SCM_PRETEND_VERSION_FOR_CODING_AGENT_TELEGRAM:-0.0.dev0}"
+
+if ! command -v "$PYTHON_BIN" >/dev/null 2>&1; then
+ echo "Error: $PYTHON_BIN was not found in PATH." >&2
+ exit 1
+fi
+
+if [[ ! -d "$VENV_DIR" ]]; then
+ "$PYTHON_BIN" -m venv "$VENV_DIR"
+fi
+
+source "$VENV_DIR/bin/activate"
+python -m pip install --upgrade pip >/dev/null
+
+if ! python -c "import coding_agent_telegram" >/dev/null 2>&1; then
+ echo "Installing local package into $VENV_DIR so the shared STT installer is available."
+ SETUPTOOLS_SCM_PRETEND_VERSION_FOR_CODING_AGENT_TELEGRAM="$LOCAL_PRETEND_VERSION" \
+ python -m pip install -e .
+fi
+
+ARGS=("install")
+if [[ -n "$ENV_FILE" ]]; then
+ ARGS+=("--env-file" "$ENV_FILE")
+fi
+ARGS+=("--python-bin" "$(command -v python)")
+
+exec python -m coding_agent_telegram.stt_setup "${ARGS[@]}"
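As a side note on the script above: the installer builds its argument list incrementally, appending the `--env-file` flag only when `ENV_FILE` is set. A minimal standalone sketch of that pattern (the `ENV_FILE` value here is hypothetical):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Start with the subcommand, then append optional flags only when their
# controlling variable is non-empty, as install-stt.sh does.
ENV_FILE=".env_demo"   # hypothetical override; empty means "omit the flag"
ARGS=("install")
if [[ -n "$ENV_FILE" ]]; then
  ARGS+=("--env-file" "$ENV_FILE")
fi
ARGS+=("--python-bin" "/usr/bin/python3")

# Print the final argv that would be handed to the STT setup module.
printf '%s\n' "${ARGS[*]}"
```

Using an array instead of a flat string keeps arguments with spaces intact when the list is later expanded as `"${ARGS[@]}"`.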
diff --git a/install.sh b/install.sh
index 4ff3b92..4424b95 100644
--- a/install.sh
+++ b/install.sh
@@ -28,7 +28,4 @@ if [[ -z "$COMMAND_PATH" && ":$PATH:" != *":$SCRIPT_DIR:"* ]]; then
fi
echo "Starting coding-agent-telegram..."
-if [[ -n "$COMMAND_PATH" ]]; then
- exec "$COMMAND_PATH"
-fi
exec "$PYTHON_BIN" -m coding_agent_telegram
diff --git a/pyproject.toml b/pyproject.toml
index 20ff289..59db972 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -18,6 +18,7 @@ dependencies = [
[project.scripts]
coding-agent-telegram = "coding_agent_telegram.cli:main"
+coding-agent-telegram-stt-install = "coding_agent_telegram.stt_setup:main"
[tool.setuptools]
package-dir = {"" = "src"}
diff --git a/src/coding_agent_telegram/bot.py b/src/coding_agent_telegram/bot.py
index c0fd0aa..1d4e585 100644
--- a/src/coding_agent_telegram/bot.py
+++ b/src/coding_agent_telegram/bot.py
@@ -20,6 +20,25 @@
TELEGRAM_GET_UPDATES_CONNECTION_POOL_SIZE = 2
+def _describe_message_types(message) -> list[str]:
+ types: list[str] = []
+ for field_name in (
+ "text",
+ "photo",
+ "voice",
+ "audio",
+ "document",
+ "video",
+ "video_note",
+ "animation",
+ "sticker",
+ ):
+ value = getattr(message, field_name, None)
+ if value:
+ types.append(field_name)
+ return types
+
+
def default_bot_commands(*, enable_commit_command: bool, locale: str = DEFAULT_LOCALE) -> list[BotCommand]:
commands = [
BotCommand("provider", translate(locale, "bot.command.provider")),
@@ -106,9 +125,22 @@ def build_application(token: str, router: CommandRouter, *, allowed_chat_ids: se
| tg_filters.Sticker.ALL
| tg_filters.VIDEO
| tg_filters.VIDEO_NOTE
- | tg_filters.VOICE
)
+ async def log_incoming_private_message(update, _context) -> None:
+ message = getattr(update, "message", None)
+ chat = getattr(update, "effective_chat", None)
+ if message is None or chat is None:
+ return
+ logger.info(
+ "Incoming Telegram message chat=%s message_id=%s types=%s text_preview=%.120r",
+ chat.id,
+ getattr(message, "message_id", None),
+ ",".join(_describe_message_types(message)) or "unknown",
+ getattr(message, "text", None) or "",
+ )
+
+ app.add_handler(MessageHandler(allowed_private, log_incoming_private_message, block=False), group=-1)
app.add_handler(CommandHandler("provider", router.handle_provider, filters=allowed_private))
app.add_handler(CommandHandler("project", router.handle_project, filters=allowed_private))
app.add_handler(CommandHandler("branch", router.handle_branch, filters=allowed_private))
@@ -127,6 +159,8 @@ def build_application(token: str, router: CommandRouter, *, allowed_chat_ids: se
app.add_handler(CallbackQueryHandler(router.handle_push_callback, pattern=r"^push:(confirm|cancel)$"))
app.add_handler(CallbackQueryHandler(router.handle_trust_project_callback, pattern=r"^trustproject:(yes|no):"))
app.add_handler(MessageHandler(allowed_private & tg_filters.PHOTO, router.handle_photo, block=False))
+ app.add_handler(MessageHandler(allowed_private & tg_filters.AUDIO, router.handle_audio, block=False))
+ app.add_handler(MessageHandler(allowed_private & tg_filters.VOICE, router.handle_voice, block=False))
app.add_handler(MessageHandler(allowed_private & tg_filters.TEXT & ~tg_filters.COMMAND, router.handle_message, block=False))
app.add_handler(MessageHandler(allowed_private & unsupported_media, router.handle_unsupported_message))
app.add_error_handler(build_error_handler(router.deps.cfg.locale))
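Outside the diff, the behaviour of the `_describe_message_types` helper added above can be illustrated with a stand-in message object (a `SimpleNamespace` here, since a real Telegram `Message` is not needed to show the logic):

```python
from types import SimpleNamespace

def describe_message_types(message) -> list[str]:
    # Mirror of the helper added in bot.py: collect the names of the
    # Telegram message fields that are actually populated (truthy).
    types: list[str] = []
    for field_name in (
        "text", "photo", "voice", "audio", "document",
        "video", "video_note", "animation", "sticker",
    ):
        if getattr(message, field_name, None):
            types.append(field_name)
    return types

# A voice note: empty text is falsy and therefore not reported.
msg = SimpleNamespace(text="", voice=object(), photo=None)
print(describe_message_types(msg))  # ['voice']
```

Because `getattr(..., None)` tolerates missing attributes, the helper works on any message shape and simply reports `unknown` upstream when nothing matches.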
diff --git a/src/coding_agent_telegram/cli.py b/src/coding_agent_telegram/cli.py
index 2debcd3..3cf2299 100644
--- a/src/coding_agent_telegram/cli.py
+++ b/src/coding_agent_telegram/cli.py
@@ -14,6 +14,7 @@
from coding_agent_telegram.i18n import translate
from coding_agent_telegram.logging_utils import setup_logging
from coding_agent_telegram.session_store import SessionStore
+from coding_agent_telegram.stt_setup import ensure_stt_runtime_or_exit, offer_stt_install_for_new_env
logger = logging.getLogger(__name__)
@@ -123,6 +124,11 @@ def main() -> None:
),
file=sys.stderr,
)
+ offer_stt_install_for_new_env(
+ env_file=str(env_path),
+ python_bin=sys.executable,
+ installer_label="coding-agent-telegram-stt-install",
+ )
try:
cfg = load_config(env_path)
except ValueError as exc:
@@ -140,6 +146,11 @@ def main() -> None:
log_file = setup_logging(cfg.log_level, cfg.log_dir)
logger.info("Logging to %s", log_file)
+ try:
+ ensure_stt_runtime_or_exit(cfg.enable_openai_whisper_speech_to_text)
+ except SystemExit as exc:
+ logger.error("%s", exc)
+ raise
store = SessionStore(cfg.state_file, cfg.state_backup_file)
runner = MultiAgentRunner(
diff --git a/src/coding_agent_telegram/config.py b/src/coding_agent_telegram/config.py
index 59c435a..be8eb09 100644
--- a/src/coding_agent_telegram/config.py
+++ b/src/coding_agent_telegram/config.py
@@ -21,6 +21,8 @@
DEFAULT_ENV_FILE_NAME = ".env_coding_agent_telegram"
# 0 = disabled. Set to a positive value to kill runaway agent processes.
DEFAULT_AGENT_HARD_TIMEOUT_SECONDS = 0
+DEFAULT_OPENAI_WHISPER_MODEL = "base"
+DEFAULT_OPENAI_WHISPER_TIMEOUT_SECONDS = 120
@dataclass(frozen=True)
@@ -51,6 +53,9 @@ class AppConfig:
max_telegram_message_length: int
enable_sensitive_diff_filter: bool
enable_secret_scrub_filter: bool
+ enable_openai_whisper_speech_to_text: bool
+ openai_whisper_model: str
+ openai_whisper_timeout_seconds: int
default_agent_provider: str
agent_hard_timeout_seconds: int
app_internal_root: Path
@@ -227,6 +232,15 @@ def load_config(env_file: Optional[Path] = None) -> AppConfig:
),
enable_sensitive_diff_filter=_parse_bool(os.getenv("ENABLE_SENSITIVE_DIFF_FILTER", "true"), default=True),
enable_secret_scrub_filter=_parse_bool(os.getenv("ENABLE_SECRET_SCRUB_FILTER", "true"), default=True),
+ enable_openai_whisper_speech_to_text=_parse_bool(
+ os.getenv("ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT", "false")
+ ),
+ openai_whisper_model=os.getenv("OPENAI_WHISPER_MODEL", DEFAULT_OPENAI_WHISPER_MODEL).strip()
+ or DEFAULT_OPENAI_WHISPER_MODEL,
+ openai_whisper_timeout_seconds=max(
+ 1,
+ int(os.getenv("OPENAI_WHISPER_TIMEOUT_SECONDS", str(DEFAULT_OPENAI_WHISPER_TIMEOUT_SECONDS))),
+ ),
default_agent_provider=provider,
agent_hard_timeout_seconds=int(
os.getenv("AGENT_HARD_TIMEOUT_SECONDS", str(DEFAULT_AGENT_HARD_TIMEOUT_SECONDS))
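The timeout parsing above clamps the configured value to a minimum of 1 second via `max(1, int(...))`. A standalone sketch of that pattern, with a simplified boolean parser standing in for `_parse_bool` (whose exact semantics are not visible in this diff):

```python
import os

def parse_bool(value: str, default: bool = False) -> bool:
    # Simplified stand-in for config.py's _parse_bool; assumed behaviour.
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes", "on"):
        return True
    if normalized in ("false", "0", "no", "off"):
        return False
    return default

# A deliberately invalid (too low) timeout gets clamped to 1.
os.environ["OPENAI_WHISPER_TIMEOUT_SECONDS"] = "0"
timeout = max(1, int(os.getenv("OPENAI_WHISPER_TIMEOUT_SECONDS", "120")))
enabled = parse_bool(os.getenv("ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT", "false"))
print(timeout, enabled)  # 1 False
```

The clamp keeps a misconfigured `OPENAI_WHISPER_TIMEOUT_SECONDS=0` from disabling transcription outright, at the cost of silently rewriting the operator's value.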
diff --git a/src/coding_agent_telegram/resources/.env.example b/src/coding_agent_telegram/resources/.env.example
index eac4569..582d866 100644
--- a/src/coding_agent_telegram/resources/.env.example
+++ b/src/coding_agent_telegram/resources/.env.example
@@ -90,6 +90,22 @@ ENABLE_SENSITIVE_DIFF_FILTER=true
# Strongly recommended: keep this set to true.
ENABLE_SECRET_SCRUB_FILTER=true
+# If true, enable Telegram voice-message speech-to-text through local openai-whisper.
+# Default: false. Run coding-agent-telegram-stt-install (pip install) or ./install-stt.sh (repo clone) first.
+# Estimated local footprint: openai-whisper package ~50 MB, ffmpeg ~50 MB, plus Whisper model downloads.
+# Example model cache sizes: tiny ~72 MB, base ~139 MB, large-v3-turbo ~1.5 GB.
+ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=false
+
+# Whisper model name to use for Telegram voice-message speech-to-text.
+# Recommended default: base. `turbo` downloads the large-v3-turbo model (~1.5 GB).
+# Models download automatically on first use into ~/.cache/whisper.
+# If the selected model is not cached yet, the first voice transcription may take longer.
+# With `turbo`, that first call is more likely to hit OPENAI_WHISPER_TIMEOUT_SECONDS before the download finishes.
+OPENAI_WHISPER_MODEL=base
+
+# Timeout for a single Whisper transcription call, in seconds.
+OPENAI_WHISPER_TIMEOUT_SECONDS=120
+
# Default agent provider for new sessions: codex or copilot.
DEFAULT_AGENT_PROVIDER=codex
diff --git a/src/coding_agent_telegram/resources/locales/de.json b/src/coding_agent_telegram/resources/locales/de.json
index 0efee11..1d48a75 100644
--- a/src/coding_agent_telegram/resources/locales/de.json
+++ b/src/coding_agent_telegram/resources/locales/de.json
@@ -26,7 +26,8 @@
"git.usage_push": "Verwendung: /push",
"message.photo_only_codex": "Fotoanhänge werden derzeit nur für Codex-Sitzungen unterstützt.",
"message.question_queued": "Frage als Q{question_number} in die Warteschlange gestellt. Sie wird verarbeitet, sobald die aktuelle Agent-Aufgabe abgeschlossen ist.",
- "message.unsupported_message_type": "Nicht unterstützter Nachrichtentyp.\nDieser Bot akzeptiert derzeit nur Textnachrichten und Fotos.",
+ "message.voice_speech_to_text_disabled": "Sprachnachrichten sind nicht aktiviert.\nSetze ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true und installiere zuerst die lokalen Whisper-Voraussetzungen.",
+ "message.unsupported_message_type": "Nicht unterstützter Nachrichtentyp.\nDieser Bot akzeptiert derzeit Textnachrichten, Fotos, Sprachnachrichten und Audiodateien.",
"queue.button_group": "Fragen gruppieren",
"queue.button_no": "Nein",
"queue.button_single": "Einzeln verarbeiten",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "Das Fortsetzen ist fehlgeschlagen, daher wurde eine neue Sitzung erstellt.\nNeue Sitzungs-ID: {session_id}\nNeuer Sitzungsname: {session_name}",
"runtime.resume_id_changed": "Das Fortsetzen war erfolgreich, aber die Sitzungs-ID hat sich geändert.\nNeue Sitzungs-ID: {session_id}\nNeuer Sitzungsname: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nDiese Datei enthält sensible Inhalte und wurde ausgelassen.",
+ "runtime.voice_conversion_failed": "Sprachumwandlung fehlgeschlagen.",
+ "runtime.voice_conversion_timed_out": "Zeitlimit für Sprachumwandlung erreicht.",
+ "runtime.voice_model_initial_download_note": "Das gewählte Whisper-Modell wird beim ersten Aufruf möglicherweise noch heruntergeladen. Größere Modelle wie turbo erreichen dieses Zeitlimit eher.",
+ "runtime.voice_transcript_preview": "Erkanntes Sprachtranskript:\n{transcript}\n\nWird bearbeitet...",
+ "runtime.voice_transcript_queued_preview": "Erkanntes Sprachtranskript:\n{transcript}\n\nAls Q{question_number} in die Warteschlange gestellt. Es wird verarbeitet, sobald die aktuelle Agent-Aufgabe abgeschlossen ist.",
"runtime.working_on_it": "Wird bearbeitet...",
"status.abort_signal_sent": "Abbruchsignal für den aktuellen Projektlauf gesendet.",
"status.no_running_agent": "Für das aktuelle Projekt wurde kein laufender Agent-Prozess gefunden.",
diff --git a/src/coding_agent_telegram/resources/locales/en.json b/src/coding_agent_telegram/resources/locales/en.json
index 7748e71..c1e1ba0 100644
--- a/src/coding_agent_telegram/resources/locales/en.json
+++ b/src/coding_agent_telegram/resources/locales/en.json
@@ -26,7 +26,8 @@
"git.usage_push": "Usage: /push",
"message.photo_only_codex": "Photo attachments are currently supported only for codex sessions.",
"message.question_queued": "Question queued as Q{question_number}. It will run after the current agent task finishes.",
- "message.unsupported_message_type": "Unsupported message type.\nThis bot currently accepts only text messages and photos.",
+ "message.voice_speech_to_text_disabled": "Voice messages are not enabled.\nSet ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true and install the local Whisper prerequisites first.",
+ "message.unsupported_message_type": "Unsupported message type.\nThis bot currently accepts text messages, photos, voice messages, and audio files.",
"queue.button_group": "Group the questions",
"queue.button_cancel": "Cancel",
"queue.button_no": "No",
@@ -56,6 +57,12 @@
"runtime.resume_created_new": "Resume failed, so a new session was created.\nNew session ID: {session_id}\nNew session name: {session_name}",
"runtime.resume_id_changed": "Resume succeeded, but the session ID changed.\nNew session ID: {session_id}\nNew session name: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nThis file contains sensitive content and was omitted.",
+ "runtime.voice_conversion_failed": "Voice conversion failed.",
+ "runtime.voice_conversion_timed_out": "Voice conversion timed out.",
+ "runtime.voice_audio_too_large": "Audio is too large for local speech-to-text. The maximum supported size is {max_size_mb} MB.",
+ "runtime.voice_model_initial_download_note": "The selected Whisper model may still be downloading on first use. Larger models such as turbo are more likely to hit this timeout.",
+ "runtime.voice_transcript_preview": "Recognized voice transcript:\n{transcript}\n\nWorking on it...",
+ "runtime.voice_transcript_queued_preview": "Recognized voice transcript:\n{transcript}\n\nQueued as Q{question_number}. It will run after the current agent task finishes.",
"runtime.working_on_it": "Working on it...",
"status.abort_signal_sent": "Abort signal sent for the current project run.",
"status.no_running_agent": "No running agent process was found for the current project.",
diff --git a/src/coding_agent_telegram/resources/locales/fr.json b/src/coding_agent_telegram/resources/locales/fr.json
index 2fb4187..9700b88 100644
--- a/src/coding_agent_telegram/resources/locales/fr.json
+++ b/src/coding_agent_telegram/resources/locales/fr.json
@@ -26,7 +26,8 @@
"git.usage_push": "Utilisation : /push",
"message.photo_only_codex": "Les pièces jointes photo sont actuellement prises en charge uniquement pour les sessions Codex.",
"message.question_queued": "Question mise en file d’attente sous Q{question_number}. Elle sera traitée une fois la tâche actuelle terminée.",
- "message.unsupported_message_type": "Type de message non pris en charge.\nCe bot accepte actuellement uniquement les messages texte et les photos.",
+ "message.voice_speech_to_text_disabled": "Les messages vocaux ne sont pas activés.\nDéfinissez ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true et installez d'abord les prérequis locaux de Whisper.",
+ "message.unsupported_message_type": "Type de message non pris en charge.\nCe bot accepte actuellement les messages texte, les photos, les messages vocaux et les fichiers audio.",
"queue.button_group": "Regrouper les questions",
"queue.button_no": "Non",
"queue.button_single": "Traiter une par une",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "La reprise a échoué, donc une nouvelle session a été créée.\nNouvel ID de session : {session_id}\nNouveau nom de session : {session_name}",
"runtime.resume_id_changed": "La reprise a réussi, mais l’ID de session a changé.\nNouvel ID de session : {session_id}\nNouveau nom de session : {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nCe fichier contient des données sensibles et a été omis.",
+ "runtime.voice_conversion_failed": "La conversion vocale a échoué.",
+ "runtime.voice_conversion_timed_out": "La conversion vocale a dépassé le délai.",
+ "runtime.voice_model_initial_download_note": "Le modèle Whisper sélectionné est peut-être encore en cours de téléchargement lors du premier usage. Les modèles plus volumineux comme turbo risquent davantage d’atteindre ce délai.",
+ "runtime.voice_transcript_preview": "Transcription vocale reconnue :\n{transcript}\n\nTraitement en cours...",
+ "runtime.voice_transcript_queued_preview": "Transcription vocale reconnue :\n{transcript}\n\nMise en file d’attente sous Q{question_number}. Elle sera traitée une fois la tâche actuelle terminée.",
"runtime.working_on_it": "Traitement en cours...",
"status.abort_signal_sent": "Signal d’arrêt envoyé pour l’exécution actuelle du projet.",
"status.no_running_agent": "Aucun processus d’agent en cours n’a été trouvé pour le projet actuel.",
diff --git a/src/coding_agent_telegram/resources/locales/ja.json b/src/coding_agent_telegram/resources/locales/ja.json
index c9aa776..6524730 100644
--- a/src/coding_agent_telegram/resources/locales/ja.json
+++ b/src/coding_agent_telegram/resources/locales/ja.json
@@ -26,7 +26,8 @@
"git.usage_push": "使い方: /push",
"message.photo_only_codex": "写真添付は現在 Codex セッションでのみサポートされています。",
"message.question_queued": "質問は Q{question_number} としてキューに追加されました。現在のエージェント処理が終わった後に実行されます。",
- "message.unsupported_message_type": "未対応のメッセージ種類です。\nこのボットは現在、テキストメッセージと写真のみ受け付けます。",
+ "message.voice_speech_to_text_disabled": "音声メッセージは有効になっていません。\nENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true を設定し、先にローカル Whisper の前提条件をインストールしてください。",
+ "message.unsupported_message_type": "未対応のメッセージ種類です。\nこのボットは現在、テキストメッセージ、写真、音声メッセージ、音声ファイルを受け付けます。",
"queue.button_group": "質問をまとめる",
"queue.button_no": "いいえ",
"queue.button_single": "1つずつ処理",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "再開に失敗したため、新しいセッションを作成しました。\n新しいセッション ID: {session_id}\n新しいセッション名: {session_name}",
"runtime.resume_id_changed": "再開には成功しましたが、セッション ID が変わりました。\n新しいセッション ID: {session_id}\n新しいセッション名: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nこのファイルには機密内容が含まれているため省略されました。",
+ "runtime.voice_conversion_failed": "音声の変換に失敗しました。",
+ "runtime.voice_conversion_timed_out": "音声の変換がタイムアウトしました。",
+ "runtime.voice_model_initial_download_note": "選択した Whisper モデルは初回利用時にまだダウンロード中の可能性があります。turbo のような大きなモデルはこのタイムアウトに達しやすくなります。",
+ "runtime.voice_transcript_preview": "認識された音声文字起こし:\n{transcript}\n\n処理中です...",
+ "runtime.voice_transcript_queued_preview": "認識された音声文字起こし:\n{transcript}\n\nQ{question_number} としてキューに追加されました。現在のエージェント処理が終わった後に実行されます。",
"runtime.working_on_it": "処理中です...",
"status.abort_signal_sent": "現在のプロジェクト実行に中止シグナルを送信しました。",
"status.no_running_agent": "現在のプロジェクトで実行中のエージェントプロセスは見つかりませんでした。",
diff --git a/src/coding_agent_telegram/resources/locales/ko.json b/src/coding_agent_telegram/resources/locales/ko.json
index 5a18c0d..2680cd3 100644
--- a/src/coding_agent_telegram/resources/locales/ko.json
+++ b/src/coding_agent_telegram/resources/locales/ko.json
@@ -26,7 +26,8 @@
"git.usage_push": "사용법: /push",
"message.photo_only_codex": "사진 첨부는 현재 Codex 세션에서만 지원됩니다.",
"message.question_queued": "질문이 Q{question_number} 로 대기열에 추가되었습니다. 현재 에이전트 작업이 끝난 뒤 처리됩니다.",
- "message.unsupported_message_type": "지원되지 않는 메시지 유형입니다.\n이 봇은 현재 텍스트 메시지와 사진만 받습니다.",
+ "message.voice_speech_to_text_disabled": "음성 메시지가 활성화되어 있지 않습니다.\nENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true 를 설정하고 먼저 로컬 Whisper 필수 요소를 설치하세요.",
+ "message.unsupported_message_type": "지원되지 않는 메시지 유형입니다.\n이 봇은 현재 텍스트 메시지, 사진, 음성 메시지, 오디오 파일을 받습니다.",
"queue.button_group": "질문 묶기",
"queue.button_no": "아니요",
"queue.button_single": "하나씩 처리",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "재개에 실패하여 새 세션이 생성되었습니다.\n새 세션 ID: {session_id}\n새 세션 이름: {session_name}",
"runtime.resume_id_changed": "재개에는 성공했지만 세션 ID가 변경되었습니다.\n새 세션 ID: {session_id}\n새 세션 이름: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\n이 파일에는 민감한 내용이 포함되어 있어 생략되었습니다.",
+ "runtime.voice_conversion_failed": "음성 변환에 실패했습니다.",
+ "runtime.voice_conversion_timed_out": "음성 변환 시간이 초과되었습니다.",
+ "runtime.voice_model_initial_download_note": "선택한 Whisper 모델이 첫 사용 시 아직 다운로드 중일 수 있습니다. turbo 같은 큰 모델은 이 시간 제한에 더 걸리기 쉽습니다.",
+ "runtime.voice_transcript_preview": "인식된 음성 전사:\n{transcript}\n\n처리 중입니다...",
+ "runtime.voice_transcript_queued_preview": "인식된 음성 전사:\n{transcript}\n\nQ{question_number} 로 대기열에 추가되었습니다. 현재 에이전트 작업이 끝난 뒤 처리됩니다.",
"runtime.working_on_it": "처리 중...",
"status.abort_signal_sent": "현재 프로젝트 실행에 중단 신호를 보냈습니다.",
"status.no_running_agent": "현재 프로젝트에서 실행 중인 에이전트 프로세스를 찾지 못했습니다.",
diff --git a/src/coding_agent_telegram/resources/locales/nl.json b/src/coding_agent_telegram/resources/locales/nl.json
index ca860b9..401388b 100644
--- a/src/coding_agent_telegram/resources/locales/nl.json
+++ b/src/coding_agent_telegram/resources/locales/nl.json
@@ -26,7 +26,8 @@
"git.usage_push": "Gebruik: /push",
"message.photo_only_codex": "Foto-bijlagen worden momenteel alleen ondersteund voor Codex-sessies.",
"message.question_queued": "Vraag in de wachtrij geplaatst als Q{question_number}. Deze wordt verwerkt nadat de huidige agenttaak is voltooid.",
- "message.unsupported_message_type": "Niet-ondersteund berichttype.\nDeze bot accepteert momenteel alleen tekstberichten en foto's.",
+ "message.voice_speech_to_text_disabled": "Spraakberichten zijn niet ingeschakeld.\nZet ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true en installeer eerst de lokale Whisper-vereisten.",
+ "message.unsupported_message_type": "Niet-ondersteund berichttype.\nDeze bot accepteert momenteel tekstberichten, foto's, spraakberichten en audiobestanden.",
"queue.button_group": "Vragen groeperen",
"queue.button_no": "Nee",
"queue.button_single": "Eén voor één verwerken",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "Hervatten is mislukt, daarom is een nieuwe sessie gemaakt.\nNieuwe sessie-ID: {session_id}\nNieuwe sessienaam: {session_name}",
"runtime.resume_id_changed": "Hervatten is gelukt, maar de sessie-ID is gewijzigd.\nNieuwe sessie-ID: {session_id}\nNieuwe sessienaam: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nDit bestand bevat gevoelige inhoud en is weggelaten.",
+ "runtime.voice_conversion_failed": "Spraakconversie mislukt.",
+ "runtime.voice_conversion_timed_out": "Time-out tijdens spraakconversie.",
+ "runtime.voice_model_initial_download_note": "Het gekozen Whisper-model wordt bij het eerste gebruik mogelijk nog gedownload. Grotere modellen zoals turbo lopen eerder tegen deze time-out aan.",
+ "runtime.voice_transcript_preview": "Herkend spraaktranscript:\n{transcript}\n\nBezig...",
+ "runtime.voice_transcript_queued_preview": "Herkend spraaktranscript:\n{transcript}\n\nIn de wachtrij geplaatst als Q{question_number}. Dit wordt verwerkt nadat de huidige agenttaak is voltooid.",
"runtime.working_on_it": "Bezig...",
"status.abort_signal_sent": "Afbreeksignaal verzonden voor de huidige projectrun.",
"status.no_running_agent": "Er is geen draaiend agentproces gevonden voor het huidige project.",
diff --git a/src/coding_agent_telegram/resources/locales/th.json b/src/coding_agent_telegram/resources/locales/th.json
index 9c4416e..449a3e0 100644
--- a/src/coding_agent_telegram/resources/locales/th.json
+++ b/src/coding_agent_telegram/resources/locales/th.json
@@ -26,7 +26,8 @@
"git.usage_push": "วิธีใช้: /push",
"message.photo_only_codex": "ขณะนี้รองรับไฟล์แนบรูปภาพเฉพาะสำหรับเซสชัน Codex เท่านั้น",
"message.question_queued": "จัดคิวคำถามเป็น Q{question_number} แล้ว จะประมวลผลหลังจากงานเอเจนต์ปัจจุบันเสร็จสิ้น",
- "message.unsupported_message_type": "ประเภทข้อความไม่รองรับ\nขณะนี้บอตนี้รองรับเฉพาะข้อความตัวอักษรและรูปภาพเท่านั้น",
+ "message.unsupported_message_type": "ประเภทข้อความไม่รองรับ\nขณะนี้บอตนี้รองรับข้อความตัวอักษร รูปภาพ ข้อความเสียง และไฟล์เสียง",
+ "message.voice_speech_to_text_disabled": "ยังไม่ได้เปิดใช้งานข้อความเสียง\nตั้งค่า ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true และติดตั้งส่วนที่ Whisper ต้องใช้ในเครื่องก่อน",
"queue.button_group": "รวมคำถาม",
"queue.button_no": "ไม่",
"queue.button_single": "ประมวลผลทีละข้อ",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "กลับมาทำงานต่อไม่สำเร็จ จึงสร้างเซสชันใหม่แทน\nSession ID ใหม่: {session_id}\nชื่อเซสชันใหม่: {session_name}",
"runtime.resume_id_changed": "กลับมาทำงานต่อได้สำเร็จ แต่ session ID เปลี่ยนไป\nSession ID ใหม่: {session_id}\nชื่อเซสชันใหม่: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nไฟล์นี้มีข้อมูลสำคัญจึงถูกละไว้",
+ "runtime.voice_conversion_failed": "แปลงเสียงเป็นข้อความไม่สำเร็จ",
+ "runtime.voice_conversion_timed_out": "การแปลงเสียงเป็นข้อความหมดเวลา",
+ "runtime.voice_model_initial_download_note": "โมเดล Whisper ที่เลือกอาจกำลังดาวน์โหลดอยู่ในการใช้งานครั้งแรก โมเดลขนาดใหญ่เช่น turbo มีโอกาสเจอ timeout นี้มากกว่า",
+ "runtime.voice_transcript_preview": "ข้อความที่ถอดจากเสียง:\n{transcript}\n\nกำลังดำเนินการ...",
+ "runtime.voice_transcript_queued_preview": "ข้อความที่ถอดจากเสียง:\n{transcript}\n\nจัดคิวเป็น Q{question_number} แล้ว จะประมวลผลหลังจากงานเอเจนต์ปัจจุบันเสร็จสิ้น",
"runtime.working_on_it": "กำลังดำเนินการ...",
"status.abort_signal_sent": "ส่งสัญญาณยกเลิกสำหรับการทำงานของโปรเจ็กต์ปัจจุบันแล้ว",
"status.no_running_agent": "ไม่พบโปรเซสเอเจนต์ที่กำลังทำงานสำหรับโปรเจ็กต์ปัจจุบัน",
diff --git a/src/coding_agent_telegram/resources/locales/vi.json b/src/coding_agent_telegram/resources/locales/vi.json
index 6553de1..80e7fd9 100644
--- a/src/coding_agent_telegram/resources/locales/vi.json
+++ b/src/coding_agent_telegram/resources/locales/vi.json
@@ -26,7 +26,8 @@
"git.usage_push": "Cách dùng: /push",
"message.photo_only_codex": "Hiện tại tệp đính kèm ảnh chỉ được hỗ trợ cho các phiên Codex.",
"message.question_queued": "Câu hỏi đã được xếp hàng dưới dạng Q{question_number}. Nó sẽ được xử lý sau khi tác vụ hiện tại của tác nhân hoàn tất.",
- "message.unsupported_message_type": "Loại tin nhắn không được hỗ trợ.\nBot này hiện chỉ chấp nhận tin nhắn văn bản và ảnh.",
+ "message.unsupported_message_type": "Loại tin nhắn không được hỗ trợ.\nBot này hiện chấp nhận tin nhắn văn bản, ảnh, tin nhắn thoại và tệp âm thanh.",
+ "message.voice_speech_to_text_disabled": "Tin nhắn thoại chưa được bật.\nHãy đặt ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true và cài đặt trước các điều kiện cần cục bộ của Whisper.",
"queue.button_group": "Gộp các câu hỏi",
"queue.button_no": "Không",
"queue.button_single": "Xử lý từng câu một",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "Tiếp tục thất bại, vì vậy một phiên mới đã được tạo.\nID phiên mới: {session_id}\nTên phiên mới: {session_name}",
"runtime.resume_id_changed": "Tiếp tục thành công, nhưng ID phiên đã thay đổi.\nID phiên mới: {session_id}\nTên phiên mới: {session_name}",
"runtime.sensitive_diff_omitted": "{path}\nTệp này chứa nội dung nhạy cảm và đã bị lược bỏ.",
+ "runtime.voice_conversion_failed": "Chuyển giọng nói thành văn bản thất bại.",
+ "runtime.voice_conversion_timed_out": "Chuyển giọng nói thành văn bản đã hết thời gian chờ.",
+ "runtime.voice_model_initial_download_note": "Model Whisper đã chọn có thể vẫn đang được tải xuống ở lần dùng đầu tiên. Các model lớn như turbo dễ chạm mốc timeout này hơn.",
+ "runtime.voice_transcript_preview": "Bản chép lời giọng nói đã nhận dạng:\n{transcript}\n\nĐang xử lý...",
+ "runtime.voice_transcript_queued_preview": "Bản chép lời giọng nói đã nhận dạng:\n{transcript}\n\nCâu hỏi đã được xếp hàng dưới dạng Q{question_number}. Nó sẽ được xử lý sau khi tác vụ hiện tại của tác nhân hoàn tất.",
"runtime.working_on_it": "Đang xử lý...",
"status.abort_signal_sent": "Đã gửi tín hiệu hủy cho lần chạy hiện tại của dự án.",
"status.no_running_agent": "Không tìm thấy tiến trình tác nhân đang chạy cho dự án hiện tại.",
diff --git a/src/coding_agent_telegram/resources/locales/zh-CN.json b/src/coding_agent_telegram/resources/locales/zh-CN.json
index 8268d6d..65c2959 100644
--- a/src/coding_agent_telegram/resources/locales/zh-CN.json
+++ b/src/coding_agent_telegram/resources/locales/zh-CN.json
@@ -26,7 +26,8 @@
"git.usage_push": "用法:/push",
"message.photo_only_codex": "当前仅 Codex 会话支持图片附件。",
"message.question_queued": "问题已加入队列,编号为 Q{question_number}。当前代理任务完成后将开始处理。",
- "message.unsupported_message_type": "不支持的消息类型。\n此 bot 当前仅接受文本消息和图片。",
+ "message.unsupported_message_type": "不支持的消息类型。\n此 bot 当前接受文本消息、图片、语音消息和音频文件。",
+ "message.voice_speech_to_text_disabled": "语音消息功能尚未启用。\n请先设置 ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true,并安装本地 Whisper 依赖。",
"queue.button_group": "合并问题",
"queue.button_no": "否",
"queue.button_single": "逐个处理",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "恢复失败,因此已创建一个新会话。\n新的会话 ID:{session_id}\n新的会话名称:{session_name}",
"runtime.resume_id_changed": "恢复成功,但会话 ID 已更改。\n新的会话 ID:{session_id}\n新的会话名称:{session_name}",
"runtime.sensitive_diff_omitted": "{path}\n此文件包含敏感内容,已省略。",
+ "runtime.voice_conversion_failed": "语音转换失败。",
+ "runtime.voice_conversion_timed_out": "语音转换超时。",
+ "runtime.voice_model_initial_download_note": "所选 Whisper 模型在首次使用时可能仍在下载。像 turbo 这样更大的模型更容易触发这个超时。",
+ "runtime.voice_transcript_preview": "识别出的语音文本:\n{transcript}\n\n正在处理...",
+ "runtime.voice_transcript_queued_preview": "识别出的语音文本:\n{transcript}\n\n问题已加入队列,编号为 Q{question_number}。当前代理任务完成后将开始处理。",
"runtime.working_on_it": "正在处理...",
"status.abort_signal_sent": "已向当前项目运行发送中止信号。",
"status.no_running_agent": "当前项目未找到正在运行的代理进程。",
diff --git a/src/coding_agent_telegram/resources/locales/zh-HK.json b/src/coding_agent_telegram/resources/locales/zh-HK.json
index 5aaf6b5..8c4332f 100644
--- a/src/coding_agent_telegram/resources/locales/zh-HK.json
+++ b/src/coding_agent_telegram/resources/locales/zh-HK.json
@@ -26,7 +26,8 @@
"git.usage_push": "用法:/push",
"message.photo_only_codex": "目前只有 Codex 工作階段支援圖片附件。",
"message.question_queued": "問題已加入佇列,編號為 Q{question_number}。目前代理工作完成後會開始處理。",
- "message.unsupported_message_type": "不支援的訊息類型。\n此 bot 目前只接受文字訊息與圖片。",
+ "message.unsupported_message_type": "不支援的訊息類型。\n此 bot 目前接受文字訊息、圖片、語音訊息與音訊檔案。",
+ "message.voice_speech_to_text_disabled": "語音訊息功能尚未啟用。\n請先設定 ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true,並安裝本機 Whisper 依賴。",
"queue.button_group": "合併問題",
"queue.button_no": "否",
"queue.button_single": "逐一處理",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "恢復失敗,因此已建立新的工作階段。\n新的工作階段 ID:{session_id}\n新的工作階段名稱:{session_name}",
"runtime.resume_id_changed": "恢復成功,但工作階段 ID 已變更。\n新的工作階段 ID:{session_id}\n新的工作階段名稱:{session_name}",
"runtime.sensitive_diff_omitted": "{path}\n此檔案包含敏感內容,已省略。",
+ "runtime.voice_conversion_failed": "語音轉換失敗。",
+ "runtime.voice_conversion_timed_out": "語音轉換逾時。",
+ "runtime.voice_model_initial_download_note": "所選的 Whisper 模型在首次使用時可能仍在下載。像 turbo 這類較大的模型更容易觸發這個逾時。",
+ "runtime.voice_transcript_preview": "辨識出的語音文字:\n{transcript}\n\n處理中...",
+ "runtime.voice_transcript_queued_preview": "辨識出的語音文字:\n{transcript}\n\n問題已加入佇列,編號為 Q{question_number}。目前代理工作完成後會開始處理。",
"runtime.working_on_it": "處理中...",
"status.abort_signal_sent": "已向目前專案執行送出中止訊號。",
"status.no_running_agent": "目前專案找不到正在執行的代理程序。",
diff --git a/src/coding_agent_telegram/resources/locales/zh-TW.json b/src/coding_agent_telegram/resources/locales/zh-TW.json
index 399a65a..ef3dc35 100644
--- a/src/coding_agent_telegram/resources/locales/zh-TW.json
+++ b/src/coding_agent_telegram/resources/locales/zh-TW.json
@@ -26,7 +26,8 @@
"git.usage_push": "用法:/push",
"message.photo_only_codex": "目前只有 Codex 工作階段支援圖片附件。",
"message.question_queued": "問題已加入佇列,編號為 Q{question_number}。目前代理工作完成後會開始處理。",
- "message.unsupported_message_type": "不支援的訊息類型。\n此 bot 目前只接受文字訊息與圖片。",
+ "message.unsupported_message_type": "不支援的訊息類型。\n此 bot 目前接受文字訊息、圖片、語音訊息與音訊檔案。",
+ "message.voice_speech_to_text_disabled": "語音訊息功能尚未啟用。\n請先設定 ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT=true,並安裝本機 Whisper 依賴。",
"queue.button_group": "合併問題",
"queue.button_no": "否",
"queue.button_single": "逐一處理",
@@ -54,6 +55,11 @@
"runtime.resume_created_new": "恢復失敗,因此已建立新的工作階段。\n新的工作階段 ID:{session_id}\n新的工作階段名稱:{session_name}",
"runtime.resume_id_changed": "恢復成功,但工作階段 ID 已變更。\n新的工作階段 ID:{session_id}\n新的工作階段名稱:{session_name}",
"runtime.sensitive_diff_omitted": "{path}\n此檔案包含敏感內容,已省略。",
+ "runtime.voice_conversion_failed": "語音轉換失敗。",
+ "runtime.voice_conversion_timed_out": "語音轉換逾時。",
+ "runtime.voice_model_initial_download_note": "所選的 Whisper 模型在首次使用時可能仍在下載。像 turbo 這類較大的模型更容易觸發這個逾時。",
+ "runtime.voice_transcript_preview": "辨識出的語音文字:\n{transcript}\n\n處理中...",
+ "runtime.voice_transcript_queued_preview": "辨識出的語音文字:\n{transcript}\n\n問題已加入佇列,編號為 Q{question_number}。目前代理工作完成後會開始處理。",
"runtime.working_on_it": "處理中...",
"status.abort_signal_sent": "已向目前專案執行送出中止訊號。",
"status.no_running_agent": "目前專案找不到正在執行的代理程序。",
diff --git a/src/coding_agent_telegram/router/base.py b/src/coding_agent_telegram/router/base.py
index 16e7817..17866f4 100644
--- a/src/coding_agent_telegram/router/base.py
+++ b/src/coding_agent_telegram/router/base.py
@@ -25,6 +25,7 @@
from coding_agent_telegram.i18n import translate
from coding_agent_telegram.session_runtime import PhotoAttachmentStore, SessionRuntime
from coding_agent_telegram.session_store import SessionStore
+from coding_agent_telegram.speech_to_text import WhisperSpeechToText
from coding_agent_telegram.telegram_sender import send_text
@@ -108,6 +109,7 @@ def __init__(self, deps: RouterDeps) -> None:
self.deps = deps
self.git = GitWorkspaceManager()
self.photo_attachments = PhotoAttachmentStore(deps.cfg.app_internal_root)
+ self.speech_to_text = WhisperSpeechToText(deps.cfg)
self.runtime = SessionRuntime(
cfg=deps.cfg,
store=deps.store,
@@ -224,6 +226,7 @@ async def _notify_if_current_project_busy(self, update: Update, context: Context
update,
context,
self._t(update, "common.project_busy", project_folder=project_folder),
+ reply_to_message_id=getattr(update.message, "message_id", None),
)
return True
@@ -245,6 +248,7 @@ async def _run_with_typing(self, update: Update, context: ContextTypes.DEFAULT_T
update,
context,
self._t(update, "common.agent_already_running", project_folder=workspace_lock_key),
+ reply_to_message_id=getattr(update.message, "message_id", None),
)
return None
async with lock:
@@ -350,8 +354,18 @@ async def publish(info: AgentProgressInfo) -> None:
text=message_text,
)
except BadRequest:
+ previous_message_id = progress_state["message_id"]
message = await context.bot.send_message(chat_id=chat.id, text=message_text)
message_id = getattr(message, "message_id", None)
+ if (
+ previous_message_id is not None
+ and previous_message_id != message_id
+ and hasattr(context.bot, "delete_message")
+ ):
+ try:
+ await context.bot.delete_message(chat_id=chat.id, message_id=previous_message_id)
+ except BadRequest:
+ pass
if progress_state.get("closed") and message_id is not None and hasattr(context.bot, "delete_message"):
try:
await context.bot.delete_message(chat_id=chat.id, message_id=message_id)
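The `BadRequest` fallback added above (post a fresh progress message, then delete the stale one so only a single progress bubble remains) can be exercised without Telegram. A minimal sketch with a stand-in bot; `FakeBot`, `publish_progress`, and the local `BadRequest` class are illustrative names, not part of this patch:

```python
import asyncio

class BadRequest(Exception):
    """Stand-in for telegram.error.BadRequest."""

class FakeBot:
    def __init__(self):
        self.sent, self.deleted = [], []
    async def edit_message_text(self, *, chat_id, message_id, text):
        # Simulate the stale-message case: the edit target no longer exists.
        raise BadRequest("message to edit not found")
    async def send_message(self, *, chat_id, text):
        self.sent.append(text)
        return type("Msg", (), {"message_id": 100 + len(self.sent)})()
    async def delete_message(self, *, chat_id, message_id):
        self.deleted.append(message_id)

async def publish_progress(bot, chat_id, state, text):
    # Mirrors the fallback path in the diff: when editing fails, send a
    # new progress message, record its id, and delete the previous one.
    try:
        await bot.edit_message_text(chat_id=chat_id, message_id=state["message_id"], text=text)
    except BadRequest:
        previous = state["message_id"]
        message = await bot.send_message(chat_id=chat_id, text=text)
        state["message_id"] = message.message_id
        if previous is not None and previous != message.message_id:
            try:
                await bot.delete_message(chat_id=chat_id, message_id=previous)
            except BadRequest:
                pass
```

Running `publish_progress` against a chat whose tracked progress message was removed leaves exactly one fresh message behind, which is the invariant the added deletion preserves.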
diff --git a/src/coding_agent_telegram/router/message_commands.py b/src/coding_agent_telegram/router/message_commands.py
index a514c35..23a7dd1 100644
--- a/src/coding_agent_telegram/router/message_commands.py
+++ b/src/coding_agent_telegram/router/message_commands.py
@@ -1,34 +1,61 @@
from __future__ import annotations
+import logging
+import tempfile
+from pathlib import Path
+
from telegram import Update
from telegram.ext import ContextTypes
from coding_agent_telegram.session_runtime import PhotoAttachmentError
+from coding_agent_telegram.speech_to_text import SpeechToTextError
from coding_agent_telegram.telegram_sender import send_text
from .base import require_allowed_chat
+logger = logging.getLogger(__name__)
+MAX_STT_AUDIO_BYTES = 20 * 1024 * 1024
+
+
class MessageCommandMixin:
- @require_allowed_chat()
- async def handle_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
- if update.message is None or not update.message.text:
- return
- user_message = update.message.text
+ async def _process_user_message(
+ self,
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ user_message: str,
+ *,
+ suppress_working_notice: bool = False,
+ ) -> None:
chat_id = update.effective_chat.id
- if self._is_project_busy(chat_id) or self._has_pending_queue_decision(chat_id):
- _queue_file, question_number = self._enqueue_chat_message(chat_id, user_message)
+ pending_action = self._pending_action(chat_id)
+ message_pending = isinstance(pending_action, dict) and pending_action.get("kind") == "message"
+ if self._is_project_busy(chat_id) or self._has_pending_queue_decision(chat_id) or message_pending:
+ _queue_file, question_number = self._enqueue_chat_message(
+ chat_id,
+ user_message,
+ reply_to_message_id=getattr(update.message, "message_id", None),
+ )
+ logger.info(
+ "Queued user message for chat %s as Q%s. Preview: %.120r",
+ chat_id,
+ question_number,
+ user_message,
+ )
await send_text(
update,
context,
self._t(update, "message.question_queued", question_number=question_number),
+ reply_to_message_id=getattr(update.message, "message_id", None),
)
return
+ logger.info("Processing user message immediately for chat %s. Preview: %.120r", chat_id, user_message)
self._store_pending_action(
chat_id,
{
"kind": "message",
"user_message": user_message,
+ "suppress_working_notice": suppress_working_notice,
},
)
try:
@@ -37,6 +64,12 @@ async def handle_message(self, update: Update, context: ContextTypes.DEFAULT_TYP
finally:
await self._drain_chat_message_queue(chat_id, context)
+ @require_allowed_chat()
+ async def handle_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+ if update.message is None or not update.message.text:
+ return
+ await self._process_user_message(update, context, update.message.text)
+
@require_allowed_chat()
async def handle_photo(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
if update.message is None or not update.message.photo:
@@ -60,8 +93,186 @@ async def handle_photo(self, update: Update, context: ContextTypes.DEFAULT_TYPE)
prompt = self.photo_attachments.build_prompt(attachment_path, project_path, caption)
await self.runtime.run_active_session(update, context, user_message=prompt, image_paths=(attachment_path,))
+ async def _handle_audio_like(
+ self,
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ telegram_media,
+ *,
+ media_kind: str,
+ ) -> None:
+ if update.message is None or telegram_media is None:
+ return
+ logger.info(
+ "Received Telegram %s message for speech-to-text in chat %s.",
+ media_kind,
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ )
+ if not self.speech_to_text.enabled:
+ await send_text(update, context, self._t(update, "message.voice_speech_to_text_disabled"))
+ return
+
+ suffix = Path(
+ getattr(telegram_media, "file_name", "") or getattr(telegram_media, "file_unique_id", "") or media_kind
+ ).suffix or ".ogg"
+ telegram_file = await telegram_media.get_file()
+ logger.debug(
+ "Speech-to-text input prepared for chat %s: media_kind=%s file_path=%r initial_suffix=%r model=%s timeout=%ss",
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ media_kind,
+ getattr(telegram_file, "file_path", None),
+ suffix,
+ self.speech_to_text.model,
+ self.speech_to_text.timeout_seconds,
+ )
+ if suffix == ".ogg" and getattr(telegram_file, "file_path", None):
+ resolved_suffix = Path(telegram_file.file_path).suffix.lower()
+ if resolved_suffix:
+ suffix = resolved_suffix
+
+ declared_size = getattr(telegram_media, "file_size", None)
+ if isinstance(declared_size, int) and declared_size > MAX_STT_AUDIO_BYTES:
+ await send_text(
+ update,
+ context,
+ self._t(
+ update,
+ "runtime.voice_audio_too_large",
+ max_size_mb=MAX_STT_AUDIO_BYTES // (1024 * 1024),
+ ),
+ )
+ return
+
+ with tempfile.NamedTemporaryFile(prefix="coding-agent-telegram-voice-", suffix=suffix, delete=False) as handle:
+ temp_path = Path(handle.name)
+ try:
+ content = bytes(await telegram_file.download_as_bytearray())
+ if len(content) > MAX_STT_AUDIO_BYTES:
+ await send_text(
+ update,
+ context,
+ self._t(
+ update,
+ "runtime.voice_audio_too_large",
+ max_size_mb=MAX_STT_AUDIO_BYTES // (1024 * 1024),
+ ),
+ )
+ return
+ temp_path.write_bytes(content)
+ logger.debug(
+ "Downloaded Telegram %s message for chat %s to %s (%s bytes).",
+ media_kind,
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ temp_path,
+ len(content),
+ )
+ result = await self._run_with_typing(
+ update,
+ context,
+ self.speech_to_text.transcribe_file,
+ temp_path,
+ )
+ except SpeechToTextError as exc:
+ logger.warning(
+ "Telegram %s speech-to-text failed for chat %s: code=%s detail=%s",
+ media_kind,
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ exc.code,
+ exc.detail or "(none)",
+ )
+ if exc.code == "timeout":
+ message = self._t(update, "runtime.voice_conversion_timed_out")
+ else:
+ message = self._t(update, "runtime.voice_conversion_failed")
+ if exc.likely_first_download:
+ message = f"{message}\n\n{self._t(update, 'runtime.voice_model_initial_download_note')}"
+ await send_text(update, context, message)
+ return
+ except Exception:
+ logger.exception(
+ "Unexpected Telegram %s speech-to-text failure for chat %s.",
+ media_kind,
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ )
+ await send_text(update, context, self._t(update, "runtime.voice_conversion_failed"))
+ return
+ finally:
+ temp_path.unlink(missing_ok=True)
+
+ if result is None:
+ return
+ chat_id = update.effective_chat.id
+ logger.info(
+ "Speech-to-text succeeded for Telegram %s message in chat %s. Transcript preview: %.120r",
+ media_kind,
+ chat_id,
+ result.text,
+ )
+ logger.debug(
+ "Transcript metadata for chat %s: media_kind=%s chars=%s reply_to_message_id=%s",
+ chat_id,
+ media_kind,
+ len(result.text),
+ getattr(update.message, "message_id", None),
+ )
+ pending_action = self._pending_action(chat_id)
+ message_pending = isinstance(pending_action, dict) and pending_action.get("kind") == "message"
+ if self._is_project_busy(chat_id) or self._has_pending_queue_decision(chat_id) or message_pending:
+ _queue_file, question_number = self._enqueue_chat_message(
+ chat_id,
+ result.text,
+ reply_to_message_id=getattr(update.message, "message_id", None),
+ )
+ logger.info(
+ "Queued transcript from Telegram %s message for chat %s as Q%s.",
+ media_kind,
+ chat_id,
+ question_number,
+ )
+ await send_text(
+ update,
+ context,
+ self._t(
+ update,
+ "runtime.voice_transcript_queued_preview",
+ transcript=result.text,
+ question_number=question_number,
+ ),
+ )
+ return
+ logger.info("Dispatching transcript from Telegram %s message immediately for chat %s.", media_kind, chat_id)
+ await send_text(
+ update,
+ context,
+ self._t(update, "runtime.voice_transcript_preview", transcript=result.text),
+ )
+ await self._process_user_message(update, context, result.text, suppress_working_notice=True)
+
+ @require_allowed_chat()
+ async def handle_voice(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+ if update.message is None or not update.message.voice:
+ return
+ await self._handle_audio_like(update, context, update.message.voice, media_kind="voice")
+
+ @require_allowed_chat()
+ async def handle_audio(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+ if update.message is None or not update.message.audio:
+ return
+ await self._handle_audio_like(update, context, update.message.audio, media_kind="audio")
+
@require_allowed_chat()
async def handle_unsupported_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
+ if update.message is not None:
+ unsupported_types = [
+ field_name
+ for field_name in ("animation", "audio", "document", "sticker", "video", "video_note")
+ if getattr(update.message, field_name, None) is not None
+ ]
+ logger.info(
+ "Unsupported Telegram message type from chat %s: %s",
+ update.effective_chat.id if update.effective_chat is not None else "unknown",
+ ", ".join(unsupported_types) or "unknown",
+ )
await send_text(
update,
context,
diff --git a/src/coding_agent_telegram/router/queue_processing.py b/src/coding_agent_telegram/router/queue_processing.py
index 00a44b5..a227f04 100644
--- a/src/coding_agent_telegram/router/queue_processing.py
+++ b/src/coding_agent_telegram/router/queue_processing.py
@@ -1,7 +1,9 @@
from __future__ import annotations
+import logging
import re
from collections import deque
+from dataclasses import dataclass
from pathlib import Path
from types import SimpleNamespace
@@ -12,6 +14,13 @@
QUEUED_QUESTIONS_DIR = "queued_questions"
+logger = logging.getLogger(__name__)
+
+
+@dataclass(frozen=True)
+class QueuedQuestion:
+ text: str
+ reply_to_message_id: int | None = None
class QueueProcessingMixin:
@@ -38,38 +47,89 @@ def _next_queue_file_path(self, chat_id: int) -> Path:
session_id = self._sanitize_queue_session_id(str(chat_state.get("active_session_id") or "session"))
return queue_dir / f"{session_id}-queue-{next_index}.txt"
- def _read_queue_questions(self, queue_file: Path) -> list[str]:
+ def _read_queue_questions(self, queue_file: Path) -> list[QueuedQuestion]:
if not queue_file.exists():
return []
raw = queue_file.read_text(encoding="utf-8")
- pattern = re.compile(r"^\[Question (\d+)\]\n(.*?)\n\[End Question \1\]\s*$", re.MULTILINE | re.DOTALL)
- return [match.group(2).strip() for match in pattern.finditer(raw) if match.group(2).strip()]
+ pattern = re.compile(
+ r"^\[Question (\d+)\]\n(?:\[ReplyToMessageId (\d+)\]\n)?(.*?)\n\[End Question \1\]\s*$",
+ re.MULTILINE | re.DOTALL,
+ )
+ questions: list[QueuedQuestion] = []
+ for match in pattern.finditer(raw):
+ text = match.group(3).strip()
+ if not text:
+ continue
+ questions.append(
+ QueuedQuestion(
+ text=text,
+ reply_to_message_id=int(match.group(2)) if match.group(2) else None,
+ )
+ )
+ logger.debug("Loaded %s queued question(s) from %s.", len(questions), queue_file)
+ return questions
- def _append_question_to_queue_file(self, queue_file: Path, user_message: str) -> int:
+ def _append_question_to_queue_file(
+ self,
+ queue_file: Path,
+ user_message: str,
+ *,
+ reply_to_message_id: int | None = None,
+ ) -> int:
questions = self._read_queue_questions(queue_file)
next_number = len(questions) + 1
with queue_file.open("a", encoding="utf-8") as fh:
if queue_file.stat().st_size > 0:
fh.write("\n")
- fh.write(f"[Question {next_number}]\n{user_message.strip()}\n[End Question {next_number}]\n")
+ fh.write(f"[Question {next_number}]\n")
+ if reply_to_message_id is not None:
+ fh.write(f"[ReplyToMessageId {reply_to_message_id}]\n")
+ fh.write(f"{user_message.strip()}\n[End Question {next_number}]\n")
+ logger.debug(
+ "Appended queued question Q%s to %s with reply_to_message_id=%s.",
+ next_number,
+ queue_file,
+ reply_to_message_id,
+ )
return next_number
- def _write_queue_questions(self, queue_file: Path, questions: list[str]) -> None:
+ def _write_queue_questions(self, queue_file: Path, questions: list[QueuedQuestion]) -> None:
with queue_file.open("w", encoding="utf-8") as fh:
for index, question in enumerate(questions, start=1):
if index > 1:
fh.write("\n")
- fh.write(f"[Question {index}]\n{question.strip()}\n[End Question {index}]\n")
+ fh.write(f"[Question {index}]\n")
+ if question.reply_to_message_id is not None:
+ fh.write(f"[ReplyToMessageId {question.reply_to_message_id}]\n")
+ fh.write(f"{question.text.strip()}\n[End Question {index}]\n")
+ logger.debug("Rewrote %s queued question(s) to %s.", len(questions), queue_file)
- def _enqueue_chat_message(self, chat_id: int, user_message: str) -> tuple[Path, int]:
+ def _enqueue_chat_message(
+ self,
+ chat_id: int,
+ user_message: str,
+ *,
+ reply_to_message_id: int | None = None,
+ ) -> tuple[Path, int]:
queue = self._chat_message_queue_files.setdefault(chat_id, deque())
queue_file = queue[-1] if queue else self._next_queue_file_path(chat_id)
if not queue:
queue.append(queue_file)
- question_number = self._append_question_to_queue_file(queue_file, user_message)
+ question_number = self._append_question_to_queue_file(
+ queue_file,
+ user_message,
+ reply_to_message_id=reply_to_message_id,
+ )
+ logger.debug(
+ "Queued message for chat %s in %s as Q%s with reply_to_message_id=%s.",
+ chat_id,
+ queue_file,
+ question_number,
+ reply_to_message_id,
+ )
return queue_file, question_number
- def _dequeue_chat_message_file(self, chat_id: int) -> tuple[Path | None, list[str]]:
+ def _dequeue_chat_message_file(self, chat_id: int) -> tuple[Path | None, list[QueuedQuestion]]:
queue = self._chat_message_queue_files.get(chat_id)
if not queue:
return None, []
@@ -81,12 +141,13 @@ def _dequeue_chat_message_file(self, chat_id: int) -> tuple[Path | None, list[st
return None, []
if not queue:
self._chat_message_queue_files.pop(chat_id, None)
+ logger.debug("Dequeued %s queued question(s) for chat %s from %s.", len(questions), chat_id, queue_file)
return queue_file, questions
- def _queued_batch_prompt(self, queued_messages: list[str]) -> str:
+ def _queued_batch_prompt(self, queued_messages: list[QueuedQuestion]) -> str:
lines = ["Answer the following queued user questions in order."]
for index, message in enumerate(queued_messages, start=1):
- lines.extend(["", f"[Question {index}]", message.strip(), f"[End Question {index}]"])
+ lines.extend(["", f"[Question {index}]", message.text.strip(), f"[End Question {index}]"])
return "\n".join(lines)
def _preview_queued_message(self, message: str, *, max_chars: int = 100) -> str:
@@ -97,10 +158,10 @@ def _preview_queued_message(self, message: str, *, max_chars: int = 100) -> str:
return stripped[:max_chars]
return f"{stripped[: max_chars - 3]}..."
- def _queued_batch_notice(self, chat_id: int, queued_messages: list[str]) -> str:
+ def _queued_batch_notice(self, chat_id: int, queued_messages: list[QueuedQuestion]) -> str:
lines = [translate(self._chat_locale(chat_id), "queue.working_on_queued")]
for index, message in enumerate(queued_messages, start=1):
- lines.append(f"{index}. {self._preview_queued_message(message)}")
+ lines.append(f"{index}. {self._preview_queued_message(message.text)}")
return "\n".join(lines)
def _has_pending_queue_decision(self, chat_id: int) -> bool:
@@ -134,7 +195,7 @@ async def _prompt_queue_batch_decision(
self,
chat_id: int,
context: ContextTypes.DEFAULT_TYPE,
- queued_messages: list[str],
+ queued_messages: list[QueuedQuestion],
) -> None:
if not hasattr(context.bot, "send_message"):
return
@@ -147,7 +208,7 @@ async def _prompt_queue_batch_decision(
translate(locale, "queue.here_are_questions"),
]
for index, message in enumerate(queued_messages, start=1):
- lines.append(f"Q{index}: {self._preview_queued_message(message)}")
+ lines.append(f"Q{index}: {self._preview_queued_message(message.text)}")
lines.extend(
[
"",
@@ -190,7 +251,7 @@ async def _dispatch_queued_questions(
context: ContextTypes.DEFAULT_TYPE,
*,
queue_file: Path,
- queued_messages: list[str],
+ queued_messages: list[QueuedQuestion],
grouped: bool,
) -> bool:
self._chat_processing_queue_files[chat_id] = queue_file
@@ -199,16 +260,25 @@ async def _dispatch_queued_questions(
queued_notice = self._queued_batch_notice(chat_id, current_batch)
queued_update = SimpleNamespace(
effective_chat=SimpleNamespace(id=chat_id, type="private"),
- message=SimpleNamespace(text=queued_notice, photo=None, caption=None),
+ message=SimpleNamespace(text=queued_notice, photo=None, caption=None, message_id=None),
)
await send_text(queued_update, context, queued_notice)
if grouped:
user_message = self._queued_batch_prompt(queued_messages)
+ reply_to_message_id = None
else:
- user_message = queued_messages[0]
+ user_message = queued_messages[0].text
+ reply_to_message_id = queued_messages[0].reply_to_message_id
+ logger.debug(
+ "Dispatching queued question(s) for chat %s grouped=%s count=%s reply_to_message_id=%s.",
+ chat_id,
+ grouped,
+ len(queued_messages),
+ reply_to_message_id,
+ )
queued_update = SimpleNamespace(
effective_chat=SimpleNamespace(id=chat_id, type="private"),
- message=SimpleNamespace(text=user_message, photo=None, caption=None),
+ message=SimpleNamespace(text=user_message, photo=None, caption=None, message_id=reply_to_message_id),
)
self.deps.store.set_pending_action(
self.deps.bot_id,
@@ -234,11 +304,19 @@ async def _dispatch_queued_questions(
async def _drain_chat_message_queue(self, chat_id: int, context: ContextTypes.DEFAULT_TYPE) -> None:
if chat_id in self._chat_message_queue_draining:
+ logger.debug("Queue drain already active for chat %s; skipping nested call.", chat_id)
return
self._chat_message_queue_draining.add(chat_id)
try:
while True:
if self._is_project_busy(chat_id):
+ logger.debug("Stopping queue drain for chat %s because project is busy.", chat_id)
+ return
+ if self._pending_action(chat_id):
+ logger.debug("Stopping queue drain for chat %s because a pending action is unresolved.", chat_id)
+ return
+ if self._has_pending_queue_decision(chat_id):
+ logger.debug("Stopping queue drain for chat %s because a queue batch decision is pending.", chat_id)
return
last_result = self._last_run_results.pop(chat_id, None)
if self._run_result_was_aborted(last_result) and self._has_pending_queue_files(chat_id):
@@ -256,6 +334,7 @@ async def _drain_chat_message_queue(self, chat_id: int, context: ContextTypes.DE
self._chat_processing_queue_files.pop(chat_id, None)
queue_file, queued_messages = self._dequeue_chat_message_file(chat_id)
if queue_file is None or not queued_messages:
+ logger.debug("No queued messages remain for chat %s.", chat_id)
if chat_id not in self._chat_processing_queue_files and chat_id not in self._chat_message_queue_files:
self._chat_queue_batch_modes.pop(chat_id, None)
self._chat_next_queue_file_index.pop(chat_id, None)
diff --git a/src/coding_agent_telegram/router/session_lifecycle_commands.py b/src/coding_agent_telegram/router/session_lifecycle_commands.py
index 92dde64..bcb7237 100644
--- a/src/coding_agent_telegram/router/session_lifecycle_commands.py
+++ b/src/coding_agent_telegram/router/session_lifecycle_commands.py
@@ -214,8 +214,15 @@ async def _continue_pending_action(self, update: Update, context: ContextTypes.D
return False
if not await self._ensure_active_session_ready_for_run(update, context):
return False
- self._store_pending_action(chat_id, None)
- self._last_run_results[chat_id] = await self.runtime.run_active_session(update, context, user_message=user_message)
+ try:
+ self._last_run_results[chat_id] = await self.runtime.run_active_session(
+ update,
+ context,
+ user_message=user_message,
+ suppress_working_notice=bool(pending_action.get("suppress_working_notice")),
+ )
+ finally:
+ self._store_pending_action(chat_id, None)
return True
self._store_pending_action(chat_id, None)
diff --git a/src/coding_agent_telegram/session_runtime.py b/src/coding_agent_telegram/session_runtime.py
index e5eea62..bab9478 100644
--- a/src/coding_agent_telegram/session_runtime.py
+++ b/src/coding_agent_telegram/session_runtime.py
@@ -58,6 +58,11 @@
_ABSOLUTE_PATH_RE = re.compile(r"(?:^|(?<=\s)|(?<=[\"'(]))((?:/[^\s\"',;)]+)+|[A-Za-z]:\\[^\s\"',;)]+)")
+def _reply_to_message_id(update: Update) -> int | None:
+ message = getattr(update, "message", None)
+ return getattr(message, "message_id", None)
+
+
def _load_secret_scrub_patterns() -> tuple[tuple[re.Pattern[str], str], ...]:
resource = importlib.resources.files("coding_agent_telegram").joinpath("resources/secret_scrub_patterns.properties")
compiled: list[tuple[re.Pattern[str], str]] = []
@@ -183,6 +188,11 @@ def _locale(self, update: Update | None) -> str:
def _t(self, update: Update | None, key: str, **kwargs) -> str:
return translate(self._locale(update), key, **kwargs)
+ def _take_reply_to_message_id(self, reply_state: dict[str, int | None]) -> int | None:
+ reply_to_message_id = reply_state.get("reply_to_message_id")
+ reply_state["reply_to_message_id"] = None
+ return reply_to_message_id
+
def _next_rotated_session_name(self, chat_id: int, base_name: str) -> str:
existing = {
data.get("name", "").strip().lower()
@@ -234,6 +244,7 @@ async def run_active_session(
*,
user_message: str,
image_paths: Sequence[Path] = (),
+ suppress_working_notice: bool = False,
) -> AgentRunResult | None:
chat_id = update.effective_chat.id
active_id, session, project_path = await self._active_session_or_notify(update, context)
@@ -264,7 +275,14 @@ async def run_active_session(
max_text_file_bytes=self.cfg.snapshot_text_file_max_bytes,
)
before = set(changed_files(project_path))
- await send_text(update, context, self._t(update, "runtime.working_on_it"))
+ reply_to_message_id = _reply_to_message_id(update)
+ if not suppress_working_notice:
+ await send_text(
+ update,
+ context,
+ self._t(update, "runtime.working_on_it"),
+ reply_to_message_id=reply_to_message_id,
+ )
result = await self.run_with_typing(
update,
context,
@@ -366,6 +384,7 @@ async def run_active_session(
result=result,
before_snapshot=before_snapshot,
before=before,
+ reply_to_message_id=reply_to_message_id,
)
return result
@@ -589,6 +608,7 @@ async def _send_run_results(
result,
before_snapshot: dict[str, str | None],
before: set[str],
+ reply_to_message_id: int | None,
) -> None:
after_snapshot = snapshot_project_files(
project_path,
@@ -603,8 +623,15 @@ async def _send_run_results(
for file_diff in collect_snapshot_diffs(before_snapshot, after_snapshot, files)
}
diffs = self._merge_snapshot_diffs(diffs, snapshot_diffs_by_path)
+ reply_state = {"reply_to_message_id": reply_to_message_id}
- await self._send_assistant_chunks(update, context, result.assistant_text, provider=provider)
+ await self._send_assistant_chunks(
+ update,
+ context,
+ result.assistant_text,
+ provider=provider,
+ reply_state=reply_state,
+ )
logger.info(
"Completed run for chat %s on session '%s' (%s); %d changed file(s).",
update.effective_chat.id,
@@ -622,8 +649,9 @@ async def _send_run_results(
branch_name=branch_name or None,
locale=self._locale(update),
),
+ reply_to_message_id=self._take_reply_to_message_id(reply_state),
)
- await self._send_diffs(update, context, diffs)
+ await self._send_diffs(update, context, diffs, reply_state=reply_state)
def _merge_snapshot_diffs(self, diffs, snapshot_diffs_by_path):
if not snapshot_diffs_by_path:
@@ -650,6 +678,7 @@ async def _send_assistant_chunks(
assistant_text: str,
*,
provider: str,
+ reply_state: dict[str, int | None],
) -> None:
if self.cfg.enable_secret_scrub_filter:
assistant_text = _scrub_secrets(assistant_text)
@@ -666,6 +695,7 @@ async def _send_assistant_chunks(
f"{segment.header} ({index}/{total})",
segment.text,
language=segment.language,
+ reply_to_message_id=self._take_reply_to_message_id(reply_state),
)
continue
@@ -682,7 +712,12 @@ async def _send_assistant_chunks(
)
)
for message in self._chunk_assistant_prose(title_prefix, segment.text):
- await send_html_text(update, context, message)
+ await send_html_text(
+ update,
+ context,
+ message,
+ reply_to_message_id=self._take_reply_to_message_id(reply_state),
+ )
def _chunk_assistant_prose(self, title_prefix: str, text: str) -> list[str]:
normalized = text.strip()
@@ -726,10 +761,22 @@ def _split_assistant_body(self, body: str) -> tuple[str, str]:
left = body[:-1].rstrip() or body[:1]
return left, right
- async def _send_diffs(self, update: Update, context: ContextTypes.DEFAULT_TYPE, diffs) -> None:
+ async def _send_diffs(
+ self,
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ diffs,
+ *,
+ reply_state: dict[str, int | None],
+ ) -> None:
for file_diff in diffs:
if self.cfg.enable_sensitive_diff_filter and is_sensitive_path(file_diff.path):
- await send_text(update, context, self._t(update, "runtime.sensitive_diff_omitted", path=file_diff.path))
+ await send_text(
+ update,
+ context,
+ self._t(update, "runtime.sensitive_diff_omitted", path=file_diff.path),
+ reply_to_message_id=self._take_reply_to_message_id(reply_state),
+ )
continue
for chunk in chunk_fenced_diff(
file_diff.path,
@@ -737,4 +784,11 @@ async def _send_diffs(self, update: Update, context: ContextTypes.DEFAULT_TYPE,
self.cfg.max_telegram_message_length,
locale=self._locale(update),
):
- await send_code_block(update, context, chunk.header, chunk.code, language=chunk.language)
+ await send_code_block(
+ update,
+ context,
+ chunk.header,
+ chunk.code,
+ language=chunk.language,
+ reply_to_message_id=self._take_reply_to_message_id(reply_state),
+ )
diff --git a/src/coding_agent_telegram/speech_to_text.py b/src/coding_agent_telegram/speech_to_text.py
new file mode 100644
index 0000000..6c55105
--- /dev/null
+++ b/src/coding_agent_telegram/speech_to_text.py
@@ -0,0 +1,140 @@
+from __future__ import annotations
+
+import json
+import logging
+import os
+import subprocess
+import sys
+import tempfile
+from dataclasses import dataclass
+from pathlib import Path
+
+from coding_agent_telegram.config import AppConfig, DEFAULT_OPENAI_WHISPER_MODEL
+
+
+logger = logging.getLogger(__name__)
+_MODEL_CACHE_FILENAMES = {
+ "tiny": "tiny.pt",
+ "tiny.en": "tiny.en.pt",
+ "base": "base.pt",
+ "base.en": "base.en.pt",
+ "small": "small.pt",
+ "small.en": "small.en.pt",
+ "medium": "medium.pt",
+ "medium.en": "medium.en.pt",
+ "large": "large-v3.pt",
+ "large-v1": "large-v1.pt",
+ "large-v2": "large-v2.pt",
+ "large-v3": "large-v3.pt",
+ "large-v3-turbo": "large-v3-turbo.pt",
+ "turbo": "large-v3-turbo.pt",
+}
+
+
+class SpeechToTextError(RuntimeError):
+ def __init__(self, code: str, *, likely_first_download: bool = False, detail: str | None = None) -> None:
+ super().__init__(code)
+ self.code = code
+ self.likely_first_download = likely_first_download
+ self.detail = detail
+
+
+@dataclass(frozen=True)
+class SpeechToTextResult:
+ text: str
+ model: str
+
+
+class WhisperSpeechToText:
+ def __init__(self, cfg: AppConfig) -> None:
+ self.enabled = cfg.enable_openai_whisper_speech_to_text
+ self.model = cfg.openai_whisper_model or DEFAULT_OPENAI_WHISPER_MODEL
+ self.timeout_seconds = cfg.openai_whisper_timeout_seconds
+
+ def _model_cache_path(self) -> Path:
+ cache_root = Path(os.getenv("XDG_CACHE_HOME", Path.home() / ".cache")).expanduser()
+ file_name = _MODEL_CACHE_FILENAMES.get(self.model, f"{self.model}.pt")
+ return cache_root / "whisper" / file_name
+
+ def _likely_first_download(self) -> bool:
+ return not self._model_cache_path().exists()
+
+ def _summarize_process_output(self, result: subprocess.CompletedProcess[str]) -> str:
+ parts: list[str] = [f"whisper exited with status {result.returncode}"]
+ stderr = (result.stderr or "").strip()
+ stdout = (result.stdout or "").strip()
+ if stderr:
+ parts.append(f"stderr: {stderr[:500]}")
+ if stdout:
+ parts.append(f"stdout: {stdout[:500]}")
+ return "; ".join(parts)
+
+ def transcribe_file(self, audio_path: Path) -> SpeechToTextResult:
+ likely_first_download = self._likely_first_download()
+
+ with tempfile.TemporaryDirectory(prefix="coding-agent-telegram-whisper-") as output_dir:
+ command = [
+ sys.executable,
+ "-m",
+ "whisper",
+ str(audio_path),
+ "--model",
+ self.model,
+ "--task",
+ "transcribe",
+ "--output_format",
+ "json",
+ "--output_dir",
+ output_dir,
+ "--verbose",
+ "False",
+ "--fp16",
+ "False",
+ "--condition_on_previous_text",
+ "False",
+ ]
+ try:
+ result = subprocess.run(
+ command,
+ check=False,
+ capture_output=True,
+ text=True,
+ timeout=self.timeout_seconds,
+ )
+ except subprocess.TimeoutExpired as exc:
+ raise SpeechToTextError(
+ "timeout",
+ likely_first_download=likely_first_download,
+ detail=f"whisper timed out after {self.timeout_seconds} seconds",
+ ) from exc
+
+ if result.returncode != 0:
+ detail = self._summarize_process_output(result)
+ logger.warning("Whisper transcription failed for %s using model %s: %s", audio_path, self.model, detail)
+ raise SpeechToTextError("failed", likely_first_download=likely_first_download, detail=detail)
+
+ transcript_path = Path(output_dir) / f"{audio_path.stem}.json"
+ if not transcript_path.exists():
+ raise SpeechToTextError(
+ "failed",
+ likely_first_download=likely_first_download,
+ detail=f"whisper finished without writing transcript json for {audio_path.name}",
+ )
+
+ try:
+ payload = json.loads(transcript_path.read_text(encoding="utf-8"))
+ except (OSError, json.JSONDecodeError) as exc:
+ raise SpeechToTextError(
+ "failed",
+ likely_first_download=likely_first_download,
+ detail=f"failed to parse whisper transcript json for {audio_path.name}: {exc}",
+ ) from exc
+
+ text = str(payload.get("text") or "").strip()
+ if not text:
+ raise SpeechToTextError(
+ "empty",
+ likely_first_download=likely_first_download,
+ detail=f"whisper returned an empty transcript for {audio_path.name}",
+ )
+ return SpeechToTextResult(text=text, model=self.model)
diff --git a/src/coding_agent_telegram/stt_setup.py b/src/coding_agent_telegram/stt_setup.py
new file mode 100644
index 0000000..90867be
--- /dev/null
+++ b/src/coding_agent_telegram/stt_setup.py
@@ -0,0 +1,303 @@
+from __future__ import annotations
+
+import argparse
+import importlib
+import importlib.util
+import os
+import shutil
+import subprocess
+import sys
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
+
+from coding_agent_telegram.config import (
+ DEFAULT_OPENAI_WHISPER_MODEL,
+ DEFAULT_OPENAI_WHISPER_TIMEOUT_SECONDS,
+ create_initial_env_file,
+ resolve_env_file_path,
+)
+
+
+ENABLE_STT_ENV = "ENABLE_OPENAI_WHISPER_SPEECH_TO_TEXT"
+STT_INSTALL_HINT_ENV = "CODING_AGENT_TELEGRAM_STT_INSTALL_HINT"
+STT_SIZE_GUIDANCE = (
+ "Estimated local footprint: openai-whisper package about 50 MB, ffmpeg about 50 MB, "
+ "and Whisper model downloads vary by model size "
+ "(tiny about 72 MB, base about 139 MB, large-v3-turbo about 1.5 GB)."
+)
+
+
+@dataclass(frozen=True)
+class SttPrereqStatus:
+ ffmpeg: bool
+ whisper_module: bool
+
+ @property
+ def missing(self) -> list[str]:
+ missing: list[str] = []
+ if not self.ffmpeg:
+ missing.append("ffmpeg")
+ if not self.whisper_module:
+ missing.append("openai-whisper (Python module)")
+ return missing
+
+ @property
+ def ready(self) -> bool:
+ return not self.missing
+
+
+def _has_whisper_module(python_bin: str | None = None) -> bool:
+ if python_bin is None:
+ return importlib.util.find_spec("whisper") is not None
+ result = subprocess.run(
+ [python_bin, "-c", "import importlib.util, sys; raise SystemExit(0 if importlib.util.find_spec('whisper') else 1)"],
+ check=False,
+ capture_output=True,
+ text=True,
+ )
+ return result.returncode == 0
+
+
+def detect_stt_prereqs(*, python_bin: str | None = None) -> SttPrereqStatus:
+ importlib.invalidate_caches()
+ return SttPrereqStatus(
+ ffmpeg=shutil.which("ffmpeg") is not None,
+ whisper_module=_has_whisper_module(python_bin),
+ )
+
+
+def ensure_stt_runtime_or_exit(enabled: bool, *, install_hint: Optional[str] = None) -> None:
+ if not enabled:
+ return
+
+ status = detect_stt_prereqs()
+ if status.ready:
+ return
+
+ resolved_hint = (install_hint or os.getenv(STT_INSTALL_HINT_ENV, "")).strip() or "coding-agent-telegram-stt-install"
+ missing_text = ", ".join(status.missing)
+ raise SystemExit(
+ "\n".join(
+ [
+ f"Error: {ENABLE_STT_ENV}=true but speech-to-text prerequisites are missing: {missing_text}",
+ f"Run: {resolved_hint}",
+ STT_SIZE_GUIDANCE,
+ ]
+ )
+ )
+
+
+def _resolve_env_path(explicit: str | None = None) -> Path:
+ env_path = resolve_env_file_path(Path(explicit).expanduser() if explicit else None)
+ env_path.parent.mkdir(parents=True, exist_ok=True)
+ if not env_path.exists():
+ create_initial_env_file(env_path)
+ return env_path
+
+
+def _set_env_flag(env_path: Path, enabled: bool) -> None:
+ lines = []
+ if env_path.exists():
+ lines = env_path.read_text(encoding="utf-8").splitlines()
+
+ def upsert(key: str, value: str, comments: list[str] | None = None) -> None:
+ replacement = f"{key}={value}"
+ for index, line in enumerate(lines):
+ if line.startswith(f"{key}="):
+ lines[index] = replacement
+ return
+ if lines and lines[-1].strip():
+ lines.append("")
+ if comments:
+ lines.extend(comments)
+ lines.append(replacement)
+
+ upsert(
+ ENABLE_STT_ENV,
+ "true" if enabled else "false",
+ comments=[
+ "# If true, enable Telegram voice-message speech-to-text with local openai-whisper.",
+ "# Estimated local footprint: package ~50 MB, ffmpeg ~50 MB, model downloads vary by model size.",
+ ],
+ )
+ upsert(
+ "OPENAI_WHISPER_MODEL",
+ DEFAULT_OPENAI_WHISPER_MODEL,
+ comments=[
+ "# Whisper model name for Telegram voice-message speech-to-text.",
+ "# `turbo` downloads the large-v3-turbo model (~1.5 GB) on first use into ~/.cache/whisper.",
+ "# If turbo is not cached yet, the first voice transcription is more likely to hit the timeout.",
+ ],
+ )
+ upsert(
+ "OPENAI_WHISPER_TIMEOUT_SECONDS",
+ str(DEFAULT_OPENAI_WHISPER_TIMEOUT_SECONDS),
+ comments=["# Timeout for a single Whisper transcription call, in seconds."],
+ )
+
+ env_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+
+
+def _prompt_yes_no(prompt: str, *, default: bool = True) -> bool:
+ suffix = "[Y/n]" if default else "[y/N]"
+ while True:
+ try:
+ answer = input(f"{prompt} {suffix} ").strip().lower()
+ except EOFError:
+ return default
+ if not answer:
+ return default
+ if answer in {"y", "yes"}:
+ return True
+ if answer in {"n", "no"}:
+ return False
+ print("Please answer yes or no.")
+
+
+def _package_manager() -> tuple[str, list[str]] | tuple[None, None]:
+ if sys.platform == "darwin" and shutil.which("brew"):
+ return "brew", ["brew", "install", "ffmpeg"]
+ if sys.platform.startswith("linux"):
+ prefix = ["sudo"] if hasattr(os, "geteuid") and os.geteuid() != 0 and shutil.which("sudo") else []
+ if shutil.which("apt-get"):
+ return "apt-get", [*prefix, "apt-get", "update", "&&", *prefix, "apt-get", "install", "-y", "ffmpeg"]
+ if shutil.which("dnf"):
+ return "dnf", [*prefix, "dnf", "install", "-y", "ffmpeg"]
+ if shutil.which("yum"):
+ return "yum", [*prefix, "yum", "install", "-y", "ffmpeg"]
+ return None, None
+
+
+def _run_shell_command(command: str) -> bool:
+ print(f"Running: {command}")
+ result = subprocess.run(command, shell=True, check=False)
+ return result.returncode == 0
+
+
+def _ensure_ffmpeg_installed() -> bool:
+ while True:
+ status = detect_stt_prereqs()
+ if status.ffmpeg:
+ return True
+
+ print("Missing required system binary: ffmpeg")
+
+ manager, command_parts = _package_manager()
+ install_command = " ".join(command_parts) if command_parts else ""
+
+ if install_command:
+ if not _prompt_yes_no(f"Install ffmpeg now using {manager}?"):
+ return False
+ if _run_shell_command(install_command):
+ continue
+ print("Automatic ffmpeg installation did not complete successfully.")
+ if not _prompt_yes_no("Retry ffmpeg installation?"):
+ return False
+ continue
+
+ print("Automatic ffmpeg installation is not available on this OS/package-manager combination.")
+ print("Install ffmpeg manually, then return here and choose continue.")
+ if not _prompt_yes_no("Continue after manual installation?", default=False):
+ return False
+
+
+def _ensure_whisper_installed(python_bin: str) -> bool:
+ while True:
+ status = detect_stt_prereqs(python_bin=python_bin)
+ if status.whisper_module:
+ return True
+
+ print("Missing required Python package: openai-whisper")
+ if not _prompt_yes_no(f"Install openai-whisper with {python_bin} -m pip?"):
+ return False
+ command = f"{python_bin} -m pip install --upgrade openai-whisper"
+ if _run_shell_command(command):
+ continue
+ print("openai-whisper installation did not complete successfully.")
+ if not _prompt_yes_no("Retry openai-whisper installation?"):
+ return False
+
+
+def install_stt_dependencies(*, env_file: str | None = None, python_bin: str | None = None) -> int:
+ env_path = _resolve_env_path(env_file)
+ resolved_python = python_bin or sys.executable
+
+ print(STT_SIZE_GUIDANCE)
+ print(f"Using env file: {env_path}")
+
+ if not _ensure_ffmpeg_installed():
+ print("Speech-to-text installation aborted before ffmpeg prerequisites were satisfied.")
+ return 1
+ if not _ensure_whisper_installed(resolved_python):
+ print("Speech-to-text installation aborted before openai-whisper was installed.")
+ return 1
+
+ _set_env_flag(env_path, True)
+ print(f"Speech-to-text prerequisites are ready. Enabled {ENABLE_STT_ENV}=true in {env_path}.")
+ return 0
+
+
+def offer_stt_install_for_new_env(
+ *,
+ env_file: str | None = None,
+ python_bin: str | None = None,
+ installer_label: str,
+) -> int:
+ env_path = _resolve_env_path(env_file)
+ print("A new env file was created for coding-agent-telegram.")
+ print(STT_SIZE_GUIDANCE)
+ if not _prompt_yes_no(
+ f"Do you want to enable local Whisper speech-to-text now? This will run {installer_label}.",
+ default=False,
+ ):
+ print(f"Keeping {ENABLE_STT_ENV}=false in {env_path}.")
+ return 0
+
+ result = install_stt_dependencies(env_file=str(env_path), python_bin=python_bin)
+ if result != 0:
+ print(f"Speech-to-text setup did not complete. Keeping {ENABLE_STT_ENV}=false unless you enable it later.")
+ _set_env_flag(env_path, False)
+ return 0
+ return 0
+
+
+def main(argv: Optional[list[str]] = None) -> int:
+ if argv is None:
+ argv = sys.argv[1:]
+ if not argv:
+ argv = ["install"]
+
+ parser = argparse.ArgumentParser(description="Install or validate local Whisper speech-to-text support.")
+ subparsers = parser.add_subparsers(dest="command", required=True)
+
+ install_parser = subparsers.add_parser("install", help="Install missing speech-to-text prerequisites.")
+ install_parser.add_argument("--env-file", help="Explicit env file path to update.")
+ install_parser.add_argument("--python-bin", help="Python executable to use for pip installation.")
+ offer_parser = subparsers.add_parser("offer", help="Prompt whether to enable speech-to-text for a new env file.")
+ offer_parser.add_argument("--env-file", help="Explicit env file path to update.")
+ offer_parser.add_argument("--python-bin", help="Python executable to use for pip installation.")
+ offer_parser.add_argument("--installer-label", required=True, help="User-facing installer command label.")
+
+ args = parser.parse_args(argv)
+
+ if args.command == "install":
+ return install_stt_dependencies(env_file=args.env_file, python_bin=args.python_bin)
+ if args.command == "offer":
+ return offer_stt_install_for_new_env(
+ env_file=args.env_file,
+ python_bin=args.python_bin,
+ installer_label=args.installer_label,
+ )
+ return 1
+
+
+if __name__ == "__main__":
+ raise SystemExit(main())
diff --git a/src/coding_agent_telegram/telegram_sender.py b/src/coding_agent_telegram/telegram_sender.py
index e8a243b..aa0d5a2 100644
--- a/src/coding_agent_telegram/telegram_sender.py
+++ b/src/coding_agent_telegram/telegram_sender.py
@@ -1,6 +1,7 @@
from __future__ import annotations
import html
+import logging
import re
from dataclasses import dataclass
from typing import Optional
@@ -56,6 +57,7 @@
)
SHELL_LANGUAGES = {"bash", "console", "shell", "sh", "zsh"}
DEFAULT_TELEGRAM_MESSAGE_LENGTH = 3000
+logger = logging.getLogger(__name__)
@dataclass(frozen=True)
@@ -75,45 +77,92 @@ def _max_telegram_message_length(context: ContextTypes.DEFAULT_TYPE) -> int:
return DEFAULT_TELEGRAM_MESSAGE_LENGTH
-async def send_text(update: Update, context: ContextTypes.DEFAULT_TYPE, text: str) -> None:
+def _default_reply_to_message_id(update: Update, explicit_reply_to_message_id: Optional[int] = None) -> Optional[int]:
+ return explicit_reply_to_message_id
+
+
+async def send_text(
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ text: str,
+ *,
+ reply_to_message_id: Optional[int] = None,
+) -> None:
if update.effective_chat is None:
return
max_length = _max_telegram_message_length(context)
- for chunk in _split_text_chunks(text, max_length=max_length):
+ resolved_reply_to_message_id = _default_reply_to_message_id(update, reply_to_message_id)
+ chunks = _split_text_chunks(text, max_length=max_length)
+ logger.debug(
+ "Sending Telegram text message chat=%s chunks=%s reply_to_message_id=%s preview=%.120r",
+ update.effective_chat.id,
+ len(chunks),
+ resolved_reply_to_message_id,
+ text,
+ )
+ for index, chunk in enumerate(chunks):
await context.bot.send_message(
chat_id=update.effective_chat.id,
text=html.escape(chunk),
parse_mode=ParseMode.HTML,
+ reply_to_message_id=resolved_reply_to_message_id if index == 0 else None,
)
-async def send_markdown_text(update: Update, context: ContextTypes.DEFAULT_TYPE, text: str) -> None:
+async def send_markdown_text(
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ text: str,
+ *,
+ reply_to_message_id: Optional[int] = None,
+) -> None:
if update.effective_chat is None:
return
+ logger.debug(
+ "Sending Telegram markdown message chat=%s reply_to_message_id=%s preview=%.120r",
+ update.effective_chat.id,
+ reply_to_message_id,
+ text,
+ )
await context.bot.send_message(
chat_id=update.effective_chat.id,
text=text,
parse_mode=ParseMode.MARKDOWN,
+ reply_to_message_id=_default_reply_to_message_id(update, reply_to_message_id),
)
-async def send_html_text(update: Update, context: ContextTypes.DEFAULT_TYPE, text: str) -> None:
+async def send_html_text(
+ update: Update,
+ context: ContextTypes.DEFAULT_TYPE,
+ text: str,
+ *,
+ reply_to_message_id: Optional[int] = None,
+) -> None:
if update.effective_chat is None:
return
max_length = _max_telegram_message_length(context)
+ logger.debug(
+ "Sending Telegram HTML message chat=%s reply_to_message_id=%s length=%s preview=%.120r",
+ update.effective_chat.id,
+ reply_to_message_id,
+ len(text),
+ text,
+ )
if len(text) > max_length:
- await send_text(update, context, _strip_html_tags(text))
+ await send_text(update, context, _strip_html_tags(text), reply_to_message_id=reply_to_message_id)
return
try:
await context.bot.send_message(
chat_id=update.effective_chat.id,
text=text,
parse_mode=ParseMode.HTML,
+ reply_to_message_id=_default_reply_to_message_id(update, reply_to_message_id),
)
except BadRequest as exc:
if "Can't parse entities" not in str(exc):
raise
- await send_text(update, context, _strip_html_tags(text))
+ await send_text(update, context, _strip_html_tags(text), reply_to_message_id=reply_to_message_id)
def markdownish_to_html(text: str) -> str:
@@ -276,12 +325,22 @@ async def send_code_block(
code: str,
*,
language: Optional[str] = None,
+ reply_to_message_id: Optional[int] = None,
) -> None:
if update.effective_chat is None:
return
max_length = _max_telegram_message_length(context)
chunks = _split_code_chunks(code, language, max_length=max_length)
total = len(chunks)
+ resolved_reply_to_message_id = _default_reply_to_message_id(update, reply_to_message_id)
+ logger.debug(
+ "Sending Telegram code block chat=%s header=%r chunks=%s reply_to_message_id=%s language=%r",
+ update.effective_chat.id,
+ header,
+ total,
+ resolved_reply_to_message_id,
+ language,
+ )
for index, chunk in enumerate(chunks, start=1):
current_header = header if total == 1 else f"{header} ({index}/{total})"
escaped_code = html.escape(chunk)
@@ -289,6 +348,7 @@ async def send_code_block(
chat_id=update.effective_chat.id,
text=html.escape(current_header),
parse_mode=ParseMode.HTML,
+ reply_to_message_id=resolved_reply_to_message_id if index == 1 else None,
)
if language:
text = f"