channels/telegram: propagate send failures and cache terminal reaction errors#1100
Open
Streamweaver wants to merge 2 commits intoRightNow-AI:mainfrom
Open
channels/telegram: propagate send failures and cache terminal reaction errors#1100Streamweaver wants to merge 2 commits intoRightNow-AI:mainfrom
Streamweaver wants to merge 2 commits intoRightNow-AI:mainfrom
Conversation
The six outbound helpers in the Telegram adapter (sendMessage, sendPhoto, sendDocument, sendDocument_upload, sendVoice, sendLocation) previously logged a `warn!` on HTTP non-success and still returned `Ok(())`. Callers interpreted that as successful delivery and told the agent "Message sent" even when Telegram had rejected the request (e.g. 400 Bad Request from malformed HTML entities with parse_mode=HTML). The agent recorded phantom success in its session history, corrupting subsequent behavior. The fix returns `Err(format!(...).into())` on HTTP non-success in all six helpers, matching the error-handling convention documented in CONTRIBUTING.md. `api_send_message` is slightly different because it splits long messages into chunks via `split_message(4096)`. Naively returning `Err` on any chunk failure would create a partial-delivery-then-error regression — worse than the original silent success. The function now tracks `delivered_any` across chunks: - First-chunk failure (nothing delivered yet) → return `Err` to surface the failure. This is where the motivating HTML-parse-error bug lives, so the fix is fully effective. - Subsequent-chunk failure (user already received preceding chunks) → log `warn!` and continue with best-effort delivery, matching the convention used by every other adapter in the crate that calls `split_message` (Discord, Gitter, Mattermost, Nextcloud, Twitch, Pumble, etc.). Tests: 4 new tests using a small in-crate stub server (axum on an ephemeral port, reached via the existing `api_url` constructor seam — zero new dependencies). 41 telegram tests pass (37 existing + 4 new).
…, emoji) `fire_reaction` calls `setMessageReaction` fire-and-forget on every agent lifecycle event. When Telegram returns a terminal error like `REACTION_INVALID` (emoji not in the bot's free-reaction allowlist), `REACTION_NOT_AVAILABLE` (chat admin restricted this emoji), or `REACTION_TOO_MANY` (per-message cap), retrying on every subsequent turn is pointless log spam and wasted API quota. This adds a per-bot-instance `HashSet<(i64, String)>` keyed by `(chat_id, emoji)` that records terminal rejections and short-circuits future calls for the same pair. Keyed by chat, not just emoji, because `Chat.available_reactions` varies across chats and is admin-mutable (https://core.telegram.org/bots/api#setmessagereaction) — an emoji rejected in chat A may still be valid in chat B. Cache is per-process; on restart it rebuilds naturally, which handles any runtime allowlist change without needing persistence. The terminal-error match uses a small private helper `is_terminal_reaction_error` that substring-matches the three permanent errors. Transient errors (429, 5xx, `MESSAGE_NOT_MODIFIED`, unrelated 400s) are deliberately NOT cached. Concurrency: the cache uses `std::sync::Mutex` — critical section is two `HashSet` ops (contains + insert), never held across `.await`. Endorsed by the Tokio shared-state tutorial (https://tokio.rs/tokio/tutorial/shared-state) for exactly this shape. Two concurrent `fire_reaction` calls for the same (chat, emoji) can both pass the cache check before either rejection lands, producing up to N duplicate API calls on the first rejection; the duplicate `insert` is idempotent so this is benign and self-limits on the second turn. Documented in-code. Tests: 6 new tests covering terminal-error matching, cache insertion, per-chat key isolation, and non-caching of transient and successful responses. Total 47 telegram tests pass (41 existing + 6 new). No new clippy warnings.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
The Telegram adapter currently masks two classes of real failures as success.
Send path —
api_send_messageand its five sibling helpers (api_send_photo,api_send_document,api_send_document_upload,api_send_voice,api_send_location) log awarn!on HTTP non-success and returnOk(()). Callers (kernel::send_channel_message→tool_channel_send) interpret that as successful delivery and report"Message sent to <id> via telegram"back to the agent. The user receives nothing; the agent records phantom success in its session history and acts on it in future turns.Reaction path —
fire_reactionretriessetMessageReactionwith the same emoji on every turn even when Telegram returns terminal errors likeREACTION_INVALID(emoji not in the free-reaction allowlist),REACTION_NOT_AVAILABLE(chat-admin restriction), orREACTION_TOO_MANY(per-message cap). Pure log spam plus wasted API calls on every agent turn.This PR fixes both, adds regression tests, and brings the adapter into line with the crate's error-handling convention documented in
CONTRIBUTING.md(lines 116, 239–247).Evidence
Real silent-failure log line from production:
The agent loop that produced this warning reported success to its calling tool. The message was never delivered. The agent's session history recorded "Message sent," which then corrupted future behavior.
Reaction spam from the same session, every turn forever:
Changes
Commit 1 — propagate send failures from
api_send_*crates/openfang-channels/src/telegram.rs:api_send_photo,api_send_document,api_send_document_upload,api_send_voice,api_send_locationnow returnErr(format!(...).into())on HTTP non-success in addition to the existingwarn!.api_send_messagesplits long text viasplit_message(4096). Naively returningErron any chunk failure would introduce a "partial-delivery-then-error" regression — worse than the original silent success. The function now tracksdelivered_anyacross chunks and:Erron first-chunk failure (nothing delivered yet; surface the failure — this is where the motivating HTML-parse-error bug lives).warn!and continues on subsequent-chunk failure (user already received preceding chunks; best-effort completes the send).split_message(Discord, Gitter, Mattermost, Nextcloud, Twitch, Pumble).Commit 2 — cache terminal
setMessageReactionerrors per(chat, emoji)TelegramAdapterfield:rejected_reactions: Arc<Mutex<HashSet<(i64, String)>>>. Usesstd::sync::Mutex— critical section is twoHashSetops (contains + insert), never held across.await. Endorsed by the Tokio shared-state tutorial for this exact shape.is_terminal_reaction_error(body_text: &str) -> boolmatchesREACTION_INVALID | REACTION_NOT_AVAILABLE | REACTION_TOO_MANY. Transient errors (429, 5xx,MESSAGE_NOT_MODIFIED, unrelated 400s) are NOT cached.fire_reactionshort-circuits early if(chat_id, emoji)is cached; inserts on terminal-error response. Key-by-chat honorsChat.available_reactionsvarying per chat and being admin-mutable (API ref). Cache is per-process — rebuilds naturally on restart, which handles any runtime allowlist change without needing persistence.fire_reactioncalls for the same(chat, emoji)both pass the cache check before either rejection lands, producing up to N duplicate API calls on the first rejection. The duplicateinsertis idempotent so this is benign and self-limits on the second turn; documented in-code.Tests
10 new tests in the same file using a small in-crate stub server (axum on an ephemeral port, reached via the existing
api_urlconstructor seam — zero new dependencies):Send path:
test_api_send_message_single_chunk_400_returns_errtest_api_send_message_single_chunk_200_returns_oktest_api_send_message_first_chunk_fail_returns_errtest_api_send_message_partial_delivery_returns_ok(multi-chunk B1 regression guard)Reaction cache:
test_is_terminal_reaction_error_matchestest_is_terminal_reaction_error_rejects_transienttest_fire_reaction_caches_on_reaction_invalidtest_fire_reaction_cache_is_per_chat(proves the per-chat key invariant)test_fire_reaction_does_not_cache_non_terminaltest_fire_reaction_does_not_cache_on_successAll 47 telegram tests pass (37 pre-existing + 10 new). No new clippy warnings on the changed file.
Compatibility
ChannelAdaptertrait signatures unchanged.Err. Every call site in the repo was audited and handlesErrcorrectly:kernel::send_channel_message(kernel/src/kernel.rs:6988-6996) — already maps via.map_err(|e| format!("Channel send failed: {e}"))?.tool_channel_send(runtime/src/tool_runner.rs:2430) — propagates to the agent as a tool-result error string.bridge::send_response_to_channel— logs and swallows (no behavior change; log line is now truthful).cron_delivery::deliver_to_target— per-target best-effort (no behavior change).bridge::spawn_typing_loop— discards withlet _(no behavior change).Follow-ups (out of scope)
discord.rs:131-135,slack.rs:127-132,feishu.rs:409-427,line.rs:494-498,revolt.rs:187-194. Happy to file individual issues if welcome.types.rs—ALLOWED_REACTION_EMOJI/default_phase_emoji) include several outside Telegram's free-reaction allowlist (⏳ ⚙️ ✅ ❌ 🔄). A per-channel emoji default set would eliminate the reaction-cache churn entirely for Telegram while keeping the current semantic set for other adapters. Behavior-defining rather than bug-fixing — belongs in a separate discussion.Test plan
Live verification on a running instance: user sent real Telegram messages, reply landed, log showed first-rejection-then-cached behavior for
⏳and✅, subsequent turns short-circuited with zero REACTION_INVALID lines and no send-path regressions.