
channels/telegram: propagate send failures and cache terminal reaction errors#1100

Open
Streamweaver wants to merge 2 commits into RightNow-AI:main from Streamweaver:fix/telegram-silent-failures

Conversation

@Streamweaver

Summary

The Telegram adapter currently masks two classes of real failures as success.

  1. Send path: `api_send_message` and its five sibling helpers (`api_send_photo`, `api_send_document`, `api_send_document_upload`, `api_send_voice`, `api_send_location`) log a `warn!` on HTTP non-success and return `Ok(())`. Callers (`kernel::send_channel_message`, `tool_channel_send`) interpret that as successful delivery and report "Message sent to <id> via telegram" back to the agent. The user receives nothing; the agent records phantom success in its session history and acts on it in future turns.

  2. Reaction path: `fire_reaction` retries `setMessageReaction` with the same emoji on every turn even when Telegram returns terminal errors like `REACTION_INVALID` (emoji not in the free-reaction allowlist), `REACTION_NOT_AVAILABLE` (chat-admin restriction), or `REACTION_TOO_MANY` (per-message cap). The result is pure log spam plus wasted API calls on every agent turn.

This PR fixes both, adds regression tests, and brings the adapter into line with the crate's error-handling convention documented in CONTRIBUTING.md (lines 116, 239–247).

Evidence

Real silent-failure log line from production:

WARN openfang_channels::telegram: Telegram sendMessage failed (400 Bad Request): {"ok":false,"error_code":400,"description":"Bad Request: can't parse entities: Unmatched end tag at byte offset 988, expected \"</i>\", found \"</code>\""}

The agent loop that produced this warning reported success to its calling tool. The message was never delivered. The agent's session history recorded "Message sent," which then corrupted future behavior.

Reaction spam from the same session, every turn forever:

DEBUG Telegram setMessageReaction failed: {"ok":false,"error_code":400,"description":"Bad Request: REACTION_INVALID"}
DEBUG Telegram setMessageReaction failed: {"ok":false,"error_code":400,"description":"Bad Request: REACTION_INVALID"}

Changes

Commit 1 — propagate send failures from api_send_*

crates/openfang-channels/src/telegram.rs:

  • api_send_photo, api_send_document, api_send_document_upload, api_send_voice, api_send_location now return Err(format!(...).into()) on HTTP non-success in addition to the existing warn!.
  • api_send_message splits long text via split_message(4096). Naively returning Err on any chunk failure would introduce a "partial-delivery-then-error" regression — worse than the original silent success. The function now tracks delivered_any across chunks and:
    • Returns Err on first-chunk failure (nothing delivered yet; surface the failure — this is where the motivating HTML-parse-error bug lives).
    • Logs warn! and continues on subsequent-chunk failure (user already received preceding chunks; best-effort completes the send).
    • This matches the existing convention used by every other adapter in the crate that calls split_message (Discord, Gitter, Mattermost, Nextcloud, Twitch, Pumble).
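The chunk-loop logic above can be sketched as follows. This is a minimal, self-contained illustration, not the adapter's source: `split_message` and `post_send` are simplified stand-ins for the real helpers (the actual `split_message` respects char boundaries and the actual send is an async HTTP call), and the fail-injection parameter exists only to make the sketch testable.

```rust
// Naive byte-based stand-in for the real split_message(4096) helper.
fn split_message(text: &str, limit: usize) -> Vec<String> {
    text.as_bytes()
        .chunks(limit)
        .map(|c| String::from_utf8_lossy(c).into_owned())
        .collect()
}

// Stand-in for the HTTP call: Ok(()) on 2xx, Err(body) otherwise.
// `fail_on` injects a failure for a specific chunk, for demonstration.
fn post_send(chunk: &str, fail_on: Option<&str>) -> Result<(), String> {
    if fail_on == Some(chunk) {
        Err(format!("400 Bad Request while sending chunk of {} bytes", chunk.len()))
    } else {
        Ok(())
    }
}

fn api_send_message(text: &str, fail_on: Option<&str>) -> Result<(), String> {
    let mut delivered_any = false;
    for chunk in split_message(text, 4096) {
        match post_send(&chunk, fail_on) {
            Ok(()) => delivered_any = true,
            // Nothing delivered yet: surface the failure to the caller.
            Err(e) if !delivered_any => return Err(e),
            // User already received earlier chunks: warn and finish best-effort.
            Err(e) => eprintln!("warn: subsequent chunk failed, continuing: {e}"),
        }
    }
    Ok(())
}
```

The asymmetry is the point: a first-chunk failure (the motivating HTML-parse case) becomes a hard `Err`, while a mid-stream failure degrades to the old warn-and-continue behavior rather than reporting an error for a message the user partially received.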

Commit 2 — cache terminal setMessageReaction errors per (chat, emoji)

  • New TelegramAdapter field: rejected_reactions: Arc<Mutex<HashSet<(i64, String)>>>. Uses std::sync::Mutex — critical section is two HashSet ops (contains + insert), never held across .await. Endorsed by the Tokio shared-state tutorial for this exact shape.
  • New private helper is_terminal_reaction_error(body_text: &str) -> bool matches REACTION_INVALID | REACTION_NOT_AVAILABLE | REACTION_TOO_MANY. Transient errors (429, 5xx, MESSAGE_NOT_MODIFIED, unrelated 400s) are NOT cached.
  • fire_reaction short-circuits early if (chat_id, emoji) is cached; inserts on terminal-error response. Key-by-chat honors Chat.available_reactions varying per chat and being admin-mutable (API ref). Cache is per-process — rebuilds naturally on restart, which handles any runtime allowlist change without needing persistence.
  • A brief race is possible where two concurrent fire_reaction calls for the same (chat, emoji) both pass the cache check before either rejection lands, producing up to N duplicate API calls on the first rejection. The duplicate insert is idempotent so this is benign and self-limits on the second turn; documented in-code.
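A rough sketch of the matcher and cache shape described above, assuming the field and helper names from this PR description (the real code lives inside `TelegramAdapter` and shares the set via `Arc`; this standalone version keeps only the lock discipline and key structure):

```rust
use std::collections::HashSet;
use std::sync::Mutex;

// Substring-match only the three permanent rejection codes; transient
// errors (429, 5xx, MESSAGE_NOT_MODIFIED, unrelated 400s) fall through.
fn is_terminal_reaction_error(body: &str) -> bool {
    body.contains("REACTION_INVALID")
        || body.contains("REACTION_NOT_AVAILABLE")
        || body.contains("REACTION_TOO_MANY")
}

// Per-process cache keyed by (chat_id, emoji). std::sync::Mutex is fine:
// the critical section is a single HashSet op, never held across .await.
struct ReactionCache {
    rejected: Mutex<HashSet<(i64, String)>>,
}

impl ReactionCache {
    fn new() -> Self {
        Self { rejected: Mutex::new(HashSet::new()) }
    }

    // Short-circuit check at the top of fire_reaction.
    fn is_rejected(&self, chat_id: i64, emoji: &str) -> bool {
        self.rejected.lock().unwrap().contains(&(chat_id, emoji.to_string()))
    }

    // Insert on a terminal-error response; a racing duplicate insert
    // is idempotent, so the brief check-then-insert race is benign.
    fn mark_rejected(&self, chat_id: i64, emoji: &str) {
        self.rejected.lock().unwrap().insert((chat_id, emoji.to_string()));
    }
}
```

Keying by `(chat_id, emoji)` rather than emoji alone is what lets the same emoji stay live in chats whose `available_reactions` still permit it.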

Tests

10 new tests in the same file using a small in-crate stub server (axum on an ephemeral port, reached via the existing api_url constructor seam — zero new dependencies):

Send path:

  • test_api_send_message_single_chunk_400_returns_err
  • test_api_send_message_single_chunk_200_returns_ok
  • test_api_send_message_first_chunk_fail_returns_err
  • test_api_send_message_partial_delivery_returns_ok (multi-chunk B1 regression guard)

Reaction cache:

  • test_is_terminal_reaction_error_matches
  • test_is_terminal_reaction_error_rejects_transient
  • test_fire_reaction_caches_on_reaction_invalid
  • test_fire_reaction_cache_is_per_chat (proves the per-chat key invariant)
  • test_fire_reaction_does_not_cache_non_terminal
  • test_fire_reaction_does_not_cache_on_success
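The `api_url` constructor seam these tests rely on might look roughly like the following. Type and method names here are illustrative, not the crate's actual API; the idea is simply that the base URL is injected rather than hard-coded, so a test can point the adapter at a local stub:

```rust
// Hypothetical seam: the adapter stores a base URL instead of hard-coding
// https://api.telegram.org, so tests can substitute an ephemeral stub server.
struct TelegramApi {
    base_url: String,
    token: String,
}

impl TelegramApi {
    fn with_base_url(base_url: impl Into<String>, token: impl Into<String>) -> Self {
        Self { base_url: base_url.into(), token: token.into() }
    }

    // Builds the per-method endpoint in the Bot API's /bot<token>/<method> shape.
    fn method_url(&self, method: &str) -> String {
        format!("{}/bot{}/{}", self.base_url, self.token, method)
    }
}
```

In a test, `with_base_url("http://127.0.0.1:<port>", …)` points every helper at the stub, which is why no new dependencies are needed beyond what the crate already uses.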

All 47 telegram tests pass (37 pre-existing + 10 new). No new clippy warnings on the changed file.

Compatibility

  • Public API: unchanged. ChannelAdapter trait signatures unchanged.
  • Behavior change for legitimate callers: previously-silent send failures now surface as Err. Every call site in the repo was audited and handles Err correctly:
    • kernel::send_channel_message (kernel/src/kernel.rs:6988-6996) — already maps via .map_err(|e| format!("Channel send failed: {e}"))?.
    • tool_channel_send (runtime/src/tool_runner.rs:2430) — propagates to the agent as a tool-result error string.
    • bridge::send_response_to_channel — logs and swallows (no behavior change; log line is now truthful).
    • cron_delivery::deliver_to_target — per-target best-effort (no behavior change).
    • bridge::spawn_typing_loop — discards with let _ (no behavior change).
  • No config, schema, or dependency changes.
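The call-site pattern audited above reduces to a simple shape. This is a sketch of the `kernel::send_channel_message` mapping described in the first bullet, with an illustrative signature (the real function takes a channel id and message, not a closure):

```rust
// Sketch of the audited caller: any Err surfaced by the adapter is mapped
// into a string and propagated, so the newly-surfaced send failures reach
// the agent instead of silently disappearing.
fn send_channel_message(
    send: impl FnOnce() -> Result<(), Box<dyn std::error::Error>>,
) -> Result<(), String> {
    send().map_err(|e| format!("Channel send failed: {e}"))
}
```

Callers that deliberately swallow the result (`bridge::send_response_to_channel`, `spawn_typing_loop`) keep their old behavior; only the truthfulness of the outcome changes.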

Follow-ups (out of scope)

  • Five sibling adapters share the same log-and-Ok silent-failure pattern: discord.rs:131-135, slack.rs:127-132, feishu.rs:409-427, line.rs:494-498, revolt.rs:187-194. Happy to file individual issues if welcome.
  • OpenFang's default lifecycle emojis (`ALLOWED_REACTION_EMOJI` / `default_phase_emoji` in `types.rs`) include several outside Telegram's free-reaction allowlist (⏳ ⚙️ ✅ ❌ 🔄). A per-channel emoji default set would eliminate the reaction-cache churn entirely for Telegram while keeping the current semantic set for other adapters. Behavior-defining rather than bug-fixing — belongs in a separate discussion.

Test plan

cargo test --release -p openfang-channels -- telegram    # 47/47 pass
cargo clippy --release -p openfang-channels --lib --tests

Live verification on a running instance: user sent real Telegram messages, the reply landed, and the log showed first-rejection-then-cached behavior for the disallowed emojis; subsequent turns short-circuited with zero REACTION_INVALID lines and no send-path regressions.

The six outbound helpers in the Telegram adapter (`api_send_message`,
`api_send_photo`, `api_send_document`, `api_send_document_upload`,
`api_send_voice`, `api_send_location`) previously
logged a `warn!` on HTTP non-success and still returned `Ok(())`. Callers
interpreted that as successful delivery and told the agent "Message sent"
even when Telegram had rejected the request (e.g. 400 Bad Request from
malformed HTML entities with parse_mode=HTML). The agent recorded phantom
success in its session history, corrupting subsequent behavior.

The fix returns `Err(format!(...).into())` on HTTP non-success in all six
helpers, matching the error-handling convention documented in
CONTRIBUTING.md.

`api_send_message` is slightly different because it splits long messages
into chunks via `split_message(4096)`. Naively returning `Err` on any
chunk failure would create a partial-delivery-then-error regression —
worse than the original silent success. The function now tracks
`delivered_any` across chunks:

- First-chunk failure (nothing delivered yet) → return `Err` to surface
  the failure. This is where the motivating HTML-parse-error bug lives,
  so the fix is fully effective.
- Subsequent-chunk failure (user already received preceding chunks) →
  log `warn!` and continue with best-effort delivery, matching the
  convention used by every other adapter in the crate that calls
  `split_message` (Discord, Gitter, Mattermost, Nextcloud, Twitch,
  Pumble, etc.).

Tests: 4 new tests using a small in-crate stub server (axum on an
ephemeral port, reached via the existing `api_url` constructor seam —
zero new dependencies). 41 telegram tests pass (37 existing + 4 new).
cache terminal setMessageReaction errors per (chat, emoji)

`fire_reaction` calls `setMessageReaction` fire-and-forget on every
agent lifecycle event. When Telegram returns a terminal error like
`REACTION_INVALID` (emoji not in the bot's free-reaction allowlist),
`REACTION_NOT_AVAILABLE` (chat admin restricted this emoji), or
`REACTION_TOO_MANY` (per-message cap), retrying on every subsequent
turn is pointless log spam and wasted API quota.

This adds a per-bot-instance `HashSet<(i64, String)>` keyed by
`(chat_id, emoji)` that records terminal rejections and short-circuits
future calls for the same pair. Keyed by chat, not just emoji, because
`Chat.available_reactions` varies across chats and is admin-mutable
(https://core.telegram.org/bots/api#setmessagereaction) — an emoji
rejected in chat A may still be valid in chat B. Cache is
per-process; on restart it rebuilds naturally, which handles any
runtime allowlist change without needing persistence.

The terminal-error match uses a small private helper
`is_terminal_reaction_error` that substring-matches the three
permanent errors. Transient errors (429, 5xx, `MESSAGE_NOT_MODIFIED`,
unrelated 400s) are deliberately NOT cached.

Concurrency: the cache uses `std::sync::Mutex` — critical section is
two `HashSet` ops (contains + insert), never held across `.await`.
Endorsed by the Tokio shared-state tutorial
(https://tokio.rs/tokio/tutorial/shared-state) for exactly this shape.
Two concurrent `fire_reaction` calls for the same (chat, emoji) can
both pass the cache check before either rejection lands, producing up
to N duplicate API calls on the first rejection; the duplicate
`insert` is idempotent so this is benign and self-limits on the
second turn. Documented in-code.

Tests: 6 new tests covering terminal-error matching, cache insertion,
per-chat key isolation, and non-caching of transient and successful
responses. Total 47 telegram tests pass (41 existing + 6 new). No new
clippy warnings.
