Collector resolve_username without rate-limit triggers 16h flood-wait on all accounts #464

@axisrow

Description

Problem

Action-only pipelines (e.g. pipeline 1, "Reactions @claudecodeteam") rely on `ClientPool.get_available_client()` returning a live client. But all three accounts in the pool regularly end up simultaneously flood-waited for 12-16 hours, so every scheduled pipeline run returns `result_count=0` and every action is forcibly skipped.

Observed in `data/app.log` on 2026-04-18 22:43:55 — the primary and secondary accounts received a 57891s flood-wait back-to-back within minutes:

```
22:43:55 [WARNING] src.telegram.client_pool: Flood wait for +66982102247: 57891 seconds (until 2026-04-19 06:48:46 UTC)
22:43:55 [WARNING] src.telegram.collector: collect_channel_resolve_username: Flood wait 57891s until 2026-04-19T06:48:46 UTC for +66982102247
```

All three accounts (+66982102247, +66990712629, +8613392919509) were flood-locked in the same run, ruining pipeline execution for the next 12-16 hours.

Root cause

`src/telegram/collector.py:546-565` — `_collect_channel` calls `session.resolve_entity(channel.username)` for every channel in a loop. On large channel lists this burst triggers Telegram's `auth.resolveUsername` rate limit, which is account-level and yields multi-hour flood-waits.

```python
try:
    entity = await run_with_flood_wait(
        session.resolve_entity(channel.username),
        operation="collect_channel_resolve_username",
        phone=phone,
        pool=self._pool,
        ...
    )
except HandledFloodWaitError as exc:
    flood_wait_sec = exc.info.wait_seconds
    raise  # ← propagates up, but the collector keeps dispatching to other accounts
```

Because the collector keeps trying further channels on the next account after catching the flood-wait, it walks through the whole pool and poisons every account within a single cycle.

Impact

All three accounts get flood-locked in a single cycle, so every scheduled pipeline run returns `result_count=0` and all actions are skipped for the next 12-16 hours.
Proposed fix

  1. Rate-limit `resolve_entity` per account: cap to N calls per minute with jitter.
  2. Respect `flood_wait_until` at the lease level: `acquire_by_phone` already checks `flood_wait_until > now`, but `collect_channel_resolve_username` should abort early and reschedule the whole collect cycle instead of hammering remaining accounts.
  3. Fallback to cached entity in `channels` table where `access_hash` is present — avoid resolve entirely for known channels.
  4. Back off globally once a single account hits FloodWaitError > 300s: stop dispatching resolve to other accounts for at least `min(wait_seconds, 3600)` to avoid poisoning the whole pool.

Related

Follow-up of #463 (pipeline result observability). The observability fix now surfaces `no_available_client` node errors, but the root cause (pool-wide flood storm) needs addressing here.
