feat(dispatch): turn-boundary batching dispatcher v2 #686

thepagent merged 34 commits into openabdev:main from feature/turn-boundary-batching-v2
Conversation
Force-pushed: c408a5a → 00d7b25 → 5b7e08c
Status update — SendError testing approach

Added in latest amend.

Not in this PR — full Phase 1 coverage strategy: unit-test the predicate (here), defer the integrated path to a manual staging smoke matrix entry in the ADR (PR #598). Smoke procedure draft (TBD landing decision):

Decision pending: land the §6.10 smoke entry in PR #598's ADR doc, or co-locate here.
What's in PR 686 — feature checklist

Core architecture
Three invariants enforced (ADR §1)
Modes & config (
Adapter wiring
Slash commands
Packing (
Observability (ADR §6.6)
Reaction UX (ADR §6.7)
Error handling (ADR §2.5)
Graceful shutdown (ADR §6.8)
Token estimation
Tests (11 unit tests in
Removed
Deferred to follow-up PRs
ADR Phase 1 (§4.4) — item-by-item checklist

Mapping each deliverable in ADR §4.4 to the code in this PR.

Mechanism deliverables
Tests required by §4.4
Tally
Recommended follow-ups (separate PRs)
Edits
Force-pushed: 5b7e08c → 16bed85
OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Screening report

## Intent

PR #686 aims to add turn-boundary message batching across Discord, Slack, and Gateway adapters. The operator-visible problem is that rapid follow-up messages can currently be processed as separate turns, creating unnecessary agent runs, fragmented context, and awkward response timing. This PR introduces a dispatcher that starts the first message immediately, buffers later messages while a turn is in flight, and dispatches them as a batch at the next turn boundary.

## Feature

A new dispatch layer for batched message processing. Behavioral change: adapters can run either in the existing per-message mode or in batched mode via configuration. Batched mode enforces one active turn per thread/channel context, preserves structured arrival metadata, evicts failed send paths, and adds buffer/token limits.

## Who It Serves

Primary beneficiaries: Discord and Slack end users who send multi-message bursts, plus agent runtime operators who need lower duplicate-run pressure and cleaner turn semantics. Secondary beneficiaries: maintainers and reviewers, because adapter behavior becomes more explicit.

## Rewritten Prompt

Implement Phase 1 of the turn-boundary batching ADR. Add a shared dispatcher that supports configurable per-message or batched processing for Discord, Slack, and Gateway. In batched mode, the first message for a thread should dispatch immediately, subsequent messages should buffer while the turn is active, and exactly one buffered batch should dispatch when the active turn completes. Preserve message ordering, channel/message references, adapter routing metadata, and existing per-message behavior when batching is disabled. Add config fields for processing mode, max buffered messages, and max batch tokens. Add focused tests for token estimation, arrival-event packing, batching shape, buffer limits, SendError eviction, and no overlapping turns for the same thread.
## Merge Pitch

This is worth advancing because it addresses a real interaction quality issue: users often send thoughts in bursts, but agents should usually answer once per coherent turn. The PR also centralizes dispatch behavior instead of leaving each adapter to approximate batching independently. Risk profile: moderate to high. It adds a new concurrency primitive and touches Discord, Slack, Gateway, config, and main wiring. Likely reviewer concerns are message loss, duplicate dispatch, ordering guarantees, shutdown behavior, backpressure, and whether the test coverage is strong enough for async edge cases.

## Best-Practice Comparison

OpenClaw principles that apply:
Hermes Agent principles that apply:
## Implementation Options

Conservative option: merge only the shared packing/config groundwork and keep adapters in per-message mode by default.

Balanced option: merge the dispatcher behind opt-in config. Keep current behavior as the default, enable batched mode per adapter, and require async tests for ordering, no overlap, overflow handling, and shutdown before merge.

Ambitious option: evolve the dispatcher into a durable runtime queue. Persist buffered messages, add retry/backoff, structured run logs, metrics, and recovery after restart. This would align more closely with OpenClaw-style durable execution but is larger than the current PR.

## Comparison Table
## Recommendation

Advance the balanced option: review PR #686 as an opt-in Phase 1 dispatcher, with current per-message behavior preserved as the safe default. Before merge discussion, require focused async coverage for the core invariants: first-message zero latency, at-most-one in-flight turn, ordered batch dispatch, buffer/token limits, SendError eviction, and graceful shutdown. Split durable persistence, retry/backoff, and richer run logs into a follow-up PR so this change stays reviewable.
chaodu-agent left a comment
CHANGES_REQUESTED — Strong architecture with thorough ADR backing and good test coverage, but three issues need attention before merge.
Baseline Check
Main has per-message dispatch (each inbound → one tokio::spawn → one ACP turn), inline block-packing in adapter.rs:131-152 that reorders extra_blocks (text blocks before sender_context, images after), Slack KeyedAsyncQueue for FIFO ordering, and no batching mechanism.
PR adds: src/dispatch.rs (724 lines) with Dispatcher, BufferedMessage, consumer_loop, dispatch_batch; MessageProcessingMode enum; pack_arrival_event() unified packing; SenderContext.timestamp; ReactionsConfig gains Clone; Slack KeyedAsyncQueue removed; /reset integration.
Four-Question Framework
- What problem? Multiple messages arriving during an in-flight ACP turn become separate sequential turns — wasting tokens, losing attachment attribution, non-deterministic ordering.
- How? Per-thread bounded mpsc::channel with consumer task. First message fires immediately (I1), subsequent messages buffer and batch at turn boundary. Config-gated: per-message (default) or batched.
- Alternatives? Pre-turn debouncing (rejected: adds latency), mutex-level coalescing (rejected: opaque), `<message index=N>` wrapper (rejected: attribution loss). All documented in ADR #598.
- Best approach? Architecture is sound. Three issues below need resolution.
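The buffer-then-batch mechanism in the "How?" answer can be sketched as follows. This is a minimal std-only, synchronous illustration (the real implementation uses a per-thread tokio `mpsc::channel` with an async consumer task; `drain_batch` is a hypothetical name):

```rust
use std::sync::mpsc;

// Drain everything that arrived while a turn was in flight into one
// ordered batch, dispatched at the turn boundary.
fn drain_batch<T>(rx: &mpsc::Receiver<T>) -> Vec<T> {
    let mut batch = Vec::new();
    while let Ok(msg) = rx.try_recv() {
        batch.push(msg);
    }
    batch
}

fn main() {
    let (tx, rx) = mpsc::channel();
    tx.send("m1").unwrap();
    // First message fires immediately (invariant I1).
    assert_eq!(rx.recv().unwrap(), "m1");
    // m2 and m3 arrive while the turn is in flight...
    tx.send("m2").unwrap();
    tx.send("m3").unwrap();
    // ...and dispatch as a single ordered batch at the turn boundary.
    assert_eq!(drain_batch(&rx), vec!["m2", "m3"]);
}
```

The bounded channel capacity is what gives the buffer limit its backpressure semantics in the real dispatcher.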
Traffic Light
🔴 SUGGESTED CHANGES
1. Block ordering semantic change in pack_arrival_event
On main, adapter.rs:131-152 reorders extra_blocks: text blocks (voice transcripts) go before the sender_context header, image blocks go after. The new pack_arrival_event places all extra_blocks after the header in arrival order. This changes behavior even in PerMessage mode (which also calls pack_arrival_event). Voice transcript blocks will now appear after the prompt instead of before it.
Recommendation: Document this ordering change explicitly. If intentional (ADR says it is), confirm existing agents handle transcripts appearing after the prompt correctly. Consider whether this warrants a CHANGELOG entry.
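The ordering change can be made concrete with a small illustration. The function names and signatures below are invented for clarity, not the real `pack_arrival_event` API; they only model the block orders described above (main: text extras before the header, images after; new: all extras after the header in arrival order):

```rust
// Old (main) behavior: text extras hoisted before the header, images after.
fn pack_main_style(header: &str, prompt: &str, text_extras: &[&str], image_extras: &[&str]) -> Vec<String> {
    let mut blocks: Vec<String> = text_extras.iter().map(|s| s.to_string()).collect();
    blocks.push(header.to_string());
    blocks.push(prompt.to_string());
    blocks.extend(image_extras.iter().map(|s| s.to_string()));
    blocks
}

// New behavior: every extra block follows the header, in arrival order.
fn pack_arrival_style(header: &str, prompt: &str, extras: &[&str]) -> Vec<String> {
    let mut blocks = vec![header.to_string(), prompt.to_string()];
    blocks.extend(extras.iter().map(|s| s.to_string()));
    blocks
}

fn main() {
    // A voice transcript now lands after the prompt instead of before the header.
    let old = pack_main_style("<sender_context>", "prompt", &["transcript"], &["image"]);
    let new = pack_arrival_style("<sender_context>", "prompt", &["transcript", "image"]);
    assert_eq!(old, ["transcript", "<sender_context>", "prompt", "image"]);
    assert_eq!(new, ["<sender_context>", "prompt", "transcript", "image"]);
}
```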
2. ReactionsConfig::default() used for queued emoji in dispatch_batch
`let queued_emoji = crate::config::ReactionsConfig::default().emojis.queued;`

This ignores the user's actual reactions config (custom emojis, enabled flag). The router's config is available via router.reactions_config() (which this PR itself adds). Should use the actual config.
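A minimal sketch of the suggested fix, with types deliberately abbreviated (the real `ReactionsConfig` and router live in this repo's src/config.rs and src/adapter.rs; field names here are simplified). The point is only that the dispatcher should read the router's live config rather than `Default`:

```rust
#[derive(Clone)]
struct ReactionsConfig { queued_emoji: String }

impl Default for ReactionsConfig {
    // Stand-in default; the real defaults live in src/config.rs.
    fn default() -> Self { Self { queued_emoji: "⏳".into() } }
}

struct Router { reactions: ReactionsConfig }

impl Router {
    // Accessor the PR itself adds; returns the operator's actual config.
    fn reactions_config(&self) -> &ReactionsConfig { &self.reactions }
}

fn main() {
    let router = Router { reactions: ReactionsConfig { queued_emoji: "📬".into() } };
    // Before: ReactionsConfig::default().queued_emoji — ignores the custom emoji.
    // After: honours whatever the operator configured.
    let queued = router.reactions_config().queued_emoji.clone();
    assert_eq!(queued, "📬");
}
```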
3. Slack KeyedAsyncQueue removal affects PerMessage mode
The PR removes KeyedAsyncQueue entirely from Slack, but in PerMessage mode there's now no per-thread serialization — messages go directly to router.handle_message() without the semaphore guard. On main, KeyedAsyncQueue ensured FIFO ordering even in per-message mode. This is a regression for PerMessage Slack users.
Recommendation: Either keep KeyedAsyncQueue for PerMessage mode, or document that PerMessage mode on Slack no longer guarantees strict FIFO ordering.
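For reviewers unfamiliar with the removed primitive, the guarantee at stake can be sketched as a per-key FIFO (std-only, names invented; the real KeyedAsyncQueue was async and lived in src/slack.rs): messages sharing a thread key are handled strictly in arrival order, independent of other keys.

```rust
use std::collections::HashMap;

// Toy per-key FIFO: the property KeyedAsyncQueue provided on main.
struct KeyedQueue {
    queues: HashMap<String, Vec<String>>, // key → pending messages, oldest first
}

impl KeyedQueue {
    fn new() -> Self { Self { queues: HashMap::new() } }

    fn push(&mut self, key: &str, msg: &str) {
        self.queues.entry(key.to_string()).or_default().push(msg.to_string());
    }

    // Pop the oldest message for a key; per-key arrival order is preserved.
    fn pop(&mut self, key: &str) -> Option<String> {
        let q = self.queues.get_mut(key)?;
        if q.is_empty() { None } else { Some(q.remove(0)) }
    }
}

fn main() {
    let mut q = KeyedQueue::new();
    q.push("thread-1", "a");
    q.push("thread-1", "b");
    q.push("thread-2", "x");
    assert_eq!(q.pop("thread-1").as_deref(), Some("a"));
    assert_eq!(q.pop("thread-1").as_deref(), Some("b"));
    assert_eq!(q.pop("thread-2").as_deref(), Some("x"));
}
```

Dropping straight to `router.handle_message()` in PerMessage mode loses exactly this per-key ordering whenever two messages for the same thread race.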
🟡 NIT
- Duplicated days_to_ymd / timestamp conversion between slack.rs and gateway.rs — extract to shared utility
- sender_name field from ADR §2.3 missing in BufferedMessage — note the divergence
- DispatchError doesn't implement std::error::Error — limits composability with anyhow
🟢 INFO
- Excellent test coverage for packing logic (single message, batch of 2, extra blocks, all four ADR §3.6 scenarios)
- Clean config gating — default PerMessage is fully backward-compatible
- Graceful shutdown with buffered_lost counts per thread
- /reset integration drops pending messages and reports count to user
- stream_prompt_blocks extraction enables clean reuse
Previously slack_ts_to_iso8601 split on '.' and parsed the fractional substring as an integer, treating ".12" as 12 ms instead of 120 ms. Parsing the entire string as f64 carries decimal semantics correctly without any string-padding logic.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
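The fix this commit describes can be sketched as follows (function name hypothetical; the real conversion lives in src/slack.rs and feeds an ISO-8601 formatter): parsing the whole Slack ts as f64 keeps the fraction's decimal semantics.

```rust
// Parse a Slack ts like "1700000000.12" into milliseconds since the epoch.
// f64 parsing treats ".12" as 0.12 s = 120 ms, unlike the old integer
// parse of the substring after '.', which read it as 12 ms.
fn slack_ts_to_millis(ts: &str) -> Option<u64> {
    let secs: f64 = ts.parse().ok()?;
    Some((secs * 1000.0).round() as u64)
}

fn main() {
    assert_eq!(slack_ts_to_millis("1.12"), Some(1120));  // 120 ms, not 12 ms
    assert_eq!(slack_ts_to_millis("1.012"), Some(1012)); // no padding logic needed
    assert_eq!(slack_ts_to_millis("not a ts"), None);
}
```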
The buffered-message count is approximate (sweep races with new arrivals) so surfacing an exact number to users was misleading. Show a binary "cleared / nothing" signal instead. The pending_count() API stays for logs and metrics.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
…ents

Make the no-.await-while-locked invariant explicit at each lock acquisition site so future edits can't silently introduce an .await without tripping the comment. The struct-level note at line 183 stays as the higher-level explanation.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Replace futures_util::future::join_all with a sequential await loop. Batches are typically small (low single digits) so the serialization cost is sub-second and not user-visible, and the dispatch path no longer pulls in join_all just for one call.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
Per-message mode (cap=1) doesn't benefit from holding consumers across message gaps — there is no batch window to preserve — so a 5-minute idle timeout left consumer tasks lingering long after they were useful. Add PER_MESSAGE_CONSUMER_IDLE_TIMEOUT (10s), wire it through main.rs based on each adapter's message_processing_mode, and drop the unused Dispatcher::new wrapper. By Little's Law (steady-state idle count = arrival rate × idle window), this cuts per-message-mode idle dispatcher footprint by 30x for the same arrival rate while keeping batched modes' 5-minute window so between-trigger lanes aren't torn down on every message.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
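A back-of-envelope check of the Little's Law claim in this commit (the arrival rate below is an invented example; only the 300 s / 10 s windows and the 30x ratio come from the commit message):

```rust
// Little's Law: steady-state idle consumer count ≈ arrival rate × idle window.
fn steady_state_idle(arrival_rate_per_sec: f64, idle_window_secs: f64) -> f64 {
    arrival_rate_per_sec * idle_window_secs
}

fn main() {
    let rate = 0.5; // hypothetical arrival rate, messages/sec
    let before = steady_state_idle(rate, 300.0); // old 5-minute window
    let after = steady_state_idle(rate, 10.0);   // new PER_MESSAGE 10 s window
    // The ratio is window/window = 30, independent of the arrival rate.
    assert_eq!(before / after, 30.0);
}
```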
…ry-batching-v2

# Conflicts:
#	src/gateway.rs
…e 1 implementation

Updates the ADR to match decisions made during PR openabdev#686 review:

- 3-valued MessageProcessingMode (per-message / per-thread / per-lane) replacing the earlier 2-valued (per-message / batched) design. §4.1 documents per-mode (cap, dispatcher key, idle timeout) tuple; §4.4 Phase 1 bullets reflect the unified Dispatcher::submit path; legacy "batched" alias is rejected at config parse.
- Standalone <sender_context> Text block (commit 072010c). §3.1 / §3.3 / §3.4 / §3.5 / §3.6 + §6.4 rule 4 now describe the split-block layout: delimiter + transcripts + prompt + images. Transcripts move from before the envelope to inside the arrival event (between delimiter and prompt); images stay after prompt as in the pre-batching adapter. Empty prompt is omitted from the block stream.
- New §6.10 — per-mode consumer idle timeout (PER_MESSAGE = 10s, DEFAULT = 300s) with Little's-Law rationale and sweep_stale eviction.
- New §6.11 — SendError manual staging smoke matrix (the entry deferred out of CI in PR openabdev#686's first status update).
- §6.7 batch reactions now explicitly sequential (not join_all parallel) so reaction-list ordering across a batch matches message-ID order.
- Frontmatter: drop the self-referential "Supersedes: PR openabdev#598" line; add "Implementation PR: openabdev#686" so the ADR points at the wiring it documents.
…enabdev#686 head

Five rounds of fact-check + proofread against PR openabdev#686 (feature/turn-boundary-batching-v2 @ e119abf) caught two threads of drift:

- Design contract: §2.5 SendError handler now matches commit afd6fff — proactive consumer.is_finished() check at submit head + transparent retry once on SendError; ❌ + ⚠️ + Err(ConsumerDead) only if the retry also fails. Motivated by the first-message-after-idle race; one-attempt bound preserves the no-spin-loop property. §6.11 staging smoke matrix split into Path A (PANIC_ONCE happy path, no user-visible signal) and Path B (PANIC_ALWAYS failing-retry surfaces ❌ + ⚠️). §4.4 Phase 1 plan + test list updated to the new contract.
- Anchor audit vs declared base v0.8.2-beta.1 (52052b8): pre-existing drift fixed in adapter.rs references that had been wrong since the SHA pin was set in v0.2 — :131-152→:156-172, :138-143→:158-162 (7 sites), :148-152→:165-169, :154-161→:173-180, :181→:254 (was pointing at the wrong call), :240→:260. acp/connection.rs / acp/pool.rs / discord.rs / slack.rs anchors verified clean against 52052b8.
- §2.6 rewritten: other_bot_present is a bool snapshot carried on BufferedMessage and read from batch.last() at dispatch time — not the Arc<AtomicBool> mirror of an earlier draft. §2.3 struct + submit signature corrected to match.
- Anchor-pinning preamble (line 9) expanded to pin both SHAs explicitly: released-code anchors → 52052b8; conceptual descriptions of new modules → cross-checked against e119abf.
- Appendix A replaced with a signatures-only skeleton pointing at src/dispatch.rs — drops the ~200-line body sketch that had drifted from the implementation; rationale moved into a short shape-choices list.
- Path anchors swept: pool.rs → acp/pool.rs, connection.rs → acp/connection.rs (modules live under src/acp/ in v0.8.2-beta.1).
- §6.6 metric table cell tokens_per_event (was context_tokens_per_event, inconsistent with the code block immediately below).
Co-Authored-By: Claude Opus 4.7 <[email protected]>
Status update — review feedback addressed + ADR re-synced

Four-reviewer review (CHANGES REQUESTED) — disposition

🔴 Suggested changes
🟡 Nits
ADR (PR #598) — what's now in sync

Pushed.

Also picked up: §2.6 rewrite.

Remaining open items
Ready for re-review.
…ts, fix ADR path

- main.rs: collapse 3x repeated (cap, grouping, idle) match blocks into dispatch::dispatch_params(mode, max_buffered).
- dispatch.rs: replace magic 4 / 512 in estimate_tokens with named CHARS_PER_TOKEN_ESTIMATE / TOKENS_PER_IMAGE_ESTIMATE constants.
- dispatch.rs: fix top-level ADR reference to point at the actual docs/adr/turn-boundary-batching.md path landing in openabdev#598.

Addresses chaodu-agent NITs openabdev#1, openabdev#2, openabdev#5 from PR openabdev#686.

Co-Authored-By: Claude Opus 4.7 <[email protected]>
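The named-constant change can be reconstructed roughly as below. This is a guess assembled from the magic numbers the commit cites (4 chars/token, 512 tokens/image); the real estimate_tokens in src/dispatch.rs operates on message blocks, not bare counts:

```rust
// Named constants replacing the magic 4 / 512 in the token estimate.
const CHARS_PER_TOKEN_ESTIMATE: usize = 4;
const TOKENS_PER_IMAGE_ESTIMATE: usize = 512;

// Rough batch-budget estimate: text length scaled by chars-per-token,
// plus a flat per-image cost.
fn estimate_tokens(text_chars: usize, image_count: usize) -> usize {
    text_chars / CHARS_PER_TOKEN_ESTIMATE + image_count * TOKENS_PER_IMAGE_ESTIMATE
}

fn main() {
    assert_eq!(estimate_tokens(400, 0), 100);
    assert_eq!(estimate_tokens(0, 2), 1024);
    assert_eq!(estimate_tokens(400, 1), 612);
}
```

An estimate like this only has to be stable and cheap — it gates max_batch_tokens, it is not a tokenizer.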
Status update — chaodu-agent NITs #1 / #2 / #5 addressed

Pushed.
Not addressed (intentionally):
Verification
…nale

- adapter.rs: note that future breaking changes should bump to v1.1+
- main.rs: explain why Arc<Mutex<Vec<Arc<Dispatcher>>>> is necessary (shared with cleanup task + shutdown; pushes at startup only)

Addresses maintainer NITs from PR openabdev#686 review.

Co-Authored-By: 超渡法師 <[email protected]>
Four-Monk Review — All findings addressed ✅

All 🔴 and 🟡 items from the initial review have been resolved:
Verdict: Ready for maintainer approval. Architecture is sound, backward compatible by default, well-tested, and all review findings addressed.

Reviewers
chaodu-agent left a comment
All four-monk review findings addressed. Architecture is sound, backward compatible, well-tested. LGTM ✅
…per-lane)

Decision guide for operators choosing between the three modes, with config examples and trade-off explanations.

Co-Authored-By: 超渡法師 <[email protected]>
Visual explanation of per-message vs per-thread vs per-lane behavior, plus the internal consumer_loop batching flow.

Co-Authored-By: 超渡法師 <[email protected]>
Summary
Implements the turn-boundary message batching dispatcher per ADR v0.3 (docs/adr/turn-boundary-batching.md).

Changes
- src/dispatch.rs (new): Dispatcher, BufferedMessage, ThreadHandle, consumer_loop, dispatch_batch, estimate_tokens, unit tests
- src/config.rs: MessageProcessingMode enum, max_buffered_messages, max_batch_tokens config fields for Discord/Slack/Gateway
- src/adapter.rs: AdapterRouter::pack_arrival_event uniform packing (§3.3), ChannelRef / MessageRef types
- src/discord.rs: branch on message_processing_mode — per-message vs batched dispatch path
- src/slack.rs: same branching; KeyedAsyncQueue replaced by Dispatcher consumer tasks
- src/gateway.rs: same branching
- src/main.rs: wire Dispatcher instances per adapter

ADR compliance
Implements Phase 1 scope from ADR §4.4: I1 zero-latency first message, I2 at-most-one-in-flight-turn, I3 broker structural fidelity, SendError eviction (§2.5), other_bot_present freshness (§2.6), batch reaction UX (§6.7), graceful shutdown (§6.8).

Testing
Unit tests in src/dispatch.rs: estimate_tokens, pack_arrival_event single/batch/extra-blocks scenarios.

https://discord.com/channels/1491295327620169908/1497977225314832536
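For orientation, the config fields listed under Changes might be wired like this. Only the field names and the three mode values come from this PR and its ADR commits; the section layout and the numeric values are illustrative assumptions, not documented defaults:

```toml
[discord]
# "per-message" is the backward-compatible default; the review settled on
# three modes: per-message / per-thread / per-lane.
message_processing_mode = "per-thread"
max_buffered_messages = 16   # illustrative value
max_batch_tokens = 4096      # illustrative value
```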