Reject concurrent chat runs on the same conversation with 409 #51
Merged

mgoldsborough merged 2 commits into main on Apr 20, 2026
Conversation
When a user sends a new message while an agent turn on the same conversation is still in flight, the runtime currently starts a second run on overlapping state. The Anthropic API then rejects the call with "model does not support assistant message prefill", because the serialized history ends with in-progress assistant content. Observed in production on conv_ce7995f740af44d4 as five back-to-back prefill failures while one long-running turn (27 minutes of image/Typst thrash) held the conversation.

This adds an in-memory Set of conversation IDs with an in-flight chat() call to Runtime. A second concurrent call on the same conversation throws RunInProgressError; /v1/chat and /v1/chat/stream surface it as HTTP 409 run_in_progress. The web client drops its optimistic placeholders and shows a clear "assistant is still working" banner, so the user's draft is preserved for retry.

Out of scope: queueing deferred messages, canceling the in-flight run, or any scheduler changes. Those land in a follow-up; this is the smallest change that stops the bleeding.
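The guard described above can be sketched roughly as below. This is a minimal illustration, not the PR's actual diff — the `chat()` signature and the `activeConversations` field name are assumptions based on the description:

```typescript
// Thrown when a second chat() call races an in-flight run on the same
// conversation. The HTTP layer maps this to 409 run_in_progress.
export class RunInProgressError extends Error {
  constructor(public readonly conversationId: string) {
    super(`run already in progress for conversation ${conversationId}`);
    this.name = "RunInProgressError";
  }
}

export class Runtime {
  // Single-replica assumption: all traffic for a conversation lands on this
  // pod, so an in-memory Set is a sufficient lock. Revisit if replicas > 1.
  private activeConversations = new Set<string>();

  async chat(conversationId: string, run: () => Promise<void>): Promise<void> {
    if (this.activeConversations.has(conversationId)) {
      throw new RunInProgressError(conversationId);
    }
    this.activeConversations.add(conversationId);
    try {
      await run();
    } finally {
      // Release even if the run throws, so the conversation never wedges.
      this.activeConversations.delete(conversationId);
    }
  }
}
```

The `finally` release is the important part: without it, a crashed run would leave the conversation permanently locked.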
- Defer the user.message broadcast on the stream path until the engine emits chat.start. This prevents a phantom broadcast to other participants of a shared conversation when the request loses a concurrency race and is rejected before the run actually starts.
- Capture web.chat_run_in_progress telemetry so we can measure whether the fix stops the bleeding in production.
- Add HTTP-level tests for /v1/chat/stream covering both rejection paths: the pre-check 409 (held open deterministically via a gated mock model) and the concurrent-requests invariant (≥1 winner, every rejected request carries the stable run_in_progress code).
- Document the single-replica assumption on the activeConversations lock; if a tenant is ever scaled past 1 replica, the invariant needs to move to a shared store.
Summary
Runtime.chat(): a second concurrent call on the same conversationId rejects with RunInProgressError instead of starting a duplicate run on in-flight state. POST /v1/chat and POST /v1/chat/stream surface the rejection as HTTP 409 {error: "run_in_progress"}. The stream path also pre-checks, so the client gets an HTTP status code, not an SSE error mid-stream. For shared conversations, the user.message broadcast to other participants is deferred until the engine emits chat.start, so a rejected request never broadcasts a phantom message with no assistant reply. The web client maps the 409 through formatSendError and captures a web.chat_run_in_progress telemetry event so we can measure whether the fix actually stops the bleeding.

Why
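The stream-path pre-check might look something like the sketch below, assuming a fetch-style Request/Response server (Bun's native shape). The helper name and surrounding wiring are assumptions, not the actual diff:

```typescript
// Pre-check on /v1/chat/stream: reject before the SSE stream is opened, so
// the client sees a plain HTTP 409 instead of an error event mid-stream.
// `active` is the runtime's in-memory set of conversations with a run in flight.
function rejectIfRunning(
  active: Set<string>,
  conversationId: string,
): Response | null {
  if (!active.has(conversationId)) return null; // no run in flight: proceed
  return new Response(JSON.stringify({ error: "run_in_progress" }), {
    status: 409,
    headers: { "content-type": "application/json" },
  });
}
```

The stable `run_in_progress` code in the body is what lets the web client branch on this case and render the "assistant is still working" banner instead of a generic failure.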
A single conversation in production burned 9.3M input tokens in one day. Root cause wasn't an agent loop — it was the user sending follow-up messages while a 27-minute agent turn was still in flight. Each follow-up started a new run on a conversation with in-progress assistant state, which the Anthropic API rejected with:

"model does not support assistant message prefill"

Five back-to-back prefill errors on conv_ce7995f740af44d4 between 17:33 and 17:50, while one long-running turn held the conversation. From the user's seat this looks like the app is frozen. The runtime had no concept of "one active turn per conversation," so retries silently corrupted state instead of being rejected cleanly.

This is the smallest change that stops the bleeding. It does not queue deferred messages or cancel the in-flight run — those are follow-ups. It does give the client a typed HTTP signal it can render correctly.
Scope & assumptions
The lock is an in-memory Set<string> on the Runtime instance. This is correct today because each tenant runs with platform.replicas: 1, so all traffic for a conversation lands on the same pod. The assumption is documented on the field and flagged to be revisited if a tenant is ever scaled past one replica (the conversation JSONL on the shared PVC has the same single-writer assumption, so they would move together).

Test plan
- bun run verify — 1843 unit + 103 web + 358 integration + 16 smoke tests pass
- test/integration/runtime/concurrent-chat.test.ts (runtime-level): a second concurrent chat() call on the same conversation rejects with RunInProgressError
- test/integration/chat-stream-concurrent.test.ts (HTTP-level): the /v1/chat/stream pre-check returns 409 (held open deterministically via a gated mock model); concurrent /v1/chat/stream requests: ≥1 winner, all rejections carry the run_in_progress code — covers both the pre-check 409 path and the SSE-error-on-race path
- test/integration/api-integration.test.ts concurrent-requests test updated to assert the new contract: ≥1 request returns 200, the rest return 409 with error: "run_in_progress"
- Post-deploy: watch conv_ce7995f740af44d4 and the web.chat_run_in_progress event rate

Out of scope (tracked for follow-ups)