
Reject concurrent chat runs on the same conversation with 409 #51

Merged
mgoldsborough merged 2 commits into main from fix/concurrent-chat-rejection on Apr 20, 2026

Conversation

Contributor

@mgoldsborough mgoldsborough commented Apr 20, 2026

Summary

  • Adds a per-conversation in-memory lock on Runtime.chat(): a second concurrent call on the same conversationId rejects with RunInProgressError instead of starting a duplicate run on in-flight state.
  • POST /v1/chat and POST /v1/chat/stream surface the rejection as HTTP 409 {error: "run_in_progress"}. The stream path also pre-checks so the client gets an HTTP code, not an SSE error mid-stream. For shared conversations, the user.message broadcast to other participants is deferred until the engine emits chat.start, so a rejected request never broadcasts a phantom message with no assistant reply.
  • Web client rolls back the optimistic user + assistant placeholders when it sees 409, so the failed message doesn't stick in history as if it had succeeded. It shows a clear "assistant is still working" message via formatSendError and captures a web.chat_run_in_progress telemetry event so we can measure whether the fix actually stops the bleeding.
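The 409 mapping in the second bullet can be sketched as follows. This is illustrative only: the handler shape, `handleChat`, and `runChat` are stand-ins, not the actual server code; only the `409` + `run_in_progress` contract comes from this PR.

```typescript
// Stand-in for the runtime's typed rejection (real definition lives in the runtime).
class RunInProgressError extends Error {
  constructor(conversationId: string) {
    super(`run already in progress for ${conversationId}`);
    this.name = "RunInProgressError";
  }
}

// Hypothetical handler: map the typed runtime error to the stable HTTP contract
// the web client keys off ({error: "run_in_progress"} with status 409).
async function handleChat(runChat: () => Promise<string>): Promise<Response> {
  try {
    const reply = await runChat();
    return new Response(JSON.stringify({ reply }), {
      status: 200,
      headers: { "content-type": "application/json" },
    });
  } catch (err) {
    if (err instanceof RunInProgressError) {
      // Stable machine-readable code, surfaced before any SSE stream starts.
      return new Response(JSON.stringify({ error: "run_in_progress" }), {
        status: 409,
        headers: { "content-type": "application/json" },
      });
    }
    throw err; // unrelated failures keep their own handling
  }
}
```

Returning a plain web-standard `Response` keeps the sketch framework-agnostic; any other error still propagates unchanged.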

Why

A single conversation in production burned 9.3M input tokens in one day. Root cause wasn't an agent loop — it was the user sending follow-up messages while a 27-minute agent turn was still in flight. Each follow-up started a new run on a conversation with in-progress assistant state, which the Anthropic API rejected with:

This model does not support assistant message prefill. The conversation must end with a user message.

We saw five back-to-back prefill errors on conv_ce7995f740af44d4 between 17:33 and 17:50 while a single long-running turn held the conversation. From the user's seat this looks like the app is frozen. The runtime had no concept of "one active turn per conversation," so retries silently corrupted state instead of being rejected cleanly.

This is the smallest change that stops the bleeding. It does not queue deferred messages or cancel the in-flight run — those are follow-ups. It does give the client a typed HTTP signal it can render correctly.

Scope & assumptions

  • The lock is an in-memory Set<string> on the Runtime instance. Correct today because each tenant runs with platform.replicas: 1 — all traffic for a conversation lands on the same pod. Documented on the field; flagged to be revisited if a tenant is ever scaled past one replica (the conversation JSONL on the shared PVC has the same single-writer assumption, so they would move together).
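A minimal sketch of that lock, assuming illustrative names (`RunInProgressError`, `activeConversations`, and the `run` callback are stand-ins for the real Runtime internals):

```typescript
class RunInProgressError extends Error {
  constructor(conversationId: string) {
    super(`run already in progress for ${conversationId}`);
    this.name = "RunInProgressError";
  }
}

class Runtime {
  // In-memory lock; correct only under the single-replica assumption
  // documented above (all traffic for a conversation hits one process).
  private activeConversations = new Set<string>();

  async chat(
    conversationId: string,
    run: () => Promise<string>,
  ): Promise<string> {
    // Check-and-add happens synchronously before the first await, so two
    // concurrent calls in the same process cannot both pass the check.
    if (this.activeConversations.has(conversationId)) {
      throw new RunInProgressError(conversationId);
    }
    this.activeConversations.add(conversationId);
    try {
      return await run();
    } finally {
      // Release on success *and* on thrown error.
      this.activeConversations.delete(conversationId);
    }
  }
}
```

The `try/finally` is what makes the "lock releases on thrown error" test below pass: any exit path drops the conversation ID from the set.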

Test plan

  • bun run verify — 1843 unit + 103 web + 358 integration + 16 smoke tests pass
  • test/integration/runtime/concurrent-chat.test.ts (runtime-level):
    • Second chat on the same convId rejects with RunInProgressError
    • Lock releases so a subsequent chat on the same convId succeeds
    • Different convIds are never blocked (parallel resume works)
    • Lock releases on thrown error
  • test/integration/chat-stream-concurrent.test.ts (HTTP-level):
    • /v1/chat/stream pre-check returns 409 (held open deterministically via a gated mock model)
    • 5 concurrent /v1/chat/stream requests: ≥1 winner, all rejections carry run_in_progress code — covers both the pre-check 409 path and the SSE-error-on-race path
  • test/integration/api-integration.test.ts concurrent-requests test updated to assert the new contract: ≥1 request returns 200, the rest return 409 with error: "run_in_progress".
  • Manual: once deployed, confirm HQ tenant no longer produces the prefill-error pattern seen in conv_ce7995f740af44d4, and watch web.chat_run_in_progress event rate.
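The ≥1-winner invariant those concurrent tests assert can be sketched with a stand-in handler; the real tests drive the HTTP server with a gated mock model, whereas here `chatOnce` and `invariant` are simplified, hypothetical shapes:

```typescript
// Single-process lock, as in the runtime sketch above.
const active = new Set<string>();

// Stand-in for one POST /v1/chat request; returns an HTTP-like status.
async function chatOnce(convId: string): Promise<number> {
  if (active.has(convId)) return 409; // pre-check rejection
  active.add(convId);
  try {
    await new Promise((r) => setTimeout(r, 10)); // simulated model turn
    return 200;
  } finally {
    active.delete(convId);
  }
}

// Fire 5 concurrent requests at one conversation and tally outcomes.
async function invariant(): Promise<{ winners: number; rejected: number }> {
  const statuses = await Promise.all(
    Array.from({ length: 5 }, () => chatOnce("conv_x")),
  );
  return {
    winners: statuses.filter((s) => s === 200).length,
    rejected: statuses.filter((s) => s === 409).length,
  };
}
```

Because the check-and-add is synchronous, exactly one of the five calls wins here; the real HTTP tests assert the looser "≥1 winner, every rejection carries run_in_progress" contract since network interleaving is not deterministic.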

Out of scope (tracked for follow-ups)

  • Queue or cancel-and-merge semantics instead of rejecting
  • Turn-level circuit breakers (failure-pattern detection, wall-clock cap)
  • Content-addressed blob store to stop re-sending images on every turn
  • MIME magic-byte validation at file ingress

When a user sends a new message while an agent turn on the same
conversation is still in flight, the runtime currently starts a second
run on overlapping state. The Anthropic API then rejects the call with
"model does not support assistant message prefill" because the
serialized history ends with in-progress assistant content. Observed in
production on conv_ce7995f740af44d4 as five back-to-back prefill
failures while one long-running turn (27 minutes of image/Typst thrash)
held the conversation.

This adds an in-memory Set of conversation IDs with an in-flight chat()
call to Runtime. A second concurrent call on the same conversation
throws RunInProgressError; /v1/chat and /v1/chat/stream surface it as
HTTP 409 run_in_progress. The web client drops its optimistic
placeholders and shows a clear "assistant is still working" banner so
the user's draft is preserved for retry.

Out of scope: queueing deferred messages, canceling the in-flight run,
or any scheduler changes. Those land in a follow-up; this is the
smallest change that stops the bleeding.
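The client-side rollback described above might look roughly like this; the history/message shape, `rollbackOn409`, and the banner text are stand-ins for the real web client store and formatSendError:

```typescript
type Message = { id: string; role: "user" | "assistant"; text: string };

// On a 409 response, drop the optimistic user message and assistant
// placeholder so the failed send doesn't linger in history as if it
// had succeeded; any other status leaves history untouched.
function rollbackOn409(
  history: Message[],
  optimisticIds: string[],
  status: number,
): { history: Message[]; banner?: string } {
  if (status !== 409) return { history };
  const ids = new Set(optimisticIds);
  return {
    history: history.filter((m) => !ids.has(m.id)),
    // Illustrative wording; the real copy comes from formatSendError.
    banner: "The assistant is still working on your last message.",
  };
}
```

Keeping the rollback a pure function over the history array makes the "draft is preserved for retry" behavior easy to unit-test.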
- Defer user.message broadcast on the stream path until the engine emits
  chat.start. Prevents a phantom broadcast to other participants of a
  shared conversation when the request loses a concurrency race and is
  rejected before the run actually starts.
- Capture web.chat_run_in_progress telemetry so we can measure whether
  the fix stops the bleeding in production.
- Add HTTP-level tests for /v1/chat/stream covering both rejection
  paths: pre-check 409 (held open deterministically via a gated mock
  model) and the concurrent-requests invariant (≥1 winner, every
  rejected request carries the stable run_in_progress code).
- Document the single-replica assumption on the activeConversations
  lock; if a tenant is ever scaled past 1 replica the invariant needs
  to move to a shared store.
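The deferred-broadcast bullet can be sketched roughly as follows; `ChatEvent` and `withDeferredBroadcast` are hypothetical names, not the real engine types:

```typescript
type ChatEvent = { type: "chat.start" } | { type: "chat.delta"; text: string };

// Wrap the engine's event stream: hold the user.message broadcast until
// chat.start arrives, so a request rejected before the run starts never
// fans out a phantom message to other participants.
async function* withDeferredBroadcast(
  events: AsyncIterable<ChatEvent>,
  broadcastUserMessage: () => void,
): AsyncGenerator<ChatEvent> {
  let broadcasted = false;
  for await (const ev of events) {
    if (!broadcasted && ev.type === "chat.start") {
      broadcastUserMessage(); // run actually started; safe to fan out
      broadcasted = true;
    }
    yield ev; // forward every event unchanged
  }
}
```

If the stream ends (or never begins) without a chat.start, the broadcast simply never fires, which is exactly the phantom-message case this commit closes.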
@mgoldsborough mgoldsborough added the qa-reviewed QA review completed with no critical issues label Apr 20, 2026
@mgoldsborough mgoldsborough merged commit 2e29ea8 into main Apr 20, 2026
4 checks passed
@mgoldsborough mgoldsborough deleted the fix/concurrent-chat-rejection branch April 20, 2026 21:20