
Fix oversized chat session message loads #874

Merged
simple-agent-manager[bot] merged 2 commits into main from sam/hey-ive-given-token-01kqjd on May 1, 2026

simple-agent-manager Bot commented May 1, 2026

Summary

  • cap chat session detail message loads at SAM_HISTORY_LOAD_LIMIT (default 200) instead of allowing 1000- or 5000-message batches
  • add regression coverage for default and configured session detail limits
  • file a backlog bug for tail-worker log-ingest auth failures discovered during the production log investigation
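The cap in the first bullet amounts to resolving a configured limit and clamping the requested batch size to it. A minimal sketch, assuming SAM_HISTORY_LOAD_LIMIT arrives as a string environment binding; the Env shape and the resolveHistoryLoadLimit/effectiveMessageLimit helpers are hypothetical and are not the code in apps/api/src/routes/chat.ts:

```typescript
// Hypothetical sketch: only SAM_HISTORY_LOAD_LIMIT, DEFAULT_SAM_HISTORY_LOAD_LIMIT,
// and the default of 200 come from the PR; the helper names are illustrative.
const DEFAULT_SAM_HISTORY_LOAD_LIMIT = 200;

interface Env {
  SAM_HISTORY_LOAD_LIMIT?: string;
}

/** Parse the configured cap, falling back to the default on missing or invalid values. */
export function resolveHistoryLoadLimit(env: Env): number {
  const parsed = Number.parseInt(env.SAM_HISTORY_LOAD_LIMIT ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_SAM_HISTORY_LOAD_LIMIT;
}

/** Clamp a client-requested batch size (e.g. 1000 or 5000) to the configured cap. */
export function effectiveMessageLimit(env: Env, requested?: number): number {
  const cap = resolveHistoryLoadLimit(env);
  if (requested === undefined || !Number.isFinite(requested) || requested <= 0) {
    return cap;
  }
  return Math.min(Math.floor(requested), cap);
}
```

In this sketch oversized requests are clamped rather than rejected, so a caller that still asks for 5000 messages gets a smaller page and can continue through pagination.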

Production Investigation

  • Cloudflare Workers Observability showed GET /api/projects/01KMZVCFWBCX98D6S7DT1RZFVM/sessions/... failures at 2026-05-01T17:23:57Z, 17:33:29Z, and 17:34:07Z
  • the error was the Cloudflare DO RPC serialization limit: return values over 32 MiB are rejected, and the observed payload sizes were around 56 MB and 83 MB
  • this explains the red Internal Server Error and the empty chat UI seen when loading message-heavy sessions
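For scale, 32 MiB is 33,554,432 bytes, so both observed payloads exceed it, and with a 200-message cap the average serialized message would have to exceed about 167,772 bytes before the limit is hit again. A small sketch of that arithmetic; it assumes the UTF-8 length of the JSON encoding as a rough proxy for the RPC payload size (the actual RPC encoding may differ), and all names are hypothetical:

```typescript
// 32 MiB, the Durable Object RPC return-value limit cited in the investigation.
export const DO_RPC_MAX_RETURN_BYTES = 32 * 1024 * 1024; // 33,554,432 bytes

/** Rough size estimate: UTF-8 byte length of the JSON encoding (an approximation only). */
export function approxSerializedBytes(value: unknown): number {
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

/** True when a payload of this size would exceed the RPC return limit. */
export function exceedsRpcLimit(bytes: number): boolean {
  return bytes > DO_RPC_MAX_RETURN_BYTES;
}

/** Largest average message size (bytes) that keeps a batch of `limit` messages under the cap. */
export function maxAverageMessageBytes(limit: number): number {
  return Math.floor(DO_RPC_MAX_RETURN_BYTES / limit);
}
```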

Agent Preflight

  • Preflight completed before code changes

Classification

  • external-api-change
  • cross-component-change
  • business-logic-change
  • public-surface-change
  • docs-sync-change
  • security-sensitive-change
  • ui-change
  • infra-change

External References

Cloudflare Workers Observability logs were queried directly with the production debugging token to identify the runtime exception. No external API contract changed.

Codebase Impact Analysis

Changed the session-detail loading behavior in apps/api/src/routes/chat.ts and added regression coverage in apps/api/tests/unit/routes/chat-session-agent-routing.test.ts. Added tasks/backlog/2026-05-01-tail-worker-log-ingest-auth.md for a separate log-ingest issue found during the investigation.

Documentation & Specs

No behavior docs are required for the API cap itself; the tail-worker observability bug discovered along the way was documented as a backlog task with context and acceptance criteria.

Constitution & Risk Check

Checked Principle XI by reusing the existing configurable SAM_HISTORY_LOAD_LIMIT, with DEFAULT_SAM_HISTORY_LOAD_LIMIT as the fallback, rather than adding a new hardcoded limit. Risk is limited to the batch size of initial and paginated chat history loads; the UI already supports hasMore pagination.
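The risk note relies on hasMore pagination. One common way to compute hasMore under a cap is to query one row beyond the limit. A minimal sketch, assuming the query returns rows newest-first; the ChatMessage shape and the toPage helper are hypothetical:

```typescript
// Hypothetical message shape; the real schema may differ.
interface ChatMessage {
  id: string;
  createdAt: number; // epoch millis
}

interface MessagePage {
  messages: ChatMessage[];
  hasMore: boolean;
}

/**
 * Given up to limit + 1 rows ordered newest-first, return at most `limit`
 * messages in chronological order plus a hasMore flag for the UI.
 */
export function toPage(rowsNewestFirst: ChatMessage[], limit: number): MessagePage {
  const hasMore = rowsNewestFirst.length > limit; // the extra row only signals "more exists"
  const messages = rowsNewestFirst.slice(0, limit).reverse(); // slice copies, so input is untouched
  return { messages, hasMore };
}
```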

Verification

  • pnpm install
  • pnpm --filter @simple-agent-manager/shared build
  • pnpm --filter @simple-agent-manager/providers build
  • pnpm --filter @simple-agent-manager/cloud-init build
  • pnpm --filter @simple-agent-manager/api test -- chat-session-agent-routing.test.ts
  • pnpm --filter @simple-agent-manager/api typecheck

Specialist Review Evidence

  • Cloudflare specialist checklist applied for Workers/DO/D1 investigation: direct Workers Observability and D1 read-only queries were used before code changes.

Staging

  • Not deployed yet. This PR should go through the normal staging merge gate before merge.

sonarqubecloud Bot commented May 1, 2026

simple-agent-manager Bot merged commit 091f2b6 into main on May 1, 2026
22 checks passed
simple-agent-manager Bot deleted the sam/hey-ive-given-token-01kqjd branch on May 1, 2026 at 19:26
simple-agent-manager Bot pushed a commit that referenced this pull request May 5, 2026
Two regressions from PR #874 (May 1):

1. Message limit dropped from 1000 to 200 by reusing SAM_HISTORY_LOAD_LIMIT
   (designed for SAM conversation persistence) for chat session REST endpoints.
   Added DEFAULT_CHAT_SESSION_MESSAGE_LIMIT (3000) with CHAT_SESSION_MESSAGE_LIMIT
   env var override.

2. Polling fallback (every 3s) and WebSocket catch-up used 'replace' merge
   strategy that discarded all earlier-loaded messages. After clicking
   "Load earlier messages", the next poll cycle would reset back to 200.
   Fixed mergeReplace() to preserve messages older than the incoming window,
   and removed hasMore reset from polling/catch-up (only initial load and
   explicit loadMore should update pagination state).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
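The second fix separates who may touch pagination state: only the initial load and an explicit loadMore update hasMore, while polling and WebSocket catch-up only swap in merged messages. A minimal reducer-style sketch of that rule; the ChatState shape, action names, and chatReducer are hypothetical and do not reflect the actual hooks, and the merge itself is assumed to happen elsewhere (passed in as `merged`):

```typescript
// Hypothetical state shape and action names, for illustration only.
interface ChatMessage {
  id: string;
  createdAt: number; // epoch millis
}

interface ChatState {
  messages: ChatMessage[];
  hasMore: boolean;
}

type ChatAction =
  | { type: "initialLoad"; messages: ChatMessage[]; hasMore: boolean }
  | { type: "loadMore"; older: ChatMessage[]; hasMore: boolean }
  | { type: "pollOrCatchUp"; merged: ChatMessage[] };

export function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case "initialLoad":
      // The first page sets both the messages and the pagination flag.
      return { messages: action.messages, hasMore: action.hasMore };
    case "loadMore":
      // An explicit "Load earlier messages" prepends the older page and updates hasMore.
      return { messages: [...action.older, ...state.messages], hasMore: action.hasMore };
    case "pollOrCatchUp":
      // Background refreshes swap in merged messages but never reset hasMore.
      return { ...state, messages: action.merged };
  }
}
```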
simple-agent-manager Bot added a commit that referenced this pull request May 5, 2026
#898)

* task: activate fix-chat-message-loading-regression

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: restore chat message loading limits and preserve earlier messages

Two regressions from PR #874 (May 1):

1. Message limit dropped from 1000 to 200 by reusing SAM_HISTORY_LOAD_LIMIT
   (designed for SAM conversation persistence) for chat session REST endpoints.
   Added DEFAULT_CHAT_SESSION_MESSAGE_LIMIT (3000) with CHAT_SESSION_MESSAGE_LIMIT
   env var override.

2. Polling fallback (every 3s) and WebSocket catch-up used 'replace' merge
   strategy that discarded all earlier-loaded messages. After clicking
   "Load earlier messages", the next poll cycle would reset back to 200.
   Fixed mergeReplace() to preserve messages older than the incoming window,
   and removed hasMore reset from polling/catch-up (only initial load and
   explicit loadMore should update pagination state).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: correct test syntax error and update catch-up test expectation

- Fix extra closing brace in merge-messages.test.ts that caused parse error
- Update project-message-view catch-up test: empty incoming now correctly
  preserves existing messages (intentional behavior change from the fix)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address review findings — move constant, clean onCatchUp interface

- Move DEFAULT_CHAT_SESSION_MESSAGE_LIMIT from sam.ts to defaults.ts
  (it's a chat REST limit, not a SAM agent constant)
- Remove hasMore param from onCatchUp interface in useChatWebSocket.ts
  (catch-up should not reset pagination state)
- Update WorkspaceChatView.tsx onCatchUp to match new 2-param contract
- Update test mocks to match new interface

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent silent message drop at same-timestamp boundary

The mergeReplace boundary condition now preserves messages from prev
at the same timestamp as oldestIncoming when their ID is not in the
incoming set. This prevents silent drops when the server returns only
some messages at the boundary timestamp (e.g., batched tool calls).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Raphaël Titsworth-Morin <raphael@raphaeltm.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
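The final mergeReplace() behavior described in #898 (keep everything older than the incoming window, keep same-timestamp messages whose IDs were not resent, and treat an empty batch as a no-op) can be sketched as follows. This is a hypothetical reconstruction from the commit messages, assuming numeric createdAt timestamps and string IDs; it is not the merge-messages.ts source:

```typescript
// Hypothetical reconstruction from the commit messages; not the real merge-messages.ts.
interface ChatMessage {
  id: string;
  createdAt: number; // epoch millis
}

export function mergeReplace(prev: ChatMessage[], incoming: ChatMessage[]): ChatMessage[] {
  // An empty batch says nothing about history, so keep what is already loaded.
  if (incoming.length === 0) return prev;

  const incomingIds = new Set(incoming.map((m) => m.id));
  const oldestIncoming = incoming.reduce((min, m) => Math.min(min, m.createdAt), Infinity);

  // Keep earlier-loaded messages older than the incoming window, plus messages at the
  // boundary timestamp that the server did not resend (e.g. batched tool calls).
  const preserved = prev.filter(
    (m) =>
      m.createdAt < oldestIncoming ||
      (m.createdAt === oldestIncoming && !incomingIds.has(m.id)),
  );

  // Within the window the server's copy wins; Array.prototype.sort is stable (ES2019+),
  // so preserved boundary messages stay ahead of incoming ones with the same timestamp.
  return [...preserved, ...incoming].sort((a, b) => a.createdAt - b.createdAt);
}
```

Under these assumptions, a poll that returns only the newest window no longer discards pages the user loaded with "Load earlier messages", while stale client copies inside the window are still replaced by the server's view.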