fix(api): enable gzip middleware to keep graph payload parseable #1731

Merged
nicoloboschi merged 1 commit into main from fix/graph-payload-gzip on May 25, 2026

@nicoloboschi (Collaborator)

Summary

Add Starlette's GZipMiddleware to the FastAPI app. It is a three-line change to the create_app factory.

Why this and not the clamp from #1605

I closed #1605 (capping limit server-side at 200) because it silently broke the Control Plane's "Load more" button on dense banks and assumed the bytes came from node count. They don't; they come from edges.

I ingested 19 LoCoMo sessions into a bank, profiled the /banks/{bank_id}/graph response at limit=500, and got:

nodes=491 edges=75718 table_rows=491

=== section sizes ===
  nodes:       194.8 KiB  (  0.9%)  avg=406 B/node
  edges:       21.2 MiB  ( 97.8%)  avg=293 B/edge
  table_rows:  282.4 KiB  (  1.3%)  avg=588 B/row

=== wire size ===
  raw (Accept-Encoding: identity): 21.7 MiB
  gzip-requested (wire):           1.6 MiB
  compression ratio:               13.8x
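A breakdown like the one above can be approximated with a few lines of Python. This is a sketch and not the script used for the numbers in this PR: it assumes the response is a JSON object whose top-level keys (`nodes`, `edges`, `table_rows`) hold lists, and it measures each section by re-serializing it as compact JSON, which may differ slightly from the server's own encoding.

```python
import json


def section_sizes(payload: dict) -> dict:
    """Return the compact-JSON UTF-8 byte size of each top-level section."""
    return {
        key: len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        for key, value in payload.items()
    }


def format_report(payload: dict) -> str:
    """Render one line per section: size, share of the total, average per item."""
    sizes = section_sizes(payload)
    total = sum(sizes.values()) or 1  # avoid division by zero on an empty payload
    lines = []
    for key, size in sizes.items():
        count = len(payload[key]) if isinstance(payload[key], list) else 0
        avg = size / count if count else 0
        lines.append(
            f"{key}: {size / 1024:.1f} KiB ({100 * size / total:.1f}%) avg={avg:.0f} B/item"
        )
    return "\n".join(lines)


# Toy payload for illustration only; a real run would parse the endpoint's response.
demo = {"nodes": [{"id": "a"}], "edges": [], "table_rows": []}
print(format_report(demo))
```

The per-section sizes will not sum exactly to the full response size because the top-level keys and braces are not counted, but the proportions are what matter here.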

Edges are 98% of the payload, and the payload gzip-compresses ~14× because the edge list is extremely repetitive: same JSON keys per edge, UUIDs sharing session prefixes, identical linkType / color / lineStyle strings. Node text, context, and table_rows combined are <3% of bytes, so trimming them would make no visible difference.
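The effect of that repetition is easy to reproduce on synthetic data. The sketch below builds a made-up edge list and compresses it with the standard library. Only the key names `linkType`, `color`, and `lineStyle` come from this PR; the `source` / `target` keys, all values, and the pool sizes are assumptions, so the ratio it prints will not match the 13.8x measured on the real bank.

```python
import gzip
import json
import random
import uuid

rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical pool of node ids; edges reuse them heavily, as in a dense graph.
node_ids = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(200)]

edges = [
    {
        "source": rng.choice(node_ids),  # assumed field name
        "target": rng.choice(node_ids),  # assumed field name
        "linkType": "semantic",          # assumed value
        "color": "#888888",              # assumed value
        "lineStyle": "solid",            # assumed value
    }
    for _ in range(5000)
]

raw = json.dumps({"edges": edges}).encode("utf-8")
wire = gzip.compress(raw)
ratio = len(raw) / len(wire)
print(f"raw={len(raw)} B  gzip={len(wire)} B  ratio={ratio:.1f}x")
```

Because every edge repeats the same keys and draws its endpoints from a small id pool, most of each record is encoded as a back-reference to earlier bytes rather than as new literals.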

1.6 MiB on the wire is comfortably under V8's ~512 MiB string-length cap that was breaking the Control Plane graph view in production. No UI change required; the Control Plane's data-view.tsx useState(1000) continues to work and "Load more" still does something.

What this isn't

A real fix. The endpoint returns 75k edges from only 11.5k DB rows — the bulk are observation-inferred links built in-memory and uncapped (memory_engine.py:4880-4913). Capping that at 10k like the direct links would cut edges ~85% and is the next PR. Tracking separately so this lands fast.

Test plan

  • ruff check, ruff format, ty check clean.
  • Existing tests pass locally on the touched module (no behavioral changes besides response encoding).
  • Verified on a 491-node / 75k-edge bank: the response has Content-Encoding: gzip when the client sends Accept-Encoding: gzip, and it decompresses to bytes identical to the identity variant.
  • Reviewer: sanity-check that no internal callers send Accept-Encoding: identity and then try to parse Content-Length for sizing (none I can find).
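The identity-versus-gzip check in the test plan can be scripted roughly as follows. This is a sketch using only the standard library, not the procedure actually used for this PR; `check_equivalent` and its `url` argument are hypothetical and need a running server to call against.

```python
import gzip
import urllib.request


def fetch(url: str, accept_encoding: str):
    """Return (Content-Encoding header, raw body bytes) for one GET request."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": accept_encoding})
    with urllib.request.urlopen(req) as resp:
        # urllib does not decompress automatically, so these are the wire bytes.
        return resp.headers.get("Content-Encoding"), resp.read()


def decoded(content_encoding, body: bytes) -> bytes:
    """Undo gzip content encoding if the server applied it."""
    return gzip.decompress(body) if content_encoding == "gzip" else body


def check_equivalent(url: str):
    """Fetch the same URL both ways and confirm the decoded bodies match."""
    enc_identity, raw = fetch(url, "identity")
    enc_gzip, wire = fetch(url, "gzip")
    assert enc_gzip == "gzip", "server did not compress the response"
    assert decoded(enc_gzip, wire) == decoded(enc_identity, raw)
    return len(raw), len(wire)  # identity size, compressed wire size
```

Calling `check_equivalent` with the graph endpoint URL of a local instance would return the two sizes, which can be compared against the wire-size numbers in the profile.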

Notes

  • minimum_size=1024 skips compression on small responses where the per-response gzip overhead would dominate.
  • One unrelated doc-skill regen got picked up by the pre-commit hook (upstream alibaba reranker docs were never synced to skills/). Three lines, same commit.
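The reasoning behind a minimum size threshold can be seen with the standard library alone: the gzip container adds a fixed header and trailer, so very small bodies grow when compressed. The payloads below are invented for illustration.

```python
import gzip
import json

small = json.dumps({"status": "ok"}).encode("utf-8")
large = json.dumps([{"linkType": "semantic"}] * 1000).encode("utf-8")

small_gz = gzip.compress(small)
large_gz = gzip.compress(large)

# The tiny body gets bigger; the large repetitive body shrinks.
print(f"small: {len(small)} B -> {len(small_gz)} B")
print(f"large: {len(large)} B -> {len(large_gz)} B")
```

A threshold such as 1024 bytes simply avoids spending CPU on responses that fall in the first category.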

The /banks/{bank_id}/graph response is dominated by edges (~98% of bytes)
and gzip-compresses ~14x because the edge list is extremely repetitive
(same keys, UUIDs sharing prefixes, repeated linkType / color strings).

On a 491-node bank with 75k edges this drops the wire payload from
21.7 MiB to 1.6 MiB, well under V8's ~512 MiB string-length cap that
was breaking the Control Plane graph view on dense production banks.

minimum_size=1024 skips compression on small responses where the gzip
overhead would dominate.

Also includes a hindsight-docs skill regen picked up by pre-commit
(upstream alibaba reranker docs not previously synced into skills/).
@nicoloboschi merged commit 31d1e17 into main on May 25, 2026
66 of 70 checks passed
r0gig0r added a commit to r0gig0r/hindsight that referenced this pull request May 26, 2026
Notable upstream additions pulled in:
- feat(api): clear endpoint for mental model content (vectorize-io#1706)
- feat(api): per-operation LLM concurrency caps (vectorize-io#1738)
- feat(typescript-client): concrete generated types (replace Promise<any>)
- feat(reranker): Alibaba Qwen3-Rerank support (vectorize-io#1501)
- feat: opencode-go LLM provider (vectorize-io#1652)
- feat(extensions): OperationValidator.precheck pre-body-parse hook (vectorize-io#1548)
- feat(right-agent): new Right Agent integration (vectorize-io#1599)
- fix(ollama): ollama-cloud provider + native API auth (vectorize-io#1734)
- fix(reflect): hide disabled tools from agent system prompt (vectorize-io#1740)
- fix(retain): split oversized single items in batch retain (vectorize-io#1736)
- fix: escape literal braces in user-supplied prompt fields (vectorize-io#1728)
- fix(mental-models): full refresh pending delta baselines (vectorize-io#1684)
- fix(api): lazy load reflect tiktoken encoding (vectorize-io#1654)
- fix(api): reject blank retain content (vectorize-io#1685)
- fix(api): auto-refresh openai-codex OAuth access_token (vectorize-io#1637)
- fix(api): gzip middleware for graph payloads (vectorize-io#1731)
- fix(reranker): detect pre-normalized scores; rank-based fallback (vectorize-io#1512)

Conflicts: only package-lock.json files (took upstream, npm install verified)

Fork customizations verified intact (all 14 checks):
- duplicate_checker_fn streaming Phase 1.5 in orchestrator
- FallbackLLMProvider + CircuitBreaker (fallback_llm.py)
- Single-fact consolidation mode (is_fallback_active routing)
- recallExp + Jaccard dedup + compact memory formatter (plugin)
- Codex 5.1-codex-mini reasoning guard
- Infinity reranker /models fallback in cross_encoder.py
- diversity.py + deduplication.py fork-only modules retained

Tests:
- openclaw vitest: 267/267 pass
- ruff: clean
- tsc --noEmit: clean
- pytest: pre-existing env-config flakes (need HINDSIGHT_API_LLM_API_KEY);
  upstream commit 90cb145 acknowledged as pre-existing CI flakes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>