fix(api): enable gzip middleware to keep graph payload parseable by nicoloboschi · Pull Request #1731 · vectorize-io/hindsight

nicoloboschi · 2026-05-25T10:08:46Z

Summary

Add starlette's GZipMiddleware to the FastAPI app. Three-line change to the create_app factory.

Why this and not the clamp from #1605

I closed #1605 (cap limit server-side at 200) because it silently broke the Control Plane's "Load more" button on dense banks and assumed the bytes came from node count. They don't — they come from edges.

I ingested 19 LoCoMo sessions into a bank, profiled the /banks/{bank_id}/graph response at limit=500, and got:

nodes=491 edges=75718 table_rows=491

=== section sizes ===
  nodes:       194.8 KiB  (  0.9%)  avg=406 B/node
  edges:       21.2 MiB  ( 97.8%)  avg=293 B/edge
  table_rows:  282.4 KiB  (  1.3%)  avg=588 B/row

=== wire size ===
  raw (Accept-Encoding: identity): 21.7 MiB
  gzip-requested (wire):           1.6 MiB
  compression ratio:               13.8x
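
A per-section breakdown like the one above can be approximated by serializing each top-level key on its own. This is a minimal sketch, not the script used for the numbers in this PR; the payload is synthetic and its field names are illustrative, and the compact `separators` setting is an assumption about how the response is serialized.

```python
import json


def section_sizes(payload: dict) -> dict[str, tuple[int, float]]:
    """Return {section: (bytes, percent_of_total)} for each top-level key."""
    sizes = {
        key: len(json.dumps(value, separators=(",", ":")).encode("utf-8"))
        for key, value in payload.items()
    }
    total = sum(sizes.values()) or 1  # avoid division by zero on an empty payload
    return {key: (size, 100.0 * size / total) for key, size in sizes.items()}


# Synthetic payload; the field names are illustrative, not the real schema.
payload = {
    "nodes": [{"id": f"n{i}", "text": "example"} for i in range(10)],
    "edges": [
        {"source": f"n{i % 10}", "target": f"n{(i + 1) % 10}", "linkType": "semantic"}
        for i in range(1000)
    ],
    "table_rows": [{"id": f"n{i}"} for i in range(10)],
}

for name, (size, pct) in section_sizes(payload).items():
    print(f"{name:12s}{size / 1024:8.1f} KiB  ({pct:5.1f}%)")
```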

Edges are 98% of the payload, and the payload gzip-compresses ~14× because the edge list is extremely repetitive: the same JSON keys on every edge, UUIDs sharing session prefixes, and identical linkType / color / lineStyle strings. Node text, context, and table_rows combined are <3% of bytes, so trimming them would be invisible.
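
The effect of that repetition is easy to see on synthetic data. The sketch below builds a made-up edge list with the properties described above (identical keys, ids sharing a per-session prefix, a few repeated style strings) and compares raw and gzip sizes. All names and values are invented, so the ratio it prints is not expected to match the 13.8x measured on the real bank.

```python
import gzip
import json

# Synthetic edges: identical keys, ids sharing a per-session prefix, and a
# handful of repeated style strings. All names and values are made up.
edges = [
    {
        "source": f"session-{i % 19:02d}-unit-{i:06d}",
        "target": f"session-{i % 19:02d}-unit-{i + 1:06d}",
        "linkType": "semantic",
        "color": "#8888ff",
        "lineStyle": "dashed",
    }
    for i in range(5000)
]

raw = json.dumps({"edges": edges}).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw={len(raw)} B  gzip={len(compressed)} B  ratio={len(raw) / len(compressed):.1f}x")
```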

1.6 MiB on the wire is comfortably under V8's ~512 MiB string-length cap that was breaking the Control Plane graph view in production. No UI change required; the Control Plane's data-view.tsx useState(1000) continues to work and "Load more" still does something.

What this isn't

A real fix. The endpoint returns 75k edges from only 11.5k DB rows — the bulk are observation-inferred links built in-memory and uncapped (memory_engine.py:4880-4913). Capping that at 10k like the direct links would cut edges ~85% and is the next PR. Tracking separately so this lands fast.

Test plan

ruff check, ruff format, ty check clean.
Existing tests pass locally on the touched module (no behavioral changes besides response encoding).
Verified on a 491-node / 75k-edge bank: the response carries Content-Encoding: gzip when the client sends Accept-Encoding: gzip, and it decompresses to bytes identical to the identity variant.
Reviewer: sanity-check that no internal callers send Accept-Encoding: identity and then try to parse Content-Length for sizing (none I can find).

Notes

minimum_size=1024 skips compression on small responses where the per-response gzip overhead would dominate.
One unrelated doc-skill regen got picked up by the pre-commit hook (upstream alibaba reranker docs were never synced to skills/). Three lines, same commit.
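
To make the minimum_size trade-off concrete, here is a simplified model of the decision. It is a sketch under stated assumptions, not Starlette's actual implementation: `encode_body` is a hypothetical helper, and the substring check on Accept-Encoding ignores q-values.

```python
import gzip


def encode_body(
    body: bytes, accept_encoding: str, minimum_size: int = 1024
) -> tuple[bytes, dict[str, str]]:
    """Simplified model of the decision, not Starlette's actual implementation."""
    wants_gzip = "gzip" in accept_encoding.lower()
    if not wants_gzip or len(body) < minimum_size:
        # Small or non-gzip responses pass through untouched.
        return body, {"Content-Length": str(len(body))}
    compressed = gzip.compress(body)
    return compressed, {
        "Content-Encoding": "gzip",
        "Content-Length": str(len(compressed)),
        "Vary": "Accept-Encoding",
    }


small = b'{"ok":true}'
large = b'{"linkType":"semantic"}' * 200  # 4600 bytes, above the threshold

print(encode_body(small, "gzip")[1])
print(encode_body(large, "gzip")[1])
```

The gzip container alone adds 18 bytes of header and trailer, so a body as small as `small` would grow if compressed; larger repetitive bodies shrink substantially and round-trip to the same bytes.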

The /banks/{bank_id}/graph response is dominated by edges (~98% of bytes) and gzip-compresses ~14x because the edge list is extremely repetitive (same keys, UUIDs sharing prefixes, repeated linkType / color strings). On a 491-node bank with 75k edges this drops the wire payload from 21.7 MiB to 1.6 MiB, well under V8's ~512 MiB string-length cap that was breaking the Control Plane graph view on dense production banks. minimum_size=1024 skips compression on small responses where the gzip overhead would dominate. Also includes a hindsight-docs skill regen picked up by pre-commit (upstream alibaba reranker docs not previously synced into skills/).

Notable upstream additions pulled in:
- feat(api): clear endpoint for mental model content (vectorize-io#1706)
- feat(api): per-operation LLM concurrency caps (vectorize-io#1738)
- feat(typescript-client): concrete generated types (replace Promise<any>)
- feat(reranker): Alibaba Qwen3-Rerank support (vectorize-io#1501)
- feat: opencode-go LLM provider (vectorize-io#1652)
- feat(extensions): OperationValidator.precheck pre-body-parse hook (vectorize-io#1548)
- feat(right-agent): new Right Agent integration (vectorize-io#1599)
- fix(ollama): ollama-cloud provider + native API auth (vectorize-io#1734)
- fix(reflect): hide disabled tools from agent system prompt (vectorize-io#1740)
- fix(retain): split oversized single items in batch retain (vectorize-io#1736)
- fix: escape literal braces in user-supplied prompt fields (vectorize-io#1728)
- fix(mental-models): full refresh pending delta baselines (vectorize-io#1684)
- fix(api): lazy load reflect tiktoken encoding (vectorize-io#1654)
- fix(api): reject blank retain content (vectorize-io#1685)
- fix(api): auto-refresh openai-codex OAuth access_token (vectorize-io#1637)
- fix(api): gzip middleware for graph payloads (vectorize-io#1731)
- fix(reranker): detect pre-normalized scores; rank-based fallback (vectorize-io#1512)

Conflicts: only package-lock.json files (took upstream, npm install verified)

Fork customizations verified intact (all 14 checks):
- duplicate_checker_fn streaming Phase 1.5 in orchestrator
- FallbackLLMProvider + CircuitBreaker (fallback_llm.py)
- Single-fact consolidation mode (is_fallback_active routing)
- recallExp + Jaccard dedup + compact memory formatter (plugin)
- Codex 5.1-codex-mini reasoning guard
- Infinity reranker /models fallback in cross_encoder.py
- diversity.py + deduplication.py fork-only modules retained

Tests:
- openclaw vitest: 267/267 pass
- ruff: clean
- tsc --noEmit: clean
- pytest: pre-existing env-config flakes (need HINDSIGHT_API_LLM_API_KEY); upstream commit 90cb145 acknowledged as pre-existing CI flakes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nicoloboschi merged commit 31d1e17 into main May 25, 2026
66 of 70 checks passed

nicoloboschi merged 1 commit into main from fix/graph-payload-gzip
