fix(api): enable gzip middleware to keep graph payload parseable #1731
Merged
The /banks/{bank_id}/graph response is dominated by edges (~98% of bytes)
and gzip-compresses ~14x because the edge list is extremely repetitive
(same keys, UUIDs sharing prefixes, repeated linkType / color strings).
On a 491-node bank with 75k edges this drops the wire payload from
21.7 MiB to 1.6 MiB, well under V8's ~512 MiB string-length cap that
was breaking the Control Plane graph view on dense production banks.
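
A rough way to see why a repetitive edge list compresses this well, using only the standard library. This is a sketch on synthetic data: the field values, the sequential UUIDs, and the `build_edges` / `compression_ratio` helpers are all assumptions made for illustration, so the ratio it prints will not match the ~14x measured on the real endpoint.

```python
import gzip
import json
import random
import uuid


def build_edges(node_count: int, edge_count: int, seed: int = 0) -> list[dict]:
    """Build a synthetic edge list; field values are illustrative, not real data."""
    rng = random.Random(seed)
    # Sequential UUIDs share long prefixes, loosely imitating IDs that share prefixes.
    node_ids = [str(uuid.UUID(int=i)) for i in range(node_count)]
    link_types = ["semantic", "temporal", "entity"]  # hypothetical values
    return [
        {
            "source": rng.choice(node_ids),
            "target": rng.choice(node_ids),
            "linkType": rng.choice(link_types),
            "color": "#888888",
            "lineStyle": "solid",
        }
        for _ in range(edge_count)
    ]


def compression_ratio(edges: list[dict]) -> float:
    """Return raw JSON size divided by gzipped size for the given edges."""
    raw = json.dumps({"edges": edges}).encode("utf-8")
    packed = gzip.compress(raw)
    # Round-trip check: gzip is lossless, so the client sees identical bytes.
    assert gzip.decompress(packed) == raw
    return len(raw) / len(packed)


ratio = compression_ratio(build_edges(node_count=491, edge_count=5_000))
print(f"raw/gzip ratio: {ratio:.1f}x")
```

The same keys and string values recur in every edge, so gzip mostly emits back-references instead of literals.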
minimum_size=1024 skips compression on small responses where the gzip
overhead would dominate.
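
To illustrate the overhead point, here is a small stdlib sketch comparing a tiny JSON body with a larger repetitive one. The sample payloads and the `gzip_overhead` helper are assumptions for demonstration only; the threshold constant simply mirrors the `minimum_size=1024` value above.

```python
import gzip
import json

MINIMUM_SIZE = 1024  # mirrors the threshold used in this change


def gzip_overhead(body: bytes) -> int:
    """Return bytes added by gzip (positive) or saved by it (negative)."""
    return len(gzip.compress(body)) - len(body)


# A tiny response: the fixed gzip header and trailer outweigh any savings.
small = json.dumps({"status": "ok"}).encode("utf-8")

# A larger, repetitive response: compression pays for itself many times over.
large = json.dumps([{"linkType": "semantic", "color": "#888888"}] * 500).encode("utf-8")

for name, body in (("small", small), ("large", large)):
    print(name, len(body), gzip_overhead(body))
```

For the small body the compressed form is larger than the original, which is the case the threshold is meant to skip.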
Also includes a hindsight-docs skill regen picked up by pre-commit
(upstream alibaba reranker docs not previously synced into skills/).
r0gig0r added a commit to r0gig0r/hindsight that referenced this pull request on May 26, 2026
Notable upstream additions pulled in:
- feat(api): clear endpoint for mental model content (vectorize-io#1706)
- feat(api): per-operation LLM concurrency caps (vectorize-io#1738)
- feat(typescript-client): concrete generated types (replace Promise<any>)
- feat(reranker): Alibaba Qwen3-Rerank support (vectorize-io#1501)
- feat: opencode-go LLM provider (vectorize-io#1652)
- feat(extensions): OperationValidator.precheck pre-body-parse hook (vectorize-io#1548)
- feat(right-agent): new Right Agent integration (vectorize-io#1599)
- fix(ollama): ollama-cloud provider + native API auth (vectorize-io#1734)
- fix(reflect): hide disabled tools from agent system prompt (vectorize-io#1740)
- fix(retain): split oversized single items in batch retain (vectorize-io#1736)
- fix: escape literal braces in user-supplied prompt fields (vectorize-io#1728)
- fix(mental-models): full refresh pending delta baselines (vectorize-io#1684)
- fix(api): lazy load reflect tiktoken encoding (vectorize-io#1654)
- fix(api): reject blank retain content (vectorize-io#1685)
- fix(api): auto-refresh openai-codex OAuth access_token (vectorize-io#1637)
- fix(api): gzip middleware for graph payloads (vectorize-io#1731)
- fix(reranker): detect pre-normalized scores; rank-based fallback (vectorize-io#1512)

Conflicts: only package-lock.json files (took upstream, npm install verified)

Fork customizations verified intact (all 14 checks):
- duplicate_checker_fn streaming Phase 1.5 in orchestrator
- FallbackLLMProvider + CircuitBreaker (fallback_llm.py)
- Single-fact consolidation mode (is_fallback_active routing)
- recallExp + Jaccard dedup + compact memory formatter (plugin)
- Codex 5.1-codex-mini reasoning guard
- Infinity reranker /models fallback in cross_encoder.py
- diversity.py + deduplication.py fork-only modules retained

Tests:
- openclaw vitest: 267/267 pass
- ruff: clean
- tsc --noEmit: clean
- pytest: pre-existing env-config flakes (need HINDSIGHT_API_LLM_API_KEY); upstream commit 90cb145 acknowledged as pre-existing CI flakes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Add starlette's `GZipMiddleware` to the FastAPI app. Three-line change to the `create_app` factory.

Why this and not the clamp from #1605
I closed #1605 (cap `limit` server-side at 200) because it silently broke the Control Plane's "Load more" button on dense banks and assumed the bytes came from node count. They don't; they come from edges.

I ingested 19 LoCoMo sessions into a bank and profiled the `/banks/{bank_id}/graph` response at `limit=500`. Edges are 98% of the payload and gzip-compress ~14× because the list is extremely repetitive: same JSON keys per edge, UUIDs sharing session prefixes, identical `linkType` / `color` / `lineStyle` strings. Node text, context, and `table_rows` combined are <3% of bytes; trimming them would be invisible.

1.6 MiB on the wire is comfortably under V8's ~512 MiB string-length cap that was breaking the Control Plane graph view in production. No UI change required; the Control Plane's `data-view.tsx` `useState(1000)` continues to work and "Load more" still does something.

What this isn't
A real fix. The endpoint returns 75k edges from only 11.5k DB rows; the bulk are observation-inferred links built in-memory and uncapped (`memory_engine.py:4880-4913`). Capping that at 10k like the direct links would cut edges ~85% and is the next PR. Tracking separately so this lands fast.

Test plan
- `ruff check`, `ruff format`, `ty check` clean.
- Response carries `Content-Encoding: gzip` when the client sends `Accept-Encoding: gzip`, and decompresses to the same bytes as the `identity` variant.
- Checked for clients that send `Accept-Encoding: identity` and then try to parse `Content-Length` for sizing (none I can find).

Notes
- `minimum_size=1024` skips compression on small responses where the per-response gzip overhead would dominate.
- Also includes a hindsight-docs skill regen picked up by pre-commit (upstream `alibaba` reranker docs were never synced to `skills/`). Three lines, same commit.
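
The negotiation behavior the test plan checks can be modeled with the standard library alone. This is a simplified stand-in, not Starlette's implementation: `encode_response` is a hypothetical helper, it ignores `q=` weights in `Accept-Encoding`, and the sample body is made up.

```python
import gzip


def encode_response(body: bytes, accept_encoding: str, minimum_size: int = 1024):
    """Hypothetical stand-in: gzip the body only if the client allows it and it is large enough."""
    headers = {"Vary": "Accept-Encoding"}
    wants_gzip = "gzip" in accept_encoding.lower()  # simplification: no q-value parsing
    if wants_gzip and len(body) >= minimum_size:
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body  # identity: bytes pass through untouched
    headers["Content-Length"] = str(len(payload))
    return headers, payload


body = b'{"linkType": "semantic", "color": "#888888"}' * 1000

gz_headers, gz_payload = encode_response(body, "gzip, deflate")
id_headers, id_payload = encode_response(body, "identity")

# The gzip variant must decompress to exactly the identity bytes.
print(gz_headers.get("Content-Encoding"), gzip.decompress(gz_payload) == id_payload)
```

Note that `Content-Length` differs between the two variants, which is why a client that sized the payload from that header would need a look.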