feat(runtime): batteries-included compression policies (truncating / extractive / anchored) #1885
Merged
…extractive / anchored)
MK shipped the CompressionPolicy / PreTurnCompressionPolicy seams but no
concrete strategy — every consumer had to hand-roll compress(). Add three
default strategies plus a single policy wrapper that conforms to both seams.
- TruncatingCompressionStrategy: zero-inference sliding window (baseline)
- ExtractiveCompressionStrategy: zero-inference scored selection (recency /
length / keyword-density) with verbatim tail + optional headBudgetFraction
knob (middle-out / lost-in-the-middle preservation)
- AnchoredCompressionStrategy: summarize old via generate, prepend a
.memory("summary") record + keep a verbatim recency tail; chunk-and-fold for
over-window input; falls back to extractive on failure/empty/missing-generate
- DefaultCompressionPolicy: Sendable struct, conforms to BOTH CompressionPolicy
and PreTurnCompressionPolicy; static factories .truncating/.extractive/.anchored
Ported and rewritten from Fireside's StoryCompression suite against [ChatMessage].
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
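The extractive strategy listed in this commit can be pictured roughly as follows. This is a minimal sketch, not the code in the PR: `Message`, `estimateTokens`, `score`, `extract`, the scoring weights, and the `tailCount` parameter are all hypothetical stand-ins, and only `headBudgetFraction` is a name taken from the commit message. It also leaves out the verbatim-core overflow clamp, so an oversized tail is kept as-is.

```swift
// Hypothetical stand-in for MK's ChatMessage.
struct Message: Equatable {
    let text: String
}

// Rough token estimate (about 4 characters per token); an assumption.
func estimateTokens(_ message: Message) -> Int {
    max(1, (message.text.count + 3) / 4)
}

/// Scores a message by recency, length, and keyword density.
/// The weights are illustrative, not the values used in the PR.
func score(_ message: Message, index: Int, total: Int, keywords: Set<String>) -> Double {
    let recency = Double(index + 1) / Double(total)
    let words = message.text.lowercased().split(separator: " ").map(String.init)
    let hits = words.filter { keywords.contains($0) }.count
    let density = words.isEmpty ? 0 : Double(hits) / Double(words.count)
    let length = min(1.0, Double(message.text.count) / 200.0)
    return 0.5 * recency + 0.2 * length + 0.3 * density
}

/// Keeps the last `tailCount` messages verbatim, optionally pins the oldest
/// messages up to `headBudgetFraction` of the budget, then greedily adds the
/// highest-scoring remaining messages that still fit. Output is chronological.
func extract(
    _ messages: [Message],
    budget: Int,
    tailCount: Int,
    headBudgetFraction: Double = 0,
    keywords: Set<String> = []
) -> [Message] {
    let tailStart = max(0, messages.count - tailCount)
    var keep = Set(tailStart..<messages.count)
    var used = messages[tailStart...].reduce(0) { $0 + estimateTokens($1) }

    // Pin establishing context from the head, within its share of the budget.
    var headUsed = 0
    let headBudget = Int(Double(budget) * headBudgetFraction)
    for i in 0..<tailStart {
        let cost = estimateTokens(messages[i])
        if headUsed + cost > headBudget || used + cost > budget { break }
        keep.insert(i)
        headUsed += cost
        used += cost
    }

    // Greedy fill: best-scoring candidates first, skipping any that do not fit.
    let candidates = (0..<tailStart).filter { !keep.contains($0) }.sorted {
        score(messages[$0], index: $0, total: messages.count, keywords: keywords)
            > score(messages[$1], index: $1, total: messages.count, keywords: keywords)
    }
    for i in candidates {
        let cost = estimateTokens(messages[i])
        if used + cost <= budget {
            keep.insert(i)
            used += cost
        }
    }
    return keep.sorted().map { messages[$0] }
}
```

With `headBudgetFraction` left at 0 the sketch behaves as tail plus scored fill. A non-zero fraction pins the oldest messages first, which is the middle-out shape the commit message describes.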
Add a truncating-strategy test that guards the never-drop-newest invariant under load-bearing-overflow. The existing tests passed even with that line removed (the greedy backward fill already keeps the newest message in the common case), so the invariant was effectively untested. Sabotage-verified.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
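To make the invariant concrete, here is a minimal sketch of a sliding-window truncation with an explicit never-drop-newest guard. `Message`, `estimateTokens`, and `truncate` are hypothetical names for illustration, the token estimate is a crude character count, and the sketch assumes system messages sit at the front of the history. The real TruncatingCompressionStrategy may differ.

```swift
// Hypothetical stand-in for MK's ChatMessage; the real type is richer.
struct Message: Equatable {
    enum Role { case system, user, assistant }
    let role: Role
    let text: String
}

// Crude token estimate (about 4 characters per token); an assumption,
// not the tokenizer the PR injects.
func estimateTokens(_ message: Message) -> Int {
    max(1, (message.text.count + 3) / 4)
}

/// Keeps every system message, then fills the remaining budget with the
/// newest messages, walking backward. The newest message is always kept,
/// even when it alone exceeds the budget.
func truncate(_ messages: [Message], budget: Int) -> [Message] {
    let system = messages.filter { $0.role == .system }
    let rest = messages.filter { $0.role != .system }
    var remaining = budget - system.reduce(0) { $0 + estimateTokens($1) }

    var tail: [Message] = []
    for (offset, message) in rest.reversed().enumerated() {
        let cost = estimateTokens(message)
        let isNewest = offset == 0
        // Without the isNewest guard, an oversized newest message would be dropped.
        if !isNewest && cost > remaining { break }
        tail.append(message)
        remaining -= cost
    }
    return system + Array(tail.reversed())
}
```

The guard only matters when the newest message alone overflows the budget. In every other case the backward fill keeps it anyway, which is why a test has to construct that overflow case to exercise the line.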
Review-pass note (from the adversarial review + fix pass, commit
Budget realism (configurable reservedTokens replacing the bare 512, injectable tokenizer on the factories, skip-on-tiny-window guard), extractive verbatim-core overflow clamp, thinking-model robustness (strip <think> before summary parse, configurable response reserve), multimodal/tool per-part token accounting in ContextWindowManager, plus the QA test set (asymmetry boundary, chunk-and-fold, summary-floor, cancellation, tightened weak assertions) and doc fixes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
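As an illustration of the thinking-model fix mentioned in this note, stripping `<think>` spans before parsing a summary could look like the sketch below. The function name `stripThinkBlocks` is hypothetical, and discarding everything after an unterminated `<think>` is an assumption about the desired behaviour, not something the note specifies.

```swift
import Foundation

/// Removes every `<think>…</think>` span. An unterminated `<think>` drops
/// the rest of the text, on the assumption that the reasoning was cut off.
func stripThinkBlocks(_ text: String) -> String {
    var result = ""
    var remainder = Substring(text)
    while let open = remainder.range(of: "<think>") {
        // Keep the text that precedes the opening tag.
        result += remainder[..<open.lowerBound]
        guard let close = remainder[open.upperBound...].range(of: "</think>") else {
            remainder = ""
            break
        }
        // Continue scanning after the closing tag.
        remainder = remainder[close.upperBound...]
    }
    result += remainder
    return result.trimmingCharacters(in: .whitespacesAndNewlines)
}
```

If the stripped result is empty, a caller would presumably treat it like any other empty summary and take the extractive fallback.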
Summary
MK already ships the CompressionPolicy/PreTurnCompressionPolicy seams (wired into ConversationRuntime via TurnCompressionCoordinator), but no concrete strategy: the protocol docs show a hand-wave compress() stub and every consumer had to write their own. This PR adds three batteries-included strategies + one policy wrapper.
Ported and rewritten from Fireside's StoryCompression subsystem, adapted from its internal Message/Result tuples to MK's [ChatMessage] seam (generate is a parameter, not stored → clean Sendable structs, no @MainActor/@unchecked).
Strategies (a ladder)
- TruncatingCompressionStrategy: zero-inference sliding window (keep system + newest-by-budget tail, drop oldest). The canonical baseline. (new, not in Fireside)
- ExtractiveCompressionStrategy: zero-inference scored selection (recency / length / keyword-density), verbatim tail, greedy within budget. Adds an optional headBudgetFraction knob to pin establishing context (middle-out / lost-in-the-middle).
- AnchoredCompressionStrategy: summarize old messages via generate, prepend a .memory("summary") record + keep a verbatim recency tail. Keeps Fireside's summarizerInputWindow input-sizing decoupling, chunk-and-fold for over-window input, summary-floor logic, CancellationError early-return, and fall-back-to-extractive on failure/empty/missing-generate.
Policy wrapper
DefaultCompressionPolicy is a Sendable struct conforming to both CompressionPolicy (post-turn) and PreTurnCompressionPolicy (pre-turn). The trigger asymmetry is handled internally: the post-turn path uses contextUtilization; the pre-turn path receives only messageCount/lastPromptTokens, so the policy stores contextSize as config to compute utilization.
Tests
Tests/ManifoldRuntimeTests/DefaultCompressionPolicyTests.swift: ported from Fireside's StoryCompressionTests + P0 extractive-edge-case / anchored-fallback / summarizer-starvation suites, rewritten against [ChatMessage], plus new truncating-strategy, headBudgetFraction, and pre-turn-vs-post-turn trigger tests.
Out of scope (deliberately)
- postCompress hook (left as the default no-op for consumers to fill)
Notes for review
- isPinned resolution against ChatMessage (Fireside's Message had isPinned; check whether MK exposes a pin/kind equivalent or whether v1 treats none-pinned as acceptable with a TODO).
🤖 Generated with Claude Code
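The trigger asymmetry described under Policy wrapper can be sketched as below. The context structs, `TriggerSketch`, and the `threshold` parameter are hypothetical shapes for illustration. Only the field names `contextUtilization`, `messageCount`, `lastPromptTokens`, and `contextSize` come from the description, and their real types in MK may differ.

```swift
// Hypothetical shapes; the real MK context types are not shown in the PR text.
struct PostTurnContext {
    let contextUtilization: Double
}

struct PreTurnContext {
    let messageCount: Int
    let lastPromptTokens: Int
}

struct TriggerSketch {
    /// Stored as config so the pre-turn path can derive utilization itself.
    let contextSize: Int
    /// Fraction of the window at which compression should fire (assumed knob).
    let threshold: Double

    /// Post-turn: utilization is handed in directly.
    func shouldCompress(_ context: PostTurnContext) -> Bool {
        context.contextUtilization >= threshold
    }

    /// Pre-turn: only token counts are available, so utilization is computed
    /// from the stored context size.
    func shouldCompress(_ context: PreTurnContext) -> Bool {
        guard contextSize > 0 else { return false }
        return Double(context.lastPromptTokens) / Double(contextSize) >= threshold
    }
}
```

Keeping both checks on one value type means a single threshold governs both paths, which is presumably what the pre-turn-vs-post-turn trigger tests pin down at the boundary.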