feat: add durable identifiers to messages #2836
Draft
opieter-aws wants to merge 1 commit into
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Description
Messages do not carry a durable identity. The only per-message tracking available today is ephemeral: the memory `ExtractionCoordinator`'s in-session high-water-mark sequence number, and `SessionMessage.message_id`, an ordinal index that is not stable across conversation-manager truncation or session restore. Neither survives as a durable key, so a memory store has no way to build a `(session_id, message_id)` tuple to deduplicate extracted messages across sessions.

This change gives every message a durable, stable `id` in both SDKs. The id is assigned once, when a message is added to the conversation, and is preserved everywhere that message is later observed: `MessageAddedEvent` subscribers, session persistence, and snapshots. It is never sent to model providers (the existing role/content whitelist already strips every other key). Memory stores can now combine this id with a session id to identify a message uniquely across restarts.

Assignment happens at append time rather than at construction. This keeps it idempotent: a message that already has an id (restored from a session, supplied by a caller, or re-appended) keeps it, which is what makes the id stable through a save/restore cycle. It also means messages persisted before this change are left without an id rather than being silently backfilled with a fresh one on each load.
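A minimal Python sketch of the append-time rule, assuming messages are plain dicts with `role`, `content`, and an optional `id`. `append_message`, `to_provider_payload`, and `load_messages` are hypothetical helpers for illustration, not the SDK's API; they show idempotent assignment, the role/content whitelist that keeps the id out of provider requests, and loading without backfill.

```python
import uuid
from typing import Any, Dict, List

# Hypothetical message shape: a plain dict with "role", "content",
# and an optional "id" key.
MessageDict = Dict[str, Any]


def append_message(messages: List[MessageDict], message: MessageDict) -> MessageDict:
    """Hypothetical append chokepoint: assign an id only if none is present."""
    if message.get("id") is None:
        # Canonical hyphenated UUID v4, matching str(uuid.uuid4()) in the PR.
        message["id"] = str(uuid.uuid4())
    # Restored, caller-supplied, or re-appended messages keep their existing id.
    messages.append(message)
    return message


def to_provider_payload(message: MessageDict) -> MessageDict:
    """Whitelist role and content only, so "id" never reaches a model provider."""
    return {"role": message["role"], "content": message["content"]}


def load_messages(persisted: List[MessageDict]) -> List[MessageDict]:
    """Restore persisted messages as-is; legacy messages without an id stay without one."""
    return [dict(m) for m in persisted]
```

Putting the check at the append chokepoint rather than in a constructor is what lets a restored or caller-supplied id pass through unchanged, while legacy messages read back without one instead of receiving a fresh id on every load.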
Both SDKs generate a canonical (hyphenated) UUID v4: Python via `str(uuid.uuid4())`, TypeScript via `crypto.randomUUID()`. The shapes match deliberately, so a message id means the same thing regardless of which SDK produced it.

Wiring the id into the memory extraction pipeline (so stores receive it for deduplication) is intentionally out of scope here; this PR only establishes the durable id on the `Message` type so that work can build on it.

This is a coordinated change across both SDKs to keep the `Message` shape consistent. Python and TypeScript are kept behaviorally identical: assign at the append chokepoint, preserve on redaction, exclude from provider payloads, and never backfill legacy messages.

Public API Changes
`Message` gains an optional, durable `id`.

Python: a new `id` field on the `Message` TypedDict, plus a null-safe accessor.

TypeScript: a new optional `id` on `MessageData` and the `Message` class; it round-trips through `toJSON`/`fromJSON`/`clone`.

Both fields are optional and backward compatible: existing code that constructs messages without an id is unaffected, and messages persisted before this change deserialize with no id. The id is assigned by the agent when a message is appended, so a caller-supplied id (e.g. on input messages) is always preserved.
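A minimal Python sketch of what an optional TypedDict field plus a null-safe accessor can look like, assuming a `total=False` declaration; the SDK may declare the field differently. `get_message_id`, `save`, and `restore` are hypothetical names (the PR's actual accessor name is not shown here), and JSON stands in for session persistence.

```python
import json
from typing import List, Optional, TypedDict


class _MessageRequired(TypedDict):
    role: str
    content: List[dict]


class Message(_MessageRequired, total=False):
    # Optional durable id: absent on messages persisted before this change.
    id: str


def get_message_id(message: Message) -> Optional[str]:
    """Hypothetical null-safe accessor: returns None when no id is present."""
    return message.get("id")


def save(messages: List[Message]) -> str:
    # Stand-in for session persistence: the id is serialized like any other key.
    return json.dumps(messages)


def restore(payload: str) -> List[Message]:
    # Stand-in for session restore: no id is generated or backfilled here.
    return json.loads(payload)
```

A message saved with an id reads back with the same id, while a legacy message reads back with `get_message_id(...)` returning `None`, mirroring the backward-compatibility guarantee above.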
Related Issues
Resolves: #2805
Documentation PR
No documentation changes. The id is assigned automatically and is primarily a building block for upcoming memory-store deduplication; there is no new user-facing workflow to document yet.
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce new warnings.
`hatch run prepare`

Beyond the unit suites (both SDKs green), I exercised the change end to end: I ran a real agent turn and confirmed every recorded message has a unique id, then persisted through a `FileSessionManager` and restored into a fresh agent, confirming the restored ids match the persisted ones. Regression tests assert the id never reaches the model-provider payload: in Python at the `stream_messages` whitelist, and in TypeScript at the Anthropic adapter's request formatting (where TS does its stripping).

Note: the static-analysis step of `hatch run prepare` couldn't bootstrap locally due to an unrelated native build failure in the optional `[cedar]` extra (rustc version); ruff/mypy on the changed files and the full test suites pass, and CI runs the gate cleanly.

Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.