feat(events): IOJournalRecorder context manager (M3 / slice 2 of #517)#534
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
5d7eb35for PR #534
Review record:
f5b49864-a309-4821-97c0-b595356c7b8e
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/events/io_recorder.py:348 | BLOCKING | _append() swallows every append failure, and record_llm_call() / record_tool_call() persist the start and return events as two independent best-effort writes. If the first append fails and the second later succeeds, the journal will contain an orphaned *.returned event with no matching request/start payload. That breaks the pairing invariant this module documents and defeats the “reconstructable from the journal alone” contract on transient EventStore failures. The current tests only cover “both appends fail” and miss this partial-write case. |
Non-blocking Suggestions
None.
Design Notes
The event factories are straightforward and the recorder API is a reasonable adapter boundary, but the persistence strategy undermines the journal’s core reliability goal. Pair integrity needs to be preserved explicitly, or partial persistence needs to be surfaced and handled rather than silently ignored.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Follow-up architectural decisions for #517 slices 3–7This PR (#534) is the shared primitive every per-adapter migration consumes — This PR (slice 2) is purely additive — it ships the recorder helper plus tests. The decisions below land in subsequent slices, not in this PR. Slice 5 — Claude Code adapter (
|
| Slice | Hard dependency | Open architectural decisions |
|---|---|---|
| 5 (Claude Code) | #532 + #534 | (1) call boundary (subprocess vs turn), (2) prompt hash payload, (3) token-count parsing |
| 6 (Codex CLI) | #532 + #534 + #505 | (1) runtime_profile model_id, (2) streaming vs snapshot |
| 7 (Gemini CLI) | #532 + #534 | (1) finish_reason opacity (already locked), (2) multimodal serialisation |
| 8 (OpenCode) | #532 + #534 | (1) plugin vs subprocess wrap, (2) caller string convention |
| 9 (MCP tool dispatch) | #532 + #534 | (1) wrap layer (handler / bridge), (2) tool_name taxonomy, (3) args sanitisation policy |
| 10 (acceptance) | All of 1–9 | (1) fixture shape, (2) assertion shape |
This PR (slice 2) does not require any of the above to be decided yet. The recorder is intentionally provider-agnostic; each adapter slice picks its answers locally.
If maintainers prefer this folded into the PR body itself rather than a follow-up comment, happy to push an edit.
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
118e19bfor PR #534
Review record:
f101ba22-486a-4f8a-bd86-803e1878efcd
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/events/io_recorder.py:199 | BLOCKING | IOJournalRecorder shapes previews before calling the event factories, but each factory reshapes the same field again (src/ouroboros/events/io.py:282, 335, 390, 455). That breaks the recorder contract in two production cases: a caller-supplied preview_cap above 256 is silently truncated back to the factory default, and an explicit recorder privacy override like privacy=PrivacyMode.ON is ignored if the process env is OUROBOROS_IO_JOURNAL_PREVIEWS=redacted because the factory applies env-based redaction a second time. The current tests only cover the privacy=OFF case, which passes because None survives the second pass, so this regression would ship unnoticed. |
Non-blocking Suggestions
None.
Design Notes
The split between pure event factories and an async recorder helper is sensible and keeps adapter integrations small. The main architectural flaw is that preview/privacy policy is implemented at both layers, which creates conflicting sources of truth.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Autopilot follow-up: pushed review-addressing commits and confirmed the GitHub check suite is green on the latest head. Current merge state is clean; remaining CHANGES_REQUESTED status appears to be awaiting fresh bot/human re-review rather than failing CI. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
9461344for PR #534
Review record:
ff043e7d-fc6d-4cb2-98cd-74dd62e7f634
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/events/io_recorder.py:222 | BLOCKING | is_error is derived only from an exception escaping the context manager, but the yielded record objects do not let callers explicitly mark a handled failure. That means any adapter/tool dispatcher that catches an exception, receives a provider-level error response, or otherwise continues after a failed call will still emit llm.call.returned / tool.call.returned with is_error=False (and for LLM there is no error_kind field at all). Since this recorder is the shared primitive for later adapter migrations, it will silently misclassify real failed calls as successes and break the journal’s stated goal of explaining retries from persisted events alone. |
Non-blocking Suggestions
None.
Design Notes
The split between pure event factories and the async recorder helper is clean, and the privacy/hash/timing concerns are well isolated. The main gap is contract expressiveness: the recorder currently models only “success” or “uncaught exception,” which is too narrow for a foundation layer that future adapters are supposed to share.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Autopilot follow-up: latest head 4b4370f includes explicit handled-error recording support (record_error / is_error / error_kind) with local recorder tests passing. Requesting fresh bot review on current head. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
d65f11dfor PR #534
Review record:
ea93fada-cc2c-4c4a-b8aa-1399463980e4
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Follow-up Findings
src/ouroboros/events/io_recorder.py:331[warning]record_tool_call's exception path is part of the recorder's core contract, but this PR never tests that a raised tool exception both propagates and still emitstool.call.returnedwithis_error=Trueanderror_kind. Because the tool path is duplicated rather than shared with the LLM path, this can regress silently and leave the journal unable to explain tool-call failures, which is exactly the M3 invariant this primitive is supposed to guarantee.
Non-blocking Suggestions
| 1 | src/ouroboros/events/io_recorder.py:108 | Refactoring | record_error() on both record handle types cannot attach extra metadata, even though the returned events support extra. If adapters later need structured handled-error context, callers will have to mutate .extra directly. |
Design Notes
The slice is well-factored overall: schema shaping lives in events/io.py, and the recorder cleanly centralizes hashing, privacy, timing, and best-effort persistence. The main design risk is drift between the duplicated LLM and tool recorder paths, which makes mirrored failure-path tests especially important.
Policy Notes
- No in-scope blocking findings remained after policy filtering; downgraded verdict accordingly.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Q00
left a comment
There was a problem hiding this comment.
Reviewed IOJournalRecorder. The context-manager API keeps adapter integrations small, pairs started/returned events cleanly, and handles error paths without creating orphan returned events.
d65f11d to
0b4da0b
Compare
Summary
Second slice of #517: introduces the
IOJournalRecorderasync context manager that wraps an outbound LLM or tool call and emits the pairedstarted/returnedjournal events with sharedcall_id, duration, hashing, privacy-aware preview shaping, and exception capture.The recorder makes per-adapter wiring a 4-line addition rather than duplicating factory + hashing + privacy + timing boilerplate at every call site. Subsequent slices (Anthropic / LiteLLM / Claude Code / Codex CLI / Gemini CLI / OpenCode / MCP tool dispatch) just need to pass a recorder where appropriate and call
record_completion/record_resultinside the context block.Usage example
Design choices baked in
event_store=Noneproduces a no-op recorder that yields the same shape but emits nothing. Adapters that have not yet adopted the journal passNoneand continue to work unchanged.call_id, timing (duration_ms), prompt/result hashing (sha256:), and privacy-aware preview shaping all happen inside the recorder. Callers only provide payload text and metadata.*.returnedevent withis_error=Trueand the exception type name inerror_kind. Projections see the failure rather than a half-open call.append()raises) does NOT propagate the failure. The journal stays out of the way; LLM/tool calls never fail because the recorder could not persist.event_store. Protocol with oneappend(event)coroutine. No import fromouroboros.persistence, so the helper is cheap to test with a list-backed fake.Changes
src/ouroboros/events/io_recorder.py— new module, ~330 LOC.tests/unit/events/test_io_recorder.py— 8 cases.Verification
uv run ruff check src/ouroboros/events/io_recorder.py tests/unit/events/test_io_recorder.pyuv run ruff format ...uv run pytest tests/unit/events/test_io_recorder.py tests/unit/events/test_io_events.pyPre-merge checklist
record_llm_callemits paired requested/returned with sharedcall_idis_error=True+error_kindpopulated, exception re-raisedrecord_tool_callmirrors the LLM shape for tool dispatchevent_store=None→ no events emitted (recorder is idle)Post-merge checklist
record_tool_callRollback
The change is purely additive: a new module + tests. No existing behaviour or schema is touched. Rollback steps:
Stack: depends on #532.