feat(providers): wire IOJournalRecorder into LiteLLMAdapter (M3 / slice 4 of #517)#536
Conversation
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
b3e0d23for PR #536
Review record:
17d15d2f-f002-4456-a20e-708346b884f5
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/events/io_recorder.py:348 | BLOCKING | _append() now swallows every event_store.append() failure without logging, but EventStore.append() does not log generic insert failures before raising ([src/ouroboros/persistence/event_store.py:239]). In production this makes journal loss completely silent: recorder misconfiguration, uninitialized stores, or DB write failures will drop both *.requested and *.returned events with no signal to operators, which violates the PR’s own “record every outbound call” contract and makes outages very hard to diagnose. |
Non-blocking Suggestions
None.
Design Notes
The overall shape is reasonable: the event factories are clean, the recorder keeps adapter call sites small, and the provider integrations preserve legacy behavior when no recorder is supplied. The main design issue is observability of failure paths: the recorder is intentionally best-effort, but the current implementation turns persistence failures into silent data loss.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
a3fc1effor PR #536
Review record:
c5d890ed-d890-4115-908c-1142aaee682d
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/events/io.py:282 | BLOCKING | The event factories re-run shape_preview() on values that IOJournalRecorder has already shaped, so recorder-level overrides are not actually honored. In record_llm_call() / record_tool_call() the preview is first shaped with privacy=self.privacy and the caller’s preview_cap / preview_hard_cap, but create_*_event() immediately reshapes it again with factory defaults and env-derived privacy. That means privacy=PrivacyMode.ON is defeated whenever OUROBOROS_IO_JOURNAL_PREVIEWS=off, privacy=REDACTED can be dropped the same way, and any non-default preview cap is silently truncated back to the factory default (256 for LLM, 256 then 1024-hard-cap path for tools). The new API therefore produces wrong persisted payloads for valid recorder configurations. |
Non-blocking Suggestions
None.
Design Notes
The recorder/factory split is reasonable and keeps provider integrations small, but the current boundary is leaky: both layers own preview shaping, which creates configuration drift and makes the recorder’s explicit knobs unreliable. One layer needs to be the single authority for privacy/truncation.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
|
Autopilot follow-up: pushed review-addressing commits and confirmed the GitHub check suite is green on the latest head. Current merge state is clean; remaining CHANGES_REQUESTED status appears to be awaiting fresh bot/human re-review rather than failing CI. |
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
394795ffor PR #536
Review record:
6d2dfa88-24f5-4933-9fdf-7aa59457309a
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
The change is well-contained: the new event factories and recorder centralize hashing, privacy, timing, and pairing semantics, and the Anthropic/LiteLLM integrations stay additive behind an opt-in recorder dependency. I did not find a diff-scoped functional regression or contract violation in the changed paths.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: REQUEST_CHANGES
Reviewing commit
135a044for PR #536
Review record:
610a6833-e8ce-4571-83ea-127f636077d4
Blocking Findings
| # | File:Line | Severity | Finding |
|### Recovery Notes
First recoverable review artifact generated from codex analysis log.
---|-----------|----------|---------|
| 1 | src/ouroboros/providers/litellm_adapter.py:395 | BLOCKING | The recorder captures the raw LiteLLM response before _parse_response() runs, so the journal can diverge from the actual CompletionResponse returned by the adapter. In this file _parse_response() truncates oversized completions to MAX_LLM_RESPONSE_LENGTH, but _record_litellm_completion() hashes/previews the untruncated text first. In production that means llm.call.returned can persist a different completion body/hash than the application actually consumed, which breaks the “reconstructable from the journal alone” contract and can leak more text into the journal than the adapter exposes. |
| 2 | src/ouroboros/providers/litellm_adapter.py:42 | BLOCKING | The new LiteLLM prompt-hash serialization only includes message role/content, while the actual outbound request also varies on top_p, stop, and response_format (see _build_completion_kwargs()). Two materially different requests can therefore emit the same prompt_hash and nearly identical llm.call.requested metadata, making the journal unable to distinguish or replay why behavior changed between runs. Anthropic’s integration already includes request options in its serialized/hashable payload; LiteLLM needs the same treatment to satisfy the journal’s stated reproducibility contract. |
Non-blocking Suggestions
None.
Design Notes
The recorder abstraction itself is clean and the adapter integration is appropriately opt-in. The main design problem is consistency: the Anthropic path journals a normalized, reconstructable request/response shape, while the LiteLLM path currently records a less faithful view of both the request inputs and the returned content.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
There was a problem hiding this comment.
Review — ouroboros-agent[bot]
Verdict: APPROVE
Reviewing commit
365b52cfor PR #536
Review record:
df0871bc-80cc-41a0-9725-78bbedc8432e
Blocking Findings
No in-scope blocking findings remained after policy filtering.
Non-blocking Suggestions
None.
Design Notes
The split between pure event factories, the async IOJournalRecorder, and thin provider wiring is sound and keeps the new journal path isolated from core request logic. Diff-scoped tests cover the main success and error paths well; I could not execute them in this environment because pytest is not installed in the snapshot.
Recovery Notes
First recoverable review artifact generated from codex analysis log.
Reviewed by ouroboros-agent[bot] via Codex deep analysis
Q00
left a comment
There was a problem hiding this comment.
Reviewed the LiteLLMAdapter IOJournalRecorder wiring. The integration is optional, keeps current behavior when disabled, and records the important request/response/error boundaries.
365b52c to
2434f8b
Compare
Summary
Mirrors #535 for
LiteLLMAdapter. Constructor accepts optionalio_recorderkwarg; when provided, every retry attempt of_raw_completeis wrapped inrecorder.record_llm_call(...)so each attempt produces its own pairedllm.call.requested/llm.call.returnedevents. Failures across retries appear as distinct attempts in the journal.Changes
src/ouroboros/providers/litellm_adapter.pyio_recorderkwarg (defaultNone).complete()wraps_raw_completeinside its existing retry loop only when a recorder is configured._serialise_messages_for_hash,_record_litellm_completion) keep the wiring tidy and graceful around missingusagefields.tests/unit/providers/test_litellm_adapter_io_recorder.py— 9 cases.Verification
uv run ruff check ...uv run ruff format ...uv run pytest test_litellm_adapter_io_recorder.py test_anthropic_adapter_io_recorder.pyPre-merge checklist
io_recorderkwarg; legacy callers unchangedreturnedwithis_error=True+error_kindresponse.usageis missing (token counts default to None)Post-merge checklist
record_tool_callRollback
Constructor kwarg is optional with
Nonedefault. Pure additive change; legacy behaviour preserved.Stack: depends on #535 → #534 → #532.