Improve OpenInference compliance#1
Conversation
- Add agent.name on AGENT spans (root + subagent) - Add user.id from sender context - Add metadata JSON attribute (channel, trigger, agentId, sessionKey) - Fix root span output.value: extract plain text instead of raw JSON - Add llm.invocation_parameters on LLM spans - Add llm.token_count.completion_details.reasoning for thinking models - Add stale trace cleanup with configurable timeout/interval - Delete dead code (src/service.ts, src/service/hooks/*.ts)
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
exiao
left a comment
There was a problem hiding this comment.
PR Review: improve/openinference-v2
Reviewed diff, full index.ts, improvement plan, and live Phoenix trace output.
✅ All 9 improvements confirmed implemented
[1] openinference.span.kind on TOOL spans
Correct. before_tool_call passes it inside the attributes object to startChildSpan — this is fine since startChildSpan spreads those as top-level span attributes. AGENT and LLM kinds also set correctly.
[3] user.id from sender
Implemented. Tries agentCtx.senderId ?? agentCtx.userId ?? agentCtx.senderName in that order. Stored in activeTraces state for lifecycle continuity.
[4] metadata JSON attribute
Implemented in agent_end queueMicrotask. Builds { channel, trigger, agentId, sessionKey } and sets as JSON.stringify.
[5] Root span output.value plain text
Implemented. Correctly extracts text from current.output.output (set by llm_output handler) first, with fallback to parsing event.messages content arrays. No more raw JSON blobs.
[6] agent.name on AGENT spans
Set to serviceName ("openclaw") at root span creation in llm_input.
[7] Dead code deleted
Confirmed: src/service/hooks/ directory is gone, src/service.ts deleted. Remaining files (otel-bridge.ts, payload-sanitizer.ts, helpers.ts, constants.ts) are all still used by index.ts.
[8] Stale trace cleanup
staleSweepTimer declared in register(), interval started in service.start(), cleared in service.stop(). Default 5m timeout / 60s sweep. Ends orphaned spans and marks root as ERROR before deleting.
[9] Reasoning token count
event.usage.reasoning != null guard with llm.token_count.completion_details.reasoning.
[10] llm.invocation_parameters
Checks event.invocationParams ?? event.params, only sets if non-empty object. Correct — not fabricated.
🐛 Bugs
1. Redundant metadata.channel attribute (minor)
In agent_end, line ~329 sets:
if (current.channelId) current.rootSpan.setAttribute("metadata.channel", current.channelId);Then ~10 lines later, the metadata JSON object also includes channel. The standalone metadata.channel attribute is noise — remove it.
2. Missing output.mime_type in tool error path (minor)
In after_tool_call, the error branch sets output.value as a JSON blob ({ error: ... }) but doesn't set output.mime_type = "application/json". The success branch does set it. Fix: move output.mime_type outside the if/else or add it to the error branch.
⚠️ Observation: Subagent spans not created
activeTraces still holds a subagentSpans: Map and the stale cleanup iterates it — but there are no before_subagent / after_subagent hooks in the new index.ts. The deleted src/service/hooks/subagent.ts previously handled these.
If subagent tracing was intentionally cut from this PR (scope), fine — but worth noting the map is maintained and cleaned up without ever being populated. Either add a comment clarifying this is a future scope item, or remove the subagentSpans map from the trace state entirely to avoid confusion.
📡 Phoenix traces confirmed flowing
CLI check returned recent traces with correct span hierarchy (AGENT → LLM → TOOL). The agent.name = "openclaw" attribute appears on root spans in Phoenix. user.id and metadata appear on sessions where agentCtx exposes sender info.
Summary
9/9 improvements implemented. Two minor bugs worth fixing before merge (the redundant attribute and missing mime_type). The subagent spans omission should be acknowledged in the PR description. No type errors or architectural issues found. Good cleanup.
…me_type to error path, clarify subagentSpans
Changes
Verified traces flowing to Phoenix Cloud with new attributes.