
Conversation


@mgzb mgzb commented Oct 4, 2025

  • Add a wrapper for Responses.parse and its asynchronous variant
  • Add test coverage
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Screenshots of traces in Jaeger (content redacted, since these were captured in a real project)

Screenshot 2025-10-04 at 01 02 42 Screenshot 2025-10-04 at 01 04 50

Important

Adds support for Responses.parse() in OpenAI instrumentation with synchronous and asynchronous wrappers, and comprehensive test coverage.

  • Behavior:
    • Adds responses_parse_wrapper and async_responses_parse_wrapper in responses_wrappers.py to handle structured outputs in Responses.parse().
    • Updates _instrument() and _uninstrument() in __init__.py to wrap Responses.parse and AsyncResponses.parse (see the registration sketch after this list).
  • Tests:
    • Adds tests in test_responses_parse.py for Responses.parse() covering basic, message history, moderation, tools, reasoning, exceptions, output fallback, instructions, token usage, and response ID scenarios.
    • Adds YAML files for VCR cassettes to test various Responses.parse() scenarios.
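
A rough sketch of the registration described above; it assumes _try_wrap behaves like wrapt's wrap_function_wrapper and skips wrapping on SDK versions without the Responses API. The module path and the tracer-first calling convention are likewise assumptions, not taken from the diff.

from wrapt import wrap_function_wrapper
from opentelemetry.instrumentation.openai.v1.responses_wrappers import (
    async_responses_parse_wrapper,
    responses_parse_wrapper,
)

def _instrument_responses_parse(tracer):
    # Assumed SDK module path; _with_tracer_wrapper lets wrapper(tracer) return
    # a wrapt-compatible callable.
    wrap_function_wrapper(
        "openai.resources.responses",
        "Responses.parse",
        responses_parse_wrapper(tracer),
    )
    wrap_function_wrapper(
        "openai.resources.responses",
        "AsyncResponses.parse",
        async_responses_parse_wrapper(tracer),
    )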

This description was created by Ellipsis for 7c47e06.

Summary by CodeRabbit

  • New Features

    • Enhanced OpenAI instrumentation to trace response parsing for both sync and async flows.
    • Captures richer telemetry: system/user prompts, completions, structured outputs (with fallback), tool calls, reasoning details, token usage, and response IDs.
    • Improved error reporting and proper cleanup when instrumentation is removed.
  • Tests

    • Added comprehensive cassettes and test suite covering basic flows, message history, tools, moderation, reasoning, output fallback, token usage, response ID correlation, async variants, and error scenarios.


CLAassistant commented Oct 4, 2025

CLA assistant check
All committers have signed the CLA.


coderabbitai bot commented Oct 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds instrumentation that wraps OpenAI v1 Responses.parse (sync and async). The new parse wrappers capture structured outputs, prompts, tools, reasoning, and usage into spans, and uninstrumentation is updated to unwrap those methods. The PR also introduces many VCR cassettes plus a comprehensive test suite covering sync/async paths, tools, reasoning, moderation, fallback, and error handling.
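
For orientation, a minimal sketch of the wrapper shape this walkthrough describes. The span name follows the sequence diagram below; the attribute key is invented for illustration, and the parsed output is assumed to be a Pydantic v2 model. The actual implementation lives in responses_wrappers.py.

import json
from opentelemetry.trace import SpanKind, Status, StatusCode

def responses_parse_wrapper(tracer, wrapped, instance, args, kwargs):
    span = tracer.start_span("openai.responses.parse", kind=SpanKind.CLIENT)
    # Request-side attributes (model, prompts, tools) would be read from kwargs here.
    try:
        response = wrapped(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        span.end()
        raise
    # Structured output first, output_text as the documented fallback.
    parsed = getattr(response, "output_parsed", None)
    if parsed is not None:
        span.set_attribute("llm.completion", json.dumps(parsed.model_dump()))
    else:
        span.set_attribute("llm.completion", getattr(response, "output_text", ""))
    span.end()
    return response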

Changes

  • Instrumentation hooks (packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py): Imports parse wrappers and wraps Responses.parse and AsyncResponses.parse via _try_wrap; adds corresponding unwrap calls in _uninstrument.
  • Response parse wrappers (packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py): Adds responses_parse_wrapper and async variants; starts spans, records exceptions and attributes (prompts, outputs, tools, reasoning, usage), serializes structured outputs, merges traced data, adds ResponseOutputMessageParamWithoutId type under RESPONSES_AVAILABLE, and adjusts set_data_attributes retrieval.
  • Test cassettes for responses.parse (packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/*): Adds multiple VCR cassettes for many scenarios (basic, async basic, message history, tools, moderation, reasoning, instructions, token usage, response id, output fallback) to exercise parsing and metadata.
  • Tests (packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py): New comprehensive test suite validating spans/attributes and behaviors for sync/async parsing, tools, moderation, reasoning, fallbacks, token usage, response ID, and error paths.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant SDK as OpenAI SDK
  participant Wrapper as Parse Wrapper
  participant API as OpenAI API
  participant Tracer

  App->>SDK: responses.parse(...) or await responses.parse(...)
  SDK->>Wrapper: invoke wrapped parse
  Wrapper->>Tracer: start span "openai.responses.parse"
  Wrapper->>API: POST /v1/responses
  API-->>Wrapper: response (id, output, usage, reasoning, tools)
  Wrapper->>Wrapper: extract/serialize output_parsed or fallback output_text
  Wrapper->>Tracer: set attributes (prompts, completion, tools, usage, reasoning, response.id)
  Wrapper-->>SDK: return parsed result
  SDK-->>App: parsed result

  alt error
    Wrapper->>Tracer: record exception & error attributes
    Wrapper-->>SDK: re-raise error
  end

  note right of Wrapper: Async path mirrors sync with await points
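For context, roughly what the traced application call looks like; the model name and schema are illustrative, not taken from this PR's tests or cassettes.

from pydantic import BaseModel
from openai import OpenAI

class Answer(BaseModel):
    city: str
    country: str

client = OpenAI()
result = client.responses.parse(
    model="gpt-4.1-nano",
    input="Where is the Eiffel Tower located?",
    text_format=Answer,
)
# The wrapper records a serialized form of result.output_parsed on the span,
# falling back to result.output_text when no parsed output is available.
print(result.output_parsed)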

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • nirga
  • dinmukhamedm

Poem

A rabbit taps keys with a hop and a cheer,
Wrapping parse calls so the traces appear.
Spans gather tokens, tools, and reasoned light,
Async or sync, they record day and night.
Carrots for tests, and traces tucked tight. 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title Check: ✅ Passed. The title “feat(openai): Add support for Responses.parse()” follows the conventional commit format and clearly summarizes the primary change: adding support for the Responses.parse() method in the OpenAI instrumentation.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 7c47e06 in 1 minute and 31 seconds.
  • Reviewed 2427 lines of code in 15 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:144
  • Draft comment:
    The global variable 'responses' is used to cache or store traced data. In a concurrent/multithreaded environment, this shared mutable state might raise thread-safety issues or lead to memory leaks if the dictionary is never cleared. Consider using proper synchronization or an eviction strategy.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:832
  • Draft comment:
    When handling structured outputs via 'output_parsed', you try to dump via 'json.dumps(model_as_dict(parsed_output))'. Consider adding more robust error handling and possibly logging in this fallback, in case unexpected data types are encountered (a sketch follows this list).
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py:54
  • Draft comment:
    Test assertions check for a hard‐coded response model value (e.g. 'gpt-4.1-nano-2025-04-14'). Ensure this value is stable or abstract it to a configuration, so future changes to the underlying API versioning don't cause fragile tests.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
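
Regarding draft comment 2, a hedged sketch of a logging fallback for the serialization step; model_as_dict is the existing helper the comment quotes, and its import path here is an assumption.

import json
import logging
from opentelemetry.instrumentation.openai.shared import model_as_dict  # assumed path

logger = logging.getLogger(__name__)

def _serialize_parsed_output(parsed_output):
    try:
        return json.dumps(model_as_dict(parsed_output))
    except (TypeError, ValueError):
        logger.debug("Failed to serialize parsed output", exc_info=True)
        return str(parsed_output)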

Workflow ID: wflow_CW82AsLtFkBsiQfa



@dont_throw
@_with_tracer_wrapper
def responses_parse_wrapper(tracer: Tracer, wrapped, instance, args, kwargs):

The new 'responses_parse_wrapper' (and its async version) essentially duplicates much of the logic from other wrappers (e.g. responses_get_or_create_wrapper). Consider refactoring the common logic into a helper to reduce duplication and ease maintenance.
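
One possible shape for that refactor, sketched with illustrative names rather than the package's actual API:

from opentelemetry.trace import SpanKind, Status, StatusCode

def _trace_responses_call(tracer, span_name, wrapped, args, kwargs, set_attributes):
    # Shared span lifecycle and error handling for the responses_* wrappers.
    span = tracer.start_span(span_name, kind=SpanKind.CLIENT)
    try:
        result = wrapped(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        span.end()
        raise
    set_attributes(span, kwargs, result)  # wrapper-specific attribute extraction
    span.end()
    return result

responses_parse_wrapper and responses_get_or_create_wrapper would then differ only in the set_attributes callback they pass, and an awaitable twin of this helper would cover the async wrappers.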


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1)

88-113: Same organization/project ID concern as other cassettes.

This cassette contains the same potentially sensitive openai-organization and openai-project identifiers flagged in test_responses_parse_response_id.yaml. Ensure consistent scrubbing across all cassettes.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1)

88-113: Same organization/project ID concern.

This async cassette contains the same potentially sensitive identifiers. Ensure scrubbing is applied consistently across both sync and async test fixtures.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1)

88-115: Same organization/project ID concern.

Consistent with other cassettes in this PR, ensure the organization and project identifiers are scrubbed.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1)

88-113: Same organization/project ID concern.

Ensure consistent scrubbing of sensitive identifiers across all cassettes including this token usage test fixture.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1)

95-120: Same organization/project ID concern.

This tools-focused cassette should have the same scrubbing applied to organization and project identifiers.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1)

90-115: Same organization/project ID concern.

Final cassette in this set should also have organization and project identifiers scrubbed consistently with the others; a scrubbing configuration sketch follows below.
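
A minimal sketch of how these headers could be scrubbed at record time, assuming a pytest-recording style vcr_config fixture (the fixture name and conftest wiring are assumptions about this test suite):

import pytest

_SENSITIVE = {"openai-organization", "openai-project", "set-cookie"}

def _scrub_response(response):
    headers = response.get("headers", {})
    for name in list(headers):
        if name.lower() in _SENSITIVE:
            headers[name] = ["REDACTED"]
    return response

@pytest.fixture(scope="module")
def vcr_config():
    # filter_headers covers the request side; before_record_response scrubs
    # the response headers flagged in the comments above.
    return {
        "filter_headers": [("authorization", "REDACTED")],
        "before_record_response": _scrub_response,
    }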

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1)

323-328: Assert the precise exception type.

Catching a blanket Exception in tests hides regressions. Please assert the concrete OpenAI error (openai.AuthenticationError) so the test proves we surface the right failure.
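
A sketch of the suggested assertion; the fixture names, model, and schema are assumptions about this test module, not its actual contents.

import openai
import pytest
from pydantic import BaseModel

class _Schema(BaseModel):
    answer: str

@pytest.mark.vcr
def test_responses_parse_exception(instrument_legacy, openai_client):
    openai_client.api_key = "invalid"  # provoke the 401 recorded in the cassette
    with pytest.raises(openai.AuthenticationError):
        openai_client.responses.parse(
            model="gpt-4.1-nano",
            input="hello",
            text_format=_Schema,
        )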

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between e66894f and 7c47e06.

📒 Files selected for processing (15)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
  • async_responses_parse_wrapper (889-1026)
  • responses_parse_wrapper (745-884)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
  • is_reasoning_supported (25-30)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
  • dont_throw (132-160)
  • _with_tracer_wrapper (116-123)
🪛 Ruff (0.13.3)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
  • ARG001 (unused function argument: instrument_legacy): lines 38, 76, 125, 160, 213, 245, 276, 318, 349, 382, 405, 441, 470, 495
  • B017 (do not assert blind exception: Exception): lines 323, 356

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • ARG001 (unused function argument: instance): lines 745, 890
  • BLE001 (do not catch blind exception: Exception): lines 796, 837, 843, 872, 938, 979, 985, 1014
  • S110 (try-except-pass detected, consider logging the exception): lines 837-838, 843-844, 979-980, 985-986

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4)

33-36: LGTM!

The imports for the new parse wrappers are correctly added and follow the established pattern for other wrapper imports in this file.


314-318: LGTM!

The instrumentation correctly wraps Responses.parse with the new wrapper, using _try_wrap for backward compatibility with older OpenAI SDK versions.


334-338: LGTM!

The async variant instrumentation is correctly implemented, mirroring the sync wrapper pattern and ensuring compatibility across SDK versions.


365-369: LGTM!

Uninstrumentation correctly unwraps both sync and async parse methods, ensuring clean teardown when the instrumentation is disabled.
