
Conversation


@mgzb mgzb commented Oct 4, 2025

  • Add a wrapper for Responses.parse and its asynchronous variant
  • Add test coverage
  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Screenshots of traces in Jaeger (content redacted, since these were captured in a real project)

Screenshot 2025-10-04 at 01 02 42 Screenshot 2025-10-04 at 01 04 50

Important

Adds support for Responses.parse() in OpenAI instrumentation with synchronous and asynchronous wrappers, and comprehensive test coverage.

  • Behavior:
    • Adds responses_parse_wrapper and async_responses_parse_wrapper in responses_wrappers.py to handle structured outputs in Responses.parse().
    • Updates _instrument() and _uninstrument() in __init__.py to wrap Responses.parse and AsyncResponses.parse (see the registration sketch after this list).
  • Tests:
    • Adds tests in test_responses_parse.py for Responses.parse() covering basic, message history, moderation, tools, reasoning, exceptions, output fallback, instructions, token usage, and response ID scenarios.
    • Adds YAML files for VCR cassettes to test various Responses.parse() scenarios.
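
A rough sketch of the registration described above; it assumes _try_wrap behaves like wrapt's wrap_function_wrapper and skips wrapping on SDK versions without the Responses API. The module path and the tracer-first calling convention are likewise assumptions, not taken from the diff.

from wrapt import wrap_function_wrapper
from opentelemetry.instrumentation.openai.v1.responses_wrappers import (
    async_responses_parse_wrapper,
    responses_parse_wrapper,
)

def _instrument_responses_parse(tracer):
    # Assumed SDK module path; _with_tracer_wrapper lets wrapper(tracer) return
    # a wrapt-compatible callable.
    wrap_function_wrapper(
        "openai.resources.responses",
        "Responses.parse",
        responses_parse_wrapper(tracer),
    )
    wrap_function_wrapper(
        "openai.resources.responses",
        "AsyncResponses.parse",
        async_responses_parse_wrapper(tracer),
    )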

This description was created by Ellipsis for 7c47e06.

Summary by CodeRabbit

  • New Features

    • Enhanced OpenAI instrumentation to trace response parsing for both sync and async flows.
    • Captures richer telemetry: system/user prompts, completions, structured outputs (with fallback), tool calls, reasoning details, token usage, and response IDs.
    • Improved error reporting and proper cleanup when instrumentation is removed.
  • Tests

    • Added comprehensive cassettes and test suite covering basic flows, message history, tools, moderation, reasoning, output fallback, token usage, response ID correlation, async variants, and error scenarios.


CLAassistant commented Oct 4, 2025

CLA assistant check
All committers have signed the CLA.


coderabbitai bot commented Oct 4, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds instrumentation that wraps OpenAI v1 Responses.parse (sync and async). The new parse wrappers capture structured outputs, prompts, tools, reasoning, and usage into spans, and uninstrumentation is updated to unwrap those methods. The PR also introduces many VCR cassettes plus a comprehensive test suite covering sync/async paths, tools, reasoning, moderation, fallback, and error handling.
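
For orientation, a minimal sketch of the wrapper shape this walkthrough describes. The span name follows the sequence diagram below; the attribute key is invented for illustration, and the parsed output is assumed to be a Pydantic v2 model. The actual implementation lives in responses_wrappers.py.

import json
from opentelemetry.trace import SpanKind, Status, StatusCode

def responses_parse_wrapper(tracer, wrapped, instance, args, kwargs):
    span = tracer.start_span("openai.responses.parse", kind=SpanKind.CLIENT)
    # Request-side attributes (model, prompts, tools) would be read from kwargs here.
    try:
        response = wrapped(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        span.end()
        raise
    # Structured output first, output_text as the documented fallback.
    parsed = getattr(response, "output_parsed", None)
    if parsed is not None:
        span.set_attribute("llm.completion", json.dumps(parsed.model_dump()))
    else:
        span.set_attribute("llm.completion", getattr(response, "output_text", ""))
    span.end()
    return response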

Changes

  • Instrumentation hooks (packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py): Imports parse wrappers and wraps Responses.parse and AsyncResponses.parse via _try_wrap; adds corresponding unwrap calls in _uninstrument.
  • Response parse wrappers (packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py): Adds responses_parse_wrapper and async variants; starts spans, records exceptions and attributes (prompts, outputs, tools, reasoning, usage), serializes structured outputs, merges traced data, adds ResponseOutputMessageParamWithoutId type under RESPONSES_AVAILABLE, and adjusts set_data_attributes retrieval.
  • Test cassettes for responses.parse (packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/*): Adds multiple VCR cassettes for many scenarios (basic, async basic, message history, tools, moderation, reasoning, instructions, token usage, response id, output fallback) to exercise parsing and metadata.
  • Tests (packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py): New comprehensive test suite validating spans/attributes and behaviors for sync/async parsing, tools, moderation, reasoning, fallbacks, token usage, response ID, and error paths.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant SDK as OpenAI SDK
  participant Wrapper as Parse Wrapper
  participant API as OpenAI API
  participant Tracer

  App->>SDK: responses.parse(...) or await responses.parse(...)
  SDK->>Wrapper: invoke wrapped parse
  Wrapper->>Tracer: start span "openai.responses.parse"
  Wrapper->>API: POST /v1/responses
  API-->>Wrapper: response (id, output, usage, reasoning, tools)
  Wrapper->>Wrapper: extract/serialize output_parsed or fallback output_text
  Wrapper->>Tracer: set attributes (prompts, completion, tools, usage, reasoning, response.id)
  Wrapper-->>SDK: return parsed result
  SDK-->>App: parsed result

  alt error
    Wrapper->>Tracer: record exception & error attributes
    Wrapper-->>SDK: re-raise error
  end

  note right of Wrapper: Async path mirrors sync with await points
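For context, roughly what the traced application call looks like; the model name and schema are illustrative, not taken from this PR's tests or cassettes.

from pydantic import BaseModel
from openai import OpenAI

class Answer(BaseModel):
    city: str
    country: str

client = OpenAI()
result = client.responses.parse(
    model="gpt-4.1-nano",
    input="Where is the Eiffel Tower located?",
    text_format=Answer,
)
# The wrapper records a serialized form of result.output_parsed on the span,
# falling back to result.output_text when no parsed output is available.
print(result.output_parsed)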

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Suggested reviewers

  • nirga
  • dinmukhamedm

Poem

A rabbit taps keys with a hop and a cheer,
Wrapping parse calls so the traces appear.
Spans gather tokens, tools, and reasoned light,
Async or sync, they record day and night.
Carrots for tests, and traces tucked tight. 🥕✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title Check: ✅ Passed. The title “feat(openai): Add support for Responses.parse()” follows the conventional commit format and clearly summarizes the primary change: adding support for the Responses.parse() method in the OpenAI instrumentation.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 7c47e06 in 1 minute and 31 seconds.
  • Reviewed 2427 lines of code in 15 files
  • Skipped 0 files when reviewing.
  • Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:144
  • Draft comment:
    The global variable 'responses' is used to cache or store traced data. In a concurrent/multithreaded environment, this shared mutable state might raise thread-safety issues or lead to memory leaks if the dictionary is never cleared. Consider using proper synchronization or an eviction strategy.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:832
  • Draft comment:
    When handling structured outputs via 'output_parsed', you try to dump via 'json.dumps(model_as_dict(parsed_output))'. Consider adding more robust error handling and possibly logging in this fallback, in case unexpected data types are encountered (a sketch follows this list).
  • Reason this comment was not posted:
    Confidence changes required: 50% <= threshold 50% None
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py:54
  • Draft comment:
    Test assertions check for a hard‐coded response model value (e.g. 'gpt-4.1-nano-2025-04-14'). Ensure this value is stable or abstract it to a configuration, so future changes to the underlying API versioning don't cause fragile tests.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
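
Regarding draft comment 2, a hedged sketch of a logging fallback for the serialization step; model_as_dict is the existing helper the comment quotes, and its import path here is an assumption.

import json
import logging
from opentelemetry.instrumentation.openai.shared import model_as_dict  # assumed path

logger = logging.getLogger(__name__)

def _serialize_parsed_output(parsed_output):
    try:
        return json.dumps(model_as_dict(parsed_output))
    except (TypeError, ValueError):
        logger.debug("Failed to serialize parsed output", exc_info=True)
        return str(parsed_output)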

Workflow ID: wflow_CW82AsLtFkBsiQfa



@dont_throw
@_with_tracer_wrapper
def responses_parse_wrapper(tracer: Tracer, wrapped, instance, args, kwargs):

The new 'responses_parse_wrapper' (and its async version) essentially duplicates much of the logic from other wrappers (e.g. responses_get_or_create_wrapper). Consider refactoring the common logic into a helper to reduce duplication and ease maintenance.
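
One possible shape for that refactor, sketched with illustrative names rather than the package's actual API:

from opentelemetry.trace import SpanKind, Status, StatusCode

def _trace_responses_call(tracer, span_name, wrapped, args, kwargs, set_attributes):
    # Shared span lifecycle and error handling for the responses_* wrappers.
    span = tracer.start_span(span_name, kind=SpanKind.CLIENT)
    try:
        result = wrapped(*args, **kwargs)
    except Exception as exc:
        span.record_exception(exc)
        span.set_status(Status(StatusCode.ERROR, str(exc)))
        span.end()
        raise
    set_attributes(span, kwargs, result)  # wrapper-specific attribute extraction
    span.end()
    return result

responses_parse_wrapper and responses_get_or_create_wrapper would then differ only in the set_attributes callback they pass, and an awaitable twin of this helper would cover the async wrappers.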


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1)

88-113: Same organization/project ID concern as other cassettes.

This cassette contains the same potentially sensitive openai-organization and openai-project identifiers flagged in test_responses_parse_response_id.yaml. Ensure consistent scrubbing across all cassettes.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1)

88-113: Same organization/project ID concern.

This async cassette contains the same potentially sensitive identifiers. Ensure scrubbing is applied consistently across both sync and async test fixtures.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1)

88-115: Same organization/project ID concern.

Consistent with other cassettes in this PR, ensure the organization and project identifiers are scrubbed.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1)

88-113: Same organization/project ID concern.

Ensure consistent scrubbing of sensitive identifiers across all cassettes including this token usage test fixture.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1)

95-120: Same organization/project ID concern.

This tools-focused cassette should have the same scrubbing applied to organization and project identifiers.

packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1)

90-115: Same organization/project ID concern.

Final cassette in this set should also have organization and project identifiers scrubbed consistently with the others; a scrubbing configuration sketch follows below.
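
A minimal sketch of how these headers could be scrubbed at record time, assuming a pytest-recording style vcr_config fixture (the fixture name and conftest wiring are assumptions about this test suite):

import pytest

_SENSITIVE = {"openai-organization", "openai-project", "set-cookie"}

def _scrub_response(response):
    headers = response.get("headers", {})
    for name in list(headers):
        if name.lower() in _SENSITIVE:
            headers[name] = ["REDACTED"]
    return response

@pytest.fixture(scope="module")
def vcr_config():
    # filter_headers covers the request side; before_record_response scrubs
    # the response headers flagged in the comments above.
    return {
        "filter_headers": [("authorization", "REDACTED")],
        "before_record_response": _scrub_response,
    }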

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1)

323-328: Assert the precise exception type.

Catching a blanket Exception in tests hides regressions. Please assert the concrete OpenAI error (openai.AuthenticationError) so the test proves we surface the right failure.
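
A sketch of the suggested assertion; the fixture names, model, and schema are assumptions about this test module, not its actual contents.

import openai
import pytest
from pydantic import BaseModel

class _Schema(BaseModel):
    answer: str

@pytest.mark.vcr
def test_responses_parse_exception(instrument_legacy, openai_client):
    openai_client.api_key = "invalid"  # provoke the 401 recorded in the cassette
    with pytest.raises(openai.AuthenticationError):
        openai_client.responses.parse(
            model="gpt-4.1-nano",
            input="hello",
            text_format=_Schema,
        )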

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between e66894f and 7c47e06.

📒 Files selected for processing (15)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
  • async_responses_parse_wrapper (889-1026)
  • responses_parse_wrapper (745-884)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
  • is_reasoning_supported (25-30)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
  • export (45-51)
  • InMemorySpanExporter (22-61)
  • get_finished_spans (40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
  • dont_throw (132-160)
  • _with_tracer_wrapper (116-123)
🪛 Ruff (0.13.3)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
  • ARG001 (unused function argument: instrument_legacy): lines 38, 76, 125, 160, 213, 245, 276, 318, 349, 382, 405, 441, 470, 495
  • B017 (do not assert blind exception: Exception): lines 323, 356

packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • ARG001 (unused function argument: instance): lines 745, 890
  • BLE001 (do not catch blind exception: Exception): lines 796, 837, 843, 872, 938, 979, 985, 1014
  • S110 (try-except-pass detected, consider logging the exception): lines 837-838, 843-844, 979-980, 985-986

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4)

33-36: LGTM!

The imports for the new parse wrappers are correctly added and follow the established pattern for other wrapper imports in this file.


314-318: LGTM!

The instrumentation correctly wraps Responses.parse with the new wrapper, using _try_wrap for backward compatibility with older OpenAI SDK versions.


334-338: LGTM!

The async variant instrumentation is correctly implemented, mirroring the sync wrapper pattern and ensuring compatibility across SDK versions.


365-369: LGTM!

Uninstrumentation correctly unwraps both sync and async parse methods, ensuring clean teardown when the instrumentation is disabled.
