feat(openai): Add support for Responses.parse() #3397
base: main
Conversation
- Add wrapper for Responses.parse() and its asynchronous variant
- Add test coverage
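For context, the call pattern the new wrapper instruments looks roughly like the sketch below. It is illustrative only: the model name and schema are made up, and it assumes the OpenAI SDK's text_format-based signature for Responses.parse().

```python
from openai import OpenAI
from pydantic import BaseModel


class CalendarEvent(BaseModel):
    # Hypothetical structured-output schema, just for illustration.
    name: str
    date: str
    participants: list[str]


client = OpenAI()  # reads OPENAI_API_KEY from the environment

# With the instrumentation active, this is the call that gets traced,
# including the parsed structured output on the resulting span.
response = client.responses.parse(
    model="gpt-4.1-nano",
    input="Alice and Bob are meeting for lunch on Friday.",
    text_format=CalendarEvent,
)
print(response.output_parsed)
```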
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.
Walkthrough
Adds instrumentation to wrap OpenAI v1 Responses.parse (sync and async), implements new parse wrappers that capture structured outputs, prompts, tools, reasoning, and usage into spans, updates uninstrumentation to unwrap those methods, and introduces many VCR cassettes plus a comprehensive test suite covering sync/async, tools, reasoning, moderation, fallback, and error paths.
Changes
Sequence Diagram(s)
sequenceDiagram
  autonumber
  participant App
  participant SDK as OpenAI SDK
  participant Wrapper as Parse Wrapper
  participant API as OpenAI API
  participant Tracer
  App->>SDK: responses.parse(...) or await responses.parse(...)
  SDK->>Wrapper: invoke wrapped parse
  Wrapper->>Tracer: start span "openai.responses.parse"
  Wrapper->>API: POST /v1/responses
  API-->>Wrapper: response (id, output, usage, reasoning, tools)
  Wrapper->>Wrapper: extract/serialize output_parsed or fallback output_text
  Wrapper->>Tracer: set attributes (prompts, completion, tools, usage, reasoning, response.id)
  Wrapper-->>SDK: return parsed result
  SDK-->>App: parsed result
  alt error
    Wrapper->>Tracer: record exception & error attributes
    Wrapper-->>SDK: re-raise error
  end
  note right of Wrapper: Async path mirrors sync with await points
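Mapped to code, that flow corresponds to a wrapper of roughly the following shape. This is a simplified sketch rather than the PR's actual implementation: the span and attribute names are illustrative, and structured-output extraction is sketched separately further down in the review.

```python
from opentelemetry.trace import SpanKind, Status, StatusCode


def parse_wrapper(tracer, wrapped, instance, args, kwargs):
    with tracer.start_as_current_span(
        "openai.responses.parse", kind=SpanKind.CLIENT
    ) as span:
        if "model" in kwargs:
            span.set_attribute("gen_ai.request.model", kwargs["model"])
        try:
            response = wrapped(*args, **kwargs)  # the actual SDK call
        except Exception as exc:
            # Error branch of the diagram: record the exception and re-raise.
            span.record_exception(exc)
            span.set_status(Status(StatusCode.ERROR, str(exc)))
            raise
        usage = getattr(response, "usage", None)
        if usage is not None:
            span.set_attribute("gen_ai.usage.input_tokens", usage.input_tokens)
            span.set_attribute("gen_ai.usage.output_tokens", usage.output_tokens)
        span.set_attribute("gen_ai.response.id", response.id)
        return response
```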
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
 Suggested reviewers
 Poem
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
 ✅ Passed checks (2 passed)
Caution
Changes requested ❌
Reviewed everything up to 7c47e06 in 1 minute and 31 seconds.
- Reviewed 2427 lines of code in 15 files
- Skipped 0 files when reviewing.
- Skipped posting 3 draft comments. View those below.
1. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:144
- Draft comment:
 The global variable 'responses' is used to cache or store traced data. In a concurrent/multithreaded environment, this shared mutable state might raise thread-safety issues or lead to memory leaks if the dictionary is never cleared. Consider using proper synchronization or an eviction strategy.
- Reason this comment was not posted:
 Comment was not on a location in the diff, so it can't be submitted as a review comment.
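One way to address this, assuming the module keeps a dict keyed by response ID, is a lock-protected, size-bounded cache. The names and cap below are hypothetical:

```python
import threading
from collections import OrderedDict

_responses_lock = threading.Lock()
_responses_cache: "OrderedDict[str, dict]" = OrderedDict()
_MAX_CACHED_RESPONSES = 1000  # arbitrary cap for illustration


def _remember_response(response_id: str, data: dict) -> None:
    # Guard the shared dict and evict the oldest entries past the cap so the
    # cache cannot grow without bound under concurrent use.
    with _responses_lock:
        _responses_cache[response_id] = data
        _responses_cache.move_to_end(response_id)
        while len(_responses_cache) > _MAX_CACHED_RESPONSES:
            _responses_cache.popitem(last=False)
```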
2. packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py:832
- Draft comment:
 When handling structured outputs via 'output_parsed', you try to dump via 'json.dumps(model_as_dict(parsed_output))'. Consider adding more robust error handling and possibly logging in this fallback, in case unexpected data types are encountered.
- Reason this comment was not posted:
Confidence changes required: 50% <= threshold 50%
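A hedged sketch of what "more robust error handling" could look like here; _serialize_parsed_output is a hypothetical helper, and model_as_dict is passed in rather than imported so the snippet stands alone:

```python
import json
import logging

logger = logging.getLogger(__name__)


def _serialize_parsed_output(parsed_output, model_as_dict):
    """Best-effort serialization of a structured output, logging instead of
    silently swallowing unexpected data types."""
    try:
        return json.dumps(model_as_dict(parsed_output))
    except (TypeError, ValueError) as exc:
        logger.debug("Failed to serialize parsed output: %s", exc)
        try:
            return str(parsed_output)
        except Exception:  # extremely defensive; keep span emission alive
            logger.debug("Could not stringify parsed output at all")
            return None
```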
3. packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py:54
- Draft comment:
 Test assertions check for a hard‐coded response model value (e.g. 'gpt-4.1-nano-2025-04-14'). Ensure this value is stable or abstract it to a configuration, so future changes to the underlying API versioning don't cause fragile tests.
- Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50%
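For example, the test could pin the expected model family in one place and assert on the prefix rather than the exact dated snapshot; the constant and attribute key below are hypothetical:

```python
EXPECTED_MODEL_PREFIX = "gpt-4.1-nano"


def assert_response_model(span_attributes: dict) -> None:
    # Tolerates dated snapshots such as "gpt-4.1-nano-2025-04-14".
    model = span_attributes["gen_ai.response.model"]
    assert model.startswith(EXPECTED_MODEL_PREFIX), f"Unexpected model: {model}"
```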
Workflow ID: wflow_CW82AsLtFkBsiQfa
You can customize Ellipsis by changing your verbosity settings, reacting with 👍 or 👎, replying to comments, or adding code review rules.
@dont_throw
@_with_tracer_wrapper
def responses_parse_wrapper(tracer: Tracer, wrapped, instance, args, kwargs):
The new 'responses_parse_wrapper' (and its async version) essentially duplicates much of the logic from other wrappers (e.g. responses_get_or_create_wrapper). Consider refactoring the common logic into a helper to reduce duplication and ease maintenance.
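A sketch of that refactor, under the assumption that the sync and async wrappers differ only in awaiting the call; the helper name and attribute keys are illustrative, not the PR's actual code:

```python
def _record_parse_attributes(span, response):
    # Shared attribute extraction used by both wrappers.
    parsed = getattr(response, "output_parsed", None)
    if parsed is not None:
        # Assumes the parsed object is a Pydantic v2 model.
        span.set_attribute("gen_ai.completion.0.content", parsed.model_dump_json())
    elif getattr(response, "output_text", None):
        span.set_attribute("gen_ai.completion.0.content", response.output_text)


def responses_parse_wrapper(tracer, wrapped, instance, args, kwargs):
    with tracer.start_as_current_span("openai.responses.parse") as span:
        response = wrapped(*args, **kwargs)
        _record_parse_attributes(span, response)
        return response


async def async_responses_parse_wrapper(tracer, wrapped, instance, args, kwargs):
    with tracer.start_as_current_span("openai.responses.parse") as span:
        response = await wrapped(*args, **kwargs)
        _record_parse_attributes(span, response)
        return response
```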
Actionable comments posted: 7
♻️ Duplicate comments (6)
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1)
88-113: Same organization/project ID concern as other cassettes. This cassette contains the same potentially sensitive openai-organization and openai-project identifiers flagged in test_responses_parse_response_id.yaml. Ensure consistent scrubbing across all cassettes.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1)
88-113: Same organization/project ID concern. This async cassette contains the same potentially sensitive identifiers. Ensure scrubbing is applied consistently across both sync and async test fixtures.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1)
88-115: Same organization/project ID concern. Consistent with other cassettes in this PR, ensure the organization and project identifiers are scrubbed.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1)
88-113: Same organization/project ID concern. Ensure consistent scrubbing of sensitive identifiers across all cassettes, including this token usage test fixture.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1)
95-120: Same organization/project ID concern. This tools-focused cassette should have the same scrubbing applied to organization and project identifiers.
packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1)
90-115: Same organization/project ID concern. The final cassette in this set should also have organization and project identifiers scrubbed consistently with the others.
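A typical way to enforce this with pytest-recording/vcrpy is a shared vcr_config fixture that redacts the headers before cassettes are written; the exact fixture location depends on how this suite is wired up:

```python
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # Replace sensitive headers in every recorded cassette.
    return {
        "filter_headers": [
            ("authorization", "REDACTED"),
            ("openai-organization", "REDACTED"),
            ("openai-project", "REDACTED"),
            ("cookie", "REDACTED"),
        ],
    }
```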
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1)
323-328: Assert the precise exception type. Catching a blanket Exception in tests hides regressions. Please assert the concrete OpenAI error (openai.AuthenticationError) so the test proves we surface the right failure.
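Roughly what that assertion would look like; the test name, model, and schema here are illustrative rather than copied from the suite:

```python
import openai
import pytest
from pydantic import BaseModel


class _Echo(BaseModel):
    text: str


def test_responses_parse_raises_authentication_error():
    client = openai.OpenAI(api_key="invalid")  # deliberately bad credentials
    with pytest.raises(openai.AuthenticationError):
        client.responses.parse(
            model="gpt-4.1-nano",
            input="hello",
            text_format=_Echo,
        )
```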
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (15)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4 hunks)
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (4 hunks)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml (1 hunk)
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (1 hunk)
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}
📄 CodeRabbit inference engine (CLAUDE.md)
Never commit secrets or PII in VCR cassettes; scrub sensitive data
Files:
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_tools.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_with_message_history.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_basic.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_token_usage.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_tools.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_moderation.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_response_id.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_instructions.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_output_fallback.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_message_history.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_async_responses_parse_basic.yaml
- packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_responses_parse/test_responses_parse_with_reasoning.yaml
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules
Files:
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py
- packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
- packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
🧬 Code graph analysis (3)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (2)
async_responses_parse_wrapper(889-1026)
responses_parse_wrapper(745-884)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (1)
is_reasoning_supported(25-30)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (3)
export(45-51)
InMemorySpanExporter(22-61)
get_finished_spans(40-43)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/utils.py (2)
dont_throw(132-160)
_with_tracer_wrapper(116-123)
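The tests referenced above assert on spans collected by an in-memory exporter. The general pattern, shown here with the stock OpenTelemetry SDK exporter rather than the traceloop-sdk variant, looks like this:

```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter

# Collect spans in memory so a test can assert on their names and attributes.
exporter = InMemorySpanExporter()
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(exporter))

tracer = provider.get_tracer(__name__)
with tracer.start_as_current_span("openai.responses.parse"):
    pass  # the instrumented client call would run here

spans = exporter.get_finished_spans()
assert spans[0].name == "openai.responses.parse"
```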
🪛 Ruff (0.13.3)
packages/opentelemetry-instrumentation-openai/tests/traces/test_responses_parse.py
38-38: Unused function argument: instrument_legacy
(ARG001)
76-76: Unused function argument: instrument_legacy
(ARG001)
125-125: Unused function argument: instrument_legacy
(ARG001)
160-160: Unused function argument: instrument_legacy
(ARG001)
213-213: Unused function argument: instrument_legacy
(ARG001)
245-245: Unused function argument: instrument_legacy
(ARG001)
276-276: Unused function argument: instrument_legacy
(ARG001)
318-318: Unused function argument: instrument_legacy
(ARG001)
323-323: Do not assert blind exception: Exception
(B017)
349-349: Unused function argument: instrument_legacy
(ARG001)
356-356: Do not assert blind exception: Exception
(B017)
382-382: Unused function argument: instrument_legacy
(ARG001)
405-405: Unused function argument: instrument_legacy
(ARG001)
441-441: Unused function argument: instrument_legacy
(ARG001)
470-470: Unused function argument: instrument_legacy
(ARG001)
495-495: Unused function argument: instrument_legacy
(ARG001)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
745-745: Unused function argument: instance
(ARG001)
796-796: Do not catch blind exception: Exception
(BLE001)
837-838: try-except-pass detected, consider logging the exception
(S110)
837-837: Do not catch blind exception: Exception
(BLE001)
843-844: try-except-pass detected, consider logging the exception
(S110)
843-843: Do not catch blind exception: Exception
(BLE001)
872-872: Do not catch blind exception: Exception
(BLE001)
890-890: Unused function argument: instance
(ARG001)
938-938: Do not catch blind exception: Exception
(BLE001)
979-980: try-except-pass detected, consider logging the exception
(S110)
979-979: Do not catch blind exception: Exception
(BLE001)
985-986: try-except-pass detected, consider logging the exception
(S110)
985-985: Do not catch blind exception: Exception
(BLE001)
1014-1014: Do not catch blind exception: Exception
(BLE001)
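For the S110 findings in particular, the usual fix is to log at debug level instead of passing silently; _set_optional_attribute is a hypothetical helper illustrating the pattern:

```python
import logging

logger = logging.getLogger(__name__)


def _set_optional_attribute(span, key, getter):
    # Narrow the exception type where possible; at minimum, leave a trace so
    # failed attribute extraction is visible when debugging.
    try:
        span.set_attribute(key, getter())
    except Exception as exc:  # still broad, but no longer silent
        logger.debug("Skipping span attribute %s: %s", key, exc)
```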
🔇 Additional comments (4)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/__init__.py (4)
33-36: LGTM! The imports for the new parse wrappers are correctly added and follow the established pattern for other wrapper imports in this file.
314-318: LGTM! The instrumentation correctly wraps Responses.parse with the new wrapper, using _try_wrap for backward compatibility with older OpenAI SDK versions.
334-338: LGTM! The async variant instrumentation is correctly implemented, mirroring the sync wrapper pattern and ensuring compatibility across SDK versions.
365-369: LGTM! Uninstrumentation correctly unwraps both sync and async parse methods, ensuring clean teardown when the instrumentation is disabled.
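For reference, the wrap/unwrap pairing described above generally follows this shape. The module path and the plain wrap_function_wrapper call are assumptions; the PR itself goes through a _try_wrap helper so SDKs that predate Responses.parse are skipped gracefully.

```python
from openai.resources.responses import AsyncResponses, Responses
from opentelemetry.instrumentation.openai.v1.responses_wrappers import (
    async_responses_parse_wrapper,
    responses_parse_wrapper,
)
from opentelemetry.instrumentation.utils import unwrap
from wrapt import wrap_function_wrapper


def instrument_parse(tracer):
    # Wrap both the sync and async parse methods.
    wrap_function_wrapper(
        "openai.resources.responses",  # assumed module path
        "Responses.parse",
        responses_parse_wrapper(tracer),
    )
    wrap_function_wrapper(
        "openai.resources.responses",
        "AsyncResponses.parse",
        async_responses_parse_wrapper(tracer),
    )


def uninstrument_parse():
    # Restore the original methods on teardown.
    unwrap(Responses, "parse")
    unwrap(AsyncResponses, "parse")
```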
feat(instrumentation): ... or fix(instrumentation): ...
Screenshots of traces in Jaeger (content removed as I've tried this in a real project)
Important
Adds support for Responses.parse() in OpenAI instrumentation with synchronous and asynchronous wrappers, and comprehensive test coverage.
- responses_parse_wrapper and async_responses_parse_wrapper in responses_wrappers.py handle structured outputs in Responses.parse().
- _instrument() and _uninstrument() in __init__.py wrap and unwrap Responses.parse and AsyncResponses.parse.
- test_responses_parse.py covers Responses.parse() for basic, message history, moderation, tools, reasoning, exceptions, output fallback, instructions, token usage, and response ID scenarios.
- VCR cassettes for the Responses.parse() scenarios.
This description was created by Ellipsis for 7c47e06. You can customize this summary. It will automatically update as commits are pushed.
Summary by CodeRabbit
New Features
Tests