
Conversation

@AlanPonnachan

@AlanPonnachan AlanPonnachan commented Sep 19, 2025

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Description

This PR introduces prompt caching telemetry for the AWS Bedrock Converse and Converse Stream APIs, bringing feature parity with the existing invoke_model instrumentation.

The Converse API reports caching information in the usage field of the response body, rather than through HTTP headers. This implementation adds the necessary logic to parse this information and record it as metrics and span attributes.

Changes include:

  1. New function prompt_caching_converse_handling in prompt_caching.py to extract cache_read_input_tokens and cache_creation_input_tokens from the response body.
  2. Integration into __init__.py: The new function is now called from _handle_converse and _handle_converse_stream to process caching data for both standard and streaming calls.
  3. New Test File: Added test_bedrock_converse_prompt_caching_metrics.py to validate that the gen_ai.prompt.caching metric is correctly emitted for the Converse API.

Fixes #3337
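The extraction described in change (1) can be sketched as follows. This is a minimal illustration, not the PR's actual code: the usage key names are taken from the description above, and the dict-based access is an assumption about the response shape.

```python
# Hypothetical sketch of extracting cache token counts from a Converse
# response body. Key names are assumptions for illustration; the real
# field names come from the Bedrock Converse usage payload.
def extract_cache_tokens(response: dict) -> tuple[int, int]:
    usage = response.get("usage", {})
    read_tokens = usage.get("cache_read_input_tokens", 0)
    write_tokens = usage.get("cache_creation_input_tokens", 0)
    return read_tokens, write_tokens


example = {"usage": {"cache_read_input_tokens": 128, "cache_creation_input_tokens": 0}}
print(extract_cache_tokens(example))  # (128, 0)
```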


Important

Adds prompt caching telemetry for AWS Bedrock Converse APIs, including new function for caching data extraction and corresponding tests.

  • Behavior:
    • Adds prompt_caching_converse_handling in prompt_caching.py to extract caching data from Converse API response body.
    • Integrates prompt_caching_converse_handling into _handle_converse and _handle_converse_stream in __init__.py.
  • Testing:
    • Adds test_bedrock_converse_prompt_caching_metrics.py to validate gen_ai.prompt.caching metric emission for Converse API.

This description was created by Ellipsis for 4fa3792. You can customize this summary. It will automatically update as commits are pushed.

Summary by CodeRabbit

  • New Features

    • Prompt-caching detection for Converse (streaming and non‑streaming), recording cache read/write state and token-level usage into telemetry and metrics.
  • Tests

    • Added tests validating cache write and cache read scenarios, span attributes, and aggregated token metrics.
    • Added a test cassette exercising a Converse error response path.

@CLAassistant

CLAassistant commented Sep 19, 2025

CLA assistant check
All committers have signed the CLA.

Contributor

@ellipsis-dev ellipsis-dev bot left a comment


Caution

Changes requested ❌

Reviewed everything up to 4fa3792 in 2 minutes and 2 seconds.
  • Reviewed 156 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 draft comments. View those below.
1. packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py:359
  • Draft comment:
    Good integration of prompt_caching_converse_handling in _handle_converse. In the streaming handler (lines ~400), note that if both read and write tokens are present, the span attribute may be overwritten. Ensure this is the intended behavior.
  • Reason this comment was not posted:
    Comment was on unchanged code.
2. packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py:56
  • Draft comment:
    The test correctly validates prompt caching metrics for Converse API. The cumulative workaround for metric values indicates the underlying counter is cumulative. Consider resetting metrics between tests to avoid cross-test interference if possible.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% The comment has two parts: 1) An observation about the cumulative nature of the metrics which is already documented in the code comments, and 2) A speculative suggestion about resetting metrics that isn't clearly necessary since the current approach works. The comment doesn't identify any actual problems or required changes. The suggestion about resetting metrics could be valid if there's evidence of cross-test interference, but we don't see any such evidence. The current workaround seems intentional and functional. Since the current approach is working and documented, and there's no evidence of actual problems, the suggestion is more speculative than necessary. Delete the comment as it's primarily informative/observational and makes a speculative suggestion without clear evidence of need for change.
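The "cumulative workaround" referenced in the draft comment can be illustrated generically: a cumulative counter reports a running total, so a test captures a baseline after the first call and asserts on the delta after the second. This is a sketch under that assumption, not the actual test code.

```python
# Generic sketch of the cumulative-counter workaround: the in-memory
# metric reader reports a running total across reads, so a test isolates
# the second call's contribution by subtracting a recorded baseline.
class CumulativeReader:
    def __init__(self):
        self.total = 0

    def record(self, value):
        self.total += value

    def read(self):
        return self.total


reader = CumulativeReader()
reader.record(100)            # first call: cache-write tokens
baseline = reader.read()
reader.record(40)             # second call: cache-read tokens
delta = reader.read() - baseline
print(delta)  # 40, the second call's contribution in isolation
```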

Workflow ID: wflow_bNUeXv3pUdPPxbhz


…nstrumentation/bedrock/prompt_caching.py

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@nirga changed the title from "feat(bedrock): Add prompt caching support for Converse API" to "fix(bedrock): Add prompt caching support for Converse API" on Sep 19, 2025
Member

@nirga nirga left a comment


hey @AlanPonnachan - looks like tests are failing, can you take a look?

@AlanPonnachan
Author

Hi @nirga

I’ve resolved the lint test failures. The remaining failing test, test_prompt_cache_converse, is expected since it requires a VCR cassette to be recorded.

As I don’t have access to an active AWS account, I’m unable to generate the test_prompt_cache_converse.yaml cassette file myself. Would you be able to check out this branch, run the test and push the generated cassette file to this PR?

Thanks for your help!

Member

@nirga nirga left a comment


Sure @AlanPonnachan, will do it - can you fix the small comment I wrote? I'll then run it locally and record a test. BTW - if you can rely on existing converse tests it might be easier

@AlanPonnachan
Author

Thanks for the great suggestion and for your willingness to help record the test!

I agree that relying on an existing test is a cleaner approach. Before I push the changes, I just want to confirm my plan sounds good to you.

Here is what I am planning to do:

  1. Modify the Existing Test: I will update the test_titan_converse function in tests/traces/test_titan.py.
  2. Enable Caching: I'll add the additionalModelRequestFields with cacheControl to the existing brt.converse API call.
  3. Test Both Scenarios: I will add a second brt.converse call within that same test to ensure we cover both the initial "cache-write" and the subsequent "cache-read".
  4. Add Assertions: I will add the metric assertions I wrote to validate that the prompt caching counters are working correctly.
  5. Clean Up: Finally, I will delete the new test file I originally created (test_bedrock_converse_prompt_caching_metrics.py).

This will result in the cassette for test_titan_converse.yaml needing to be re-recorded, as you mentioned.

Does this plan look good? If so, I'll go ahead and make the changes.

@AlanPonnachan
Author

@nirga any update on this?

@coderabbitai

coderabbitai bot commented Oct 29, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds response/metadata-based prompt caching handling for Bedrock Converse and invokes it from both synchronous and streaming converse flows; introduces a new handler that reads cache read/write token counts from responses/metadata, sets span attributes, and emits cache metrics when enabled.

Changes

Cohort / File(s) Summary
Instrumentation entrypoints
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
Imported prompt_caching_converse_handling (and prompt_caching_handling) and added calls to prompt_caching_converse_handling(...) in _handle_converse() and in the final-message path of _handle_converse_stream() after guardrail handling.
Prompt caching logic
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
Added prompt_caching_converse_handling(response, vendor, model, metric_params) which extracts cache read/write token counts from a response or metadata object, verifies current span presence and recording, sets CACHED span attribute and gen_ai.usage.* token attributes, and emits cache read/write metrics when configured.
Tests & cassettes
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py, packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py, packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
Added metric helper functions and test_anthropic_converse_with_caching to validate caching read/write behaviour (note: duplicate test definition present); minor formatting tweak in test_titan.py; added a Titan Converse cassette capturing a 403 auth-failure response for converse-with-caching scenario.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant BedrockInstrumentor
    participant BedrockAPI
    participant PromptCacheHandler
    participant Span
    participant Metrics

    Client->>BedrockInstrumentor: invoke_converse(...)
    BedrockInstrumentor->>BedrockAPI: call Converse (sync or stream)
    BedrockAPI-->>BedrockInstrumentor: response / streamed events (include usage_metadata or metadata)

    alt sync final response
      BedrockInstrumentor->>PromptCacheHandler: prompt_caching_converse_handling(response, vendor, model, metric_params)
    else streaming final message
      BedrockInstrumentor->>PromptCacheHandler: prompt_caching_converse_handling(metadata, vendor, model, metric_params)
    end

    rect #EEF8FF
      Note over PromptCacheHandler: extract cache_read/cache_creation\nset span attrs and gen_ai.usage.* token attrs
      PromptCacheHandler->>Span: set CACHED ("read"/"write") and token attributes
    end

    alt cache_read > 0
      PromptCacheHandler->>Metrics: emit cache read metric
    end
    alt cache_creation > 0
      PromptCacheHandler->>Metrics: emit cache write metric
    end

    BedrockInstrumentor-->>Client: return response / stream end
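The conditional emission at the end of the diagram can be sketched as follows: a cache metric is recorded only when the corresponding token count is positive. `Counter` here is a stand-in for an OpenTelemetry counter instrument, and all names are illustrative rather than the instrumentation's actual API.

```python
# Minimal sketch of the branch logic in the diagram above. `Counter`
# imitates a metrics counter; real code would use an OTel instrument.
class Counter:
    def __init__(self):
        self.points = []

    def add(self, value, attributes):
        self.points.append((value, attributes))


def emit_cache_metrics(counter, read_tokens, write_tokens, base_attrs):
    # Emit a "read" metric only when cache-read tokens were reported.
    if read_tokens > 0:
        counter.add(read_tokens, {**base_attrs, "type": "read"})
    # Emit a "write" metric only when cache-creation tokens were reported.
    if write_tokens > 0:
        counter.add(write_tokens, {**base_attrs, "type": "write"})


c = Counter()
emit_cache_metrics(c, read_tokens=128, write_tokens=0, base_attrs={"gen_ai.system": "bedrock"})
print(c.points)  # [(128, {'gen_ai.system': 'bedrock', 'type': 'read'})]
```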

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify response/metadata key paths (e.g., usage_metadata, input_token_details) match Bedrock Converse shapes.
  • Confirm span existence and is_recording checks are correct for streaming final-message handling.
  • Ensure handler is only invoked for final events (not intermediate stream events).
  • Review the duplicated test definition in test_anthropic.py and cassette expectations.

Suggested reviewers

  • galkleinman

Poem

🐰 I hopped through responses, counted tokens by light,

read and wrote some markers, and set spans just right.
Metrics took note as the cache did its dance,
traces wore small badges from a fortuitous chance.
Hooray — Converse caching tracked in a joyful prance!

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
  • Out of Scope Changes Check (⚠️ Warning): The changes to test_anthropic.py appear to be out of scope for a PR focused on "Add prompt caching support for Converse API" (Bedrock); while they introduce caching test utilities and a new caching test for Anthropic models, they do not directly align with the Bedrock Converse objectives stated in issue #3337. More critically, the summary indicates that test_anthropic_converse_with_caching is defined twice (a duplicate function definition), a clear code error that should not be present in a merged changeset regardless of scope. Resolution: remove the duplicate test_anthropic_converse_with_caching definition from test_anthropic.py, and clarify whether the Anthropic test changes are intentional and necessary for this PR; if they represent a separate feature effort, consider moving them to a separate pull request focused on Anthropic caching.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Linked Issues Check (❓ Inconclusive): The implementation addresses the core requirements of issue #3337 by creating the prompt_caching_converse_handling function that extracts cache metrics from the Converse response body, integrating it into both _handle_converse and _handle_converse_stream, recording cache-related metrics and span attributes, and emitting telemetry to the gen_ai usage attributes. However, the requirement to add and validate tests covering both cache-write and cache-read flows is not fully satisfied: the cassette for test_titan_converse_with_caching.yaml contains a 403 authentication error rather than a successful caching scenario, and reviewers have reported botocore ValidationExceptions indicating the test input contains an extraneous cacheControl key not permitted by the API. Resolution: run the tests with valid AWS credentials to record the VCR cassettes with successful caching scenarios, and correct the test input to remove or properly format the cacheControl parameter.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The pull request title "fix(bedrock): Add prompt caching support for Converse API" accurately and concisely describes the main objective of the changeset. The primary changes (adding prompt_caching_converse_handling, integrating it into _handle_converse and _handle_converse_stream, and introducing associated tests) all directly align with this stated goal. The title is specific and clear, making it immediately obvious which aspect of the Bedrock instrumentation is being enhanced.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 771b5d5 and ea2b0da.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/conftest.py (2)
  • instrument_legacy (90-102)
  • brt (36-42)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1113-1113: Unused function argument: instrument_legacy

(ARG001)

🔇 Additional comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2)

16-44: LGTM! Clear helper functions for metric validation.

The get_metric and assert_metric helper functions are well-structured for validating prompt caching metrics in tests. The logic correctly distinguishes between cache read and write operations and verifies the token counts.


1112-1171: Test implementation looks correct; VCR cassette recording pending.

The test logic is sound and all critical issues from previous reviews have been addressed:

  • ✓ Uses cachePoint blocks in message content (correct API usage)
  • ✓ Uses GenAIAttributes.GEN_AI_REQUEST_MODEL (correct attribute)
  • ✓ Proper indentation and formatting
  • ✓ Correct cache point type "default"

The test validates both cache-write and cache-read flows with appropriate assertions on usage tokens, span attributes, and metrics. According to the PR objectives, the VCR cassette (test_anthropic_converse_with_caching.yaml) still needs to be recorded by a reviewer with AWS credentials.

Note: The static analysis warning about unused instrument_legacy is a false positive—it's a pytest fixture that sets up instrumentation.
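The cachePoint usage the review confirms above can be sketched as a payload builder. The client call itself is omitted, and build_cached_messages is a hypothetical helper for illustration, not part of the PR.

```python
# Sketch of the message shape described in the review: a cachePoint
# content block with type "default" appended to the user message.
# This only builds the payload; sending it via a client is omitted.
def build_cached_messages(prompt_text: str) -> list[dict]:
    return [
        {
            "role": "user",
            "content": [
                {"text": prompt_text},
                {"cachePoint": {"type": "default"}},
            ],
        }
    ]


msgs = build_cached_messages("Summarize the document.")
print(msgs[0]["content"][1])  # {'cachePoint': {'type': 'default'}}
```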



Member

@nirga nirga left a comment


Sorry about the delay @AlanPonnachan! I was OOO and this fell between the cracks (team missed it for some reason). I'm back now! Left a small comment reg. the test - lmk if you need any assistance


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

59-85: Span attribute overwrite when both cache operations occur.

If a response contains both cache_read_input_tokens > 0 and cache_creation_input_tokens > 0, the second block (line 82) will overwrite the CACHED attribute set by the first block (line 68). This means only "write" will be recorded on the span, losing the "read" information.

Consider one of these approaches:

  • Accumulate both values in a single attribute (e.g., "read,write")
  • Use separate span attributes for read and write operations
  • Prioritize one operation type if simultaneous occurrence is expected to be rare

Apply this diff to record both operations:

     if read_cached_tokens > 0:
         if metric_params.prompt_caching:
             metric_params.prompt_caching.add(
                 read_cached_tokens,
                 attributes={
                     **base_attrs,
                     CacheSpanAttrs.TYPE: "read",
                 },
             )
-        span.set_attribute(CacheSpanAttrs.CACHED, "read")
+        span.set_attribute(f"{CacheSpanAttrs.CACHED}.read", True)
         span.set_attribute(
             "gen_ai.usage.cache_read_input_tokens", read_cached_tokens
         )
 
     if write_cached_tokens > 0:
         if metric_params.prompt_caching:
             metric_params.prompt_caching.add(
                 write_cached_tokens,
                 attributes={
                     **base_attrs,
                     CacheSpanAttrs.TYPE: "write",
                 },
             )
-        span.set_attribute(CacheSpanAttrs.CACHED, "write")
+        span.set_attribute(f"{CacheSpanAttrs.CACHED}.write", True)
         span.set_attribute(
             "gen_ai.usage.cache_creation_input_tokens", write_cached_tokens
         )
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

14-43: Consider adding is_recording check for consistency.

The new prompt_caching_converse_handling function includes an is_recording() check (line 52 in the new function), but this existing function doesn't. For consistency and to avoid unnecessary span operations when not recording, consider adding the same check here.

Apply this diff:

 def prompt_caching_handling(headers, vendor, model, metric_params):
     base_attrs = {
         "gen_ai.system": vendor,
         "gen_ai.response.model": model,
     }
     span = trace.get_current_span()
-    if not isinstance(span, trace.Span):
+    if not isinstance(span, trace.Span) or not span.is_recording():
         return
📜 Review details

Configuration used: CodeRabbit UI


📥 Commits

Reviewing files that changed from the base of the PR and between b01af56 and 277499e.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
🧬 Code graph analysis (2)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (3)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (2)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (2)
  • prompt_caching_converse_handling (46-85)
  • prompt_caching_handling (14-43)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/guardrail.py (1)
  • guardrail_converse (153-176)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py

30-30: Create your own exception

(TRY002)


30-30: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (6)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)

46-57: Good defensive programming with is_recording check.

The addition of span.is_recording() check and the conditional metric recording (if metric_params.prompt_caching:) are good defensive practices that prevent unnecessary work when telemetry is disabled.

packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3)

25-28: LGTM: Clean import of prompt caching handlers.

The import statements are correctly structured and bring in both the new prompt_caching_converse_handling and the existing prompt_caching_handling functions.


357-361: LGTM: Proper integration in synchronous Converse handler.

The call to prompt_caching_converse_handling is correctly placed after guardrail_converse and receives the appropriate parameters (response, provider, model, metric_params). This mirrors the pattern used in _handle_call (line 332) for the invoke_model API.


399-406: LGTM: Correct metadata extraction for streaming events.

The metadata is properly extracted from the event before being passed to prompt_caching_converse_handling. This ensures that cache metrics are recorded when the final metadata event arrives in the stream.

packages/opentelemetry-instrumentation-bedrock/tests/metrics/test_bedrock_converse_prompt_caching_metrics.py (2)

33-47: Verify metric assertion logic handles all scenarios.

The current assertion logic iterates through all data points and validates each one individually. However, it doesn't verify that the expected number of data points exist (e.g., exactly one for "read" and one for "write"). This could mask issues where duplicate or missing data points occur.

Consider adding explicit checks:

def assert_metric(reader, usage):
    metrics_data = reader.get_metrics_data()
    resource_metrics = metrics_data.resource_metrics
    assert len(resource_metrics) > 0

    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
    
    read_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "read"]
    write_points = [dp for dp in m.data.data_points if dp.attributes[CacheSpanAttrs.TYPE] == "write"]
    
    if usage["cache_read_input_tokens"] > 0:
        assert len(read_points) == 1
        assert read_points[0].value == usage["cache_read_input_tokens"]
    
    if usage["cache_creation_input_tokens"] > 0:
        assert len(write_points) == 1
        assert write_points[0].value == usage["cache_creation_input_tokens"]

50-69: Test structure is sound but VCR cassette needs recording.

The test correctly validates both cache creation (first call) and cache reading (second call) scenarios. The cumulative metric workaround at line 68 is necessary due to how the InMemoryMetricReader accumulates counter values across reads.

Note: As mentioned in the PR description, this test requires a VCR cassette to be recorded with actual AWS credentials.

Ensure the VCR cassette is recorded before merging, as the test will fail without it.

Comment on lines 24 to 30
def get_metric(resource_metrics, name):
    for rm in resource_metrics:
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == name:
                    return metric
    raise Exception(f"No metric found with name {name}")

⚠️ Potential issue | 🟡 Minor

Use AssertionError for test helper.

The generic Exception should be replaced with AssertionError since this is a test assertion helper and test failures should propagate as assertion failures.

As per static analysis hints.

Apply this diff:

 def get_metric(resource_metrics, name):
     for rm in resource_metrics:
         for sm in rm.scope_metrics:
             for metric in sm.metrics:
                 if metric.name == name:
                     return metric
-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

16-22: Consider using a more specific exception type.

While the function logic is correct, using a generic Exception is flagged by Ruff. For test utilities, this is acceptable, but you could optionally create a custom exception class or use AssertionError for better clarity in test failures.

Apply this diff if you'd like to improve the exception handling:

-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")
📜 Review details

Configuration used: CodeRabbit UI


📥 Commits

Reviewing files that changed from the base of the PR and between 277499e and 1965083.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM! Imports are appropriate for caching support.

The new imports are necessary for testing prompt caching functionality and are properly scoped.


25-46: LGTM! Metric assertion logic is sound.

The function correctly handles cumulative metrics by tracking both read and write data points separately and asserting the appropriate one based on the is_read parameter.


1076-1090: Verify VCR cassette recording.

This test requires a VCR cassette to be recorded with actual AWS credentials. According to the PR comments, you've requested that a maintainer with AWS access run and record this cassette. Ensure this is completed before merging.


1077-1130: Comprehensive caching test with good coverage.

The test thoroughly validates both cache write and read scenarios, including:

  • Usage token assertions from API responses
  • Span attribute validation for cache state
  • Metric emission verification

The test structure is clear and follows best practices.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1113-1114: Resolved: replaced hardcoded attr strings with semconv constants.

Good use of SpanAttributes for cache token attributes; previous review concern addressed.

Also applies to: 1121-1122

🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-14: Avoid depending on instrumentation internals in tests.

Importing PromptCaching and CacheSpanAttrs from instrumentation modules couples tests to internal paths. Prefer stable semconv constants or add local fallbacks for attribute keys to reduce breakage risk on refactors.


16-22: Use pytest.fail and drop bare Exception; simplify lookup.

Replace manual loops + bare Exception with next(...) and pytest.fail for clearer failures and to satisfy TRY002/TRY003.

 def get_metric(resource_metrics, name):
-    for rm in resource_metrics:
-        for sm in rm.scope_metrics:
-            for metric in sm.metrics:
-                if metric.name == name:
-                    return metric
-    raise Exception(f"No metric found with name {name}")
+    try:
+        return next(
+            m
+            for rm in resource_metrics
+            for sm in rm.scope_metrics
+            for m in sm.metrics
+            if m.name == name
+        )
+    except StopIteration:
+        pytest.fail(f"Metric '{name}' not found in collected metrics")

25-46: Make metric assertion robust to cumulative/export cycles.

Sum data points by type to tolerate multiple collections and exporter behavior; add clearer empty-metrics assertion.

 def assert_metric(reader, usage, is_read=False):
-    metrics_data = reader.get_metrics_data()
-    resource_metrics = metrics_data.resource_metrics
-    assert len(resource_metrics) > 0
-
-    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
-    # This check is now more specific to handle cumulative metrics
-    found_read = False
-    found_write = False
-    for data_point in m.data.data_points:
-        if data_point.attributes[CacheSpanAttrs.TYPE] == "read":
-            found_read = True
-            assert data_point.value == usage["cache_read_input_tokens"]
-        elif data_point.attributes[CacheSpanAttrs.TYPE] == "write":
-            found_write = True
-            assert data_point.value == usage["cache_creation_input_tokens"]
-    
-    if is_read:
-        assert found_read
-    else:
-        assert found_write
+    metrics_data = reader.get_metrics_data()
+    resource_metrics = metrics_data.resource_metrics
+    assert resource_metrics, "No resource metrics collected"
+
+    m = get_metric(resource_metrics, PromptCaching.LLM_BEDROCK_PROMPT_CACHING)
+    read = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "read"
+    )
+    write = sum(
+        dp.value for dp in m.data.data_points
+        if dp.attributes.get(CacheSpanAttrs.TYPE) == "write"
+    )
+    if is_read:
+        assert read == usage["cache_read_input_tokens"]
+    else:
+        assert write == usage["cache_creation_input_tokens"]

1106-1123: Don’t rely on span ordering; select by attribute.

Use CacheSpanAttrs.CACHED to locate write/read spans to avoid flakiness.

-    # Assertions for the first span (cache write)
-    span_write = spans[0]
-    assert span_write.name == "bedrock.converse"
-    attributes_write = span_write.attributes
+    # Assertions for the first span (cache write)
+    span_write = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "write")
+    assert span_write.name == "bedrock.converse"
+    attributes_write = span_write.attributes
@@
-    # Assertions for the second span (cache read)
-    span_read = spans[1]
-    assert span_read.name == "bedrock.converse"
-    attributes_read = span_read.attributes
+    # Assertions for the second span (cache read)
+    span_read = next(s for s in spans if s.attributes.get(CacheSpanAttrs.CACHED) == "read")
+    assert span_read.name == "bedrock.converse"
+    attributes_read = span_read.attributes

📥 Commits

Reviewing files that changed from the base of the PR and between 1965083 and 750a59e.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/metrics/conftest.py (1)
  • test_context (35-47)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-261)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (1)

1124-1130: Metrics assertions look good after helper hardening.

Cumulative usage check for read/write is aligned with counter semantics; ok once assert_metric is updated as suggested.

Optionally, assert there are exactly two data points (one read, one write) for the metric in this test context once the fixture is moved for test isolation.

@AlanPonnachan
Author

Hi @nirga , I've updated the PR with the changes we discussed. I moved the logic into a new test in test_titan.py and deleted the old file.

It should be ready for the cassette to be recorded now. Thanks again for your help!

@nirga
Member

nirga commented Oct 30, 2025

Thanks @AlanPonnachan! Can you record the test? poetry run pytest --record-mode=once

@AlanPonnachan
Author

Hi @nirga, thank you again for all your guidance.

I tried to run the recording command locally as you requested. As expected, since I don't have AWS credentials, the test run fails with an UnrecognizedClientException (invalid security token). This confirms that the test is now correctly set up and is just waiting for a real recording to be generated.

I believe the PR is now ready from a code perspective. Would you be able to run the recording on your end when you have a moment?

Thank you so much for your help


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between 750a59e and 419ae0f.

📒 Files selected for processing (3)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (3 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml (1 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py
🧰 Additional context used
📓 Path-based instructions (2)
**/cassettes/**/*.{yaml,yml,json}

📄 CodeRabbit inference engine (CLAUDE.md)

Never commit secrets or PII in VCR cassettes; scrub sensitive data

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧠 Learnings (1)
📚 Learning: 2025-08-17T15:06:48.109Z
Learnt from: CR
PR: traceloop/openllmetry#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-17T15:06:48.109Z
Learning: Applies to tests/**/*.py : Tests that make API calls must utilize VCR cassettes

Applied to files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1077-1077: Unused function argument: instrument_legacy

(ARG001)

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (4)

12-13: LGTM!

The imports for PromptCaching and CacheSpanAttrs are properly added to support the caching test scenario.


16-22: Helper function is clear and appropriate for test utilities.

The get_metric helper correctly searches through resource metrics. While static analysis suggests using a custom exception class, a generic Exception is acceptable for test utilities where the primary goal is clear failure messages during test debugging.


25-45: LGTM!

The assert_metric helper correctly validates prompt caching metrics for both read and write scenarios. The logic properly handles cumulative metrics by iterating through all data points and checking that the appropriate cache type was found.
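As a standalone illustration of that read/write check, here is a minimal sketch using stand-in data points; the attribute key string is a placeholder for CacheSpanAttrs.TYPE, and summing per type is one way to stay tolerant of multiple export cycles:

```python
from types import SimpleNamespace

# Placeholder key standing in for CacheSpanAttrs.TYPE.
CACHE_TYPE_ATTR = "gen_ai.cache.type"

def sum_cache_tokens(data_points, cache_type):
    # Sum values across all data points of the given cache type so the
    # check tolerates several collection cycles emitting extra points.
    return sum(
        dp.value
        for dp in data_points
        if dp.attributes.get(CACHE_TYPE_ATTR) == cache_type
    )

points = [
    SimpleNamespace(attributes={CACHE_TYPE_ATTR: "write"}, value=120),
    SimpleNamespace(attributes={CACHE_TYPE_ATTR: "read"}, value=120),
    SimpleNamespace(attributes={CACHE_TYPE_ATTR: "read"}, value=120),
]
```

A test would then compare `sum_cache_tokens(points, "read")` or `sum_cache_tokens(points, "write")` against the usage reported by the API response.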


1076-1129: Test logic is well-structured and correctly validates caching behavior.

The test properly exercises both cache-write and cache-read paths:

  1. First call writes to cache (cache_creation_input_tokens > 0)
  2. Second call reads from cache (cache_read_input_tokens > 0)
  3. Span attributes and metrics are validated for both scenarios

Good use of semantic convention constants (SpanAttributes.*) instead of hardcoded strings, addressing previous review feedback.

Note: The instrument_legacy fixture is flagged as unused by static analysis, but it's required for side effects to set up the Bedrock instrumentation before the test runs.

The test will work correctly once the VCR cassette is properly recorded with valid AWS credentials (see cassette file review comment).

Comment on lines +1 to +47
interactions:
- request:
    body: '{"messages": [{"role": "user", "content": [{"text": "Hello, this is a test
      prompt for caching."}]}], "inferenceConfig": {"maxTokens": 50}, "additionalModelRequestFields":
      {"cacheControl": {"type": "ephemeral"}}}'
    headers:
      Content-Length:
      - '211'
      Content-Type:
      - !!binary |
        YXBwbGljYXRpb24vanNvbg==
      User-Agent:
      - !!binary |
        Qm90bzMvMS4zNC4xNDUgbWQvQm90b2NvcmUjMS4zNC4xNDUgdWEvMi4wIG9zL3dpbmRvd3MjMTAg
        bWQvYXJjaCNhbWQ2NCBsYW5nL3B5dGhvbiMzLjExLjggbWQvcHlpbXBsI0NQeXRob24gY2ZnL3Jl
        dHJ5LW1vZGUjbGVnYWN5IEJvdG9jb3JlLzEuMzQuMTQ1
      X-Amz-Date:
      - !!binary |
        MjAyNTEwMzBUMTY1MTI4Wg==
      amz-sdk-invocation-id:
      - !!binary |
        ZTFmM2VjZTMtM2VlMC00ZGZiLWE2MWEtYjBiYmU4M2U2NDFm
      amz-sdk-request:
      - !!binary |
        YXR0ZW1wdD0x
    method: POST
    uri: https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-text-express-v1/converse
  response:
    body:
      string: '{"message":"The security token included in the request is invalid."}'
    headers:
      Connection:
      - keep-alive
      Content-Length:
      - '68'
      Content-Type:
      - application/json
      Date:
      - Thu, 30 Oct 2025 16:51:30 GMT
      x-amzn-ErrorType:
      - UnrecognizedClientException:http://internal.amazon.com/coral/com.amazon.coral.service/
      x-amzn-RequestId:
      - f0fdf86b-d85a-4f16-8713-1b9b5fccbe25
    status:
      code: 403
      message: Forbidden
version: 1

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Cassette contains authentication failure instead of valid caching response.

The cassette records a 403 Forbidden response with "The security token included in the request is invalid" rather than a successful API response with caching metadata. This means the test test_titan_converse_with_caching will fail because:

  • Line 1086-1088: Expects usage_write["cache_creation_input_tokens"] > 0, but the response has no usage field
  • Line 1091-1099: Expects a second successful call with cache read data

Per the PR objectives, you need AWS credentials to record a valid cassette. Ask the maintainer to run the recording command (shown in the script output below) and commit the generated cassette.


🏁 Script executed:

poetry run pytest packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py::test_titan_converse_with_caching --record-mode=once

Length of output: 44



Once I locate the test file, I'll examine the specific test to verify the assertions mentioned in the review comment.

---

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect the test_titan.py file and locate the test_titan_converse_with_caching test
fd "test_titan\.py" --type f
```

Length of output: 77



🏁 Script executed:

#!/bin/bash
# Inspect the test function test_titan_converse_with_caching
ast-grep --pattern 'def test_titan_converse_with_caching($_) {
  $$$
}'

Length of output: 0



🏁 Script executed:

#!/bin/bash
# Search for the test function and display context
rg -n "def test_titan_converse_with_caching" --type py -A 50

Length of output: 6541


Cassette is invalid and will cause test failure; AWS credentials required to regenerate.

The cassette indeed contains a 403 Forbidden response instead of a valid caching response. The test will fail at line 1086 when attempting to access response_write["usage"] from the error response, which lacks this field. The cassette must be regenerated with valid AWS credentials to record a successful response with cache metadata.

🤖 Prompt for AI Agents
packages/opentelemetry-instrumentation-bedrock/tests/traces/cassettes/test_titan/test_titan_converse_with_caching.yaml
lines 1-47: the cassette records a 403 UnrecognizedClientException (invalid AWS
credentials) instead of a successful converse response, causing tests to crash
when they expect response_write["usage"]; regenerate or replace this cassette by
re-running the test recording with valid AWS credentials so the cassette
contains a successful response that includes cache/usage metadata, and then
commit the new cassette (or remove the failing cassette and update the test to
use a properly recorded fixture).

@nirga
Member

nirga commented Oct 30, 2025

Thanks @AlanPonnachan, I tried running it locally and the test failed:

FAILED tests/traces/test_titan.py::test_titan_converse_with_caching - botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: The model returned the following errors: Malformed input request: extraneous key [cacheControl] is not permitted, please reformat your input and try again.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2)

16-22: Consider using a custom exception or a standard assertion.

For test helpers, raising AssertionError or a custom exception would be more idiomatic than a generic Exception.

Apply this diff:

 def get_metric(resource_metrics, name):
     for rm in resource_metrics:
         for sm in rm.scope_metrics:
             for metric in sm.metrics:
                 if metric.name == name:
                     return metric
-    raise Exception(f"No metric found with name {name}")
+    raise AssertionError(f"No metric found with name {name}")

1142-1142: Use constants for cache token attribute names.

Lines 1142 and 1149 use string literals for cache token attributes. The SpanAttributes class defines constants GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS and GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS (see relevant code snippets). Use these constants for consistency and maintainability.

Apply this diff:

-    assert attributes_write["gen_ai.usage.cache_creation_input_tokens"] == usage_write["cache_creation_input_tokens"]
+    assert attributes_write[SpanAttributes.GEN_AI_USAGE_CACHE_CREATION_INPUT_TOKENS] == usage_write["cache_creation_input_tokens"]
-    assert attributes_read["gen_ai.usage.cache_read_input_tokens"] == usage_read["cache_read_input_tokens"]
+    assert attributes_read[SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS] == usage_read["cache_read_input_tokens"]

Also applies to: 1149-1149


📥 Commits

Reviewing files that changed from the base of the PR and between 419ae0f and 440295c.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2 hunks)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_titan.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/conftest.py (1)
  • instrument_legacy (90-102)
packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py (1)
  • SpanAttributes (64-245)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1113-1113: Unused function argument: instrument_legacy

(ARG001)


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (1)

1131-1136: Remove invalid additionalModelRequestFields and use consistent cache point structure.

Line 1135 uses additionalModelRequestFields={"cacheControl": {"type": "ephemeral"}}, which causes a ValidationException as confirmed in the past review comments and PR discussion. The Bedrock Converse API does not accept cacheControl as a parameter; cache points must be specified within the message content structure.

For the second call to read from the cache created by the first call, it must use the exact same message structure up to the cache point. The current implementation will fail at runtime.

Apply this diff to fix the second call:

     response_read = brt.converse(
         modelId="anthropic.claude-3-haiku-20240307-v1:0",
-        messages=[{"role": "user", "content": [{"text": "Hello, this is a test prompt for caching."}]}],
+        messages=[
+            {
+                "role": "user",
+                "content": [
+                    {"text": "Hello, this is a test prompt for caching."},
+                    {"cachePoint": {"type": "default"}},
+                ],
+            }
+        ],
         inferenceConfig={"maxTokens": 50},
-        additionalModelRequestFields={"cacheControl": {"type": "ephemeral"}},
     )
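To make the shared-structure requirement concrete, here is a hedged sketch that only builds identical request payloads for the cache-write and cache-read calls; the model ID and prompt are illustrative, and no AWS call is made:

```python
def build_converse_kwargs(prompt, model_id="anthropic.claude-3-haiku-20240307-v1:0"):
    # The cachePoint marker lives inside the message content; the Converse
    # API rejects cacheControl passed via additionalModelRequestFields.
    return {
        "modelId": model_id,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": prompt},
                    {"cachePoint": {"type": "default"}},
                ],
            }
        ],
        "inferenceConfig": {"maxTokens": 50},
    }

# Both calls must share the exact same structure up to the cache point,
# so the second call can read the cache the first one created.
write_kwargs = build_converse_kwargs("Hello, this is a test prompt for caching.")
read_kwargs = build_converse_kwargs("Hello, this is a test prompt for caching.")
```

In the test, each kwargs dict would be splatted into `brt.converse(**kwargs)`.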
🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (1)

16-22: Consider a custom exception or using a standard exception type.

The function raises a generic Exception with a formatted message. For better error handling and consistency with Python best practices, consider using a more specific exception type such as ValueError or KeyError, or define a custom exception class.

Apply this diff to use a more specific exception:

-    raise Exception(f"No metric found with name {name}")
+    raise ValueError(f"No metric found with name {name}")

📥 Commits

Reviewing files that changed from the base of the PR and between 440295c and a5563d5.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/conftest.py (2)
  • instrument_legacy (90-102)
  • brt (36-42)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Flake8 (7.3.0)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

[error] 1115-1115: continuation line missing indentation or outdented

(E122)


[error] 1116-1116: continuation line missing indentation or outdented

(E122)


[error] 1125-1125: continuation line missing indentation or outdented

(E122)


[error] 1126-1126: continuation line missing indentation or outdented

(E122)

🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1113-1113: Unused function argument: instrument_legacy

(ARG001)

🔇 Additional comments (4)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (4)

13-13: LGTM!

The import is correctly used in the assert_metric helper function.


25-44: LGTM!

The helper correctly validates cache metrics by checking data point attributes and values for both read and write cache types.


1113-1113: Note: The instrument_legacy parameter is not unused.

Static analysis flags instrument_legacy as unused, but this is a false positive. The parameter is a pytest fixture that performs instrumentation setup as a side effect, following the standard pytest fixture pattern. The fixture's presence in the parameter list is necessary for the test to function correctly.


1141-1163: No action required—cache implementation and assertions are already correct.

The cache implementation has already been fixed correctly in the code:

  • Line 1121 uses the correct cachePoint method with the "Correct method" comment
  • Line 1135 properly uses additionalModelRequestFields={"cacheControl": {"type": "ephemeral"}}
  • The prompt_caching_converse_handling() function in prompt_caching.py correctly extracts cache tokens and sets span attributes
  • The assertions at lines 1141-1163 are properly aligned with the implementation, checking for the expected cache attributes and token values
  • The test is decorated with @pytest.mark.vcr and VCR cassettes exist and are properly configured

The test structure, assertions, and implementation are all consistent and correct. No modifications are needed.
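For context, a minimal sketch of what extracting Converse cache usage might look like. The camelCase field names follow the Bedrock Converse usage shape (cacheReadInputTokens / cacheWriteInputTokens), while the function name and the snake_case output keys are assumptions modeled on the PR description, not the actual implementation:

```python
def extract_converse_cache_usage(response_body):
    # The Converse API reports caching data in the response "usage" field,
    # unlike invoke_model, which reports it via HTTP headers.
    usage = response_body.get("usage", {})
    return {
        "cache_read_input_tokens": usage.get("cacheReadInputTokens", 0),
        "cache_creation_input_tokens": usage.get("cacheWriteInputTokens", 0),
    }

# Example body shaped like a cache-write response.
body = {
    "usage": {
        "inputTokens": 10,
        "outputTokens": 5,
        "cacheReadInputTokens": 0,
        "cacheWriteInputTokens": 120,
    }
}
tokens = extract_converse_cache_usage(body)
```

The instrumentation would then record these two values as the metric data points and span attributes that the test asserts on.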


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1


📥 Commits

Reviewing files that changed from the base of the PR and between a5563d5 and 771b5d5.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Store API keys only in environment variables/secure vaults; never hardcode secrets in code
Use Flake8 for code linting and adhere to its rules

Files:

  • packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py
🧬 Code graph analysis (1)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (5)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/__init__.py (1)
  • PromptCaching (470-472)
packages/opentelemetry-instrumentation-milvus/tests/conftest.py (1)
  • reader (37-41)
packages/opentelemetry-instrumentation-bedrock/opentelemetry/instrumentation/bedrock/prompt_caching.py (1)
  • CacheSpanAttrs (9-11)
packages/opentelemetry-instrumentation-bedrock/tests/conftest.py (2)
  • instrument_legacy (90-102)
  • brt (36-42)
packages/traceloop-sdk/traceloop/sdk/utils/in_memory_span_exporter.py (1)
  • get_finished_spans (40-43)
🪛 Ruff (0.14.2)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py

22-22: Create your own exception

(TRY002)


22-22: Avoid specifying long messages outside the exception class

(TRY003)


1113-1113: Unused function argument: instrument_legacy

(ARG001)

🔇 Additional comments (5)
packages/opentelemetry-instrumentation-bedrock/tests/traces/test_anthropic.py (5)

13-14: LGTM! Import is properly added.

The PromptCaching import is correctly used in the assert_metric helper to validate cache metrics.


16-22: LGTM! Helper function is correct.

The function correctly iterates through resource metrics to find a metric by name. The use of a generic Exception is acceptable for test utility code, though a custom exception type could improve clarity.


25-44: LGTM! Metric assertion logic is sound.

The function correctly validates cache read/write data points by iterating through metrics, checking the cache type attribute, and asserting the expected value matches the usage data.


1114-1130: LGTM! Cache write flow is correctly implemented.

The first call properly uses {"cachePoint": {"type": "default"}} in the message content to mark a cache checkpoint, which aligns with AWS Bedrock Converse API documentation. The assertions correctly validate that cache creation occurred without any cache reads.


1142-1164: LGTM! Span and metric assertions are well-structured.

The test correctly validates:

  • Both spans use the expected model ID via GenAIAttributes.GEN_AI_REQUEST_MODEL (properly fixed from past review comments)
  • Cache attributes (CacheSpanAttrs.CACHED) distinguish write vs. read operations
  • Cache token counts in span attributes match the response usage data
  • Metrics are emitted for both cache read and write flows

The assertions will work correctly once the cache read call (lines 1132-1140) is fixed.

@AlanPonnachan
Author

Hi @nirga,

Thank you so much for running the test and providing that error!

I did some research and found that, in general, newer Amazon Titan models do support prompt caching. However, you are right that the specific model in the original test, amazon.titan-text-express-v1, does not support it, which perfectly explains the error.

To ensure the test is stable and reliable, I have implemented the fix we discussed:

  1. I reverted all changes to test_titan.py to ensure it remains focused on Guardrails.
  2. I moved the caching test logic into a new function, test_anthropic_converse_with_caching, within the correct test_anthropic.py file.
  3. The new test uses a supported Anthropic Claude 3 model and the correct API structure.

Would you be able to run the recording on your end when you have a moment?


Development

Successfully merging this pull request may close these issues.

🚀 Feature: Add prompt caching for Bedrock Converse

3 participants