Update model-latency-benchmarking: latest SDK/models and configurable prompt caching

## Summary

Proposal to update the [`model-latency-benchmarking`](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/model-latency-benchmarking) tool so it uses current SDK/APIs, benchmarks the latest Bedrock models, and can benchmark prompt caching.

## Motivation

- The example dataset and model list point at older models (Llama 3.1 70B, Claude 3.5 Haiku).
- Bedrock now offers newer models (Claude 4.5 family, Amazon Nova) that users want to benchmark.
- Prompt caching is generally available and reduces latency by up to 85%, but the tool cannot benchmark it today.

## Proposed changes

1. SDK/API refresh
   - Use current `boto3`/`botocore`.
   - Keep the Converse/ConverseStream APIs; tidy client setup and error handling.
   - Fix an existing inconsistency where `get_body` hardcodes `temperature=0` and `topP=1` and ignores the configured inference parameters.

2. Latest model support
   - Update the dummy JSONL and README to current cross-region inference profile IDs (for example Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.7 Sonnet, and Amazon Nova Pro/Lite/Micro).

3. Configurable prompt caching (opt-in)
   - Add optional per-row JSONL fields, defaulting off: `prompt_caching` (bool), `cache_ttl` ("5m"/"1h"), and a field for the cacheable static prefix (system prompt / context).
   - Insert Converse `cachePoint` checkpoints when caching is enabled.
   - Capture `cacheReadInputTokens` and `cacheWriteInputTokens` from the Converse `usage` block, derive a `cache_hit_rate`, and report TTFT cached vs uncached.
   - Provide a longer sample prompt that meets the per-model cache minimums (1,024-4,096 tokens).

## Backward compatibility

- Existing JSONL files run unchanged. All new fields are optional and default to off.

## Out of scope

- No change to the core benchmarking methodology beyond adding cache-related columns and plots.

## Contribution

I would like to submit a pull request for this. Happy to adjust the scope based on maintainer feedback.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Update model-latency-benchmarking: latest SDK/models and configurable prompt caching #713

Summary

Motivation

Proposed changes

Backward compatibility

Out of scope

Contribution

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Update model-latency-benchmarking: latest SDK/models and configurable prompt caching #713

Description

Summary

Motivation

Proposed changes

Backward compatibility

Out of scope

Contribution

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions