Skip to content

Update model-latency-benchmarking: latest SDK/models and configurable prompt caching #713

Description

@evgenisokolov

Summary

Proposal to update the model-latency-benchmarking tool so it uses current SDK/APIs, benchmarks the latest Bedrock models, and can benchmark prompt caching.

Motivation

  • The example dataset and model list point at older models (Llama 3.1 70B, Claude 3.5 Haiku).
  • Bedrock now offers newer models (Claude 4.5 family, Amazon Nova) that users want to benchmark.
  • Prompt caching is generally available and reduces latency by up to 85%, but the tool cannot benchmark it today.

Proposed changes

  1. SDK/API refresh

    • Use current boto3/botocore.
    • Keep the Converse/ConverseStream APIs; tidy client setup and error handling.
    • Fix an existing inconsistency where get_body hardcodes temperature=0 and topP=1 and ignores the configured inference parameters.
  2. Latest model support

    • Update the dummy JSONL and README to current cross-region inference profile IDs (for example Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.7 Sonnet, and Amazon Nova Pro/Lite/Micro).
  3. Configurable prompt caching (opt-in)

    • Add optional per-row JSONL fields, defaulting off: prompt_caching (bool), cache_ttl ("5m"/"1h"), and a field for the cacheable static prefix (system prompt / context).
    • Insert Converse cachePoint checkpoints when caching is enabled.
    • Capture cacheReadInputTokens and cacheWriteInputTokens from the Converse usage block, derive a cache_hit_rate, and report TTFT cached vs uncached.
    • Provide a longer sample prompt that meets the per-model cache minimums (1,024-4,096 tokens).

Backward compatibility

  • Existing JSONL files run unchanged. All new fields are optional and default to off.

Out of scope

  • No change to the core benchmarking methodology beyond adding cache-related columns and plots.

Contribution

I would like to submit a pull request for this. Happy to adjust the scope based on maintainer feedback.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions