Enrich retry log messages with task/sample/model context#3240

Open
sjawhar wants to merge 4 commits into UKGovernmentBEIS:main from METR:retry-log

Conversation


sjawhar (Contributor) commented Feb 14, 2026

Summary

When inspect retries failed model requests, log messages like Retrying request to /responses in 0.396765 seconds lack context about which task, sample, and provider triggered the retry. In a concurrent runner processing many samples across many tasks, this makes debugging difficult.

This PR enriches all retry log messages with a compact context prefix and error summary:

[EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] Retrying request to /chat/completions in 1.000000 seconds
[Abc12xY mmlu/42/1 openai/gpt-4o] -> openai/gpt-4o retry 2 (retrying in 6 seconds) [RateLimitError 429 rate_limit_exceeded]

Format: [{sample_uuid} {task}/{sample_id}/{epoch} {model}]

What changed

New helpers in src/inspect_ai/_util/retry.py:

  • sample_context_prefix() — builds the compact prefix from sample_active() ContextVar
  • retry_error_summary() — extracts exception type/status/code without leaking message content
  • SampleContextFilter — a logging.Filter for SDK loggers; adds both the inline prefix and structured fields
  • install_sample_context_logging() — attaches filter at eval startup
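A sketch of what such a filter can look like. Hypothetical simplification: the context dict is passed in explicitly rather than read from the sample_active() ContextVar as the real SampleContextFilter does, and the field names follow the structured JSON example later in this description.

```python
import logging


class SampleContextFilter(logging.Filter):
    """Prepend a sample-context prefix and attach structured fields to records."""

    def __init__(self, context: dict) -> None:
        super().__init__()
        self.context = context  # assumption: in inspect_ai this comes from sample_active()

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = self.context
        prefix = (
            f"[{ctx['sample_uuid']} "
            f"{ctx['sample_task']}/{ctx['sample_id']}/{ctx['sample_epoch']} "
            f"{ctx['sample_model']}] "
        )
        # Resolve %-style args before mutating msg, then clear them so the
        # record cannot be re-formatted against stale args.
        record.msg = prefix + record.getMessage()
        record.args = None
        # Attach structured fields so JSON formatters emit them as top-level keys.
        for key, value in ctx.items():
            setattr(record, key, value)
        return True
```

Because the filter mutates the record in place, it works on messages emitted by third-party loggers (like the OpenAI SDK's) without touching their formatters.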

Enriched existing loggers:

  • log_model_retry() in model/_model.py — prefix + error summary
  • log_httpx_retry_attempt() in _util/httpx.py — prefix
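The error summary can be sketched roughly as follows. Assumptions: the real retry_error_summary() may inspect different attributes, but the example output [RateLimitError 429 rate_limit_exceeded] suggests exception type plus status_code and code, deliberately excluding the exception message so no content can leak into logs.

```python
def retry_error_summary(ex: BaseException) -> str:
    """Summarize an exception as '[Type status code]' without leaking message content."""
    parts = [type(ex).__name__]
    for attr in ("status_code", "code"):
        value = getattr(ex, attr, None)
        if value is not None:
            # str() guards against non-string codes (e.g. int), which would
            # otherwise crash ' '.join() -- the regression found in review.
            parts.append(str(value))
    return "[" + " ".join(parts) + "]"
```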

Wired into startup:

  • init_eval_context() in _eval/context.py calls install_sample_context_logging()
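A sketch of the wiring, assuming the filter is attached to the SDK logger that actually emits the retry messages (openai._base_client, per the review fix below) rather than its parent, since parent logger filters don't run for child records during propagation.

```python
import logging


def install_sample_context_logging(context_filter: logging.Filter) -> None:
    """Attach the context filter to the SDK logger that emits retry messages.

    Hypothetical signature: the real helper takes no filter argument and
    constructs its own SampleContextFilter internally.
    """
    # Must target openai._base_client directly: a filter on the parent
    # 'openai' logger would not run for records created by the child.
    logging.getLogger("openai._base_client").addFilter(context_filter)
```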

Evidence of working

E2E: Real inspect eval against mock 429 server

Ran inspect eval examples/hello_world.py against a local mock server returning 429s:

[02/13/26 20:24:32] HTTP     POST http://localhost:8765/v1/chat/completions "HTTP/1.0 429 Too Many Requests"                                          hooks.py:123
                    INFO     [EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] Retrying request to /chat/completions in 1.000000 seconds   _base_client.py:1693

The prefix [EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] appears on the SDK's own retry message — confirming the SampleContextFilter on openai._base_client works.

Structured JSON logging

When using a JSON log formatter (e.g. python-json-logger), the structured fields appear as top-level keys:

{
  "message": "[Abc12xY mmlu/42/1 openai/gpt-4o] Retrying request to /responses in 0.396765 seconds",
  "name": "openai._base_client",
  "levelname": "INFO",
  "sample_uuid": "Abc12xY",
  "sample_task": "mmlu",
  "sample_id": 42,
  "sample_epoch": 1,
  "sample_model": "openai/gpt-4o"
}
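To illustrate how the structured fields surface, here is a tiny stdlib-only formatter standing in for python-json-logger (the class and its FIELDS tuple are hypothetical; real deployments would use python-json-logger as noted above):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit records as JSON, lifting sample context attributes to top-level keys."""

    FIELDS = ("sample_uuid", "sample_task", "sample_id", "sample_epoch", "sample_model")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "name": record.name,
            "levelname": record.levelname,
        }
        # The filter set these via setattr(), so they appear as plain attributes.
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)
```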

Unit tests: 24 passing

tests/util/test_retry_logging.py  24 passed
tests/test_retry.py               12 passed, 1 skipped
tests/test_retry_on_error.py       5 passed

Review-driven fixes

Code review caught three issues, all fixed with regression tests added before the fix:

| Issue | Fix |
|-------|-----|
| Filter on wrong logger — installed on openai but the SDK logs from openai._base_client; parent logger filters don't run for child records during propagation | Target openai._base_client directly |
| TypeError on non-string .code — getattr(ex, "code") can return an int, crashing ' '.join() | Cast with str(raw_code) |
| % in prefix breaks formatting — mutating record.msg with % chars would corrupt msg % args | Call record.getMessage() first, then set the resolved msg and clear args |
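The third fix can be seen in a small standalone demo (enrich_safely is a hypothetical name for the pattern, not a function in the PR): resolving msg % args before mutation means a % in the prefix can never be misread as a format directive.

```python
import logging


def enrich_safely(record: logging.LogRecord, prefix: str) -> None:
    """Prepend a prefix to a record without corrupting %-style formatting."""
    # Resolve the original msg % args BEFORE mutating; otherwise '%'
    # characters in the prefix would be interpreted as format directives
    # the next time the record is formatted.
    resolved = record.getMessage()
    record.msg = prefix + resolved
    record.args = None
```

Had the prefix been naively prepended to record.msg with the args left in place, a prefix like "[id%20with%20escapes] " would raise when msg % args is evaluated.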

Linear: ENG-594

sjawhar force-pushed the retry-log branch 2 times, most recently from 8e3cb0b to dac225e on February 14, 2026 at 22:03
sjawhar added a commit to METR/inspect_ai that referenced this pull request Feb 16, 2026
Merged branches:
- retry-log (PR UKGovernmentBEIS#3240): Enrich retry log messages with task/sample/model context
- fix/find-band-search (PR UKGovernmentBEIS#3237): Improve Ctrl+F search: wrap-around, match count, virtualization support
- feature/viewer-flat-view: Add flat view toggle to transcript viewer
revmischa added a commit to METR/inspect-action that referenced this pull request Feb 17, 2026
## Summary

- Updates inspect-ai git pin from cherry-picked release (`4bfe32e7`) to
proper octopus merge release (`f2e836ec`) based on PyPI `0.3.179`
- The previous release was built by cherry-picking commits, missing
several open PRs. This release is an octopus merge of all METR PR
branches.

## Included METR PRs (on top of 0.3.179)

| PR | Branch | Title |
|----|--------|-------|
| [#3240](UKGovernmentBEIS/inspect_ai#3240) | `retry-log` | Enrich retry log messages with task/sample/model context |
| [#3237](UKGovernmentBEIS/inspect_ai#3237) | `fix/find-band-search` | Improve Ctrl+F search: wrap-around, match count, virtualization support |
| — | `feature/viewer-flat-view` | Add flat view toggle to transcript viewer |

## Testing & Validation

- [ ] CI passes
- [ ] Smoke tests pass against dev environment

## Code Quality

- [x] Lock files updated (root + all lambda modules)
- [x] No code changes beyond `pyproject.toml` and lock files

---------

Co-authored-by: Mischa Spiegelmock <[email protected]>