Enrich retry log messages with task/sample/model context#3240

Open
sjawhar wants to merge 4 commits into UKGovernmentBEIS:main from METR:retry-log

Conversation


sjawhar (Contributor) commented Feb 14, 2026

Summary

When inspect retries failed model requests, log messages like Retrying request to /responses in 0.396765 seconds lack context about which task, sample, and provider triggered the retry. In a concurrent runner processing many samples across many tasks, this makes debugging difficult.

This PR enriches all retry log messages with a compact context prefix and error summary:

[EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] Retrying request to /chat/completions in 1.000000 seconds
[Abc12xY mmlu/42/1 openai/gpt-4o] -> openai/gpt-4o retry 2 (retrying in 6 seconds) [RateLimitError 429 rate_limit_exceeded]

Format: [{sample_uuid} {task}/{sample_id}/{epoch} {model}]

What changed

New helpers in src/inspect_ai/_util/retry.py:

  • sample_context_prefix() — builds the compact prefix from sample_active() ContextVar
  • retry_error_summary() — extracts exception type/status/code without leaking message content
  • SampleContextFilter — a logging.Filter for SDK loggers; adds both the inline prefix and structured fields
  • install_sample_context_logging() — attaches filter at eval startup
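A sketch of what such a filter can look like. Hypothetical simplification: the context dict is passed in explicitly rather than read from the sample_active() ContextVar as the real SampleContextFilter does, and the field names follow the structured JSON example later in this description.

```python
import logging


class SampleContextFilter(logging.Filter):
    """Prepend a sample-context prefix and attach structured fields to records."""

    def __init__(self, context: dict) -> None:
        super().__init__()
        self.context = context  # assumption: in inspect_ai this comes from sample_active()

    def filter(self, record: logging.LogRecord) -> bool:
        ctx = self.context
        prefix = (
            f"[{ctx['sample_uuid']} "
            f"{ctx['sample_task']}/{ctx['sample_id']}/{ctx['sample_epoch']} "
            f"{ctx['sample_model']}] "
        )
        # Resolve %-style args before mutating msg, then clear them so the
        # record cannot be re-formatted against stale args.
        record.msg = prefix + record.getMessage()
        record.args = None
        # Attach structured fields so JSON formatters emit them as top-level keys.
        for key, value in ctx.items():
            setattr(record, key, value)
        return True
```

Because the filter mutates the record in place, it works on messages emitted by third-party loggers (like the OpenAI SDK's) without touching their formatters.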

Enriched existing loggers:

  • log_model_retry() in model/_model.py — prefix + error summary
  • log_httpx_retry_attempt() in _util/httpx.py — prefix
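The error summary can be sketched roughly as follows. Assumptions: the real retry_error_summary() may inspect different attributes, but the example output [RateLimitError 429 rate_limit_exceeded] suggests exception type plus status_code and code, deliberately excluding the exception message so no content can leak into logs.

```python
def retry_error_summary(ex: BaseException) -> str:
    """Summarize an exception as '[Type status code]' without leaking message content."""
    parts = [type(ex).__name__]
    for attr in ("status_code", "code"):
        value = getattr(ex, attr, None)
        if value is not None:
            # str() guards against non-string codes (e.g. int), which would
            # otherwise crash ' '.join() -- the regression found in review.
            parts.append(str(value))
    return "[" + " ".join(parts) + "]"
```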

Wired into startup:

  • init_eval_context() in _eval/context.py calls install_sample_context_logging()
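A sketch of the wiring, assuming the filter is attached to the SDK logger that actually emits the retry messages (openai._base_client, per the review fix below) rather than its parent, since parent logger filters don't run for child records during propagation.

```python
import logging


def install_sample_context_logging(context_filter: logging.Filter) -> None:
    """Attach the context filter to the SDK logger that emits retry messages.

    Hypothetical signature: the real helper takes no filter argument and
    constructs its own SampleContextFilter internally.
    """
    # Must target openai._base_client directly: a filter on the parent
    # 'openai' logger would not run for records created by the child.
    logging.getLogger("openai._base_client").addFilter(context_filter)
```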

Evidence of working

E2E: Real inspect eval against mock 429 server

Ran inspect eval examples/hello_world.py against a local mock server returning 429s:

[02/13/26 20:24:32] HTTP     POST http://localhost:8765/v1/chat/completions "HTTP/1.0 429 Too Many Requests"                                          hooks.py:123
                    INFO     [EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] Retrying request to /chat/completions in 1.000000 seconds   _base_client.py:1693

The prefix [EeDA74nwfs3uirgyprfa4b hello_world/1/1 openai/gpt-4o-mini] appears on the SDK's own retry message — confirming the SampleContextFilter on openai._base_client works.

Structured JSON logging

When using a JSON log formatter (e.g. python-json-logger), the structured fields appear as top-level keys:

{
  "message": "[Abc12xY mmlu/42/1 openai/gpt-4o] Retrying request to /responses in 0.396765 seconds",
  "name": "openai._base_client",
  "levelname": "INFO",
  "sample_uuid": "Abc12xY",
  "sample_task": "mmlu",
  "sample_id": 42,
  "sample_epoch": 1,
  "sample_model": "openai/gpt-4o"
}
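To illustrate how the structured fields surface, here is a tiny stdlib-only formatter standing in for python-json-logger (the class and its FIELDS tuple are hypothetical; real deployments would use python-json-logger as noted above):

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit records as JSON, lifting sample context attributes to top-level keys."""

    FIELDS = ("sample_uuid", "sample_task", "sample_id", "sample_epoch", "sample_model")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "name": record.name,
            "levelname": record.levelname,
        }
        # The filter set these via setattr(), so they appear as plain attributes.
        for field in self.FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)
```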

Unit tests: 24 passing

tests/util/test_retry_logging.py  24 passed
tests/test_retry.py               12 passed, 1 skipped
tests/test_retry_on_error.py       5 passed

Review-driven fixes

Code review caught three issues, all fixed with regression tests added before the fix:

| Issue | Fix |
|-------|-----|
| Filter on wrong logger — installed on openai but the SDK logs from openai._base_client; parent logger filters don't run for child records during propagation | Target openai._base_client directly |
| TypeError on non-string .code — getattr(ex, "code") can return an int, crashing ' '.join() | Cast with str(raw_code) |
| % in prefix breaks formatting — mutating record.msg with % chars would corrupt msg % args | Call record.getMessage() first, then set the resolved msg and clear args |
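The third fix can be seen in a small standalone demo (enrich_safely is a hypothetical name for the pattern, not a function in the PR): resolving msg % args before mutation means a % in the prefix can never be misread as a format directive.

```python
import logging


def enrich_safely(record: logging.LogRecord, prefix: str) -> None:
    """Prepend a prefix to a record without corrupting %-style formatting."""
    # Resolve the original msg % args BEFORE mutating; otherwise '%'
    # characters in the prefix would be interpreted as format directives
    # the next time the record is formatted.
    resolved = record.getMessage()
    record.msg = prefix + resolved
    record.args = None
```

Had the prefix been naively prepended to record.msg with the args left in place, a prefix like "[id%20with%20escapes] " would raise when msg % args is evaluated.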

Linear: ENG-594

sjawhar force-pushed the retry-log branch 2 times, most recently from 8e3cb0b to dac225e on February 14, 2026 at 22:03
sjawhar added a commit to METR/inspect_ai that referenced this pull request Feb 16, 2026
Merged branches:
- retry-log (PR UKGovernmentBEIS#3240): Enrich retry log messages with task/sample/model context
- fix/find-band-search (PR UKGovernmentBEIS#3237): Improve Ctrl+F search: wrap-around, match count, virtualization support
- feature/viewer-flat-view: Add flat view toggle to transcript viewer
revmischa added a commit to METR/inspect-action that referenced this pull request Feb 17, 2026
## Summary

- Updates inspect-ai git pin from cherry-picked release (`4bfe32e7`) to
proper octopus merge release (`f2e836ec`) based on PyPI `0.3.179`
- The previous release was built by cherry-picking commits, missing
several open PRs. This release is an octopus merge of all METR PR
branches.

## Included METR PRs (on top of 0.3.179)

| PR | Branch | Title |
|----|--------|-------|
| [#3240](UKGovernmentBEIS/inspect_ai#3240) | `retry-log` | Enrich retry log messages with task/sample/model context |
| [#3237](UKGovernmentBEIS/inspect_ai#3237) | `fix/find-band-search` | Improve Ctrl+F search: wrap-around, match count, virtualization support |
| — | `feature/viewer-flat-view` | Add flat view toggle to transcript viewer |

## Testing & Validation

- [ ] CI passes
- [ ] Smoke tests pass against dev environment

## Code Quality

- [x] Lock files updated (root + all lambda modules)
- [x] No code changes beyond `pyproject.toml` and lock files

---------

Co-authored-by: Mischa Spiegelmock <[email protected]>