Skip to content

Stop Codex harness retry loops on TPM exhaustion and unfinished-goal errors#42420

Merged
pelikhan merged 4 commits into
mainfrom
copilot/fix-retry-loop-issue
Jun 30, 2026
Merged

Stop Codex harness retry loops on TPM exhaustion and unfinished-goal errors#42420
pelikhan merged 4 commits into
mainfrom
copilot/fix-retry-loop-issue

Conversation

Copilot AI commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Codex runs were repeatedly re-invoked after deterministic failures, burning tokens without forward progress. In particular, token-per-minute exhaustion and a newer “unfinished goal” router error variant were not being treated as non-retryable.

  • Retry guard: token budget exhaustion

    • Added explicit detection for OpenAI TPM-limit wording (on tokens per min) in codex_harness.cjs.
    • Short-circuits harness retries when this condition is present to avoid fresh-run token drain.
  • Retry guard: goal-state regression

    • Extended harness_retry_guard.cjs to recognize:
      • cannot create a new goal because this thread has an unfinished goal; complete the existing goal first
    • Ensures this deterministic router state is handled as non-retryable.
  • Targeted harness test updates

    • Added/updated tests in:
      • actions/setup/js/codex_harness.test.cjs
      • actions/setup/js/harness_retry_guard.test.cjs
    • Updated retry-policy expectations so TPM-limit errors and unfinished-goal errors do not trigger retries.
  • Docs: retry-count configuration clarity

    • Updated docs/src/content/docs/reference/engines.md to document current built-in harness retry budget and where custom retry behavior can be introduced.
// codex_harness.cjs
if (isTokenPerMinuteRateLimitError(result.output)) {
  log(`attempt ${attempt + 1}: token-per-minute rate limit detected — not retrying`);
  break;
}

Copilot AI linked an issue Jun 30, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits June 30, 2026 05:40
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix retry loop that drained all tokens Stop Codex harness retry loops on TPM exhaustion and unfinished-goal errors Jun 30, 2026
Copilot AI requested a review from pelikhan June 30, 2026 05:44
@pelikhan pelikhan marked this pull request as ready for review June 30, 2026 05:46
Copilot AI review requested due to automatic review settings June 30, 2026 05:46

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens retry behavior in the Node.js harnesses used by gh-aw engine runners to avoid wasting tokens on deterministic failures (TPM exhaustion and “unfinished goal” router-state errors), and updates tests + docs to match the new retry policy.

Changes:

  • Stop Codex harness fresh-run retries when output indicates token-per-minute (TPM) rate-limit exhaustion.
  • Extend the shared non-retryable retry-guard to detect the “unfinished goal” error variant as a goal-already-active condition.
  • Update harness unit tests and engine docs to reflect the fixed retry budget and the new non-retryable conditions.
Show a summary per file
File Description
docs/src/content/docs/reference/engines.md Documents the current built-in harness retry budget and how to customize retry behavior via a custom Copilot harness.
actions/setup/js/harness_retry_guard.cjs Adds an additional non-retryable pattern for the “unfinished goal” wording.
actions/setup/js/harness_retry_guard.test.cjs Adds coverage to ensure the new “unfinished goal” wording is detected as non-retryable.
actions/setup/js/codex_harness.cjs Introduces TPM-specific detection and short-circuits the retry loop on TPM exhaustion.
actions/setup/js/codex_harness.test.cjs Adds unit tests for TPM detection and updates retry-policy expectations accordingly.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 5/5 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #42420 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions github-actions Bot mentioned this pull request Jun 30, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Skills-Based Review 🧠

Applied /diagnosing-bugs and /tdd — 4 non-blocking observations posted as inline comments.

📋 Key Themes & Highlights

Key Themes

  • Test shadowing (highest-impact): The existing does not retry when rate-limit reconnects are exhausted (N/N pattern) test uses output containing on tokens per min, so the new TPM guard now fires first in shouldRetry. That test no longer exercises the reconnect-exhausted guard for TPM output — see inline comment on line 439.
  • Observability gap: isTokenPerMinuteRateLimitError is not surfaced in the structured attempt X failed log. Operators debugging a stopped retry loop will see isRateLimitError=true but need to search further to confirm the TPM guard fired — see inline comment on line 525.
  • Test completeness: TPM guard test covers only hasOutput: false; unfinished-goal test uses a plain string rather than a JSON-wrapped error.

Positive Highlights

  • ✅ Clear, well-motivated PR description with concrete code snippets
  • ✅ Both new guards are accompanied by tests
  • ✅ Renamed test accurately reflects the new (correct) false expectation
  • ✅ Docs updated with actionable retry-budget guidance and custom-harness escape hatch
  • ✅ Guard composition (isRateLimitError && TOKEN_PER_MIN_RATE_LIMIT_PATTERN) is defensively sound

@copilot please address the review comments above.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 61.9 AIC · ⌖ 7.9 AIC · ⊞ 6.5K ·
Comment /matt to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 61.9 AIC · ⌖ 7.9 AIC · ⊞ 6.5K
Comment /matt to run again

Comment thread actions/setup/js/codex_harness.test.cjs Outdated
const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output);
if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests || nonRetryableGuard.goalAlreadyActive || nonRetryableGuard.maxRunsExceeded) return false;
const isRateLimit = isRateLimitError(result.output);
if (isTokenPerMinuteRateLimitError(result.output)) return false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The existing does not retry when rate-limit reconnects are exhausted (N/N pattern) test (line 518) uses output that also contains on tokens per min — meaning this new TPM guard fires before the reconnect-exhausted check, and that test now exercises the TPM path rather than the reconnect-exhausted path for TPM output.

💡 How to fix the coverage gap

Add a companion reconnect-exhausted test that uses non-TPM reconnect output so the isRateLimit && isReconnectExhaustedError guard stays specifically covered:

it("does not retry when rate-limit reconnects are exhausted (non-TPM rate limit)", () => {
  const output =
    '{"type":"error","message":"Reconnecting... 1/5 (stream disconnected: RateLimitError)"}\n' +
    '{"type":"error","message":"Reconnecting... 5/5 (stream disconnected: RateLimitError)"}';
  const result = { exitCode: 1, hasOutput: true, output };
  expect(shouldRetry(result, 0)).toBe(false);
});

Without this, accidentally deleting the reconnect-exhausted guard for TPM scenarios would go undetected.

@copilot please address this.

// Fresh-run retries immediately repeat the same prompt workload and can quickly
// drain available credits without making forward progress.
if (isTokenPerMinuteRateLimitError(result.output)) {
log(`attempt ${attempt + 1}: token-per-minute rate limit detected — not retrying (fresh runs can further drain token budget)`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnosing-bugs] The structured failure log above (line ~460) records isRateLimitError=true but doesn't include the new isTokenPerMinuteRateLimitError flag. When debugging why a run didn't retry, operators will see "rate-limit" in the structured log but have to search for the break-message later to learn the TPM guard fired.

💡 Suggested addition to the structured log

Add isTokenPerMinuteRateLimitError to the existing failure log:

// near line 460
log(
  `attempt ${attempt + 1} failed:` +
    ` ...` +
    ` isRateLimitError=${isRateLimit}` +
+   ` isTokenPerMinuteRateLimitError=${isTokenPerMinuteRateLimitError(result.output)}` +
    ` ...`
);

This keeps all guard-state visible at the same log line and avoids the need to scroll forward to find the break-message.

@copilot please address this.

const result = {
exitCode: 1,
hasOutput: false,
output: '{"type":"error","message":"Rate limit reached for gpt-4o-mini in organization org-xxx on tokens per min (TPM): Limit 200000, Used 50000, Requested 35000. Please try again in 615ms."}',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The new test only covers hasOutput: false. TPM limits can trigger mid-run after partial output is written — worth adding a hasOutput: true variant to confirm the guard fires unconditionally.

💡 Suggested additional test
it("does not retry on token-per-minute rate limit wording even with partial output", () => {
  const result = {
    exitCode: 1,
    hasOutput: true,
    output: 'Rate limit reached for gpt-4o-mini in organization org-xxx on tokens per min (TPM): Limit 200000, Used 199000, Requested 1200.',
  };
  expect(shouldRetry(result, 0)).toBe(false);
});

This guards against a future refactor that might inadvertently make the TPM check conditional on hasOutput.

@copilot please address this.

expect(result.goalAlreadyActive).toBe(true);
});

it("detects goal already active for unfinished-goal wording", () => {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The new test passes a plain string. Real router errors arrive JSON-wrapped, so the pattern's real-world input is '{"type":"error","message":"cannot create a new goal..."}'. A JSON-wrapped variant would give higher confidence that the [\\s\\S]*? part of the pattern handles the surrounding JSON safely.

💡 Suggested additional test
it("detects goal already active for unfinished-goal wording (JSON-wrapped)", () => {
  const result = detectNonRetryableHarnessGuard(
    '{"type":"error","message":"cannot create a new goal because this thread has an unfinished goal; complete the existing goal first"}'
  );
  expect(result.goalAlreadyActive).toBe(true);
});

@copilot please address this.

@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel — Score: 80/100 ✅ Excellent

Tests in this PR directly verify two new non-retryable error-detection contracts (TPM exhaustion and unfinished-goal wording), all targeting error paths with correct behavioral assertions.

📊 Metrics & Classification

File-level summary

File Lines Added (test) Lines Added (prod) Ratio Flag
codex_harness.test.cjs +14 +21 0.67:1
harness_retry_guard.test.cjs +5 +1 5:1 ⚠️ inflation (see note)

Inflation note: The 5:1 ratio on harness_retry_guard.test.cjs is a natural asymmetry — the production change adds one error-string variant to a pattern matcher (+1 line), while the test appropriately exercises the new wording with a realistic input string. This is correct testing depth, not padding.

Per-test classification

# Test File Assertions Design Value
1 isTokenPerMinuteRateLimitError — returns true for TPM-limit wording codex_harness.test.cjs 1 (toBe(true)) design_test high
2 isTokenPerMinuteRateLimitError — returns false for generic rate-limit codex_harness.test.cjs 1 (toBe(false)) design_test high
3 shouldRetry — does not retry on TPM rate limit wording codex_harness.test.cjs 1 (toBe(false)) implementation_test ⚠️ medium
4 detectNonRetryableHarnessGuard — unfinished-goal wording harness_retry_guard.test.cjs 1 (toBe(true)) design_test high

Score components

Component Value
Design tests / total (×40) 3/4 → 30
Edge-case tests / total (×30) 4/4 → 30
Duplicate clusters penalty 0 clusters → 20
Inflation bonus Ratio >2:1 on harness_retry_guard.test.cjs → 0
Total 80 / 100

Implementation-test ratio: 1/4 = 25% (threshold: 30%) ✅
Guideline violations: none

🔍 Flagged: Test #3 — implementation_detail concern

Test: "does not retry on token-per-minute rate limit wording"codex_harness.test.cjs

The shouldRetry function under test is defined inline within the test file (a pre-existing pattern in this describe block), not imported from the production module. The PR correctly updates both the production codex_harness.cjs and the inline test copy. However, because the test exercises a local reimplementation, a future divergence between the production function and the inline copy could go undetected.

Severity: Low — this is a pre-existing test smell, not introduced by this PR. The real imported isTokenPerMinuteRateLimitError is validated by tests #1 and #2.

Suggestion (non-blocking): Consider exporting shouldRetry from codex_harness.cjs so tests can import and call it directly, eliminating the inline reimplementation.


Verdict: ✅ APPROVE — 80/100. New tests cover the two behavioral contracts introduced by this PR (TPM exhaustion detection and unfinished-goal error variant) with appropriate error-path assertions. Implementation-test ratio is 25%, within the 30% threshold. No guideline violations.

References: §28423209654

🧪 Test quality analysis by Test Quality Sentinel · 64.5 AIC · ⌖ 21.9 AIC · ⊞ 6.7K ·
Comment /review to run again

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 80/100. Test quality is acceptable — 25% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Stop Codex harness retry loops on TPM exhaustion and unfinished-goal errors

The overall approach is correct and well-tested. The TPM guard prevents token drain on deterministic failures, and the new goalAlreadyActive variant is handled cleanly. All three harnesses confirmed to use MAX_RETRIES = 3, so the docs update is accurate.

3 non-blocking suggestions:

# File Concern
1 codex_harness.cjs:67 TOKEN_PER_MIN_RATE_LIMIT_PATTERN is too generic — risks false positive when non-TPM rate-limit output happens to mention "on tokens per min" in a log line
2 codex_harness.cjs:524 isTokenPerMinuteRateLimitError re-scans output for isRateLimitError even though isRateLimit was already computed at line 450
3 harness_retry_guard.cjs:8 [\s\S]*? in the new goal pattern is unbounded; the separator is a known ; — a bounded class like [^;]{0,120} is more precise

None of these are blockers — the change is safe to merge as-is. See inline comments for suggested fixes.

🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 54.1 AIC · ⌖ 6.94 AIC · ⊞ 4.9K

Comment thread actions/setup/js/codex_harness.cjs Outdated
// the human-readable message Codex emits inside "Reconnecting..." / error lines:
// "Rate limit reached for <model> in organization <org> on tokens per min (TPM): ..."
const RATE_LIMIT_ERROR_PATTERN = /rate_limit_exceeded|429 Too Many Requests|RateLimitError|Rate limit reached for [^\s]+(?: in organization [^\s]+)? on tokens per min/i;
const TOKEN_PER_MIN_RATE_LIMIT_PATTERN = /on tokens per min/i;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over-broad pattern risks false positives.

TOKEN_PER_MIN_RATE_LIMIT_PATTERN = /on tokens per min/i is a very generic substring. isTokenPerMinuteRateLimitError guards it with isRateLimitError, but that check also matches rate_limit_exceeded and 429 Too Many Requests — both of which could appear in output that also contains the phrase on tokens per min for an unrelated reason (e.g., a Codex diagnostic note about the token budget). In such a case a non-TPM rate-limit failure would be misclassified as TPM and a viable retry would be suppressed.

A self-contained, specific pattern removes the need for the && guard:

const TOKEN_PER_MIN_RATE_LIMIT_PATTERN =
  /Rate limit reached for \S+(?:\s+in organization \S+)? on tokens per min/i;

This matches only the well-known OpenAI TPM error structure and lets isTokenPerMinuteRateLimitError drop the isRateLimitError(output) && guard entirely.

@copilot please address this.

Comment thread actions/setup/js/codex_harness.cjs Outdated
// Token-per-minute limits indicate exhausted budget for the current workload profile.
// Fresh-run retries immediately repeat the same prompt workload and can quickly
// drain available credits without making forward progress.
if (isTokenPerMinuteRateLimitError(result.output)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isTokenPerMinuteRateLimitError re-scans output for isRateLimitError internally, even though isRateLimit was already computed at line 450. For typical Codex sessions the output buffer can be several megabytes; re-applying two regexes to it is minor but avoidable.

If the pattern is tightened to the specific OpenAI TPM structure (see comment on line 67), the && guard is unnecessary and the function becomes a single-pass check. Alternatively, pass the pre-computed result:

if (isRateLimit && isTokenPerMinuteRateLimitError(result.output)) {

@copilot please address this.


const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/];
const GOAL_ALREADY_ACTIVE_PATTERNS = [/\bthis thread already has a goal\b[\s\S]*?\buse update_goal\b/i];
const GOAL_ALREADY_ACTIVE_PATTERNS = [/\bthis thread already has a goal\b[\s\S]*?\buse update_goal\b/i, /\bcannot create a new goal because this thread has an unfinished goal\b[\s\S]*?\bcomplete the existing goal first\b/i];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[\s\S]*? can match across unbounded distances. The new pattern:

/\bcannot create a new goal ... \b[\s\S]*?\bcomplete the existing goal first\b/i

uses a lazy wildcard that will span as much text as needed to find both anchors. In real error output the two phrases are always separated by ; — a fixed, known delimiter. Tightening to something like [^;]{0,120} (or simply ;\s*) would both reduce any theoretical ReDoS surface and make the intent explicit:

/\bcannot create a new goal because this thread has an unfinished goal\b[^;]{0,120}\bcomplete the existing goal first\b/i

(The existing first pattern has the same structure; this is consistent with it but worth hardening together.)

@copilot please address this.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking observations — intent is correct, implementation has a few rough edges

The core logic (skip retries on TPM exhaustion and unfinished-goal errors) is sound and the harness/retry-guard pattern detection works correctly. Three non-blocking issues noted inline:

Issues found
  1. ** is over-broad** (line 67): The loose pattern can false-positive when combined with any other rate-limit trigger in the same output — use the precise wording already in instead.

  2. **Double call in ** (line 524): is already computed at line 450; calling internally makes the expensive regex run twice. Inline as .

  3. Test mirrors production logic instead of testing it (test line 439): The local helper is a hand-written copy of the retry conditions in . The actual at is not exercised by any test, so a future regression there would go undetected.

🔎 Code quality review by PR Code Quality Reviewer · 93 AIC · ⌖ 7.26 AIC · ⊞ 1.6K
Comment /review to run again

Comment thread actions/setup/js/codex_harness.cjs Outdated
// the human-readable message Codex emits inside "Reconnecting..." / error lines:
// "Rate limit reached for <model> in organization <org> on tokens per min (TPM): ..."
const RATE_LIMIT_ERROR_PATTERN = /rate_limit_exceeded|429 Too Many Requests|RateLimitError|Rate limit reached for [^\s]+(?: in organization [^\s]+)? on tokens per min/i;
const TOKEN_PER_MIN_RATE_LIMIT_PATTERN = /on tokens per min/i;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over-broad TPM pattern risks suppressing retries on unrelated errors: TOKEN_PER_MIN_RATE_LIMIT_PATTERN = /on tokens per min/i is a much looser match than the TPM-specific tail already present in RATE_LIMIT_ERROR_PATTERN.

💡 Suggested fix

If result.output ever contains two unrelated lines — one matching rate_limit_exceeded and another containing an unrelated "on tokens per min" reference — isTokenPerMinuteRateLimitError will return true and silently break the retry loop for what may have been a recoverable error.

Since RATE_LIMIT_ERROR_PATTERN already encodes the full OpenAI TPM error phrase, eliminate the separate looser constant and match precisely:

function isTokenPerMinuteRateLimitError(output) {
  return /Rate limit reached for [^\s]+(?: in organization [^\s]+)? on tokens per min/i.test(output);
}

This matches the exact OpenAI wording documented in the comment at line 65, avoids the false-positive scenario, and removes the need for TOKEN_PER_MIN_RATE_LIMIT_PATTERN as a separate exported-or-module-scoped constant.

Comment thread actions/setup/js/codex_harness.cjs Outdated
// Token-per-minute limits indicate exhausted budget for the current workload profile.
// Fresh-run retries immediately repeat the same prompt workload and can quickly
// drain available credits without making forward progress.
if (isTokenPerMinuteRateLimitError(result.output)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isRateLimitError runs twice on the same output string: isRateLimit is already computed at line 450. Calling isTokenPerMinuteRateLimitError(result.output) here triggers a second full regex sweep via the internal isRateLimitError call, with no benefit.

💡 Suggested fix

Since isRateLimit is in scope, eliminate the redundant call by inlining the TPM check:

// Line 524 – replace:
if (isTokenPerMinuteRateLimitError(result.output)) {

// with:
if (isRateLimit && TOKEN_PER_MIN_RATE_LIMIT_PATTERN.test(result.output)) {

(Or use the more precise pattern suggested for line 67 directly here.)

This also makes the code's intent clearer: TPM detection is a specialisation of the general rate-limit path, which is exactly what the surrounding code structure implies.

Comment thread actions/setup/js/codex_harness.test.cjs Outdated
const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output);
if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests || nonRetryableGuard.goalAlreadyActive || nonRetryableGuard.maxRunsExceeded) return false;
const isRateLimit = isRateLimitError(result.output);
if (isTokenPerMinuteRateLimitError(result.output)) return false;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test verifies a local copy of retry logic, not the production main() code path: the shouldRetry function defined at line 431 is a hand-written mirror of the retry conditions inside main(). Adding the TPM guard here validates pattern matching but does not exercise the actual break at line 524 of codex_harness.cjs.

💡 Why this matters

If the TPM check in main() is accidentally removed, reordered, or placed inside a condition that gates it away, all tests in this describe block will continue to pass — because they call the local shouldRetry mirror, not the real production code.

The test also cements the double-isRateLimitError call from the production code: line 438 computes const isRateLimit = isRateLimitError(result.output) and line 439 calls isTokenPerMinuteRateLimitError which calls isRateLimitError again internally.

Consider either:

  1. Exporting a thin _shouldRetry(result, attempt, flags) predicate from codex_harness.cjs that encapsulates the retry decision and can be tested directly, or
  2. Adding an integration test that exercises main() end-to-end with a stubbed subprocess returning a TPM error and asserts it does not invoke the subprocess a second time.

@github-actions

Copy link
Copy Markdown
Contributor

🤖 PR Triage — Run §28445959231

Field Value
Category bug
Risk 🟢 Low
Priority 🟡 Medium
Score 66 / 100
Action fast_track
CI ✅ PASSING
Age 7.5h

Score breakdown: Impact 34/50 · Urgency 20/30 · Quality 12/20

Stops infinite Codex retry loops on TPM exhaustion and unfinished-goal errors, preventing unnecessary token burn. CI passing, low-risk fix. Labels already applied.

Generated by 🔧 PR Triage Agent · 52.3 AIC · ⌖ 8.71 AIC · ⊞ 1.6K ·

@pelikhan

Copy link
Copy Markdown
Collaborator

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

@copilot run pr-finisher skill https://github.com/github/gh-aw/actions/runs/28423209643/job/84221720858

Implemented a pr-finisher pass and pushed 36154d3.

Addressed the failing run feedback by tightening TPM detection, logging TPM guard state in attempt diagnostics, avoiding redundant rate-limit rescans, and adding the requested test coverage updates (non-TPM reconnect exhaustion, TPM with partial output, and JSON-wrapped unfinished-goal cases). Local validation passed: make fmt, make lint, make test-unit, make test, and make recompile.

CI on this new head won’t auto-rerun from my push; please re-trigger workflows to verify remote checks on the updated commit.

@pelikhan pelikhan merged commit 2eb41b5 into main Jun 30, 2026
12 of 15 checks passed
@pelikhan pelikhan deleted the copilot/fix-retry-loop-issue branch June 30, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

retry loop drained all tokens #3

3 participants