fix(pipeline): batch liveness sweep for unconfirmed entries before apply#973

Open
FReptar0 wants to merge 1 commit into
santifer:mainfrom
FReptar0:fix/750-apply-liveness-gate
@FReptar0 FReptar0 commented Jun 12, 2026

Contributor

Summary

Fixes #750. Closes the remaining half of the liveness-gate class (the eval-side half landed in #937).

Batch scan adds roles to data/pipeline.md marked `**Verification:** unconfirmed (batch mode)` because Playwright is unavailable in headless mode, so they are never liveness-checked. The reporter then applied to 8 such roles, all of which were already closed or filled, wasting time generating materials for dead postings.

Why the existing gates don't cover this

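The existing gates only catch this one tab at a time: the apply.md Step 5 preflight verifies the single open form, and auto-pipeline Step 0.5 gates per URL during evaluation. Neither sweeps the inbox up front, so the user still opens every dead tab one at a time before discovering it's closed.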

Fix

Adds a "Liveness sweep" step to modes/pipeline.md that runs the existing zero-token check-liveness.mjs over all pending URLs in one batch before the per-URL loop:

  • Expired/closed entries are resolved to "Processed" (and Discarded in the tracker) without being evaluated.
  • uncertain results are left for normal per-URL confirmation (a transient timeout shouldn't drop a possibly-live posting).
  • apply.md gets a short cross-reference pointing multi-role sessions at the sweep.

Complements — does not replace — the per-URL gates. No new scripts; check-liveness.mjs already exists.
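
The three outcomes described above can be sketched as a small triage function. This is an illustration only, not code from the PR: the `{ url, verdict }` result shape, the verdict strings, and the name `triageSweepResults` are assumptions, since the PR does not show the checker's output format.

```javascript
// Hypothetical result shape: { url: string, verdict: string }.
// The verdict strings below are assumptions made for illustration.
const DEAD_VERDICTS = new Set(['expired', 'closed']);

function triageSweepResults(results) {
  const resolved = [];  // move to "Processed", mark Discarded, never evaluate
  const remaining = []; // live or uncertain: continue to the per-URL loop
  for (const { url, verdict } of results) {
    if (DEAD_VERDICTS.has(verdict)) {
      resolved.push(url);
    } else {
      // Anything not positively dead stays pending, so a transient
      // timeout ("uncertain") cannot drop a possibly-live posting.
      remaining.push(url);
    }
  }
  return { resolved, remaining };
}
```

Treating every non-dead verdict as "still pending" is the conservative choice here: the per-URL gates remain the final check, which matches the statement that the sweep complements them rather than replacing them.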

Test plan

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added a pre-processing "Liveness sweep" that batch-detects expired/closed URLs, moves them to Processed with expiration notes, and prevents further processing of expired entries.
  • Documentation

    • Added a preflight instruction to run the liveness sweep before per-URL processing in multi-role sessions.
  • Tests

    • Added a mode integrity check that ensures liveness-sweep guidance is present.

coderabbitai Bot commented Jun 12, 2026

Walkthrough

The PR documents a pre-loop "Liveness sweep" in pipeline mode that batch-runs check-liveness.mjs to move expired/closed pending URLs to Processed before per-URL work; apply mode adds a preflight note to run this sweep for multi-role sessions; tests assert the pipeline doc contains the sweep guidance.

Changes

Liveness sweep documentation

| Layer / File(s) | Summary |
|---|---|
| Liveness sweep workflow documentation (`modes/pipeline.md`) | Adds a pre-loop "Liveness sweep": collect pending URLs, run `node check-liveness.mjs --file <tmpfile>` (optional `--throttle`), move expired/closed entries to "Processed" with an expiration note and mark trackers `Discarded`; keep "uncertain" entries pending. Updates Workflow Step 1 to require this sweep before iterating pending URLs. |
| Apply mode preflight note (`modes/apply.md`) | Inserts a multi-role preflight instruction to run the pipeline "Liveness sweep" (`node check-liveness.mjs --file <urls>`) so expired postings are removed from data/pipeline.md before opening tabs. |
| Mode file integrity test (`test-all.mjs`) | Adds assertions in the "Mode file integrity" suite to verify modes/pipeline.md contains `## Liveness sweep`, `check-liveness.mjs`, `unconfirmed`, and `Do not` guidance; emits a dedicated pass message or fails with a clear missing-content message. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Possibly related PRs

  • santifer/career-ops#887: Related preflight liveness checks in apply-mode documentation and tests.
  • santifer/career-ops#374: Changes to expired-detection logic (classifyLiveness) that affect how the sweep classifies expired postings.
  • santifer/career-ops#937: Adds liveness gating at evaluation entry points; overlaps with this PR's documentation and test validation.

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a batch liveness sweep step to the pipeline mode before processing, which directly addresses issue #750. |
| Linked Issues check | ✅ Passed | The PR fulfills issue #750 requirements: adds pre-check liveness sweep to pipeline mode, auto-marks expired/closed roles as Discarded/Processed, preserves uncertain results, and informs users of closed roles. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #750 scope: documentation updates to modes/pipeline.md and modes/apply.md, plus a test verification step in test-all.mjs. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

…ifer#750)

Batch scan adds roles to data/pipeline.md marked
`**Verification:** unconfirmed (batch mode)` because Playwright is
unavailable in headless mode — they're never checked for liveness. A
user then applied to 8 roles, all already closed/filled, wasting time
generating materials for dead postings.

The existing gates only catch this one tab at a time: apply.md Step 5
preflight verifies the single open form, and auto-pipeline Step 0.5
gates per-URL during evaluation. Neither sweeps the inbox up front, so
the user still opens every dead tab.

This adds a "Liveness sweep" step to modes/pipeline.md that runs the
zero-token check-liveness.mjs over all pending URLs in one batch before
the per-URL loop, resolves expired entries to Processed (and Discarded
in the tracker) without evaluating them, and leaves `uncertain` results
for normal per-URL confirmation. apply.md cross-references the sweep for
multi-role sessions. Complements — does not replace — the existing
per-URL gates.

Test: test-all.mjs asserts the pipeline sweep step markers exist
(mirrors the santifer#937 eval-gate test). 243 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@FReptar0 FReptar0 force-pushed the fix/750-apply-liveness-gate branch from 7d136af to 33e859c Compare June 13, 2026 22:09

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modes/pipeline.md`:
- Around line 12-15: The doc is ambiguous about the checker’s non-zero exit
causing automation to abort; update the pipeline step to explicitly instruct
consumers to treat the checker’s non-zero exit as informative (not fatal):
either run the checker in a mode/with a wrapper that always returns 0 and emits
a machine-readable report, or explicitly ignore its exit code and parse its
stdout/JSON output to move expired/closed entries to "Processed" and mark
trackers `Discarded`, while leaving `uncertain` entries for normal extraction;
reference the checker by name (check-liveness.mjs) and add a short example
command or note showing how to run it such that output is parsed even if the
process exits non-zero.

In `@test-all.mjs`:
- Around line 637-648: The current check on pipelineMode (from
readFile('modes/pipeline.md')) is too loose—using generic includes that can be
accidentally satisfied; update the if-condition to assert the exact contract
lines instead of loose keywords: check for the full header "## Liveness sweep",
the exact script reference "check-liveness.mjs" (as a standalone token), a
precise sentence that asserts sweeping "unconfirmed" entries (e.g., "Sweeps
unconfirmed entries for liveness before processing"), the explicit directive "Do
not process entries before liveness sweep", and the contract line that states
how "uncertain" liveness is treated (the exact wording used to lock behavior for
issue `#750`, e.g., the sentence that says uncertain is treated as non-zero).
Replace the chained pipelineMode.includes(...) checks with explicit string
equality/substring checks for those full lines (or a small set of exact regexes)
so the test fails on any regression in the sweep contract.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61af2d9d-ec9a-4a19-9327-34a22902b669

📥 Commits

Reviewing files that changed from the base of the PR and between 7d136af and 33e859c.

📒 Files selected for processing (3)
  • modes/apply.md
  • modes/pipeline.md
  • test-all.mjs

Comment thread modes/pipeline.md
Comment on lines +12 to +15
2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
3. For every URL the checker reports as **expired/closed**, resolve the pipeline entry instead of processing it: move it to "Processed" as `- [x] ~~URL | Company | Role~~ — posting expired (liveness sweep)` and, if it already has a tracker row, mark it `Discarded`. **Do not** extract the JD, evaluate, or generate a report/PDF for it.
4. Leave `uncertain` results in place to be confirmed during normal per-URL extraction (a transient timeout shouldn't drop a possibly-live posting).
5. Only the surviving live URLs continue to the per-URL processing loop below.
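
The rewrite in step 3 can be expressed as a single-line transform. The following is a minimal sketch, assuming pending entries look like `- [ ] URL | Company | Role` (inferred from the Processed format in step 3, not confirmed by the PR); `markExpired` is a hypothetical helper name.

```javascript
const EXPIRED_NOTE = 'posting expired (liveness sweep)';

// Assumed pending format: "- [ ] URL | Company | Role".
// Returns the Processed-style line from step 3, or null when the
// input is not a pending entry (so callers leave it untouched).
function markExpired(pendingLine) {
  const match = /^- \[ \] (.+)$/.exec(pendingLine.trim());
  if (!match) return null;
  return `- [x] ~~${match[1]}~~ — ${EXPIRED_NOTE}`;
}
```

Returning `null` for anything that is not a pending checkbox keeps the helper from rewriting entries that were already processed. Moving the line under the "Processed" heading and updating the tracker row are left out because the PR does not show those file layouts.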


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clarify non-zero sweep exit handling so the workflow does not abort early.

The step says the checker exits non-zero when any URL is expired/uncertain, but the flow then assumes processing continues. Without an explicit “continue and parse output even on non-zero” instruction, automation can stop before expired entries are moved to Processed.

Proposed doc patch
-2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
+2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain. **Do not treat that exit code as a hard stop for this sweep**—parse the per-URL verdicts, resolve expired/closed entries, keep `uncertain` entries pending, then continue.
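
For automation that wraps the checker, the same rule can be sketched in Node: capture the output and report the exit code instead of throwing on it. This is an illustration under stated assumptions, not part of the PR: `runLivenessChecker` is a hypothetical name, and the sketch deliberately does not parse the verdicts because the PR does not show the checker's output format.

```javascript
// Runs the checker and returns its output even when it exits non-zero.
// The dynamic import keeps this sketch loadable from both .mjs and
// CommonJS files.
async function runLivenessChecker(args, nodeBin = process.execPath) {
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync(nodeBin, args, { encoding: 'utf8' });

  // result.error is set when the child failed to spawn or timed out;
  // that is a real failure, unlike a non-zero exit status.
  if (result.error) throw result.error;

  // A non-zero status is reported to the caller rather than thrown,
  // so the sweep can still read the per-URL verdicts from stdout.
  return {
    exitCode: result.status,
    stdout: result.stdout,
    stderr: result.stderr,
  };
}
```

A call such as `await runLivenessChecker(['check-liveness.mjs', '--file', tmpfile])` would then hand back stdout whether or not any URL was expired or uncertain. The PR does not document distinct exit codes, so a crash inside the checker would look the same as an "expired found" exit; inspecting `stderr` is left to the caller.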

Comment thread test-all.mjs
Comment on lines +637 to +648
const pipelineMode = readFile('modes/pipeline.md');
if (
  pipelineMode.includes('## Liveness sweep') &&
  pipelineMode.includes('check-liveness.mjs') &&
  pipelineMode.includes('unconfirmed') &&
  pipelineMode.includes('Do not') &&
  pipelineMode.includes('liveness sweep')
) {
  pass('pipeline mode sweeps unconfirmed entries for liveness before processing');
} else {
  fail('pipeline mode missing batch liveness sweep for unconfirmed entries');
}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen the pipeline-mode integrity assertion to lock the sweep contract, not just keywords.

This check can pass even if critical sweep semantics regress (especially handling uncertain/non-zero behavior). Add explicit assertions for those contract lines so issue #750 behavior stays protected.

Proposed test hardening
 if (
   pipelineMode.includes('## Liveness sweep') &&
   pipelineMode.includes('check-liveness.mjs') &&
   pipelineMode.includes('unconfirmed') &&
   pipelineMode.includes('Do not') &&
-  pipelineMode.includes('liveness sweep')
+  pipelineMode.includes('liveness sweep') &&
+  pipelineMode.includes('exits non-zero') &&
+  pipelineMode.includes('Leave `uncertain` results in place')
 ) {
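
One way to make a failure say exactly what regressed is to name each marker and report the ones that are absent. The sketch below is illustrative only: the marker strings are copied from the snippet and the proposed hardening above, while `SWEEP_MARKERS` and `missingSweepMarkers` are hypothetical names that do not appear in the PR.

```javascript
// Marker strings taken from the existing check plus the proposed
// additions; they must be kept in sync with modes/pipeline.md.
const SWEEP_MARKERS = [
  '## Liveness sweep',
  'check-liveness.mjs',
  'unconfirmed',
  'Do not',
  'liveness sweep',
  'exits non-zero',
  'Leave `uncertain` results in place',
];

// Returns the markers that the document text does not contain.
function missingSweepMarkers(doc) {
  return SWEEP_MARKERS.filter((marker) => !doc.includes(marker));
}
```

In test-all.mjs this could replace the chained condition: call `missingSweepMarkers(pipelineMode)`, then `pass(...)` when the result is empty and `fail(...)` with the joined list otherwise. It is still a substring check, so it locks wording rather than behavior, which is the same trade-off as the original assertion.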

Development

Successfully merging this pull request may close these issues.

Apply mode has no liveness gate — unconfirmed pipeline roles generate wasted application materials
