fix(pipeline): batch liveness sweep for unconfirmed entries before apply by FReptar0 · Pull Request #973 · santifer/career-ops

FReptar0 · 2026-06-12T19:00:33Z

Summary

Fixes #750. Closes the remaining half of the liveness-gate class (the eval-side half landed in #937).

Batch scan adds roles to `data/pipeline.md` marked `**Verification:** unconfirmed (batch mode)` because Playwright is unavailable in headless mode, so they are never liveness-checked. The reporter then applied to 8 roles, all already closed or filled, wasting time generating materials for dead postings.

Why the existing gates don't cover this

- `modes/apply.md` Step 5 preflight verifies only the single form currently open in Chrome.
- `modes/auto-pipeline.md` Step 0.5 (from #937) gates each URL individually during evaluation.

Neither sweeps the inbox up front, so the user still opens every dead tab one at a time before discovering it's closed.

Fix

Adds a "Liveness sweep" step to `modes/pipeline.md` that runs the existing zero-token `check-liveness.mjs` over all pending URLs in one batch before the per-URL loop:

- Expired/closed entries are resolved to "Processed" (and marked `Discarded` in the tracker) without being evaluated.
- `uncertain` results are left for normal per-URL confirmation (a transient timeout shouldn't drop a possibly-live posting).
- `apply.md` gets a short cross-reference pointing multi-role sessions at the sweep.

Complements — does not replace — the per-URL gates. No new scripts; check-liveness.mjs already exists.
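To make the resolution rules concrete, here is a minimal JavaScript sketch of how sweep verdicts could be split into resolved and surviving entries. It is not the PR's code (the PR changes mode docs and a test, not scripts). The verdict strings (`active`, `expired`, `closed`, `uncertain`), the `{ url, company, role }` entry shape, and the `partitionSweep`/`processedLine` helpers are hypothetical; only the Processed-line format is taken from the `modes/pipeline.md` diff.

```javascript
// Hypothetical sketch: verdict names and entry shape are ASSUMPTIONS,
// not the actual output format of check-liveness.mjs.
const EM_DASH = '\u2014';

// Processed-line format copied from the modes/pipeline.md diff in this PR.
function processedLine({ url, company, role }) {
  return `- [x] ~~${url} | ${company} | ${role}~~ ${EM_DASH} posting expired (liveness sweep)`;
}

// Split pending entries by sweep verdict:
// - expired/closed: resolved to "Processed"; tracker row (if any) gets marked Discarded
// - active/uncertain: survive to the per-URL loop (uncertain is confirmed there)
function partitionSweep(entries, verdicts) {
  const processed = [];
  const discard = [];
  const survivors = [];
  for (const entry of entries) {
    // A missing verdict is treated as uncertain, so the entry is never dropped.
    const verdict = verdicts.get(entry.url) ?? 'uncertain';
    if (verdict === 'expired' || verdict === 'closed') {
      processed.push(processedLine(entry));
      discard.push(entry.url);
    } else {
      survivors.push({ ...entry, verdict });
    }
  }
  return { processed, discard, survivors };
}
```

Defaulting a missing verdict to `uncertain` means a checker crash or timeout leaves entries pending rather than discarding them, in line with the transient-timeout rationale above.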

Test plan

- `node test-all.mjs --quick` → 243 passed, 0 failed
- New test asserts the pipeline sweep step markers exist (mirrors the #937 eval-gate test)
- Pure mode-contract + test change; no script behavior altered

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a pre-processing "Liveness sweep" that batch-detects expired/closed URLs, moves them to Processed with expiration notes, and prevents further processing of expired entries.
Documentation
- Added a preflight instruction to run the liveness sweep before per-URL processing in multi-role sessions.
Tests
- Added a mode integrity check that ensures liveness-sweep guidance is present.

coderabbitai · 2026-06-12T19:00:50Z


Walkthrough

The PR documents a pre-loop "Liveness sweep" in pipeline mode that batch-runs check-liveness.mjs to move expired/closed pending URLs to Processed before per-URL work; apply mode adds a preflight note to run this sweep for multi-role sessions; tests assert the pipeline doc contains the sweep guidance.

Changes

Liveness sweep documentation

| Layer / File(s) | Summary |
|---|---|
| Liveness sweep workflow documentation (`modes/pipeline.md`) | Adds a pre-loop "Liveness sweep": collect pending URLs, run `node check-liveness.mjs --file <tmpfile>` (optional `--throttle`), move expired/closed entries to "Processed" with an expiration note and mark trackers `Discarded`; keep "uncertain" entries pending. Updates Workflow Step 1 to require this sweep before iterating pending URLs. |
| Apply mode preflight note (`modes/apply.md`) | Inserts a multi-role preflight instruction to run the pipeline "Liveness sweep" (`node check-liveness.mjs --file <urls>`) so expired postings are removed from `data/pipeline.md` before opening tabs. |
| Mode file integrity test (`test-all.mjs`) | Adds assertions in the "Mode file integrity" suite to verify `modes/pipeline.md` contains `## Liveness sweep`, `check-liveness.mjs`, `unconfirmed`, and `Do not` guidance; emits a dedicated pass message or fails with a clear missing-content message. |
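The first step in the pipeline row above, collecting pending URLs into a temp file for `--file`, might look like the following sketch. The pending-line pattern `- [ ] URL | Company | Role` is inferred from the Processed format in the diff and may not match `data/pipeline.md` exactly, and the one-URL-per-line layout is an assumption about what `check-liveness.mjs --file` reads. `collectPending` and `writeUrlFile` are hypothetical helpers.

```javascript
import { mkdtempSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// ASSUMED pending-entry format: "- [ ] URL | Company | Role" (unchecked box).
const PENDING_RE = /^- \[ \] (\S+) \| ([^|]+?) \| (.+?)\s*$/;

// Return every unchecked pipeline entry as { url, company, role }.
function collectPending(markdown) {
  const entries = [];
  for (const line of markdown.split(/\r?\n/)) {
    const m = PENDING_RE.exec(line);
    if (m) entries.push({ url: m[1], company: m[2].trim(), role: m[3].trim() });
  }
  return entries;
}

// Write one URL per line (ASSUMED --file format) into a fresh temp directory.
function writeUrlFile(entries) {
  const dir = mkdtempSync(join(tmpdir(), 'liveness-sweep-'));
  const file = join(dir, 'urls.txt');
  writeFileSync(file, entries.map((e) => `${e.url}\n`).join(''));
  return file;
}
```

Checked (`- [x]`) lines are skipped by the pattern, so entries already in "Processed" are never re-swept.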

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

batch-runner.sh never reconciles data/pipeline.md — processed offers re-surface and get re-evaluated #711: Both address preventing reprocessing of pipeline.md entries and reconciling processed/expired items; this PR's pre-loop sweep complements that reconciliation concern.

Possibly related PRs

santifer/career-ops#887: Related preflight liveness checks in apply-mode documentation and tests.
santifer/career-ops#374: Changes to expired-detection logic (classifyLiveness) that affect how the sweep classifies expired postings.
santifer/career-ops#937: Adds liveness gating at evaluation entry points; overlaps with this PR's documentation and test validation.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a batch liveness sweep step to the pipeline mode before processing, which directly addresses issue #750. |
| Linked Issues check | ✅ Passed | The PR fulfills issue #750 requirements: adds pre-check liveness sweep to pipeline mode, auto-marks expired/closed roles as Discarded/Processed, preserves uncertain results, and informs users of closed roles. |
| Out of Scope Changes check | ✅ Passed | All changes align with issue #750 scope: documentation updates to modes/pipeline.md and modes/apply.md, plus a test verification step in test-all.mjs. No unrelated modifications detected. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


…ifer#750)

Batch scan adds roles to data/pipeline.md marked `**Verification:** unconfirmed (batch mode)` because Playwright is unavailable in headless mode — they're never checked for liveness. A user then applied to 8 roles, all already closed/filled, wasting time generating materials for dead postings.

The existing gates only catch this one tab at a time: apply.md Step 5 preflight verifies the single open form, and auto-pipeline Step 0.5 gates per-URL during evaluation. Neither sweeps the inbox up front, so the user still opens every dead tab.

This adds a "Liveness sweep" step to modes/pipeline.md that runs the zero-token check-liveness.mjs over all pending URLs in one batch before the per-URL loop, resolves expired entries to Processed (and Discarded in the tracker) without evaluating them, and leaves `uncertain` results for normal per-URL confirmation. apply.md cross-references the sweep for multi-role sessions. Complements — does not replace — the existing per-URL gates.

Test: test-all.mjs asserts the pipeline sweep step markers exist (mirrors the santifer#937 eval-gate test). 243 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modes/pipeline.md`:
- Around line 12-15: The doc is ambiguous about the checker’s non-zero exit
causing automation to abort; update the pipeline step to explicitly instruct
consumers to treat the checker’s non-zero exit as informative (not fatal):
either run the checker in a mode/with a wrapper that always returns 0 and emits
a machine-readable report, or explicitly ignore its exit code and parse its
stdout/JSON output to move expired/closed entries to "Processed" and mark
trackers `Discarded`, while leaving `uncertain` entries for normal extraction;
reference the checker by name (check-liveness.mjs) and add a short example
command or note showing how to run it such that output is parsed even if the
process exits non-zero.

In `@test-all.mjs`:
- Around line 637-648: The current check on pipelineMode (from
readFile('modes/pipeline.md')) is too loose—using generic includes that can be
accidentally satisfied; update the if-condition to assert the exact contract
lines instead of loose keywords: check for the full header "## Liveness sweep",
the exact script reference "check-liveness.mjs" (as a standalone token), a
precise sentence that asserts sweeping "unconfirmed" entries (e.g., "Sweeps
unconfirmed entries for liveness before processing"), the explicit directive "Do
not process entries before liveness sweep", and the contract line that states
how "uncertain" liveness is treated (the exact wording used to lock behavior for
issue `#750`, e.g., the sentence that says uncertain is treated as non-zero).
Replace the chained pipelineMode.includes(...) checks with explicit string
equality/substring checks for those full lines (or a small set of exact regexes)
so the test fails on any regression in the sweep contract.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 61af2d9d-ec9a-4a19-9327-34a22902b669

📥 Commits

Reviewing files that changed from the base of the PR and between 7d136af and 33e859c.

📒 Files selected for processing (3)

modes/apply.md
modes/pipeline.md
test-all.mjs

coderabbitai · 2026-06-13T22:13:37Z

+2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
+3. For every URL the checker reports as **expired/closed**, resolve the pipeline entry instead of processing it: move it to "Processed" as `- [x] ~~URL | Company | Role~~ — posting expired (liveness sweep)` and, if it already has a tracker row, mark it `Discarded`. **Do not** extract the JD, evaluate, or generate a report/PDF for it.
+4. Leave `uncertain` results in place to be confirmed during normal per-URL extraction (a transient timeout shouldn't drop a possibly-live posting).
+5. Only the surviving live URLs continue to the per-URL processing loop below.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clarify non-zero sweep exit handling so the workflow does not abort early.

The step says the checker exits non-zero when any URL is expired/uncertain, but the flow then assumes processing continues. Without an explicit “continue and parse output even on non-zero” instruction, automation can stop before expired entries are moved to Processed.

Proposed doc patch

-2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
+2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain. **Do not treat that exit code as a hard stop for this sweep**—parse the per-URL verdicts, resolve expired/closed entries, keep `uncertain` entries pending, then continue.

📝 Committable suggestion


Suggested change


2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain. **Do not treat that exit code as a hard stop for this sweep**—parse the per-URL verdicts, resolve expired/closed entries, keep `uncertain` entries pending, then continue.

3. For every URL the checker reports as **expired/closed**, resolve the pipeline entry instead of processing it: move it to "Processed" as `- [x] ~~URL | Company | Role~~ — posting expired (liveness sweep)` and, if it already has a tracker row, mark it `Discarded`. **Do not** extract the JD, evaluate, or generate a report/PDF for it.

4. Leave `uncertain` results in place to be confirmed during normal per-URL extraction (a transient timeout shouldn't drop a possibly-live posting).

5. Only the surviving live URLs continue to the per-URL processing loop below.
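One way to honor the "do not treat that exit code as a hard stop" instruction from a Node script is to spawn the checker with `child_process.spawnSync`, which reports a non-zero exit through `result.status` rather than throwing (`execFileSync` throws on any non-zero exit). This is a sketch: `runChecker` is a hypothetical helper, and how to parse `stdout` depends on the actual output of `check-liveness.mjs`, which this PR does not specify.

```javascript
import { spawnSync } from 'node:child_process';

// Run a checker and always return its output: a non-zero status is data, not an error.
function runChecker(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) {
    // The process could not be spawned at all (e.g. missing binary): genuinely fatal.
    throw result.error;
  }
  if (result.status === null) {
    // Killed by a signal: no complete verdict list, so resolve nothing.
    throw new Error(`checker terminated by signal ${result.signal}`);
  }
  return { status: result.status, stdout: result.stdout, stderr: result.stderr };
}

// Hypothetical usage inside the sweep:
// const { status, stdout } = runChecker(process.execPath, ['check-liveness.mjs', '--file', urlFile]);
// if (status !== 0) { /* some URLs expired/uncertain: parse stdout, then continue */ }
```

Only spawn failures and signal kills abort the sweep; any exit status, including the checker's "some URLs are dead" signal, flows through to verdict parsing.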


coderabbitai · 2026-06-13T22:13:37Z

+const pipelineMode = readFile('modes/pipeline.md');
+if (
+  pipelineMode.includes('## Liveness sweep') &&
+  pipelineMode.includes('check-liveness.mjs') &&
+  pipelineMode.includes('unconfirmed') &&
+  pipelineMode.includes('Do not') &&
+  pipelineMode.includes('liveness sweep')
+) {
+  pass('pipeline mode sweeps unconfirmed entries for liveness before processing');
+} else {
+  fail('pipeline mode missing batch liveness sweep for unconfirmed entries');
+}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Strengthen the pipeline-mode integrity assertion to lock the sweep contract, not just keywords.

This check can pass even if critical sweep semantics regress (especially handling uncertain/non-zero behavior). Add explicit assertions for those contract lines so issue #750 behavior stays protected.

Proposed test hardening

 if (
   pipelineMode.includes('## Liveness sweep') &&
   pipelineMode.includes('check-liveness.mjs') &&
   pipelineMode.includes('unconfirmed') &&
   pipelineMode.includes('Do not') &&
-  pipelineMode.includes('liveness sweep')
+  pipelineMode.includes('liveness sweep') &&
+  pipelineMode.includes('exits non-zero') &&
+  pipelineMode.includes('Leave `uncertain` results in place')
 ) {

📝 Committable suggestion


Suggested change


const pipelineMode = readFile('modes/pipeline.md');
if (
  pipelineMode.includes('## Liveness sweep') &&
  pipelineMode.includes('check-liveness.mjs') &&
  pipelineMode.includes('unconfirmed') &&
  pipelineMode.includes('Do not') &&
  pipelineMode.includes('liveness sweep') &&
  pipelineMode.includes('exits non-zero') &&
  pipelineMode.includes('Leave `uncertain` results in place')
) {
  pass('pipeline mode sweeps unconfirmed entries for liveness before processing');
} else {
  fail('pipeline mode missing batch liveness sweep for unconfirmed entries');
}
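As an alternative to chained `includes()` calls, the contract could be expressed as a labeled list of patterns, so a failure names exactly which guarantee regressed; this is closer to what the review asks for. It is a hypothetical sketch (`SWEEP_CONTRACT` and `missingContract` are not in the repo). The phrases come from the `modes/pipeline.md` diff, and the "hard stop" pattern only matches if CodeRabbit's suggested wording is adopted.

```javascript
// Hypothetical contract table for modes/pipeline.md: [label, pattern].
const SWEEP_CONTRACT = [
  ['sweep heading', /^## Liveness sweep\s*$/m],
  ['batch checker invocation', /node check-liveness\.mjs --file\b/],
  // Only present once the suggested "hard stop" wording is merged.
  ['non-zero exit is not fatal', /Do not treat that exit code as a hard stop/],
  ['expired entries are not evaluated', /\*\*Do not\*\* extract the JD/],
  ['uncertain entries stay pending', /Leave `uncertain` results in place/],
];

// Return the labels of every contract line missing from the document.
function missingContract(markdown) {
  return SWEEP_CONTRACT.filter(([, pattern]) => !pattern.test(markdown)).map(([label]) => label);
}
```

In `test-all.mjs` this would replace the `if` condition with `const missing = missingContract(pipelineMode);` and pass `missing.join(', ')` into the `fail(...)` message. Anchoring the heading with `^...$` and the `m` flag also stops a stray "## Liveness sweep" substring elsewhere in a line from satisfying the check.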


github-actions Bot added ⚠️ agent-behavior 🔧 scripts labels Jun 12, 2026

This was referenced Jun 12, 2026

- Apply mode has no liveness gate — unconfirmed pipeline roles generate wasted application materials #750 (Open)
- feat(scan): add content/description filter for providers #974 (Open)

FReptar0 force-pushed the fix/750-apply-liveness-gate branch from 7d136af to 33e859c on June 13, 2026 22:09

coderabbitai Bot reviewed Jun 13, 2026

FReptar0 wants to merge 1 commit into santifer:main from FReptar0:fix/750-apply-liveness-gate
