fix(pipeline): batch liveness sweep for unconfirmed entries before apply #973
FReptar0 wants to merge 1 commit
📝 Walkthrough
The PR documents a pre-loop "Liveness sweep" in pipeline mode that batch-runs […]
Changes: Liveness sweep documentation
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 5 passed
…ifer#750)

Batch scan adds roles to data/pipeline.md marked `**Verification:** unconfirmed (batch mode)` because Playwright is unavailable in headless mode, so they are never checked for liveness. A user then applied to 8 roles, all already closed/filled, wasting time generating materials for dead postings.

The existing gates only catch this one tab at a time: apply.md Step 5 preflight verifies the single open form, and auto-pipeline Step 0.5 gates per-URL during evaluation. Neither sweeps the inbox up front, so the user still opens every dead tab.

This adds a "Liveness sweep" step to modes/pipeline.md that runs the zero-token check-liveness.mjs over all pending URLs in one batch before the per-URL loop, resolves expired entries to Processed (and Discarded in the tracker) without evaluating them, and leaves `uncertain` results for normal per-URL confirmation. apply.md cross-references the sweep for multi-role sessions. Complements, but does not replace, the existing per-URL gates.

Test: test-all.mjs asserts the pipeline sweep step markers exist (mirrors the santifer#937 eval-gate test). 243 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
7d136af to 33e859c
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modes/pipeline.md`:
- Around line 12-15: The doc is ambiguous about the checker’s non-zero exit
causing automation to abort; update the pipeline step to explicitly instruct
consumers to treat the checker’s non-zero exit as informative (not fatal):
either run the checker in a mode/with a wrapper that always returns 0 and emits
a machine-readable report, or explicitly ignore its exit code and parse its
stdout/JSON output to move expired/closed entries to "Processed" and mark
trackers `Discarded`, while leaving `uncertain` entries for normal extraction;
reference the checker by name (check-liveness.mjs) and add a short example
command or note showing how to run it such that output is parsed even if the
process exits non-zero.
In `@test-all.mjs`:
- Around line 637-648: The current check on pipelineMode (from
readFile('modes/pipeline.md')) is too loose—using generic includes that can be
accidentally satisfied; update the if-condition to assert the exact contract
lines instead of loose keywords: check for the full header "## Liveness sweep",
the exact script reference "check-liveness.mjs" (as a standalone token), a
precise sentence that asserts sweeping "unconfirmed" entries (e.g., "Sweeps
unconfirmed entries for liveness before processing"), the explicit directive "Do
not process entries before liveness sweep", and the contract line that states
how "uncertain" liveness is treated (the exact wording used to lock behavior for
issue `#750`, e.g., the sentence that says uncertain is treated as non-zero).
Replace the chained pipelineMode.includes(...) checks with explicit string
equality/substring checks for those full lines (or a small set of exact regexes)
so the test fails on any regression in the sweep contract.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 61af2d9d-ec9a-4a19-9327-34a22902b669
📒 Files selected for processing (3)
- modes/apply.md
- modes/pipeline.md
- test-all.mjs
modes/pipeline.md, lines 12-15:
2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
3. For every URL the checker reports as **expired/closed**, resolve the pipeline entry instead of processing it: move it to "Processed" as `- [x] ~~URL | Company | Role~~ — posting expired (liveness sweep)` and, if it already has a tracker row, mark it `Discarded`. **Do not** extract the JD, evaluate, or generate a report/PDF for it.
4. Leave `uncertain` results in place to be confirmed during normal per-URL extraction (a transient timeout shouldn't drop a possibly-live posting).
5. Only the surviving live URLs continue to the per-URL processing loop below.
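The routing in steps 3 to 5 can be sketched as a small pure function. This is a minimal illustration, not the project's implementation: the `applySweep` name, the entry shape (`url`, `company`, `role`), and the verdict labels `live` / `expired` / `uncertain` are assumptions made for the example. Only the Processed line format is taken from the step text above.

```javascript
// Hypothetical helper: route swept entries as described in steps 3-5.
// ASSUMPTION: `entries` are objects { url, company, role } and `verdicts`
// is a Map of URL -> 'live' | 'expired' | 'uncertain'.
function applySweep(entries, verdicts) {
  const pending = [];   // continue to the per-URL loop (live and uncertain)
  const processed = []; // lines to append under "Processed"
  const discarded = []; // URLs whose tracker row, if any, becomes Discarded

  for (const entry of entries) {
    // A URL with no verdict is treated as uncertain so it is never dropped.
    const verdict = verdicts.get(entry.url) ?? 'uncertain';
    if (verdict === 'expired') {
      // Resolve without extracting the JD, evaluating, or generating a report/PDF.
      processed.push(
        `- [x] ~~${entry.url} | ${entry.company} | ${entry.role}~~ — posting expired (liveness sweep)`
      );
      discarded.push(entry.url);
    } else {
      pending.push(entry);
    }
  }
  return { pending, processed, discarded };
}
```

Keeping `uncertain` entries in `pending` mirrors step 4: a transient timeout leaves the entry for normal per-URL confirmation instead of resolving it.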
Clarify non-zero sweep exit handling so the workflow does not abort early.
The step says the checker exits non-zero when any URL is expired/uncertain, but the flow then assumes processing continues. Without an explicit “continue and parse output even on non-zero” instruction, automation can stop before expired entries are moved to Processed.
Proposed doc patch
-2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain.
+2. Run `node check-liveness.mjs --file <tmpfile>` (add `--throttle` for large batches to stay under WAF rate limits; it's pure Playwright, zero Claude tokens). The checker prints a per-URL verdict and exits non-zero if any are expired/uncertain. **Do not treat that exit code as a hard stop for this sweep**: parse the per-URL verdicts, resolve expired/closed entries, keep `uncertain` entries pending, then continue.
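One way to keep the non-zero exit informative rather than fatal is to read the verdicts from the finished process result and fail only when nothing usable was printed. The sketch below assumes a result object shaped like the return value of Node's `child_process.spawnSync` (`status`, `stdout`, `error`). The one-verdict-per-line format `<verdict> <url>` and the `readSweepOutput` name are assumptions for illustration; the actual output of `check-liveness.mjs` is not shown in this PR and may differ.

```javascript
// Hypothetical sketch: interpret a finished checker run without aborting on exit code.
// ASSUMPTION: stdout holds one "<verdict> <url>" pair per line, where verdict is
// live, expired, or uncertain. Adjust the pattern to the checker's real output.
function readSweepOutput(result) {
  // `error` is set when the process could not be started at all.
  if (result.error) throw result.error;

  const verdicts = new Map();
  for (const line of String(result.stdout ?? '').split('\n')) {
    const match = line.trim().match(/^(live|expired|uncertain)\s+(\S+)$/);
    if (match) verdicts.set(match[2], match[1]);
  }

  // A non-zero status alone only means some URL was expired/uncertain.
  // Treat the run as failed only if it also produced no verdicts.
  if (verdicts.size === 0 && result.status !== 0) {
    throw new Error(`liveness checker exited ${result.status} without any verdicts`);
  }
  return verdicts;
}
```

A caller could pass in the result of `spawnSync('node', ['check-liveness.mjs', '--file', tmpfile], { encoding: 'utf8' })`. `spawnSync` reports a non-zero exit through `status` instead of throwing, whereas `execSync` and `execFileSync` throw on non-zero exit and would need a `try`/`catch` around them.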
test-all.mjs, lines 637-648:
const pipelineMode = readFile('modes/pipeline.md');
if (
  pipelineMode.includes('## Liveness sweep') &&
  pipelineMode.includes('check-liveness.mjs') &&
  pipelineMode.includes('unconfirmed') &&
  pipelineMode.includes('Do not') &&
  pipelineMode.includes('liveness sweep')
) {
  pass('pipeline mode sweeps unconfirmed entries for liveness before processing');
} else {
  fail('pipeline mode missing batch liveness sweep for unconfirmed entries');
}
Strengthen the pipeline-mode integrity assertion to lock the sweep contract, not just keywords.
This check can pass even if critical sweep semantics regress (especially handling uncertain/non-zero behavior). Add explicit assertions for those contract lines so issue #750 behavior stays protected.
Proposed test hardening
 if (
   pipelineMode.includes('## Liveness sweep') &&
   pipelineMode.includes('check-liveness.mjs') &&
   pipelineMode.includes('unconfirmed') &&
   pipelineMode.includes('Do not') &&
-  pipelineMode.includes('liveness sweep')
+  pipelineMode.includes('liveness sweep') &&
+  pipelineMode.includes('exits non-zero') &&
+  pipelineMode.includes('Leave `uncertain` results in place')
 ) {
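As an alternative to a growing chain of `includes(...)` calls, the contract markers could live in one list so that a failure names exactly which marker regressed. This is a sketch only: `SWEEP_MARKERS` and `missingSweepMarkers` are hypothetical names, and the marker strings are the ones quoted in the proposed hardening above.

```javascript
// Hypothetical sketch: list the sweep-contract markers once and report the missing ones.
const SWEEP_MARKERS = [
  '## Liveness sweep',
  'check-liveness.mjs',
  'unconfirmed',
  'Do not',
  'exits non-zero',
  'Leave `uncertain` results in place',
];

// Returns the markers that do not appear in `doc`; an empty array means the contract holds.
function missingSweepMarkers(doc, markers = SWEEP_MARKERS) {
  return markers.filter((marker) => !doc.includes(marker));
}
```

In `test-all.mjs` the result could drive the existing `pass` / `fail` helpers, with the missing markers joined into the failure message so a regression is easy to locate.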
Summary
Fixes #750. Closes the remaining half of the liveness-gate class (the eval-side half landed in #937).

Batch scan adds roles to `data/pipeline.md` marked `**Verification:** unconfirmed (batch mode)` because Playwright is unavailable in headless mode, so they are never liveness-checked. The reporter then applied to 8 roles, all already closed/filled, wasting time generating materials for dead postings.

Why the existing gates don't cover this
- `modes/apply.md` Step 5 preflight verifies the single form currently open in Chrome.
- `modes/auto-pipeline.md` Step 0.5 (from "fix(eval): gate dead links before evaluation in oferta/auto-pipeline", #937) gates per-URL during evaluation.

Neither sweeps the inbox up front, so the user still opens every dead tab one at a time before discovering it's closed.

Fix
Adds a "Liveness sweep" step to `modes/pipeline.md` that runs the existing zero-token `check-liveness.mjs` over all pending URLs in one batch before the per-URL loop:
- Expired entries are resolved to Processed (and `Discarded` in the tracker) without being evaluated.
- `uncertain` results are left for normal per-URL confirmation (a transient timeout shouldn't drop a possibly-live posting).
- `apply.md` gets a short cross-reference pointing multi-role sessions at the sweep.

Complements, but does not replace, the per-URL gates. No new scripts; `check-liveness.mjs` already exists.

Test plan
- `node test-all.mjs --quick` → 243 passed, 0 failed

🤖 Generated with Claude Code
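The sweep described under Fix starts from the list of pending URLs handed to the checker via `--file`. A minimal sketch of gathering that list is below. It assumes pending entries are unchecked list items of the form `- [ ] URL | Company | Role`, inferred from the Processed line format quoted in the review; `collectPendingUrls` is a hypothetical name, and the real layout of `data/pipeline.md` may differ.

```javascript
// Hypothetical sketch: pull pending URLs out of the pipeline markdown.
// ASSUMPTION: pending entries look like "- [ ] https://... | Company | Role".
function collectPendingUrls(markdown) {
  const urls = [];
  for (const line of markdown.split('\n')) {
    // Only unchecked items count; "- [x]" lines are already processed.
    const match = line.match(/^- \[ \] (https?:\/\/\S+)/);
    if (match) urls.push(match[1]);
  }
  // De-duplicate so each posting is checked once per sweep.
  return [...new Set(urls)];
}
```

Writing the returned URLs one per line to the temporary file would be a further assumption about what `--file` expects.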