feat(rlvr): environment-grounded vaccine verification layer by VoidChecksum · Pull Request #169 · PurpleAILAB/Decepticon

VoidChecksum · 2026-05-06T07:45:24Z

Stacked on #168 — merges cleanly once the web PR lands. Targeting feat/web-dashboard so the diff shows only the RLVR Python changes.

Summary

Replaces the LLM-judged vaccine loop outcome with an environment-grounded verification pipeline. The defender agent can no longer self-report a successful defense — the sandbox itself decides.

Pre-defense baseline gate — PoC is run before defenses are applied. If the exploit never triggers, the finding is INVALID_SPEC → reward 0.0. Defenses can no longer earn reward for blocking something that was never exploitable.
N-run consensus: ExploitSpec.runs (1–10) + min_success_rate (0.0–1.0). The exploit counts as still working only when the fraction of successful runs is ≥ min_success_rate. A flaky one-in-three trigger yields PARTIAL, not the full reward of 1.0.
Impact-pattern confirmation — separate impact_patterns field (e.g. uid=0, root@, data-exfil regexes). Confirms actual exploitation impact, not just that the payload fired. Unconfirmed impact applies a 0.7 confidence multiplier.
CVSS 3.1 base score — auto-derived from TargetCheck types and impact patterns using full CVSS 3.1 formula. Emitted on RLVRReward.cvss_score.
Fingerprint deduplication — SHA-256(poc_command + success_patterns + target_host) keyed in rlvr/dedup.jsonl. Duplicate specs yield ERROR reward, preventing the same finding from inflating signal.
Inconclusive detection — contradicting TargetChecks (port closed + service reachable for same host) → ERROR instead of forced BLOCKED/PASSED.
ZFP demotion — negative control command matching success patterns → entire result demoted to ERROR. Already present, now applied after N-run consensus.
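
The N-run consensus item above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RunResult` and `consensus` are hypothetical names standing in for `PoCRunResult` and the real aggregation, and the outcome mapping is inferred from the description and the listed tests.

```python
from dataclasses import dataclass, field


@dataclass
class RunResult:
    """Hypothetical stand-in for the PR's PoCRunResult."""
    matched: set[str] = field(default_factory=set)  # success patterns that matched

    @property
    def success(self) -> bool:
        return bool(self.matched)


def consensus(runs: list[RunResult], min_success_rate: float) -> dict:
    """Aggregate N independent post-defense PoC runs into one verdict."""
    if not runs:
        raise ValueError("at least one run is required")
    successes = [r for r in runs if r.success]
    rate = len(successes) / len(runs)
    # agreed_signals: patterns matched by every successful run
    agreed = set.intersection(*(r.matched for r in successes)) if successes else set()
    if not successes:
        outcome = "BLOCKED"   # exploit never fired after the defense
    elif rate < min_success_rate:
        outcome = "PARTIAL"   # fired, but below the consensus threshold
    else:
        outcome = "PASSED"    # exploit still works reliably
    return {"success_rate": rate, "agreed_signals": agreed, "outcome": outcome}
```

With three runs of which two succeed, a threshold of 1.0 gives PARTIAL and a threshold of 0.5 gives PASSED. Under a plain `>=` comparison, 2/3 ≈ 0.6667 falls just short of 0.67, so a 2-of-3 threshold may need to be written as 0.66 unless the real implementation rounds.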

New schemas

| Type | Purpose |
|---|---|
| `PoCRunResult` | Single PoC run result with per-run signal matching |
| `PoCConsensus` | N-run aggregate: success_rate, agreed_signals, zfp_demoted |
| `BaselineEvidence` | Pre-defense PoC confirming spec validity |
| `CVSSEstimate` | CVSS 3.1 components + base_score + vector_string |
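
For reference, the CVSS 3.1 base-score arithmetic that a `CVSSEstimate` would carry can be sketched as below. The weights and rounding follow the published CVSS v3.1 specification; how the PR maps TargetCheck types and impact patterns onto metric values is not shown here, so the function simply takes the metric letters as arguments (`base_score` and `roundup` are illustrative names).

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in CVSS v3.1 Appendix A."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av: str, ac: str, pr: str, ui: str, scope: str,
               c: str, i: str, a: str) -> float:
    """Compute the CVSS 3.1 base score from metric letters (scope is 'U' or 'C')."""
    changed = scope == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total, 10) if changed else min(total, 10))
```

For example, `base_score("N", "L", "N", "N", "U", "H", "H", "H")` corresponds to the vector `CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H`.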

ExploitSpec new fields

runs: int = 1                    # consensus run count (recommend 3)
min_success_rate: float = 1.0   # success fraction (0.67 for 2-of-3)
impact_patterns: list[str] = [] # actual-impact evidence regexes
target_host: str | None = None  # dedup fingerprint anchor
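
A rough sketch of how these fields and their stated ranges might be enforced is shown below. It uses a plain dataclass so it runs without dependencies; the actual schema framework is not visible in this description, and `ExploitSpecSketch` is a hypothetical name. `poc_command` and `success_patterns` are taken from the commit message describing the original ExploitSpec.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExploitSpecSketch:
    """Illustrative subset of ExploitSpec; not the PR's actual class."""
    poc_command: str
    success_patterns: list[str]
    runs: int = 1                      # consensus run count (1 to 10)
    min_success_rate: float = 1.0      # required fraction of successful runs
    impact_patterns: list[str] = field(default_factory=list)
    target_host: Optional[str] = None  # dedup fingerprint anchor

    def __post_init__(self) -> None:
        if not self.success_patterns:
            raise ValueError("success_patterns needs at least one regex")
        if not 1 <= self.runs <= 10:
            raise ValueError("runs must be between 1 and 10")
        if not 0.0 <= self.min_success_rate <= 1.0:
            raise ValueError("min_success_rate must be between 0.0 and 1.0")
        # Fail early on patterns that do not compile as regexes.
        for pattern in self.success_patterns + self.impact_patterns:
            re.compile(pattern)
```

Validating at construction time keeps malformed specs out of the verifier, so an out-of-range `runs` value fails at registration rather than mid-verification.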

Reward tiers

| Outcome | Reward | When |
|---|---|---|
| `BLOCKED` | 1.0 | Exploit fails, no success signals |
| `PARTIAL` | 0.5 | Success rate below threshold, or some env checks flipped |
| `PASSED` | 0.0 | Exploit still works post-defense |
| `ERROR` | 0.0 | ZFP demotion / invalid spec / inconclusive / duplicate |
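
The tiers above, together with the 0.7 confidence multiplier for unconfirmed impact, could be expressed as a small lookup. This is a sketch under assumptions: `score` is a hypothetical helper, and because the description does not say whether the multiplier scales the reward itself or is reported separately, the two values are returned side by side rather than combined.

```python
BASE_REWARD = {"BLOCKED": 1.0, "PARTIAL": 0.5, "PASSED": 0.0, "ERROR": 0.0}
UNCONFIRMED_IMPACT_CONFIDENCE = 0.7


def score(outcome: str, impact_declared: bool, impact_confirmed: bool) -> dict:
    """Map an outcome to its reward tier and a confidence value."""
    confidence = 1.0
    # The multiplier only applies when the spec declared impact_patterns
    # and none of them matched.
    if impact_declared and not impact_confirmed:
        confidence = UNCONFIRMED_IMPACT_CONFIDENCE
    return {"reward": BASE_REWARD[outcome], "confidence": confidence}
```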

Files changed (Python only)

decepticon/schemas/exploit_spec.py — new fields
decepticon/schemas/env_verification.py — new schema types
decepticon/core/env_verifier.py — full verification pipeline
decepticon/tools/research/exploit_spec_writer.py — @tool exploit_spec_register
decepticon/orchestrator.py — Phase 4 env-grounded path + legacy fallback
decepticon/core/engagement_loop.py — PRE_DEFENSE snapshot + env-grounded path
decepticon/agents/{exploit,recon,scanner}.py — tool registration
decepticon/agents/prompts/{exploit,recon,scanner}.md — agent guidance
tests/unit/core/test_env_verifier.py — 13 tests (8 original + 5 new)

Test plan

uv run pytest tests/unit/core/test_env_verifier.py -v — 13/13 pass
uv run ruff check decepticon/ — clean
uv run basedpyright decepticon/ — 0 errors
Integration: run engagement with VACCINE_USE_ENV_VERIFIER=1, confirm workspace/rlvr/rewards.jsonl populated after vaccine phase

…ts, container hotswap

Web UI:
- Add engagement timeline page with live event stream
- Add command palette (cmd+k) for keyboard navigation
- Add streaming agent detail panel with real-time tool call inspection
- Add live activity feed component for engagement monitoring
- Add Opplan live overlay for plan visualization
- Add health API endpoint for container readiness checks
- Add engagement export API endpoint
- Add engagement threads API endpoint
- Add engagement timeline API endpoint
- Refactor web terminal component with improved PTY handling
- Refactor dashboard pages (engagements, graph, settings, main)
- Add keyboard shortcut in sidebar

Containers:
- Refactor web.Dockerfile for streamlined build layers
- Refactor web-entrypoint.sh with healthcheck awareness
- Add web-hotswap.sh for zero-downtime container swap

Backend:
- Refactor Docker sandbox backend for resource lifecycle

Skills:
- Add stealth-infra shared skill

Build:
- Update Makefile targets
- Update benchmark validation config

…b switch

Two bugs fixed:

1. Terminal reset / task stop on tab switch:
   - Created EngagementContext + EngagementProvider, which persists observer state across Next.js route navigation within an engagement
   - Lifted useRunObserver from LivePage into engagement layout so events, isRunning, and activeRunId survive tab switches
   - WebTerminal now rendered at layout level with CSS width control: 35% width on /live, 0 width (but still mounted) on other tabs
   - PTY connection stays alive; observer continues collecting events even while user is on Findings/Graph/Timeline tabs

2. Stuck "Processing" indicator in AgentDetailPanel:
   - Added STALENESS_THRESHOLD_MS (15s) staleness detection
   - deriveStatus now checks event.elapsed: if the most recent event is older than 15s and not followed by subagent_end, status degrades to "idle" instead of stuck "processing"

Architecture: engagement/[id]/layout.tsx now fetches engagement data + plan-docs, runs the persistent observer, and hosts the terminal. LivePage consumes from context and only renders activity feed + graph.

Two-part fix for objectives stuck showing "Running" indefinitely:

1. Startup recovery: _recover_stale_objectives() scans the OPPLAN on engagement loop startup and resets any IN_PROGRESS objectives back to PENDING. Covers the crash/restart scenario: the loop marks an objective IN_PROGRESS, invokes the agent, then the process dies before writing COMPLETED/BLOCKED. Without recovery, _next_pending_objective() only considers PENDING objectives, so the orphan is never retried.

2. Exception safety: the agent invocation (_invoke_agent) is now wrapped in try/except. If the agent crashes (API error, timeout, unhandled exception), the objective transitions to BLOCKED instead of being left at IN_PROGRESS. An IterationResult is synthesized so the iteration history stays consistent. KeyboardInterrupt is still propagated for clean shutdown.
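
The two parts of this fix can be illustrated with a simplified sketch. Objectives are modelled as plain dicts, and `recover_stale_objectives` and `run_objective` are hypothetical stand-ins for `_recover_stale_objectives()` and the wrapped `_invoke_agent` call; the returned dict plays the role of the synthesized IterationResult. Marking every non-raising invocation COMPLETED is a simplification.

```python
from typing import Callable


def recover_stale_objectives(objectives: list[dict]) -> int:
    """Reset orphaned IN_PROGRESS objectives to PENDING; return how many changed."""
    recovered = 0
    for obj in objectives:
        if obj["status"] == "IN_PROGRESS":
            obj["status"] = "PENDING"
            recovered += 1
    return recovered


def run_objective(obj: dict, invoke_agent: Callable[[dict], str]) -> dict:
    """Invoke the agent without ever leaving the objective at IN_PROGRESS."""
    obj["status"] = "IN_PROGRESS"
    try:
        detail = invoke_agent(obj)
    except Exception as exc:
        # KeyboardInterrupt derives from BaseException, so it is not caught
        # here and still propagates for a clean shutdown.
        obj["status"] = "BLOCKED"
        return {"objective": obj["id"], "ok": False, "detail": str(exc)}
    obj["status"] = "COMPLETED"
    return {"objective": obj["id"], "ok": True, "detail": detail}
```

Catching `Exception` rather than using a bare `except` is what keeps Ctrl-C working while still covering API errors and timeouts.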

… API

The engagement loop fix (bbce42d) recovers stale objectives on startup, but only if the loop actually starts. If the engagement is complete or the loop is never restarted, the stale "in-progress" status persists in opplan.json and the UI shows "Running" indefinitely.

This adds a read-time staleness check in the /api/engagements/[id]/opplan route: if opplan.json hasn't been modified in 10 minutes, any objectives still marked "in-progress" are downgraded to "pending" in the response. The file on disk is NOT mutated: the loop owns writes; the API only sanitizes the display.
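
The read-time check can be sketched as a pure function. The real route lives in the web client and is presumably not Python; this version only mirrors the logic described above. `sanitize_opplan`, the `objectives` key, and passing `mtime`/`now` as arguments are all assumptions made for illustration.

```python
import copy

STALE_AFTER_SECONDS = 10 * 60  # ten minutes without a write to opplan.json


def sanitize_opplan(opplan: dict, mtime: float, now: float) -> dict:
    """Return a display copy with stale in-progress objectives shown as pending."""
    if now - mtime < STALE_AFTER_SECONDS:
        return opplan
    # Work on a copy: the loop owns writes, the API only adjusts the response.
    cleaned = copy.deepcopy(opplan)
    for obj in cleaned.get("objectives", []):
        if obj.get("status") == "in-progress":
            obj["status"] = "pending"
    return cleaned
```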

…ploitation

LLM prompt injection testing (OBJ-008) and similar iterative web exploitation workflows require many graph steps:
- schema discovery → payload crafting → request → response analysis
- each iteration burns ~3-5 steps

At 400 steps the exploit agent exhausts its budget before completing legitimate multi-step exploitation objectives.

Raise exploit agent recursion_limit to 1000 to accommodate:
- prompt injection fuzzing
- multi-step web exploitation
- protocol abuse testing

Fixes #127

… responses)

Multiple int() calls in the research/reporting/middleware stack lacked ValueError/TypeError guards when parsing externally-sourced data:
- kg_ingest_ffuf (tools.py): int(row.get('status') or 0). The 'or 0' pattern is NOT a safety net; a non-empty HTML string is truthy, so int() receives the raw string and crashes.
- rank_candidates (scanner_tools.py): int(hit.get('line', 0))
- _top_chains (executive.py): int(node.props.get('length', 0))
- opplan.py: int(parent.get('priority', 100)) and child equivalent

When the LLM endpoint returns HTML instead of JSON (e.g. WAF block, error page, schema mismatch), agent-generated code or ingestion tools may pass HTML content into fields expected to be numeric. Without error handling, the entire agent loop crashes.

Changes:
- Wrap all int() calls on externally-sourced data in try/except
- Fall back to sensible defaults (0, 100, etc.)
- Log warnings where appropriate

Fixes #129
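
The failure mode and the guard described in this commit can be reproduced in a few lines. `safe_int` is a hypothetical helper name; the commit wraps each call site in try/except, which may or may not be factored into a shared function.

```python
import logging

logger = logging.getLogger(__name__)


def safe_int(value, default: int = 0) -> int:
    """Parse value as an int, falling back to default on bad input."""
    try:
        return int(value)
    except (ValueError, TypeError):
        logger.warning("expected an integer, got %s; using %d",
                       type(value).__name__, default)
        return default


# A WAF block page ends up where a status code was expected.
row = {"status": "<html><body>403 Forbidden</body></html>"}

try:
    int(row.get("status") or 0)   # 'or 0' does not help: the string is truthy
except ValueError:
    crashed = True
else:
    crashed = False

status = safe_int(row.get("status"))                  # falls back to 0
priority = safe_int({}.get("priority"), default=100)  # None falls back to 100
```

Catching `TypeError` as well as `ValueError` matters because `int(None)` and `int([])` raise `TypeError`, not `ValueError`.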

…alls

Local models (Ollama, qwen3-coder, etc.) occasionally produce malformed JSON for tool call arguments:
- 'options' as a JSON string instead of a JSON array
- 'header' longer than max_length=12

Add BeforeValidator coercers that silently normalize these patterns so the engagement flow stays alive instead of dropping into a bare ask_user_question error loop.
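
The two normalizations can be sketched as plain functions. In the PR they are attached through pydantic's `BeforeValidator`; the versions below are dependency-free so they run on their own, and the names `coerce_options` and `coerce_header` are hypothetical.

```python
import json


def coerce_options(value):
    """Accept a JSON-encoded array where a list is expected."""
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return value  # leave it for normal validation to reject
        if isinstance(parsed, list):
            return parsed
    return value


def coerce_header(value, max_length: int = 12):
    """Truncate an over-long header instead of failing validation."""
    if isinstance(value, str) and len(value) > max_length:
        return value[:max_length]
    return value
```

Both functions return unrecognized input unchanged, so anything they cannot repair still reaches the regular validator and fails with its usual error.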

…reValidator behavior

…lines

- skills.py: add log warnings to silent skill-load failures (L157/174/374)
- complete_planning.py: add BeforeValidator coercer for engagement_name (empty→fallback, whitespace strip, >64 chars truncate)
- research/tools.py: wrap unprotected json.loads in try/except

- opplan after_model: block parallel objective_expand/collapse (race condition fix)
- DockerConfig.stall_seconds: 3.0→5.0 (reduce false-stall aborts on slow network scans)
- decepticon.py: add # noqa PLC0415 with lazy-load rationale comment
- complete_planning: add 7 unit tests for _sanitize_engagement_name

5 new middleware modules that eliminate manual work and enforce quality:

AutoContextMiddleware
  Auto-injects engagement state (workspace, scope, progress, findings) into every model call. Agent never manually writes context in task().

RoEGuardMiddleware
  Intercepts task() calls, extracts target domains/IPs, cross-references with roe.json scope patterns. Blocks out-of-scope delegations before they reach sub-agents. Cache scope patterns for 5 min.

FindingGuardMiddleware
  Zero false positive enforcement via 5-method verification:
  1. Evidence check (code block/HTTP trace/tool output)
  2. Reproducibility (steps/PoC)
  3. Impact statement
  4. Anti-speculation (no hedging language)
  5. Severity-impact alignment (critical requires demonstrated exploitation)
  + Content hash dedup against existing findings

BashIntelMiddleware
  Post-processes bash tool output: extracts open ports (nmap), HTTP status codes, tech stack headers, version strings, error indicators. Injects compact intel summary above raw output.

SmartRetryMiddleware
  On BLOCKED objectives, cross-references failure reason against bypass technique knowledge base. Injects alternative approach suggestions (parameter splitting, encoding bypass, JWT confusion...).
  + build_resume_briefing() for one-shot engagement restart context.
  + 10 bypass technique categories with 2-3 hints each.

Integrated into decepticon orchestrator + recon agent.

…re stack" This reverts commit 504670f.

- docker_sandbox.py: align STALL_SECONDS constant (3.0→5.0) with DockerConfig default to fix test_constants_match_config_defaults
- subagent_streaming.py: guard tool-call id key lookup with None check; widen active_tool_calls type to dict[str, Any]
- llm/factory.py: widen kwargs type annotation to dict[str, Any], eliminating ~80 suppressed pyright warnings
- tools/ad/bloodhound.py: fix _build_bh_index to correctly iterate graph.nodes.values() and return dict[str, Node]
- tools/research/tools.py: fix ChainStep field access (node_kind, crown_jewel_label, entrypoint_label), which was crashing on AttributeError at runtime

@tool

Replaces LLM-judged VerificationResult with environment-grounded verification that produces a scalar RLVR reward from raw system signals, with no LLM in the verification path.

## Motivation

The vaccine loop (attack → defense → re-attack) previously trusted the defender agent to self-report re_attack_outcome. This is gameable and produces noisy reward signal. The fix: use the target system environment itself as the verifier.

## Architecture

### ExploitSpec (decepticon/schemas/exploit_spec.py)

Machine-readable replay spec written by offensive agents at finding-discovery time:
- poc_command: exact shell command reproducing the exploit
- success_patterns: regexes proving exploit succeeded (min 1)
- negative_command: ZFP baseline (optional)
- target_checks: discriminated union of PortCheck / ServiceCheck / CredentialCheck / CommandOutputCheck / FileCheck probes

### EnvironmentVerifier (decepticon/core/env_verifier.py)

Independent verifier, no LLM:
1. capture_state() runs all target_checks pre/post defense
2. verify_blocked() replays poc_command with ZFP demotion
3. Outcome determined from signal table:
   - zfp_demoted → ERROR
   - no success signals → BLOCKED (reward 1.0)
   - signals + all checks still positive → PASSED (reward 0.0)
   - signals + some checks flipped → PARTIAL (reward 0.5)
4. compute_reward() → RLVRReward scalar
5. Appends to workspace/rlvr/rewards.jsonl (training stream)

### Workspace layout

    workspace/
      findings/FIND-001-exploit-spec.json       ← offensive agent writes
      verification/FIND-001-pre-snapshot.json
      verification/FIND-001-post-snapshot.json
      verification/FIND-001-evidence.json
      rlvr/rewards.jsonl                        ← append-only RLVR stream

### Backward compatibility

ExploitSpec missing for a finding → falls back to legacy _load_verification_result (LLM-written JSON). Gated by VACCINE_USE_ENV_VERIFIER env var (default on).

### exploit_spec_register tool

LangChain @tool added to exploit, recon, scanner agents. Offensive agents call it after writing FIND-NNN.md to register a self-contained spec for env-grounded verification.

## Tests

8 new async unit tests (pytest-asyncio auto mode):
- pre/post defense reward transitions (PASSED→BLOCKED)
- ZFP demotion → ERROR
- PARTIAL reward from partial check flips
- rewards.jsonl append and round-trip JSON validity
- snapshot/evidence persistence
- spec load/missing round-trips

## Files changed

New:
- decepticon/schemas/exploit_spec.py
- decepticon/schemas/env_verification.py
- decepticon/core/env_verifier.py
- decepticon/tools/research/exploit_spec_writer.py
- tests/unit/core/test_env_verifier.py

Modified:
- decepticon/orchestrator.py: Phase 4 uses _verify_finding()
- decepticon/core/engagement_loop.py: pre-snapshot before defender, _verify_finding_env() post-defender
- decepticon/agents/exploit.py: exploit_spec_register added
- decepticon/agents/recon.py: exploit_spec_register added
- decepticon/agents/scanner.py: exploit_spec_register added
- decepticon/agents/prompts/*.md: vaccine loop instructions added
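
The signal table and the append-only reward stream described in this commit might look roughly like the sketch below. The function names, the dict-of-booleans representation of target checks, and the JSON record fields are assumptions; only the outcome rules and the `rewards.jsonl` file come from the commit message.

```python
import json
from pathlib import Path


def determine_outcome(zfp_demoted: bool, success_signals: list[str],
                      checks_pre: dict[str, bool],
                      checks_post: dict[str, bool]) -> str:
    """Apply the signal table: ZFP first, then signals, then check flips."""
    if zfp_demoted:
        return "ERROR"
    if not success_signals:
        return "BLOCKED"
    # A check "flipped" if it was positive before the defense and is not after.
    flipped = [name for name, ok in checks_pre.items()
               if ok and not checks_post.get(name, False)]
    return "PARTIAL" if flipped else "PASSED"


def append_reward(path: Path, finding_id: str, outcome: str, reward: float) -> None:
    """Append one JSON record per line so the stream stays append-only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"finding_id": finding_id, "outcome": outcome, "reward": reward}
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Writing one self-contained JSON object per line means a training job can read the stream incrementally and a partially written last line never corrupts earlier records.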

# Conflicts:
#   clients/web/src/app/(dashboard)/page.tsx
#   containers/web.Dockerfile
#   decepticon/backends/docker_sandbox.py
#   decepticon/core/engagement_loop.py
#   decepticon/middleware/engagement.py
#   decepticon/middleware/opplan.py
#   decepticon/orchestrator.py
#   decepticon/tools/interaction/ask_user.py
#   decepticon/tools/research/scanner_tools.py
#   tests/unit/tools/test_ask_user_question.py

…t patterns, dedup

Upgrades EnvironmentVerifier from single-run binary verification to a multi-signal triager-grade pipeline. Eliminates the four primary sources of false positives that would cause a security triager to reject findings.

## New verification pipeline (verify_blocked)

1. **Duplicate fingerprint gate**: SHA-256(poc_command + success_patterns + target_host) keyed in rlvr/dedup.jsonl. Duplicate specs yield ERROR reward immediately, preventing the same finding from inflating reward signal.

2. **Baseline validity gate**: verify_baseline() runs the PoC BEFORE defenses are applied and confirms the exploit actually works. If baseline.valid=False (exploit never triggered), the finding is INVALID_SPEC → ERROR. Defenses can no longer earn reward for "blocking" exploits that were never exploitable.

3. **N-run consensus**: ExploitSpec gains `runs` (1–10) and `min_success_rate` (0.0–1.0). Each run is independent; agreed_signals = intersection across all successful runs. Flaky exploits that only work 1/3 times yield PARTIAL, not the full 1.0 reward. ZFP check runs once after consensus, not per-run.

4. **Impact pattern confirmation**: ExploitSpec gains `impact_patterns` (separate from trigger patterns). Patterns like `uid=0`, `root@`, data-exfil regexes confirm ACTUAL IMPACT, not just "exploit ran". Unconfirmed impact lowers confidence multiplier to 0.7 when impact_patterns are declared.

5. **CVSS 3.1 base score estimation**: _estimate_cvss() derives AV/AC/PR/UI/Scope/CIA from TargetCheck types and impact_patterns heuristics. Emitted on RLVRReward.cvss_score. Triagers can threshold on CVSS ≥ 7.0.

6. **Inconclusive detection**: _check_inconclusive() flags contradictions where a PortCheck says closed but a ServiceCheck for the same host says reachable. Yields ERROR reward instead of forcing a wrong BLOCKED/PASSED call.

## New schemas

- `PoCRunResult`: single run result with per-run signal matching
- `PoCConsensus`: N-run aggregate: n_runs, n_success, success_rate, agreed_signals
- `BaselineEvidence`: pre-defense PoC result confirming spec validity
- `CVSSEstimate`: CVSS 3.1 components + computed base_score + vector_string

## ExploitSpec additions

- `runs: int = 1`: consensus run count (recommend 3 for reliable findings)
- `min_success_rate: float = 1.0`: success fraction threshold (0.67 for 2/3)
- `impact_patterns: list[str] = []`: actual impact evidence regexes
- `target_host: str | None`: dedup fingerprint anchor

## Backward compatibility

- Legacy `PoCEvidence` field on VerificationEvidence still populated (best-run)
- Legacy `_determine_outcome` preserved for existing call sites
- verify_blocked signature unchanged (baseline param is optional)
- All 8 existing tests pass unchanged; 5 new tests added (13 total)

## Test coverage added

- N-run consensus 2/3 → PARTIAL (below min_success_rate)
- N-run consensus 2/3 → PASSED (meets 0.5 threshold, signals still present)
- Baseline invalid → ERROR propagation through verify_blocked
- Impact patterns → evidence.impact_signals_matched + reward.impact_confirmed
- Duplicate detection → second registration yields duplicate_of + ERROR
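
The duplicate fingerprint gate can be sketched as follows. The commit message only states that SHA-256 is taken over poc_command, success_patterns and target_host; the canonical JSON encoding, the sorting of patterns, the record fields, and the function names here are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def fingerprint(poc_command: str, success_patterns: list[str],
                target_host: Optional[str]) -> str:
    """Hash the identifying fields of a spec into a stable hex digest."""
    # Encoding as a JSON array avoids ambiguity from naive concatenation,
    # e.g. ("ab", "c") versus ("a", "bc").
    payload = json.dumps([poc_command, sorted(success_patterns), target_host or ""],
                         separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def register_fingerprint(dedup_path: Path, finding_id: str, fp: str) -> Optional[str]:
    """Return the earlier finding id if fp was seen before, else record it."""
    if dedup_path.exists():
        for line in dedup_path.read_text(encoding="utf-8").splitlines():
            record = json.loads(line)
            if record["fingerprint"] == fp:
                return record["finding_id"]
    dedup_path.parent.mkdir(parents=True, exist_ok=True)
    with dedup_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"fingerprint": fp, "finding_id": finding_id}) + "\n")
    return None
```

A non-`None` return value plays the role of `duplicate_of`: the caller would short-circuit to an ERROR reward instead of running the PoC again.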

VoidChecksum added 20 commits May 3, 2026 21:15

fix(middleware): add LangGraph reducer for engagement_name

e805703

fix(test): update test_truncates_header_longer_than_max to match Befo…

3d7aed7

…reValidator behavior

fix(ask_user): handle Python-syntax options strings and unescaped new…

4aa0fbe

…lines

Revert "feat(intelligence): comprehensive agent intelligence middlewa…

142a7ef

…re stack" This reverts commit 504670f.

fix(ci): ruff format + basedpyright — Python RLVR files

6fa60f1

fix(ci): basedpyright — assert self._state narrowing in engagement_loop

4d85ede

VoidChecksum requested a review from PurpleCHOIms as a code owner May 6, 2026 07:45

VoidChecksum added the enhancement New feature or request label May 6, 2026

VoidChecksum mentioned this pull request May 6, 2026

feat(vaccine): RLVR environment-grounded verification layer #167

Closed

PurpleCHOIms deleted the branch feat/web-dashboard May 9, 2026 07:56

PurpleCHOIms closed this May 9, 2026

VoidChecksum wants to merge 20 commits into feat/web-dashboard from feat/rlvr-only
