feat(rlvr): environment-grounded vaccine verification layer#169
Closed
VoidChecksum wants to merge 20 commits into
…ts, container hotswap
Web UI:
- Add engagement timeline page with live event stream
- Add command palette (cmd+k) for keyboard navigation
- Add streaming agent detail panel with real-time tool call inspection
- Add live activity feed component for engagement monitoring
- Add Opplan live overlay for plan visualization
- Add health API endpoint for container readiness checks
- Add engagement export API endpoint
- Add engagement threads API endpoint
- Add engagement timeline API endpoint
- Refactor web terminal component with improved PTY handling
- Refactor dashboard pages (engagements, graph, settings, main)
- Add keyboard shortcut in sidebar
Containers:
- Refactor web.Dockerfile for streamlined build layers
- Refactor web-entrypoint.sh with healthcheck awareness
- Add web-hotswap.sh for zero-downtime container swap
Backend:
- Refactor Docker sandbox backend for resource lifecycle
Skills:
- Add stealth-infra shared skill
Build:
- Update Makefile targets
- Update benchmark validation config
…b switch
Two bugs fixed:
1. Terminal reset / task stop on tab switch:
- Created EngagementContext + EngagementProvider — persists observer state
across Next.js route navigation within an engagement
- Lifted useRunObserver from LivePage into engagement layout so
events, isRunning, and activeRunId survive tab switches
- WebTerminal now rendered at layout level with CSS width control:
35% width on /live, 0 width (but still mounted) on other tabs
- PTY connection stays alive; observer continues collecting events
even while user is on Findings/Graph/Timeline tabs
2. Stuck "Processing" indicator in AgentDetailPanel:
- Added STALENESS_THRESHOLD_MS (15s) staleness detection
- deriveStatus now checks event.elapsed — if the most recent event
is older than 15s and not followed by subagent_end, status
degrades to "idle" instead of stuck "processing"
Architecture: engagement/[id]/layout.tsx now fetches engagement data
+ plan-docs, runs the persistent observer, and hosts the terminal.
LivePage consumes from context — only renders activity feed + graph.
Two-part fix for objectives stuck showing "Running" indefinitely:
1. Startup recovery: _recover_stale_objectives() scans the OPPLAN on engagement loop startup and resets any IN_PROGRESS objectives back to PENDING. This covers the crash/restart scenario: the loop marks an objective IN_PROGRESS, invokes the agent, then the process dies before writing COMPLETED/BLOCKED. Without recovery, _next_pending_objective() only considers PENDING objectives, so the orphan is never retried.
2. Exception safety: the agent invocation (_invoke_agent) is now wrapped in try/except. If the agent crashes (API error, timeout, unhandled exception), the objective transitions to BLOCKED instead of being left at IN_PROGRESS. An IterationResult is synthesized so the iteration history stays consistent. KeyboardInterrupt is still propagated for clean shutdown.
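The two parts of this fix can be sketched roughly as follows. This is a minimal illustration and not the code in engagement_loop.py: the `Status` enum, the `Objective` dataclass, and the `run_objective` helper are hypothetical stand-ins, and the real loop decides COMPLETED versus BLOCKED from the agent's result rather than from the absence of an exception.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in-progress"
    COMPLETED = "completed"
    BLOCKED = "blocked"


@dataclass
class Objective:  # hypothetical stand-in for an OPPLAN objective
    id: str
    status: Status = Status.PENDING


def recover_stale_objectives(objectives):
    """Reset orphaned IN_PROGRESS objectives to PENDING; return their ids."""
    recovered = []
    for obj in objectives:
        if obj.status is Status.IN_PROGRESS:
            obj.status = Status.PENDING
            recovered.append(obj.id)
    return recovered


def run_objective(obj, invoke_agent):
    """Run one objective; an agent crash ends in BLOCKED, never IN_PROGRESS."""
    obj.status = Status.IN_PROGRESS
    try:
        invoke_agent(obj)
    except Exception as exc:
        # KeyboardInterrupt derives from BaseException, so it is not caught
        # here and still propagates for a clean shutdown.
        obj.status = Status.BLOCKED
        return f"agent crashed: {exc!r}"
    obj.status = Status.COMPLETED  # simplification, see the note above
    return "ok"
```

An interrupted objective is deliberately left at IN_PROGRESS in this sketch, which is exactly the state that the startup recovery pass picks up on the next run.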
… API
The engagement loop fix (bbce42d) recovers stale objectives on startup, but only if the loop actually starts. If the engagement is complete or the loop is never restarted, the stale "in-progress" status persists in opplan.json and the UI shows "Running" indefinitely.
This adds a read-time staleness check in the /api/engagements/[id]/opplan route: if opplan.json hasn't been modified in 10 minutes, any objectives still marked "in-progress" are downgraded to "pending" in the response. The file on disk is NOT mutated: the loop owns writes; the API only sanitizes the display.
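The read-time check amounts to comparing the file's modification time against a threshold and rewriting a copy of the data. The sketch below shows that logic in Python for consistency with the rest of the backend; the actual route lives in the web client, and the `objectives` / `status` layout of opplan.json as well as the function name are assumptions made for illustration.

```python
import copy
import os
import time

STALE_AFTER_SECONDS = 10 * 60  # "hasn't been modified in 10 minutes"


def sanitize_opplan_for_display(opplan, path, now=None):
    """Return the opplan to show in the UI without touching the file on disk.

    `opplan` is the parsed content of `path`. The key names used here
    ("objectives", "status") are assumed, not taken from the source.
    """
    now = time.time() if now is None else now
    if now - os.path.getmtime(path) < STALE_AFTER_SECONDS:
        return opplan  # recently written, trust it as-is

    cleaned = copy.deepcopy(opplan)  # never mutate the caller's data
    for objective in cleaned.get("objectives", []):
        if objective.get("status") == "in-progress":
            objective["status"] = "pending"
    return cleaned
```

Working on a deep copy keeps the division of responsibility described above: the loop remains the only writer, and the API response is a sanitized view.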
…ploitation
LLM prompt injection testing (OBJ-008) and similar iterative web exploitation workflows require many graph steps:
- schema discovery → payload crafting → request → response analysis
- each iteration burns ~3-5 steps
At 400 steps the exploit agent exhausts its budget before completing legitimate multi-step exploitation objectives. Raise exploit agent recursion_limit to 1000 to accommodate:
- prompt injection fuzzing
- multi-step web exploitation
- protocol abuse testing
Fixes #127
… responses)
Multiple int() calls in the research/reporting/middleware stack lacked
ValueError/TypeError guards when parsing externally-sourced data:
- kg_ingest_ffuf (tools.py): int(row.get('status') or 0) — the 'or 0'
pattern is NOT a safety net; a non-empty HTML string is truthy, so
int() receives the raw string and crashes.
- rank_candidates (scanner_tools.py): int(hit.get('line', 0))
- _top_chains (executive.py): int(node.props.get('length', 0))
- opplan.py: int(parent.get('priority', 100)) and child equivalent
When the LLM endpoint returns HTML instead of JSON (e.g. WAF block,
error page, schema mismatch), agent-generated code or ingestion tools
may pass HTML content into fields expected to be numeric. Without
error handling, the entire agent loop crashes.
Changes:
- Wrap all int() calls on externally-sourced data in try/except
- Fall back to sensible defaults (0, 100, etc.)
- Log warnings where appropriate
Fixes #129
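To make the `or 0` pitfall concrete, here is a small sketch of a guarded conversion. The helper name `safe_int` is hypothetical; the commit only says that the int() calls are wrapped in try/except with a default and a warning.

```python
import logging

log = logging.getLogger(__name__)


def safe_int(value, default=0):
    """Convert externally-sourced data to int, falling back to `default`."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # ValueError: non-numeric string such as an HTML error page.
        # TypeError: None or another unsupported type.
        log.warning("expected an integer, got %r; using %r", value, default)
        return default


row = {"status": "<html>403 Forbidden</html>"}

# `row.get("status") or 0` still evaluates to the HTML string, because a
# non-empty string is truthy, so a bare int() on it would raise ValueError.
status = safe_int(row.get("status"))               # falls back to 0
priority = safe_int(row.get("priority"), default=100)  # missing key, falls back to 100
```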
…alls
Local models (Ollama, qwen3-coder, etc.) occasionally produce malformed JSON for tool call arguments:
- 'options' as a JSON string instead of a JSON array
- 'header' longer than max_length=12
Add BeforeValidator coercers that silently normalize these patterns so the engagement flow stays alive instead of dropping into a bare ask_user_question error loop.
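The normalization itself can be pictured as two small pure functions. This is a sketch under assumptions: the function names are hypothetical, and truncating an over-long header (rather than, say, rejecting it) is a guess at what "normalize" means here. In a Pydantic v2 model, functions like these would typically be attached to a field with `Annotated[..., BeforeValidator(fn)]`; that wiring is left out so the example runs with the standard library alone.

```python
import json

HEADER_MAX_LENGTH = 12  # matches the max_length=12 mentioned above


def coerce_options(value):
    """Accept 'options' given as a JSON-encoded string instead of a list."""
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return value  # leave it alone; normal validation reports the error
        if isinstance(parsed, list):
            return parsed
    return value


def coerce_header(value):
    """Shorten an over-long 'header' (assumed behavior: plain truncation)."""
    if isinstance(value, str) and len(value) > HEADER_MAX_LENGTH:
        return value[:HEADER_MAX_LENGTH]
    return value
```

Both functions return unexpected input unchanged, so anything they cannot repair still reaches the regular validator and fails with the usual error.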
…reValidator behavior
- skills.py: add log warnings to silent skill-load failures (L157/174/374)
- complete_planning.py: add BeforeValidator coercer for engagement_name (empty→fallback, whitespace strip, >64 chars truncate)
- research/tools.py: wrap unprotected json.loads in try/except
- opplan after_model: block parallel objective_expand/collapse (race condition fix)
- DockerConfig.stall_seconds: 3.0→5.0 (reduce false-stall aborts on slow network scans)
- decepticon.py: add # noqa PLC0415 with lazy-load rationale comment
- complete_planning: add 7 unit tests for _sanitize_engagement_name
5 new middleware modules that eliminate manual work and enforce quality:
AutoContextMiddleware
  Auto-injects engagement state (workspace, scope, progress, findings) into every model call. Agent never manually writes context in task().
RoEGuardMiddleware
  Intercepts task() calls, extracts target domains/IPs, cross-references with roe.json scope patterns. Blocks out-of-scope delegations before they reach sub-agents. Cache scope patterns for 5 min.
FindingGuardMiddleware
  Zero false positive enforcement via 5-method verification:
  1. Evidence check (code block/HTTP trace/tool output)
  2. Reproducibility (steps/PoC)
  3. Impact statement
  4. Anti-speculation (no hedging language)
  5. Severity-impact alignment (critical requires demonstrated exploitation)
  + Content hash dedup against existing findings
BashIntelMiddleware
  Post-processes bash tool output: extracts open ports (nmap), HTTP status codes, tech stack headers, version strings, error indicators. Injects compact intel summary above raw output.
SmartRetryMiddleware
  On BLOCKED objectives, cross-references failure reason against bypass technique knowledge base. Injects alternative approach suggestions (parameter splitting, encoding bypass, JWT confusion...).
  + build_resume_briefing() for one-shot engagement restart context.
  + 10 bypass technique categories with 2-3 hints each.
Integrated into decepticon orchestrator + recon agent.
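As an illustration of the kind of check RoEGuardMiddleware performs, the sketch below extracts candidate targets from a task description and tests them against scope patterns. Everything about the pattern format is an assumption: it treats roe.json scope entries as glob-style hostnames or CIDR ranges, and the function names are hypothetical. The extraction regex is intentionally naive and would also pick up file names such as `roe.json`.

```python
import fnmatch
import ipaddress
import re

# Naive matcher for IPv4 addresses and dotted hostnames (illustration only).
TARGET_RE = re.compile(
    r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|(?:[a-z0-9-]+\.)+[a-z]{2,})\b",
    re.IGNORECASE,
)


def extract_targets(text):
    """Return the unique hosts/IPs mentioned in a task description."""
    return sorted({m.group(0).lower() for m in TARGET_RE.finditer(text)})


def in_scope(target, patterns):
    """True if `target` matches any glob hostname or CIDR pattern."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        addr = None  # not an IP literal, treat as a hostname
    for pattern in patterns:
        if addr is not None:
            try:
                if addr in ipaddress.ip_network(pattern, strict=False):
                    return True
            except ValueError:
                continue  # pattern is a hostname glob, not a network
        elif fnmatch.fnmatch(target, pattern.lower()):
            return True
    return False


def out_of_scope_targets(task_text, patterns):
    """Targets that would cause the delegation to be blocked."""
    return [t for t in extract_targets(task_text) if not in_scope(t, patterns)]


scope = ["*.example.com", "10.0.0.0/24"]
blocked = out_of_scope_targets(
    "Scan 10.0.0.5 and app.example.com, then pivot to evil.test", scope
)
```

With this scope, only `evil.test` ends up in `blocked`; a guard built this way would refuse the delegation whenever that list is non-empty.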
…re stack" This reverts commit 504670f.
- docker_sandbox.py: align STALL_SECONDS constant (3.0→5.0) with DockerConfig default to fix test_constants_match_config_defaults
- subagent_streaming.py: guard tool-call id key lookup with None check; widen active_tool_calls type to dict[str, Any]
- llm/factory.py: widen kwargs type annotation to dict[str, Any], eliminating ~80 suppressed pyright warnings
- tools/ad/bloodhound.py: fix _build_bh_index to correctly iterate graph.nodes.values() and return dict[str, Node]
- tools/research/tools.py: fix ChainStep field access (node_kind, crown_jewel_label, entrypoint_label), which was crashing on AttributeError at runtime
Replaces LLM-judged VerificationResult with environment-grounded
verification that produces a scalar RLVR reward from raw system
signals — no LLM in the verification path.
## Motivation
The vaccine loop (attack → defense → re-attack) previously trusted
the defender agent to self-report re_attack_outcome. This is
gameable and produces noisy reward signal. The fix: use the target
system environment itself as the verifier.
## Architecture
### ExploitSpec (decepticon/schemas/exploit_spec.py)
Machine-readable replay spec written by offensive agents at
finding-discovery time:
- poc_command: exact shell command reproducing the exploit
- success_patterns: regexes proving exploit succeeded (min 1)
- negative_command: ZFP baseline (optional)
- target_checks: discriminated union of PortCheck / ServiceCheck /
CredentialCheck / CommandOutputCheck / FileCheck probes
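For orientation, a spec with this shape could look roughly like the sketch below. It uses standard-library dataclasses so it runs without dependencies; the real schema in decepticon/schemas/exploit_spec.py may well be a Pydantic model. Only two of the five check types are shown, and the field names `finding_id`, `kind`, `host`, `port`, `command`, and `expect_pattern` are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Literal, Optional, Union


@dataclass
class PortCheck:  # hypothetical fields
    host: str
    port: int
    kind: Literal["port"] = "port"  # discriminator for the union


@dataclass
class CommandOutputCheck:  # hypothetical fields
    command: str
    expect_pattern: str
    kind: Literal["command_output"] = "command_output"


TargetCheck = Union[PortCheck, CommandOutputCheck]


@dataclass
class ExploitSpec:
    finding_id: str
    poc_command: str
    success_patterns: list[str]
    negative_command: Optional[str] = None
    target_checks: list[TargetCheck] = field(default_factory=list)

    def __post_init__(self):
        if not self.success_patterns:
            raise ValueError("success_patterns needs at least one regex")
        for pattern in self.success_patterns:
            re.compile(pattern)  # raises re.error on an invalid regex

    def matched_signals(self, output):
        """Success patterns that appear in the PoC output."""
        return [p for p in self.success_patterns if re.search(p, output)]


spec = ExploitSpec(
    finding_id="FIND-001",
    poc_command="id",
    success_patterns=[r"uid=0\(root\)"],
    target_checks=[PortCheck(host="10.0.0.5", port=22)],
)
```

Validating the regexes when the spec is created means a malformed pattern is caught when the offensive agent registers the finding, not later during replay.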
### EnvironmentVerifier (decepticon/core/env_verifier.py)
Independent verifier — no LLM:
1. capture_state() runs all target_checks pre/post defense
2. verify_blocked() replays poc_command with ZFP demotion
3. Outcome determined from signal table:
- zfp_demoted → ERROR
- no success signals → BLOCKED (reward 1.0)
- signals + all checks still positive → PASSED (reward 0.0)
- signals + some checks flipped → PARTIAL (reward 0.5)
4. compute_reward() → RLVRReward scalar
5. Appends to workspace/rlvr/rewards.jsonl (training stream)
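The signal table in step 3 and the reward mapping can be sketched as below. The function names and the record layout written to the JSONL file are assumptions. The commit text gives reward values for BLOCKED, PARTIAL, and PASSED only, so this sketch stores no numeric reward (`None`) for ERROR instead of guessing one.

```python
import json
from enum import Enum


class Outcome(str, Enum):
    BLOCKED = "BLOCKED"
    PARTIAL = "PARTIAL"
    PASSED = "PASSED"
    ERROR = "ERROR"


# Reward values as listed in the signal table; ERROR is intentionally absent.
REWARDS = {Outcome.BLOCKED: 1.0, Outcome.PARTIAL: 0.5, Outcome.PASSED: 0.0}


def determine_outcome(zfp_demoted, signals_matched, checks_before, checks_after):
    """Map raw signals to an outcome.

    checks_before / checks_after: dict of check name -> bool (positive or not),
    captured before and after the defense was applied.
    """
    if zfp_demoted:
        return Outcome.ERROR
    if not signals_matched:
        return Outcome.BLOCKED
    flipped = [
        name
        for name, was_positive in checks_before.items()
        if was_positive and not checks_after.get(name, False)
    ]
    return Outcome.PARTIAL if flipped else Outcome.PASSED


def append_reward(path, finding_id, outcome):
    """Append one record to the JSONL training stream and return it."""
    record = {
        "finding_id": finding_id,
        "outcome": outcome.value,
        "reward": REWARDS.get(outcome),  # None for ERROR in this sketch
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return record
```

Opening the file in append mode and writing one JSON object per line keeps the stream readable line by line, even if a later write is interrupted.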
### Workspace layout
workspace/
findings/FIND-001-exploit-spec.json ← offensive agent writes
verification/FIND-001-pre-snapshot.json
verification/FIND-001-post-snapshot.json
verification/FIND-001-evidence.json
rlvr/rewards.jsonl ← append-only RLVR stream
### Backward compatibility
ExploitSpec missing for a finding → falls back to legacy
_load_verification_result (LLM-written JSON). Gated by
VACCINE_USE_ENV_VERIFIER env var (default on).
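A minimal sketch of how such a gate might be evaluated, assuming the spec path from the workspace layout above. The helper names and the set of values treated as "off" are assumptions; the source only states that the variable exists and defaults to on.

```python
import os
from pathlib import Path

# Assumed spellings for "disabled"; the source does not list them.
_OFF_VALUES = {"0", "false", "no", "off"}


def use_env_verifier(environ=None):
    """True unless VACCINE_USE_ENV_VERIFIER is explicitly switched off."""
    environ = os.environ if environ is None else environ
    raw = environ.get("VACCINE_USE_ENV_VERIFIER", "1")  # default on
    return raw.strip().lower() not in _OFF_VALUES


def choose_verification_path(workspace, finding_id, environ=None):
    """Return "env" when a spec exists and the gate is on, else "legacy"."""
    spec_path = Path(workspace) / "findings" / f"{finding_id}-exploit-spec.json"
    if use_env_verifier(environ) and spec_path.is_file():
        return "env"
    return "legacy"
```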
### exploit_spec_register tool
LangChain @tool added to exploit, recon, scanner agents. Offensive
agents call it after writing FIND-NNN.md to register a self-contained
spec for env-grounded verification.
## Tests
8 new async unit tests (pytest-asyncio auto mode):
- pre/post defense reward transitions (PASSED→BLOCKED)
- ZFP demotion → ERROR
- PARTIAL reward from partial check flips
- rewards.jsonl append and round-trip JSON validity
- snapshot/evidence persistence
- spec load/missing round-trips
## Files changed
New:
decepticon/schemas/exploit_spec.py
decepticon/schemas/env_verification.py
decepticon/core/env_verifier.py
decepticon/tools/research/exploit_spec_writer.py
tests/unit/core/test_env_verifier.py
Modified:
decepticon/orchestrator.py — Phase 4 uses _verify_finding()
decepticon/core/engagement_loop.py — pre-snapshot before defender,
_verify_finding_env() post-defender
decepticon/agents/exploit.py — exploit_spec_register added
decepticon/agents/recon.py — exploit_spec_register added
decepticon/agents/scanner.py — exploit_spec_register added
decepticon/agents/prompts/*.md — vaccine loop instructions added
# Conflicts:
# clients/web/src/app/(dashboard)/page.tsx
# containers/web.Dockerfile
# decepticon/backends/docker_sandbox.py
# decepticon/core/engagement_loop.py
# decepticon/middleware/engagement.py
# decepticon/middleware/opplan.py
# decepticon/orchestrator.py
# decepticon/tools/interaction/ask_user.py
# decepticon/tools/research/scanner_tools.py
# tests/unit/tools/test_ask_user_question.py
…t patterns, dedup
Upgrades EnvironmentVerifier from single-run binary verification to a multi-signal triager-grade pipeline. Eliminates the four primary sources of false positives that would cause a security triager to reject findings.
## New verification pipeline (verify_blocked)
1. **Duplicate fingerprint gate**: SHA-256(poc_command + success_patterns + target_host) keyed in rlvr/dedup.jsonl. Duplicate specs yield ERROR reward immediately, preventing the same finding from inflating reward signal.
2. **Baseline validity gate**: verify_baseline() runs the PoC BEFORE defenses are applied and confirms the exploit actually works. If baseline.valid=False (exploit never triggered), the finding is INVALID_SPEC → ERROR. Defenses can no longer earn reward for "blocking" exploits that were never exploitable.
3. **N-run consensus**: ExploitSpec gains `runs` (1–10) and `min_success_rate` (0.0–1.0). Each run is independent; agreed_signals = intersection across all successful runs. Flaky exploits that only work 1/3 times yield PARTIAL, not the full 1.0 reward. ZFP check runs once after consensus, not per-run.
4. **Impact pattern confirmation**: ExploitSpec gains `impact_patterns` (separate from trigger patterns). Patterns like `uid=0`, `root@`, data-exfil regexes confirm ACTUAL IMPACT, not just "exploit ran". Unconfirmed impact lowers confidence multiplier to 0.7 when impact_patterns are declared.
5. **CVSS 3.1 base score estimation**: _estimate_cvss() derives AV/AC/PR/UI/Scope/CIA from TargetCheck types and impact_patterns heuristics. Emitted on RLVRReward.cvss_score. Triagers can threshold on CVSS ≥ 7.0.
6. **Inconclusive detection**: _check_inconclusive() flags contradictions where a PortCheck says closed but a ServiceCheck for the same host says reachable. Yields ERROR reward instead of forcing a wrong BLOCKED/PASSED call.
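Two of the gates above, the duplicate fingerprint and the N-run consensus, lend themselves to a short sketch. The function names are hypothetical, and two details are assumptions: the exact serialization fed to SHA-256 (a JSON list with sorted patterns here) and the definition of a "successful" run as one where at least one success pattern matches.

```python
import hashlib
import json
import re


def spec_fingerprint(poc_command, success_patterns, target_host):
    """SHA-256 over the spec's identifying fields (serialization is assumed)."""
    payload = json.dumps([poc_command, sorted(success_patterns), target_host or ""])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def consensus(run_outputs, success_patterns, min_success_rate):
    """Aggregate independent PoC runs into a single consensus record."""
    per_run = [
        {p for p in success_patterns if re.search(p, output)}
        for output in run_outputs
    ]
    successful = [signals for signals in per_run if signals]
    n_runs = len(run_outputs)
    success_rate = len(successful) / n_runs if n_runs else 0.0
    # agreed_signals: patterns that matched in every successful run
    agreed = set.intersection(*successful) if successful else set()
    return {
        "n_runs": n_runs,
        "n_success": len(successful),
        "success_rate": success_rate,
        "agreed_signals": sorted(agreed),
        "meets_threshold": success_rate >= min_success_rate,
    }


outputs = ["uid=0(root)", "connection refused", "uid=0(root) flag{x}"]
result = consensus(outputs, [r"uid=0", r"flag\{"], min_success_rate=0.5)
```

Here two of three runs succeed, only `uid=0` is common to both, and the 0.5 threshold is met; with `min_success_rate=1.0` the same runs would fall below the threshold.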
## New schemas
- `PoCRunResult`: single run result with per-run signal matching
- `PoCConsensus`: N-run aggregate: n_runs, n_success, success_rate, agreed_signals
- `BaselineEvidence`: pre-defense PoC result confirming spec validity
- `CVSSEstimate`: CVSS 3.1 components + computed base_score + vector_string
## ExploitSpec additions
- `runs: int = 1`: consensus run count (recommend 3 for reliable findings)
- `min_success_rate: float = 1.0`: success fraction threshold (0.67 for 2/3)
- `impact_patterns: list[str] = []`: actual impact evidence regexes
- `target_host: str | None`: dedup fingerprint anchor
## Backward compatibility
- Legacy `PoCEvidence` field on VerificationEvidence still populated (best-run)
- Legacy `_determine_outcome` preserved for existing call sites
- verify_blocked signature unchanged (baseline param is optional)
- All 8 existing tests pass unchanged; 5 new tests added (13 total)
## Test coverage added
- N-run consensus 2/3 → PARTIAL (below min_success_rate)
- N-run consensus 2/3 → PASSED (meets 0.5 threshold, signals still present)
- Baseline invalid → ERROR propagation through verify_blocked
- Impact patterns → evidence.impact_signals_matched + reward.impact_confirmed
- Duplicate detection → second registration yields duplicate_of + ERROR
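For readers unfamiliar with what `base_score` and `vector_string` contain, the sketch below computes a CVSS 3.1 base score from already-chosen metric values using the published formula and metric weights. It does not reproduce `_estimate_cvss()`: how the PR derives the metric values from TargetCheck types and impact patterns is not shown in the source, and the function names here are hypothetical.

```python
import math

# Metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """CVSS 3.1 Roundup: smallest one-decimal number >= value."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, scope, c, i, a):
    changed = scope == "C"
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    pr_weight = (PR_CHANGED if changed else PR_UNCHANGED)[pr]
    exploitability = 8.22 * AV[av] * AC[ac] * pr_weight * UI[ui]
    total = impact + exploitability
    return roundup(min(1.08 * total if changed else total, 10))


def vector_string(av, ac, pr, ui, scope, c, i, a):
    return f"CVSS:3.1/AV:{av}/AC:{ac}/PR:{pr}/UI:{ui}/S:{scope}/C:{c}/I:{i}/A:{a}"


metrics = ("N", "L", "N", "N", "U", "H", "H", "H")
score = base_score(*metrics)      # 9.8 for this vector
vector = vector_string(*metrics)  # CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
```

A network-reachable, unauthenticated exploit with full confidentiality, integrity, and availability impact scores 9.8, comfortably above the CVSS ≥ 7.0 triage threshold mentioned in the pipeline description.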
Summary
Replaces the LLM-judged vaccine loop outcome with an environment-grounded verification pipeline. The defender agent can no longer self-report a successful defense — the sandbox itself decides.
- … `INVALID_SPEC` → reward 0.0. Defenses can no longer earn reward for blocking something that was never exploitable.
- `ExploitSpec.runs` (1–10) + `min_success_rate` (0.0–1.0). Exploit must succeed on ≥ N% of runs. Flaky one-in-three triggers yield `PARTIAL`, not full reward 1.0.
- `impact_patterns` field (e.g. `uid=0`, `root@`, data-exfil regexes). Confirms actual exploitation impact, not just that the payload fired. Unconfirmed impact applies a 0.7 confidence multiplier.
- … `TargetCheck` types and impact patterns using full CVSS 3.1 formula. Emitted on `RLVRReward.cvss_score`.
- … `rlvr/dedup.jsonl`. Duplicate specs yield `ERROR` reward, preventing the same finding from inflating signal.
- … `ERROR` instead of forced BLOCKED/PASSED.
- … `ERROR`. Already present, now applied after N-run consensus.
New schemas
- `PoCRunResult`
- `PoCConsensus`
- `BaselineEvidence`
- `CVSSEstimate`
ExploitSpec new fields
Reward tiers
- `BLOCKED`
- `PARTIAL`
- `PASSED`
- `ERROR`
Files changed (Python only)
- `decepticon/schemas/exploit_spec.py`: new fields
- `decepticon/schemas/env_verification.py`: new schema types
- `decepticon/core/env_verifier.py`: full verification pipeline
- `decepticon/tools/research/exploit_spec_writer.py`: `@tool exploit_spec_register`
- `decepticon/orchestrator.py`: Phase 4 env-grounded path + legacy fallback
- `decepticon/core/engagement_loop.py`: PRE_DEFENSE snapshot + env-grounded path
- `decepticon/agents/{exploit,recon,scanner}.py`: tool registration
- `decepticon/agents/prompts/{exploit,recon,scanner}.md`: agent guidance
- `tests/unit/core/test_env_verifier.py`: 13 tests (8 original + 5 new)
Test plan
- `uv run pytest tests/unit/core/test_env_verifier.py -v`: 13/13 pass
- `uv run ruff check decepticon/`: clean
- `uv run basedpyright decepticon/`: 0 errors
- `VACCINE_USE_ENV_VERIFIER=1`, confirm `workspace/rlvr/rewards.jsonl` populated after vaccine phase