Integrate all 80 open PRs (#352-#441): conflict-free, fully tested by VoidChecksum · Pull Request #442 · PurpleAILAB/Decepticon

VoidChecksum · 2026-05-30T20:05:30Z

Integrates all 80 open PRs (#352–#441) into a single, conflict-free, fully-tested tree.

Each PR was individually green, but the repo ruleset uses strict_required_status_checks_policy: false, so per-PR checks cannot detect cross-PR breakage. This branch merges every PR locally, resolves conflicts, and reconciles the cross-PR interactions so the combined tree is green.

Textual conflicts resolved

- #378 (fix(auth): stop leaking OAuth tokens in the Codex refresh error message) vs #353 (feat: wire P0 middleware/runtime modules + security & correctness hardening): duplicate token-redaction in codex_chatgpt_handler.py (kept #353's).
- #432 (fix(web): validate engagement name on PATCH and contain filesystem routes to WORKSPACE) vs #353: duplicate engagement-slug validation in the web PATCH route (kept SLUG_RE, consistent with the POST route).

Cross-PR behavior/isolation reconciliations (stale test expectations updated to match intended new source; no tests removed/gamed)

- #417 (fix(tools/defense): escape SPL values; emit valid KQL for Defender XDR push) hardened SPL quoting and Defender KQL emission, so the stale tests from #364 (test(tools/defense): add comprehensive push-function and edge-case tests) and #416 (test(tools/defense): cover edr.py 19.8% -> 98%) were updated (15 tests): quoted/escaped SPL, sha256 YARA indicator, KQL queryText.
- #430 (fix(runtime): capture tool-call args + reasoning_content so replay is faithful): record/replay now captures tool_call.args, so the _tool_request helper from #385 (test(runtime/recording): cover serializers + record/replay round-trip, 45%→96%) was updated.
- #400 (fix(backends): key HTTPSandbox job mirror by workspace + session) keys the sandbox job mirror by workspace+session, so #418 (test(backends): cover backends/http_sandbox.py 28.8% -> 98%) was updated to assert workspace normalization.
- #353 (feat: wire P0 middleware/runtime modules + security & correctness hardening) and #401 (fix(config): wire GEMINI_SESSION_COOKIES auth source): force-load the real oauth_token_store under its bare name so a sibling test's partial stub can't shadow it under pytest -n auto.
- #441 (fix(llm): authenticate Ollama Cloud via OLLAMA_CLOUD_API_KEY + add cloud probe): ruff-format fix (the one PR that had a red CI).

Verification (mirrors the CI Python gate)

ruff check . — pass
ruff format --check . — pass
basedpyright — 0 errors
pytest -n auto -m "not slow" — 3510 passed, 28 skipped, 0 failed

Merging this with a merge commit makes every PR's head commit reachable from main, so all 80 PRs will be auto-marked Merged.

…_UPDATE)

Two halves of the release/update lifecycle, both built to leave the proven release pipeline and self-update flow intact.

Auto-release on merge
- .github/workflows/auto-tag.yml: on push to main, computes the next semver from conventional commits since the last tag (scripts/next_version.py: feat->minor, fix/perf/revert->patch, !/BREAKING CHANGE->major, docs/chore/ci/test->no release) and pushes vX.Y.Z. That tag drives the EXISTING release.yml (GoReleaser draft -> image build -> verify -> publish) unchanged, so release atomicity is preserved: auto-tag never builds.
- One-time PAT (RELEASE_PLEASE_TOKEN, contents:write) is required for the auto-tag's tag to trigger release.yml (GitHub suppresses workflow triggers from GITHUB_TOKEN-pushed tags). Without it the tag is still created; build via release-recover.yml. Documented in RELEASE.md.

Unattended auto-update
- The launcher had a full updater (FetchLatestRelease/ApplyUpdate/SelfUpdate/re-exec) and a startup PromptIfUpdateAvailable, but the documented AUTO_UPDATE flag was wired NOWHERE. start.go now honors it:
    unset -> prompt (today's behavior)
    true  -> unattended apply+restart
    false -> skip entirely (air-gapped / pinned)
- updater.AutoUpdateIfAvailable + shared applyAndReexec/resolveUpdateRef (refactored out of PromptIfUpdateAvailable so both paths are identical).

Tests: scripts/next_version.py (9 cases incl. lookalike-type guards) and the Go dev-build skip gate. actionlint clean across all workflows.
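The bump rules listed for scripts/next_version.py can be sketched as follows. This is an illustrative reconstruction, not the script itself: the function names `bump_for` and `next_version`, the header regex, and the handling of unparseable headers are assumptions made for the example.

```python
import re

# Conventional-commit header: type, optional (scope), optional "!", then ": ".
_HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<bang>!)?: ")
_PATCH_TYPES = {"fix", "perf", "revert"}
_RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> str:
    """Classify one commit message as 'major', 'minor', 'patch', or 'none'."""
    lines = message.splitlines()
    match = _HEADER.match(lines[0]) if lines else None
    if match is None:
        return "none"  # assumption: headers that do not parse never release
    if match["bang"] or "BREAKING CHANGE" in message:
        return "major"
    if match["type"] == "feat":  # exact match, so "feature:" is not a feat
        return "minor"
    if match["type"] in _PATCH_TYPES:
        return "patch"
    return "none"  # docs/chore/ci/test and anything else


def next_version(last_tag: str, messages: list) -> "str | None":
    """Return the next vX.Y.Z tag, or None when nothing warrants a release."""
    level = max((bump_for(m) for m in messages), key=_RANK.__getitem__, default="none")
    if level == "none":
        return None
    major, minor, patch = (int(part) for part in last_tag.lstrip("v").split("."))
    if level == "major":
        return f"v{major + 1}.0.0"
    if level == "minor":
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

Taking the highest-ranked bump across all commits since the last tag means a single breaking change dominates any number of fixes, and a batch of docs-only commits yields no tag at all.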

…tag) release-recover.yml only verifies already-built images and finalizes the release; it cannot build one. When RELEASE_PLEASE_TOKEN is absent, a GITHUB_TOKEN-pushed tag never triggers release.yml, so no images exist and release-recover fails at its verify step. Document the real recovery: re-push the tag manually (delete remote + push) to trigger release.yml.

…dening

Wires modules that were built+tested but never registered, then fixes 15 defects found in a focused security review of the result + adjacent code.

P0 wiring
- EventLogMiddleware: emit engagement events to <workspace>/events.jsonl (tool/llm/finding), registered as a slot; adds EventLog.for_workspace.
- Register budget, prompt-injection-shield, HITL, and event-log as middleware slots (every role; HITL opt-in via DECEPTICON_HITL__ENABLED).
- HITL web bridge: GET/POST /api/engagements/[id]/approvals + ApprovalGate UI (polling), reading/writing the SDK's FileBackedApprovalTransport JSONL.
- CART live replay: SubAgentTaskSpec contract + make_replay_dispatcher; ReplayRunner.execute now dispatches real specs (was a 'live_unwired' stub).
- Skillogy cutover hook in build_middleware (no-op unless DECEPTICON_USE_SKILLOGY).
- PromptInjectionShield: registry-driven trusted-tool lookup (drops the load_skill/list_skills hardcode) + dedup vs UntrustedOutput (no double-wrap).

Security / correctness fixes
- event log no longer persists tool-arg VALUE contents (was leaking bash commands, Authorization/Cookie headers, session tokens to events.jsonl); finding.created emitted only on a successful tool result, after tool.result.
- web: close authenticated path traversal via engagement name: PATCH now slug-validates name (root cause); approvals route re-validates (defense).
- config/oauth_token_store: unique-temp (mkstemp) atomic write; fixes a concurrent-refresh race that corrupted/lost tokens.
- config/codex_chatgpt_handler: stop embedding raw token data in refresh errors.
- config/copilot_handler + grok_handler: preserve streamed tool_calls and upstream finish_reason (were dropping every streamed tool call).
- config/gemini_handler: map candidates[].finishReason (was hardcoded 'stop').
- llm/factory: detect env-backed OAuth credentials as configured.
- PROMPT_INJECTION_SHIELD added to SAFETY_CRITICAL_SLOTS (gate disable/replace).
- HITL: resolve approval transport per-request from workspace_path (was frozen at graph import) and offload the blocking wait via asyncio.to_thread.
- CART make_replay_dispatcher uses ReplayMiddleware(strict=False) (partial replay; misses fall through to live) instead of strict default.
- skillogy swap-only (never resurrect an intentionally-disabled SKILLS slot).

Verification: py_compile (all changed .py), a functional harness exercising the real logic with heavy deps stubbed, and Bun for the web slug test; unit tests added/updated. The project's full pytest/tsc/eslint/pre-commit were not run in this environment (no installed deps) and must run in CI.
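The unique-temp atomic write mentioned for config/oauth_token_store follows a well-known pattern, sketched below under assumptions: the function name `atomic_write_json` and the JSON payload are hypothetical, and the real store may differ in details such as permissions or fsync policy.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write payload to path so readers never observe a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # mkstemp gives every writer its own temp file in the target directory,
    # so two concurrent refreshes cannot clobber each other's scratch file.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and destination share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the scratch file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

With a fixed temp name, two concurrent writers can interleave bytes in the same scratch file before either renames it; a unique name per writer turns that race into last-writer-wins on a complete file.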

…iring branch

The Python CI job failed at the `ruff check` step, which masked two further breakages that the job never reached (basedpyright/pytest run after lint).
- ruff: 7 lint errors (import ordering, missing EOF newline) plus 10 files needing `ruff format`. All autofixed; no logic changes.
- test_prompt_injection_shield: three tests asserted the shield wraps `bash` output, but this branch adds the anti-double-wrap dedup that routes `bash`/`read_file`/`kg_*` to UntrustedOutputMiddleware. The shield correctly skips them now, so the tests point at a genuinely shield-owned external tool (`http_fetch`) instead. The dedup itself is already covered by test_shield_skips_untrusted_output_tools_no_double_wrap.
- test_build._build_exploit_stack: the exploit role includes the SANDBOX_NOTIFICATION slot, whose factory now requires a non-None `sandbox` kwarg (it forwards the real HTTPSandbox the agent builds). Default a mock in the helper so the stack assembles.

Local gate green: ruff check + ruff format --check clean, basedpyright 0 errors, pytest -n auto -m "not slow" -> 1781 passed, 26 skipped.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

`_build_engagement_injection` accepted a `workspace` argument but hardcoded `/workspace` in three places: the "Workspace root" line, the "Treat … as the only engagement directory" line, and the `/workspace/plan/` guidance. Multi-tenant / SaaS launchers mount each engagement under a distinct root (via `config.configurable.workspace_path`, resolved by `_resolve_workspace_path`), so the agent was pointed at the wrong directory for any non-default workspace. Template all three occurrences with the passed root (trailing slash trimmed so `{root}/plan/` never doubles up). Default single-tenant behavior is unchanged — `workspace_path` still defaults to `/workspace`. Adds a regression test asserting a custom `workspace_path` round-trips into the injection text and that the stale `/workspace` default does not leak. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
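A minimal sketch of the templating fix described above, assuming a simplified three-line injection. The function name and the exact wording of each line are illustrative; only the three templated locations and the trailing-slash trim come from the commit message.

```python
def build_engagement_injection(workspace: str = "/workspace") -> str:
    """Render the workspace guidance using the passed root, not a literal."""
    # Trim the trailing slash so "{root}/plan/" never becomes "//plan/".
    root = workspace.rstrip("/")
    shown = root or "/"  # keep a bare "/" displayable (edge case, assumed)
    return "\n".join(
        [
            f"Workspace root: {shown}",
            f"Treat {shown} as the only engagement directory.",
            f"Write plans under {root}/plan/.",
        ]
    )
```

Calling `build_engagement_injection("/mnt/tenants/acme/")` yields text that mentions only `/mnt/tenants/acme`, which is the round-trip property the regression test asserts.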

Two unreachable-code defects surfaced by a code audit; no behavior change:
- tools/reversing/tools.py: `bin_r2_script` had a duplicate, unreachable second `return _json({"source": r2_recon_script(binary)})` (copy-paste). Removed.
- blue_cell/rule_match.py: in `_evaluate_condition`, the `elif token in {"(", ")"}` branch is unreachable because the preceding `if token in {"and","or","not","(",")"}` already matches "(" and ")". Removed; parenthesis handling is unchanged (the first branch appends them and they reach the whitelisted eval as before).

Verified: ruff clean, basedpyright 0 errors, blue_cell + reversing unit tests pass (41), full fast-lane pytest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…gnostic + add tests

`deriveSubAgentSessions` (shared by the Web + CLI clients) determined a finished session's terminal status by reading an *undeclared* `status` field off the event via a cast. That works for the CLI, which normalizes the backend's `error` boolean into `status: "error" | "success"` before events reach the shared utilities, but a consumer that forwards the raw backend event (whose contract is `error: boolean`, per `SubagentCustomEvent`) would have its errored sessions silently rendered as "completed".
- Model both signals on the shared `StreamEvent` type (`status?`, `error?`), removing the unsafe cast.
- Detect failure from either `status === "error"` or `error === true`, so the result is correct regardless of which client shape feeds it.
- Add the first unit tests for the (previously untested) shared session derivation: error via both shapes, running vs completed, tool counting, orphan-end handling, interleaved subagents, default description.

No behavior change for the CLI (its events already carry the normalized `status`).

Verified: cli typecheck clean, vitest 29 passed (8 new), web eslint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ublic surface

`decepticon_core.types` re-exported `engagement`/`llm`/`kg` but not `roe`, even though `types/roe.py` is a fourth submodule of the contract layer and is consumed across the package boundary (the framework's RoE-enforcement middleware imports `decepticon_core.types.roe`). The package docstring also claimed "Three submodules".
- Re-export `roe` alongside its siblings (import + `__all__`); fix the docstring to describe all four and disambiguate the enforcement schema from `engagement.RoE` (the planning document).
- Lock roe's six public symbols (EnforcementMode, ScopeRule, MachineEnforcement, Decision, evaluate_target, evaluate_command) in the public-API stability manifest and bump its count 69 -> 75, per the manifest's documented update process.

CHANGELOG is release-curated (no Unreleased section), so it's left to the release flow.

Verified: ruff clean, basedpyright 0 errors, decepticon-core suite green (88 passed) incl. the langchain-free guarantee.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…vent loop) `http_request` was a *sync* `@tool` that drove its async `HTTPSession` via `asyncio.get_event_loop().run_until_complete(_do())`. Inside LangGraph's running event loop that raises `RuntimeError: ... cannot be called from a running event loop`, so the tool would fail the moment a web agent is wired to it. It is latent today only because no standard agent imports `WEB_TOOLS`. Every other network tool (`bash_*`) is already an async `@tool`. Convert `http_request` to `async def` and `await` the session directly, dropping the `_do` wrapper, the `run_until_complete` call, and the local `asyncio` import. Behavior is otherwise identical. Adds the first tests for the web tool layer: invoking the tool from *within* a running loop (the exact failure condition) with a stubbed session, plus the invalid-headers path. Verified: ruff clean, basedpyright 0 errors, new tests pass (2). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
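The failure mode described above can be reproduced with the standard library alone. In this sketch `fetch` is a hypothetical stand-in for the async `HTTPSession` call, and the two wrappers mirror the before/after shapes of the tool rather than its real code.

```python
import asyncio


async def fetch(url: str) -> str:
    """Hypothetical stand-in for the async HTTP session request."""
    await asyncio.sleep(0)
    return f"GET {url} -> 200"


def http_request_sync(url: str) -> str:
    """Old shape: a sync wrapper that drives the coroutine on the current loop."""
    coro = fetch(url)
    try:
        return asyncio.get_event_loop().run_until_complete(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def http_request_async(url: str) -> str:
    """New shape: an async function that simply awaits the session."""
    return await fetch(url)


async def main() -> tuple:
    # Inside a running loop (as under LangGraph) the sync wrapper cannot work.
    try:
        http_request_sync("http://example.test")
        sync_outcome = "ok"
    except RuntimeError:
        sync_outcome = "RuntimeError"
    async_outcome = await http_request_async("http://example.test")
    return sync_outcome, async_outcome


result = asyncio.run(main())
```

The sync wrapper fails because an event loop refuses to be re-entered while it is already running; awaiting the coroutine directly lets the running loop schedule it.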

Adds a self-contained, additive MCP (Model Context Protocol) client so operators can connect external MCP tool servers (Kali MCP, HexStrike, etc.) and expose their tools to Decepticon agents.
- Config via DECEPTICON_MCP__SERVERS (JSON), matching the existing DECEPTICON_<SECTION>__<KEY> convention.
- langchain-mcp-adapters imported lazily inside load_mcp_tools(); absent package => one-line warning + empty list (agents keep working).
- Per-server isolation: one bad endpoint logs+skips, others still load.
- No dependency/lockfile changes (optional package documented as manual install; locked extra to follow).

17 mocked tests, no network.
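The per-server isolation behavior can be sketched without touching the real adapter package. Everything here is hypothetical scaffolding: the name-to-config JSON shape, `parse_servers`, `load_tools`, and the injected `connect` callable are assumptions, and no langchain-mcp-adapters API is shown.

```python
import json
import logging
from typing import Any, Callable, Dict, List, Optional

log = logging.getLogger("mcp_sketch")


def parse_servers(raw: Optional[str]) -> Dict[str, Dict[str, Any]]:
    """Parse the JSON env value; any malformed input degrades to no servers."""
    if not raw:
        return {}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        log.warning("MCP servers config is not valid JSON; ignoring it")
        return {}
    return data if isinstance(data, dict) else {}


def load_tools(
    servers: Dict[str, Dict[str, Any]],
    connect: Callable[[str, Dict[str, Any]], List[Any]],
) -> List[Any]:
    """Collect tools from every server, skipping any server that fails."""
    tools: List[Any] = []
    for name, config in servers.items():
        try:
            tools.extend(connect(name, config))
        except Exception as exc:  # isolation: one bad endpoint must not abort
            log.warning("MCP server %s skipped: %s", name, exc)
    return tools
```

Passing the connector in as a callable keeps the sketch testable offline, which matches the "mocked tests, no network" approach the PR describes.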

…reds Covers analyze_gpo_abuse, analyze_delegation, analyze_shadow_credentials and all pure helpers (_is_sensitive_ou, _is_dc, _spn_targets_dc) with happy paths, edge cases, and dangling-node robustness tests (66 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…rs and tool builder 59 tests covering _strip_frontmatter, _read_via_backend, _list_dir_via_backend, _validate_skill_path, _format_skill_body, and build_load_skill_tool end-to-end using a fake backend — no external services required. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ilent dead port `build_grpc_server` constructed a `grpc.Server`, bound `0.0.0.0:<port>`, and returned an UNREGISTERED servicer — it never called `add_*Servicer_to_server` (there are no protoc-generated bindings for `skillogy.proto`) and the hand-rolled servicer returned plain Python objects with no message serializers. So when grpcio was installed the launcher opened a port that accepted connections but answered every RPC (ListSkills / LoadSkill / IngestSkill / Health) with `UNIMPLEMENTED` — a silent dead port that *looks* healthy. REST is the only wired transport (`RestSkillogyClient` → :9100; the skillogy middleware connects over REST). Make `build_grpc_server` raise `RuntimeError` with a clear message. `__main__._start_grpc` already catches `RuntimeError` and degrades to REST, so the service boots correctly with no dead port. Removed the dead servicer + `_RawResponse`/`_HealthResp` scaffolding and the now-unused `SkillMeta` import; updated the module docstring. Adds tests: `build_grpc_server` raises (message names gRPC + REST), and `_start_grpc` returns None (graceful REST degrade). Verified: ruff clean, basedpyright 0 errors, skillogy suite 36 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
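The fail-loud-then-degrade shape described above looks roughly like this. The function bodies and the error text are illustrative assumptions; only the contract (the builder raises `RuntimeError`, the starter catches it and returns `None`) comes from the commit message.

```python
import logging

log = logging.getLogger("skillogy_sketch")


def build_grpc_server(port: int):
    """Refuse to open a port that could not answer any RPC."""
    raise RuntimeError(
        "gRPC transport is not wired (no generated bindings); use the REST transport"
    )


def start_grpc(port: int):
    """Try gRPC; on RuntimeError log it and signal the caller to stay on REST."""
    try:
        return build_grpc_server(port)
    except RuntimeError as exc:
        log.warning("gRPC disabled, continuing with REST only: %s", exc)
        return None
```

Raising at build time trades a port that looks healthy but answers nothing for an explicit log line, which is easier to diagnose.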

…d) and BlueCellTap transforms Covers JSON output shapes and error paths for solidity_scan, solidity_scan_file, slither_ingest, foundry_* wrappers, export_session_asciicast, list_session_recordings, iam_policy_audit, s3_buckets_from_text, user_data_secrets, k8s_audit, tfstate_audit, metadata_endpoints, plus _strip_ansi, _parse_line_to_event, TapEvent.to_dict, and BlueCellTap.read_batch/follow. All IO mocked; passes fully offline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Adds test_defense_push.py covering HTTP push functions (mocked requests) for splunk, elastic, sentinel, and EDR (Defender XDR + CrowdStrike), plus converter edge cases, severity/logsource mappings, and ConOps helper paths not covered by the existing test_defense.py — 104 tests total. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ing internals Adds 130 new tests covering binary.py, strings.py, symbols.py, rop.py, packer.py, and scripts.py — ELF32/big-endian/Mach-O/WASM formats, all four RET opcodes, every packer signature, crypto/email/version/import string categories, UTF-16LE extraction, symbol buckets and risk scoring, and to_dict serialization for every dataclass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ty modules

Adds 111 new tests across three files:
- test_chain.py: full coverage of chain.py (ChainStep, Chain dataclass, compute_edge_cost, plan_chains, promote_chain, critical_path_score, impact_analysis, unexplored_surface, credential_reachability); all Neo4j calls mocked via _FakeStore.
- test_cve_extended.py: _Cache (TTL, LRU eviction, persistence, flush), _rehydrate, async lookup_cve/lookup_cves/lookup_package with mocked httpx, plus NVD/EPSS parser edge cases.
- test_bounty_extended.py: bounty_scope_check (domain wildcards, exclusions, normalization) and format_bounty_report (validated/unvalidated findings, severity labels, title stripping) with mocked KG store and filesystem.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…and executive.py helpers 63 deterministic tests covering all LangChain @tool wrappers (report_hackerone, report_bugcrowd_csv, report_executive, report_timeline) with mocked _load, plus deep edge-case coverage of _count_by_severity, _top_chains, _top_cves, and render_executive_summary (validated-findings cap, severity ordering, default fallbacks, graph stats, empty-graph branches). All pass offline with no services. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Covers _default_cache_root env-var override, ReferenceCache.to_dict, _entry, _dir_size (OSError paths), _run_git hardened-env contract, ensure_cached clone/pull/URL-mismatch/symlink/non-git-hint paths, _which, _parse_grep_line edge cases, _pyfind pure-Python fallback, and search_cache grep/timeout/FileNotFoundError/max_results paths. All network and subprocess calls mocked; 55 pass, 2 platform-skipped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add packages/decepticon/tests/unit/cli/test_scan_extra.py covering _git_diff_files (subprocess success/failure/timeout/OSError), _emit_jsonl_event, _dispatch_scan_via_sdk (missing SDK, happy-path streaming, asyncio timeout), _load_findings_graph (missing file, env-var workspace, load exception), and main() end-to-end paths (no-target, instruction-file error, diff-scope fallback, SDK RuntimeError/generic exception, happy-path, SARIF output, engagement-name/timeout overrides, non-interactive forwarding). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y ad wrappers Covers ImportStats, _node_kind_for_bh, _BH_EDGE_MAP, _upsert_bh_object, _build_bh_index, _ingest_aces, _ingest_memberships, merge_bloodhound_json (dict/str/list/error paths), ingest_bloodhound_zip, and the bh_ingest_zip / bh_ingest_json @tool wrappers (JSON output shapes + error paths). All 81 tests are offline and deterministic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…G CRUD, tier-2 ingesters, and fuzz tools

Adds test_tools_extended.py with comprehensive coverage of uncovered surfaces in tools/research/tools.py: _parse_props, _severity_from_score/string, _is_web_port, _jwt/_cookie_finding_severity, all four dependency-file parsers, kg_add_node/edge/query/neighbors/stats, kg_ingest_subfinder/dnsx/katana/masscan/ffuf/testssl/crackmapexec/asrep_hashes, fuzz_harness, fuzz_record_crash, and error paths throughout. All mocked offline.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…task-tree formatter

tools/opplan.py had zero direct unit coverage despite holding the OPPLAN status state machine and the agent-facing renderer. Adds 23 tests for:
- _is_valid_transition / _valid_next / _VALID_TRANSITIONS: the full pending -> in-progress -> completed/blocked/cancelled matrix, terminal states, unknown sources, no self-transitions, sorted next-state hints, and a guard that the table only references real ObjectiveStatus values.
- _build_opplan_payload: per-status summary counts, id-stable ordering (clean git diffs), zeroed empty-plan summary, and round-trip back into the OPPLAN model (wrapper fields dropped).
- _format_opplan_for_agent: header/progress, priority-sorted table, blocked_by joining + owner fallback, hierarchical task tree, status markers, the lowest-priority 'Next' recommendation, all-complete and no-actionable branches, and a cycle-guard regression test proving a malformed/injected duplicate-id tree cannot drive unbounded recursion.

Pure-logic only; no network/docker/LLM.
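A table-driven status machine of the kind these tests exercise might look like the sketch below. The transition table is an assumption: the commit message names the statuses but not every edge, so which states are terminal and what `blocked` may move to are guesses made for illustration.

```python
# Assumed transition table; the real _VALID_TRANSITIONS may differ.
VALID_TRANSITIONS = {
    "pending": {"in-progress", "cancelled"},
    "in-progress": {"completed", "blocked", "cancelled"},
    "blocked": {"in-progress", "cancelled"},
    "completed": set(),  # terminal (assumed)
    "cancelled": set(),  # terminal (assumed)
}


def is_valid_transition(current: str, target: str) -> bool:
    """True only for edges listed in the table; unknown sources are rejected."""
    return target in VALID_TRANSITIONS.get(current, set())


def valid_next(current: str) -> list:
    """Sorted next-state hints, stable for display and for diffs."""
    return sorted(VALID_TRANSITIONS.get(current, set()))
```

Because no state lists itself as a target, self-transitions are rejected without a special case, and a guard test can check that every target also appears as a key.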

…hecks

A trailing dot is DNS-equivalent ("metadata.google.internal." resolves identically to "metadata.google.internal"), but `_matches_rule` compared hosts with only `.lower()` and no trailing-dot normalization. In ENFORCE mode this let the FQDN form slip past the forbidden-destination guard AND any operator `out_of_scope` host rule:

    evaluate_target("metadata.google.internal.", enforce) -> allow=True
    evaluate_target("169.254.169.254.", enforce)          -> allow=True

i.e. the agent could reach the cloud metadata service and exfiltrate service-account credentials despite the built-in IMDS deny list, with the audit ledger recording the call as "allow". The IMDS-IP form additionally failed `ip_address()` parsing and fell through to default-allow.

Strip a trailing dot (keeping the existing case-fold) on BOTH the rule pattern and the target in `_matches_rule`, across host, domain-glob, and CIDR matching, so forbidden_destinations, out_of_scope, and in_scope are all normalized consistently. Legitimate in-scope FQDN-form hosts still match; the raw target is preserved in the decision detail for the audit.

Adds regression tests (TestFqdnTrailingDotNormalization) for the forbidden-destination, out-of-scope, and in-scope paths in both host and IMDS-IP forms.

Verified: ruff clean, basedpyright 0 errors, test_roe + decepticon-core suite 130 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
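The normalization can be illustrated with a small matcher that handles the three rule kinds named above. This is a sketch under assumptions: `normalize_host` and `matches_rule` are hypothetical names, and the real `_matches_rule` may classify patterns differently.

```python
import ipaddress
from fnmatch import fnmatchcase


def normalize_host(value: str) -> str:
    """Case-fold and drop one trailing dot (the DNS root label)."""
    value = value.strip().lower()
    return value[:-1] if value.endswith(".") else value


def matches_rule(pattern: str, target: str) -> bool:
    """Match a host, domain-glob, or CIDR rule against a normalized target."""
    pattern, target = normalize_host(pattern), normalize_host(target)
    if "/" in pattern:  # CIDR rule
        try:
            network = ipaddress.ip_network(pattern, strict=False)
            return ipaddress.ip_address(target) in network
        except ValueError:
            return False  # target is not an IP, or the pattern is malformed
    if "*" in pattern:  # domain glob
        return fnmatchcase(target, pattern)
    return pattern == target
```

Without the normalization step, `ipaddress.ip_address("169.254.169.254.")` raises `ValueError`, which is how the dotted IMDS address fell through to default-allow.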

`SandboxBase._normalize_workspace_path` validated each path component against `[A-Za-z0-9_.-]{1,128}`, which accepts "." and "..". So `_normalize_workspace_path("/workspace/../../etc")` returned the string verbatim, and that traversable path flowed into the sandbox tmux/file operations meant to be confined to `/workspace/<engagement>` — escaping the per-engagement subtree. The sibling EngagementFilesystem layer already guards this (`middleware/filesystem.py` uses `posixpath.normpath` plus a documented test), but the sandbox_kernel + bash callers invoke `_normalize_workspace_path` directly with no such guard. Add the same fail-closed guard: if `posixpath.normpath(path) != path`, return the safe `/workspace` default. Catches "..", ".", and "//" traversal while preserving legitimate (incl. dotted) directory names. posixpath (not os.path) so Windows hosts don't get "/"->"\\" rewrites. Adds tests/unit/sandbox_kernel/test_workspace_path.py: legit paths preserved + traversal/escape forms fail closed. Verified: ruff clean, basedpyright 0 errors, sandbox_kernel suite green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
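The fail-closed guard can be sketched as below. The function is a simplified stand-in for `_normalize_workspace_path`: the component regex and the `/workspace` default come from the description, while the absolute-path check and overall structure are assumptions.

```python
import posixpath
import re

_COMPONENT = re.compile(r"[A-Za-z0-9_.-]{1,128}")
DEFAULT_WORKSPACE = "/workspace"


def normalize_workspace_path(path: str) -> str:
    """Return path unchanged if it is already canonical, else the safe default."""
    if not path or not path.startswith("/"):
        return DEFAULT_WORKSPACE
    # A canonical path is a fixed point of normpath; "..", "." and "//"
    # segments all change under normalization and therefore fail closed.
    if posixpath.normpath(path) != path:
        return DEFAULT_WORKSPACE
    components = path.split("/")[1:]
    if not all(_COMPONENT.fullmatch(part) for part in components):
        return DEFAULT_WORKSPACE
    return path
```

Comparing against `normpath` rejects traversal without banning dots inside names, so a directory such as `v1.2-test` is still accepted.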

…ion eval, loaders

test_rule_match.py exercised happy-path matching but left the silent-failure-prone internals at 77%. Adds 27 tests lifting rule_match.py to 99%:
- _event_field: nested dotted paths, missing keys, non-dict mid-path, None and list values (a wrong return makes a rule silently never fire).
- _compile_pattern: literal escaping vs re: regex mode, case-insensitivity.
- _evaluate_condition: empty-condition all()/no-selection branches, unknown selection names, and malformed expressions, proving the sandboxed eval path fails closed (False) rather than raising.
- load_rules / _load_from_jsonl / _load_from_json / _rule_from_dict: JSONL + directory loading, blank/malformed/no-id line skipping, list/single/scalar JSON shapes, match-shorthand, missing-file and nonexistent-path non-fatal handling (the untrusted-rule-file parser).

Pure-logic + tmp_path only; no network/docker/LLM.

`_refresh_tokens` raised `litellm.AuthenticationError` with the raw token endpoint response interpolated into the message:

    message=f"Codex ChatGPT refresh response missing fields: {data}"

On a partial-but-successful refresh (e.g. `access_token` present but `id_token` missing) `data` carries the freshly-minted access token and the rotated refresh token verbatim, which then land in logs and the caller-visible error.

Interpolate only the (non-sensitive) field NAMES that were present, never their values.

The handler runs inside the litellm container (litellm is not a dev/test dependency), so it has no standard-suite unit test; this is a one-line redaction verified by inspection. ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
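The redaction amounts to formatting key names instead of the response dict. In this sketch the helper name and the list of required fields are assumptions, and the message is returned as a string rather than raised through litellm.

```python
# Assumed set of fields a complete refresh response should carry.
REQUIRED_FIELDS = ("access_token", "refresh_token", "id_token")


def redacted_refresh_error(data: dict) -> str:
    """Describe an incomplete refresh response without echoing any values."""
    present = sorted(str(key) for key in data)
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    return (
        "Codex ChatGPT refresh response missing fields: "
        f"{missing}; fields present: {present}"
    )
```

Field names are enough to debug a malformed response, and they stay safe to log even when the values are live credentials.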

…idates) `UntrustedOutputMiddleware` wraps the output of every tool in `UNTRUSTED_TOOL_NAMES` in an `<UNTRUSTED_TOOL_OUTPUT>` envelope with a heuristic risk score, so attacker-influenceable bytes reach the model marked as data (and high-risk hits are logged to the quarantine ledger). The allowlist omitted the scanner prefilter tools: `scan_shard` walks `/workspace/target` and returns raw code snippets, and `rank_candidates` re-emits those hits. An injection payload planted in a scanned target file therefore reached the scanner agent's model UNwrapped — never enveloped, risk-scored, or recorded. Add `scan_shard` + `rank_candidates` to `UNTRUSTED_TOOL_NAMES`, and a regression test asserting both are enveloped with the right `origin`. Verified: ruff clean, basedpyright 0 errors, test_untrusted_output 42 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
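A stripped-down version of the allowlist-driven envelope is sketched below. It is illustrative only: the tool set is a partial, assumed subset, the `origin` attribute format is a guess, and the risk scoring and quarantine ledger of the real middleware are omitted.

```python
# Partial, assumed subset of the allowlist, including the two added tools.
UNTRUSTED_TOOL_NAMES = frozenset(
    {"bash", "read_file", "scan_shard", "rank_candidates"}
)


def wrap_tool_output(tool_name: str, output: str) -> str:
    """Envelope output from allowlisted tools; pass everything else through."""
    if tool_name not in UNTRUSTED_TOOL_NAMES:
        return output
    return (
        f'<UNTRUSTED_TOOL_OUTPUT origin="{tool_name}">\n'
        f"{output}\n"
        "</UNTRUSTED_TOOL_OUTPUT>"
    )
```

Because protection is opt-in by name, any tool that returns attacker-influenceable bytes but is missing from the set reaches the model unwrapped, which is the gap this change closes for `scan_shard` and `rank_candidates`.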

# Conflicts: # config/codex_chatgpt_handler.py

# Conflicts: # clients/web/src/app/api/engagements/[id]/route.ts

…ng 80 PRs

- defense (#364/#416 tests vs #417 source): quote/escape SPL expectations; add sha256 indicator to YARA fixtures + KQL queryText assertion for the new Defender push behavior
- recording (#385 test vs #430 source): _tool_request uses tool_call.args
- backends (#418 test vs #400 source): assert workspace-path normalization
- llm oauth/gemini (#353/#401): force-load real oauth_token_store under the bare name so a sibling test's partial stub can't shadow it under -n auto
- ollama (#441): ruff format

Adds 9 credentials-aware AuthMethods for OpenAI-compatible LLM gateways / aggregators, bringing Decepticon closer to oh-my-pi's provider breadth. Each routes through LiteLLM's openai/ provider with a fixed api_base override (the proven xiaomi_mimo / custom pattern), now table-driven via OPENAI_COMPAT_GATEWAYS so the batch shares one code path:

    opencode   OpenCode Zen           https://opencode.ai/zen/v1
    vercel     Vercel AI Gateway      https://ai-gateway.vercel.sh/v1
    hf         Hugging Face Router    https://router.huggingface.co/v1
    venice     Venice AI              https://api.venice.ai/api/v1
    nanogpt    NanoGPT                https://nano-gpt.com/api/v1
    synthetic  Synthetic              https://api.synthetic.new/openai/v1
    zenmux     ZenMux                 https://zenmux.ai/api/v1
    qianfan    Baidu Qianfan (ERNIE)  https://qianfan.baidubce.com/v2
    cfgateway  Cloudflare AI Gateway  per-account base URL

The model alias keeps the gateway prefix (opencode/claude-opus-4-6) so two gateways exposing the same upstream slug never collide in the LiteLLM model_list.

Wired end-to-end exactly like the existing providers:
- AuthMethod enum + METHOD_MODELS HIGH/MID/LOW matrix (types/llm.py)
- OPENAI_COMPAT_GATEWAYS table + build_model_entry branch (litellm_dynamic_config.py)
- static model_list entries + within-gateway fallback chains (litellm.yaml)
- env-var credential auto-detection, default priority, CLI labels (factory.py); auth_inventory + `decepticon-cli auth` pick them up automatically
- .env.example keys, /model catalog (model.ts), setup-guide.md table

Base URLs + model ids verified against each provider's current public docs and oh-my-pi's maintained provider catalog.

Kimi-for-Coding was evaluated and dropped: its coding/v1 endpoint enforces a coding-agent client whitelist and requires the kimi-for-coding model id, so it is not usable through a generic LiteLLM proxy (Moonshot is already covered by moonshot_api). Cursor / GitLab Duo / Qwen Portal are deferred because they need proprietary or OAuth-device protocols that don't fit the OpenAI-compatible-via-LiteLLM path.

Like the existing cerebras / xiaomi_mimo additions, these are configured via .env, not the Go onboard wizard.

Tests: gateway routing / alias-collision / validation (test_litellm_dynamic_config.py), tier resolution + prefix invariants (test_models.py), credential detection (test_auth.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
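The table-driven routing and the prefix-preserving alias can be sketched as follows. The table shows only three of the nine gateways, the environment-variable names are hypothetical, and `build_model_entry` here is a simplified illustration of the branch the commit describes rather than its actual code.

```python
# Base URLs are taken from the list above; the env var names are hypothetical.
OPENAI_COMPAT_GATEWAYS = {
    "opencode": {"api_base": "https://opencode.ai/zen/v1", "env": "OPENCODE_API_KEY"},
    "vercel": {"api_base": "https://ai-gateway.vercel.sh/v1", "env": "VERCEL_AI_GATEWAY_KEY"},
    "venice": {"api_base": "https://api.venice.ai/api/v1", "env": "VENICE_API_KEY"},
}


def build_model_entry(alias: str) -> dict:
    """Turn 'gateway/upstream-slug' into a LiteLLM model_list entry."""
    gateway, _, upstream = alias.partition("/")
    spec = OPENAI_COMPAT_GATEWAYS.get(gateway)
    if spec is None or not upstream:
        raise ValueError(f"unknown gateway or empty model in alias: {alias!r}")
    return {
        # The alias keeps the gateway prefix, so entries stay unique.
        "model_name": alias,
        "litellm_params": {
            "model": f"openai/{upstream}",  # routed via the openai/ provider
            "api_base": spec["api_base"],
            "api_key": f"os.environ/{spec['env']}",
        },
    }
```

Two gateways that expose the same upstream slug produce identical `litellm_params.model` values but distinct `model_name` aliases, which is what keeps them from colliding in the model list.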

…ith 'import' and 'import from'' Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Brings the integration tree up to date with main (v1.1.4-v1.1.6 releases: skillogy publish fix, native arm64 runners, CHANGELOG; #443 make-as-CI-source-of-truth; #444-#448). Conflict: ci.yml coverage line. Kept main's 'make ci-test-coverage' dispatch (#443) and carried #380's 35->60 coverage floor into the Makefile target so both intents hold.

Adds OpenCode Zen + 8 OpenAI-compatible provider gateways. Clean auto-merge with the integrated llm factory/types/dynamic-config changes (#353, #435, #441); reconciliation verified by the test suite.

VoidChecksum and others added 30 commits May 29, 2026 18:33

feat(cli): add audit ledger verification

9d6cc3f

feat(skills): resolve load_skill slugs

cefe4f9

Integration Bot added 22 commits May 30, 2026 21:40

Merge remote-tracking branch 'pr/423' into integration

5338f99

Merge remote-tracking branch 'pr/425' into integration

29b1131

Merge remote-tracking branch 'pr/426' into integration

1ffe3d8

Merge remote-tracking branch 'pr/427' into integration

839ee9e

Merge remote-tracking branch 'pr/428' into integration

78bba3d

Merge remote-tracking branch 'pr/429' into integration

990e341

Merge remote-tracking branch 'pr/430' into integration

b85703a

Merge remote-tracking branch 'pr/431' into integration

d8d0952

Merge remote-tracking branch 'pr/433' into integration

253d054

Merge remote-tracking branch 'pr/434' into integration

80efe76

Merge remote-tracking branch 'pr/435' into integration

2b8c6ac

Merge remote-tracking branch 'pr/436' into integration

23e2343

Merge remote-tracking branch 'pr/437' into integration

9b4f86b

Merge remote-tracking branch 'pr/438' into integration

2385b6b

Merge remote-tracking branch 'pr/439' into integration

cd5007b

Merge remote-tracking branch 'pr/440' into integration

5fb457a

Merge remote-tracking branch 'pr/441' into integration

aff958f

Merge remote-tracking branch 'pr/378' into integration

ebc1bba

# Conflicts: # config/codex_chatgpt_handler.py

Merge remote-tracking branch 'pr/432' into integration

dcac6a2

# Conflicts: # clients/web/src/app/api/engagements/[id]/route.ts

style: ruff format ollama-cloud files from #441

fb220e7

chore: drop basedpyright artifact

63e97aa

VoidChecksum requested a review from PurpleCHOIms as a code owner May 30, 2026 20:05

github-advanced-security AI found potential problems May 30, 2026

Comment thread packages/decepticon/tests/unit/ad/test_ad.py Fixed

VoidChecksum and others added 5 commits June 1, 2026 14:28

Potential fix for pull request finding 'CodeQL / Module is imported w…

01b59b8

Merge PR #453 (feat/llm-opencode-zen-gateways) into integration

2b17ec2

style: sort imports in test_ad.py (ruff I001 after CodeQL import fix)

9dfe0ad

VoidChecksum merged commit 34bd0e9 into main Jun 1, 2026
26 checks passed

VoidChecksum merged 186 commits into main from integration/merge-all-open-prs

2 participants
