Integrate all 80 open PRs (#352-#441): conflict-free, fully tested #442
Merged
…_UPDATE)
Two halves of the release/update lifecycle, both built to leave the proven
release pipeline and self-update flow intact.
Auto-release on merge
- .github/workflows/auto-tag.yml: on push to main, computes the next semver
from conventional commits since the last tag (scripts/next_version.py:
feat->minor, fix/perf/revert->patch, !/BREAKING CHANGE->major, docs/chore/
ci/test->no release) and pushes vX.Y.Z. That tag drives the EXISTING
release.yml (GoReleaser draft -> image build -> verify -> publish)
unchanged, so release atomicity is preserved — auto-tag never builds.
- One-time PAT (RELEASE_PLEASE_TOKEN, contents:write) is required for the
auto-tag's tag to trigger release.yml (GitHub suppresses workflow triggers
from GITHUB_TOKEN-pushed tags). Without it the tag is still created; build
via release-recover.yml. Documented in RELEASE.md.
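The bump rules above can be sketched as follows. This is a hypothetical re-implementation for illustration only; the real scripts/next_version.py may parse commits differently, and the names `bump_level` and `next_version` are assumptions.

```python
import re
from typing import Optional, Sequence

# Conventional-commit header: type, optional (scope), optional "!", then ": ".
# Requiring ": " right after the type is what rejects lookalikes like "feature:".
_HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?(?P<bang>!)?: ")
_LEVEL = {"feat": 2, "fix": 1, "perf": 1, "revert": 1}  # docs/chore/ci/test -> 0


def bump_level(message: str) -> int:
    """Return 0 (no release), 1 (patch), 2 (minor) or 3 (major) for one commit."""
    lines = message.splitlines()
    match = _HEADER.match(lines[0]) if lines else None
    if match is None:
        return 0
    if match["bang"] or "BREAKING CHANGE" in message:
        return 3
    return _LEVEL.get(match["type"], 0)


def next_version(last_tag: str, messages: Sequence[str]) -> Optional[str]:
    """Compute the next vX.Y.Z tag, or None when nothing releasable landed."""
    level = max((bump_level(m) for m in messages), default=0)
    if level == 0:
        return None
    major, minor, patch = (int(part) for part in last_tag.lstrip("v").split("."))
    if level == 3:
        return f"v{major + 1}.0.0"
    if level == 2:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

Taking the maximum level across all commits since the last tag means one `feat` among many `fix` commits still produces a single minor bump.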
Unattended auto-update
- The launcher had a full updater (FetchLatestRelease/ApplyUpdate/SelfUpdate/
re-exec) and a startup PromptIfUpdateAvailable, but the documented
AUTO_UPDATE flag was wired NOWHERE. start.go now honors it:
    unset -> prompt (today's behavior)
    true  -> unattended apply+restart
    false -> skip entirely (air-gapped / pinned)
- updater.AutoUpdateIfAvailable + shared applyAndReexec/resolveUpdateRef
(refactored out of PromptIfUpdateAvailable so both paths are identical).
Tests: scripts/next_version.py (9 cases incl. lookalike-type guards) and the
Go dev-build skip gate. actionlint clean across all workflows.
…tag)
release-recover.yml only verifies already-built images and finalizes the release; it cannot build one. When RELEASE_PLEASE_TOKEN is absent, a GITHUB_TOKEN-pushed tag never triggers release.yml, so no images exist and release-recover fails at its verify step.
Document the real recovery: re-push the tag manually (delete remote + push) to trigger release.yml.
…dening
Wires modules that were built+tested but never registered, then fixes 15 defects found in a focused security review of the result + adjacent code.
P0 wiring
- EventLogMiddleware: emit engagement events to <workspace>/events.jsonl (tool/llm/finding), registered as a slot; adds EventLog.for_workspace.
- Register budget, prompt-injection-shield, HITL, and event-log as middleware slots (every role; HITL opt-in via DECEPTICON_HITL__ENABLED).
- HITL web bridge: GET/POST /api/engagements/[id]/approvals + ApprovalGate UI (polling), reading/writing the SDK's FileBackedApprovalTransport JSONL.
- CART live replay: SubAgentTaskSpec contract + make_replay_dispatcher; ReplayRunner.execute now dispatches real specs (was a 'live_unwired' stub).
- Skillogy cutover hook in build_middleware (no-op unless DECEPTICON_USE_SKILLOGY).
- PromptInjectionShield: registry-driven trusted-tool lookup (drops the load_skill/list_skills hardcode) + dedup vs UntrustedOutput (no double-wrap).
Security / correctness fixes
- event log no longer persists tool-arg VALUE contents (was leaking bash commands, Authorization/Cookie headers, session tokens to events.jsonl); finding.created emitted only on a successful tool result, after tool.result.
- web: close authenticated path traversal via engagement name. PATCH now slug-validates name (root cause); approvals route re-validates (defense).
- config/oauth_token_store: unique-temp (mkstemp) atomic write; fixes a concurrent-refresh race that corrupted/lost tokens.
- config/codex_chatgpt_handler: stop embedding raw token data in refresh errors.
- config/copilot_handler + grok_handler: preserve streamed tool_calls and upstream finish_reason (were dropping every streamed tool call).
- config/gemini_handler: map candidates[].finishReason (was hardcoded 'stop').
- llm/factory: detect env-backed OAuth credentials as configured.
- PROMPT_INJECTION_SHIELD added to SAFETY_CRITICAL_SLOTS (gate disable/replace).
- HITL: resolve approval transport per-request from workspace_path (was frozen at graph import) and offload the blocking wait via asyncio.to_thread.
- CART make_replay_dispatcher uses ReplayMiddleware(strict=False) (partial replay; misses fall through to live) instead of strict default.
- skillogy swap-only (never resurrect an intentionally-disabled SKILLS slot).
Verification: py_compile (all changed .py), a functional harness exercising the real logic with heavy deps stubbed, and Bun for the web slug test; unit tests added/updated. The project's full pytest/tsc/eslint/pre-commit were not run in this environment (no installed deps) and must run in CI.
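The unique-temp atomic write mentioned for config/oauth_token_store can be sketched like this. It is a minimal illustration, assuming a JSON token file; `atomic_write_json` and the `.tokens-` prefix are hypothetical names, not the module's actual API.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, payload: dict) -> None:
    """Write payload to path so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp gives every writer its own temp file (mode 0600), so two
    # concurrent refreshes cannot interleave bytes in a shared temp path.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tokens-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace is atomic when source and target share a filesystem,
        # which holds here because the temp file lives in the same directory.
        os.replace(tmp_path, path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With a fixed temp name, two refreshes racing on the same file can truncate each other's output before the rename; a per-writer temp file leaves the last rename as the winner with intact contents.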
…iring branch
The Python CI job failed at the `ruff check` step, which masked two further breakages that the job never reached (basedpyright/pytest run after lint).
- ruff: 7 lint errors (import ordering, missing EOF newline) plus 10 files needing `ruff format`. All autofixed; no logic changes.
- test_prompt_injection_shield: three tests asserted the shield wraps `bash` output, but this branch adds the anti-double-wrap dedup that routes `bash`/`read_file`/`kg_*` to UntrustedOutputMiddleware. The shield correctly skips them now, so the tests point at a genuinely shield-owned external tool (`http_fetch`) instead. The dedup itself is already covered by test_shield_skips_untrusted_output_tools_no_double_wrap.
- test_build._build_exploit_stack: the exploit role includes the SANDBOX_NOTIFICATION slot, whose factory now requires a non-None `sandbox` kwarg (it forwards the real HTTPSandbox the agent builds). Default a mock in the helper so the stack assembles.
Local gate green: ruff check + ruff format --check clean, basedpyright 0 errors, pytest -n auto -m "not slow" -> 1781 passed, 26 skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`_build_engagement_injection` accepted a `workspace` argument but
hardcoded `/workspace` in three places: the "Workspace root" line, the
"Treat … as the only engagement directory" line, and the
`/workspace/plan/` guidance. Multi-tenant / SaaS launchers mount each
engagement under a distinct root (via `config.configurable.workspace_path`,
resolved by `_resolve_workspace_path`), so the agent was pointed at the
wrong directory for any non-default workspace.
Template all three occurrences with the passed root (trailing slash
trimmed so `{root}/plan/` never doubles up). Default single-tenant
behavior is unchanged — `workspace_path` still defaults to `/workspace`.
Adds a regression test asserting a custom `workspace_path` round-trips
into the injection text and that the stale `/workspace` default does not
leak.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
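The templating fix described in this commit can be illustrated with a small sketch. The function name and the exact wording of the three lines are hypothetical; only the idea (trim the trailing slash once, then template every occurrence from the passed root) comes from the commit message.

```python
def render_workspace_guidance(workspace: str = "/workspace") -> str:
    """Render the three workspace lines from the passed root (illustrative text)."""
    # Trim a trailing slash so "{root}/plan/" never becomes "...//plan/".
    root = workspace.rstrip("/")
    return "\n".join(
        [
            f"Workspace root: {root}",
            f"Treat {root} as the only engagement directory.",
            f"Write plans under {root}/plan/.",
        ]
    )
```

A regression test in the same spirit checks that a custom root appears in the text and that the literal `/workspace` default does not.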
Two unreachable-code defects surfaced by a code audit; no behavior change:
- tools/reversing/tools.py: `bin_r2_script` had a duplicate, unreachable
second `return _json({"source": r2_recon_script(binary)})` (copy-paste).
Removed.
- blue_cell/rule_match.py: in `_evaluate_condition`, the
`elif token in {"(", ")"}` branch is unreachable — the preceding
`if token in {"and","or","not","(",")"}` already matches "(" and ")".
Removed; parenthesis handling is unchanged (the first branch appends
them and they reach the whitelisted eval as before).
Verified: ruff clean, basedpyright 0 errors, blue_cell + reversing unit
tests pass (41), full fast-lane pytest green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gnostic + add tests
`deriveSubAgentSessions` (shared by the Web + CLI clients) determined a finished session's terminal status by reading an *undeclared* `status` field off the event via a cast. That works for the CLI, which normalizes the backend's `error` boolean into `status: "error" | "success"` before events reach the shared utilities. But a consumer that forwards the raw backend event (whose contract is `error: boolean`, per `SubagentCustomEvent`) would have its errored sessions silently rendered as "completed".
- Model both signals on the shared `StreamEvent` type (`status?`, `error?`), removing the unsafe cast.
- Detect failure from either `status === "error"` or `error === true`, so the result is correct regardless of which client shape feeds it.
- Add the first unit tests for the (previously untested) shared session derivation: error via both shapes, running vs completed, tool counting, orphan-end handling, interleaved subagents, default description.
No behavior change for the CLI (its events already carry the normalized `status`).
Verified: cli typecheck clean, vitest 29 passed (8 new), web eslint clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ublic surface
`decepticon_core.types` re-exported `engagement`/`llm`/`kg` but not `roe`, even though `types/roe.py` is a fourth submodule of the contract layer and is consumed across the package boundary (the framework's RoE-enforcement middleware imports `decepticon_core.types.roe`). The package docstring also claimed "Three submodules".
- Re-export `roe` alongside its siblings (import + `__all__`); fix the docstring to describe all four and disambiguate the enforcement schema from `engagement.RoE` (the planning document).
- Lock roe's six public symbols (EnforcementMode, ScopeRule, MachineEnforcement, Decision, evaluate_target, evaluate_command) in the public-API stability manifest and bump its count 69 -> 75, per the manifest's documented update process.
CHANGELOG is release-curated (no Unreleased section), so it's left to the release flow.
Verified: ruff clean, basedpyright 0 errors, decepticon-core suite green (88 passed) incl. the langchain-free guarantee.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…vent loop)
`http_request` was a *sync* `@tool` that drove its async `HTTPSession` via `asyncio.get_event_loop().run_until_complete(_do())`. Inside LangGraph's running event loop that raises `RuntimeError: ... cannot be called from a running event loop`, so the tool would fail the moment a web agent is wired to it. It is latent today only because no standard agent imports `WEB_TOOLS`. Every other network tool (`bash_*`) is already an async `@tool`.
Convert `http_request` to `async def` and `await` the session directly, dropping the `_do` wrapper, the `run_until_complete` call, and the local `asyncio` import. Behavior is otherwise identical.
Adds the first tests for the web tool layer: invoking the tool from *within* a running loop (the exact failure condition) with a stubbed session, plus the invalid-headers path.
Verified: ruff clean, basedpyright 0 errors, new tests pass (2).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
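The failure mode can be reproduced with the standard library alone. In this sketch `fetch` is a hypothetical stand-in for the async `HTTPSession` call, and the two wrappers mirror the before/after shapes described above rather than the project's actual tool code.

```python
import asyncio


async def fetch(url: str) -> str:
    """Stand-in for the async session call (hypothetical)."""
    await asyncio.sleep(0)
    return f"GET {url} -> 200"


def http_request_sync(url: str) -> str:
    # Old shape: a sync function that tries to drive the coroutine itself.
    coro = fetch(url)
    try:
        return asyncio.get_event_loop().run_until_complete(coro)
    except RuntimeError:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise


async def http_request_async(url: str) -> str:
    # New shape: just await; the caller's running loop does the scheduling.
    return await fetch(url)


async def demo() -> tuple:
    # Called from inside a running loop, which is the exact failure condition.
    try:
        http_request_sync("http://target.test/")
        sync_failed = False
    except RuntimeError:
        sync_failed = True
    return sync_failed, await http_request_async("http://target.test/")


result = asyncio.run(demo())
```

The sync wrapper only works when no loop is running in the current thread, which is why the defect stayed latent until an agent running under an event loop called it.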
Adds a self-contained, additive MCP (Model Context Protocol) client so operators can connect external MCP tool servers (Kali MCP, HexStrike, etc.) and expose their tools to Decepticon agents.
- Config via DECEPTICON_MCP__SERVERS (JSON), matching the existing DECEPTICON_<SECTION>__<KEY> convention.
- langchain-mcp-adapters imported lazily inside load_mcp_tools(); absent package => one-line warning + empty list (agents keep working).
- Per-server isolation: one bad endpoint logs+skips, others still load.
- No dependency/lockfile changes (optional package documented as manual install; locked extra to follow).
17 mocked tests, no network.
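The optional-dependency and per-server isolation behavior might look roughly like the sketch below. It deliberately avoids calling any langchain-mcp-adapters API: the `connect` callable, the `package` parameter, and the name-to-config JSON shape are all assumptions made for illustration.

```python
import importlib.util
import json
import logging

log = logging.getLogger("mcp_sketch")


def load_mcp_tools(servers_json, connect, package="langchain_mcp_adapters"):
    """Load tools from each configured server; never raise to the caller."""
    # Optional dependency: if the package is absent, warn once and carry on.
    if importlib.util.find_spec(package) is None:
        log.warning("%s not installed; MCP tools disabled", package)
        return []
    tools = []
    # Assumed config shape: {"server-name": {...connection settings...}}.
    for name, config in json.loads(servers_json or "{}").items():
        try:
            tools.extend(connect(name, config))
        except Exception as exc:  # one bad endpoint must not block the rest
            log.warning("MCP server %r skipped: %s", name, exc)
    return tools
```

Catching per server rather than around the whole loop is what lets a healthy server's tools load even when a sibling endpoint is misconfigured.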
…reds
Covers analyze_gpo_abuse, analyze_delegation, analyze_shadow_credentials and all pure helpers (_is_sensitive_ou, _is_dc, _spn_targets_dc) with happy paths, edge cases, and dangling-node robustness tests (66 tests).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rs and tool builder
59 tests covering _strip_frontmatter, _read_via_backend, _list_dir_via_backend, _validate_skill_path, _format_skill_body, and build_load_skill_tool end-to-end using a fake backend; no external services required.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ilent dead port
`build_grpc_server` constructed a `grpc.Server`, bound `0.0.0.0:<port>`, and returned an UNREGISTERED servicer: it never called `add_*Servicer_to_server` (there are no protoc-generated bindings for `skillogy.proto`) and the hand-rolled servicer returned plain Python objects with no message serializers. So when grpcio was installed the launcher opened a port that accepted connections but answered every RPC (ListSkills / LoadSkill / IngestSkill / Health) with `UNIMPLEMENTED`, a silent dead port that *looks* healthy. REST is the only wired transport (`RestSkillogyClient` → :9100; the skillogy middleware connects over REST).
Make `build_grpc_server` raise `RuntimeError` with a clear message. `__main__._start_grpc` already catches `RuntimeError` and degrades to REST, so the service boots correctly with no dead port. Removed the dead servicer + `_RawResponse`/`_HealthResp` scaffolding and the now-unused `SkillMeta` import; updated the module docstring.
Adds tests: `build_grpc_server` raises (message names gRPC + REST), and `_start_grpc` returns None (graceful REST degrade).
Verified: ruff clean, basedpyright 0 errors, skillogy suite 36 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
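The fail-loud-then-degrade pattern described here reduces to a few lines. The sketch below is illustrative: the error text, the logger name, and the port value are assumptions, and the real functions take whatever arguments the service defines.

```python
import logging

log = logging.getLogger("skillogy_sketch")


def build_grpc_server(port: int):
    # No generated bindings exist, so refuse to open a port that could only
    # ever answer UNIMPLEMENTED.
    raise RuntimeError(
        f"gRPC transport is not wired (requested port {port}); use the REST transport"
    )


def start_grpc(port: int):
    """Return a server handle, or None when the service should run REST-only."""
    try:
        return build_grpc_server(port)
    except RuntimeError as exc:
        log.warning("gRPC disabled, degrading to REST: %s", exc)
        return None
```

Raising instead of returning a half-built server moves the failure to startup, where the caller already has a fallback, rather than to the first client RPC.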
…d) and BlueCellTap transforms
Covers JSON output shapes and error paths for solidity_scan, solidity_scan_file, slither_ingest, foundry_* wrappers, export_session_asciicast, list_session_recordings, iam_policy_audit, s3_buckets_from_text, user_data_secrets, k8s_audit, tfstate_audit, metadata_endpoints, plus _strip_ansi, _parse_line_to_event, TapEvent.to_dict, and BlueCellTap.read_batch/follow. All IO mocked; passes fully offline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds test_defense_push.py covering HTTP push functions (mocked requests) for splunk, elastic, sentinel, and EDR (Defender XDR + CrowdStrike), plus converter edge cases, severity/logsource mappings, and ConOps helper paths not covered by the existing test_defense.py (104 tests total).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ing internals
Adds 130 new tests covering binary.py, strings.py, symbols.py, rop.py, packer.py, and scripts.py: ELF32/big-endian/Mach-O/WASM formats, all four RET opcodes, every packer signature, crypto/email/version/import string categories, UTF-16LE extraction, symbol buckets and risk scoring, and to_dict serialization for every dataclass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty modules
Adds 111 new tests across three files:
- test_chain.py: full coverage of chain.py (ChainStep, Chain dataclass, compute_edge_cost, plan_chains, promote_chain, critical_path_score, impact_analysis, unexplored_surface, credential_reachability); all Neo4j calls mocked via _FakeStore.
- test_cve_extended.py: _Cache (TTL, LRU eviction, persistence, flush), _rehydrate, async lookup_cve/lookup_cves/lookup_package with mocked httpx, plus NVD/EPSS parser edge cases.
- test_bounty_extended.py: bounty_scope_check (domain wildcards, exclusions, normalization) and format_bounty_report (validated/unvalidated findings, severity labels, title stripping) with mocked KG store and filesystem.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…and executive.py helpers
63 deterministic tests covering all LangChain @tool wrappers (report_hackerone, report_bugcrowd_csv, report_executive, report_timeline) with mocked _load, plus deep edge-case coverage of _count_by_severity, _top_chains, _top_cves, and render_executive_summary (validated-findings cap, severity ordering, default fallbacks, graph stats, empty-graph branches). All pass offline with no services.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Covers _default_cache_root env-var override, ReferenceCache.to_dict, _entry, _dir_size (OSError paths), _run_git hardened-env contract, ensure_cached clone/pull/URL-mismatch/symlink/non-git-hint paths, _which, _parse_grep_line edge cases, _pyfind pure-Python fallback, and search_cache grep/timeout/FileNotFoundError/max_results paths. All network and subprocess calls mocked; 55 pass, 2 platform-skipped.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add packages/decepticon/tests/unit/cli/test_scan_extra.py covering _git_diff_files (subprocess success/failure/timeout/OSError), _emit_jsonl_event, _dispatch_scan_via_sdk (missing SDK, happy-path streaming, asyncio timeout), _load_findings_graph (missing file, env-var workspace, load exception), and main() end-to-end paths (no-target, instruction-file error, diff-scope fallback, SDK RuntimeError/generic exception, happy-path, SARIF output, engagement-name/timeout overrides, non-interactive forwarding).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y ad wrappers
Covers ImportStats, _node_kind_for_bh, _BH_EDGE_MAP, _upsert_bh_object, _build_bh_index, _ingest_aces, _ingest_memberships, merge_bloodhound_json (dict/str/list/error paths), ingest_bloodhound_zip, and the bh_ingest_zip / bh_ingest_json @tool wrappers (JSON output shapes + error paths). All 81 tests are offline and deterministic.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…G CRUD, tier-2 ingesters, and fuzz tools
Adds test_tools_extended.py with comprehensive coverage of uncovered surfaces in tools/research/tools.py: _parse_props, _severity_from_score/string, _is_web_port, _jwt/_cookie_finding_severity, all four dependency-file parsers, kg_add_node/edge/query/neighbors/stats, kg_ingest_subfinder/dnsx/katana/masscan/ffuf/testssl/crackmapexec/asrep_hashes, fuzz_harness, fuzz_record_crash, and error paths throughout. All mocked offline.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…task-tree formatter
tools/opplan.py had zero direct unit coverage despite holding the OPPLAN status state machine and the agent-facing renderer. Adds 23 tests for:
- _is_valid_transition / _valid_next / _VALID_TRANSITIONS: the full pending -> in-progress -> completed/blocked/cancelled matrix, terminal states, unknown sources, no self-transitions, sorted next-state hints, and a guard that the table only references real ObjectiveStatus values.
- _build_opplan_payload: per-status summary counts, id-stable ordering (clean git diffs), zeroed empty-plan summary, and round-trip back into the OPPLAN model (wrapper fields dropped).
- _format_opplan_for_agent: header/progress, priority-sorted table, blocked_by joining + owner fallback, hierarchical task tree, status markers, the lowest-priority 'Next' recommendation, all-complete and no-actionable branches, and a cycle-guard regression test proving a malformed/injected duplicate-id tree cannot drive unbounded recursion.
Pure-logic only; no network/docker/LLM.
…hecks
A trailing dot is DNS-equivalent ("metadata.google.internal." resolves
identically to "metadata.google.internal"), but `_matches_rule` compared
hosts with only `.lower()` — no trailing-dot normalization. In ENFORCE
mode this let the FQDN form slip past the forbidden-destination guard AND
any operator `out_of_scope` host rule:
evaluate_target("metadata.google.internal.", enforce) -> allow=True
evaluate_target("169.254.169.254.", enforce) -> allow=True
i.e. the agent could reach the cloud metadata service and exfiltrate
service-account credentials despite the built-in IMDS deny list, with the
audit ledger recording the call as "allow". The IMDS-IP form additionally
failed `ip_address()` parsing and fell through to default-allow.
Strip a trailing dot (keeping the existing case-fold) on BOTH the rule
pattern and the target in `_matches_rule`, across host, domain-glob, and
CIDR matching — so forbidden_destinations, out_of_scope, and in_scope are
all normalized consistently. Legitimate in-scope FQDN-form hosts still
match; the raw target is preserved in the decision detail for the audit.
Adds regression tests (TestFqdnTrailingDotNormalization) for the
forbidden-destination, out-of-scope, and in-scope paths in both host and
IMDS-IP forms.
Verified: ruff clean, basedpyright 0 errors, test_roe + decepticon-core
suite 130 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
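The normalization can be sketched as below. This is an illustrative matcher, not the project's `_matches_rule`: the function names are hypothetical, and the rule kinds are inferred from the pattern shape purely for the sake of a short example.

```python
import ipaddress
from fnmatch import fnmatchcase


def normalize_host(value: str) -> str:
    """Case-fold and drop one trailing dot (the DNS root label)."""
    value = value.strip().lower()
    return value[:-1] if value.endswith(".") else value


def matches_rule(pattern: str, target: str) -> bool:
    # Normalize BOTH sides so rule authors and targets may use either form.
    pattern, target = normalize_host(pattern), normalize_host(target)
    if "/" in pattern:  # treat as a CIDR rule
        try:
            network = ipaddress.ip_network(pattern, strict=False)
            return ipaddress.ip_address(target) in network
        except ValueError:
            return False  # target is not an IP address, so it cannot match
    if "*" in pattern:  # treat as a domain glob
        return fnmatchcase(target, pattern)
    return pattern == target
```

Normalizing before the IP parse matters for the IMDS case: `"169.254.169.254."` is not a valid address literal, so without the strip it would never be compared against the deny-list network at all.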
`SandboxBase._normalize_workspace_path` validated each path component
against `[A-Za-z0-9_.-]{1,128}`, which accepts "." and "..". So
`_normalize_workspace_path("/workspace/../../etc")` returned the string
verbatim, and that traversable path flowed into the sandbox tmux/file
operations meant to be confined to `/workspace/<engagement>` — escaping
the per-engagement subtree. The sibling EngagementFilesystem layer
already guards this (`middleware/filesystem.py` uses `posixpath.normpath`
plus a documented test), but the sandbox_kernel + bash callers invoke
`_normalize_workspace_path` directly with no such guard.
Add the same fail-closed guard: if `posixpath.normpath(path) != path`,
return the safe `/workspace` default. Catches "..", ".", and "//"
traversal while preserving legitimate (incl. dotted) directory names.
posixpath (not os.path) so Windows hosts don't get "/"->"\\" rewrites.
Adds tests/unit/sandbox_kernel/test_workspace_path.py: legit paths
preserved + traversal/escape forms fail closed.
Verified: ruff clean, basedpyright 0 errors, sandbox_kernel suite green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
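A minimal sketch of the fail-closed guard follows. The component regex comes from the commit message; the function name, the absolute-path check, and the ordering of the checks are assumptions for illustration.

```python
import posixpath
import re

_COMPONENT = re.compile(r"[A-Za-z0-9_.-]{1,128}")
DEFAULT_WORKSPACE = "/workspace"


def normalize_workspace_path(path: str) -> str:
    """Return path unchanged if it is already canonical, else the safe default."""
    if not path.startswith("/"):
        return DEFAULT_WORKSPACE
    # A canonical path is a fixed point of normpath; "..", "." and "//"
    # all change under normalization and therefore fail closed.
    if posixpath.normpath(path) != path:
        return DEFAULT_WORKSPACE
    components = path.split("/")[1:]
    if not all(_COMPONENT.fullmatch(part) for part in components):
        return DEFAULT_WORKSPACE
    return path
```

Comparing against the normalized form rejects traversal without having to enumerate bad components, and dotted names such as `v1.2-acme` still pass because normalization leaves them untouched.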
…ion eval, loaders
test_rule_match.py exercised happy-path matching but left the silent-failure-prone internals at 77%. Adds 27 tests lifting rule_match.py to 99%:
- _event_field: nested dotted paths, missing keys, non-dict mid-path, None and list values (a wrong return makes a rule silently never fire).
- _compile_pattern: literal escaping vs re: regex mode, case-insensitivity.
- _evaluate_condition: empty-condition all()/no-selection branches, unknown selection names, and malformed expressions, proving the sandboxed eval path fails closed (False) rather than raising.
- load_rules / _load_from_jsonl / _load_from_json / _rule_from_dict: JSONL + directory loading, blank/malformed/no-id line skipping, list/single/scalar JSON shapes, match-shorthand, missing-file and nonexistent-path non-fatal handling (the untrusted-rule-file parser).
Pure-logic + tmp_path only; no network/docker/LLM.
`_refresh_tokens` raised `litellm.AuthenticationError` with the raw token
endpoint response interpolated into the message:
message=f"Codex ChatGPT refresh response missing fields: {data}"
On a partial-but-successful refresh (e.g. `access_token` present but
`id_token` missing) `data` carries the freshly-minted access token and
the rotated refresh token verbatim — which then land in logs and the
caller-visible error. Interpolate only the (non-sensitive) field NAMES
that were present, never their values.
The handler runs inside the litellm container (litellm is not a dev/test
dependency), so it has no standard-suite unit test; this is a one-line
redaction verified by inspection. ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
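The redaction amounts to building the error text from key names only. In this sketch the helper name and the set of required fields are assumptions; the message prefix is the one quoted in the commit.

```python
# Assumed required fields; the real handler may check a different set.
REQUIRED_FIELDS = ("access_token", "refresh_token", "id_token")


def missing_fields_message(data: dict) -> str:
    """Describe an incomplete refresh response without echoing any values."""
    missing = [name for name in REQUIRED_FIELDS if not data.get(name)]
    present = sorted(str(key) for key in data)
    return (
        "Codex ChatGPT refresh response missing fields: "
        f"missing={missing}, present={present}"
    )
```

Listing which keys were present keeps the error useful for debugging a partial response while the token values themselves never reach logs or the caller.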
…idates)
`UntrustedOutputMiddleware` wraps the output of every tool in `UNTRUSTED_TOOL_NAMES` in an `<UNTRUSTED_TOOL_OUTPUT>` envelope with a heuristic risk score, so attacker-influenceable bytes reach the model marked as data (and high-risk hits are logged to the quarantine ledger). The allowlist omitted the scanner prefilter tools: `scan_shard` walks `/workspace/target` and returns raw code snippets, and `rank_candidates` re-emits those hits. An injection payload planted in a scanned target file therefore reached the scanner agent's model UNwrapped: never enveloped, risk-scored, or recorded.
Add `scan_shard` + `rank_candidates` to `UNTRUSTED_TOOL_NAMES`, and a regression test asserting both are enveloped with the right `origin`.
Verified: ruff clean, basedpyright 0 errors, test_untrusted_output 42 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
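The allowlist-driven wrapping can be pictured with the sketch below. The envelope's attribute layout and the `wrap_tool_output` name are assumptions, the allowlist shown is only the subset of names mentioned in these commits, and the risk scoring and quarantine ledger are omitted.

```python
# Subset of tool names mentioned in these commits; the real set is larger.
UNTRUSTED_TOOL_NAMES = frozenset({"bash", "read_file", "scan_shard", "rank_candidates"})


def wrap_tool_output(tool_name: str, output: str) -> str:
    """Envelope output from allowlisted tools so the model treats it as data."""
    if tool_name not in UNTRUSTED_TOOL_NAMES:
        return output
    return (
        f'<UNTRUSTED_TOOL_OUTPUT origin="{tool_name}">\n'
        f"{output}\n"
        "</UNTRUSTED_TOOL_OUTPUT>"
    )
```

Because membership in the set is the only trigger, a tool missing from the allowlist passes through silently, which is exactly how the scanner prefilter tools were missed.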
added 22 commits
May 30, 2026 21:40
# Conflicts:
#   config/codex_chatgpt_handler.py
# Conflicts:
#   clients/web/src/app/api/engagements/[id]/route.ts
…ng 80 PRs
- defense (#364/#416 tests vs #417 source): quote/escape SPL expectations; add sha256 indicator to YARA fixtures + KQL queryText assertion for the new Defender push behavior
- recording (#385 test vs #430 source): _tool_request uses tool_call.args
- backends (#418 test vs #400 source): assert workspace-path normalization
- llm oauth/gemini (#353/#401): force-load real oauth_token_store under the bare name so a sibling test's partial stub can't shadow it under -n auto
- ollama (#441): ruff format
Adds 9 credentials-aware AuthMethods for OpenAI-compatible LLM gateways / aggregators, bringing Decepticon closer to oh-my-pi's provider breadth. Each routes through LiteLLM's openai/ provider with a fixed api_base override (the proven xiaomi_mimo / custom pattern), now table-driven via OPENAI_COMPAT_GATEWAYS so the batch shares one code path:
    opencode   OpenCode Zen            https://opencode.ai/zen/v1
    vercel     Vercel AI Gateway       https://ai-gateway.vercel.sh/v1
    hf         Hugging Face Router     https://router.huggingface.co/v1
    venice     Venice AI               https://api.venice.ai/api/v1
    nanogpt    NanoGPT                 https://nano-gpt.com/api/v1
    synthetic  Synthetic               https://api.synthetic.new/openai/v1
    zenmux     ZenMux                  https://zenmux.ai/api/v1
    qianfan    Baidu Qianfan (ERNIE)   https://qianfan.baidubce.com/v2
    cfgateway  Cloudflare AI Gateway   per-account base URL
The model alias keeps the gateway prefix (opencode/claude-opus-4-6) so two gateways exposing the same upstream slug never collide in the LiteLLM model_list.
Wired end-to-end exactly like the existing providers:
- AuthMethod enum + METHOD_MODELS HIGH/MID/LOW matrix (types/llm.py)
- OPENAI_COMPAT_GATEWAYS table + build_model_entry branch (litellm_dynamic_config.py)
- static model_list entries + within-gateway fallback chains (litellm.yaml)
- env-var credential auto-detection, default priority, CLI labels (factory.py); auth_inventory + `decepticon-cli auth` pick them up automatically
- .env.example keys, /model catalog (model.ts), setup-guide.md table
Base URLs + model ids verified against each provider's current public docs and oh-my-pi's maintained provider catalog.
Kimi-for-Coding was evaluated and dropped: its coding/v1 endpoint enforces a coding-agent client whitelist and requires the kimi-for-coding model id, so it is not usable through a generic LiteLLM proxy (Moonshot is already covered by moonshot_api). Cursor / GitLab Duo / Qwen Portal are deferred: they need proprietary or OAuth-device protocols that don't fit the OpenAI-compatible-via-LiteLLM path.
Like the existing cerebras / xiaomi_mimo additions, these are configured via .env, not the Go onboard wizard.
Tests: gateway routing / alias-collision / validation (test_litellm_dynamic_config.py), tier resolution + prefix invariants (test_models.py), credential detection (test_auth.py).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
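The table-driven routing and the prefixed alias can be sketched as follows. The two base URLs come from the list above; the shape of the table, the `build_model_entry` signature, and the environment-variable names are assumptions, and the project's actual structures in litellm_dynamic_config.py may differ.

```python
# Two rows of the gateway table, reduced to name -> base URL for the sketch.
OPENAI_COMPAT_GATEWAYS = {
    "opencode": "https://opencode.ai/zen/v1",
    "vercel": "https://ai-gateway.vercel.sh/v1",
}


def build_model_entry(gateway: str, upstream_slug: str, api_key_env: str) -> dict:
    """Build one LiteLLM model_list entry routed through the openai/ provider."""
    api_base = OPENAI_COMPAT_GATEWAYS[gateway]
    return {
        # The alias keeps the gateway prefix so identical slugs never collide.
        "model_name": f"{gateway}/{upstream_slug}",
        "litellm_params": {
            "model": f"openai/{upstream_slug}",
            "api_base": api_base,
            # LiteLLM config syntax for reading a key from the environment.
            "api_key": f"os.environ/{api_key_env}",
        },
    }
```

For example, `build_model_entry("opencode", "claude-opus-4-6", "OPENCODE_API_KEY")` and the same slug on `vercel` yield distinct `model_name` values while sharing one code path; the environment-variable name here is hypothetical.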
…ith 'import' and 'import from''
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Brings the integration tree up to date with main (v1.1.4-v1.1.6 releases: skillogy publish fix, native arm64 runners, CHANGELOG; #443 make-as-CI-source-of-truth; #444-#448).
Conflict: ci.yml coverage line. Kept main's 'make ci-test-coverage' dispatch (#443) and carried #380's 35->60 coverage floor into the Makefile target so both intents hold.
Integrates all 80 open PRs (#352–#441) into a single, conflict-free, fully-tested tree.
Each PR was individually green, but the repo ruleset uses strict_required_status_checks_policy: false, so per-PR checks cannot detect cross-PR breakage. This branch merges every PR locally, resolves conflicts, and reconciles the cross-PR interactions so the combined tree is green.
Textual conflicts resolved
- codex_chatgpt_handler.py (kept #353's: "feat: wire P0 middleware/runtime modules + security & correctness hardening").
- …SLUG_RE, consistent with the POST route).
Cross-PR behavior/isolation reconciliations (stale test expectations updated to match intended new source; no tests removed/gamed)
- …queryText.
- …tool_call.args → updated #385 ("test(runtime/recording): cover serializers + record/replay round-trip - 45%→96%") _tool_request helper.
- …oauth_token_store under its bare name so a sibling test's partial stub can't shadow it under pytest -n auto.
Verification (mirrors the CI Python gate)
- `ruff check .`: pass
- `ruff format --check .`: pass
- `basedpyright`: 0 errors
- `pytest -n auto -m "not slow"`: 3510 passed, 28 skipped, 0 failed
Merging this with a merge commit makes every PR's head commit reachable from main, so all 80 PRs will be auto-marked Merged.