fix(llm): authenticate Ollama Cloud via OLLAMA_CLOUD_API_KEY + add cloud probe #441

Open
VoidChecksum wants to merge 2 commits into main from fix/ollama-cloud-api-key

@VoidChecksum
Collaborator

Problem

A Discord user on Ollama Cloud (WSL2) is stuck in the Soundwave engagement-planning interview loop on 1.1.3, even after reinstalling. The 1.1.3 changelog fix for the "Soundwave interview loop" only addressed the " (Recommended)" suffix leak (#328) — a different root cause.

The actual bug for Ollama Cloud is an environment-variable name mismatch:

  • The onboard wizard (clients/launcher/cmd/onboard.go) and setup docs (docs/setup-guide.md) write the key as OLLAMA_CLOUD_API_KEY.
  • The runtime reads OLLAMA_API_KEY in config/litellm_dynamic_config.py and the cloud model resolver.

Nothing connected the two (grep confirms: OLLAMA_CLOUD_API_KEY was only ever written, never read). So a user who completes decepticon onboard for Ollama Cloud sends an empty Bearer token to https://ollama.com/v1 and gets a 401 on every turn. The agent therefore never advances past the Soundwave interview, which is what users see as "stuck in the loop."

Fix

1. Key mismatch (the loop) — read OLLAMA_CLOUD_API_KEY first, fall back to OLLAMA_API_KEY (the official Ollama convention), in:

  • config/litellm_dynamic_config.py (Bearer auth wiring)
  • packages/decepticon-core/.../types/llm.py (_resolve_ollama_cloud_model)
  • packages/decepticon/.../llm/factory.py (_ollama_cloud_configured opt-in — now consistent with the resolver, which already treated the key as an opt-in signal)

No existing setup breaks; the wizard/docs already use the _CLOUD_ form, which now works.
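
The precedence rule above can be sketched roughly as follows. This is an illustration only: the helper names `resolve_ollama_cloud_key` and `ollama_cloud_configured` are hypothetical and are not the names used in the repository, and treating a whitespace-only value as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Order described in this PR: the wizard/docs name first,
# then the official Ollama name as a fallback.
_KEY_VARS = ("OLLAMA_CLOUD_API_KEY", "OLLAMA_API_KEY")


def resolve_ollama_cloud_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-blank key, or "" when neither variable is set.

    Hypothetical helper; blank values are treated as unset (an assumption).
    """
    source = os.environ if env is None else env
    for name in _KEY_VARS:
        value = source.get(name, "").strip()
        if value:
            return value
    return ""


def ollama_cloud_configured(env: Optional[Mapping[str, str]] = None) -> bool:
    """Opt-in signal: cloud counts as configured once either key resolves."""
    return bool(resolve_ollama_cloud_key(env))
```

Passing the environment as a mapping keeps the precedence testable without mutating `os.environ`; with no argument the helper falls back to the real process environment.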

2. Cloud tool-capability probe — the startup probe (config/ollama_probe.py / litellm_startup.py) only ever ran against local Ollama (OLLAMA_API_BASE, host.docker.internal). A cloud-only user (empty local base) short-circuited on "OLLAMA_API_BASE is empty" and got no cloud diagnostic. Added:

  • probe_cloud() — Bearer-authenticated reachability + /api/show capability check, targeting the native API root (strips the OpenAI-compat /v1 path).
  • Local/cloud prefix separation so litellm_startup probes each family independently.
  • Clear remediation lines for a missing key (401) or a non-tool-capable cloud model at boot.
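
A minimal sketch of what such a cloud probe might look like, using only the standard library. It is not the PR's actual `probe_cloud()`: the names `native_base` and `probe_cloud_sketch`, the injectable `post` transport, and the remediation strings are all hypothetical, and it assumes that `/api/show` accepts a JSON body with a `model` field and returns a `capabilities` list containing `"tools"` for tool-capable models.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

Post = Callable[[str, Dict[str, str], bytes], Tuple[int, bytes]]


def native_base(api_base: str) -> str:
    """Strip a trailing OpenAI-compat /v1 segment to reach the native API root."""
    base = api_base.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return base


def _http_post(url: str, headers: Dict[str, str], body: bytes,
               timeout: float = 10.0) -> Tuple[int, bytes]:
    """POST and return (status, body); HTTP error statuses are returned, not raised.

    Connection-level failures (urllib.error.URLError) still propagate.
    """
    req = urllib.request.Request(url, data=body, headers=headers, method="POST")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        return exc.code, exc.read()


def probe_cloud_sketch(api_base: str, api_key: str, model: str,
                       post: Post = _http_post) -> Tuple[bool, str]:
    """Return (ok, message) for a Bearer-authenticated capability check."""
    url = native_base(api_base) + "/api/show"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    status, raw = post(url, headers, json.dumps({"model": model}).encode())
    if status == 401:
        # Illustrative remediation text, not the PR's wording.
        return False, "401: set OLLAMA_CLOUD_API_KEY (or OLLAMA_API_KEY)"
    if status != 200:
        return False, f"unexpected HTTP {status} for model {model!r}"
    capabilities = json.loads(raw).get("capabilities", [])
    if "tools" not in capabilities:
        return False, f"model {model!r} does not advertise tool support"
    return True, "ok"
```

Injecting `post` lets the 401 and capability branches be exercised in unit tests without network access, which matches the kinds of cases the PR says it tests (auth, 401, capability).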

Out of scope (flagged, not changed)

The default cloud model _OLLAMA_CLOUD_DEFAULT_MODEL = "llama3.2" (llm.py) isn't a real Ollama Cloud model — a user who sets base/key but omits OLLAMA_CLOUD_MODEL will 404. The new probe now surfaces this, but the default itself is left unchanged pending a maintainer decision.

Immediate workaround (for affected users, no rebuild)

Edit ~/.decepticon/.env:

OLLAMA_API_KEY=<same value as OLLAMA_CLOUD_API_KEY>
OLLAMA_CLOUD_API_BASE=https://ollama.com/v1
OLLAMA_CLOUD_MODEL=qwen3-coder:480b   # a real tool-capable cloud model, not the llama3.2 default

then decepticon restart.
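
To double-check the edit before restarting, something like the sketch below could report which of the three workaround variables are still missing. The helpers `parse_dotenv` and `missing_workaround_vars` are hypothetical, and the parser is deliberately simplified (plain `NAME=value` lines, optional quotes, trailing ` # comment`); it is not how decepticon itself loads the file.

```python
from typing import Dict, List

# The three variables named in the workaround above.
REQUIRED = ("OLLAMA_API_KEY", "OLLAMA_CLOUD_API_BASE", "OLLAMA_CLOUD_MODEL")


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple NAME=value lines; skips blanks and full-line comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        # Drop a trailing inline comment introduced by " #", then quotes.
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[name.strip()] = value
    return values


def missing_workaround_vars(text: str) -> List[str]:
    """Return the required variable names that are absent or empty."""
    env = parse_dotenv(text)
    return [name for name in REQUIRED if not env.get(name)]
```

For example, `missing_workaround_vars(open(os.path.expanduser("~/.decepticon/.env")).read())` (with `import os`) should return an empty list once all three lines are in place.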

Verification

  • +8 unit tests (key precedence, prefix filtering, native-base normalization, probe_cloud auth/401/capability).
  • Full tests/unit/llm/ + test_auth.py suites: 234 passed, 2 skipped (Windows-only).
  • ruff check clean; basedpyright 0 errors/0 warnings.

🤖 Generated with Claude Code

VoidChecksum and others added 2 commits May 30, 2026 03:29
…al red-team references

A disciplined harvest of the provided reference repos: each was gap-checked
against the existing 227-skill KB; only genuine, non-duplicative capability
gaps were authored as operator-grade, ATT&CK-mapped playbooks (with
authorized-use caveats). Duplicative/meta candidates were dropped, and
narrower techniques deferred with rationale (see PR body).

Added (source → skill):
- PayloadsAllTheThings → exploit/web/php-type-juggling
- Strix               → exploit/web/header-injection (CRLF + Host-header
                        poisoning / password-reset / X-Forwarded), exploit/web/bfla
- hackingBuddyGPT     → post-exploit/linux-privesc-enum
- 13o-bbr ML-Security → analyst/adversarial-ml-evasion, analyst/ml-model-extraction
- HexStrike-AI        → reverser/ctf-triage
- PentestGPT          → decepticon/pentest-task-tree (PTT reasoning)
- MCP-Kali-Server     → decepticon/kali-mcp-bridge (drives the new MCP client)
- CAI (alias robotics)→ iot/ros2-dds-attack, post-exploit/network-replay

Dropped as duplicative: crlf-injection (subsumed by header-injection),
reva-ghidra-mcp + reva-pyghidra-scripting (reverser/ghidra already covers the
Ghidra MCP bridge + scripting), autonomous-pentest-agent-ops + ai-reactive-loop
(meta; overlap decepticon/orchestration).

Verified: all 11 parse via both frontmatter parsers + discovered by
iter_skill_records; ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…probe

The onboarding wizard and setup docs write the Ollama Cloud key as
OLLAMA_CLOUD_API_KEY, but the runtime only ever read OLLAMA_API_KEY.
Nothing connected the two, so a user who completed `decepticon onboard`
sent an empty Bearer token to https://ollama.com/v1 and 401'd on every
turn — never advancing past the Soundwave interview (reported on Discord;
distinct from the " (Recommended)" suffix loop fixed in 1.1.3 / #328).

- Read OLLAMA_CLOUD_API_KEY first, fall back to OLLAMA_API_KEY (the
  official Ollama var) in litellm_dynamic_config, the cloud model
  resolver, and the cloud opt-in detector. No existing setup breaks;
  wizard/docs already use the _CLOUD_ form.
- Add probe_cloud(): Bearer-authenticated reachability + tool-capability
  check against the native /api/show endpoint (strips the OpenAI-compat
  /v1 path). litellm_startup now probes local and cloud separately, so a
  cloud-only user (empty OLLAMA_API_BASE) gets a real diagnostic instead
  of the silent "OLLAMA_API_BASE is empty" short-circuit. Missing key or
  non-tool-capable cloud model prints clear remediation at boot.

Tests: +8 (key precedence, prefix filtering, native-base, probe_cloud
auth/401/capability). Full llm + auth suites green; ruff + basedpyright
clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>