A small, RAM-aware AI gateway that carries you across big AI seas. Fits on a 16GB Mac M1, free-tier friendly.
A personal-machine AI coracle that intelligently splits work between free-tier "big" cloud AI (planning) and local Ollama models (reasoning + execution), without ever spiking RAM enough to crash the machine. Built to be consumed as a drop-in OpenAI-compatible "model" by opencode, Claude Code, codex, Cursor, Continue, etc.
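To make "drop-in" concrete, here is a minimal client sketch. It assumes coracle is serving on localhost:8000 as in the Docker example further down; the `openai` package, the placeholder API key, and the prompt are illustrative, and only the endpoint shape and the `coracle` model name come from this README.

```python
# Illustrative client call; assumes coracle is serving on localhost:8000 as in
# the Docker example below. The API key is a placeholder the SDK requires.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="coracle",  # the only model name exposed; routing happens inside
    messages=[{"role": "user", "content": "Plan a refactor of the auth module"}],
)
print(response.choices[0].message.content)
```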
Big AI models are great at planning. Small local models are great at executing. Free API tiers run out. Browser-driven web AIs are flaky. RAM on a 16GB Mac is precious. No existing tool balances all of these constraints gracefully — so this one does:
- Resident reasoning model (`qwen2.5:7b`) classifies every request and routes it to the right pipeline.
- Big AI (Gemini, Groq, Ollama Cloud, headless-browser fallback to Claude.ai/ChatGPT/Gemini-web) handles deep planning when the classifier asks for it.
- Coder model (`qwen2.5-coder:7b`) executes steps locally with a full tool belt (fs, shell, web, browser, git).
- Single-LLM-slot scheduler ensures only one 7B model is in RAM at a time (see the sketch after this list).
- SQLite job state powers instant status responses with zero RAM cost.
- One model name to the consumer: `coracle`. Auto-routing is invisible.
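The single-LLM-slot idea, roughly sketched below under stated assumptions: the `unload_model` and `generate` helpers are hypothetical stand-ins, not coracle's real API.

```python
import asyncio


async def unload_model(name: str) -> None:
    """Hypothetical stand-in for asking Ollama to evict a model from RAM."""


async def generate(name: str, prompt: str) -> str:
    """Hypothetical stand-in for a local Ollama generation call."""
    return f"[{name}] response to: {prompt}"


class SingleSlotScheduler:
    """Sketch only: at most one local 7B model resident at any moment."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()       # one task owns the LLM slot at a time
        self._resident: str | None = None

    async def run(self, model: str, prompt: str) -> str:
        async with self._lock:
            if self._resident not in (None, model):
                await unload_model(self._resident)  # evict the other 7B first
            self._resident = model
            return await generate(model, prompt)


# e.g. asyncio.run(SingleSlotScheduler().run("qwen2.5-coder:7b", "write tests"))
```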
```
opencode / Claude Code / codex
         │  (OpenAI-compatible /v1/chat/completions)
         ▼
┌─────────────────────────────────────────────────────┐
│ Resident reasoning model (qwen2.5:7b) — CLASSIFIER   │
│   → fast | deep | research | status                  │
└─────────────────────────────────────────────────────┘
         │
   ┌─────┼───────────┬───────────────┐
   ▼     ▼           ▼               ▼
 status fast        deep          research
 (DB    (local-    (reason →     (deep + web
  read)  only)      big AI →      tools biased)
                    parse →
                    coder →
                    verify)
```
Full design details: docs/PLAN.md.
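For a feel of the classifier's output contract, a small hedged sketch: the four route labels come from the diagram above, while the fallback behaviour and the helper name are assumptions, not the documented design.

```python
from enum import Enum


class Route(str, Enum):
    """The four routes the classifier can emit (see diagram above)."""
    FAST = "fast"          # local-only answer from the resident model
    DEEP = "deep"          # big-AI plan → parse → local coder → verify
    RESEARCH = "research"  # deep pipeline, biased toward web tools
    STATUS = "status"      # SQLite read only; no LLM needs to load


def parse_route(classifier_output: str) -> Route:
    """Hypothetical helper: map the classifier's label onto a known route,
    falling back to FAST if the label is unrecognised."""
    try:
        return Route(classifier_output.strip().lower())
    except ValueError:
        return Route.FAST
```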
Multi-arch (linux/amd64 + linux/arm64) images are published to GHCR by the
release-image workflow. Two variants:
- `ghcr.io/skgandikota/coracle` — slim runtime, no browser deps.
- `ghcr.io/skgandikota/coracle-browser` — slim + Playwright/Chromium.
```bash
docker run --rm -p 8000:8000 \
  -v "$HOME/.config/coracle:/etc/coracle" \
  -v "$HOME/.local/share/coracle:/var/lib/coracle" \
  ghcr.io/skgandikota/coracle:latest
```

Tags: `:latest` (newest semver), `:vX.Y.Z` / `:vX.Y` / `:vX` (per release), `:edge` (head of `main`). See docs/RELEASES.md for the release process and verification steps.
Per-tool how-to guides for plugging coracle into the coding agents that consume it as either an MCP server or an OpenAI-compatible model:
| Tool | Guide | Status |
|---|---|---|
| Claude Code | docs/integrations/claude-code.md | ✅ documented |
| opencode | coming via #23 | 🚧 placeholder |
| codex | coming via #25 | 🚧 placeholder |
Short version: LiteLLM is a paid-API gateway built for throughput; coracle is a personal-machine scheduler built for $0 budgets and a 16GB RAM ceiling. We use LiteLLM's SDK as our provider abstraction, but the product is a different thing entirely — see docs/VS_LITELLM.md for the full table.
| | LiteLLM | coracle |
|---|---|---|
| Cost model | Pay-per-token | $0 — free tiers + local + headless-browser fallback |
| Topology | Stateless proxy | Stateful job coracle |
| Inference | Cloud-first | Local-first |
| RAM target | Server-class | 16GB Mac M1 |
| Tool execution | Caller's job | Coracle runs the tools (sandbox + MCP) |
| Status / progress | None | First-class, never loads an LLM |
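What "LiteLLM's SDK as our provider abstraction" buys is one call shape across providers. A hedged sketch, where the model identifiers and the fallback order are placeholders rather than coracle's actual configuration:

```python
# Illustrative use of the LiteLLM SDK as a provider abstraction. The model
# identifiers and fallback order are placeholders, not coracle's real config.
import litellm


def big_ai_plan(prompt: str) -> str:
    candidates = ["gemini/gemini-1.5-flash", "groq/llama-3.1-8b-instant"]
    for model in candidates:
        try:
            response = litellm.completion(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except Exception:
            continue  # free tier exhausted or provider down: try the next one
    raise RuntimeError("all big-AI providers failed; browser fallback would go here")
```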
🚧 Pre-alpha — implementation underway.
Skeleton (package layout, settings loader, structured logging) landed in #31.
Issues are organized into 7 phases (Phase 1 → Phase 7) tracked via GitHub Milestones. Each phase has an Epic issue summarizing scope and linking to its sub-tasks.
This project is agent-friendly: every issue contains enough context, acceptance criteria, file paths, and definition-of-done that a coding agent (or human contributor) can pick it up cold, clone the repo, and submit a PR.
- Pick a ready issue (label: `status:ready`) — these have no unresolved dependencies.
- Read the issue's Context, Acceptance Criteria, and Definition of Done.
- Reference `docs/PLAN.md` for the bigger picture.
- Open a PR linking the issue (`Closes #N`).
- Follow `CONTRIBUTING.md`.
- PRs are reviewed by a layered AI bot stack — see `docs/REVIEW_BOTS.md`. Only our strict `code-reviewer-001` bot has merge authority; it waits for the AI bots to weigh in before approving.
| Concern | Choice |
|---|---|
| Language | Python 3.11+ |
| Local models | Ollama (qwen2.5:7b, qwen2.5-coder:7b) |
| Big AI providers | litellm → Gemini, Groq, Ollama Cloud + Playwright headless fallback |
| External interface | OpenAI-compatible HTTP (primary) + MCP stdio + native HTTP + CLI |
| Server | FastAPI + Uvicorn |
| State | SQLite |
| Browser | Playwright (headless, separate subprocess per provider) |
| RAM monitor | psutil |
Mac M1 Pro, 16 GB RAM. Designed to never exceed ~11 GB resident.
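As an illustration of what a psutil-based RAM guard can look like: the ~11 GB target is the only number taken from this README, while the per-model estimate and the safety margin below are assumptions.

```python
import psutil

SAFETY_MARGIN_GB = 2.0  # assumption: leave headroom for the OS, tools, browsers


def can_load_model(estimated_model_gb: float = 5.0) -> bool:
    """Admit another 7B model only if enough RAM is genuinely free.

    The 5 GB per-model estimate is an assumption, not a figure measured by
    this project; the goal is simply to stay under the ~11 GB target above.
    """
    available_gb = psutil.virtual_memory().available / 1024**3
    return available_gb >= estimated_model_gb + SAFETY_MARGIN_GB
```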
The coracle can consume any number of remote/cloud MCP servers as local tools. Copy the example config and edit it:
```bash
cp config/mcp_servers.yaml.example config/mcp_servers.yaml
# edit config/mcp_servers.yaml — supports stdio | http | sse transports

coracle mcp list     # show connected servers + tool counts
coracle mcp reload   # re-read the config without restarting
```

Environment variables in the config (e.g. `${GITHUB_TOKEN}`) are expanded at load time, so secrets stay out of source control.
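The expansion step itself can be as small as the sketch below. This is not necessarily how coracle's loader is written; it just shows why the committed YAML never needs to hold secrets (PyYAML is assumed to be available).

```python
import os
from string import Template

import yaml  # PyYAML, assumed available


def load_mcp_config(path: str = "config/mcp_servers.yaml") -> dict:
    """Read the MCP server config and expand ${VAR} references from the
    environment, so e.g. ${GITHUB_TOKEN} becomes the real token at runtime."""
    with open(path) as fh:
        raw = fh.read()
    expanded = Template(raw).safe_substitute(os.environ)
    return yaml.safe_load(expanded)
```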
Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).
You are free to share and adapt the material under these terms:
- Attribution — credit the original author and link to the license.
- NonCommercial — no commercial use.
- ShareAlike — distribute derivative works under the same license.
See LICENSE for the full legal text.