coracle

License: CC BY-NC-SA 4.0 · Status: Pre-alpha · Python 3.11+ · Platform: macOS (Apple Silicon) · Agent-friendly

A small, RAM-aware AI gateway that carries you across big AI seas. Fits on a 16GB Mac M1, free-tier friendly.

A personal-machine AI coracle that intelligently splits work between free-tier "big" cloud AI (planning) and local Ollama models (reasoning + execution), without ever spiking RAM enough to crash the machine. Built to be consumed as a drop-in OpenAI-compatible "model" by opencode, Claude Code, codex, Cursor, Continue, etc.
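
For a consumer, that looks like talking to any other OpenAI-compatible endpoint. A minimal sketch using the official openai Python package, assuming the gateway is listening on localhost:8000 (as in the Docker example below) and does not require a real API key:

from openai import OpenAI

# Assumption: coracle is running locally on port 8000 and accepts a placeholder API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="coracle",  # the single model name coracle exposes; routing happens behind it
    messages=[{"role": "user", "content": "Plan a refactor of the auth module"}],
)
print(resp.choices[0].message.content)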

Why this exists

Big AI models are great at planning. Small local models are great at executing. Free API tiers run out. Browser-driven web AIs are flaky. RAM on a 16GB Mac is precious. None of the existing tools combine all of these gracefully — so this one does:

  • Resident reasoning model (qwen2.5:7b) classifies every request and routes it to the right pipeline.
  • Big AI (Gemini, Groq, Ollama Cloud, headless-browser fallback to Claude.ai/ChatGPT/Gemini-web) handles deep planning when the classifier asks for it.
  • Coder model (qwen2.5-coder:7b) executes steps locally with a full tool belt (fs, shell, web, browser, git).
  • Single-LLM-slot scheduler ensures only one 7B model is in RAM at a time (sketched just after this list).
  • SQLite job state powers instant status responses with zero RAM cost.
  • One model name to the consumer: coracle. Auto-routing is invisible.
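
The single-LLM-slot guarantee mentioned above boils down to one lock around every local model call. The following is only an illustrative sketch with toy stand-ins, not coracle's real scheduler:

import asyncio

# Illustrative only: a single lock means at most one 7B model is resident at a time.
_llm_slot = asyncio.Lock()

async def run_local_model(model: str, prompt: str) -> str:
    async with _llm_slot:          # claim the one LLM slot; everyone else waits
        await asyncio.sleep(0)     # real code would load the model and call Ollama here
        return f"[{model}] handled: {prompt}"

async def main() -> None:
    # Both calls queue on the same slot, so the two models never coexist in RAM.
    print(await asyncio.gather(
        run_local_model("qwen2.5:7b", "classify this request"),
        run_local_model("qwen2.5-coder:7b", "apply the planned edit"),
    ))

asyncio.run(main())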

Architecture at a glance

opencode / Claude Code / codex
            │  (OpenAI-compatible /v1/chat/completions)
            ▼
┌─────────────────────────────────────────────────────────────┐
│ Resident reasoning model (qwen2.5:7b) — CLASSIFIER          │
│  → fast | deep | research | status                           │
└─────────────────────────────────────────────────────────────┘
            │
   ┌────────┼────────┬─────────────────┐
   ▼        ▼        ▼                 ▼
 status   fast      deep             research
 (DB     (local-   (reason →         (deep + web
  read)  only)     big AI →          tools biased)
                   parse →
                   coder →
                   verify)
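
Read top to bottom, the routing reduces to a verdict-to-pipeline lookup. A toy sketch of that dispatch step, where the handler names and bodies are hypothetical stand-ins rather than coracle's real pipelines:

import asyncio

# Hypothetical stand-ins for the four pipelines in the diagram.
async def status_pipeline(prompt: str) -> str:
    return "job state read from SQLite, no model loaded"

async def fast_pipeline(prompt: str) -> str:
    return "answered by the resident qwen2.5:7b"

async def deep_pipeline(prompt: str) -> str:
    return "planned by big AI, executed by qwen2.5-coder:7b, then verified"

async def research_pipeline(prompt: str) -> str:
    return "deep pipeline with web tools biased on"

PIPELINES = {
    "status": status_pipeline,
    "fast": fast_pipeline,
    "deep": deep_pipeline,
    "research": research_pipeline,
}

async def route(verdict: str, prompt: str) -> str:
    # verdict is whatever the resident classifier returned for this request
    return await PIPELINES[verdict](prompt)

print(asyncio.run(route("fast", "rename this variable")))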

Full design details: docs/PLAN.md.

Run with Docker

Multi-arch (linux/amd64 + linux/arm64) images are published to GHCR by the release-image workflow. Two variants:

  • ghcr.io/skgandikota/coracle — slim runtime, no browser deps.
  • ghcr.io/skgandikota/coracle-browser — slim + Playwright/Chromium.

docker run --rm -p 8000:8000 \
  -v "$HOME/.config/coracle:/etc/coracle" \
  -v "$HOME/.local/share/coracle:/var/lib/coracle" \
  ghcr.io/skgandikota/coracle:latest

Tags: :latest (newest semver), :vX.Y.Z / :vX.Y / :vX (per release), :edge (head of main). See docs/RELEASES.md for the release process and verification steps.

Integrations

Per-tool how-to guides for plugging coracle into the coding agents that consume it as either an MCP server or an OpenAI-compatible model:

Tool         Guide                              Status
Claude Code  docs/integrations/claude-code.md   ✅ documented
opencode     coming via #23                     🚧 placeholder
codex        coming via #25                     🚧 placeholder

How is this different from LiteLLM?

Short version: LiteLLM is a paid-API gateway built for throughput; coracle is a personal-machine scheduler built for $0 budgets and a 16GB RAM ceiling. We use LiteLLM's SDK as our provider abstraction, but the product is a different thing entirely — see docs/VS_LITELLM.md for the full table.

                   LiteLLM          coracle
Cost model         Pay-per-token    $0 — free tiers + local + headless-browser fallback
Topology           Stateless proxy  Stateful job coracle
Inference          Cloud-first      Local-first
RAM target         Server-class     16GB Mac M1
Tool execution     Caller's job     Coracle runs the tools (sandbox + MCP)
Status / progress  None             First-class, never loads an LLM

Status

🚧 Pre-alpha — implementation underway.

Skeleton (package layout, settings loader, structured logging) landed in #31.

Issues are organized into 7 phases (Phase 1 → Phase 7) tracked via GitHub Milestones. Each phase has an Epic issue summarizing scope and linking to its sub-tasks.

This project is agent-friendly: every issue contains enough context, acceptance criteria, file paths, and definition-of-done that a coding agent (or human contributor) can pick it up cold, clone the repo, and submit a PR.

How to contribute (humans and agents)

  1. Pick a ready issue (label: status:ready) — these have no unresolved dependencies.
  2. Read the issue's Context, Acceptance Criteria, and Definition of Done.
  3. Reference docs/PLAN.md for the bigger picture.
  4. Open a PR linking the issue (Closes #N).
  5. Follow CONTRIBUTING.md.
  6. PRs are reviewed by a layered AI bot stack — see docs/REVIEW_BOTS.md. Only our strict code-reviewer-001 bot has merge authority; it waits for the AI bots to weigh in before approving.

Tech stack

Concern             Choice
Language            Python 3.11+
Local models        Ollama (qwen2.5:7b, qwen2.5-coder:7b)
Big AI providers    litellm → Gemini, Groq, Ollama Cloud + Playwright headless fallback
External interface  OpenAI-compatible HTTP (primary) + MCP stdio + native HTTP + CLI
Server              FastAPI + Uvicorn
State               SQLite
Browser             Playwright (headless, separate subprocess per provider)
RAM monitor         psutil

Hardware target

Mac M1 Pro, 16 GB RAM. Designed to never exceed ~11 GB resident.
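
The ~11 GB ceiling is the sort of check the psutil RAM monitor makes cheap. A hedged sketch of what such a guard could look like; the threshold and the refusal behaviour here are assumptions, not coracle's actual policy:

import psutil

# Assumed budget: keep total resident usage under ~11 GB on a 16 GB machine.
RAM_BUDGET_BYTES = 11 * 1024**3

def ram_headroom_ok() -> bool:
    """Return True if system memory usage is still inside the assumed budget."""
    return psutil.virtual_memory().used < RAM_BUDGET_BYTES

if not ram_headroom_ok():
    print("RAM budget exceeded: refusing to load another model right now")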

Wiring external MCP servers

The coracle can consume any number of remote/cloud MCP servers as local tools. Copy the example config and edit it:

cp config/mcp_servers.yaml.example config/mcp_servers.yaml
# edit config/mcp_servers.yaml — supports stdio | http | sse transports
coracle mcp list      # show connected servers + tool counts
coracle mcp reload    # re-read the config without restarting

Environment variables in the config (e.g. ${GITHUB_TOKEN}) are expanded at load time, so secrets stay out of source control.
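
That expansion needs nothing beyond the standard library. A sketch of the idea, assuming a PyYAML-based loader; coracle's real loader may differ:

import os
import yaml  # PyYAML

def load_mcp_config(path: str = "config/mcp_servers.yaml") -> dict:
    """Read the YAML config and expand ${VAR} references from the environment."""
    with open(path) as fh:
        raw = fh.read()
    expanded = os.path.expandvars(raw)  # e.g. ${GITHUB_TOKEN} -> its current value
    return yaml.safe_load(expanded)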

License

Licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

You are free to share and adapt the material under these terms:

  • Attribution — credit the original author and link to the license.
  • NonCommercial — no commercial use.
  • ShareAlike — distribute derivative works under the same license.

See LICENSE for the full legal text.
