Building magic-code: An Open-Source AI Coding Agent That Runs on Your Hardware #73
kienbui1995 announced in Announcements
How we built a TUI coding agent in Rust, tested it with 274 scenarios across 5 platforms, and made it work with a $600 GPU.
The Problem
AI coding assistants are powerful — but they come with trade-offs. Cloud-based tools send your code to external servers. Proprietary agents lock you into specific providers. And the costs add up fast.
We wanted something different: a coding agent that's fast, private, and runs on your own hardware. That's why we built magic-code.
What is magic-code?
magic-code is an open-source agentic coding agent with a terminal UI (TUI), built in Rust. It works with any LLM provider, from Claude and GPT to self-hosted models like Qwen 3.5 on your own GPU.
The agent reads your code, plans changes, edits files, runs tests, and iterates — all from your terminal.
Key numbers
Architecture: 6 Crates, Zero Coupling
The strict rule: `mc-provider` and `mc-tools` never depend on each other. Only `mc-core` orchestrates them. This keeps the codebase maintainable as it grows.
The Self-Hosted Challenge
Our primary goal was making magic-code work well with Qwen 3.5 9B — a model that runs on a single RTX 4070 Ti. This is a fundamentally different challenge than building for Claude or GPT-4.
What we learned
1. Small models need explicit instructions
With Claude, you can say "add a greet function" and it figures out the rest. With Qwen 9B, you need "read src/lib.rs then add a greet function using edit_file." We built a 4-tier prompt system that adapts instructions based on model capability.
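As an illustration, tier selection could look something like this minimal Rust sketch. The tier names, the parameter-count thresholds, and the `instructions` helper are assumptions made up for this post, not magic-code's actual code:

```rust
// Hypothetical sketch of a 4-tier prompt system: smaller models get more
// explicit, step-by-step instructions; frontier models get the terse task.

#[derive(Debug, PartialEq)]
enum PromptTier {
    Minimal,  // frontier models: terse instructions suffice
    Standard,
    Detailed,
    Explicit, // small local models: spell out files and tools
}

/// Pick a tier from a rough capability score (here: parameter count in billions).
fn tier_for(params_b: u32) -> PromptTier {
    match params_b {
        0..=13 => PromptTier::Explicit,
        14..=33 => PromptTier::Detailed,
        34..=99 => PromptTier::Standard,
        _ => PromptTier::Minimal,
    }
}

/// Expand a task into tier-appropriate instructions.
fn instructions(tier: &PromptTier, task: &str) -> String {
    match tier {
        PromptTier::Explicit => {
            format!("Read src/lib.rs first, then {task} using edit_file.")
        }
        _ => task.to_string(),
    }
}

fn main() {
    let tier = tier_for(9); // Qwen 3.5 9B
    println!("{}", instructions(&tier, "add a greet function"));
}
```

The point of the sketch is the shape, not the numbers: one capability signal in, one instruction style out, so the rest of the agent never branches on model identity.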
2. Thinking mode and tool calling don't mix (yet)
We discovered that Qwen 3.5 with vLLM's `--reasoning-parser qwen3` puts tool calls inside thinking blocks, which the tool-call parser can't extract. The fix: disable thinking when tools are present, and re-enable it for pure Q&A. This is actually recommended by the Qwen team.
3. Context window matters more than model size
Qwen 3.5 9B with 256K context on vLLM outperforms larger models with smaller context windows for real coding tasks. We added Qwen to our model registry with proper context window settings and adaptive compaction thresholds.
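A sketch of what an adaptive compaction threshold might look like. The `ModelEntry` type and the 80% trigger fraction are illustrative assumptions, not magic-code's actual registry:

```rust
// Hypothetical model-registry entry: the compaction trigger is derived from
// the model's context window instead of being a fixed constant.

struct ModelEntry {
    context_tokens: u64,
}

/// Start compacting the conversation once usage crosses 80% of the window.
fn compaction_threshold(model: &ModelEntry) -> u64 {
    model.context_tokens * 8 / 10
}

fn main() {
    // 256K context as served by vLLM with --max-model-len 262144
    let qwen = ModelEntry { context_tokens: 262_144 };
    println!("compact at {} tokens", compaction_threshold(&qwen));
}
```

Deriving the threshold from the registry means a model served with a bigger window automatically gets more room before compaction kicks in.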
Testing: 274 Scenarios, 5 Platforms, Honest Results
We built a comprehensive golden test suite to evaluate magic-code across different languages and app types. Every scenario runs in a Docker sandbox with a fresh project, and results are verified by checking actual file contents — not just "did the model respond."
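Content-level verification can be sketched in a few lines of Rust. The `Scenario` shape here is a hypothetical simplification of the real harness:

```rust
// Sketch: a scenario passes only if the expected code actually landed in
// the file on disk, not merely because the model produced a reply.

use std::fs;

struct Scenario<'a> {
    file: &'a str,
    must_contain: &'a str,
}

/// Read the file from the sandboxed project and check its contents.
fn verify(s: &Scenario) -> bool {
    fs::read_to_string(s.file)
        .map(|src| src.contains(s.must_contain))
        .unwrap_or(false)
}

fn main() {
    let scenario = Scenario { file: "src/lib.rs", must_contain: "fn greet" };
    println!("verified: {}", verify(&scenario));
}
```

A missing file or an unchanged file both fail the same way, which keeps "the model apologized politely" from counting as a pass.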
Test structure
Verification levels
We don't just check if the model responded. We verify:
Results: Qwen 3.5 9B (self-hosted, RTX 4070 Ti)
Overall: 60% verified correct across 110 platform scenarios.
We're sharing these numbers honestly. A 9B model on a single GPU won't match Claude Sonnet — but it handles Python and Go tasks well, and it costs nothing to run.
Where Qwen 9B excels
Where it struggles
Comparison: Gemini 2.5 Pro via LiteLLM
Gemini 2.5 Pro scores significantly higher — but it's a cloud model. The beauty of magic-code is that you can switch between models with a single flag.
What Makes magic-code Different
1. Provider agnostic
15+ providers out of the box. Anthropic, OpenAI, Gemini, Groq, DeepSeek, Mistral, Ollama, LiteLLM, vLLM — or any OpenAI-compatible endpoint.
2. Full agentic loop
Not just code completion. magic-code runs a ReAct loop: read code → plan → edit → run tests → iterate. It has 30 built-in tools including file operations, search, bash, browser, memory, and MCP support.
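The loop above can be sketched as follows. The control flow and the iteration cap are illustrative, not magic-code's implementation:

```rust
// Toy ReAct loop: read → plan → (edit → test)* until tests pass or we give
// up. `tests_pass_after` fakes the model needing N attempts to get it right.

#[derive(Debug, PartialEq)]
enum Step { Read, Plan, Edit, Test }

/// Returns the step trace and whether the task converged within the cap.
fn react_loop(mut tests_pass_after: u32, max_iters: u32) -> (Vec<Step>, bool) {
    let mut trace = vec![Step::Read, Step::Plan];
    for _ in 0..max_iters {
        trace.push(Step::Edit);
        trace.push(Step::Test);
        if tests_pass_after == 0 {
            return (trace, true); // tests green: stop iterating
        }
        tests_pass_after -= 1;
    }
    (trace, false) // hit the cap without green tests
}

fn main() {
    let (trace, ok) = react_loop(1, 5);
    println!("{} steps, success: {ok}", trace.len());
}
```

The iteration cap matters in practice: a small local model that keeps failing its own tests needs a bounded loop, not an infinite one.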
3. Context engineering
Smart compaction keeps conversations going without losing important context. Repo maps (via tree-sitter) give the model project awareness without reading every file. Memory persists facts across sessions.
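Here is a toy version of a repo map, with the tree-sitter parse replaced by a pre-built symbol table for illustration:

```rust
// Sketch: one summary line per file listing its top-level symbols, giving
// the model project awareness without sending full file contents. In the
// real system the symbols come from tree-sitter; here they are hard-coded.

use std::collections::BTreeMap;

/// Render a compact "file: symbols" map, sorted by path.
fn repo_map(symbols: &BTreeMap<&str, Vec<&str>>) -> String {
    symbols
        .iter()
        .map(|(file, syms)| format!("{file}: {}", syms.join(", ")))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let mut files = BTreeMap::new();
    files.insert("src/lib.rs", vec!["fn greet", "struct Config"]);
    files.insert("src/main.rs", vec!["fn main"]);
    println!("{}", repo_map(&files));
}
```

A map like this costs a few dozen tokens per file, so even a 9B model can "see" the whole project layout before deciding which files to actually read.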
4. Security by default
5. Headless mode
Integrate magic-code into CI/CD pipelines.
Installation
Self-Hosted Setup
Run Qwen 3.5 9B with vLLM:
```bash
vllm serve QuantTrio/Qwen3.5-9B-AWQ \
  --port 8300 \
  --gpu-memory-utilization 0.95 \
  --max-model-len 262144 \
  --quantization awq_marlin \
  --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --served-model-name qwen3.5-9b
```
Point magic-code at it:
```bash
magic-code --base-url http://localhost:8300 --model qwen3.5-9b "your task"
```
Or use LiteLLM as a proxy to switch between self-hosted and cloud models seamlessly.
What's Next
Try It
magic-code is MIT licensed and available on GitHub and crates.io.
We built this because we believe AI coding tools should be open, fast, and runnable on your own hardware. The results aren't perfect — but they're honest, reproducible, and improving with every release.
magic-code is built by kienbui1995. Star the repo if you find it useful. Contributions welcome.