Private AI that runs on your machine. No cloud. No account. No compromise.
Studiomc is a desktop AI assistant that runs large language models entirely on your hardware. It auto-detects your system, recommends the best model, and gives you a ChatGPT-quality experience — fully offline, fully private.
| Problem | Studiomc |
|---|---|
| Local AI tools feel like dev tools | Polished UI — install, click, chat |
| Users pick the wrong model and get frustrated | Hardware scan + automatic model recommendation |
| Document Q&A is slow and uncited | CLaRa compression-native retrieval with citations |
| No way to know if a model will run well | Speed ratings and performance predictions in plain English |
| Multiple backends, multiple interfaces | One unified interface across Ollama, LM Studio, and frontier APIs |
- One-click install — Working chat in under 2 minutes
- Autopilot model selection — Scans your hardware, picks the best model automatically
- Multi-backend — Auto-detects Ollama and LM Studio, connects frontier APIs (OpenAI, Anthropic)
- Chat — Streaming responses, conversation history, branching, memory
- Local OpenAI-compatible API — Integrate with any tool that speaks OpenAI
- Privacy-first — Everything runs locally. No telemetry. No accounts. No cloud unless you explicitly opt in.
- Docs mode — Upload PDF/TXT/MD, ask questions, get cited answers grounded in your documents
- CLaRa — Compression-native retrieval: semantic embeddings (sentence-transformers or TF-IDF fallback), per-collection indexes, top-k retrieval with citations (p95 ≤ 150 ms)
- RAG pipeline — Extract → chunk (500–1000 tokens, overlap) → index → retrieve → generate with source citations
- Recursive reasoning loop — Plan → tool → observe → answer; supports cited (CLaRa), fast, and investigate modes
- LRE (Local Reasoning Environment) — Safe tool layer for the loop: search, grep, open, summarize, table_extract, cite; sandboxed with call budgets
- Investigate mode — Full reasoning trace visibility: tool calls, retrieved chunks, and final answer in one view
- SpliceLLM — Our built-in out-of-core engine: runs models of any size by streaming layers from disk, holding only one layer in memory at a time. The model splitter turns HuggingFace checkpoints into per-layer safetensors; prefetch overlaps I/O with compute. Enables large models on limited VRAM/RAM (e.g. a 70B model on 4 GB, with clear "slow mode" expectations)
- Multi-backend inference — Bundled engine, Ollama, LM Studio, or frontier APIs; router picks the right backend per model
- Performance dashboard — Speed rating (Fast/OK/Slow), throughput, system metrics
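The out-of-core idea behind SpliceLLM can be sketched in a few lines: persist each layer to its own file, then run the forward pass loading one layer at a time. This is a toy illustration in plain Python (pickled lists standing in for per-layer safetensors, a matrix-vector product standing in for a transformer layer), not the real engine:

```python
import os
import pickle
import tempfile

def split_model(layers, model_dir):
    """Write each layer's weights to its own file (stand-in for the
    per-layer safetensors produced by the model splitter)."""
    os.makedirs(model_dir, exist_ok=True)
    for i, w in enumerate(layers):
        with open(os.path.join(model_dir, f"layer_{i:03d}.pkl"), "wb") as f:
            pickle.dump(w, f)
    return len(layers)

def matvec(w, x):
    # Plain Python matrix-vector product: one layer's "forward pass".
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def forward_out_of_core(model_dir, x, n_layers):
    """Run the model while holding only one layer in memory at a time."""
    for i in range(n_layers):
        with open(os.path.join(model_dir, f"layer_{i:03d}.pkl"), "rb") as f:
            w = pickle.load(f)   # load this layer from disk
        x = matvec(w, x)         # compute
        del w                    # drop it before loading the next
    return x

# Toy 2-layer "model": identity, then doubling.
layers = [[[1, 0], [0, 1]], [[2, 0], [0, 2]]]
with tempfile.TemporaryDirectory() as d:
    n = split_model(layers, d)
    y = forward_out_of_core(d, [3, 4], n)
print(y)  # [6, 8]
```

The real engine adds prefetching so the next layer's disk read overlaps with the current layer's compute; the sketch above is strictly sequential.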
No prerequisites. No Ollama. No Python. Just install and chat.
- Download the latest .dmg from Releases
- Open the DMG and drag Studiomc to Applications
- Launch Studiomc — it scans your hardware and recommends a model
- The model downloads from HuggingFace automatically
- You're chatting in under 2 minutes
If you already have Ollama or LM Studio installed, Studiomc auto-detects them and adds their models to your model list. No configuration needed.
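Auto-detection can be as simple as probing the default local ports. A minimal sketch, assuming the documented defaults (11434 for Ollama's server, 1234 for LM Studio's); the real detector would confirm by querying each API rather than just the socket:

```python
import socket

def port_open(port, host="127.0.0.1", timeout=0.25):
    """Return True if something is listening on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def detect_local_backends():
    """Probe the default local ports: 11434 (Ollama), 1234 (LM Studio)."""
    return {
        "ollama": port_open(11434),
        "lm_studio": port_open(1234),
    }

print(detect_local_backends())
```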
```bash
# Clone
git clone https://github.com/mchawda/studiomc.git
cd studiomc

# Build macOS app + DMG
./scripts/build-macos.sh
# Output: dist/Studiomc-<version>-macos.dmg
```

Requires: Flutter SDK (stable channel)
```
Flutter App ── HTTP/WS ──▶ Local Supervisor
                             ├── Inference Router
                             │   ├── Ollama (auto-detected)
                             │   ├── LM Studio (auto-detected)
                             │   ├── SpliceLLM (built-in, out-of-core)
                             │   └── Frontier APIs (optional)
                             ├── Model Manager
                             ├── Document Service (extract, chunk, store)
                             ├── CLaRa (compression-native retrieval + cited answer)
                             ├── Orchestrator (recursive reasoning loop: plan → tool → answer)
                             └── LRE (Local Reasoning Environment — tools for the loop)
```
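CLaRa's TF-IDF fallback path (used when sentence-transformer embeddings are unavailable) can be sketched with a hand-rolled index and cosine-similarity top-k; chunk ids are returned alongside text so answers can cite their sources. A toy illustration, not the production retriever:

```python
import math
from collections import Counter

def tfidf_index(chunks):
    """Build a tiny TF-IDF index over text chunks."""
    docs = [Counter(c.lower().split()) for c in chunks]
    n = len(docs)
    df = Counter(t for d in docs for t in d)          # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vecs = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return vecs, idf

def top_k(query, chunks, k=2):
    """Return the k best (chunk_id, text) pairs for a query."""
    vecs, idf = tfidf_index(chunks)
    q = {t: tf * idf.get(t, 0.0)
         for t, tf in Counter(query.lower().split()).items()}

    def cos(a, b):
        num = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return num / (na * nb) if na and nb else 0.0

    ranked = sorted(range(len(chunks)), key=lambda i: cos(q, vecs[i]),
                    reverse=True)
    return [(i, chunks[i]) for i in ranked[:k]]

chunks = [
    "the quarterly report shows revenue grew ten percent",
    "installation requires flutter and a stable network",
    "revenue growth was driven by the enterprise segment",
]
hits = top_k("what drove revenue growth", chunks, k=2)
print([i for i, _ in hits])  # [2, 0]
```

The returned chunk ids are what lets a generated answer point back at its sources.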
- Frontend: Flutter (Dart) — macOS, Windows, iOS, Android
- Inference: SpliceLLM (layer streaming from disk), optional Ollama / LM Studio / frontier API
- Models: HuggingFace GGUF or safetensors; splitter produces per-layer files for out-of-core
- Backend: Python FastAPI — documents, CLaRa RAG, orchestrator, LRE
- Storage: SQLite + filesystem
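The per-model routing done by the inference router can be sketched as a simple priority rule. The prefix list, preference order, and model names below are assumptions for illustration, not the shipped policy:

```python
# Assumption: frontier-hosted models are recognized by name prefix.
FRONTIER_PREFIXES = ("gpt-", "claude-")

def route(model, available):
    """Pick a backend for a model name.
    `available` maps backend name -> set of model names it serves."""
    if model.startswith(FRONTIER_PREFIXES):
        return "frontier_api"
    for backend in ("ollama", "lm_studio"):   # prefer a running local runtime
        if model in available.get(backend, set()):
            return backend
    return "splicellm"                        # fall back to the bundled engine

# Hypothetical model names for illustration.
available = {"ollama": {"llama3:8b"}, "lm_studio": set()}
print(route("llama3:8b", available))       # ollama
print(route("claude-example", available))  # frontier_api
print(route("some-local-70b", available))  # splicellm
```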
```
studiomc/
  studiomc_app/      # Flutter app
    lib/
      screens/       # Chat, Models, Documents, Settings, Training, etc.
      widgets/       # Reusable UI components
      services/      # API clients, inference, orchestrator, settings
      models/        # Data models
  services/          # Python backend
    supervisor/      # Process manager
    inference/       # Out-of-core engine, splitter, router (Ollama/LM Studio/frontier)
    model_manager/   # Model downloads, registry, autopilot
    documents/       # Document extraction, chunking, storage
    clara/           # CLaRa — compression-native retrieval + cited answer
    lre/             # LRE — tools for orchestrator (search, grep, summarize, etc.)
    orchestrator/    # Recursive reasoning loop (plan → tool → observe → answer)
    training/        # Private training / personalization
  scripts/           # Build & release tooling
  product/           # Product specs & design docs
```
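The chunking step in the documents service (500-1000 tokens per chunk, with overlap) might look like the following sketch. It takes an already-tokenized list; `size=800` and `overlap=100` are illustrative values inside the stated range, and the real pipeline's tokenizer and boundaries may differ:

```python
def chunk(tokens, size=800, overlap=100):
    """Split a token list into windows of `size` tokens, each sharing
    `overlap` tokens with the previous window, so no fact is lost at
    a chunk boundary."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):   # last window reached the end
            break
    return chunks

tokens = [f"t{i}" for i in range(2000)]
parts = chunk(tokens)
print(len(parts), len(parts[0]))  # 3 800
```

Each chunk keeps its position in the source, which is what retrieval later cites.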
The project follows four release phases. Phase 1 (core chat, models, docs, CLaRa, SpliceLLM) is mostly complete. Phases 2–4 (orchestrator/LRE, Personalize wizard, investigate mode) are in progress. See product/product-roadmap.md for details.
| Metric | Target |
|---|---|
| Install to first chat | ≤ 2 min |
| First token latency | ≤ 2.5 s (recommended models) |
| Throughput | ≥ 10 tok/s GPU, ≥ 4 tok/s CPU |
| Document retrieval | p95 ≤ 150 ms |
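The plain-English speed rating on the dashboard can be derived from measured throughput against the targets above. The halfway cutoff separating "OK" from "Slow" is an assumption for illustration:

```python
def speed_rating(tokens_per_s, on_gpu):
    """Map measured throughput to a plain-English rating.
    Targets from the table above: >= 10 tok/s on GPU, >= 4 tok/s on CPU."""
    target = 10.0 if on_gpu else 4.0
    if tokens_per_s >= target:
        return "Fast"
    if tokens_per_s >= target / 2:   # assumed cutoff for "OK"
        return "OK"
    return "Slow"

print(speed_rating(12.0, on_gpu=True))   # Fast
print(speed_rating(6.0, on_gpu=True))    # OK
print(speed_rating(1.5, on_gpu=False))   # Slow
```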
Contributions welcome. Please open an issue first to discuss what you'd like to change.
Built on the shoulders of:
- AirLLM — Inspired SpliceLLM’s out-of-core (layer-streaming) approach
- Ollama — Local model runtime
- Flutter — Cross-platform UI
Source-available — free to use, not open-source. See LICENSE.md for full terms and THIRD_PARTY_NOTICES.md for open-source attribution. Copyright 2024-2026 NIA Pte Ltd.
