nanoserve

nanoserve is a local LLM serving engine for Apple Silicon.

What it does

Serves an OpenAI-compatible chat API
Supports continuous batching with FCFS and synchronized admission
Reuses KV cache prefixes when prompts overlap
Includes fp16, INT8, INT4, and MLX paths
Exposes Prometheus metrics and a Grafana dashboard
Ships benchmark and evaluation harnesses

Architecture

flowchart LR
    A[Chat request] --> B[Scheduler]
    B --> C[Batch builder]
    C --> D[Model engine]
    D --> E[Streaming response]
    D --> F[(Metrics / eval artifacts)]

What’s included

API server
Scheduler and engine
Prefix cache
Quantization paths
Metrics and ops dashboards
Benchmark and eval scripts

Quick start

make dev-install
make models
make baseline-hf
make parity
make serve
make observe
make eval

Notes

The project is built around local MPS inference on Mac hardware.
Continuous batching helps when the workload and admission policy line up.
Quantization is useful when the runtime has native support for it; on MPS that depends on the path.

Portfolio Proof

Architecture and evaluation: docs/PORTFOLIO_PROOF.md
Verified metrics: results/ablations.csv and results/eval.csv
Benchmark matrix: docs/BENCHMARK_MATRIX.md
Demo and local mode: use the make commands above
Test commands: pytest, python -m compileall src

Name		Name	Last commit message	Last commit date
Latest commit History 133 Commits
deploy		deploy
docs		docs
examples		examples
ops		ops
prompts		prompts
results		results
scripts		scripts
src/nanoserve		src/nanoserve
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
.python-version		.python-version
AGENTS.md		AGENTS.md
CLAUDE.md		CLAUDE.md
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

nanoserve

What it does

Architecture

What’s included

Quick start

Notes

Portfolio Proof

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

nanoserve

What it does

Architecture

What’s included

Quick start

Notes

Portfolio Proof

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages