⚡ Real-time inference optimizer for LLMs: faster generation, smarter decoding, and live observability 🚀✨
Kairu (流れる): to flow, to stream.
Inference should be fluid, not blocked by latency, inefficiency, or opaque performance.
Kairu wraps any HuggingFace model and adds (see the configuration sketch after the list):

- 🦅 Speculative decoding (EAGLE-style)
- ⏩ Dynamic early exit
- 💸 Token budget enforcement
- 📊 Live dashboard:
  - tokens/sec
  - latency
  - quality tradeoffs
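A minimal sketch of what turning these knobs could look like, assuming a keyword-argument API on `wrap_model` (the flag names below are illustrative assumptions, not Kairu's confirmed interface):

```python
from kairu import wrap_model  # entry point shown in the quickstart below

# All keyword arguments here are hypothetical; they mirror the feature
# list above rather than a documented Kairu API.
model = wrap_model(
    "your-model",               # any HuggingFace model id
    speculative_decoding=True,  # EAGLE-style draft-and-verify generation
    early_exit=True,            # stop forward passes early when confident
    token_budget=256,           # hard cap on generated tokens per request
    dashboard=True,             # expose live tokens/sec and latency metrics
)
```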
Speculative decoding works, but:
- locked inside heavy frameworks (vLLM, etc.)
- hard to experiment with
- no lightweight tooling
- no built-in observability
Kairu is a lightweight playground for exploring:

- Speculative decoding internals (EAGLE, Medusa; see the toy sketch after this list)
- KV cache management
- Streaming inference
- Performance optimization
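To make the first item concrete, here is a toy, self-contained sketch of the draft-and-verify loop at the core of speculative decoding. The two "models" are random stand-ins and acceptance is greedy token matching; real EAGLE/Medusa implementations verify all draft positions in a single batched forward pass of the target model and use a probabilistic acceptance rule.

```python
import random

random.seed(0)
VOCAB = list(range(100))

def draft_next(ctx):
    """Cheap draft model (toy stand-in for a small LM)."""
    return random.choice(VOCAB)

def target_next(ctx):
    """Expensive target model (toy stand-in for the full LM)."""
    return random.choice(VOCAB)

def speculative_step(ctx, k=4):
    """Draft k tokens cheaply, then verify them against the target.

    Accept the longest prefix where draft and target agree; on the first
    mismatch, keep the target's token instead. If every draft token is
    accepted, append one bonus target token, so each step always makes
    progress.
    """
    drafted = []
    for _ in range(k):
        drafted.append(draft_next(ctx + drafted))

    accepted = []
    for tok in drafted:
        expected = target_next(ctx + accepted)
        if tok == expected:
            accepted.append(tok)        # draft verified: a "free" token
        else:
            accepted.append(expected)   # first mismatch: take target token
            break
    else:
        accepted.append(target_next(ctx + accepted))  # bonus token

    return accepted

context = [1, 2, 3]
print(speculative_step(context))  # tokens accepted in one step
```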
```bash
pip install kairu
```

```python
from kairu import wrap_model

model = wrap_model("your-model")
model.generate("Hello world")
```

Make LLM inference fast, transparent, and controllable.
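For streaming with a token budget, usage might look like the following sketch; `stream=True` and `max_new_tokens` are assumed parameter names, not confirmed Kairu API:

```python
# Hypothetical streaming usage; parameter names are assumptions.
for token in model.generate("Hello world", stream=True, max_new_tokens=64):
    print(token, end="", flush=True)
```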