# 🐍 Kairu


⚡ Real-time inference optimizer for LLMs: faster generation, smarter decoding, and live observability 📊✨


## 🌊 Meaning

Kairu (ζ΅γ‚Œγ‚‹) β€” to flow, to stream.

Inference should be fluid, not blocked by latency, inefficiency, or opaque performance.


## 🚀 What it is

Kairu wraps any HuggingFace model and adds:

- 🦅 Speculative decoding (EAGLE-style)
- ⏩ Dynamic early exit
- 💸 Token budget enforcement (see the sketch after this list)
- 📊 Live dashboard:
  - tokens/sec
  - latency
  - quality tradeoffs
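
A taste of what token budget enforcement involves under the hood: plain HuggingFace transformers already exposes the hook you need via stopping criteria. A minimal sketch using only generic transformers APIs (this is not Kairu's internal implementation, and the model name is just an example):

```python
# Minimal token budget via transformers' StoppingCriteria.
# Generic transformers code, not Kairu's internals; "gpt2" is an example model.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          StoppingCriteria, StoppingCriteriaList)

class TokenBudget(StoppingCriteria):
    """Stop generation once `budget` new tokens have been produced."""
    def __init__(self, prompt_len: int, budget: int):
        self.prompt_len = prompt_len
        self.budget = budget

    def __call__(self, input_ids: torch.LongTensor, scores, **kwargs) -> bool:
        return input_ids.shape[1] - self.prompt_len >= self.budget

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello world", return_tensors="pt")
criteria = StoppingCriteriaList([TokenBudget(inputs.input_ids.shape[1], budget=16)])
out = model.generate(**inputs, stopping_criteria=criteria, max_new_tokens=512)
print(tok.decode(out[0]))
```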

## ❗ The problem

Speculative decoding works, but:

- it's locked inside heavy frameworks (vLLM, etc.)
- it's hard to experiment with
- there's no lightweight tooling
- there's no built-in observability

## 🧠 What you learn

- Speculative decoding internals (EAGLE, Medusa; see the sketch below)
- KV cache management
- Streaming inference
- Performance optimization
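
To make the first bullet concrete, here is the draft-then-verify loop at the heart of speculative decoding, stripped down to the greedy case. A sketch in plain transformers; gpt2/gpt2-medium as draft/target and `k = 4` drafted tokens are arbitrary illustration choices, not Kairu's defaults:

```python
# Bare-bones draft-then-verify loop (greedy speculative decoding).
# Illustrative only: models and k are arbitrary, and there's no KV cache reuse.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("gpt2")          # small, fast
target = AutoModelForCausalLM.from_pretrained("gpt2-medium")  # large, slow

ids = tok("Inference should be fluid", return_tensors="pt").input_ids
k = 4  # tokens proposed per speculative step

with torch.no_grad():
    for _ in range(8):  # a few speculative steps
        n = ids.shape[1]
        # 1) Draft: the small model cheaply proposes k tokens.
        proposal = draft.generate(ids, max_new_tokens=k, do_sample=False)
        drafted = proposal[:, n:]
        # 2) Verify: ONE target forward pass scores all drafted positions.
        logits = target(proposal).logits[:, n - 1:-1, :]
        preds = logits.argmax(-1)
        # 3) Keep the longest prefix the target agrees with, then append the
        #    target's own token at the first mismatch (if there is one).
        agree = (preds == drafted).long().cumprod(-1).sum().item()
        ids = torch.cat([ids, drafted[:, :agree], preds[:, agree:agree + 1]], -1)

print(tok.decode(ids[0]))
```

For a production baseline, transformers ships this idea as assisted generation (`model.generate(..., assistant_model=draft)`); EAGLE and Medusa go further by replacing the separate draft model with lightweight heads attached to the target.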

## 🚀 Quick Start

```bash
pip install kairu
```

```python
from kairu import wrap_model

model = wrap_model("your-model")
model.generate("Hello world")
```
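
To sanity-check the dashboard's headline metric (tokens/sec), you can time plain streamed generation yourself. A rough sketch with transformers' `TextIteratorStreamer` (not Kairu's dashboard; "gpt2" is just an example model):

```python
# Rough tokens/sec measurement via streamed generation.
# Plain transformers, not Kairu's dashboard; the model name is an example.
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("Hello world", return_tensors="pt")
streamer = TextIteratorStreamer(tok, skip_prompt=True)

# generate() runs in a background thread; the streamer yields decoded text
# chunks (roughly one per token) as they are produced.
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=64))
start = time.perf_counter()
thread.start()

chunks = 0
for text in streamer:
    chunks += 1
    print(text, end="", flush=True)
thread.join()

elapsed = time.perf_counter() - start
print(f"\n~{chunks / elapsed:.1f} chunks/sec over {elapsed:.2f}s")
```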

## 🎯 Vision

Make LLM inference fast, transparent, and controllable.
