rookiemann edited this page Apr 10, 2026 · 2 revisions

Multi-TurboQuant Wiki

Unified KV cache compression toolkit for LLM inference.

10 compression methods, 16 presets, GPU-validated, cross-platform. Compresses the KV cache by 5-80x so you can run bigger models, longer contexts, and more agents on the same GPU.
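To put the 5-80x range in concrete terms, here is a rough back-of-the-envelope sketch of KV cache sizing. It does not use this toolkit's API. The helper names and the model dimensions (32 layers, 8 KV heads, head dimension 128, FP16) are illustrative assumptions, and the two ratios are simply the endpoints quoted above.

```python
# Rough KV cache sizing sketch. Helper names and model dimensions are
# hypothetical and chosen for illustration; they are not part of the toolkit.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """Uncompressed KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 corresponds to FP16/BF16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def compressed_bytes(raw_bytes, ratio):
    """Size after applying a given compression ratio (e.g. 5 for 5x)."""
    return raw_bytes / ratio


GIB = 1024 ** 3

# Assumed example model: 32 layers, 8 KV heads, head_dim 128, 32k context.
raw = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=32768)

print(f"uncompressed: {raw / GIB:.2f} GiB")
for ratio in (5, 80):  # endpoints of the range quoted on this page
    print(f"{ratio}x: {compressed_bytes(raw, ratio) / GIB:.3f} GiB")
```

Under these assumptions the uncompressed cache is 4 GiB per sequence, so the freed memory scales directly with context length and the number of concurrent sequences.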

Getting Started

Understanding the Methods

Configuration

  • Configuration — CacheConfig, CacheMethod, all options
  • Presets — 16 named presets for common use cases
  • Calibration — which methods need it, how to generate

Planning & Hardware

Integration

Reference

Credits
