Compression Methods

Method comparison

Method	Family	Bits/value	Compression	Calibration	Speed Impact	Best For
turbo2	TurboQuant	2.25	7.1x	Required	-16% decode	Maximum VRAM savings
turbo3	TurboQuant	3.25	4.9x	Required	-5% decode	Balanced compression
turbo4	TurboQuant	4.25	3.8x	Required	-4% decode	Near-lossless quality
turbo2_tcq	TCQ	2.25	7.1x	Required	-16% decode	Max savings + better quality
turbo3_tcq	TCQ	3.25	4.9x	Required	-5% decode	Best quality at 5x
iso3	IsoQuant	3.25	4.9x	No	~0% decode	K-only, zero speed cost
iso4	IsoQuant	4.25	3.8x	No	~0% decode	Higher quality K-only
planar3	PlanarQuant	3.25	4.9x	No	-1% decode	Simplest, Metal support
planar4	PlanarQuant	4.25	3.8x	No	~0% decode	Quality K-only
triattention	TriAttention	16	10-16x	Required	Varies	Long reasoning, compose with above
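The Compression column matches an FP16 baseline (16 bits per value) divided by each method's bits per value. The sketch below reproduces that column. It also estimates combined ratios with TriAttention under one assumption of ours that the table does not state: TriAttention's token reduction and per-value quantization multiply. `compression_ratio` and `combined_ratio` are hypothetical helpers, not part of the library.

```python
FP16_BITS = 16.0  # baseline: uncompressed FP16 KV cache


def compression_ratio(bits_per_value: float) -> float:
    """Ratio of FP16 storage to quantized storage for one cached value."""
    return FP16_BITS / bits_per_value


def combined_ratio(bits_per_value: float, token_reduction: float) -> float:
    """Assumption: TriAttention stores fewer tokens and quantization shrinks
    each remaining value, so the two ratios multiply."""
    return compression_ratio(bits_per_value) * token_reduction


if __name__ == "__main__":
    for name, bits in [("turbo2", 2.25), ("turbo3", 3.25), ("turbo4", 4.25)]:
        print(f"{name}: {compression_ratio(bits):.1f}x")  # 7.1x, 4.9x, 3.8x
    # From a 4.25-bit method with 10x token reduction
    # up to a 3.25-bit method with 16x token reduction.
    low = combined_ratio(4.25, 10)
    high = combined_ratio(3.25, 16)
    print(f"combined: {low:.0f}x to {high:.0f}x")
```

Under that assumption, the 3.25- and 4.25-bit methods land at roughly 38x to 79x. This is in line with the 40-80x total given under Choosing a method.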

Choosing a method

If you want zero setup: Use iso3 or planar3. No calibration files are needed, and the speed penalty in K-only mode is negligible (~0% for iso3, -1% for planar3).

If you want maximum quality: Use turbo4 symmetric (turbo4 for both K and V). Near-lossless at 3.8x compression. Requires calibration.

If you want maximum VRAM savings: Use turbo2_tcq symmetric (7.1x) or combine any method with TriAttention for 40-80x total.

If you're on AMD or Mac: Use iso3 or planar3. TurboQuant requires CUDA flash attention kernels.

If you want speed: Use iso3 K-only. In benchmarks it can actually beat FP16 decode speed, because the savings in memory bandwidth outweigh the cost of the rotation.
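The guidance above can be encoded as a small lookup. This is a minimal sketch, and `choose_methods`, the goal keys, and the backend labels ("cuda", "rocm", "metal") are hypothetical names for illustration, not a library API. It also assumes that the TCQ variants share TurboQuant's dependency on CUDA flash attention kernels.

```python
# Hypothetical helper encoding the "Choosing a method" guidance; not a library API.
GUIDE = {
    "zero_setup": ("iso3", "planar3"),   # no calibration files needed
    "max_quality": ("turbo4",),          # symmetric, near-lossless, needs calibration
    "max_savings": ("turbo2_tcq",),      # symmetric, 7.1x
    "speed": ("iso3",),                  # K-only
}

# Assumption: the TCQ variants share TurboQuant's CUDA flash attention dependency.
CUDA_ONLY = {"turbo2", "turbo3", "turbo4", "turbo2_tcq", "turbo3_tcq"}


def choose_methods(goal: str, backend: str = "cuda") -> tuple:
    """Return candidate methods for a goal; backend is "cuda", "rocm" or "metal"."""
    candidates = GUIDE[goal]
    if backend != "cuda":
        # On AMD or Mac, drop CUDA-only methods and fall back to the
        # calibration-free methods if nothing is left.
        candidates = tuple(m for m in candidates if m not in CUDA_ONLY)
        if not candidates:
            candidates = ("iso3", "planar3")
    return candidates


print(choose_methods("max_quality"))           # ('turbo4',)
print(choose_methods("max_quality", "metal"))  # ('iso3', 'planar3')
```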

Asymmetric configurations

You can use different methods for K and V caches. This is useful because:

- K cache benefits more from compression (attention score computation is bandwidth-bound)
- V cache quality matters more for output quality (weighted sum of values)
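During decode, each step reads the full K cache to compute attention scores, so the bytes moved scale with bits per value. The back-of-envelope sketch below sizes that traffic. The model shape is hypothetical: 32 layers, 8 KV heads, head dimension 128, and a 32K-token context. The sketch uses only the per-value bit widths from the table and ignores any other overhead.

```python
GIB = 2**30


def cache_bytes(n_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bits_per_value: float) -> float:
    """Bytes for one cache (K or V) across all layers at a given bit width."""
    n_values = n_tokens * n_layers * n_kv_heads * head_dim
    return n_values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(n_tokens=32_768, n_layers=32, n_kv_heads=8, head_dim=128)

fp16 = cache_bytes(**shape, bits_per_value=16)
iso3 = cache_bytes(**shape, bits_per_value=3.25)
print(f"K cache read per decode step: FP16 {fp16 / GIB:.2f} GiB, iso3 {iso3 / GIB:.2f} GiB")
```

The sketch only sizes the memory traffic. Whether the smaller read outweighs the rotation work depends on the kernel and the hardware.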

Common asymmetric configs:

# K compressed with iso3, V kept at full precision (FP16): zero speed cost
CacheConfig(k_method=CacheMethod.ISO3, v_method=CacheMethod.FP16)

# K at higher compression (turbo3, 4.9x), V at lower compression (turbo4, 3.8x)
CacheConfig(k_method=CacheMethod.TURBO3, v_method=CacheMethod.TURBO4)
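Because K and V are compressed independently, the overall cache ratio differs from the K ratio alone. The sketch below computes it for the two configs above, assuming K and V hold the same number of values (equal K and V head sizes). `kv_compression` and the lowercase method keys are hypothetical and are not part of the `CacheConfig` API.

```python
# Bits per value from the method comparison table (FP16 is the uncompressed baseline).
BITS_PER_VALUE = {
    "fp16": 16.0,
    "iso3": 3.25,
    "turbo3": 3.25,
    "turbo4": 4.25,
}


def kv_compression(k_method: str, v_method: str) -> float:
    """Overall KV-cache compression for a K/V method pair.

    Assumes K and V hold the same number of values, so total storage
    is proportional to the sum of the two per-value bit widths.
    """
    baseline = 2 * BITS_PER_VALUE["fp16"]
    return baseline / (BITS_PER_VALUE[k_method] + BITS_PER_VALUE[v_method])


print(f"iso3 K + fp16 V:     {kv_compression('iso3', 'fp16'):.2f}x")
print(f"turbo3 K + turbo4 V: {kv_compression('turbo3', 'turbo4'):.2f}x")
```

Under that assumption, the zero-cost iso3/FP16 pairing shrinks the whole cache by about 1.7x, while turbo3/turbo4 reaches about 4.3x.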
