MojoFusedNormRoPESageQuantStore: fused RoPE + KV-Quant with Key per-token Quant operator by NASA1473 · Pull Request #358 · XPU-Forces/mojo_opset

NASA1473 · 2026-06-12T07:04:24Z

Adds MojoFusedNormRoPESageQuantStore, fusing QK-Norm, RoPE, int8 K/V quant, and paged store into one op, with optional SAGE per-token int8 key + scale stored inline via the ixformer rms_norm_sage_qk_rotary_embedding kernel.

Copilot

Pull request overview

Adds a new experimental operator, MojoFusedNormRoPESageQuantStore, intended to fuse QK-RMSNorm + RoPE + static int8 KV quant + paged KV store, with an optional SAGE-style per-token int8 key + per-token scale snapshot (and ixformer fused-kernel support).

Changes:

Introduce MojoFusedNormRoPESageQuantStore torch reference implementation (norm + RoPE + KV static quant + paged store + optional per-token dynamic key quant).
Add ixformer backend implementation using ixformer.functions fused kernels (including the SAGE kernel variant).
Add accuracy and reference-contract tests, and export the operator via mojo_opset.experimental.
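
The per-vector math behind the first item can be sketched roughly as follows. This is a dependency-free illustration, not the PR's torch reference: the rotate-half RoPE layout, the clamp range of [-128, 127], round-half-to-even rounding, and all function names here are assumptions made for the example.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over one head vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def rope_rotate_half(x, cos, sin):
    """Rotate-half RoPE (assumed layout): element i is paired with i + d/2."""
    half = len(x) // 2
    rotated = [-v for v in x[half:]] + list(x[:half])
    return [v * c + r * s for v, r, c, s in zip(x, rotated, cos, sin)]

def static_quant_int8(x, scale):
    """Symmetric int8 quantization with a precomputed (static) scale."""
    return [max(-128, min(127, round(v / scale))) for v in x]

def norm_rope_quant_key(key, weight, cos, sin, scale, eps=1e-6):
    """Chain the three steps for a single key vector (hypothetical helper)."""
    normed = rms_norm(key, weight, eps)
    rotated = rope_rotate_half(normed, cos, sin)
    return static_quant_int8(rotated, scale)
```

A fused kernel would perform these steps in one pass over the data instead of materializing each intermediate, which is the point of the operator.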

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Show a summary per file

| File | Description |
| --- | --- |
| mojo_opset/experimental/operators/fused_norm_rope_sage_quant_store.py | New fused operator (torch reference path) with optional SAGE per-token key quant and optional extra paged-cache stores. |
| mojo_opset/backends/ixformer/operators/fused_norm_rope_sage_quant_store.py | Ixformer backend implementation calling fused ixformer kernels (SWA stream + full stream with optional SAGE). |
| mojo_opset/tests/accuracy/operators/test_fused_norm_rope_sage_quant_store.py | New accuracy/reference tests for output contract, quant math, and determinism. |
| mojo_opset/experimental/operators/\_\_init\_\_.py | Exports the new operator from the experimental operators package. |
| mojo_opset/experimental/\_\_init\_\_.py | Exports the new operator from `mojo_opset.experimental`. |


+        # SAGE: dynamic per-token int8 quant of the full key (over head_dim).
+        if self.enable_sage:
+            self.sage_full_k_quantize = MojoDynamicQuant._registry.get(self._backend)(quant_dtype=quant_dtype)
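
The hunk above wires in a dynamic per-token quantizer for the full key. As a rough illustration of what "dynamic per-token int8 quant over head_dim" usually means, here is a minimal stdlib sketch. It assumes symmetric quantization with scale = max|x| / 127 per token and a fallback scale of 1.0 for all-zero rows; the function name is hypothetical and the actual `MojoDynamicQuant` behavior may differ.

```python
def dynamic_quant_per_token(tokens, qmax=127):
    """Quantize each token (row) with its own scale computed at runtime.

    tokens: list of rows, each row holding the head_dim values of one token.
    Returns (quantized_rows, scales), one scale per token.
    """
    quantized, scales = [], []
    for row in tokens:
        amax = max(abs(v) for v in row)
        # Assumed guard: avoid dividing by zero for an all-zero token.
        scale = amax / qmax if amax > 0 else 1.0
        quantized.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return quantized, scales
```

Unlike the static K/V quant, the scale here depends on the data, so it has to be stored next to the int8 key for later dequantization, which matches the PR's description of a per-token scale kept alongside the per-token key.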


+        if (
+            self.enable_sage
+            and full_key_pt_int8 is not None
+            and cu_q_lens is not None
+            and sage_full_k_pt_cache is not None
+            and sage_full_k_pt_scale_cache is not None
+        ):


+        if self.head_dim != 128:
+            raise NotImplementedError(
+                f"ixformer fused kernel only supports head_dim=128, got {self.head_dim}"
+            )
+        if not (self.use_query_norm and self.use_key_norm):
+            raise NotImplementedError(
+                f"ixformer fused kernel only supports use_query_norm and use_key_norm, got {self.use_query_norm} and {self.use_key_norm}"
+            )


+        # --- Full stream: all-in-one fused kernel (+ per-token int8 K for SAGE) ---
+        if self.enable_sage:
+            (full_q_out, full_key_q, full_val_q,
+             full_key_pt_int8, full_key_pt_scale) = self._run_sage_stream_update_kv(
+                full_query, full_key, full_value, full_wq, full_wk,
+                full_ks, full_vs,
+                cos, sin, rotary_dim,
+                full_key_cache, full_value_cache,
+                block_tables, cu_q_lens, context_kv_lens, eps,
+                sage_full_k_pt_cache, sage_full_k_pt_scale_cache,
+            )
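
The call above passes `block_tables`, `cu_q_lens`, and `context_kv_lens` into the fused kernel. A minimal sketch of how a paged store typically maps new tokens to cache slots is shown below. It assumes that `block_tables[b][i]` is the physical block of logical block `i`, that `cu_q_lens` holds cumulative new-token counts, and that `context_kv_lens` counts tokens already cached before this step; the PR may define these differently, and `paged_store` is a hypothetical name.

```python
def paged_store(cache, block_tables, cu_q_lens, context_kv_lens, new_tokens):
    """Write each sequence's new tokens into its paged cache slots.

    cache:           list of blocks, each a list of block_size slots
    block_tables:    block_tables[b][i] = physical block id of logical block i
    cu_q_lens:       cumulative new-token counts, length batch + 1
    context_kv_lens: tokens already cached per sequence (assumed to exclude new ones)
    new_tokens:      flat list of per-token payloads for the whole batch
    """
    block_size = len(cache[0])
    for b in range(len(cu_q_lens) - 1):
        start, end = cu_q_lens[b], cu_q_lens[b + 1]
        for t in range(start, end):
            pos = context_kv_lens[b] + (t - start)  # logical position in the sequence
            block_id = block_tables[b][pos // block_size]
            cache[block_id][pos % block_size] = new_tokens[t]
    return cache
```

Under these assumptions the same indexing would apply to the int8 K/V caches and to the SAGE per-token key and scale caches, since they are all addressed by token position.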


+@pytest.mark.parametrize("num_heads_swa_q, num_heads_swa_k, num_heads_full_q, num_heads_full_k, head_dim, rope_dim", CONFIGS)
+@pytest.mark.parametrize("batch_size, q_lens_val, context_kv_lens_val", SEQ_CONFIGS)
+@pytest.mark.parametrize("update_kv", [True, False])
+@bypass_not_implemented
+def test_diff_vs_torch_no_sage(
+    num_heads_swa_q, num_heads_swa_k, num_heads_full_q, num_heads_full_k, head_dim, rope_dim,
+    batch_size, q_lens_val, context_kv_lens_val,
+    update_kv,
+):
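
Stacked `pytest.mark.parametrize` decorators, as in the hunk above, run the test once for every combination of the parameter sets. The sketch below reproduces that expansion with `itertools.product`; the values in `CONFIGS` and `SEQ_CONFIGS` are hypothetical placeholders, since the real ones are not shown in the diff.

```python
from itertools import product

# Hypothetical values; the real CONFIGS and SEQ_CONFIGS are not part of the excerpt.
CONFIGS = [(8, 1, 8, 1, 128, 64), (16, 2, 16, 2, 128, 128)]
SEQ_CONFIGS = [(1, [4], [0]), (2, [1, 1], [16, 32])]
UPDATE_KV = [True, False]

def expand_cases(configs, seq_configs, update_kv_values):
    """Build the cross product of all parameter sets, one flat tuple per test case."""
    return [
        (*cfg, *seq, update_kv)
        for cfg, seq, update_kv in product(configs, seq_configs, update_kv_values)
    ]

cases = expand_cases(CONFIGS, SEQ_CONFIGS, UPDATE_KV)
```

The number of generated cases is the product of the three list lengths, so each added configuration multiplies the test count.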


new MojoFusedNormRoPESageQuantStore operator

e7284bc


NASA1473 changed the title ~~Add MojoFusedNormRoPESageQuantStore: fused RoPE + KV-Quant with Key per-token Quant operator~~ MojoFusedNormRoPESageQuantStore: fused RoPE + KV-Quant with Key per-token Quant operator Jun 12, 2026

NASA1473 requested a review from Copilot June 12, 2026 07:52

Copilot started reviewing on behalf of NASA1473 June 12, 2026 07:52

Copilot AI reviewed Jun 12, 2026


xudong.zhao and others added 9 commits June 16, 2026 07:27

update fused norm rope sage quant store

4a0e489

update ixformer version for ixf.rms_norm_sage_qk_rotary_embedding temp support

a4e6d28

store single k or v in cache

223eed3

up ixformer version.

d6bf45f

ixformer loading is more robust.

a1b268c

Merge branch 'dev_m13_ilu' into fused_sage_quant

d6defb1

Merge branch 'single_kv_store' of github.com:XPU-Forces/mojo_opset into single_kv_store

6f3d7cd

Merge branch 'single_kv_store' into fused_sage_quant

1d1753a

add fp32 kv cache store test

d5dc363

NASA1473 wants to merge 10 commits into XPU-Forces:dev/m13_ilu from NASA1473:fused_sage_quant
