openai · X-Abhishek-X · Apr 17, 2026 · Apr 17, 2026 · Apr 20, 2026 · Apr 20, 2026
diff --git a/records/track_10min_16mb/2026-04-17_Stage3_SpinQuant_MPSGDTTT_1.0759/README.md b/records/track_10min_16mb/2026-04-17_Stage3_SpinQuant_MPSGDTTT_1.0759/README.md
@@ -0,0 +1,54 @@
+# Stage 3 + SpinQuant V1 + MP-SGD-TTT
+
+## Score: mean val_bpb = 1.07590 (3 seeds: 1.07591, 1.07609, 1.07570)
+
+Trained on 8×H100 80GB SXM in 587 seconds. Artifact ~15.73 MB (INT6 + brotli).
+
+## Approach
+
+Two techniques stacked on the Stage 3 depth-recurrence base (PR #1445):
+
+### 1. SpinQuant V1 — Hadamard Pre-Rotation Before GPTQ
+
+Pre-multiplies Q, K, V weight matrices with a random Hadamard matrix `R` before INT6 GPTQ quantization, spreading weight outliers uniformly across all dimensions. This reduces the quantization error for the most outlier-heavy attention projections.
+
+- `R` is generated deterministically from a SHA-256-derived seed and stored as `persistent=False` buffer — **zero serialized bytes added to the artifact**
+- At eval time, `F.linear(x @ R, W_rot)` is equivalent to `F.linear(x, W)` (verified: max relative error < 1e-4)
+- Hessian transform: `H_rot = R^T H R` applied before GPTQ for correct calibration in the rotated frame
+- Quantization penalty: +0.012–0.013 BPB vs pre-quant baseline (suppressed by MP-SGD-TTT)
+
+### 2. MP-SGD-TTT — Multi-Phase Global SGD Test-Time Training
+
+Score-first causal TTT from PR #1626 (dexhunter). Three SGD phases over the validation stream:
+- Each phase processes the already-scored prefix of documents
+- Base model weights updated (not just LoRA) via momentum SGD
+- Config: `prefix_docs=2000`, `num_phases=3`, `lr=0.001`, `momentum=0.9`
+- BPB accumulated under `torch.no_grad()` before any gradient update on each chunk
+
+## Results
+
+| Seed | Pre-quant BPB | Post-quant BPB | TTT BPB | Artifact Size |
+|------|:---:|:---:|:---:|:---:|
+| 42   | 1.07288 | 1.08544 | **1.07591** | 15,728,308 B |
+| 1337 | 1.07306 | 1.08584 | **1.07609** | 15,726,192 B |
+| 2024 | 1.07273 | 1.08521 | **1.07570** | 15,727,886 B |
+| **Mean** | | | **1.07590** | 15,727,462 B |
+| **Std** | | | **0.00019** | |
+
+All artifacts well under 16,000,000 bytes (decimal).
+
+## Training Config
+
+```
+ITERATIONS=20000, MATRIX_LR=0.026, WARMDOWN_FRAC=0.75
+MLP_CLIP_SIGMAS=12.0, ATTN_CLIP_SIGMAS=13.0, EMBED_CLIP_SIGMAS=20.0
+EMBED_BITS=7, SPINQUANT_ENABLED=1, SPINQUANT_SEED=20260416
+TTT_CHUNK_SIZE=48, TTT_LORA_LAYER_LR_ALPHA=0.5, LORA_PLUS_RATIO=1.0
+```
+
+## Attribution
+
+- Stage 3 architecture: PR #1445 (X-Abhishek-X)
+- MP-SGD-TTT: PR #1626 (dexhunter)
+- SP8192 tokenizer: PR #78 (mtybadger)
+- SpinQuant: Liu et al., Meta AI 2024 (arXiv:2405.16406)
diff --git a/records/track_10min_16mb/2026-04-17_Stage3_SpinQuant_MPSGDTTT_1.0759/submission.json b/records/track_10min_16mb/2026-04-17_Stage3_SpinQuant_MPSGDTTT_1.0759/submission.json
@@ -0,0 +1,46 @@
+{
+  "author": "Abhishek Leji",
+  "github_id": "X-Abhishek-X",
+  "track": "10min_16mb",
+  "name": "Stage 3 + SpinQuant V1 + MP-SGD-TTT",
+  "blurb": "First port of SpinQuant V1 (Hadamard pre-rotation of Q/K/V weights before INT6 GPTQ) onto the Stage 3 depth-recurrence architecture, composed with Multi-Phase Global SGD TTT from PR #1626. SpinQuant spreads weight outliers uniformly via a random Hadamard matrix R stored as a non-serialized buffer (zero artifact overhead). TTT config: prefix_docs=2000, num_phases=3, lr=0.001, momentum=0.9.",
+  "date": "2026-04-17",
+  "val_loss": 2.77916130,
+  "val_bpb": 1.07590,
+  "val_bpb_std": 0.00019,
+  "n_seeds": 3,
+  "seeds": [42, 1337, 2024],
+  "seed_results": {
+    "42":   {"val_loss": 2.77918734, "val_bpb": 1.07591003, "artifact_bytes": 15728308},
+    "1337": {"val_loss": 2.77964689, "val_bpb": 1.07608793, "artifact_bytes": 15726192},
+    "2024": {"val_loss": 2.77864967, "val_bpb": 1.07570188, "artifact_bytes": 15727886}
+  },
+  "pre_quant_val_bpb": 1.07289,
+  "bytes_total": 15727462,
+  "bytes_model_brotli": 15695732,
+  "bytes_code": 31730,
+  "model_params": 35944602,
+  "vocab_size": 8192,
+  "hardware": "8xH100 80GB SXM",
+  "train_time_seconds": 587,
+  "eval_time_seconds": 478,
+  "step_avg_ms": 98,
+  "train_steps_mean": 4860,
+  "matrix_lr": 0.026,
+  "compliance": {
+    "no_eval_time_gradient_updates": true,
+    "score_first_ttt": true,
+    "artifact_under_16mb": true,
+    "artifact_bytes_decimal_check": true,
+    "benchmark_script_unmodified": true,
+    "no_test_data_access": true,
+    "deterministic_predictor": true,
+    "spinquant_zero_serialized_bytes": true
+  },
+  "attribution": [
+    {"technique": "Stage 3 depth-recurrence architecture", "source": "PR #1445 (X-Abhishek-X)"},
+    {"technique": "MP-SGD Multi-Phase Global SGD TTT", "source": "PR #1626 (dexhunter)"},
+    {"technique": "SP8192 tokenizer", "source": "PR #78 (mtybadger)"},
+    {"technique": "SpinQuant V1 Hadamard rotation", "source": "Liu et al., Meta AI 2024 (arxiv 2405.16406)"}
+  ]
+}