
Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) #379

Open

dannywillowliu-uchi wants to merge 2 commits into openai:main from dannywillowliu-uchi:submission/sdttt-gptq-1.1260

Conversation


@dannywillowliu-uchi commented Mar 22, 2026

Summary

val_bpb: 1.1257 (sliding window, stride=64) | 8xH100 SXM, 600s

Builds on PR #374's SOTA stack and adds GPTQ-lite: a per-layer search for the optimal clip percentile during int6 quantization.

Novel: GPTQ-lite

Standard int6 quantization uses row-wise absolute max for clipping. GPTQ-lite searches 5 clip percentiles per weight matrix (100%, 99.9%, 99.5%, 99%, 98%) and selects the one minimizing reconstruction error. This reduces quantization degradation at zero training cost.
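
A minimal sketch of the idea, assuming a row-wise symmetric int6 scheme; the function name and the MSE criterion below are illustrative, not lifted from the submission's code:

```python
import torch

def quantize_int6_with_clip_search(w: torch.Tensor,
                                   percentiles=(1.0, 0.999, 0.995, 0.99, 0.98)):
    """GPTQ-lite style clip search (illustrative): try each clip percentile,
    quantize to symmetric int6 ([-31, 31]), and keep the clip that minimizes
    reconstruction error against the original weights."""
    best = None
    for p in percentiles:
        abs_w = w.abs().float()
        if p == 1.0:
            clip = abs_w.max(dim=1, keepdim=True).values   # row-wise abs max
        else:
            clip = torch.quantile(abs_w, p, dim=1, keepdim=True)
        scale = clip.clamp(min=1e-8) / 31.0                # int6 step size
        q = torch.clamp((w / scale).round(), -31, 31)
        err = ((q * scale - w) ** 2).mean().item()         # reconstruction MSE
        if best is None or err < best[0]:
            best = (err, q.to(torch.int8), scale, p)
    return best  # (mse, int6 codes in an int8 tensor, per-row scales, percentile)
```

Because the search happens once, post-training, per weight matrix, it adds only a few quantize/dequantize passes at export time, which is why it costs zero training compute.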

| Metric | Value |
| --- | --- |
| Steps | 6,733 (89.1 ms/step) |
| Pre-quant val_bpb | 1.1417 |
| Sliding window val_bpb (stride=64) | 1.1257 |

Architecture: 11L, XSA4, Tight SWA, Partial RoPE 16/64, LN Scale, Late QAT, Value Embedding, SmearGate, BigramHash, FA3, int6+zstd-22, WD=0.04.

Full source and experiment history: https://github.com/dannywillowliu-uchi/parameter-golf-entry

anthony-maio added a commit to anthony-maio/parameter-golf that referenced this pull request Mar 22, 2026
From arXiv:2603.09078. Projects out the self-value component from
attention output, forcing the network to use contextual information.
Applied via GQA-aware zero-alloc view reshape on last 4 of 11 layers.

Both top unmerged submissions (PR openai#374 at 1.1246 and PR openai#379 at 1.1260)
use XSA as a key technique.

Full next-gen stack now includes: 11L, XSA, Partial RoPE 16/64,
Late QAT STE, Tight SWA, GPTQ-lite, LN Scale, FA3, SmearGate,
BigramHash, int6+zstd, Muon WD, OrthoInit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
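
For readers unfamiliar with XSA, here is a hedged sketch of the self-value projection the commit above describes, under an assumed GQA layout; every name and shape is illustrative, not the referenced implementation:

```python
import torch

def xsa_attention(q, k, v, n_kv_heads):
    """Sketch of the self-value projection ("XSA") described above.
    Shapes assumed: q is (B, H, T, D); k and v are (B, Hkv, T, D) under GQA.
    Splitting query heads with view() and broadcasting k/v over the group dim
    is the zero-allocation reshape: no KV copies are materialized."""
    B, H, T, D = q.shape
    g = H // n_kv_heads
    qg = q.view(B, n_kv_heads, g, T, D)
    kg, vg = k.unsqueeze(2), v.unsqueeze(2)           # (B, Hkv, 1, T, D)
    att = (qg @ kg.transpose(-2, -1)) / D ** 0.5      # (B, Hkv, g, T, T)
    causal = torch.ones(T, T, dtype=torch.bool, device=q.device).triu(1)
    att = att.masked_fill(causal, float("-inf")).softmax(dim=-1)
    out = att @ vg
    # Project out the self-value component: subtract each position's own
    # value weighted by its self-attention probability, so the output must
    # be assembled from contextual tokens.
    self_p = att.diagonal(dim1=-2, dim2=-1).unsqueeze(-1)  # (B, Hkv, g, T, 1)
    return (out - self_p * vg).view(B, H, T, D)
```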
@dannywillowliu-uchi changed the title from Record: 11L GPTQ-lite + Self-Distillation TTT (val_bpb=1.1260) to Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257) on Mar 22, 2026
rarce added a commit to rarce/parameter-golf that referenced this pull request Mar 22, 2026
original_model.md:
- Discard depth recurrence (amplifies quant error 900×, throughput loss)
- New direction: eval-time optimization stack (PPM-C + GPTQ-lite)
- Document all our experiment results (v3, v4, v4_30m, ringgolf)
- Add TTT/XSA interaction findings (PR openai#303: mutually exclusive)
- Add PR openai#375 meta-insight (1ms overhead = 0.006 BPB)
- 4-phase execution plan targeting PPM-C as original contribution

review_pr_records_track_10min_16mb.md:
- Add 2026-03-22 update with PRs openai#374, openai#379, openai#390, openai#375, openai#303, openai#363
- New SOTA at 1.1246 (PR openai#374: Tight SWA + VE128)
- Document negative results from $500 compute spend (PR openai#375)
- Unexplored opportunities: PPM-C, Neural Cache

review_records_track_10min_16mb.md:
- Add timestamp note (17 records, no changes)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatoTeziTanka

Community Review — Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257)

Compliance: HOLD — scored-region SLOT pending Issue #1336

Head SHA: bcd61a1
PR: #379 — "11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257)" by @dannywillowliu-uchi
Author: Danny Willow Liu


Check 1: N-gram Family Bug (CLOSE trigger: target token in hash key)

CLEAN. BigramHashEmbedding.bigram_hash() at line 753:

out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:], 27191 * t[..., :-1]) % mod

Input to forward() is input_ids = the input sequence x, not targets. Position i hashes (x[i], x[i-1]) — both context tokens, no target token in the key. This is standard BigramHash, explicitly noted as legal. NOT the disqualifying bug.
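
A self-contained reproduction of that line (with a hypothetical `mod` standing in for the hash table size) makes the claim easy to verify:

```python
import torch

# Reproduction of the hash under review. The constants come from the snippet
# above; `mod` (the hash table size) is illustrative, not the PR's value.
def bigram_hash(t: torch.Tensor, mod: int = 65536) -> torch.Tensor:
    out = t % mod                                    # position 0: unigram key
    out[..., 1:] = torch.bitwise_xor(36313 * t[..., 1:],
                                     27191 * t[..., :-1]) % mod
    return out

x = torch.tensor([[5, 9, 2, 7]])                     # context tokens only
print(bigram_hash(x))  # key at position i depends on x[i] and x[i-1] alone
```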


Check 2: Pre-Quant TTT (CLOSE trigger: multi-epoch AdamW on val_tokens without score-first)

CLEAN on strict criteria. sdttt_adapt() at line 1223 uses torch.optim.SGD, not AdamW. The CLOSE trigger requires AdamW specifically. However, this function does run 2 epochs over val_tokens computing CE loss on targets without score-first gating. The SGD distinction is narrow — the semantic violation (adapt on val targets pre-scoring) is present, but the exact CLOSE criterion (AdamW) is not met. Flagged for reviewer judgment but not auto-CLOSE.

Note: SDTTT was negative (-0.0003 bpb) and is disabled by default (SDTTT_ENABLED=0). The submission score was achieved without SDTTT active.
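
For reviewers weighing the judgment call, a schematic of the pattern Check 2 describes; it is illustrative only, not the PR's sdttt_adapt():

```python
import torch
import torch.nn.functional as F

def sdttt_adapt(model, val_tokens, lr=1e-4, epochs=2, ctx=1024):
    """Schematic of the flagged pattern only (hypothetical code): plain SGD,
    two epochs over val_tokens, CE loss against the very targets that are
    scored later -- no score-first gating in between."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for i in range(0, val_tokens.numel() - ctx - 1, ctx):
            x = val_tokens[i : i + ctx].unsqueeze(0)   # (1, ctx) inputs
            y = val_tokens[i + 1 : i + ctx + 1]        # next-token targets
            loss = F.cross_entropy(model(x)[0], y)     # assumes logits output
            opt.zero_grad()
            loss.backward()
            opt.step()
```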


Check 3: Legal TTT / Score-First Per Chunk

CLEAN. eval_bpb_sliding_window() at lines 1104-1107:

s = 0 if ws == 0 else max(wlen - stride, 0)
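
Read in context, that line sets the per-window scoring start. A hedged reconstruction of the surrounding loop (window handling and model interface are assumptions, only the quoted line is from the source):

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_bpb_sliding_window(model, tokens, wlen=2048, stride=64):
    """Illustrative reconstruction around the quoted line: windows advance by
    `stride`; the first window scores every position, later windows score only
    their final `stride` positions, so each token is scored exactly once."""
    nats, count = 0.0, 0
    for ws in range(0, tokens.numel() - 1, stride):
        window = tokens[ws : ws + wlen + 1]
        if window.numel() < 2:
            break
        x, y = window[:-1].unsqueeze(0), window[1:]
        logits = model(x)[0]                         # (T, vocab), assumed
        s = 0 if ws == 0 else max(wlen - stride, 0)  # the line under review
        if s >= y.numel():                           # tail already scored
            continue
        nats += F.cross_entropy(logits[s:], y[s:], reduction="sum").item()
        count += y.numel() - s
    return nats / count / math.log(2)                # bits per token/byte
```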

**Verdict:** HOLD. The scored-region eval pattern needs a ruling from maintainers on Issue #1336 before this can be cleared. No other compliance flags found.

**Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica:** **HOLD** pending Issue #1336 ruling on scored-region SLOT.

---
*Reviewed by [@MatoTeziTanka](https://github.com/MatoTeziTanka) — [The Agora](https://matotezitanka.github.io/parameter-golf). Compliance audit via LLM agent (Sonnet) reviewing full train_gpt.py source. If this review misread your code, please call it out so I can re-audit manually.*

@MatoTeziTanka

PR 379: SDTTT + GPTQ-lite Int6

Review Summary

PR Title: Record: 11L GPTQ-lite + Int6 MLP3x (val_bpb=1.1257)
Status: OPEN | No reviewer comments
Train File: records/track_10min_16mb/2026-03-21_SDTTT_GPTQ_11L_Int6_MLP3x/train_gpt.py
Classification: PURE_NEURAL_CLEAN

Red Flag Analysis

| Signal | Finding |
| --- | --- |
| target-in-key loss | CLEAN - standard BPB metric |
| TTT/SLOT classes | CLEAN - no TTT logic in architecture |
| Custom tokenizer | CLEAN - standard SentencePiece |
| loss[] indexing | CLEAN - no custom loss dict access |

Findings

  1. SDTTT reference: Directory name mentions SDTTT, but training uses standard GPTQ-lite (post-training quantization), not TTT architecture
  2. Architecture: 11L transformer with 3x MLP expansion
  3. Quantization: GPTQ-lite + Int6 (post-training, not QAT)
  4. Result: val_bpb=1.1257 (record-track candidate)
  5. Techniques: Standard quantization pipeline combined with architectural optimization

Recommendation

MERGE - GPTQ-lite is an established post-training quantization method. No loss manipulation. The 3x MLP expansion is standard parameter reallocation. Clean submission.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending standard record-track checks.


Reviewed by @MatoTeziTanka, The Agora. Classification via sibling-session agent (Haiku-backed). This review was drafted from a template and spot-checked before posting; if the template misread your code, please call it out so I can iterate the classifier.
