Add 2026-04-23_SP8192_PerLayerClip_UnfrozenTTT_1.0849 #1794
Open
Programmerryoki wants to merge 1 commit into openai:main
3-seed mean 1.0849 BPB (std 0.00022) on seeds 4, 30, and 2026. 8xH100 SXM. Training 600s + eval 600s, all artifacts <= 16,000,000 bytes. Legal under Issue openai#1017 Conditions 1-4.

Contributions vs PR openai#1735 base:
- Per-layer GPTQ clip sigmas (MLP=12.0, attn=13.5, emb=15.0)
- Unfrozen score-first TTT (TTT_FREEZE_BLOCKS=0, TTT_LR=0.010, TTT_EPOCHS=5)
- Eval wall-clock budget guard that truncates TTT adaptation at the cosine-LR tail when approaching the 600s cap; scoring continues for every remaining chunk (legality preserved).

Other techniques (cited inline in README): SpinQuant (PR openai#1695), int7 token embedding, attention output gate (PR openai#1667), score-first TTT pattern (PR openai#549).
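The per-layer clip sigmas listed above can be pictured with a minimal sketch. It assumes the clip is a symmetric clamp at k standard deviations of each weight row, with k picked by layer kind; the name-based routing in `layer_kind` and the example parameter names are hypothetical, and the GPTQ quantization step itself is not shown. This is an illustration of the idea, not the code in this PR.

```python
import statistics

# Clip widths quoted in this PR, in units of standard deviations.
CLIP_SIGMAS = {"mlp": 12.0, "attn": 13.5, "emb": 15.0}


def layer_kind(param_name: str) -> str:
    """Hypothetical name-based routing; real parameter names may differ."""
    if "emb" in param_name:
        return "emb"
    if "attn" in param_name:
        return "attn"
    return "mlp"


def clip_row(row: list[float], param_name: str) -> list[float]:
    """Clamp one weight row to +/- k * std, with k chosen per layer kind.

    Assumption: the population standard deviation of the row sets the scale.
    """
    k = CLIP_SIGMAS[layer_kind(param_name)]
    limit = k * statistics.pstdev(row)
    return [max(-limit, min(limit, w)) for w in row]


# A row with one large outlier: tighter sigmas clip it harder.
row = [0.0] * 199 + [100.0]
mlp_out = clip_row(row, "blocks.0.mlp.fc.weight")[-1]
attn_out = clip_row(row, "blocks.0.attn.proj.weight")[-1]
emb_out = clip_row(row, "tok_emb.weight")[-1]
```

In this toy row the outlier is pulled in most for the MLP setting, less for attention, and left untouched for the embedding, which matches the ordering of the three sigma values.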
Summary
New submission at records/track_10min_16mb/2026-04-23_SP8192_PerLayerClip_UnfrozenTTT_1.0849/.
3-seed mean: 1.0849 BPB (std 0.00022) on seeds 4 / 30 / 2026. 8xH100 SXM. 600s train + 600s eval. All artifacts <= 16,000,000 bytes. Legal under Issue #1017 Conditions 1-4.
Merged SOTA (PR #1493 @bigbag): 1.0810 BPB. This submission lands at +0.0039 BPB vs merged SOTA, so it is not a new record. It is contributed for the two new primitives and a recipe the community can sweep from.
Contributions vs PR #1735 base
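- Per-layer GPTQ clip sigmas (MLP=12.0, attn=13.5, emb=15.0).
- Unfrozen score-first TTT: TTT_FREEZE_BLOCKS=0 TTT_LR=0.010 TTT_EPOCHS=5. All non-embedding blocks adapt, with LR tuned higher than the usual 0.005.
- Eval wall-clock budget guard: MAX_EVAL_SECONDS=600. TTT adaptation is truncated at the cosine-LR tail when the eval approaches the cap. Scoring continues for every remaining chunk (legality preserved). The decision is rank-synced via dist.all_reduce(MAX) to keep NCCL in lockstep.
Inline citations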
Base architecture from PR #1735 (@Grad62304977). SpinQuant (PR #1695 @dexhunter). GPTQ + SDClip (PR #1394 @clarkkev). Attention output gate (PR #1667 @Grad62304977). Score-first TTT pattern (PR #549 @abaybektursun). Full citation list in README.md.
Test plan
- Eval under the 600s cap (eval:total reported in each log, ranges 553.2s–557.4s)
- eval_val_ttt: each val chunk scored under torch.no_grad() before any SGD step touches the weights
- train_seed{4,30,2026}.log; quantized_ttt val_bpb: is the reported BPB
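The score-first ordering and the budget guard can be sketched together in a toy, single-process form. Everything here is an assumption for illustration: the model is one scalar weight with a squared-error loss, `all_reduce_max` stands in for `dist.all_reduce` with a MAX op, and the 30-second `margin` is a hypothetical threshold. The cosine LR schedule and multiple TTT epochs are not modeled.

```python
MAX_EVAL_SECONDS = 600.0  # cap quoted in this PR


def all_reduce_max(flags):
    """Stand-in for a MAX all-reduce: every rank ends up with the largest flag."""
    return max(flags)


def should_stop_adapting(elapsed_per_rank, budget=MAX_EVAL_SECONDS, margin=30.0):
    """Hypothetical guard: stop TTT updates once ANY rank nears the cap.

    Taking the max of the per-rank flags means all ranks make the same
    decision, so none of them waits in a collective the others skipped.
    """
    local_flags = [1 if t >= budget - margin else 0 for t in elapsed_per_rank]
    return all_reduce_max(local_flags) == 1


def eval_score_first(chunks, weight, lr, elapsed_for_chunk):
    """Score every chunk with the current weight, then optionally adapt on it.

    Toy model: the prediction is `weight`, the loss is squared error against
    the chunk value. `elapsed_for_chunk(i)` returns simulated per-rank seconds.
    """
    losses, adapting = [], True
    for i, target in enumerate(chunks):
        losses.append((weight - target) ** 2)       # score first, weight untouched
        if adapting and should_stop_adapting(elapsed_for_chunk(i)):
            adapting = False                        # truncate adaptation only
        if adapting:
            weight -= lr * 2.0 * (weight - target)  # SGD step after scoring
    return losses, weight


# Rank 1 gets close to the cap from chunk 2 onward, so updates stop there.
simulated = lambda i: [0.0, 0.0] if i < 2 else [0.0, 580.0]
losses, final_weight = eval_score_first([1.0, 1.0, 1.0, 1.0], 0.0, 0.25, simulated)
```

Every chunk still contributes a loss after the guard trips; only the weight updates stop, which is the property the test plan checks for.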