Experiment: SmearGate BOS Fix + train-only logit calibration #1884

Open
someone114514 wants to merge 2 commits into openai:main from someone114514:smeargate-calibration-1868

@someone114514

Summary

Experimental variant of #1868 / #1851 SmearGate BOS Fix that adds a fixed post-GPTQ logit calibration pass.

The added calibration is deliberately small and train-only:

  • global logit temperature
  • coarse token-group bias buckets: byte length, starts-with-space, newline, digit, punctuation, alpha/case
  • no validation-derived fitting state
  • no frequency buckets by default
  • frozen after fitting, then applied before softmax in quantized diagnostic eval and phased score-first TTT
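The coarse token-group buckets listed above could be derived from each vocabulary entry's bytes roughly as follows. This is a minimal sketch, not the branch's code: the function names `token_features` and `token_group`, the cap on byte length, and the exact class boundaries are all assumptions made for illustration.

```python
import string


def token_features(token_bytes: bytes) -> dict:
    """Coarse, validation-independent features of one vocabulary entry.

    Hypothetical bucketing: the PR names the feature families but not the
    exact boundaries, so every threshold here is an assumption.
    """
    text = token_bytes.decode("utf-8", errors="replace")
    core = text.lstrip(" ")  # ignore leading spaces when classifying content

    if core and core.isdigit():
        kind = "digit"
    elif core and all(ch in string.punctuation for ch in core):
        kind = "punct"
    elif core.isalpha():
        kind = "alpha_upper" if core[0].isupper() else "alpha_lower"
    else:
        kind = "other"

    return {
        "byte_len": min(len(token_bytes), 4),  # assumed cap: 4 means "4 or more"
        "starts_with_space": text.startswith(" "),
        "has_newline": "\n" in text,
        "kind": kind,
    }


def token_group(token_bytes: bytes) -> tuple:
    """Collapse the features into one hashable bucket key."""
    f = token_features(token_bytes)
    return (f["byte_len"], f["starts_with_space"], f["has_newline"], f["kind"])


if __name__ == "__main__":
    for tok in (b" the", b"42", b"\n", b"!!", b" Hello"):
        print(tok, token_group(tok))
```

Because the bucket key depends only on the tokenizer vocabulary, the mapping can be computed once and reused unchanged at evaluation time.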

This is intended as a direct test of whether the post-GPTQ calibration signal observed locally transfers to the stronger #1868 stack.

Controls

Defaults added in this branch:

LOGIT_CALIB_ENABLED=1
LOGIT_CALIB_TOKENS=100000
LOGIT_CALIB_STRIDE=64
LOGIT_CALIB_BATCH_SEQS=8
LOGIT_CALIB_LR=0.003
LOGIT_CALIB_L2=0.01
LOGIT_CALIB_EPOCHS=1
LOGIT_CALIB_APPLY_TTT_UPDATE=1

Set LOGIT_CALIB_ENABLED=0 to recover the original #1868 behavior.
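One way to read these controls is sketched below. The variable names and default values come from the list above; the `LogitCalibConfig` dataclass, the `load_logit_calib_config` helper, and the "integer flag" parsing rule are assumptions for illustration and may not match how the branch parses its environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LogitCalibConfig:
    """Hypothetical container mirroring the LOGIT_CALIB_* defaults."""
    enabled: bool = True
    tokens: int = 100_000
    stride: int = 64
    batch_seqs: int = 8
    lr: float = 0.003
    l2: float = 0.01
    epochs: int = 1
    apply_ttt_update: bool = True


def load_logit_calib_config(env: Mapping[str, str] = os.environ) -> LogitCalibConfig:
    """Build the config from an environment-like mapping, falling back to defaults."""
    d = LogitCalibConfig()

    def flag(name: str, default: bool) -> bool:
        # Assumed convention: "0" disables, any other integer enables.
        return bool(int(env.get(name, "1" if default else "0")))

    return LogitCalibConfig(
        enabled=flag("LOGIT_CALIB_ENABLED", d.enabled),
        tokens=int(env.get("LOGIT_CALIB_TOKENS", d.tokens)),
        stride=int(env.get("LOGIT_CALIB_STRIDE", d.stride)),
        batch_seqs=int(env.get("LOGIT_CALIB_BATCH_SEQS", d.batch_seqs)),
        lr=float(env.get("LOGIT_CALIB_LR", d.lr)),
        l2=float(env.get("LOGIT_CALIB_L2", d.l2)),
        epochs=int(env.get("LOGIT_CALIB_EPOCHS", d.epochs)),
        apply_ttt_update=flag("LOGIT_CALIB_APPLY_TTT_UPDATE", d.apply_ttt_update),
    )


if __name__ == "__main__":
    print(load_logit_calib_config({"LOGIT_CALIB_ENABLED": "0"}))
```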

Legality / causality

Calibration is fitted only from training-token shards after GPTQ. It does not read validation targets or build validation-time state. At validation time the correction is a fixed affine transformation of logits before normal softmax, so the distribution remains normalized.
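The normalization claim can be checked with a small sketch. It assumes the correction has the form `logit / temperature + bias[group]`, with one bias per token group of the candidate vocabulary entry; that parameterization and the name `calibrated_log_probs` are illustrative assumptions, not taken from the branch. Because the adjusted logits still go through an ordinary softmax, the probabilities sum to 1 for any frozen temperature and biases.

```python
import math


def calibrated_log_probs(logits, token_groups, temperature, group_bias):
    """Apply a frozen affine correction, then a numerically stable log-softmax.

    logits:       one raw logit per vocabulary entry
    token_groups: bucket index of each vocabulary entry (fixed, train-derived)
    temperature:  global scalar fitted on training tokens (assumed form)
    group_bias:   one additive bias per bucket (assumed form)
    """
    adjusted = [z / temperature + group_bias[g] for z, g in zip(logits, token_groups)]
    m = max(adjusted)  # subtract the max before exponentiating for stability
    log_z = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return [a - log_z for a in adjusted]


# Toy 4-entry vocabulary with 3 buckets; all values are made up for illustration.
logits = [2.0, 0.5, -1.0, 0.0]
token_groups = [0, 1, 1, 2]
log_probs = calibrated_log_probs(
    logits, token_groups, temperature=1.1, group_bias=[0.05, -0.02, 0.0]
)
total = sum(math.exp(lp) for lp in log_probs)

# With temperature 1 and zero biases the correction is the identity.
identity = calibrated_log_probs(logits, token_groups, 1.0, [0.0, 0.0, 0.0])
```

Nothing in this function depends on the validation targets, which is the property the legality argument relies on.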

Status

No new 8xH100 score yet. This branch is prepared for a direct single-seed run against the #1868 reproduction command.

3-seed reproduction of PR openai#1851 (SmearGate BOS document boundary fix).
Code is byte-identical to openai#1851 by @aquariouseworkman.

Results (post-TTT BPB):
  Seed 42:   1.06128  (original openai#1851 author)
  Seed 314:  1.06087  (this submission)
  Seed 1234: 1.06220  (this submission)
  Mean:      1.06145 ± 0.00068
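The reported mean and spread can be recomputed from the three seeds with the standard library. The ± value matches the sample standard deviation (n − 1 denominator) rather than the population one.

```python
import statistics

# Post-TTT BPB per seed, copied from the results above.
bpb = {42: 1.06128, 314: 1.06087, 1234: 1.06220}
values = list(bpb.values())

mean = statistics.mean(values)
sample_std = statistics.stdev(values)        # n - 1 denominator
population_std = statistics.pstdev(values)   # n denominator, for comparison

print(f"mean={mean:.5f} sample_std={sample_std:.5f} population_std={population_std:.5f}")
```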

All artifacts < 16,000,000 bytes. All runs < 600s.

Co-authored-by: Copilot <[email protected]>