10L + Multi-Order N-gram Backoff (0.9123 BPB)#802

Closed
Bortlesboat wants to merge 4 commits into openai:main from Bortlesboat:submission/v6-ngram-backoff

Conversation

@Bortlesboat

Record submission

val_bpb: 0.9123 (mean of 3 seeds, post int5/int6+zstd quantization roundtrip)

| Seed | val_bpb | artifact_bytes |
| --- | --- | --- |
| 42 | 0.9128 | 15,320,000 |
| 1337 | 0.9121 | 15,630,000 |
| 2024 | 0.9121 | 15,330,000 |

Architecture

  • 10 layers, d=512, GQA 8H/4KV, LeakyReLU(0.5)^2
  • Partial RoPE (16/64), LN Scale, XSA last 4, Value Residual
  • BigramHash(4096, dim=128), SmearGate, U-Net skips
  • Mixed int5 MLP / int6 attention + zstd-22
  • EMA(0.997), Muon WD=0.04, warmdown=3500
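
The squared leaky activation named in the list above can be sketched in scalar form as follows (whether the submission squares after the leak, as here, or preserves the sign is an assumption):

```python
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """LeakyReLU(slope)^2: leak negative inputs by `slope`, then square.

    Sketch of the MLP activation listed above; squaring after the leak
    means negative inputs contribute slope^2 * x^2 (positive) to the output.
    """
    y = x if x > 0 else slope * x
    return y * y
```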

Eval: Multi-Order N-gram Backoff + Entropy-Adaptive Alpha

  • Hashed n-gram cache, orders 2 through 7 with backoff
  • Highest matching order wins (7-gram preferred, falls back to lower)
  • Entropy-adaptive alpha: alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0))
  • Score-first: cache updated only AFTER scoring each segment
  • 4M hash buckets per order, min_count=2
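
A minimal sketch of the eval-time cache described above. The class name, hashing scheme, and per-bucket count layout are illustrative assumptions, not the submission's actual code; only the stated parameters (orders 2-7, highest-order-wins backoff, min_count=2, the alpha formula, score-first updates) come from the PR:

```python
import math
from collections import defaultdict


class NGramBackoffCache:
    """Hashed n-gram cache with multi-order backoff (highest order wins)."""

    def __init__(self, orders=range(2, 8), buckets=4_000_000, min_count=2):
        self.orders = sorted(orders, reverse=True)  # try 7-gram first
        self.buckets = buckets
        self.min_count = min_count
        # one hashed count table per order: bucket -> {token: count}
        self.tables = {n: defaultdict(lambda: defaultdict(int))
                       for n in self.orders}

    def _bucket(self, ctx):
        return hash(ctx) % self.buckets

    def predict(self, context):
        """Distribution from the highest order with enough counts, else None."""
        for n in self.orders:  # back off: 7-gram, 6-gram, ..., 2-gram
            if len(context) < n - 1:
                continue
            counts = self.tables[n][self._bucket(tuple(context[-(n - 1):]))]
            total = sum(counts.values())
            if total >= self.min_count:
                return {t: c / total for t, c in counts.items()}
        return None

    def update(self, context, token):
        """Called only AFTER scoring a segment (score-first discipline)."""
        for n in self.orders:
            if len(context) >= n - 1:
                self.tables[n][self._bucket(tuple(context[-(n - 1):]))][token] += 1


def adaptive_alpha(entropy_bits: float) -> float:
    # alpha = 0.05 + 0.55 * sigmoid(2 * (H - 4.0)): weight the cache more
    # heavily when the model's predictive entropy H is high (uncertain).
    return 0.05 + 0.55 / (1.0 + math.exp(-2.0 * (entropy_bits - 4.0)))
```

The mixed log-prob at each position would then be `(1 - alpha) * model + alpha * cache`, falling back to the model alone when `predict` returns None.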

Timing (8xH100 SXM)

  • Training: 600s (~6020 steps at 99ms/step)
  • Eval: ~163s (sliding window stride=64, batch_seqs=64)

Based on

  • Explores stacking eval-time techniques (neural cache, LoRA TTT) and quantization-aware training on top of the openai#1 recipe. QAT has an export mismatch bug resulting in a high quantization penalty; submitted as non-record to document the approach for iteration.
  • Non-record submission. 10 layers, d=512, GQA 8H/4KV, mixed int5/int6 quantization + zstd-22. BigramHash(4096, dim=128), SmearGate, SWA(0.4). Mean of 3 seeds: 1.1507 +/- 0.0006 BPB. All artifacts under 16MB.
  • 10L d=512, GQA 8H/4KV, LeakyReLU(0.5)^2, Partial RoPE, LN Scale, XSA last 4, Value Residual, EMA(0.997). Mixed int5/int6 + zstd-22. Eval: multi-order hashed n-gram backoff (orders 2-7) with entropy-adaptive alpha. Mean of 3 seeds: 0.9123 +/- 0.0003 BPB.
  • Renamed to reflect the actual technique (n-gram backoff + entropy alpha). Removed old 1.1507 BPB seed logs. Added an explicit compliance/legality section per competition conventions.
bigbag pushed a commit to bigbag/parameter-golf that referenced this pull request Mar 26, 2026
Single change from PR openai#802: MATRIX_LR=0.03 (was 0.02).
Discovered through systematic screening (74 experiments, steps 10-12).

- 10L, 512d, GQA 8/4, LeakyReLU(0.5)², BigramHash 4096
- Multi-order n-gram backoff eval cache (orders 2-7)
- Entropy-adaptive alpha mixing (score-first, legal)
- 8xH100 SXM, 600s training, 138s eval
- Artifact: 15.32 MB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Bortlesboat
Author

Superseded by PR #876 (0.5863 BPB) and PR #912 (0.3461 BPB with PPM full-rescore).

@Bortlesboat Bortlesboat reopened this Mar 27, 2026
@Bortlesboat
Author

Closing - uses n-gram backoff which is out of scope for this track.

@Bortlesboat Bortlesboat closed this Apr 4, 2026
taka6745 pushed a commit to taka6745/parameter-golf that referenced this pull request Apr 8, 2026
…s 2007)

THE biggest legal technique gap after LEGAL_TTT. Top 30 legal PRs in COMPETITION_SCOPE.md
all use multi-order n-gram backoff (openai#788/openai#802/openai#828/openai#761 = 0.91-0.96 BPB).

Implementation: at each position, use the HIGHEST-CONFIDENCE n-gram order ONLY:
- if peak(4-gram[h]) > T4: use 4-gram with weight 1.0
- elif peak(3-gram[h]) > T3: use 3-gram with weight α=0.4 (Brants 2007)
- else: use bigram with weight α²=0.16
The 'peak' is the max log-prob across the vocab; a concentrated distribution indicates confident counts. Hash-collision noise in the lower orders is stripped by using only the most-confident order.

Marker: NGRAM_BACKOFF_MARKER. Env: USE_NGRAM_BACKOFF=1, NGRAM_BACKOFF_THRESH4=1.0,
NGRAM_BACKOFF_THRESH3=1.0, NGRAM_BACKOFF_ALPHA=0.4. Composes with NGRAM_GATE.

Smoke test in /tmp passes: marker present in patched file, syntax-valid Python.
EXPECTED_MARKERS now 46 (was 45).

Queued L09_ngram_backoff_S2_seed42/seed1337 on Pod C for n=2 cheap-pod validation.