
Non-record: Causal N-gram Logit Blend — Legal, Bug-Free, Null Result at Scale#1642

Open
himanshudongre wants to merge 1 commit into openai:main from himanshudongre:nonrecord/causal-ngram-null-result

Conversation


himanshudongre commented Apr 15, 2026

Summary

This PR presents a rigorous negative result, paired with the first clean, legal reference implementation of an eval-time causal n-gram additive-logit blend: the technique that every closed n-gram-titled record PR in this repo was trying to implement.

Why this is useful

  1. Every previous n-gram PR was closed for a specific C1/C2/C3/C4 violation per issue #1017 (A Field Guide to Valid Submissions). This one passes all four, and the automated legality harness is reusable for future eval-time adaptation techniques.
  2. Saves other participants from running the same experiment and submitting the same variant. The scaling trend is monotonic and leaves no room for a "just train slightly longer" rescue.
  3. Provides a per-bucket delta decomposition tool (code/localized_delta.py) that can be applied to any Track B technique to verify where its marginal gain comes from.
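For concreteness, here is a minimal sketch of the interface such a per-bucket decomposition tool could expose. The function name `per_bucket_delta` and its argument layout are illustrative assumptions, not the actual `code/localized_delta.py` API:

```python
from collections import defaultdict

def per_bucket_delta(nll_base, nll_tech, buckets):
    """Decompose a technique's overall BPB delta into per-bucket shares.

    nll_base / nll_tech: per-token negative log-likelihoods in bits for the
    baseline and the technique; buckets[i] labels token i (e.g. a cache-hit
    category). The bucket shares sum exactly to the overall mean delta, so
    you can see where a technique's marginal gain actually comes from.
    """
    assert len(nll_base) == len(nll_tech) == len(buckets)
    n = len(buckets)
    delta = defaultdict(float)
    for label, lb, lt in zip(buckets, nll_base, nll_tech):
        delta[label] += (lb - lt) / n  # positive = technique helped this token
    return dict(delta)
```

Because the shares partition the total, a claim like "100% of the gain comes from bucket X" reduces to checking that every other bucket's share is ~0.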

Legality

This is a non-record research submission. It does not claim a leaderboard position. Code, results, and logs are provided for reproduction of the reported negative result. See README.md for the full per-condition compliance table and the 12/12 passing automated tests (8 legality + 4 integration) on both CPU and CUDA.

No network calls. N-gram state is built entirely from already-scored eval tokens per Track B semantics. Single left-to-right pass. Score-before-update discipline enforced at chunk boundaries.
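The single-pass, score-before-update discipline described above can be sketched as a toy reference (not the PR's implementation; `ngram_blend_probs`, `score_stream`, and the `smoothing` constant are hypothetical names/choices, and the full-vocab softmax follows the #1185-style renormalization requirement):

```python
import math
from collections import defaultdict

def ngram_blend_probs(base_logits, cache, ctx, alpha=0.1, smoothing=1.0):
    """Add alpha * log(count + smoothing) to every base logit, then softmax
    over the FULL vocabulary (no top-k truncation)."""
    V = len(base_logits)
    blended = [base_logits[v] + alpha * math.log(cache[ctx][v] + smoothing)
               for v in range(V)]
    m = max(blended)
    z = sum(math.exp(x - m) for x in blended)
    return [math.exp(x - m) / z for x in blended]

def score_stream(tokens, base_logits_fn, n=2, alpha=0.1):
    """Single left-to-right pass with score-before-update discipline:
    each token is scored using only n-gram counts accumulated from tokens
    at strictly earlier positions, and only then added to the cache."""
    cache = defaultdict(lambda: defaultdict(int))  # context -> {token: count}
    nll_bits = 0.0
    for i, tok in enumerate(tokens):
        ctx = tuple(tokens[max(0, i - n + 1):i])
        probs = ngram_blend_probs(base_logits_fn(tokens[:i]), cache, ctx, alpha)
        nll_bits -= math.log2(probs[tok])
        cache[ctx][tok] += 1  # update strictly AFTER scoring this position
    return nll_bits / len(tokens)  # mean bits per token
```

With a uniform base model and a repetitive stream, the cache lowers the mean bits per token below the uniform baseline, but never at the token where a pattern first appears, which is the causality property the legality probes are checking.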

Test plan

  • python3 code/legality_harness.py → 8/8 PASS (CPU + CUDA on A40)
  • python3 code/test_integration.py → 4/4 PASS (CPU + CUDA on A40)
  • Phase 1-A: 4L, 256d, sp8192, 2000 steps, α sweep → peak Δ 0.00223 BPB at α=0.10 (logs in training_logs/)
  • Phase 1-B: 4L, 256d, sp8192, 4000 steps, α ∈ {0.05, 0.10} → peak Δ 0.00018 BPB; hurts at α ≥ 0.10
  • Full reproduction instructions in README.md
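The sweep readout in the phases above reduces to taking the argmax over the per-α BPB deltas; a trivial helper for that (hypothetical, not part of the PR's code):

```python
def pick_peak(sweep):
    """Return (alpha, delta_bpb) for the best point of an alpha sweep.

    sweep: {alpha: delta_bpb}, where delta_bpb > 0 means the blend
    improved BPB over the no-blend baseline at that alpha.
    """
    best_alpha = max(sweep, key=sweep.get)
    return best_alpha, sweep[best_alpha]
```

The values below other than the reported 0.00223 peak at α=0.10 are made-up placeholders for illustration:

```python
phase_1a = {0.05: 0.0011, 0.10: 0.00223, 0.20: -0.0004}  # 0.05/0.20 hypothetical
```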

Happy to ping the specific closed PRs (#993, #1026, #1185) to point to this as the legal reference after it lands, if that's useful.

Non-record: Causal N-gram Logit Blend — Legal, Bug-Free, Null Result at Scale

Rigorous negative result demonstrating that a legal causal n-gram additive-logit
blend does not scale to strong models, paired with the first clean reference
implementation verified against all three valerio-oai closure rulings (openai#993 hashed
caches, openai#1185 full-vocab renormalization, openai#959 two-pass rescoring).

Includes:
- 8-probe automated legality harness + 4-test integration suite
- Scaling curve across 6 model configurations (2L/4L, 128d/256d, 800-4000 steps,
  sp1024/sp8192) showing peak BPB improvement collapses from 0.0515 (weak baseline)
  to 0.00018 (strongest model) — well below the 0.0072 BPB record threshold
- Localized delta decomposition showing 100% of the gain comes from out-of-
  attention-window cache hits, and why the sp1024 → sp8192 transition erodes even
  that architectural floor
himanshudongre force-pushed the nonrecord/causal-ngram-null-result branch from 51d0391 to 9fa635f on April 15, 2026 16:32
deborahnelson8788726 pushed a commit to deborahnelson8788726/parameter-golf that referenced this pull request Apr 22, 2026
Added experimental techniques for Parameter Golf exploration:
- LegalNgramMixer (PR openai#1642 compliant N-gram with exact tuple keys and
  full-vocab distribution) — too slow in Python, timed out on Modal
- Lion optimizer for SLOT (Trinity framework technique) — gave 0.71197
  on 1xH100 vs 0.72097 for AdamW; marginally better but both worse than v3
- Phi-rank softmax in SLOT eval (Trinity golden-ratio weighting) — worse
  at 0.81697; 50/50 blend hurts calibrated probabilities
- Configurable NGRAM_LEGAL, SLOT_OPTIMIZER, SLOT_PHI_RANK env vars
- Modal launch scripts for v4-v7 reproducibility
- RunPod training shell script for 8xH100 deployments

These are negative/marginal results kept for reproducibility. The clean v3
submission (PR openai#1722, 0.65802 BPB) remains our primary legal record.

Added to .gitignore: .secrets/, .obsidian/, cowork_transfer/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
