Closed
Changes from all commits
405 commits
d31cd54
Bandit_Wagon: clean HYPOTHESIS.md — remove stale oracle/proxy refs, f…
Mar 30, 2026
c8a2468
Bandit_Wagon: strip dead code from train_gpt.py (2378 → 1860 lines)
Mar 30, 2026
78a4e47
Bandit_Wagon: fix banner title
Mar 30, 2026
2e3d5bf
pod_setup.sh: switch branch to TEST_LAB, remove dead FLA/DeltaNet ins…
Mar 30, 2026
5417530
Bandit_Wagon: add ad-hoc winddown A/B suite
Mar 30, 2026
4ce945f
Add clean Rascal A/B lab for baseline, turbomuon, engramlite, combo
Mar 30, 2026
4401ff8
BW-00 anchor: 1.18616 int6 SW BPB (seed 444, 8×H100, 600s)
Mar 30, 2026
d02bb2c
Bandit_Wagon: add run_ablations.sh — BW-01..04 back-to-back at 350s, …
Mar 30, 2026
e135eb9
Bandit_Wagon: run_ablations.sh default NPROC=1 (single GPU signal)
Mar 30, 2026
9e8b69f
Bandit_Wagon: fix run_ablations.sh env var passing (use env)
Mar 30, 2026
56e3ff3
Bandit_Wagon: add 1-GPU winddown wrapper
Mar 30, 2026
656622a
pod_setup.sh: download tokenizer (fineweb_1024_bpe.model) in step 6
Mar 30, 2026
b64efeb
Bandit_Wagon: run_ablations.sh step-based stopping (ABLATION_STEPS=500)
Mar 30, 2026
33742df
Add fresh pod bootstrap and single-H100 signal runners
Mar 30, 2026
b04652b
Make Rascal runners use portable torchrun default
Mar 30, 2026
e2b3ec0
Add Bandit_wagon_5f_ablations: 4F vs 5F direct proxy comparison
Mar 30, 2026
7a36bf8
Add Rascal_Turbo race-ready TurboMuon-only variant
Mar 30, 2026
3d675e0
Bandit_wagon_5f_ablations: 4F+1C confirmed optimal, 5F hypothesis denied
Mar 30, 2026
f3ecde9
Add bandit_wagon_XSA: XSA coverage sweep on confirmed 4F+1C config
Mar 30, 2026
550edbf
Bandit_Wagon: add 8xH100 launcher and checkpoint arch autodetect
Mar 30, 2026
6a46f87
Add bandit_wagon_crawler_mlp: crawler MLP leaky slope sweep
Mar 30, 2026
be8459a
bandit_wagon_XSA: XSA=15 (full coverage) wins on BPB AND speed
Mar 30, 2026
07eb836
Rascal_Turbo: single run.py launcher
Mar 30, 2026
7f773de
bandit_wagon_choke: per-loop bottleneck choke sweep for crawler MLP
Mar 30, 2026
aeab681
bandit_wagon_smear: LoopSmearGate — depth error damping between crawl…
Mar 30, 2026
0df2921
Add single-H100 Rascal ablation matrix runner
Mar 30, 2026
2958dca
bandit_wagon_tap: per-loop gated encoder tap sweep (7 arms)
Mar 30, 2026
37f1dcf
Add sparse skip-gram ngram ablation for single-H100 Rascal
Mar 30, 2026
1b674cf
Revert unvalidated sparse skip-gram integration from Rascal runner path
Mar 30, 2026
fe3b7d7
bandit_wagon_battery: per-loop RoPE scale sweep + mega ablation runner
Mar 30, 2026
38c8826
Add isolated sparse skip-gram ablation (2200-step single-GPU)
Mar 30, 2026
c3d3b8f
bandit_wagon_crawler_mlp: log BW3 results — slope insensitive, stay a…
Mar 31, 2026
bac88a6
bandit_wagon_battery: fix MLP.forward to accept optional loop_idx
Mar 31, 2026
6b7f205
Remove ngram sparse ablation files; keep Rascal path ngram-free
Mar 31, 2026
f7f301a
Add stripped Rascal skip-gram 2200-step calibration runner
Mar 31, 2026
5ccd09c
Add bandit_wagon_choke_shaped experiment (BWCS series)
Mar 31, 2026
34ce3e4
Log 2026-03-31 single-H100 RASCAL ablation matrix results
Mar 31, 2026
cb25a92
Add next calibrated single-GPU RASCAL ablation pack
Mar 31, 2026
362b220
Crawler Leg3 README: add architecture philosophy
Mar 31, 2026
6b81bb0
Crawler Leg3 README: add active ablation work section
Mar 31, 2026
bb5b3d4
Log skip-gram calibration seed444 results
Mar 31, 2026
f7edb50
Add BWCB ablation: battery scales on pyramid-512 choke
Mar 31, 2026
66b94eb
Add loader-refine single-GPU ablation pack and notes
Mar 31, 2026
ed2ec71
Add BWCD ablation: descending battery on pyramid-512
Mar 31, 2026
dda1f5b
BWCD: add BWCD-03 wide-medium-wide bracket (9,3,9)
Mar 31, 2026
1108f46
Record mega ablation results in BWB HYPOTHESIS.md
Mar 31, 2026
84f8fbe
Add race-ready Rascal final submission package with loader_cache4 lau…
Mar 31, 2026
d4f8e74
BWCB results: ascending battery hurts pyramid, all configs worse
Mar 31, 2026
3c0ca4c
BWCB Run B (4 shards): 1,2,4+pyramid beats pyramid alone by -0.00210
Mar 31, 2026
ffa8b17
Enforce FA3 preflight and CUDA runtime path in final Rascal launcher
Mar 31, 2026
361114a
BWCD results: 9,1,1+pyramid wins at -0.01193 vs pyramid alone
Mar 31, 2026
3d229bc
BWCD complete: BWCD-03 (9,3,9) final — quant_gap +0.0062, worst of group
Mar 31, 2026
b8f371b
Add Bandit_Wagon_III: pyramid-512 + 9,1,1 battery production runner
Mar 31, 2026
46fb4bd
Rascal: record 2026-03-31 TTT sweep regression (seed 444)
Mar 31, 2026
f962265
Add bandit_wagon_cannon (BWE): per-loop output calibration ablation
Mar 31, 2026
f3cacec
Log Run C results: 1-shard pod with different val data — not directly…
Mar 31, 2026
249f3ba
Rascal: add rascal_master config copies
Mar 31, 2026
352d774
BW3 run.sh: clean competition runner with preflight guards
Mar 31, 2026
aeee4b4
Rascal_Master: SOTA-exact race script — fix COPRIME_MAX_LOADED_SHARDS…
Mar 31, 2026
b9fa53b
BW3 seed=444: 1.20684 int6_sw_bpb, 10.07MB — pyramid-512 + 9,1,1 results
Mar 31, 2026
fa04306
BW3 run.sh: auto-save checkpoint after run
Mar 31, 2026
17be781
Add Bandit_Wagon_IV: 9,1,1 battery without pyramid choke
Mar 31, 2026
8208f50
BW4 seed=444: 1.18731 int6_sw_bpb — beats Leg 3 SOTA with battery only
Mar 31, 2026
a80c8cc
Bandit cannon: log seed444 proxy results (TTT bust)
Mar 31, 2026
972dcf3
BW4: add gate_fullgraph.sh — Tier 1 COMPILE_FULLGRAPH=1 test
Mar 31, 2026
872a159
BW4 gate_fullgraph: fix broken ablation — revert TORCHDYNAMO_OPTIMIZE…
Mar 31, 2026
d4fc252
Add Bandit_Wagon_V: BW4 + COMPILE_FULLGRAPH=1 (Tier 1 speed win)
Mar 31, 2026
1bcab0b
Add master progress checklist
Mar 31, 2026
7f55f3d
BW5 seed=444 results: 1.18672 int6_sw_bpb
Mar 31, 2026
c18ad76
Bandit_Wagon_V_Cannon: single GPU cannon gate on BW5 base
Mar 31, 2026
86af1f3
Add QK_SLOT_Ablation: single-GPU cross-correlation harness
Mar 31, 2026
0b511a3
Lab cleanup: archive old BW experiments, add LAB_PROTOCOL.md
Mar 31, 2026
25b18fa
BW5 seed=300: 1.18758 — does not individually confirm
Mar 31, 2026
6001a70
Add one-shot 8x quick AB runner for Rascal GPTQ stream vs insta
Mar 31, 2026
ee004c9
Make 8x quick AB runner self-contained with FA3 preflight and no rg d…
Mar 31, 2026
7b2e280
Lock Rascal baseline launcher to record trainer and add one-shot base…
Mar 31, 2026
2f151c1
Bandit_Wagon_V_Cannon: cannon gate results — does not promote
Mar 31, 2026
c6caaad
Add one-shot cu124 baseline runner that reuses custom FA3 module
Mar 31, 2026
9f982e0
Make cu124+custom-FA3 runner auto-detect non-venv base python
Mar 31, 2026
e132ccd
Harden cu124 custom-FA3 runner with python-path and filesystem auto-d…
Mar 31, 2026
d4f108b
Extend custom FA3 detection to conda env pythons and .so module files
Mar 31, 2026
1b16d62
Add --no-deps FA3 wheel fallback to cu124 baseline runner
Mar 31, 2026
3f3fa94
Add pyramid/cannon gate scripts — 1gpu + 8gpu for each hypothesis
Mar 31, 2026
f396059
Add QK_SLOT_Ablation STATUS.md — current position log
Mar 31, 2026
c61a16a
Fix gate scripts: stop swallowing output with command substitution
Mar 31, 2026
73446af
Add sota_now.sh — clean single-file cu124 baseline runner
Mar 31, 2026
3a1bdc1
Fix sota_now.sh to use real submission file; vault the correct source
Mar 31, 2026
0e7c317
Relax CUDA check to 12.x — cu128 pod is valid
Mar 31, 2026
bbd4d8a
Fix pod_setup.sh: auto-detect WORKSPACE from script location
Mar 31, 2026
47f450f
Fix inverted awk stack parity check in sota_now.sh
Mar 31, 2026
82c3d26
Quarantine racecar lab confusion; fix records/ with vault file
Mar 31, 2026
fa91fda
Fix pod_setup.sh: dataset shard count crash under set -euo pipefail
Mar 31, 2026
1d9edb8
Add two-track lab structure: neural/ and crawler/
Mar 31, 2026
525cd34
BWVC 8GPU gate results: scalar cannon passes speed gate
Mar 31, 2026
90475fc
Add RESULTS.md stub for Bandit_Wagon_V_Pyramid
Mar 31, 2026
9e5fef2
Junkyard: move all legacy/inactive dirs off root surface
Mar 31, 2026
c688b8b
Add folder-based CLAUDE.md agent protocols
Mar 31, 2026
645857b
BWVP 1GPU gate results: pyramid STRONG PASS on quality
Mar 31, 2026
2f4a7ed
Move active BW5 gates into crawler/ track; fix train_gpt.py symlink
Mar 31, 2026
5756670
Fix PyramidCannon gate_1gpu.sh usage comment path
Mar 31, 2026
29caede
Add H→A→R cycle to protocol; scaffold all 3 files in new_leg.sh
Mar 31, 2026
c48e759
Add QK_GAIN_SLOT_Gate ablation experiment
Mar 31, 2026
35f212e
Add per-track science boards + auto-stub in new_leg.sh
Mar 31, 2026
006d1ee
Add submissions/ PR zone — validate script, protocol, templates
Mar 31, 2026
1ea0901
Move pod_setup.sh to scripts/ — accessible from repo root
Mar 31, 2026
85d11b1
BWVPC 1GPU gate: pyramid+cannon PASSES
Mar 31, 2026
cecb7b1
Fix smoke test threshold to scale with nproc
Mar 31, 2026
b20b82d
Fix gate_8gpu.sh: update path comment and step_avg pass criterion
Mar 31, 2026
310d5d1
Record smoke test result: 739ms/step is healthy on 1xH100
Mar 31, 2026
dfb8459
BWVPC 8GPU gate: DOES NOT PROMOTE — pyramid+cannon fails
Mar 31, 2026
50517e3
Update SCIENCE.md: close pyramid/cannon screw, reflect actual complet…
Mar 31, 2026
79d13ad
Two crawler legs: BW5_Cannon full run + BW6_Skipgram gate
Mar 31, 2026
f8fba27
Add BW6_Skipgram gate_8gpu.sh — 2000-step 8xH100 A/B gate
Mar 31, 2026
d28a2fb
Add PIPELINE.md — full ranked hypothesis queue, both tracks
Mar 31, 2026
c4ddf54
BW5_Cannon full run: DOES NOT PROMOTE — +0.00020 vs BW5 at 600s/8034 …
Mar 31, 2026
3c52f3a
Housekeeping: archive closed crawler legs, move QK_Gain_SLOT to neura…
Mar 31, 2026
e699814
BW6_Skipgram gate: null result — trigram neutral on crawler, −140KB s…
Mar 31, 2026
1293058
Archive BW6_Skipgram — null result, closed
Mar 31, 2026
9f84d6c
Update PIPELINE.md: close cannon/skipgram, fix paths, update neural t…
Mar 31, 2026
ba322b0
Fix SLOT backward crash + record run 1 results
Mar 31, 2026
06b4c2d
Add BW7 MegaGate: 8-arm ablation on 4xGPU
Mar 31, 2026
64992a0
Add BW7 MegaGate pod_setup.sh — fresh pod one-shot launcher
Mar 31, 2026
cc36ca6
Relax flash_attn preflight in BW7 MegaGate — warn not abort
Mar 31, 2026
0752174
Fix torchrun path: fallback to python3 -m torch.distributed.run
Mar 31, 2026
f225709
Update neural/SCIENCE.md with competitive intelligence + hypothesis r…
Mar 31, 2026
e2867c2
Add Arch+Sched Sweep: 6-case 4×GPU ablation (rope_32, bigram_4096, qa…
Mar 31, 2026
963b440
Expand sweep to 9 cases: add gptq, bigram_3072, warmdown_4k
Mar 31, 2026
1dc3a32
Sweep: gptq case reuses baseline checkpoint (SKIP_TRAIN=1 + LOAD_CHEC…
Apr 1, 2026
4c6ef06
Add SLOT legality analysis to neural/SCIENCE.md
Apr 1, 2026
479484a
Add BW8_Tap: shared encoder tap dim=32 — strongest MegaGate signal
Apr 1, 2026
3a93e20
Add QK_Gain_SLOT_Legal: context-only SLOT (legal causality-safe variant)
Apr 1, 2026
ef750ce
Add BW9_Anchor gate + update SCIENCE.md with MegaGate results
Apr 1, 2026
89a321e
Fix gate.sh: remove quotes from inline env var assignments
Apr 1, 2026
75840c7
Add BW10_GPTQ — loop-aware GPTQ gate on BW8 baseline
Apr 1, 2026
33a8144
Add gptq_full case: full training run with SKIP_GPTQ=0 (not post_only)
Apr 1, 2026
52cd457
Log Arch+Sched sweep results (seed 444, 4×GPU): all 9 cases dead
Apr 1, 2026
c0ceacd
Add Rascal_III_SLOT leg: context-only Legal SLOT on Rascal II base
Apr 1, 2026
b5e9e7c
Add BW11_5Flat — 5F+1C depth revisit on BW8 baseline
Apr 1, 2026
93ef50a
Rascal_III_SLOT run.sh: minimal racer, exact SOTA env + SLOT_ENABLED=1
Apr 1, 2026
5efb22b
BW10_GPTQ: gate PASS — −0.00486 int6_sw, step time clean
Apr 1, 2026
cb4f1c2
BW10_GPTQ: add production run.sh (8×H100, 600s, LOOP_AWARE_GPTQ=1)
Apr 1, 2026
dd2e06a
BW11_5Flat: add production run.sh (8×H100, 600s, NUM_FLAT_LAYERS=5)
Apr 1, 2026
0e428cd
Add RASCAL_WINDOWN_TESTING — 4-arm legal window strategy suite
Apr 1, 2026
2338fee
BW10_GPTQ: full run PROMOTES — 1.18292670 BPB, new champion
Apr 1, 2026
a70185a
Rascal_III_SLOT: surgical SLOT via hook, no model class changes
Apr 1, 2026
ef2c932
BW11_5Flat: full run PROMOTES — 1.17651313 BPB, new champion
Apr 1, 2026
6c864c1
Scaffold Crawler II submission — seed=444 pre-filled, seed=300 pending
Apr 1, 2026
385d704
Rename submission: Crawler II → Nightcrawler
Apr 1, 2026
7b9a11b
Nightcrawler: fill seed=300 results — 1.17490448 BPB, mean 1.1757
Apr 1, 2026
6f8e093
Nightcrawler: add seed logs; fix validate.sh set-e/arithmetic bug
Apr 1, 2026
bcd26f7
Nightcrawler: add seed=4 (1.17676091 BPB), update mean to 1.1761
Apr 1, 2026
e0b05ab
Rascal_III_SLOT: log both full runs — signal confirmed, size blocked
Apr 1, 2026
9332369
Rascal_III_SLOT: enforce cu124 — reject cu128 at launch
Apr 1, 2026
587534e
Enforce strict Neural SOTA stack parity on pod setup/run
Apr 1, 2026
2e445b7
Accept custom hopper FA3 path while keeping strict SOTA gating
Apr 1, 2026
0ed1ea3
Prefer working FA3 provider and fail clearly on ABI mismatch
Apr 1, 2026
580bf08
Disable FA3 wheel installs by default for custom-head pods
Apr 1, 2026
3203ed9
Restore custom FA3 discovery in pod_setup.sh
Apr 1, 2026
03698c1
Lock pod_setup.sh snapshot to vault (2026-04-01, cu124, custom FA3)
Apr 1, 2026
e7a5944
Re-enable FA3 wheel install by default (restore cu124 Hopper FA3)
Apr 1, 2026
0ae805f
Update vault pod_setup snapshot to e7a5944 (FA3 wheel default restored)
Apr 1, 2026
49c544f
Fix FA3 install: replace dead pytorch WHL URL with GitHub releases
Apr 1, 2026
f825d02
Add FRESHPOD_OF_BELAIR.sh — original working pod setup (fa91fda)
Apr 1, 2026
4fd4c94
Restore vault pod_setup to fa91fda original (FRESHPOD_OF_BELAIR)
Apr 1, 2026
c81ca06
Rascal_III_SLOT: strip agent-injected torch/CUDA version gate
Apr 1, 2026
cc1fc25
Add install_fa3_cu124.sh — FA3 wheel installer for cu124/torch2.4
Apr 1, 2026
6bc1d59
Add setup_pod_cu130.sh — install torch cu130, verify FA3
Apr 1, 2026
da6c41c
Harden pod_setup.sh + restore run.sh version gate
Apr 1, 2026
bc8a34b
Vault-lock pod_setup (cu124, sympy pin, hard assertions)
Apr 1, 2026
4ffaf67
pod_setup: add source-build fallback for FA3 when wheel 404s
Apr 1, 2026
bffbd34
pod_setup: add wheel to pip deps (needed for FA3 source build)
Apr 1, 2026
ad809f2
pod_setup: add 2.8.3 wheels (cp312 support), fix cross-device fallback
Apr 1, 2026
8743330
Add fresh pod wrapper for Rascal III SLOT
Apr 1, 2026
0622c32
Rascal III SLOT copy: fix SLOT batch gating and pack int6 to 6-bit
Apr 1, 2026
7c98509
Restore pod_setup.sh to pre-storm version (1ea0901)
Apr 1, 2026
1f0ab87
Restore all run scripts to pre-storm versions (9332369)
Apr 1, 2026
9e0905b
Remove storm debris scripts
Apr 1, 2026
f435825
Relax CUDA gate in Rascal III SLOT run.sh: 12.4* → 12.*
Apr 1, 2026
fa69361
Add RASCAL_III_SLOT_F: SLOT + true 6-bit int6 packing
Apr 1, 2026
cd8825d
Rascal III SLOT: add seed300 run log (2026-04-01 17:20)
Apr 1, 2026
5372432
pod_setup: smart FA3 install — Dao-AILab wheels + system discovery
Apr 1, 2026
aa1a24c
Add setup_cu124_fa3_venv.sh — automated cu124+FA3 bridge setup
Apr 1, 2026
dd48eac
setup_cu124_fa3_venv: symlink .so instead of copy, add disk cleanup
Apr 1, 2026
abd7f31
crawler: add BW12 2k interaction ablation sequence
Apr 1, 2026
8d806f9
crawler: record BW12 results and add BW13 4x interaction series
Apr 1, 2026
fabad3a
pod_setup: enforce true FA3 runtime (remove FA2 wheel fallback)
Apr 1, 2026
6929756
scripts: add fast trimmed-kernel FA3 installer; remove broken cu124 b…
Apr 1, 2026
822bf2f
setup: vault fast FA3 installer; remove best-effort FA3 bootstrap path
Apr 1, 2026
41fe618
fa3: canonical quick build uses --no-build-isolation
Apr 1, 2026
1fcaa69
rascal_slot: disable torch._dynamo optimize_ddp by default to avoid i…
Apr 1, 2026
9807abe
revert: restore Rascal III SLOT train_gpt_slot.py exactly (no agent e…
Apr 1, 2026
192b22f
hotfix: add sitecustomize to disable dynamo optimize_ddp and suppress…
Apr 1, 2026
4e18a40
crawler: record BW13 interaction ablation results
Apr 1, 2026
210c59c
fa3: revert installer flags to historical 4-flag baseline
Apr 1, 2026
a0dda13
crawler: add BW14 big-swing 2k ablation sequence
Apr 1, 2026
5b32740
fa3: add portable cu124 wheel build/install scripts
Apr 1, 2026
ed9222b
neural: add 2k same-checkpoint SLOT H2H probe
Apr 2, 2026
f0edf59
neural: relax h2h runner stack gate
Apr 2, 2026
fae52c1
crawler: unify all ablations into single 4x runner
Apr 2, 2026
b96ad3a
neural: add rascal ii 2k comparison runner
Apr 2, 2026
f029757
scripts: add locked cu124 fa3 pod installer
Apr 2, 2026
dbd0b07
scripts: add clean pre-storm pod_setup copy (1ea0901)
Apr 2, 2026
3736faa
crawler: add BW16 depth sweep for NUM_FLAT_LAYERS 6-11
Apr 2, 2026
833a9bc
vault: extract user-tested SOTA file from 99b790d (d70ec518, 2468 lines)
Apr 2, 2026
7cc5770
crawler: add BWX latest run (8F + quant sweep)
Apr 2, 2026
5a3c552
crawler: switch BWX latest baseline to 9F
Apr 2, 2026
e76f9df
neural: add Lucky — SOTA sequential + SLOT eval
Apr 2, 2026
154fe81
neural: bake Lucky defaults — seed=444, SLOT=1, SKIP_GPTQ=1
Apr 2, 2026
02c7beb
neural: Lucky — switch to coprime loader (matches March 30 SOTA)
Apr 2, 2026
db59a3a
crawler: harden BWX contender runner with window-first selection
Apr 2, 2026
e39955d
neural: Lucky sequential+SLOT seed444 log — 1.10514 BPB (size busted)
Apr 2, 2026
52332f7
crawler: add submission-ready Bandit Wagon X 9F full-run pack
Apr 2, 2026
bb33e48
pod_setup: switch to competition dataset source (willdepueoai/paramet…
Apr 2, 2026
513c1e8
crawler: add BW17 DGX-spark cadence longform interaction suite
Apr 2, 2026
35658d7
crawler: drop local BW17 smoke artifacts
Apr 2, 2026
689beaf
Lucky: bypass torch.save for int6 artifact — raw bytes + JSON header
Apr 2, 2026
6fa9f32
Lucky: remove "embed" from int6 categories — restore original quantiz…
Apr 2, 2026
21079da
bw17: harden cadence sweep resume + stride controls
Apr 2, 2026
3df3735
bw17: add single-command resume launcher from failed arm
Apr 2, 2026
2edf0a9
Add LC4 8x launcher script
Apr 2, 2026
adca6e1
Fix LC4 launcher pod paths
Apr 2, 2026
c5a2c82
Make LC4 launcher use pod env fallbacks
Apr 2, 2026
78812b9
Probe torch-capable python in LC4 launcher
Apr 2, 2026
b0d01f9
Add RECOVERY_TEST_1 LC4 trainer copy
Apr 2, 2026
3509647
Add lucky_slot recovery trainer
Apr 2, 2026
8e53b6f
crawler: add 24h audit table from BW12-14, BWX9F, isolated 10F
Apr 2, 2026
64395b5
Revert lucky_slot export to safe serializer
Apr 2, 2026
3634758
Replace lucky_slot trainer with Lucky
Apr 2, 2026
f8e3e8b
Add lucky_slot seed300 SLOT run log
Apr 2, 2026
50a5910
Make Lucky SLOT post-export and window-isolated
Apr 2, 2026
65fdc7a
Bake 30 percent SLOT defaults into Lucky
Apr 2, 2026
def0fdc
Allow partial dataset pulls in pod setup
Apr 2, 2026
fb7c598
Add 2k rascal signal hunt experiments
Apr 2, 2026
794cf99
Log 4gpu 2k build sweep results
Apr 2, 2026
f175bb4
Add Spark SLOT size hunt runners
Apr 2, 2026
89fd9f5
Add Lucky II contender with legal SLOT
Apr 2, 2026
ddcc281
Split QK4 contender from warmdown kit
Apr 2, 2026
3418f02
Add Lucky III: Lucky II + brotli byte-shuffle compression
Apr 2, 2026
6c13590
Add brotli to pod setup
Apr 2, 2026
7fac1b4
Lucky III: revert QK4 to baseline 1.5, keep brotli+SLOT
Apr 3, 2026
876d0bd
Add BW20_Brotli_2k: crawler compression gate (zstd → brotli)
Apr 3, 2026
6d51b78
Add SLOT_brotli: baseline + SLOT + brotli byte-shuffle
Apr 3, 2026
c609f04
Add BW XI: BWX 9F + brotli + GPTQ production run
Apr 3, 2026
f052d7c
Fix coprime_shards_per_batch default to 1 (match safepoint)
Apr 3, 2026
0d40c15
Fix loader_mode default to coprime (match safepoint)
Apr 3, 2026
f972331
Enable SLOT by default
Apr 3, 2026
33a03d3
Log SLOT_brotli seed 300 breakthrough result
Apr 3, 2026
6a7608a
BW XI: stack all confirmed gains — loop-aware GPTQ, QK4, loops=2, brotli
Apr 3, 2026
181ce55
Add Lucky IV: per-sample SLOT delta + 24 steps
Apr 3, 2026
af4700c
Add Ouroboros submission — 1.1364 BPB, 15.05MB
Apr 3, 2026
12 changes: 11 additions & 1 deletion .gitignore
@@ -8,4 +8,14 @@ data/manifest.json
data/docs_selected.jsonl
.mypy_cache/
.venv
logs/
logs/
experiments/archive/checkpoints/

# Large binaries — never commit
*.pt
*.ptz
junkyard/results/
junkyard/checkpoints/
junkyard/experiments/archive/checkpoints/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/research_hub_*/
junkyard/experiments/GreenRod_X_1/lab_protocol_20260327/vast_tests/
10 changes: 10 additions & 0 deletions .hotfix/sitecustomize.py
@@ -0,0 +1,10 @@
import os

try:
    import torch._dynamo as d
    # Keep compile enabled, but avoid known DDP graph optimizer crash path.
    d.config.optimize_ddp = False
    # If a graph still fails, fall back instead of killing the entire run.
    d.config.suppress_errors = True
except Exception:
    pass
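For context on how this shim takes effect: Python auto-imports a `sitecustomize` module found on `sys.path` at interpreter startup, so pointing `PYTHONPATH` at `.hotfix/` activates the dynamo settings without editing any training code. A minimal sketch of the mechanism, using a throwaway module (the real launch would prefix `torchrun` the same way, e.g. `PYTHONPATH="$PWD/.hotfix" torchrun ... train_gpt.py`):

```shell
#!/usr/bin/env bash
set -eu
# Throwaway stand-in for .hotfix/sitecustomize.py.
demo=$(mktemp -d)
printf 'print("sitecustomize loaded")\n' > "$demo/sitecustomize.py"
# Any python started with this PYTHONPATH imports the shim before user code.
PYTHONPATH="$demo" python3 -c 'pass'   # prints: sitecustomize loaded
rm -rf "$demo"
```

Note this only works when `site` processing is enabled (i.e. the interpreter is not launched with `-S`).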
77 changes: 77 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,77 @@
# Parameter Golf Lab — Agent Protocol

## Orient first
```
cat neural/LEADER.md # current neural SOTA
cat crawler/LEADER.md # current crawler SOTA
```
These two files tell you where the lab stands. Read them before doing anything.

## Repo structure
```
neural/ ← Neural SOTA track (Rascal lineage) — leaderboard #1 focus
crawler/ ← Crawler track (Bandit_Wagon lineage) — compression/quality focus
submissions/ ← Competition PR zone. Read submissions/PROTOCOL.md before touching.
vault/ ← Immutable locked sources. Never modify.
records/ ← Leaderboard submission records. Never modify.
scripts/ ← Shared runners. sota_now.sh is the neural baseline runner.
data/ ← Dataset. Never modify.
junkyard/ ← Legacy experiments. Read-only reference only.
```

## Hard rules

**NEVER overwrite a test file.** Always create a new file. If you need to modify
a training script, copy it first, work on the copy, name it clearly.

**Confirm names before creating.** Ask the user what to name a new leg, script,
or directory before creating it. Never invent names silently.

**ONE variable per test.** If a run changes more than one thing vs the baseline,
the result is uninterpretable and the money is gone.

**Gate before 8x.** Every hypothesis runs a 1-GPU 2000-step gate (~$0.50) before
an 8×H100 full run (~$3-4). Never skip the gate.

**Never submit from TEST_LAB.** Submissions go through the `submissions/` zone only.
Read `submissions/PROTOCOL.md`. Run `bash submissions/validate.sh <records_dir>` first.
Branch flow: `submission/<name>` → push `fork1` → PR to `openai/parameter-golf`.

## RunPod workflow
1. Pod always pulls from `TEST_LAB` branch
2. Commit and push scripts BEFORE launching the pod
3. On pod: `git pull && bash <script>`
4. Never push FROM the pod
5. Pod gets destroyed after the run — save checkpoints before destroying

## Test cycle: Hypothesis → Ablation → Results

Every leg follows this sequence. No skipping steps.

```
hypothesis.md ← write FIRST. ONE variable. Why. Gate target.
train_gpt.py ← copy from leader, make the ONE change
gate.sh ← commit+push → pod pulls TEST_LAB → run (1-GPU, 2000 steps)
ablation.md ← fill gate result. Pass? Proceed. Fail? Stop.
run.sh ← commit+push → pod pulls TEST_LAB → run (8×H100, 600s, seed=444)
ablation.md ← fill full run result. Beats leader? Run confirmation.
confirmation run (8×H100, 600s, seed=300)
RESULTS.md ← verdict (PROMOTES / DOES NOT PROMOTE), what we learned, next hyp
```

New legs are scaffolded with all three files pre-created:
```bash
bash scripts/new_leg.sh neural <name>
bash scripts/new_leg.sh crawler <name>
```

## Seeds
- Primary: 444
- Confirmation: 300
- Never use 1337

## Cost
- 8×H100 SXM: ~$13.36/hr
- Full 10-min run: ~$3-4
- Gate (1-GPU, 2000 steps): ~$0.50
- Do not suggest a run without a validated gate or clear hypothesis
121 changes: 121 additions & 0 deletions LAB_PROTOCOL.md
@@ -0,0 +1,121 @@
# Lab Protocol — Parameter Golf

_We are competing for #1. Every pod dollar is a decision._

---

## The One Rule

**ONE variable changes per test. If you change two, the result is meaningless and the money is gone.**

Before committing any gate script: diff it against the baseline. Count the differences. If it's more than one, stop.
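The count can be automated. A sketch of such a pre-commit guard, using throwaway files as stand-ins for the baseline and candidate `train_gpt.py` (paths and contents are illustrative, not the lab's actual layout):

```shell
#!/usr/bin/env bash
set -eu
# Stand-ins for the baseline and candidate training scripts.
BASE=$(mktemp); CAND=$(mktemp)
printf 'lr=0.01\nsteps=2000\n' > "$BASE"
printf 'lr=0.02\nsteps=2000\n' > "$CAND"   # ONE change: lr only
# Count changed hunks, not raw lines: one logical change can touch several
# adjacent lines yet still appear as a single @@ hunk in unified diff output.
HUNKS=$(diff -u "$BASE" "$CAND" | grep -c '^@@') || HUNKS=0
echo "changed hunks: $HUNKS"   # prints: changed hunks: 1
[ "$HUNKS" -le 1 ] || { echo "STOP: more than one variable changed" >&2; exit 1; }
rm -f "$BASE" "$CAND"
```

Counting hunks rather than lines is a judgment call: a single renamed variable touches many lines but is still one change.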

---

## Pipeline: Gate → Full → Submit

```
Hypothesis
Single GPU gate (2000 steps)
↓ passes?
8×H100 full run (600s, seed=444)
↓ beats baseline?
8×H100 confirmation (seed=300)
↓ both seeds confirm?
Submission branch → PR
```

**Never skip the gate.** A 2000-step single GPU run costs ~$0.50. A full 8×H100 run costs ~$3-4. Skipping the gate to save 10 minutes has cost us runs.

**Never submit on one seed.** Seed variance is real. Two seeds confirming = it's real.

---

## Cost Discipline

- 8×H100 SXM: ~$1.67/hr per GPU = **$13.36/hr for 8×**
- Full 10-min run (with pod overhead): **~$3-4**
- Per-race budget: **~$15**
- Do not suggest a run without a validated gate result or a clear hypothesis

**Reproducing a score we already own = no.** Never re-run a baseline we control unless the architecture changed.

---

## Checkpoints

After every full run, `final_model.pt` gets copied to a unique name immediately:

```bash
cp final_model.pt checkpoints/EXP_s${SEED}_$(date +%Y%m%d_%H%M%S)_bpb${BPB}.pt
```

The pod gets destroyed. If the checkpoint isn't saved before that, it's gone.

---

## Script Standards

- Every experiment lives in `experiments/<Name>/`
- Every experiment has: `run.sh`, `gate.sh` or `gate_1gpu.sh`, `RESULTS.md`
- `run.sh` uses `train_gpt.py` from the same directory (symlink or copy)
- Scripts are committed and pushed before the pod fires
- Never paste raw commands. Always a `.sh` file.
- Log files go to `experiments/<Name>/results/` or `logs/`

---

## Naming

- Confirm experiment names before creating directories
- Active series: `Bandit_Wagon_V`, `Bandit_Wagon_V_Cannon`, etc.
- Superseded experiments → `experiments/archive/`
- Never reuse a name from a previous run

---

## SOTA Garage

Three active models:

| Track | Model | BPB | Size |
|-------|-------|-----|------|
| Neural | Rascal II | 1.10987 | 15.44MB |
| Crawler | BW5 seed=444 | 1.18672 | 8.61MB |
| Compression | FX_WING_DELTA | 0.2233 | — (model lost) |

**Submission branch protocol:**
1. Never submit from TEST_LAB
2. Create dedicated branch → push to Open-parameter-golf-1 fork → PR to openai/parameter-golf
3. Every PR needs: `submission.json`, logs, README with reproduce instructions

---

## Experimental Design

- Proxy deltas (500 steps, 1 GPU) inflate **5–15×** vs full run. Never promote from proxy alone.
- Gate (2000 steps, 1 GPU) is the minimum signal to trust.
- SWA kicks in at step ~7650. Results before that step are pre-SWA.
- Wallclock budget is 600s. Extra parameters cost convergence speed — account for this.
- `COMPILE_FULLGRAPH=1` is now baseline for all BW5+ experiments.
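To make the proxy-inflation bullet concrete, a back-of-envelope deflation with assumed numbers (a hypothetical proxy delta, deflated by the mid-range 10× of the observed 5–15× factor):

```shell
# Assumed numbers: a -0.0050 BPB proxy improvement seen at 500 steps / 1 GPU,
# deflated by a 10x inflation factor to estimate the full-run effect.
awk 'BEGIN { proxy = -0.0050; inflation = 10;
             printf "expected full-run delta: %+.5f BPB\n", proxy / inflation }'
# prints: expected full-run delta: -0.00050 BPB
```

A delta that small is within seed noise, which is why proxy results alone never promote.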

---

## Seeds

- Primary: **444**
- Confirmation: **300**
- Never use 1337 for new experiments.

---

## Submission Checklist

- [ ] Two seeds confirmed, both beat baseline
- [ ] `submission.json` present
- [ ] Logs committed
- [ ] README with reproduce instructions
- [ ] File size ≤ 16MB
- [ ] Score-first always (no training on val before scoring)
- [ ] Branch is NOT TEST_LAB