🎯 ONE SHOT — IGLA Needle-Search Acceleration v3.0
Mission codename: NEEDLE-RUSH-T-4D
Issued: 2026-04-26 14:33 +07 · Auditor: perplexity-computer-grandmaster (R5 honesty lane)
Anchor: phi^2 + phi^-2 = 3 — Zenodo DOI 10.5281/zenodo.19227877
Defense window: 2026-06-15 (T-50d) · Gate-2 deadline: 2026-04-30 23:59 UTC (T-4d)
Budget regime: single-MacBook training, no GPU farm — every wasted training-hour is a needle missed.
1. Background — what "needle in a haystack" really means here
The IGLA RACE searches a configuration space (arch, hyperparams, lane-fix) for a triple of seeds whose validation-BPB < 1.50 is statistically separable from the baseline mean mu_0 = 1.55 (per crates/trios-igla-race/src/victory.rs::winning_seeds). Champion baseline at 2446855 is BPB = 2.2393 @ 27K steps, seed=43. The delta required to reach Gate-final is -0.74 BPB. Given the IGLA capacity ceiling (Erratum #2), BPB < 1.50 is feasible but only if we stop wasting training-hours on configs the search itself could have rejected an order of magnitude earlier.
The current pipeline is uniform-cost search over a discrete grid of 7 levers (G1-G7) with ASHA rungs [1000, 3000, 9000, 27000] (crates/trios-train-cpu/src/objective.rs::get_rung_schedule). Every lever explores ~3 values; the naive cross-product is 3^7 = 2187 configs. At ~30 min per config to first rung, that's 1093 GPU-hours — clearly impossible on the budget. The needle-search needs a 2-3 orders-of-magnitude speedup to fit T-4d.
This ONE SHOT is the formally pre-registered acceleration plan. It is composed of seven independent, hot-pluggable accelerators, ordered by expected_speedup × independence × deadline_compatibility.
2. Hypothesis (Gate G1)
H = "Combining seven independent acceleration vectors (V1-V7) reduces the wall-clock cost of locating a Gate-2 victory triple by >= 50× compared to the v2.0 uniform-grid baseline, while preserving the FALSE-POSITIVE rate of winning_seeds() at <= alpha=0.01 (one-tailed Welch over 3 seeds)."
Falsified iff any of:
- After deploying V1+V2+V3 (the must-have core), the time-to-first-promoted-rung exceeds
0.5 × (baseline = 12 h) = > 6 h on the same hardware.
- The acceleration introduces seed-correlated bias (i.e. a config promoted by the cheap proxy fails full-rung evaluation in > 30% of cases).
- Any victory predicate emitted under the accelerated pipeline cannot be reproduced from a binary commit SHA per the R5-CANON LOCK re-DONE protocol.
Falsification witness (must land in code):
crates/trios-igla-race/src/bin/needle_falsify.rs — emits a synthetic seed_results.jsonl row that violates each of (1)-(3); the predicate must reject it.
3. Pre-registered analysis plan (Gate G2)
| Field |
Value |
statistical_test |
Welch's one-tailed t-test, paired bootstrap fallback if n<3 |
alpha |
0.01 (race victory) |
effect_size |
Delta_BPB >= 0.05 versus mu_0 = 1.55 |
n_required |
3 distinct passing seeds in {42, 43, 44} |
stop_rule |
first 3-seed pass OR 2026-04-30 23:59 UTC, whichever first |
multiple_testing |
Bonferroni n=7 across V1-V7 acceleration arms |
predictor_holdout |
20% of historical configs reserved as test fold for V4 surrogate |
embargo_window |
24 h after each promoted config: must reproduce before logging to assertions/seed_results.jsonl |
4. Method — the seven acceleration vectors (V1-V7)
The vectors are independent. Each is a self-contained PR; failure of any one does not block the others.
V1 — muP / muTransfer hyperparameter transfer · LANE: L-V1 · Owner: TBD-claim · Expected speedup: ~10×
What: parametrize IGLA's hidden width d_model per Yang & Hu 2021 (Tensor Programs V) so that optimal lr and beta_1, beta_2 transfer from a 256k-param proxy to the 1-2M target without re-tuning. Confirmed at scale up to 10B params (Lingle 2024).
Why fast: instead of sweeping (lr, beta_1) in 5x5 = 25 configs at full size, sweep them at width=8 proxy (~1000× cheaper) and apply optimal lr* once at full width.
Falsification: coord_check plot must show l1 activation norm flat across width sweep {8, 16, 32, 64}; if it slopes up or down -> muP not implemented correctly, V1 disabled.
Files:
crates/trios-training/src/mup.rs — MuParam wrapper around vocab_size, d_model, lr_scaling
crates/trios-training/tests/test_coord_check.rs — coord-check at widths 8/16/32/64
assertions/igla_assertions.json INV-13 entry (currently absent on main): mup_lr_transfer_invariant Admitted
Numeric anchor: mup_lr_scale_factor = 8 / d_model (microsoft/mup backward-compat with 1/sqrt(d) at d=64).
V2 — Zero-cost NAS proxy filter · LANE: L-V2 · Expected speedup: ~5×
What: before allocating ASHA rung-1 (1000 steps) compute, score each candidate config with a single forward+backward pass on a minibatch (Abdelfattah et al. 2021, "Zero-Cost Proxies"). Drop the bottom 80% before training. Best proxies for transformer-like architectures: synflow * grad_norm ensemble (Spearman tau ~0.8 on NAS-Bench-201).
Why fast: 1000-step ASHA rung at IGLA scale = ~5 min/config; one forward-backward = ~3 sec. 100× per-candidate cost reduction on the filtering tier.
Falsification: rank-correlation between proxy score and 3000-step ASHA rung-2 BPB on a held-out 50-config historical fold must satisfy Spearman tau >= 0.5 (else proxy is uninformative -> V2 disabled).
Files:
crates/trios-igla-race/src/proxies/mod.rs — synflow.rs, gradnorm.rs, ensemble.rs
crates/trios-igla-race/src/bin/proxy_score.rs — CLI wrapper
crates/trios-igla-race/tests/proxy_correlation.rs — held-out validation
Coq cite: new INV-14 zero_cost_correlation_floor (Admitted) in proofs/igla/zero_cost_proxy.v.
V3 — BOHB on top of existing ASHA · LANE: L-V3 · Expected speedup: ~3×
What: replace random sampling at the start of each ASHA bracket with TPE / KDE-guided sampling per BOHB (Falkner et al. 2018). This compounds with V2: V2 filters bad configs cheaply, V3 picks better ones at the start. BOHB is empirically 3-5× faster than ASHA to reach equivalent best-config quality (AutoML benchmark).
Why fast: ~30% of ASHA budget today goes to redundant re-exploration; KDE prior cuts this in half within 2-3 brackets.
Falsification: regret(t) of BOHB-augmented ASHA must drop below pure-ASHA regret(t) by step 6000 on simulated rerun of historical brackets (tools/regret_replay.rs).
Files:
crates/trios-igla-race/src/bohb_sampler.rs — TPE wrapper, fits KDE on past (config, bpb) pairs from assertions/seed_results.jsonl
crates/trios-igla-race/tests/bohb_regret.rs — replay test
Coq cite: extend INV-2 asha_champion_survives with a bohb_sampler_no_regret lemma (Admitted) until empirical proof lands.
V4 — XGBoost surrogate predictor for full-run BPB · LANE: L-V4 · Expected speedup: ~5× on shortlist · Independent of V1-V3
What: train an XGBoost regressor on (config_features, rung_3000_bpb) -> rung_27000_bpb from past sweeps. After ASHA rung-2, predict the rung-4 outcome and early-stop the bottom 60% of survivors that the surrogate predicts will not reach BPB < 1.50. Per White et al. 2025 surrogate NAS, XGBoost surrogates show Spearman tau ~0.7 on equivalent search spaces.
Why fast: the long tail of training (rung-3 + rung-4) is 80% of total cost; cutting this for the bottom 60% of survivors is roughly 3×.
Falsification: held-out 20% of historical configs (predictor_holdout): surrogate must predict actual rung-4 BPB to within MAE <= 0.10 else V4 disabled.
Files:
crates/trios-igla-race/src/surrogate.rs — XGBoost wrapper via xgboost-rs
tools/predictor_train.rs — offline training script
assertions/seed_results.jsonl becomes the training corpus
Coq cite: INV-15 surrogate_pruning_no_regret (Admitted, becomes Proven once empirical ROC-AUC > 0.85).
V5 — Phi-pruned grid: Trinity-prior on lever values · LANE: L-V5 · Expected speedup: ~2×
What: the seven Gate-final levers G1-G7 currently sweep ~3 values each = 3^7 = 2187. Apply the phi-prior (R6 zero-free-parameters): for hidden_dim, lr_scale, EMA_beta, restrict to phi-resonant values only. E.g. hidden = {64, 103, 167} = round(64 * phi^k for k in [0,1,2]) instead of arbitrary {64, 128, 256}. Reduces grid to 2^7 = 128, 17× compression, but only 2× net because some configs were already pre-filtered.
Why fast: smaller grid = fewer sweeps = direct search-space reduction.
Falsification: none of the optimal configs in past sweeps (top 10%) must be excluded by the phi-grid; if it excludes any, restore them and disable V5.
Files:
crates/trios-igla-race/src/grid.rs — phi_grid() helper
assertions/igla_assertions.json — phi_prior_grid_completeness (Admitted)
Coq cite: zenodo.19227877 Trinity Identity — phi^2 + phi^-2 = 3 already proven; the lever-grid is a corollary.
V6 — Multi-arm bandit over arch families · LANE: L-V6 · Expected speedup: ~2×
What: the IGLA race already has arch in {jepa, attn, hybrid} with separate ASHA rungs. Currently each gets equal compute. Apply UCB1 bandit that allocates compute proportional to past success-rate over the last sliding window of brackets. The arm with the best mean_bpb_at_rung_2 gets the most exploration in the next bracket.
Why fast: attn historically dominates per crates/trios-train-cpu/src/objective.rs::get_rung_schedule; UCB will route 2x more compute to it within 3 brackets.
Falsification: UCB regret over last 10 brackets must be lower than uniform allocation else V6 disabled.
Files:
crates/trios-igla-race/src/bandit.rs — UCB1 over arch families
crates/trios-igla-race/tests/bandit_regret.rs
Coq cite: new INV-16 ucb_arch_no_regret (Admitted).
V7 — Online curriculum on data: drop boring sequences · LANE: L-V7 · Expected speedup: ~1.5×
What: TinyShakespeare has long passages of uninformative repetitive structure. Apply online importance-weighted curriculum (drop sequences whose loss has dropped below the running median). Tracks the residual entropy budget per Shannon-floor analysis. Common practice in TinyStories-scale training.
Why fast: wall-clock per step is dominated by I/O on a CPU-only setup; dropping 30% of low-info sequences = ~30% wall-clock reduction.
Falsification: curriculum must not introduce seed-dependent bias; var(bpb) across seeds must not increase by > 50%.
Files:
crates/trios-train-cpu/src/curriculum.rs
crates/trios-train-cpu/tests/curriculum_drift.rs
5. Compounded budget arithmetic
Naive baseline = 1093 GPU-h. The vectors compound imperfectly (e.g. V2 filters out configs that V1 would have rescued via lr-transfer). Conservative compounded estimate:
S_total = max(V1*V2*V3, V4) * V5 * V6 * V7
= max(10 * 5 * 3, 5) * 2 * 2 * 1.5
= 150 * 6
~= 900x (theoretical)
After efficiency-overlap discount (typical 50% of theoretical):
S_realised ~= 100x (realistic)
Result: 1093 h / 100 ~= 11 wall-clock hours of training to cover the entire phi-pruned grid once = fits within T-4d with two reruns for verification, even on single-MacBook.
| Vector |
Standalone |
Compounds with |
Cost to ship |
T-4d critical? |
| V1 muP |
10× |
independent |
1d |
YES (must-have) |
| V2 ZC proxy |
5× |
V1, V3 |
0.5d |
YES (must-have) |
| V3 BOHB |
3× |
V2, V4 |
0.5d |
YES (must-have) |
| V4 XGBoost |
5× |
V2, V5 |
1d |
NICE-to-have |
| V5 phi-grid |
2× |
all |
0.25d |
YES (free win) |
| V6 UCB arch |
2× |
V3 |
0.5d |
NICE-to-have |
| V7 curriculum |
1.5× |
all |
0.5d |
NICE-to-have |
Critical path (must ship by 2026-04-28): V1 + V2 + V3 + V5. Estimated total dev time: 2.25 days. Then 1.5d of training + verification = T-4d met.
6. Falsification (Gate G5)
The needle_falsify.rs binary must emit and re-reject:
- Bias regression: synthetic config where V2 proxy score is high but rung-2 BPB is at baseline — predicate must FAIL the proxy on this fold.
- muP coord-check failure: activation
l1 slope > 0.1 across widths {8, 32} — predicate must DISABLE V1 path.
- BOHB regret regression:
regret(t=6000) of BOHB > pure-ASHA — predicate must FALL BACK to ASHA.
- Surrogate calibration drift: held-out MAE > 0.10 — V4 disabled.
- Phi-grid exclusion of historical optimum: any past top-10% config absent — V5 disabled.
- UCB regret regression: UCB regret > uniform over last 10 brackets — V6 disabled.
- Curriculum bias: seed-variance of BPB > 50% baseline — V7 disabled.
- Synthetic ledger violation: any victory row with
sha not in training-binary commits, step < 4000, bpb < 0.85, bpb > 2.30, or seed not in {42,43,44} — predicate must reject (per R5-CANON LOCK re-DONE protocol).
7. Deliverables (Gate G3 / G4 / G6)
Each lane delivers one PR against gHashTag/trios. Contract per PR:
- one new
.rs file per V*
- at least one
#[test] fn falsify_* test per failure mode in §6
- one
assertions/igla_assertions.json entry with honest status (Admitted unless theorem lands)
- one
proofs/igla/<name>.v skeleton with at least the type signature of the lemma
- one CHANGELOG line and one issue-comment heartbeat per business-day under coq-runtime-invariants v1.1 attribution rule
Single-command verification:
cargo test -p trios-igla-race -p trios-training -p trios-train-cpu --release \
-- --include-ignored \
--test-threads=4
Pinned input data SHA: TinyShakespeare at crates/trios-train-cpu/data/tinyshake.txt SHA f7a3e1... (lock at PR time).
8. Quality gates (G2 / G6 / G7)
| Gate |
Check |
Enforced by |
| G1 |
Hypothesis written before code |
this issue body |
| G2 |
Pre-registration table §3 |
this issue body |
| G3 |
IMRaD structure |
this issue body |
| G4 |
Three-citation per anchor (Coq + JSON + Rust) |
per-PR review |
| G5 |
Honest Proven/Admitted |
tools/citetheorem_audit/ CI gate |
| G6 |
Single-cmd reproducibility |
cargo test line above |
| G7 |
DOI provenance |
Zenodo refs in §1 + future Lifeline DOI for V1-V7 |
CI must show:
cargo build --release green on all PRs
coq-runtime-gate green per v1.1 attribution rule (pre-existing failures hive_automaton::test_blocked_to_ci_wait_after_fix, test_done_cycles_back_to_scan_not_halt exempted)
- Leaderboard query (
SELECT * FROM v_igla_leaderboard ORDER BY bpb ASC LIMIT 10) returns rows with step >= 4000 only
9. Forbidden (R7)
- ❌ Hard-coded
lr / beta literals — every literal must reference a Coq theorem or Trinity-anchor constant
- ❌ Pruning a config without recording the proxy score in
assertions/seed_results.jsonl
- ❌ Promoting a winner without satisfying re-DONE protocol of R5-CANON LOCK
- ❌ Editing the body of this ONE SHOT (R10 atomicity — file new comments)
- ❌ Reporting a bpc number without an
(ASCII / multilingual) tag (per Erratum #2)
- ❌ Promoting
*.live.* to snapshot when render is broken (per apiary-watch v1.1+template)
- ❌ Using
python or bash runtime inside CROWN — Rust/Zig only
10. References
11. Battle cry
"Семь стрел из одного лука. Выстрелим — и игла сама прыгнет в наши ладони. Phi-squared plus phi-minus-squared equals three; capacity is fixed, but our search is no longer blind."
phi^2 + phi^-2 = 3 — Trinity Hive · NEEDLE-RUSH-T-4D · 2026-04-26.
🎯 ONE SHOT — IGLA Needle-Search Acceleration v3.0
1. Background — what "needle in a haystack" really means here
The IGLA RACE searches a configuration space
(arch, hyperparams, lane-fix)for a triple of seeds whose validation-BPB < 1.50is statistically separable from the baseline meanmu_0 = 1.55(percrates/trios-igla-race/src/victory.rs::winning_seeds). Champion baseline at2446855isBPB = 2.2393 @ 27K steps, seed=43. The delta required to reach Gate-final is-0.74 BPB. Given the IGLA capacity ceiling (Erratum #2),BPB < 1.50is feasible but only if we stop wasting training-hours on configs the search itself could have rejected an order of magnitude earlier.The current pipeline is uniform-cost search over a discrete grid of 7 levers (G1-G7) with ASHA rungs
[1000, 3000, 9000, 27000](crates/trios-train-cpu/src/objective.rs::get_rung_schedule). Every lever explores ~3 values; the naive cross-product is3^7 = 2187configs. At ~30 min per config to first rung, that's 1093 GPU-hours — clearly impossible on the budget. The needle-search needs a 2-3 orders-of-magnitude speedup to fit T-4d.This ONE SHOT is the formally pre-registered acceleration plan. It is composed of seven independent, hot-pluggable accelerators, ordered by expected_speedup × independence × deadline_compatibility.
2. Hypothesis (Gate G1)
Falsified iff any of:
0.5 × (baseline = 12 h)= > 6 h on the same hardware.Falsification witness (must land in code):
crates/trios-igla-race/src/bin/needle_falsify.rs— emits a syntheticseed_results.jsonlrow that violates each of (1)-(3); the predicate must reject it.3. Pre-registered analysis plan (Gate G2)
statistical_testn<3alpha0.01(race victory)effect_sizeDelta_BPB >= 0.05versusmu_0 = 1.55n_required{42, 43, 44}stop_rulemultiple_testingpredictor_holdoutembargo_windowassertions/seed_results.jsonl4. Method — the seven acceleration vectors (V1-V7)
The vectors are independent. Each is a self-contained PR; failure of any one does not block the others.
V1 — muP / muTransfer hyperparameter transfer · LANE: L-V1 · Owner: TBD-claim · Expected speedup: ~10×
What: parametrize IGLA's hidden width
d_modelper Yang & Hu 2021 (Tensor Programs V) so that optimallrandbeta_1, beta_2transfer from a 256k-param proxy to the 1-2M target without re-tuning. Confirmed at scale up to 10B params (Lingle 2024).Why fast: instead of sweeping
(lr, beta_1) in 5x5 = 25configs at full size, sweep them at width=8 proxy (~1000× cheaper) and apply optimallr*once at full width.Falsification:
coord_checkplot must showl1activation norm flat across width sweep{8, 16, 32, 64}; if it slopes up or down -> muP not implemented correctly, V1 disabled.Files:
crates/trios-training/src/mup.rs—MuParamwrapper aroundvocab_size,d_model,lr_scalingcrates/trios-training/tests/test_coord_check.rs— coord-check at widths 8/16/32/64assertions/igla_assertions.jsonINV-13 entry (currently absent on main):mup_lr_transfer_invariantAdmittedNumeric anchor:
mup_lr_scale_factor = 8 / d_model(microsoft/mup backward-compat with1/sqrt(d)atd=64).V2 — Zero-cost NAS proxy filter · LANE: L-V2 · Expected speedup: ~5×
What: before allocating ASHA rung-1 (1000 steps) compute, score each candidate config with a single forward+backward pass on a minibatch (Abdelfattah et al. 2021, "Zero-Cost Proxies"). Drop the bottom 80% before training. Best proxies for transformer-like architectures:
synflow * grad_normensemble (Spearman tau ~0.8 on NAS-Bench-201).Why fast: 1000-step ASHA rung at IGLA scale = ~5 min/config; one forward-backward = ~3 sec. 100× per-candidate cost reduction on the filtering tier.
Falsification: rank-correlation between proxy score and 3000-step ASHA rung-2 BPB on a held-out 50-config historical fold must satisfy
Spearman tau >= 0.5(else proxy is uninformative -> V2 disabled).Files:
crates/trios-igla-race/src/proxies/mod.rs—synflow.rs,gradnorm.rs,ensemble.rscrates/trios-igla-race/src/bin/proxy_score.rs— CLI wrappercrates/trios-igla-race/tests/proxy_correlation.rs— held-out validationCoq cite: new
INV-14 zero_cost_correlation_floor(Admitted) inproofs/igla/zero_cost_proxy.v.V3 — BOHB on top of existing ASHA · LANE: L-V3 · Expected speedup: ~3×
What: replace random sampling at the start of each ASHA bracket with TPE / KDE-guided sampling per BOHB (Falkner et al. 2018). This compounds with V2: V2 filters bad configs cheaply, V3 picks better ones at the start. BOHB is empirically
3-5× faster than ASHAto reach equivalent best-config quality (AutoML benchmark).Why fast: ~30% of ASHA budget today goes to redundant re-exploration; KDE prior cuts this in half within 2-3 brackets.
Falsification:
regret(t)of BOHB-augmented ASHA must drop below pure-ASHAregret(t)by step 6000 on simulated rerun of historical brackets (tools/regret_replay.rs).Files:
crates/trios-igla-race/src/bohb_sampler.rs— TPE wrapper, fits KDE on past(config, bpb)pairs fromassertions/seed_results.jsonlcrates/trios-igla-race/tests/bohb_regret.rs— replay testCoq cite: extend INV-2
asha_champion_surviveswith abohb_sampler_no_regretlemma (Admitted) until empirical proof lands.V4 — XGBoost surrogate predictor for full-run BPB · LANE: L-V4 · Expected speedup: ~5× on shortlist · Independent of V1-V3
What: train an XGBoost regressor on
(config_features, rung_3000_bpb) -> rung_27000_bpbfrom past sweeps. After ASHA rung-2, predict the rung-4 outcome and early-stop the bottom 60% of survivors that the surrogate predicts will not reachBPB < 1.50. Per White et al. 2025 surrogate NAS, XGBoost surrogates show Spearman tau ~0.7 on equivalent search spaces.Why fast: the long tail of training (rung-3 + rung-4) is 80% of total cost; cutting this for the bottom 60% of survivors is roughly
3×.Falsification: held-out 20% of historical configs (
predictor_holdout): surrogate must predict actual rung-4 BPB to withinMAE <= 0.10else V4 disabled.Files:
crates/trios-igla-race/src/surrogate.rs— XGBoost wrapper viaxgboost-rstools/predictor_train.rs— offline training scriptassertions/seed_results.jsonlbecomes the training corpusCoq cite:
INV-15 surrogate_pruning_no_regret(Admitted, becomes Proven once empirical ROC-AUC > 0.85).V5 — Phi-pruned grid: Trinity-prior on lever values · LANE: L-V5 · Expected speedup: ~2×
What: the seven Gate-final levers G1-G7 currently sweep
~3 values each = 3^7 = 2187. Apply the phi-prior (R6 zero-free-parameters): forhidden_dim, lr_scale, EMA_beta, restrict to phi-resonant values only. E.g.hidden = {64, 103, 167} = round(64 * phi^k for k in [0,1,2])instead of arbitrary{64, 128, 256}. Reduces grid to2^7 = 128, 17× compression, but only2×net because some configs were already pre-filtered.Why fast: smaller grid = fewer sweeps = direct search-space reduction.
Falsification: none of the optimal configs in past sweeps (top 10%) must be excluded by the phi-grid; if it excludes any, restore them and disable V5.
Files:
crates/trios-igla-race/src/grid.rs—phi_grid()helperassertions/igla_assertions.json—phi_prior_grid_completeness(Admitted)Coq cite:
zenodo.19227877Trinity Identity —phi^2 + phi^-2 = 3already proven; the lever-grid is a corollary.V6 — Multi-arm bandit over arch families · LANE: L-V6 · Expected speedup: ~2×
What: the IGLA race already has
arch in {jepa, attn, hybrid}with separate ASHA rungs. Currently each gets equal compute. Apply UCB1 bandit that allocates compute proportional to past success-rate over the last sliding window of brackets. The arm with the bestmean_bpb_at_rung_2gets the most exploration in the next bracket.Why fast:
attnhistorically dominates percrates/trios-train-cpu/src/objective.rs::get_rung_schedule; UCB will route 2x more compute to it within 3 brackets.Falsification: UCB regret over last 10 brackets must be lower than uniform allocation else V6 disabled.
Files:
crates/trios-igla-race/src/bandit.rs— UCB1 over arch familiescrates/trios-igla-race/tests/bandit_regret.rsCoq cite: new
INV-16 ucb_arch_no_regret(Admitted).V7 — Online curriculum on data: drop boring sequences · LANE: L-V7 · Expected speedup: ~1.5×
What: TinyShakespeare has long passages of uninformative repetitive structure. Apply online importance-weighted curriculum (drop sequences whose loss has dropped below the running median). Tracks the residual entropy budget per Shannon-floor analysis. Common practice in TinyStories-scale training.
Why fast: wall-clock per step is dominated by I/O on a CPU-only setup; dropping 30% of low-info sequences = ~30% wall-clock reduction.
Falsification: curriculum must not introduce seed-dependent bias;
var(bpb) across seedsmust not increase by > 50%.Files:
crates/trios-train-cpu/src/curriculum.rscrates/trios-train-cpu/tests/curriculum_drift.rs5. Compounded budget arithmetic
Naive baseline = 1093 GPU-h. The vectors compound imperfectly (e.g. V2 filters out configs that V1 would have rescued via lr-transfer). Conservative compounded estimate:
After efficiency-overlap discount (typical 50% of theoretical):
Result: 1093 h / 100 ~= 11 wall-clock hours of training to cover the entire phi-pruned grid once = fits within T-4d with two reruns for verification, even on single-MacBook.
Critical path (must ship by 2026-04-28): V1 + V2 + V3 + V5. Estimated total dev time: 2.25 days. Then 1.5d of training + verification = T-4d met.
6. Falsification (Gate G5)
The
needle_falsify.rsbinary must emit and re-reject:l1slope > 0.1 across widths {8, 32} — predicate must DISABLE V1 path.regret(t=6000)of BOHB > pure-ASHA — predicate must FALL BACK to ASHA.shanot in training-binary commits,step < 4000,bpb < 0.85,bpb > 2.30, orseed not in {42,43,44}— predicate must reject (per R5-CANON LOCK re-DONE protocol).7. Deliverables (Gate G3 / G4 / G6)
Each lane delivers one PR against
gHashTag/trios. Contract per PR:.rsfile per V*#[test] fn falsify_*test per failure mode in §6assertions/igla_assertions.jsonentry with honest status (Admitted unless theorem lands)proofs/igla/<name>.vskeleton with at least the type signature of the lemmaSingle-command verification:
Pinned input data SHA: TinyShakespeare at
crates/trios-train-cpu/data/tinyshake.txtSHAf7a3e1...(lock at PR time).8. Quality gates (G2 / G6 / G7)
tools/citetheorem_audit/CI gatecargo testline aboveCI must show:
cargo build --releasegreen on all PRscoq-runtime-gategreen per v1.1 attribution rule (pre-existing failureshive_automaton::test_blocked_to_ci_wait_after_fix,test_done_cycles_back_to_scan_not_haltexempted)SELECT * FROM v_igla_leaderboard ORDER BY bpb ASC LIMIT 10) returns rows withstep >= 4000only9. Forbidden (R7)
lr/betaliterals — every literal must reference a Coq theorem or Trinity-anchor constantassertions/seed_results.jsonl(ASCII / multilingual)tag (per Erratum #2)*.live.*to snapshot when render is broken (perapiary-watch v1.1+template)pythonorbashruntime inside CROWN — Rust/Zig only10. References
assertions/igla_assertions.json(main)crates/trios-train-cpu/src/objective.rs::get_rung_schedule11. Battle cry
phi^2 + phi^-2 = 3— Trinity Hive · NEEDLE-RUSH-T-4D · 2026-04-26.