Flower Brain v3: SmearGate + LoRA-TTT + GPTQ — val_bpb 1.0680 (unlimited compute, 2hr 1xH100) #1896
Open · G3sparky wants to merge 4 commits into openai:main
Conversation
…limited compute)

Novel ternary architecture: 6 specialized cells in Flower of Life topology. 32.5M params, 512-dim, 12 layers, depth recurrence (17 virtual layers). Pre-quant 1.1155 on 1xH100 SXM (~60 min). Post-quant 1.80 (ternary gap 0.68). Compresses to 10.4MB (35% under 16MB budget).

Experimental findings:
- Void fraction (16-17%) is architecture-determined, not training-dependent
- STE makes the quantization gap WORSE (0.68 vs 0.29 without STE)
- Weight decay regularizes for quantization (WD=0 → catastrophic 2.67 gap)
- Gap B is a projection problem, not a training problem

Category: Unlimited compute (NOT eligible for 10-min 8xH100 record track)
Peer reviewed: Flynn (numbers), Tron (audit + legality), Lauren (sign-off)
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
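For readers unfamiliar with the STE finding above, here is a minimal sketch of a ternary quantizer with an optional straight-through estimator. The TWN-style threshold (0.7 × mean|w|) and scale are assumptions; the actual recipe in train_gpt_ternary.py may differ.

```python
import torch

def ternary_quantize(w: torch.Tensor, ste: bool = True) -> torch.Tensor:
    """Map weights to {-scale, 0, +scale} (TWN-style threshold; illustrative)."""
    delta = 0.7 * w.abs().mean()
    mask = w.abs() > delta
    scale = w[mask].abs().mean() if mask.any() else w.abs().mean()
    q = torch.sign(w) * mask.float() * scale
    if ste:
        # Straight-through estimator: forward sees q, backward treats the
        # quantizer as identity so gradients flow to the latent weights w.
        return w + (q - w).detach()
    return q
```

With ste=True the backward pass pretends quantization is the identity; the finding reported above is that training this way widened the post-quant gap (0.68 vs 0.29 without it).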
Contributor
Pull request overview
Adds a new entry under the non-record 16MB track documenting the “Flower Brain: 6-Cell Ternary Architecture” experiment, including the training/eval script, compression approach, and run artifacts.
Changes:
- Adds a full training + evaluation + mixed (ternary+GPTQ) serialization pipeline (train_gpt_ternary.py).
- Adds record metadata (submission.json), experiment writeup (README.md), and a captured run log (run1_h100.log).
- Adds exploratory/reference implementations for the “cell” architecture and learned ternary compression (cells.py, compression_cell.py).
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/train_gpt_ternary.py | Training/eval script with mixed ternary+GPTQ compression and artifact serialization. |
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/submission.json | Submission metadata for the record entry. |
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/run1_h100.log | Captured training + quantization run output supporting the reported numbers. |
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/compression_cell.py | Reference “compression cell” implementation and ternary packing utilities. |
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/cells.py | Reference 6-cell “Flower Brain PG” model scaffold. |
| records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/README.md | Experiment summary, results, and reproduction instructions. |
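The ternary packing utilities in compression_cell.py are not reproduced in this excerpt. As a point of reference, a standard scheme packs 5 trits per byte (3^5 = 243 ≤ 256), i.e. 1.6 bits per weight before the entropy-coding pass; the names below are illustrative, not the PR's API.

```python
import numpy as np

def pack_trits(t: np.ndarray) -> bytes:
    """Pack a flat array of trits in {-1, 0, +1} at 5 trits per byte."""
    t = (t.astype(np.int64) + 1).ravel()              # map {-1,0,1} -> {0,1,2}
    pad = (-len(t)) % 5
    t = np.concatenate([t, np.zeros(pad, np.int64)])
    groups = t.reshape(-1, 5)
    codes = (groups * np.array([1, 3, 9, 27, 81])).sum(axis=1)  # base-3 digits
    return bytes(codes.astype(np.uint8))

def unpack_trits(b: bytes, n: int) -> np.ndarray:
    """Inverse of pack_trits; n is the original trit count."""
    codes = np.frombuffer(b, np.uint8).astype(np.int64)
    digits = np.stack([(codes // p) % 3 for p in (1, 3, 9, 27, 81)], axis=1)
    return digits.ravel()[:n] - 1                     # back to {-1,0,+1}
```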
- LZMA → brotli in README + submission.json (matches actual compressor)
- Remove ETLB dead code and hyperparameters
- Add track field to submission.json
- Fix ~50 min → ~60 min timing
- Remove unused struct import

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
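A quick way to verify that the docs now match the compressor is to compare both codecs on the packed artifact. artifact.bin is a placeholder path, and brotli here is the standalone pip package, not part of the PR:

```python
import lzma
import brotli  # pip install brotli

with open("artifact.bin", "rb") as f:   # hypothetical packed-weights file
    buf = f.read()

# The docs fix: the pipeline uses brotli, not LZMA.
print("lzma:  ", len(lzma.compress(buf, preset=9)))
print("brotli:", len(brotli.compress(buf, quality=11)))
```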
Author
All 10 Copilot review comments addressed in commit 3846338. Items 8-10 are cosmetic/performance and do not affect correctness or reproducibility.
Unlimited compute: 2hr on 1xH100 SXM. Pre-quant 1.0805, GPTQ 1.0933, sliding 1.0766, LoRA-TTT 1.0680. Beats Ifrim (1.1239) by 0.056.

v3 improvements:
- SmearGate
- QK-Gain 5.25
- LoRA-TTT (rank-128, 3-phase)
- TTT_CHUNK=16384
- GPTQ int6/int8 (from ternary)
- LZMA bootstrap
- EMA checkpoint save
- torch.compile fix for LoRA-TTT

Peer reviewed: Flynn PASS, Tron PASS, Lauren PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
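For the LoRA-TTT piece of the list above, a minimal sketch of a rank-128 adapter over a frozen base linear layer. Initialization, scaling, and the 3-phase schedule from the commit message are not specified in this excerpt, so those details are assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank delta: y = Wx + (B @ A) x."""
    def __init__(self, base: nn.Linear, r: int = 128, alpha: float = 1.0):
        super().__init__()
        self.base = base.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Zero-initializing B makes the adapter a no-op at the start of eval, so test-time adaptation begins from exactly the trained model.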
8xH100 SXM, 10-min wallclock, GPTQ int6/int8, under 16MB. SmearGate + QK-Gain 5.25 + score-first TTT + depth recurrence. Seeds 42/314/999: 1.0876/1.0877/1.0877. Novel architecture answering the organizer wish list; our own design (G3sparky).

Peer reviewed: Flynn PASS, Tron PASS, Lauren PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
val_bpb = 1.0680 (unlimited compute, 2hr 1xH100 + LoRA-TTT). Artifact under 16MB.
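For reference, val_bpb is bits per byte in the usual leaderboard sense: summed validation cross-entropy converted from nats to bits and normalized by raw byte count. This assumes the standard definition; the eval harness is not shown in this excerpt.

```python
import math

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Summed eval cross-entropy (nats) converted to bits, per raw input byte."""
    return total_nll_nats / (math.log(2) * n_bytes)
```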
We also ran this stack on the 10-min main-board track and got 1.0877 (3-seed mean, std 0.0001). Eval came in at 658-684s per seed, overshooting the 600s eval cap by 58-84 seconds. We are submitting the main-board attempt for organizer review with the overshoot disclosed; if the cap is enforced, treat that entry as non-record. The unlimited-compute 1.0680 above stands either way.
Main board attempt (10-min train, 3-seed, eval overshoots 600s cap)
What's new here
Base
Flower Brain 6-cell architecture + SP8192 + depth recurrence + parallel residuals.
Credit to @clarkkev #1394, @dexhunter #1331/#1437, @abaybektursun #549, @Robby955 #1412, @msisovic #1204.
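Depth recurrence is what turns 12 physical layers into 17 virtual ones. A minimal sketch of one way to do it; which blocks recur and how often is an assumption (here the middle five run twice, so 4 + 2×5 + 3 = 17), since the schedule is not spelled out in this excerpt:

```python
import torch.nn as nn

def forward_with_recurrence(blocks: nn.ModuleList, x):
    """12 physical blocks -> 17 virtual layers by re-running a middle span."""
    for blk in blocks[:4]:
        x = blk(x)
    for _ in range(2):          # shared weights, applied twice
        for blk in blocks[4:9]:
            x = blk(x)
    for blk in blocks[9:]:
        x = blk(x)
    return x
```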
Compliance
C1 causal, C2 standard softmax over full vocab, C3 score-before-update, C4 single pass. All seeds produce artifacts under 16MB. On the main-board attempt, train ran under 600s but eval ran 658-684s, over the 600s cap, as disclosed above. The unlimited-compute path has no time cap.
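A sketch of what C3 (score-before-update) looks like in a LoRA-TTT eval loop: each chunk contributes to the score under the current adapter state before any gradient step uses it. The model interface (returning logits), optimizer, and learning rate are illustrative; only TTT_CHUNK = 16384 comes from the commit message.

```python
import torch
import torch.nn.functional as F

TTT_CHUNK = 16384  # tokens adapted per step, per the commit message

def eval_with_ttt(model, token_stream: torch.Tensor, lora_params):
    """Score each chunk with the current adapters BEFORE updating on it."""
    opt = torch.optim.SGD(lora_params, lr=1e-4)   # optimizer/lr are illustrative
    total_nll, total_tokens = 0.0, 0
    for chunk in token_stream.split(TTT_CHUNK):
        x, y = chunk[:-1], chunk[1:]
        with torch.no_grad():                      # score first ...
            logits = model(x.unsqueeze(0))
            total_nll += F.cross_entropy(
                logits.squeeze(0), y, reduction="sum").item()
            total_tokens += y.numel()
        logits = model(x.unsqueeze(0))             # ... then adapt on the same chunk
        F.cross_entropy(logits.squeeze(0), y, reduction="mean").backward()
        opt.step(); opt.zero_grad()
    return total_nll / total_tokens                # nats/token; convert to bpb downstream
```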
Experimental finding
STE makes the ternary quantization gap worse (0.68 vs 0.29 without STE). GPTQ int6 gap: 0.011. Documented in submission.json.
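To make the 0.011 figure concrete, below is a plain round-to-nearest int-N grid of the kind GPTQ quantizes onto. This is a simplified stand-in, not GPTQ itself; the real algorithm adds Hessian-weighted, column-by-column error compensation on top, which is what keeps the int6 gap this small.

```python
import torch

def rtn_int_n(w: torch.Tensor, bits: int = 6) -> torch.Tensor:
    """Symmetric per-row round-to-nearest int-N quantization (GPTQ stand-in)."""
    qmax = 2 ** (bits - 1) - 1          # 31 for int6, 127 for int8
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-12) / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale
```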