
Flower Brain v3: SmearGate + LoRA-TTT + GPTQ — val_bpb 1.0680 (unlimited compute, 2hr 1xH100) #1896

Open

G3sparky wants to merge 4 commits into openai:main from G3sparky:flower-brain-unlimited

Conversation


@G3sparky G3sparky commented Apr 28, 2026

val_bpb = 1.0680 (unlimited compute, 2hr 1xH100 + LoRA-TTT). Artifact under 16MB.

We also ran this stack at the 10-min main board track and got 1.0877 (3-seed mean, std 0.0001). Eval came in at 658-684s per seed, overshooting the 600s eval cap by 60-84 seconds. Submitting the main-board attempt for organizer review with the cap overshoot disclosed; if it's enforced, treat that entry as non-record. The unlimited compute 1.0680 above stands either way.

Main board attempt (10-min train, 3-seed, eval overshoots 600s cap)

Seed   TTT BPB   Artifact
42     1.0876    15,994,050 bytes
314    1.0877    15,993,082 bytes
999    1.0877    15,994,440 bytes
Mean   1.0877    (std 0.0001)
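(For reference, val_bpb is bits per byte on the validation text. Assuming the usual definition, a minimal sketch of the conversion from mean cross-entropy to bpb; the repo's eval code may normalize slightly differently:)

```python
import math

def bits_per_byte(mean_ce_nats: float, total_tokens: int, total_bytes: int) -> float:
    """Convert mean cross-entropy (nats/token) into bits per byte of raw text."""
    bits_per_token = mean_ce_nats / math.log(2)      # nats -> bits
    return bits_per_token * total_tokens / total_bytes
```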

What's new here

  • SmearGate: learned per-dimension blending with the previous token embedding (sketch after this list)
  • QK-Gain 5.25
  • Score-first TTT: 3-epoch SGD per chunk, C3-compliant
  • LoRA-TTT (unlimited compute path)
  • GPTQ int6/int8 + Brotli-11 compression
  • SDPA attention (PyTorch native)
  • Depth recurrence: layers 3-5 loop x2
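A minimal sketch of what the SmearGate bullet could mean in code, assuming a sigmoid-gated per-dimension mix (the module name comes from the bullet; the wiring and init are illustrative, not the repo's code):

```python
import torch
import torch.nn as nn

class SmearGate(nn.Module):
    """Blend each token embedding with its predecessor, one learned gate per dimension."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Parameter(torch.zeros(dim))   # per-dimension mixing logit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); shift right so position t sees token t-1 (stays causal)
        prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        g = torch.sigmoid(self.gate)                 # (dim,), broadcast over batch and seq
        return (1 - g) * x + g * prev
```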

Base

Flower Brain 6-cell architecture + SP8192 + depth recurrence + parallel residuals.
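Depth recurrence here means re-running a slice of the layer stack with shared weights, so effective depth grows with no extra parameters. A minimal sketch (slice indices follow the "layers 3-5 loop x2" bullet above; everything else is assumed):

```python
import torch.nn as nn

class RecurrentStack(nn.Module):
    """Run all layers once, then replay a chosen slice with the same weights."""
    def __init__(self, layers: nn.ModuleList, loop_start: int = 3, loop_end: int = 5, loops: int = 2):
        super().__init__()
        self.layers = layers
        self.loop_start, self.loop_end, self.loops = loop_start, loop_end, loops

    def forward(self, x):
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if i == self.loop_end:                   # after the slice, replay it
                for _ in range(self.loops - 1):
                    for shared in self.layers[self.loop_start:self.loop_end + 1]:
                        x = shared(x)
        return x
```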

Credit to @clarkkev #1394, @dexhunter #1331/#1437, @abaybektursun #549, @Robby955 #1412, @msisovic #1204.

Compliance

C1 causal, C2 standard softmax over full vocab, C3 score-before-update, C4 single pass. All seeds under 16MB artifact. Train under 600s on the main-board attempt. Eval ran 658-684s on the main-board attempt — over the 600s cap, disclosed above. Unlimited compute path has no time cap.
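A minimal sketch of the C3 score-before-update discipline (hypothetical interface; the actual loop lives in train_gpt_ternary.py): every chunk is scored with the current weights before any test-time gradient step touches it.

```python
import torch

def score_first_ttt(model, chunks, optimizer, epochs: int = 3) -> float:
    """C3-compliant TTT: record each chunk's loss *before* adapting on it."""
    total_nats, total_tokens = 0.0, 0
    for chunk in chunks:                             # C4: single pass over the stream
        with torch.no_grad():                        # 1) score with current weights
            total_nats += model(chunk).loss.item() * chunk.numel()
            total_tokens += chunk.numel()
        for _ in range(epochs):                      # 2) then adapt on the scored chunk
            optimizer.zero_grad()
            model(chunk).loss.backward()
            optimizer.step()
    return total_nats / total_tokens                 # mean nats/token; convert to bpb downstream
```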

Experimental finding

STE makes the ternary quantization gap worse (0.68 vs 0.29 without STE). GPTQ int6 gap: 0.011. Documented in submission.json.
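For context, a minimal sketch of the STE variant the finding refers to: ternary quantization where the forward pass snaps weights to {-1, 0, +1} times a scale and the backward pass treats the rounding as identity (the threshold rule is an assumption):

```python
import torch

def ternary_ste(w: torch.Tensor, threshold: float = 0.05) -> torch.Tensor:
    """Straight-through ternarization: quantized forward, identity backward."""
    scale = w.abs().mean()
    q = torch.where(w.abs() < threshold * scale,
                    torch.zeros_like(w),
                    torch.sign(w)) * scale
    return w + (q - w).detach()   # value equals q; gradient flows as if untouched
```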

…limited compute)

Novel ternary architecture: 6 specialized cells in Flower of Life topology.
32.5M params, 512-dim, 12 layers, depth recurrence (17 virtual layers).
Pre-quant 1.1155 on 1xH100 SXM (~60 min). Post-quant 1.80 (ternary gap 0.68).
Compresses to 10.4MB (35% under 16MB budget).

Experimental findings:
- Void fraction (16-17%) is architecture-determined, not training-dependent
- STE makes quantization gap WORSE (0.68 vs 0.29 without STE)
- Weight decay regularizes for quantization (WD=0 → catastrophic 2.67 gap)
- Gap B is a projection problem, not a training problem

Category: Unlimited compute (NOT eligible for 10-min 8xH100 record track)
Peer reviewed: Flynn (numbers), Tron (audit + legality), Lauren (sign-off)

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
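The commit above reports a 10.4MB compressed artifact. A minimal sketch of a packing scheme consistent with that description, assuming base-3 packing at five trits per byte followed by Brotli quality 11 (the real utilities are in compression_cell.py and may differ):

```python
import brotli
import numpy as np

def pack_ternary(trits: np.ndarray) -> bytes:
    """Pack values in {-1, 0, +1} as base-3 digits, 5 trits per byte (3**5 = 243 <= 256)."""
    digits = (trits.astype(np.int64) + 1).ravel()     # {-1, 0, 1} -> {0, 1, 2}
    digits = np.pad(digits, (0, (-len(digits)) % 5))  # pad to a multiple of 5
    groups = digits.reshape(-1, 5)
    packed = (groups * np.array([81, 27, 9, 3, 1])).sum(axis=1).astype(np.uint8)
    return brotli.compress(packed.tobytes(), quality=11)
```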
Copilot AI review requested due to automatic review settings April 28, 2026 16:37
Contributor

Copilot AI left a comment


Pull request overview

Adds a new entry under the non-record 16MB track documenting the “Flower Brain: 6-Cell Ternary Architecture” experiment, including the training/eval script, compression approach, and run artifacts.

Changes:

  • Adds a full training + evaluation + mixed (ternary+GPTQ) serialization pipeline (train_gpt_ternary.py).
  • Adds record metadata (submission.json), experiment writeup (README.md), and a captured run log (run1_h100.log).
  • Adds exploratory/reference implementations for the “cell” architecture and learned ternary compression (cells.py, compression_cell.py).

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 10 comments.

Reviewed files (all under records/track_non_record_16mb/2026-04-29_FlowerBrain_TernaryArchitecture/):

  • train_gpt_ternary.py: Training/eval script with mixed ternary+GPTQ compression and artifact serialization.
  • submission.json: Submission metadata for the record entry.
  • run1_h100.log: Captured training + quantization run output supporting the reported numbers.
  • compression_cell.py: Reference “compression cell” implementation and ternary packing utilities.
  • cells.py: Reference 6-cell “Flower Brain PG” model scaffold.
  • README.md: Experiment summary, results, and reproduction instructions.


- LZMA → brotli in README + submission.json (matches actual compressor)
- Remove ETLB dead code and hyperparameters
- Add track field to submission.json
- Fix ~50 min → ~60 min timing
- Remove unused struct import

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@G3sparky
Author

All 10 Copilot review comments addressed in commit 3846338:

  1. ✅ LZMA → brotli in README compression section
  2. ✅ LZMA → brotli in README key design choices
  3. ✅ LZMA → brotli in submission.json compression field
  4. ✅ Removed ETLB dead code + hyperparameters from train_gpt_ternary.py
  5. ✅ Added track: non_record_16mb to submission.json
  6. ✅ Removed unused import struct from compression_cell.py
  7. ✅ Fixed ~50 min → ~60 min in category_note and blurb
  8. Error message typos (TRAIN_SEQ_LEN, VAL_BATCH_SIZE) — noted, cosmetic only
  9. Causal mask allocation per forward pass — noted, performance optimization for future (see the sketch after this comment)
  10. Unused RoPE params in cells.py — noted, cells.py is reference implementation

Items 8-10 are cosmetic/performance and do not affect correctness or reproducibility.
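For item 9, the deferred fix would look something like this (a sketch, not the repo's code): build the causal mask once at init and register it as a buffer, instead of allocating a (T, T) tensor on every forward.

```python
import torch
import torch.nn as nn

class CachedCausalMask(nn.Module):
    """Precompute the causal mask so forward() reuses it instead of reallocating."""
    def __init__(self, max_seq_len: int):
        super().__init__()
        mask = torch.triu(torch.ones(max_seq_len, max_seq_len, dtype=torch.bool), diagonal=1)
        self.register_buffer("mask", mask, persistent=False)  # excluded from the saved artifact

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        T = scores.size(-1)                          # slice to the current sequence length
        return scores.masked_fill(self.mask[:T, :T], float("-inf"))
```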

G3sparky and others added 2 commits April 30, 2026 06:07
Unlimited compute: 2hr on 1xH100 SXM. Pre-quant 1.0805, GPTQ 1.0933,
sliding 1.0766, LoRA-TTT 1.0680. Beats Ifrim (1.1239) by 0.056.

v3 improvements: SmearGate, QK-Gain 5.25, LoRA-TTT (rank-128 3-phase),
TTT_CHUNK=16384, GPTQ int6/int8 (from ternary), LZMA bootstrap,
EMA checkpoint save, torch.compile fix for LoRA-TTT.

Peer reviewed: Flynn PASS, Tron PASS, Lauren PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
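A minimal sketch of the adapter side of LoRA-TTT as described in the commit above (rank 128, base weights frozen during test-time training; init, scaling, and wiring are assumptions, and the 3-phase schedule is omitted):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable low-rank delta: y = W x + scale * B A x."""
    def __init__(self, base: nn.Linear, rank: int = 128, alpha: float = 1.0):
        super().__init__()
        self.base = base.requires_grad_(False)       # only A and B train at test time
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # delta starts at zero
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```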
8xH100 SXM, 10-min wallclock, GPTQ int6/int8, under 16MB.
SmearGate + QK-Gain 5.25 + score-first TTT + depth recurrence.
Seeds 42/314/999: 1.0876/1.0877/1.0877.

Novel architecture answering the organizer wish list.
OUR design. G3sparky.

Peer reviewed: Flynn PASS, Tron PASS, Lauren PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@G3sparky G3sparky changed the title Record: Flower Brain 6-Cell Ternary Architecture — val_bpb 1.1155 (unlimited compute) Flower Brain v3: SmearGate + GPTQ — val_bpb 1.0877 (3-seed mean) + 1.0680 unlimited compute Apr 30, 2026
@G3sparky G3sparky changed the title Flower Brain v3: SmearGate + GPTQ — val_bpb 1.0877 (3-seed mean) + 1.0680 unlimited compute Flower Brain v3: SmearGate + LoRA-TTT + GPTQ — val_bpb 1.0680 (unlimited compute, 2hr 1xH100) Apr 30, 2026