[Submission] Random LinearMaps + LoRA Adapters#1295

Open
austinluk wants to merge 3 commits into openai:main from austinluk:submission/random-linear-maps-lora
Conversation

@austinluk commented Apr 3, 2026

Submission

@himanshudongre

Great to see someone else exploring this direction! I've been working on the same wishlist item and just submitted my findings in PR #1301.

TL;DR: Your "Potential Improvements" section nails it — selective freezing is the key.

I tested both full freeze + adapters (your approach) and selective freeze (freeze only MLP gate+up, learn attention fully) on FineWeb data. The results are dramatic:

| Approach | Frozen % | Best CE (FineWeb) | vs Baseline |
|---|---|---|---|
| Full freeze + VeRA rank=8 | 94% | 2.3388 | +80% gap |
| Full freeze + VeRA rank=16 | 94% | 2.3288 | +79% gap |
| Full freeze + VeRA rank=32 | 94% | 2.3221 | +79% gap |
| Selective freeze (gate+up only) | 37% | 1.2792 | -1.5% (BETTER than baseline) |

Increasing adapter rank from 8→32 barely helps — the bottleneck is frozen attention weights that can't learn relational patterns, not adapter capacity.

The fix: freeze only the MLP gate and up projections (feature expansion — where Johnson-Lindenstrauss applies naturally), learn everything else. This preserves the model's ability to learn attention patterns while getting artifact savings from frozen random projections.
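The selective-freeze split described above can be sketched as a parameter-name filter. This is a minimal illustration, not the code from either PR; the name patterns (`mlp.gate_proj`, `mlp.up_proj`) are hypothetical and would need to match your model's actual parameter naming.

```python
def selective_freeze_plan(param_names):
    """Return a {name: trainable} plan: freeze only the MLP gate/up
    projections (the feature-expansion maps where Johnson-Lindenstrauss
    applies); attention, down projection, embeddings, and norms stay
    trainable. Name patterns here are illustrative."""
    FROZEN_PATTERNS = ("mlp.gate_proj", "mlp.up_proj")
    return {
        name: not any(p in name for p in FROZEN_PATTERNS)  # True = trainable
        for name in param_names
    }

names = [
    "layers.0.attn.q_proj.weight",
    "layers.0.mlp.gate_proj.weight",
    "layers.0.mlp.up_proj.weight",
    "layers.0.mlp.down_proj.weight",
]
plan = selective_freeze_plan(names)
# In PyTorch this plan would drive: param.requires_grad_(plan[name])
```

In a real training script this plan maps straight onto `requires_grad`; only the trainable subset then needs to be serialized into the artifact.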

On the artifact-normalized comparison (the real competition question), a larger frozen model beats a smaller fully-trained model at the same artifact budget:

| Config | CE (FineWeb) | Artifact |
|---|---|---|
| 6L 192d fully-trained + dropout | 3.2531 | 2.4 MB |
| 12L 384d selective freeze + dropout | 2.8803 | 7.3 MB |

The frozen model has 4× more effective params at 3× the artifact cost — and it wins by 11.5%.

Full details + code in PR #1301. Would be interesting to see if your 12L 768d backbone with selective freeze (learn attention, freeze only MLP gate+up) closes the gap further.

@himanshudongre

Related work: I've been running extensive experiments on selective freeze (freezing gate+up projections only, 37% frozen) as an alternative to your full freeze + LoRA approach.

Key finding: selective freeze (37% frozen) dramatically outperforms full freeze + LoRA (94% frozen) — the LoRA approach has an ~80% quality gap while selective freeze shows -2.1% improvement over baseline on H100.

I also developed "progressive freeze" — train all weights fully for N steps, then freeze mid-training. This outperforms random-init freeze by 1.3 percentage points on FineWeb sp4096.
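The progressive-freeze schedule ("train everything for N steps, then freeze mid-training") reduces to a step-gated switch in the training loop. A minimal sketch, assuming a hypothetical `freeze_at_step` hyperparameter; the actual schedule used in PR #1301 may differ:

```python
def progressive_freeze_active(step, freeze_at_step=2000):
    """Progressive freeze: all weights train normally for the first
    `freeze_at_step` steps; from then on, the designated subset
    (e.g. MLP gate/up projections) is frozen for the rest of training.
    The threshold value here is a placeholder, not the PR's setting."""
    return step >= freeze_at_step

# Inside a hypothetical PyTorch training loop it would be applied once:
#
# already_frozen = False
# for step in range(total_steps):
#     if progressive_freeze_active(step) and not already_frozen:
#         for name, p in model.named_parameters():
#             if "mlp.gate_proj" in name or "mlp.up_proj" in name:
#                 p.requires_grad_(False)
#         already_frozen = True
#     ...  # forward / backward / optimizer step
```

The intuition for why this beats random-init freeze: the frozen maps are snapshots of briefly trained weights rather than pure random projections.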

Full results with 7 architecture variants across H100 and A40: PR #1301.

@MatoTeziTanka

Community Review — [Submission] Random LinearMaps + LoRA Adapters

Compliance: LOOKS CLEAN — pure-neural submission, no TTT/SLOT/n-gram-cache

**Summary.** PR #1295 ("Random Linear Maps + LoRA Adapters") submits a pure neural approach with no illegal enhancements. Head SHA: 77cec21.

**Checks**

- N-gram family bug (ILLEGAL): no n-gram, bigram, BigramHash, or XOR hash logic anywhere in the file. Clean.
- Pre-quant TTT, multi-epoch on val_tokens (ILLEGAL): `val_tokens` is loaded once at line 711 and used only for standard inference-mode evaluation in `eval_val()` (lines 207-246). `eval_val` is called twice: once per validation interval during training (line 848) and once post-quantization for a roundtrip check (line 944). Neither call trains on `val_tokens`; both run under `torch.inference_mode()` with `model.eval()`. No gradient updates touch `val_tokens` at any point. Clean.
- Score-first TTT (LEGAL): not present. No TTT pattern at all.
- Scored-region SLOT: not present.

**Architecture.** A standard 12-layer transformer (768 dim, 12 heads, 4 KV heads) where all linear layers use frozen, deterministically seeded random weights plus trainable LoRA rank-16 adapters. Only adapters, embeddings, norms, and scalars are trained and serialized; the frozen backbone is regenerated from the seed at load time (0 bytes in the artifact). Optimizer: Muon for the LoRA matrices, Adam for embeddings and scalars. Quantization: INT8 per-row quantization of the trainable-only state dict, then zlib compression. A roundtrip validation confirms the quantized model produces the same BPB before the final log.

**Conclusion.** This is a clean pure-neural submission: no illegal val_tokens training, no n-gram cheating, no scored-region slot holding. The "random backbone regenerated from seed" trick is legitimate architectural cleverness (seed stored in code, not...
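The core trick the audit describes (frozen random base weights regenerated from a seed, plus a trainable low-rank adapter) can be sketched in NumPy. This is an illustration of the technique, not the submission's code; the function names and init scales are mine, and the real implementation is PyTorch with Muon/Adam training.

```python
import numpy as np

def frozen_weight(seed, shape):
    """Regenerate a frozen base weight deterministically from a seed.
    Nothing here is serialized: the same seed reproduces the same
    matrix at load time, so the backbone costs 0 bytes in the artifact."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape).astype(np.float32) / np.sqrt(shape[1])

def lora_linear(x, seed, a, b, scale=1.0):
    """y = x @ (W_frozen + scale * A @ B): frozen random base plus a
    trainable rank-r update. Only `a` and `b` are trained and saved."""
    w = frozen_weight(seed, (a.shape[0], b.shape[1]))  # (d_in, d_out)
    return x @ (w + scale * (a @ b))

d_in, d_out, rank = 768, 768, 16
rng = np.random.default_rng(0)
a = rng.standard_normal((d_in, rank)).astype(np.float32) * 0.01
b = np.zeros((rank, d_out), dtype=np.float32)  # zero-init: adapter starts as a no-op
x = rng.standard_normal((4, d_in)).astype(np.float32)
y = lora_linear(x, seed=42, a=a, b=b)
```

With `b` zero-initialized, the layer initially equals the pure frozen random map, and the artifact only ever needs to store the `d_in*r + r*d_out` adapter values per layer rather than the full `d_in*d_out` matrix.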

Verdict: LOOKS CLEAN.

Recommendation to @cocohearts @valerio-oai @0hq @yuzhougu-oai @notapplica: MERGE pending the usual record-track checks (3-seed validation, under-16MB artifact cap, ≤600s train + ≤600s eval on 8×H100 SXM). No compliance flags from the audit — this looks like a clean pure-neural submission.


Reviewed by @MatoTeziTanka (The Agora). Compliance audit via an LLM agent (Sonnet) reviewing the full train_gpt.py source, cross-checked against a deterministic AST classifier. If this review misread your code, please call it out so I can re-audit manually.
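The INT8 per-row quantization + zlib step the audit mentions can be sketched as follows. A minimal NumPy/stdlib sketch under my own naming, not the submission's actual serialization code: each row gets its own scale so its max magnitude maps to 127, and the int8 codes plus fp32 scales are what gets compressed into the artifact.

```python
import zlib
import numpy as np

def quantize_per_row_int8(w):
    """INT8 per-row quantization: per-row scale maps each row's max
    absolute value to 127. Returns int8 codes and fp32 scales."""
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales = np.where(scales == 0, 1.0, scales).astype(np.float32)
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    """Reconstruct an fp32 approximation of the original matrix."""
    return q.astype(np.float32) * scales

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 64)).astype(np.float32)
q, s = quantize_per_row_int8(w)
blob = zlib.compress(q.tobytes() + s.tobytes())  # artifact payload
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()  # at most half a quantization step per row
```

The roundtrip BPB check the audit describes would then re-run evaluation with `w_hat` in place of `w` before the final log.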

@austinluk closed this Apr 19, 2026
@austinluk reopened this Apr 19, 2026