Non-record: SP8192 + LoRA on tied embedding (1.07994, 1 seed)#1759

Open
yijieyuan wants to merge 1 commit into openai:main from yijieyuan:submit/lora-embedding-nonrecord

Conversation

@yijieyuan

Single-seed (seed 42) extension of bigbag's 2026-04-09 SOTA stack (PR #1493), with two small additions at the GPTQ quantization stage, applied only to the tied token embedding:

  1. Rank-1 int8 LoRA residual on tok_emb — SVD of the post-rounding residual, rank-1 (A, B) stored as int8 with fp16 scales. Recovers ~14% of residual Frobenius energy at ~8 KB net cost.
  2. Hessian-weighted shrinkage in GPTQ rounding — columns with below-mean Hessian diagonal get an extended zero-zone (thresh 0.55, H-cutoff 0.5).
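The first addition can be sketched as follows. This is a minimal illustration, not the submission's code: function names, the symmetric per-factor int8 scaling, and the clipping range are assumptions; see the linked README for the actual implementation.

```python
import numpy as np

def rank1_int8_residual(w_fp, w_quant):
    """Rank-1 SVD of the post-rounding residual, factors stored as int8
    with one fp16 scale each (hypothetical sketch)."""
    resid = w_fp - w_quant                      # residual left after GPTQ rounding
    u, s, vt = np.linalg.svd(resid, full_matrices=False)
    a = u[:, :1] * s[:1]                        # (V, 1) left factor, absorbs the singular value
    b = vt[:1, :]                               # (1, d) right factor
    sa = np.float16(np.abs(a).max() / 127.0)    # fp16 scale for A
    sb = np.float16(np.abs(b).max() / 127.0)    # fp16 scale for B
    a_q = np.clip(np.round(a / sa), -127, 127).astype(np.int8)
    b_q = np.clip(np.round(b / sb), -127, 127).astype(np.int8)
    return a_q, sa, b_q, sb

def apply_residual(w_quant, a_q, sa, b_q, sb):
    """Add the dequantized rank-1 correction back onto the quantized embedding."""
    return w_quant + (a_q.astype(np.float32) * np.float32(sa)) @ \
                     (b_q.astype(np.float32) * np.float32(sb))
```

At SP8192 vocab, the stored factors are roughly V + d int8 entries plus two fp16 scales, which is consistent with the ~8 KB net cost quoted above.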
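The second addition, the extended zero-zone, might look like the sketch below. This is an assumption-laden illustration: it reads "H-cutoff 0.5" as "Hessian diagonal below 0.5× its mean", and the per-column rounding interface is invented; the submission's exact rule may differ.

```python
import numpy as np

def shrink_round(w_col, scale, h_diag, h_mean, thresh=0.55, h_cutoff=0.5):
    """Round one weight column to the integer grid, widening the zero-zone
    when the Hessian marks the column as unimportant (hypothetical sketch)."""
    q = np.round(w_col / scale)                 # ordinary nearest-grid rounding
    if h_diag < h_cutoff * h_mean:              # below-mean Hessian diagonal
        # extended zero-zone: magnitudes under `thresh` quantization steps
        # snap to zero instead of the nearest grid point
        q = np.where(np.abs(w_col / scale) < thresh, 0.0, q)
    return q
```

The idea is standard GPTQ rounding everywhere, except that columns the Hessian says contribute little to the loss get shrunk toward zero, trading a little per-column accuracy for a sparser, lower-noise embedding.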

Seed 42 val_bpb 1.07994 vs bigbag's seed 42 1.08079 (−0.00085 BPB). This is below the 0.005-nat record threshold, so it is submitted as a non-record.

See records/track_non_record_16mb/2026-04-21_LoRA_Embedding/README.md for full details (architecture, compliance, reproduction).

Single-seed extension of bigbag PR openai#1493 with two small additions
at the GPTQ quantization stage, applied only to the tied tok_emb:
rank-1 int8 LoRA residual + Hessian-weighted shrinkage in rounding.

seed 42 val_bpb 1.07994 vs bigbag seed 42 1.08079 (−0.00085 BPB).
Below 0.005-nat record threshold, submitted as non-record.