Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5200)#1762

Open
frido22 wants to merge 1 commit into openai:main from frido22:codex/macmini-m4-recurrent-tail-qk-import-safe
Conversation

@frido22 frido22 commented Apr 21, 2026

Summary

  • non-record Apple Silicon / MLX submission under records/track_non_record_16mb
  • supersedes my earlier Mac mini PR Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5672) #643
  • verified best local run: 2026_04_21_run_0041
  • final exact post-quant score: val_bpb = 1.51996743
  • final exact post-quant loss: val_loss = 3.42884704
  • hardware: Mac mini M4 16GB
  • packaged artifact size: 15,749,267 bytes
  • packaged train_gpt.py size: 77,032 bytes
  • int8+zlib model size: 15,672,235 bytes

Why this is a better resubmission

This submission materially improves my previous Mac mini non-record result, from val_bpb = 1.56720003 to 1.51996743.

It also addresses the concrete review issue on #643: the old records-folder train_gpt.py failed the CPU smoke import when mlx was unavailable.

Method

This submission stays in the compact SP1024 9x512 KV4 Apple Silicon / MLX family, with:

  • tied embeddings plus learned logit_bias and logit_gain
  • a rank-64 previous-token bigram adapter in the output path
  • two recurrent tail blocks with learned residual gates
  • tail float budget reallocated toward recurrent-tail attention geometry
  • quant-aware endgame with periodic roundtrip blending and EMA on sensitive tail tensors
  • int8 per-row quantization with transpose-aware handling for mlp.fc.weight
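
The per-row int8 step can be sketched as follows. This is a minimal numpy sketch under stated assumptions, not the submission's actual code: function names are hypothetical, and the "transpose-aware" handling of mlp.fc.weight is represented only by the comment (the idea being that the same routine is applied along the other axis for that tensor).

```python
import numpy as np

def quantize_per_row(w):
    """Symmetric int8 quantization with one float scale per row.

    Hypothetical sketch; for a tensor like mlp.fc.weight the
    transpose-aware variant would quantize w.T and transpose back,
    so rows align with the dimension that matters at matmul time.
    """
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize(q, scale):
    """Reconstruct an approximate float32 tensor from int8 + scales."""
    return q.astype(np.float32) * scale

# Illustrative roundtrip: per-row scaling keeps the max error small.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 16)).astype(np.float32)
q, s = quantize_per_row(w)
err = np.abs(dequantize(q, s) - w).max()
```

A roundtrip like this is presumably also what the "quant-aware endgame with periodic roundtrip blending" refers to: training occasionally sees the dequantized weights so the final int8 snapshot loses less.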

Packaging hardening

Relative to the exact verified source-run script, the records-folder train_gpt.py adds only packaging-oriented changes:

  • guarded optional mlx imports so top-level import succeeds when mlx is absent
  • repository-root-relative default DATA_PATH and TOKENIZER_PATH so the script can run directly from the records folder after standard data download
  • one file-name string fix from train_gpt_mlx.py to train_gpt.py

These changes add 91 counted code bytes versus the source-run package but do not change the training/eval logic used for the reported run.

Validation

I checked the packaged submission with:

  • fresh 10-minute rerun in the source repo using the import-safe script, producing the reported 1.51996743
  • Python 3.10 py_compile on the packaged records/.../train_gpt.py
  • Python 3.10 top-level import smoke with numpy and sentencepiece installed but mlx absent, verifying that import succeeds and the script records _OPTIONAL_IMPORT_ERROR instead of crashing
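
The two packaging checks above can be reproduced roughly as follows. This is a self-contained sketch against a stand-in script, not the real records-folder file; forcing `sys.modules["mlx"] = None` simulates mlx being absent even on a machine where it is installed.

```python
import importlib.util
import pathlib
import py_compile
import sys
import tempfile

# Stand-in for the packaged train_gpt.py, reduced to its import guard.
script = pathlib.Path(tempfile.mkdtemp()) / "train_gpt.py"
script.write_text(
    "_OPTIONAL_IMPORT_ERROR = None\n"
    "try:\n"
    "    import mlx.core as mx\n"
    "except ImportError as e:\n"
    "    mx = None\n"
    "    _OPTIONAL_IMPORT_ERROR = e\n"
)

# Check 1: byte-compile the script (raises on syntax errors).
py_compile.compile(str(script), doraise=True)

# Check 2: top-level import smoke with mlx forced absent.
sys.modules["mlx"] = None  # makes `import mlx...` raise ImportError
spec = importlib.util.spec_from_file_location("train_gpt_smoke", script)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)  # must not crash
```

After `exec_module`, `mod.mx` is `None` and `mod._OPTIONAL_IMPORT_ERROR` holds the recorded ImportError, matching the behavior described above.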

Included files

  • README.md
  • submission.json
  • requirements.txt
  • train.log
  • train_gpt.py

frido22 marked this pull request as ready for review on April 21, 2026 at 15:03.
frido22 changed the title from "Non-record: Mac mini M4 16GB recurrent tail QK import-safe (val_bpb=1.51997)" to "Non-record: Mac mini M4 16GB, no H100s, still golfing (val_bpb=1.5200)" on April 21, 2026.