Add SP8192 + ParResid + DR + LoRA TTT + Mixed int4/int6/int8 + AWQ submission #1919
Conversation
Adds an unverified submission targeting val_bpb=1.0587 (3-seed mean) on the 10-min 8xH100 track, mirrored into the non-record track. Both folders contain the README, submission.json, and single-file train_gpt.py entry point.

Status: NOT YET VERIFIED ON H100 — per-seed train logs and runtime compliance flags are pending the 8xH100 reproduction run.
Hi @dev-pratap-singh — thanks for sharing this submission. Quick technical note for the community thread, since I noticed there are no other comments yet: the AWQ activation-aware scale calibration appears to compute its rescaling factors on val data. This is the same class of issue as PR #1350 / PR #1351 (pre-quant calibration on val data, flagged in earlier reviews), which Issue #677 (illegal-submissions megathread) covers. A clean fix: calibrate the AWQ scales on a held-out slice of the train shards (e.g., the last N tokens) instead.

The rest of the mechanic looks quite clean — parallel residuals, depth recurrence, and LoRA score-first TTT all appear well-formed. Happy to discuss if I've misread the calibration path.
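For concreteness, here is a minimal sketch of what a train-only calibration path could look like. All names here are hypothetical illustrations, not functions from train_gpt.py; the point is only that the activation statistics feeding the AWQ scales are drawn from held-out train batches, never from val:

```python
import numpy as np

def awq_scales(calib_acts, alpha=0.5, eps=1e-8):
    # calib_acts: iterable of [batch, channels] activation arrays collected
    # on a HELD-OUT slice of the TRAIN shards -- never on val data.
    total, count = None, 0
    for x in calib_acts:
        m = np.abs(x).sum(axis=0)
        total = m if total is None else total + m
        count += x.shape[0]
    mean_mag = total / max(count, 1)
    # Activation-aware scaling: channels with large average activation
    # magnitude get larger scales, so their weights lose less precision
    # under low-bit (int4/int6/int8) quantization.
    s = np.power(mean_mag + eps, alpha)
    return s / s.mean()  # normalize so the average scale is 1

def apply_awq(weight, s):
    # Fold the scales into the weight columns (W' = W * s); at runtime the
    # incoming activations are divided by s (x' = x / s), leaving the
    # layer's output unchanged before quantization is applied.
    return weight * s[None, :]
```

Calibrating on train-only batches this way keeps val strictly unseen, which is what the illegal-submissions criteria in Issue #677 require.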