Support simple GPT2 and finetune away LN std#1

Merged
danbraunai merged 6 commits into dev from gf-feature/no-ln on Sep 3, 2025

Conversation

@danbraunai

Description

  • Adds gpt2_simple, which has explicit k_proj, v_proj, and q_proj weights rather than a single concatenated QKV weight. This makes SPD analysis simpler.
  • Allows for finetuning, optionally ablating the online std calculation of layernorms (using the technique described in https://arxiv.org/abs/2507.02559).
  • Allows from_pretrained and from_run_info to take wandb paths.
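The fused-vs-split projection change in the first bullet can be illustrated numerically: stock GPT-2 stores attention's query, key, and value projections as one weight mapping d_model to 3*d_model, and splitting that weight column-wise into three d_model-by-d_model matrices produces identical outputs. This is a minimal NumPy sketch of that equivalence, not the repo's actual code; all names here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 8

# Fused projection as in stock GPT-2: one (d_model, 3*d_model) weight.
W_qkv = rng.standard_normal((d_model, 3 * d_model))
x = rng.standard_normal((4, d_model))  # a batch of 4 token embeddings

# Fused path: one matmul, then split the output into q, k, v.
q1, k1, v1 = np.split(x @ W_qkv, 3, axis=-1)

# Split path: three explicit weights (the gpt2_simple approach),
# three separate matmuls.
W_q, W_k, W_v = np.split(W_qkv, 3, axis=-1)
q2, k2, v2 = x @ W_q, x @ W_k, x @ W_v

# The two parameterizations compute the same function.
assert np.allclose(q1, q2) and np.allclose(k1, k2) and np.allclose(v1, v2)
```

Because the split weights are separate parameters, each projection can be analysed (or decomposed) on its own rather than as slices of one fused matrix.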

How Has This Been Tested?

Tested that the custom layernorm implementation needed for gpt2_simple matches nn.LayerNorm.
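The idea behind both the custom layernorm and the std ablation can be sketched in a few lines. This is a hedged NumPy illustration, not the repo's implementation: the `fixed_std` flag is an invented name standing in for "ablating the online std calculation", and the repo's actual test compares against PyTorch's nn.LayerNorm rather than a hand-written reference.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5, fixed_std=None):
    """LayerNorm over the last axis. If fixed_std is given, the online
    std calculation is ablated and replaced by a constant (the
    finetune-away-LN-std idea; the flag name is hypothetical)."""
    mean = x.mean(axis=-1, keepdims=True)
    if fixed_std is None:
        std = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    else:
        std = fixed_std
    return (x - mean) / std * gamma + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
gamma, beta = np.ones(8), np.zeros(8)

# Standard behaviour: output is centered per-row, with ~unit variance.
y = layer_norm(x, gamma, beta)
assert np.allclose(y.mean(axis=-1), 0.0, atol=1e-6)

# Ablated behaviour: centering still happens, but the scale is a
# constant, so (up to gamma/beta) the layer becomes a linear map,
# which is much easier to analyse.
y_ablated = layer_norm(x, gamma, beta, fixed_std=1.0)
assert np.allclose(y_ablated, x - x.mean(axis=-1, keepdims=True))
```

With the std fixed, the normalization layer no longer depends nonlinearly on its input, which is what makes it attractive to finetune away before downstream analysis.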

Does this PR introduce a breaking change?

No

@danbraunai danbraunai changed the title Support simple GPT2 and finetune away LN Support simple GPT2 and finetune away LN std Sep 3, 2025
@danbraunai danbraunai merged commit cad9607 into dev Sep 3, 2025
1 check passed
@danbraunai danbraunai deleted the gf-feature/no-ln branch October 31, 2025 12:16
