This repository contains the code accompanying the paper "You can remove GPT2's LayerNorm by fine-tuning". See our HuggingFace repository for models with LayerNorm removed. The code is based on karpathy/nanoGPT, with only small changes to the model and training script.
Note (2025): This code contains two bugs that made LayerNorm (LN) removal less stable than necessary. Both are fixed in this follow-up project.