A decoder-only version of the model outlined in "Attention Is All You Need". The cross-attention sub-layer is omitted because the model generates unconditioned text, so there is no encoder output to attend to.
This is implemented in gpt.py.
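The core of a decoder-only block is masked (causal) self-attention. The sketch below is a hypothetical standalone NumPy version of a single head; gpt.py's actual implementation will differ in details (PyTorch modules, batching, multiple heads), but the maths is the same.

```python
import numpy as np

def causal_self_attention(x, W_q, W_k, W_v):
    """Single-head masked self-attention over a (T, C) sequence x.
    W_q, W_k, W_v are (C, H) projection matrices (illustrative names)."""
    T, C = x.shape
    q, k, v = x @ W_q, x @ W_k, x @ W_v            # (T, H) each
    scores = q @ k.T / np.sqrt(k.shape[-1])        # (T, T) scaled affinities
    mask = np.tril(np.ones((T, T), dtype=bool))    # causal mask: no peeking ahead
    scores = np.where(mask, scores, -np.inf)
    # row-wise softmax; masked positions get zero weight
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (T, H) weighted values
```

Because of the mask, the first token can only attend to itself, so its output is exactly its own value projection.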
The bigram file implements a simple bigram language model.
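For intuition, a bigram model just estimates P(next character | current character). Here is a count-based sketch of that idea (a neural version would instead learn the equivalent lookup table with gradient descent); the function name and details are illustrative, not taken from the bigram file.

```python
import numpy as np

def bigram_probs(text):
    """Build a character-level bigram table: probs[i, j] is the
    probability that character j follows character i in `text`."""
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}     # char -> row/column index
    counts = np.zeros((len(chars), len(chars)))
    for a, b in zip(text, text[1:]):               # count adjacent pairs
        counts[stoi[a], stoi[b]] += 1
    # normalise each row into a probability distribution
    probs = counts / counts.sum(axis=1, keepdims=True).clip(min=1)
    return chars, probs
```

Sampling from the row of the current character, one step at a time, generates text with the same pairwise statistics as the training data.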
The dataset_info.py file gives basic info on a dataset if you want to import a new one.
The self_attention.py file explores the mathematical trick behind self-attention.
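The trick in question, as presented in the video, is that averaging each token over all previous tokens can be done with a single matrix multiply by a row-normalised lower-triangular matrix; softmax over a masked score matrix is the weighted generalisation of this. A minimal sketch:

```python
import numpy as np

T = 4
tril = np.tril(np.ones((T, T)))                    # lower-triangular ones
weights = tril / tril.sum(axis=1, keepdims=True)   # row i averages tokens 0..i
x = np.arange(T, dtype=float).reshape(T, 1)        # toy sequence: 0, 1, 2, 3
out = weights @ x                                  # causal running means
```

Row `i` of `out` is the mean of tokens 0 through `i`, so here `out` is the running means 0, 0.5, 1.0, 1.5.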
- Collect and train on F1 radio
- Implement a tokeniser
- Expand to make more like nanoGPT
This follows Andrej Karpathy's video "Let's build GPT: from scratch, in code, spelled out."