In this repo, we'll work through an example of how to create and then train the original Transformer from the "Attention Is All You Need" paper.
⚙️The Colab notebook with the code can be found here (and will also be included in this repo).
🫂The video walkthrough can be found here.
We will:
- Build the major components of an encoder/decoder-style transformer network from scratch using PyTorch.
- Train our new network on a toy dataset to showcase how the training loop works and how we pass data through the network.
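As a preview of the first goal, the core building block of the transformer is scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`, from the paper. Below is a minimal PyTorch sketch; the function name and toy tensor shapes are illustrative and not taken from this repo's code.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, per the paper.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # (batch, seq, seq)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy self-attention: queries, keys, and values all come from the same tensor.
q = torch.randn(2, 5, 64)  # (batch, seq_len, d_k)
out, w = scaled_dot_product_attention(q, q, q)
```

Multi-head attention, the feed-forward sublayers, and positional encodings are built on top of this same operation.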
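For the second goal, the training loop follows the standard PyTorch pattern: forward pass, loss, backward pass, optimizer step. The sketch below is a hypothetical stand-in, using a tiny embedding-plus-linear model on a toy "copy the input" task rather than the full transformer, just to show the shape of the loop.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the transformer: embed 10 token ids, project back to 10 logits.
model = nn.Sequential(nn.Embedding(10, 32), nn.Linear(32, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy copy task: the target sequence is the input sequence itself.
x = torch.randint(0, 10, (64, 8))  # (batch, seq_len) token ids
y = x.clone()

losses = []
for step in range(50):
    optimizer.zero_grad()
    logits = model(x)  # (batch, seq_len, vocab)
    # CrossEntropyLoss expects (N, vocab) logits and (N,) targets.
    loss = loss_fn(logits.reshape(-1, 10), y.reshape(-1))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

With the real transformer, only the model, the data, and the target construction change; the loop itself stays the same.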