Transformer-based Language Model from Scratch

This repository builds a Transformer-based language model from the ground up. It is meant as an educational resource for understanding and implementing the mechanisms behind modern natural language processing (NLP) systems, with a focus on text generation.

Project Overview

Transformers have led to significant advances in NLP. This project breaks the architecture down into its fundamental components and takes a hands-on approach to self-attention, positional encoding, and the other building blocks of the model. It is designed as a learning resource for anyone who wants to understand in detail how one of AI's most influential models works.

Features

Detailed Implementation of Transformer Components: Step-by-step code for the core Transformer elements, such as the self-attention and multi-head attention mechanisms.
Text Generation: Trains on a text corpus and generates new text from the trained model.
Modular and Extensible Code: The project is organized into separate modules for clarity and ease of understanding, facilitating further experimentation and learning.
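
To make the self-attention feature concrete, here is a minimal sketch of causal scaled dot-product attention. It is written in plain Python instead of PyTorch so it runs without any dependencies, and it is not taken from transformer_blocks.py. The function names are hypothetical, and the learned query, key, and value projections of a real model are left out (the caller passes the vectors in directly).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention in which position i sees only positions <= i.

    queries, keys: lists of T vectors of dimension d_k.
    values: list of T vectors.
    Returns (outputs, weights); weights[i][j] is how much position i attends to j.
    """
    d_k = len(keys[0])
    width = len(values[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Causal mask: score only the keys at positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        outputs.append(
            [sum(w * values[j][c] for j, w in enumerate(weights)) for c in range(width)]
        )
        # Masked (future) positions get weight zero.
        all_weights.append(weights + [0.0] * (len(keys) - i - 1))
    return outputs, all_weights
```

Calling causal_self_attention(x, x, x) on a short list of vectors x shows the mask at work: the first position can attend only to itself, so its output equals its own value vector. Multi-head attention runs several such attention operations in parallel on different learned projections of the input and concatenates the results.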

Getting Started

Installation

To set up the project environment, clone this repository and change into the project directory:

git clone https://github.com/OrbotOp/Transformer-based-Language-Model-from-Scratch.git
cd Transformer-based-Language-Model-from-Scratch

Prerequisites

Before you begin, ensure you have installed:

Python 3.8+
PyTorch 1.8+
NumPy

Installing Dependencies

Navigate to the project directory and install the required Python packages:

pip install -r requirements.txt
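
Once installation finishes, a quick check such as the following can confirm that the prerequisites are in place. This is a sketch, not a script shipped with the repository. The function name check_environment is hypothetical, and the check only tests that the packages can be found, not that PyTorch is at version 1.8 or later.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)
REQUIRED_MODULES = ("torch", "numpy")  # import names for PyTorch and NumPy


def check_environment(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    if sys.version_info[:2] < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {sys.version.split()[0]}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing package: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

An empty result means the interpreter is recent enough and both packages are importable.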

Usage

To Train the Model:

Run the train_save_model.py script to train the model on the WarrenBuffet.txt file. The same script also generates text once training has finished.

python train_save_model.py
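
To illustrate what training on a text file involves, the sketch below turns raw text into next-token training examples. It assumes character-level tokenization, which is common in from-scratch tutorials. The actual script may tokenize differently, and all function names here are hypothetical.

```python
def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def encode(text, stoi):
    """Convert a string to a list of token ids."""
    return [stoi[ch] for ch in text]


def decode(ids, itos):
    """Convert a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)


def make_examples(ids, block_size):
    """Slide a window over the ids; each target is its input shifted by one token."""
    examples = []
    for start in range(len(ids) - block_size):
        inputs = ids[start:start + block_size]
        targets = ids[start + 1:start + block_size + 1]
        examples.append((inputs, targets))
    return examples
```

For the text "hello" and a block size of 3, this yields two examples: the model sees "hel" and is trained to predict "ell", then sees "ell" and is trained to predict "llo".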

Project Structure

transformer_blocks.py - Implements the core components of the Transformer model, such as the self-attention mechanism.
language_model.py - Defines the overall Transformer-based language model, integrating the components.
train_save_model.py - Handles model training and text generation, using the WarrenBuffet.txt dataset included in the repository.
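
The text generation step can be pictured as an autoregressive sampling loop. The sketch below is an assumption about how such a loop typically looks, not the code in train_save_model.py. The next_logits callable stands in for the trained model, and temperature sampling is one common choice among several decoding strategies.

```python
import math
import random


def sample_next(logits, temperature=1.0, rng=random):
    """Sample a token id from unnormalized logits after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them internally.
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, temperature=1.0, rng=random):
    """Repeatedly ask the model for logits, sample one token, and append it.

    next_logits: hypothetical callable that maps the ids generated so far
    to a list of logits over the vocabulary.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_next(next_logits(ids), temperature, rng))
    return ids
```

Lower temperatures concentrate probability on the highest-scoring tokens and give more predictable text, while higher temperatures flatten the distribution and give more varied output.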
