This project contains a PyTorch-based LSTM model for real-time sign language detection using MediaPipe keypoints.
- `improved_model.py` - Improved LSTM model definition and training script with data augmentation
- `load_model.py` - Model loading and real-time prediction
- `improved_sign_language_model.pth` - Trained model weights (12.7MB)
- `requirements_pytorch.txt` - Python dependencies
- `main.py` - Data collection script for capturing sign language gestures
- `utils.py` - Utility functions for MediaPipe detection and keypoint extraction
- `MP_Data/` - Training data directory containing keypoint sequences for:
  - `hello/` - Hello gesture data (30 sequences)
  - `thanks/` - Thanks gesture data (30 sequences)
  - `iloveyou/` - I love you gesture data (30 sequences)
- `env/` - Virtual environment with all dependencies
```bash
# Create and activate virtual environment
python -m venv env
source env/bin/activate  # On Windows: env\Scripts\activate

# Install dependencies
pip install -r requirements_pytorch.txt

# Train the model
python improved_model.py

# Run real-time prediction
python load_model.py
```
The improved model uses a robust LSTM-based architecture with data augmentation (a minimal sketch follows the list below):
- Input: 30 frames × 1662 keypoints (pose + face + hands)
- LSTM Layers: 2-layer LSTM with 128 hidden units and dropout (0.2)
- Fully Connected: 128 → 64 → 3 units with ReLU activation
- Regularization: Dropout (0.3) at multiple layers
- Output: 3 classes (hello, thanks, iloveyou) with softmax
- Data Augmentation: Noise addition and time shifting
- Class Balancing: 3x augmentation for "hello" class
- Early Stopping: Prevents overfitting
- Learning Rate Scheduling: Adaptive learning rate
- Validation: 20% test split with stratification
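A minimal PyTorch sketch of an architecture matching this description (class and variable names are illustrative; the actual definition lives in `improved_model.py` and may differ):

```python
import torch
import torch.nn as nn

class SignLanguageLSTM(nn.Module):
    """Illustrative sketch: 2-layer LSTM (128 hidden units, dropout 0.2)
    followed by fully connected layers 128 -> 64 -> 3 with ReLU and dropout."""

    def __init__(self, input_size=1662, hidden_size=128, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers=2,
                            batch_first=True, dropout=0.2)
        self.head = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(hidden_size, 64),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        # x: (batch, 30 frames, 1662 keypoints)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # logits from the last time step

model = SignLanguageLSTM()
probs = torch.softmax(model(torch.randn(1, 30, 1662)), dim=1)  # (1, 3) class probabilities
```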
- Training Accuracy: 100%
- Validation Accuracy: 100%
- Test Accuracy: 100% on all classes
- Model Size: 12.7MB
- Hello - Wave gesture
- Thanks - Thank you gesture
- I Love You - ILY sign gesture
1. Run `python main.py` to collect training data.
   - Follow the on-screen instructions to record gestures.
   - Each gesture requires 30 sequences of 30 frames each, with every frame stored as a 1662-value keypoint vector (see the sketch below).
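The 1662-value vector comes from MediaPipe Holistic: 33 pose landmarks × 4 values, 468 face landmarks × 3, and 21 landmarks × 3 per hand (33·4 + 468·3 + 2·21·3 = 1662). A hedged sketch of the extraction step, which `utils.py` presumably implements in some form:

```python
import numpy as np

def extract_keypoints(results):
    """Flatten MediaPipe Holistic results into one 1662-value vector,
    substituting zeros when a body part is not detected."""
    pose = (np.array([[lm.x, lm.y, lm.z, lm.visibility]
                      for lm in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[lm.x, lm.y, lm.z]
                      for lm in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left_hand = (np.array([[lm.x, lm.y, lm.z]
                           for lm in results.left_hand_landmarks.landmark]).flatten()
                 if results.left_hand_landmarks else np.zeros(21 * 3))
    right_hand = (np.array([[lm.x, lm.y, lm.z]
                            for lm in results.right_hand_landmarks.landmark]).flatten()
                  if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left_hand, right_hand])  # shape: (1662,)
```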
2. Run `python improved_model.py` to train the model.
   - Training includes data augmentation and validation (an augmentation sketch follows below).
   - The best model is automatically saved as `improved_sign_language_model.pth`.
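The noise-addition and time-shifting augmentation described above can be sketched roughly as follows; parameter values are illustrative and the actual implementation in `improved_model.py` may differ:

```python
import numpy as np

def augment_sequence(sequence, noise_std=0.01, max_shift=3):
    """Return a noisy, time-shifted copy of a (30, 1662) keypoint sequence."""
    augmented = sequence + np.random.normal(0.0, noise_std, sequence.shape)  # noise addition
    shift = np.random.randint(-max_shift, max_shift + 1)                     # time shifting
    return np.roll(augmented, shift, axis=0)
```

To balance the classes, the "hello" sequences would be passed through such a function roughly three times as often as the others.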
3. Run `python load_model.py` for webcam prediction.
   - Make sign language gestures in front of the camera.
   - Ensure good lighting and clear hand visibility.
   - Press 'q' to quit.

To use the model from your own code:
```python
from load_model import load_model, predict_sign

# Load the trained model
model, actions = load_model()

# Make predictions on a sequence of 30 frames x 1662 keypoints
predicted_sign, confidence, probabilities = predict_sign(model, actions, sequence_data)
```
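In a live setting, the model is fed a sliding window of the most recent 30 frames. A hedged sketch of such a loop, assuming an `extract_keypoints` helper like the one above and that `predict_sign` accepts a (30, 1662) array (threshold and variable names are illustrative):

```python
from collections import deque

import cv2
import mediapipe as mp
import numpy as np

from load_model import load_model, predict_sign

model, actions = load_model()
window = deque(maxlen=30)  # sliding window of the last 30 frames
cap = cv2.VideoCapture(0)

with mp.solutions.holistic.Holistic(min_detection_confidence=0.5,
                                    min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        window.append(extract_keypoints(results))  # 1662-value keypoint vector
        if len(window) == 30:
            sign, confidence, probs = predict_sign(model, actions, np.array(window))
            if confidence > 0.8:  # illustrative threshold; lower it if gestures are missed
                cv2.putText(frame, f"{sign} ({confidence:.2f})", (10, 30),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```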
- Python 3.8+
- PyTorch 2.0+
- MediaPipe
- OpenCV
- NumPy
- Scikit-learn
- TensorBoard (for training logs)
See `requirements_pytorch.txt` for exact versions.
- Ensure good lighting for MediaPipe detection
- Position hands clearly in front of the camera
- Use the same gestures as in the training data
- Check hand visibility - hands must be fully visible
- The model achieves 100% accuracy on training data
- Real-time performance depends on webcam conditions
- Lower the confidence threshold if needed for real-time use
```
sign_language_detection/
├── improved_model.py                  # Training script
├── improved_sign_language_model.pth   # Model file
├── load_model.py                      # Model loading & prediction
├── main.py                            # Data collection
├── utils.py                           # Utilities
├── requirements_pytorch.txt           # Dependencies
├── README.md                          # This file
└── MP_Data/                           # Training data
    ├── hello/                         # Hello gestures
    ├── thanks/                        # Thanks gestures
    └── iloveyou/                      # I love you gestures
```