A simple Naive Bayes classifier that predicts gender from the last two letters of a name. This project trains a Multinomial Naive Bayes model using character bigrams and provides a command-line interface for predictions.
This project demonstrates a lightweight approach to gender classification based on name suffixes. It uses scikit-learn’s CountVectorizer with character bigrams and a MultinomialNB classifier. The implementation is kept intentionally minimal for educational purposes.
The model expects a CSV file named genders.csv in the project root with at least the following columns:
- name: the person’s name
- gender: the target label (e.g., male, female)
Example:
    name,gender
    Alex,male
    Maria,female
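A minimal sketch of reading a file in this format and checking that the two required columns are present. It uses the standard library's csv module rather than pandas so that it stands alone. The helper name `load_rows` and the error message are hypothetical and not taken from the project's script.

```python
import csv
import io

# Column names taken from the format description above.
REQUIRED_COLUMNS = {"name", "gender"}


def load_rows(source):
    """Read (name, gender) pairs from an open CSV file object (hypothetical helper)."""
    reader = csv.DictReader(source)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"genders.csv is missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        name = (row["name"] or "").strip()
        gender = (row["gender"] or "").strip()
        if name and gender:  # skip blank or incomplete rows
            rows.append((name, gender))
    return rows


# In-memory stand-in for genders.csv, using the two example rows above.
sample = io.StringIO("name,gender\nAlex,male\nMaria,female\n")
rows = load_rows(sample)
```

For a real file, the same helper could be called as `load_rows(open("genders.csv", newline="", encoding="utf-8"))`, assuming the file is UTF-8 encoded.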
- Feature: last two letters of each name
- Vectorization: character bigrams (ngram_range=(2,2))
- Classifier: Multinomial Naive Bayes
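The three steps above could be wired together roughly as follows. This is a sketch, not the project's actual script: the helper `last_two_letters`, the use of `analyzer="char"`, the pipeline wrapper, and the six training names are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def last_two_letters(name):
    """Return the lowercase last two letters of a name (hypothetical helper)."""
    return name.strip().lower()[-2:]


# Tiny made-up training set for illustration only, not the project's data.
names = ["Alex", "Maria", "John", "Anna", "Peter", "Julia"]
labels = ["male", "female", "male", "female", "male", "female"]

# Character bigrams over a two-letter suffix yield exactly one feature per name.
model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(2, 2)),
    MultinomialNB(),
)
model.fit([last_two_letters(n) for n in names], labels)

# "Sophia" ends in "ia", which appears only in female training names here.
prediction = model.predict([last_two_letters("Sophia")])[0]
```

A one-letter name produces no bigram at all, so in this setup its prediction would rest on the class priors alone.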
- Clone the repository
- Install dependencies:
    pip install pandas numpy scikit-learn

Ensure genders.csv is in the same directory as the script, then run:
    python Gender_clasifiction_Naive_Bayes_Classifier.py

You will be prompted to enter a name. Type exit to quit.
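The prompt loop might look something like the sketch below. The function name `run_prompt`, the prompt wording, and the output format are hypothetical. The `read` and `write` parameters exist only so the loop can be exercised without a terminal, and the stand-in predictor takes the place of a trained model.

```python
def run_prompt(predict, read=input, write=print):
    """Ask for names until the user types 'exit' (hypothetical helper)."""
    while True:
        name = read("Enter a name (or 'exit' to quit): ").strip()
        if name.lower() == "exit":
            break
        if not name:
            continue  # ignore empty input and ask again
        write(f"{name}: {predict(name)}")


# Demonstration with scripted input and a stand-in predictor instead of a trained model.
scripted = iter(["Maria", "", "exit"])
outputs = []
run_prompt(
    predict=lambda name: "female" if name.lower().endswith("a") else "male",
    read=lambda prompt: next(scripted),
    write=outputs.append,
)
```

In interactive use, `run_prompt` would be called with only a `predict` function, leaving `input` and `print` as the defaults.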
- This is a simple baseline model and may not perform well on diverse or international names.
- For better results, consider richer features (full name, language-specific suffixes) and a larger dataset.
No license specified. Add a LICENSE file if you plan to share or reuse this project publicly.
Created by mAhsanZafar.