This project predicts customer churn in a telecom dataset using a Random Forest classifier. The pipeline includes data cleaning, feature encoding, model training, and evaluation with key classification metrics.
- Contains customer demographic information, service subscriptions, billing details, and churn status.
- Target variable: `Churn` (Yes/No).
- Dropped `customerID`, as it is a unique identifier.
- Converted the `TotalCharges` column to numeric, handling missing or invalid values.
- Dropped rows with missing data after conversion.
- One-hot encoded categorical features to convert them into numerical format.
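
The cleaning and encoding steps above could look roughly like the sketch below. It is not the project's actual script: the toy rows, the `preprocess` helper name, the `Contract` and `MonthlyCharges` columns, and the 0/1 encoding of the target are assumptions made for illustration. Only `customerID`, `TotalCharges`, and `Churn` come from the description above.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up).
raw = pd.DataFrame({
    "customerID": ["0001-A", "0002-B", "0003-C", "0004-D"],
    "Contract": ["Month-to-month", "One year", "Month-to-month", "Two year"],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30],
    "TotalCharges": ["29.85", " ", "108.15", "1840.75"],  # " " is an invalid entry
    "Churn": ["No", "No", "Yes", "No"],
})


def preprocess(df):
    """Hypothetical helper: clean the frame and return features X and target y."""
    # Drop the unique identifier; it carries no predictive signal.
    df = df.drop(columns=["customerID"])
    # Convert TotalCharges to numeric; unparseable entries become NaN.
    df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
    # Drop rows that are missing data after the conversion.
    df = df.dropna()
    # Assumption: encode the Yes/No target as 1/0.
    y = (df["Churn"] == "Yes").astype(int)
    # One-hot encode the remaining categorical features.
    X = pd.get_dummies(df.drop(columns=["Churn"]))
    return X, y


X, y = preprocess(raw)
print(X.columns.tolist())
print(y.tolist())
```

Using `errors="coerce"` turns blank or malformed `TotalCharges` entries into `NaN`, so a single `dropna()` call afterwards removes them together with any other incomplete rows.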
- Split dataset into train and test sets with stratification to preserve class distribution.
- Trained a Random Forest classifier with 100 trees and a fixed random seed.
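
A minimal sketch of the stratified split and training step follows. The synthetic data from `make_classification`, the 80/20 split ratio, and the seed value `42` are assumptions; the description above specifies only stratification, 100 trees, and a fixed seed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the encoded churn features (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.73, 0.27], random_state=42
)

# stratify=y keeps the churn/non-churn ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# 100 trees and a fixed seed, as described above; the seed value is assumed.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("train positive rate:", y_train.mean())
print("test positive rate:", y_test.mean())
```

Fixing `random_state` in both the split and the classifier makes repeated runs produce the same model, which keeps the reported metrics comparable between runs.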
- Accuracy
- Precision
- Recall
- F1 Score
- Confusion Matrix (visualized)
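
The metrics listed above can be computed with scikit-learn as sketched here. The label vectors are made-up values for illustration, and `show_confusion_matrix` is a hypothetical helper name; in the real script the inputs would be the test labels and the model's predictions.

```python
from sklearn.metrics import (
    ConfusionMatrixDisplay,
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels: 1 = churned, 0 = stayed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
}
# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(cm)


def show_confusion_matrix(y_true, y_pred):
    """Hypothetical helper: draw the confusion matrix in a matplotlib window."""
    import matplotlib.pyplot as plt

    ConfusionMatrixDisplay.from_predictions(
        y_true, y_pred, display_labels=["No", "Yes"]
    )
    plt.show()
```

Calling `show_confusion_matrix(y_true, y_pred)` opens the plot. On an imbalanced churn dataset, precision and recall for the churn class are usually more informative than accuracy alone.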
- Install dependencies: `pip install pandas scikit-learn matplotlib`
- Run the script: `python customer_churn_predictions.py`
- The script will output evaluation metrics and display the confusion matrix.