This is my 2nd Capstone Project. It is aimed at strengthening my skills in ML algorithms and my understanding of the models and the proper steps from data preprocessing to model training, while keeping model accuracy high.

anshuman9468/Churn-Prediction-Model-

📊 Telecom Customer Churn Prediction – End-to-End ML System

A full-stack machine learning system for predicting customer churn in the telecommunications industry. This project uses an XGBoost classifier, a FastAPI backend, and a Streamlit frontend to deliver real-time churn prediction, customer risk analysis, and batch scoring capabilities.

🚀 Project Overview

Customer churn is one of the biggest revenue drains in the telecom industry. Early identification of at-risk customers enables companies to apply targeted retention strategies, significantly increasing customer lifetime value.

This project focuses on maximizing recall (79.36%) so that the model captures as many potential churners as possible, which aligns directly with business impact.

๐Ÿ—๏ธ System Architecture โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ Streamlit UI โ”‚ โ”‚ โ€ข Real-time predictions โ”‚ โ”‚ โ€ข Batch scoring โ”‚ โ”‚ โ€ข Customer risk analytics โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ HTTP/REST โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ FastAPI Backend โ”‚ โ”‚ โ€ข Prediction endpoint โ”‚ โ”‚ โ€ข Preprocessing pipeline โ”‚ โ”‚ โ€ข Model + threshold loading โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚ Load Model โ–ผ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚ XGBoost ML Model โ”‚ โ”‚ โ€ข churn_xgb.pkl โ”‚ โ”‚ โ€ข Recall = 79.36% (Primary metric) โ”‚ โ”‚ โ€ข F1 Score = 0.642 โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

✨ Key Features

✔ High-Recall ML Model (79.36%)

Captures as many churners as possible → critical for retention strategy.

✔ FastAPI-Powered REST API

Production-ready inference endpoint deployed on Render.

✔ Interactive Streamlit Frontend

Simple and intuitive interface for non-technical business users.

✔ Real-Time + Batch Predictions

Predict churn for individual customers or entire datasets.

✔ Fully Deployed

Backend: Render

Frontend: Streamlit Cloud

📂 Dataset

Source: Telco Customer Churn Dataset (Kaggle)

File: WA_Fn-UseC_-Telco-Customer-Churn.csv

Shape: 7,043 rows × 21 features

Target Variable

Churn → Yes/No

Imbalance: 73.5% non-churn / 26.5% churn

Feature Categories

Demographics

Gender

SeniorCitizen

Partner

Dependents

Services

PhoneService

MultipleLines

InternetService (DSL / Fiber Optic / No)

OnlineSecurity

OnlineBackup

DeviceProtection

TechSupport

StreamingTV

StreamingMovies

Account Information

Tenure

Contract

PaperlessBilling

PaymentMethod

MonthlyCharges

TotalCharges

🔧 Data Preprocessing

✔ Cleaning

Dropped customerID

Converted TotalCharges to numeric

Filled missing values using median

Removed OnlineBackup due to quality issues

✔ Encoding

Label Encoding for binary fields

One-Hot Encoding for multi-class fields
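The cleaning and encoding steps above can be sketched roughly as follows. This is a dependency-free illustration, not the project's preprocessing.py: the helper names (`to_float`, `fill_median`, `label_encode`, `one_hot`) and the sample rows are assumptions made for the example, and a real pipeline would more likely use pandas.

```python
from statistics import median

# Toy rows standing in for the Telco CSV (values are made up for illustration).
rows = [
    {"customerID": "A1", "Partner": "Yes", "Contract": "Month-to-month", "TotalCharges": "29.5"},
    {"customerID": "B2", "Partner": "No", "Contract": "Two year", "TotalCharges": " "},
    {"customerID": "C3", "Partner": "No", "Contract": "One year", "TotalCharges": "108.5"},
]

def to_float(text):
    """Convert a string to float; blank or malformed values become None."""
    try:
        return float(text)
    except ValueError:
        return None

def fill_median(values):
    """Replace None entries with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def label_encode(value, positive="Yes"):
    """Map a binary Yes/No field to 1/0."""
    return 1 if value == positive else 0

def one_hot(value, categories, prefix):
    """Expand a multi-class field into one 0/1 column per category."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}

charges = fill_median([to_float(r["TotalCharges"]) for r in rows])
contracts = sorted({r["Contract"] for r in rows})

features = []
for row, total in zip(rows, charges):
    # customerID is dropped simply by not copying it into the output.
    encoded = {"Partner": label_encode(row["Partner"]), "TotalCharges": total}
    encoded.update(one_hot(row["Contract"], contracts, prefix="Contract"))
    features.append(encoded)

print(features[1])
```

The blank TotalCharges value in the second row is filled with the median of the other two rows, which mirrors the "convert to numeric, then fill with median" order described above.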

Final Feature Count: 22 engineered features

🤖 Model Development

Train/Test Split

Train: 80%

Test: 20%

random_state = 42

🟩 Primary Model – XGBoost Classifier

Hyperparameters

n_estimators=200

learning_rate=0.05

max_depth=5

subsample=0.8

colsample_bytree=0.8

scale_pos_weight=2.7
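For readers wondering where a value like scale_pos_weight=2.7 might come from: a common starting point is the ratio of negative to positive samples. The sketch below gathers the hyperparameters listed above into a dict and computes that ratio from the class shares quoted in the Dataset section. The names `XGB_PARAMS` and `negative_positive_ratio` are illustrative and not taken from the project's code, and the README does not state how 2.7 was actually chosen.

```python
# Hyperparameters as listed above, gathered into one dict.
XGB_PARAMS = {
    "n_estimators": 200,
    "learning_rate": 0.05,
    "max_depth": 5,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "scale_pos_weight": 2.7,
}

def negative_positive_ratio(negative_share, positive_share):
    """Common heuristic for scale_pos_weight: negatives divided by positives."""
    return negative_share / positive_share

# Class shares from the Dataset section: 73.5% non-churn, 26.5% churn.
ratio = negative_positive_ratio(73.5, 26.5)
print(f"class ratio = {ratio:.2f}, configured = {XGB_PARAMS['scale_pos_weight']}")

# With xgboost installed, the dict could be passed as keyword arguments:
#   from xgboost import XGBClassifier
#   model = XGBClassifier(**XGB_PARAMS, random_state=42)
```

The computed ratio is close to the configured value, which suggests the weight roughly compensates for the class imbalance.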

Performance

| Metric | Score |
| --- | --- |
| Accuracy | 76.58% |
| F1 Score | 0.642 |
| ⭐ Recall | 79.36% |

Why Selected?

Highest recall → about 24 percentage points above Logistic Regression (79.36% vs. 55.50%)

Balanced F1 score

Handles class imbalance effectively

🟨 Secondary Model – Logistic Regression

| Metric | Score |
| --- | --- |
| Accuracy | 81.83% |
| Recall | 55.50% |
| F1 Score | 0.618 |

Good for interpretability, but not suitable for high-recall objectives.

🎯 Business Metric Prioritization

Recall → Most important (don't miss churners)

F1 Score → Balanced evaluation

Accuracy → Least important (misleading on imbalanced datasets)

Example:

A trivial model that predicts "No Churn" for every customer reaches 73.5% accuracy while catching zero churners → completely useless.
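The point about accuracy can be checked with a few lines of arithmetic. The sketch below computes accuracy, recall, and F1 from confusion-matrix counts and applies them to an "always No Churn" baseline on a hypothetical 1,000-customer sample with the 73.5% / 26.5% split quoted above. The function name `classification_metrics` and the sample size are assumptions for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}

# Baseline that never predicts churn: every churner is a false negative.
baseline = classification_metrics(tp=0, fp=0, fn=265, tn=735)
print(baseline)
```

Accuracy looks respectable while recall and F1 are both zero, which is why recall is treated as the primary metric here.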

โš™๏ธ Threshold Optimization

The default 0.5 threshold is not suitable for imbalanced churn data.

After optimization: Optimal threshold = 0.01

This aggressively maximizes recall for business impact.
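The README does not describe how the threshold was searched, so the following is only one plausible approach: scan candidate thresholds from high to low and keep the highest one whose recall on validation data meets a target. The function names (`recall_at`, `pick_threshold`), the target value, and the scores and labels are all made up for illustration.

```python
def recall_at(threshold, probs, labels):
    """Recall of the positive class when predicting 1 for prob >= threshold."""
    positives = sum(labels)
    caught = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    return caught / positives if positives else 0.0

def pick_threshold(probs, labels, target_recall):
    """Return the highest candidate threshold whose recall meets the target."""
    for t in sorted(set(probs), reverse=True):
        if recall_at(t, probs, labels) >= target_recall:
            return t
    return 0.0  # no candidate qualifies: fall back to flagging everyone

# Hypothetical validation scores and true labels (1 = churn).
probs = [0.92, 0.80, 0.65, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

threshold = pick_threshold(probs, labels, target_recall=0.75)
print(threshold, recall_at(threshold, probs, labels))
```

Lowering the threshold can only keep or raise recall, at the cost of flagging more non-churners, so the choice is a trade-off between missed churners and wasted retention offers.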

📦 Saved Artifacts

churn_xgb.pkl # Trained XGBoost model

threshold.pkl # Optimized threshold = 0.01

๐ŸŒ FastAPI Backend

Example FastAPI code snippet:

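from fastapi import FastAPI
import joblib
import numpy as np

# Assumption: preprocess() is defined in backend/preprocessing.py
from preprocessing import preprocess

app = FastAPI()

# Load the trained model and the tuned decision threshold once at startup
model = joblib.load("models/churn_xgb.pkl")
threshold = joblib.load("models/threshold.pkl")

@app.post("/predict")
def predict(data: dict):
    # Turn the raw JSON payload into the model's feature vector
    features = preprocess(data)
    # Probability of the churn class for a single-row 2-D input,
    # cast to a plain float so it can be serialized as JSON
    prob = float(model.predict_proba(np.array([features]))[0][1])
    pred = int(prob >= threshold)

    return {
        "churn_probability": prob,
        "prediction": pred,
        "risk_level": "High" if pred == 1 else "Low",
    }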

Deployed at:

https://churn-prediction-2qrp.onrender.com/

๐Ÿ–ฅ๏ธ Streamlit Frontend

Real-time prediction

Batch CSV upload

Visual analytics

Deployed at:

https://churn-frontend-g3ku8j45b7mfsg6s4ztjfy.streamlit.app/

๐Ÿ› ๏ธ Installation & Setup Clone repository git clone https://github.com/yourusername/telecom-churn-prediction.git cd telecom-churn-prediction

Backend Setup cd backend pip install -r requirements.txt uvicorn app:app --reload

Frontend Setup cd churn-frontend pip install -r requirements.txt streamlit run app.py

🧪 API Usage Example

curl -X POST "https://churn-prediction-2qrp.onrender.com/predict" \
  -H "Content-Type: application/json" \
  -d '{ "gender": "Female", "SeniorCitizen": 0, "Partner": "Yes", "tenure": 12, "MonthlyCharges": 70 }'
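The same request can be issued from Python using only the standard library. This is a minimal sketch: the helper names `build_request` and `predict` are assumptions, and the response fields are expected to match those returned by the FastAPI snippet above (`churn_probability`, `prediction`, `risk_level`). The live call is left commented out so the example runs without network access.

```python
import json
import urllib.request

API_URL = "https://churn-prediction-2qrp.onrender.com/predict"

def build_request(payload, url=API_URL):
    """Build a JSON POST request for the /predict endpoint (nothing is sent yet)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(payload, timeout=30):
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

customer = {
    "gender": "Female",
    "SeniorCitizen": 0,
    "Partner": "Yes",
    "tenure": 12,
    "MonthlyCharges": 70,
}

request = build_request(customer)
print(request.get_method(), request.full_url)
# result = predict(customer)  # uncomment to call the deployed service
```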

📈 Business Impact

Assumptions:

Avg revenue/user = $64/month

Lifetime value ≈ $1500

Retention offer cost: $75

Retention success: 40%

With XGBoost (79.36% recall)

Correctly identifies 1,483 churners

Potential revenue saved: $890,000

Campaign cost: $111,225

Net Benefit = $778,775 saved

With Logistic Regression (55.50% recall)

Saves only $623,000

👉 XGBoost prevents $267,000 additional revenue loss.
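The figures above follow from the stated assumptions and can be reproduced with a short calculation. In this sketch the function name `campaign_value` is an assumption, and, as in the README, campaign cost is counted only for correctly identified churners. Unrounded, revenue saved comes to $889,800, which the README rounds to $890,000, so the net figure computed here differs from the quoted one by $200.

```python
def campaign_value(churners_caught, ltv=1500, offer_cost=75, success_rate=0.40):
    """Revenue saved, campaign cost, and net benefit under the stated assumptions."""
    saved = churners_caught * success_rate * ltv
    cost = churners_caught * offer_cost
    return {"saved": saved, "cost": cost, "net": saved - cost}

# 1,483 churners correctly identified by the XGBoost model.
xgb = campaign_value(1483)
print({name: round(value) for name, value in xgb.items()})
```

Offers sent to customers who were wrongly flagged as churners would add to the cost side; that is not modelled here because the README does not report a false-positive count.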

📌 Repository Structure

telecom-churn-prediction/
│
├── backend/
│   ├── app.py
│   ├── preprocessing.py
│   ├── models/
│   │   ├── churn_xgb.pkl
│   │   └── threshold.pkl
│   ├── requirements.txt
│   └── render.yaml
│
├── churn-frontend/
│   ├── app.py
│   ├── pages/
│   │   ├── 01_prediction.py
│   │   ├── 02_batch.py
│   │   └── 03_analytics.py
│   ├── utils/
│   │   ├── api_client.py
│   │   └── visualizations.py
│   ├── requirements.txt
│   └── .streamlit/
│       └── config.toml
│
├── notebooks/
│   └── main_analysis.ipynb
│
├── README.md
└── .gitignore

🔮 Future Enhancements

ML Improvements

SHAP explainability

Feature Engineering Pipeline

Automated feature extraction from raw customer data including recency-frequency-monetary (RFM) metrics, engagement velocity tracking, product usage patterns, and customer lifetime value calculations. This creates a rich set of predictive signals that update dynamically as new data arrives.

Explainable AI Dashboard

SHAP or LIME-based explanations showing which factors drive each customer's churn risk score. This gives your retention team actionable insights like "customer likely to churn due to: declining login frequency (45% impact), support ticket volume (30%), reduced feature usage (25%)" rather than just a black-box probability.

Segmented Retention Strategies

Automatic customer clustering based on churn drivers and behavioral patterns, with recommended retention tactics per segment. For example, price-sensitive churners get discount offers, while feature-confused users get onboarding calls. Each segment gets tracked for intervention effectiveness.

API and Real-time Scoring

API endpoint that scores individual customers on-demand for real-time use cases like triggered email campaigns, chat interventions during support calls, or dynamic pricing. Includes batch scoring capabilities for daily/weekly refresh of entire customer base risk scores.

📬 Contact

For issues, suggestions, or collaboration: 📧 workwithanshuman9468@gmail.com

✅ Project Status

Production Ready

Model Version: 1.0

Last Updated: Dec 2025
