A machine learning project that predicts whether a loan application will be approved or not, based on applicant financial and demographic data.
FinApprove uses supervised classification algorithms to analyze loan application data and predict approval outcomes. The project covers the full ML pipeline — from data cleaning and exploratory analysis to model training, evaluation, and feature engineering.
File: loan_approval_data.csv
The dataset includes the following features:
| Feature | Description |
|---|---|
| Applicant_ID | Unique identifier for each applicant (dropped before training) |
| Gender | Gender of the applicant |
| Marital_Status | Marital status of the applicant |
| Education_Level | Education background |
| Employment_Status | Whether the applicant is employed |
| Employer_Category | Type of employer |
| Applicant_Income | Monthly income of the applicant |
| Coapplicant_Income | Monthly income of the co-applicant |
| Credit_Score | Credit score of the applicant |
| DTI_Ratio | Debt-to-Income ratio |
| Savings | Savings amount |
| Loan_Purpose | Purpose of the loan |
| Property_Area | Area type of the property |
| Loan_Approved | Target variable (Yes / No) |
- Loaded the dataset using `pandas`
- Numerical columns filled using mean imputation
- Categorical columns filled using most frequent value imputation
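The two imputation steps above could be sketched as follows. This is a minimal illustration, assuming scikit-learn's `SimpleImputer` was used (the README only names the strategies) and using a tiny made-up frame in place of `loan_approval_data.csv`; the sample values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample rows; the real data would come from
# pd.read_csv("loan_approval_data.csv").
df = pd.DataFrame({
    "Applicant_Income": [4000.0, np.nan, 6000.0, 5000.0],
    "Credit_Score": [700.0, 650.0, np.nan, 720.0],
    "Gender": ["Male", "Female", np.nan, "Male"],
})

# Split columns by type so each group gets its own strategy.
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns

# Numerical columns: replace missing values with the column mean.
df[num_cols] = SimpleImputer(strategy="mean").fit_transform(df[num_cols])

# Categorical columns: replace missing values with the most frequent value.
df[cat_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[cat_cols])

print(df)
```

Mean imputation is sensitive to extreme incomes, so a median strategy may be worth comparing if the box plots show heavy outliers.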
- Class distribution of loan approval (pie chart)
- Gender and education level distribution (bar plots)
- Income distribution for applicant and co-applicant (histograms)
- Outlier analysis using box plots (Income, Credit Score, DTI Ratio, Savings)
- Relationship between Credit Score and Loan Approval
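Box plots flag outliers with the 1.5 × IQR rule, so the same check can be done numerically. The sketch below is an assumption about how one might quantify what the box plots show, not the notebook's own code; `iqr_outliers` is a hypothetical helper and the savings values are made up.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]

# Hypothetical savings amounts with one extreme value.
savings = pd.Series([1000, 1200, 1100, 1300, 1250, 50000], name="Savings")
outliers = iqr_outliers(savings)
print(outliers)
```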
- `LabelEncoder` applied to `Education_Level` and `Loan_Approved`
- `OneHotEncoder` (drop first) applied to: `Marital_Status`, `Employment_Status`, `Loan_Purpose`, `Gender`, `Employer_Category`, `Property_Area`
- Heatmap generated to identify relationships between features
- Top correlated features with `Loan_Approved` identified
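Ranking features by their correlation with the target might look like the sketch below. It assumes `Loan_Approved` has already been label-encoded to 0/1 and uses made-up rows. The resulting `corr` matrix is what a heatmap (for example `seaborn.heatmap(corr, annot=True)`) would be drawn from.

```python
import pandas as pd

# Hypothetical, already-encoded sample rows.
df = pd.DataFrame({
    "Credit_Score": [600, 650, 700, 750, 800, 620],
    "DTI_Ratio": [0.50, 0.45, 0.30, 0.20, 0.10, 0.55],
    "Savings": [1000, 3000, 2000, 5000, 4000, 2500],
    "Loan_Approved": [0, 0, 1, 1, 1, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each feature with the target, ranked by absolute strength.
target_corr = corr["Loan_Approved"].drop("Loan_Approved")
order = target_corr.abs().sort_values(ascending=False).index
top = target_corr.reindex(order)
print(top)
```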
- `StandardScaler` applied to training and test sets
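The README does not say how the scaler was fitted. A common approach, sketched here as an assumption, is to fit `StandardScaler` on the training split only and reuse those statistics on the test split so no test information leaks into training. The synthetic data, the 80/20 split, and `random_state` are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the encoded feature matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
# Learn mean and standard deviation from the training data only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same training statistics to the test data.
X_test_scaled = scaler.transform(X_test)
```

Scaling matters most for K-Nearest Neighbors, where unscaled income columns would otherwise dominate the distance computation.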
Three classification models were trained and evaluated:
| Model | Metrics |
|---|---|
| Logistic Regression | Accuracy, Precision, Recall, F1, Confusion Matrix |
| K-Nearest Neighbors (K=5) | Accuracy, Precision, Recall, F1, Confusion Matrix |
| Gaussian Naive Bayes | Accuracy, Precision, Recall, F1, Confusion Matrix |
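Training and scoring the three models in the table could be sketched as below. Synthetic data from `make_classification` stands in for the loan dataset, and the split ratio, `max_iter`, and `random_state` values are assumptions rather than the notebook's settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data in place of the loan dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scale with statistics learned from the training split only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors (K=5)": KNeighborsClassifier(n_neighbors=5),
    "Gaussian Naive Bayes": GaussianNB(),
}

# Fit each model and collect the metrics listed in the table.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, round(metrics["accuracy"], 3), round(metrics["f1"], 3))
```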
- Added squared features: `DTI_Ratio_sq` and `Credit_Score_sq`
- Original `DTI_Ratio` and `Credit_Score` columns dropped after engineering
- Gaussian Naive Bayes retrained on engineered features
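The feature engineering step amounts to a few lines of `pandas`. This sketch uses made-up rows; the column names come from the list above.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "DTI_Ratio": [0.2, 0.35, 0.5],
    "Credit_Score": [720, 650, 580],
    "Savings": [5000, 1200, 300],
})

# Add squared versions of the two features.
df["DTI_Ratio_sq"] = df["DTI_Ratio"] ** 2
df["Credit_Score_sq"] = df["Credit_Score"] ** 2

# Drop the originals so only the engineered columns remain.
df = df.drop(columns=["DTI_Ratio", "Credit_Score"])
print(df)
```

After this step the feature matrix would need to be re-split and re-scaled before Gaussian Naive Bayes is retrained on it.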
- Language: Python 3
- Notebook: Jupyter Notebook
- Libraries:
  - `pandas`: data manipulation
  - `numpy`: numerical operations
  - `matplotlib` & `seaborn`: data visualization
  - `scikit-learn`: preprocessing, model training, and evaluation
- Clone the repository

      git clone https://github.com/your-username/FinApprove.git
      cd FinApprove

- Install required libraries

      pip install pandas numpy matplotlib seaborn scikit-learn

- Place the dataset in the project folder

  Make sure `loan_approval_data.csv` is in the same directory as the notebook.

- Run the notebook

      jupyter notebook FinApprove.ipynb

FinApprove/
│
├── FinApprove.ipynb # Main Jupyter Notebook
├── loan_approval_data.csv # Dataset (add manually)
└── README.md # Project documentation
- Logistic Regression — baseline linear classifier
- K-Nearest Neighbors — distance-based classifier
- Gaussian Naive Bayes — probabilistic classifier, also tested with engineered features
This project is open source and available under the MIT License.