Machine learning pipeline for retention prediction

This repository demonstrates a Python machine learning pipeline I built while working at Ithaca College to predict and explain student retention and other outcomes. In production, the model was deployed in the cloud (first on AWS, then on Azure) and ran daily: it pulled in student data from various sources, generated new predictions and explanations, and updated a Tableau dashboard that displayed the results to campus stakeholders. I can't share the full project or real student data because of privacy concerns, but this repo contains the core parts of the prediction pipeline. This includes:

Custom scikit-learn feature transformers to preprocess the data
XGBoost classifier with Bayesian hyperparameter tuning via hyperopt to generate predictions
Explanations of model predictions using SHAP, a game-theoretic approach
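
To make the "game-theoretic" part concrete, here is a minimal sketch of exact Shapley values computed by brute force for a toy scoring function. It is not code from this repo and it does not use the SHAP library, which relies on much faster algorithms for tree models. The function `toy_value` and the feature names `gpa`, `credits`, and `aid` are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values by enumerating every coalition of features.

    value_fn takes a frozenset of "known" features and returns a score.
    Cost grows exponentially with the number of features, so this is
    only practical for tiny examples.
    """
    n = len(features)
    result = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s
                total += weight * (value_fn(s | {f}) - value_fn(s))
        result[f] = total
    return result


def toy_value(known):
    """Hypothetical model output given which features are known."""
    score = 0.5  # baseline prediction with no information
    if "gpa" in known:
        score += 0.2
    if "credits" in known:
        score += 0.1
    if "gpa" in known and "credits" in known:
        score += 0.1  # interaction, split evenly between the two
    return score  # "aid" never changes the score in this toy


features = ["gpa", "credits", "aid"]
phi = shapley_values(toy_value, features)
for name in features:
    print(f"{name}: {phi[name]:+.3f}")
```

The attributions sum to the gap between the full-information score and the baseline, which is the property that makes per-student explanations add up to the prediction.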

All of this is implemented in an object-oriented design in which an entire pipeline is stored as an object of the RetentPipe class. I originally built the pipeline to predict student retention, but it could be used to predict any binary outcome from any combination of continuous and categorical features. To see a demo of the pipeline on tree survival data, run survival_demo.py.
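
As a rough illustration of the "pipeline as an object" idea, the sketch below wraps a custom transformer, a one-hot encoder, and a classifier in a single class. It is not the RetentPipe implementation: the names `DemoPipe` and `QuantileClipper` are hypothetical, the data is synthetic, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so the example needs only NumPy and scikit-learn. Hyperparameter tuning and SHAP explanations are left out.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


class QuantileClipper(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: clip each column to learned quantiles."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        self.low_ = np.quantile(X, self.lower, axis=0)
        self.high_ = np.quantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


class DemoPipe:
    """Hypothetical wrapper holding preprocessing and model as one object."""

    def __init__(self, numeric_cols, categorical_cols, random_state=0):
        preprocess = ColumnTransformer(
            [
                ("num", QuantileClipper(), numeric_cols),
                ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
            ],
            sparse_threshold=0,  # always return a dense array
        )
        self.pipeline = Pipeline(
            [
                ("preprocess", preprocess),
                ("model", GradientBoostingClassifier(random_state=random_state)),
            ]
        )

    def fit(self, X, y):
        self.pipeline.fit(X, y)
        return self

    def predict_proba(self, X):
        # Probability of the positive class for each row
        return self.pipeline.predict_proba(X)[:, 1]


# Synthetic data: two continuous columns and one integer-coded category
rng = np.random.default_rng(0)
numeric = rng.normal(size=(200, 2))
category = rng.integers(0, 3, size=200)
X = np.column_stack([numeric, category])
y = (numeric[:, 0] + 0.5 * (category == 1) + rng.normal(scale=0.5, size=200) > 0).astype(int)

pipe = DemoPipe(numeric_cols=[0, 1], categorical_cols=[2]).fit(X, y)
probs = pipe.predict_proba(X)
print(probs[:5])
```

Keeping the fitted transformers and the model inside one object means the same preprocessing is applied at training and prediction time, which matters when a job re-scores new data every day.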

