Efficiently deploy a machine learning model using containerized environments with data versioning, training, and live inference capabilities. Includes reproducible pipelines, monitoring, logging, and infrastructure automation to ensure scalable and robust performance for ML applications.
This repository contains the code and scripts to train, deploy, and serve a simple neural network model for the MNIST dataset. The solution is designed to be reproducible, scalable, and efficient, with infrastructure automation, monitoring, and logging.
The environment is set up using Docker to ensure reproducibility across different platforms.
- Clone this repository:

  ```bash
  git clone git@github.com:Cpicon/e2e_ml_application.git
  cd e2e_ml_application
  ```
- Python 3.10
- Docker
- Poetry (Python dependency management tool)
- **Create a Virtual Environment:**
  - First, create a virtual environment using Python 3.10:

    ```bash
    python3.10 -m venv venv
    source venv/bin/activate
    ```
- **Install Poetry:**
  - Install Poetry within the virtual environment:

    ```bash
    pip install poetry
    ```
- **Install Project Dependencies:**
  - Use Poetry to install all the required packages:

    ```bash
    poetry install
    ```
- **Set Environment Variables:**
  - Export the necessary environment variables for AWS and MLflow:

    ```bash
    export AWS_ACCESS_KEY_ID="awsaccesskey"
    export AWS_SECRET_ACCESS_KEY="awssecretkey"
    export LOCAL_MLFLOW_S3_ENDPOINT_URL="http://localhost:9000"
    ```
- **Build Docker Images:**
  - Use `make` to build the Docker images:

    ```bash
    make build
    ```
- **Run Docker Containers:**
  - After building the images, start the services using:

    ```bash
    cp deployment/dev/local.env deployment/dev/.env
    make run
    ```
- **Stop Docker Containers:**
  - To stop the services, use:

    ```bash
    make stop
    ```
- **Remove Docker Containers:**
  - To remove the containers, use:

    ```bash
    make clean
    ```
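The `make` targets above are typically thin wrappers around Docker Compose. As a rough sketch only (the compose file path, env-file location, and flags here are assumptions, not the repository's actual Makefile):

```makefile
# Hypothetical sketch -- paths and flags are assumed, not from the repo.
COMPOSE = docker compose -f deployment/dev/docker-compose.yml --env-file deployment/dev/.env

build:        # build all service images
	$(COMPOSE) build

run:          # start the full stack in the background
	$(COMPOSE) up -d

stop:         # stop containers without removing them
	$(COMPOSE) stop

clean:        # remove containers (and their anonymous volumes)
	$(COMPOSE) down
```

Wrapping Compose in `make` keeps the long `-f`/`--env-file` invocation out of day-to-day commands.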
- **Dagster (Pipeline Orchestrator):**
  - Runs on http://localhost:3000/.
  - Navigate to the "Overview/Jobs" section to view and manage your pipelines.
- **MLflow (Experiment Tracking and Model Registry):**
  - Available at http://localhost:5005.
  - Use MLflow to track your experiments, register models, and manage model versions.
- **Minio (Object and Model Storage):**
  - Runs on http://localhost:9001.
  - Minio serves as the object storage solution for the models and data.
- **MLServer (Model Deployment Service, HTTP Backend):**
  - Accessible at http://localhost:9595.
  - Use MLServer for deploying machine learning models with a RESTful API backend.
  - To test the deployed model, navigate to the root project folder and run:

    ```bash
    python model_query_example.py
    ```
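`model_query_example.py` is the repository's own client script. As an illustrative sketch of what such a query looks like against MLServer's V2 (Open Inference protocol) endpoint: the model name `mnist`, the input tensor name `input-0`, and the flattened `[1, 784]` input shape below are assumptions, not taken from the repository.

```python
import json
import urllib.request

def build_infer_payload(pixels):
    """Build a V2 (Open Inference protocol) request body for one
    flattened 28x28 grayscale image sent as FP32 data."""
    return {
        "inputs": [
            {
                "name": "input-0",   # input tensor name (assumed)
                "shape": [1, 784],   # one flattened 28x28 image (assumed)
                "datatype": "FP32",
                "data": pixels,
            }
        ]
    }

def query_model(pixels, model_name="mnist", host="http://localhost:9595"):
    """POST the payload to MLServer's V2 inference endpoint."""
    url = f"{host}/v2/models/{model_name}/infer"
    body = json.dumps(build_infer_payload(pixels)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # Build (but do not send) a payload for an all-zero dummy image.
    payload = build_infer_payload([0.0] * 784)
    print(len(payload["inputs"][0]["data"]))
```

The response follows the same protocol: an `outputs` list whose `data` field holds the model's predictions.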
The architecture of this project is centered around several key components orchestrated using Docker containers. Below is a high-level overview of the services involved:
- **Purpose:** Dagster is used as the pipeline orchestrator for managing and executing data pipelines.
- **Components:**
  - **Dagster PostgreSQL:** Runs a PostgreSQL database for storing Dagster's run storage, schedule storage, and event logs.
  - **Dagster User Code:** Runs the gRPC server that loads your user code, enabling Dagster to execute pipelines. It is configured to use the same image when launching runs in new containers.
  - **Dagster Webserver:** Provides a web interface for interacting with Dagster, where you can view and manage pipelines.
  - **Dagster Daemon:** Takes runs off the queue and launches them, and handles schedules and sensors.
- **Purpose:** MLflow is used for tracking experiments, registering models, and managing model versions.
- **Components:**
  - **MLflow PostgreSQL:** A PostgreSQL database for storing MLflow tracking metadata.
  - **MLflow Tracking Server:** Hosts the MLflow server for tracking experiments and managing models.
- **Purpose:** Minio is used as an object storage solution for storing datasets, models, and other artifacts.
- **Components:**
  - **Minio Server:** A high-performance object storage server.
  - **Minio Client (mc):** A command-line tool for interacting with Minio.
- **Purpose:** MLServer is used for deploying machine learning models via a RESTful API.
- **Components:**
  - **MLServer:** The main service responsible for serving the machine learning models.
- **Reference:** MLServer Documentation
- **Networks:** All services are connected through a `project_network` to facilitate communication between containers.
- **Volumes:** Persistent storage is managed using Docker volumes, ensuring that data persists across container restarts.
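As a hypothetical sketch of how such a shared network and named volumes look in Docker Compose (the service names, images, and volume names here are assumptions, not the repository's actual Compose file):

```yaml
# Illustrative fragment only -- services, images, and volume names assumed.
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow
    networks:
      - project_network
  mlflow-postgres:
    image: postgres:15
    networks:
      - project_network
    volumes:
      # Named volume: the database survives container restarts and removal.
      - mlflow_db_data:/var/lib/postgresql/data

networks:
  project_network:

volumes:
  mlflow_db_data:
```

Every service joins `project_network`, so containers reach each other by service name (e.g. `mlflow-postgres:5432`) rather than by IP.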
- **`deployment/`**
  - Contains Docker-related configurations and deployment scripts.
  - `dev/`: Development-specific Dockerfiles, Docker Compose files, and configuration files for setting up the environment.
  - `prod/`: The production-specific Docker Compose file.
  - `stage/`: The staging-specific Docker Compose file.
- **`e2eML/`**
  - The main package directory for the project. It contains submodules for the different stages of the machine learning lifecycle.
  - `clients/`: Code for interacting with external services such as MLServer.
  - `evaluate/`: Evaluation of trained machine learning models.
  - `inference/`: Scripts for making predictions with trained models.
  - `ingest/`: Data ingestion processes, such as loading datasets.
  - `models/`: Machine learning model definitions.
  - `orchestrator/`: Dagster pipeline orchestration code.
  - `pipeline_configs/`: Configuration files for the various pipelines.
  - `train/`: Scripts and modules for training machine learning models.