Update README + renaming #21

Merged: 3 commits, Feb 13, 2024
38 changes: 25 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -2,23 +2,35 @@

Simple ML pipeline repo for experimenting with CI/CD / DevOps / MLOps.

## The idea

This repo contains code to train, evaluate, and serve a simple machine learning model. The objective of this project is to work on my MLOps and ML engineering skills.

Some ideas I am implementing in this repo are:

- Do things as simply but professionally as possible. A simple working solution is better than a sophisticated solution that isn't deployed. ([Agile is the only thing that works](https://www.youtube.com/watch?v=9K20e7jlQPA).)
- [Oneflow](https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow) as the Git branching strategy.
- Cleanish architecture. For now, this means that code is separated between "core" and "non-core" tasks and data structures, and "core" code doesn't depend on "non-core" code.
- One repo and one docker image for train, eval, and serve. IMO, this makes sharing functionality across tasks easier and artifact versioning simpler. (But I'm interested in hearing about drawbacks, too.)
- Use Python standard tooling to make collaboration easier. In particular, the project is a Python package with [poetry](https://python-poetry.org/) as the build backend.
- CI is done using [pre-commit](https://pre-commit.com/) and GitHub Actions (since we're on GitHub).
- CD should be done depending on how the project is to be deployed. Currently, I'm experimenting with AWS for deployment, so I also use it for CD.
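The "cleanish architecture" point above can be illustrated with a minimal sketch (the names below are illustrative, not this repo's actual modules): "core" code is pure functions and data structures, while "non-core" code handles parsing, I/O, and wiring, and only ever depends on core.

```python
# Hypothetical sketch of the core / non-core split (names are illustrative).
from dataclasses import dataclass


# --- "core": pure logic, no I/O, no framework imports ---
@dataclass
class Model:
    weight: float
    bias: float


def predict(model: Model, x: float) -> float:
    # Pure function: output depends only on its inputs.
    return model.weight * x + model.bias


# --- "non-core": parsing and wiring; imports core, never the reverse ---
def serve_prediction(model: Model, raw_request: str) -> str:
    x = float(raw_request)          # request parsing lives outside core
    return str(predict(model, x))   # core does the actual work
```

For example, `serve_prediction(Model(2.0, 1.0), "3")` returns `"7.0"`; swapping FastAPI for another server would only touch the non-core layer.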

Since the point of this project is _not_ to sharpen my data analysis/science skills, the actual data for the project is completely simulated. Later, I may adapt it to solve a real problem.
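For a sense of what "completely simulated" data can look like, here is a toy sketch (this is an illustration under assumed names, not the repo's actual data-generating process): features drawn from normal distributions with a label driven by a noisy linear score, so a logistic regression can fit it.

```python
# Illustrative only: a toy simulated binary-classification dataset,
# not the actual data-generating process used in this repo.
import numpy as np
import pandas as pd


def generate_toy_data(n: int, seed: int) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Label depends on a noisy linear score, so a logistic model can learn it.
    score = 1.5 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
    return pd.DataFrame({"X1": x1, "X2": x2, "Y": (score > 0).astype(int)})


df = generate_toy_data(1_000, 813)
```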

## To do

- ~~Use Python 3.11 in CI github action~~
- ~~make pipelines functions~~
- ~~add loggers to stuff~~
- ~~add local deployment code...~~
- ~~add versioning to training... in deployment?~~
- ~~add eval pipeline, model comparison~~
- ~~add "best model" mark. add "get_best_model"~~
- ~~add Dockerfile~~
- Add section detailing v1 CD + deployment on AWS (with CodeBuild, ECR, Fargate ECS tasks and services, and ELB).
- Create deployment stack using IaC tool (could be AWS CloudFormation).
- Add real prediction logging func
- Add simple demo unit tests
- Add db conn / func to save inference cases
- Add build script to push to ECR (AWS deployment)

# Commands to remember

This is a bit inelegant. Sorry.

- python ml_pipelines/deployment/local/train.py
- python ml_pipelines/deployment/local/eval.py
- python ml_pipelines/deployment/local/serve.py
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -6,11 +6,11 @@
from pydantic import BaseModel, Field
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
transform_features,
)
-from ml_pipelines.logic.common.model import predict
+from ml_pipelines.core.common.model import predict


class Point(BaseModel):
File renamed without changes.
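The `logic` → `core` rename running through the hunks below touches the same import pattern in many files. As a hypothetical sketch (the PR may well have used `git mv` plus an editor's search-and-replace instead), such a rename could be scripted like this: move the package directory first, then rewrite every import that references the old dotted path.

```python
# Hypothetical sketch of scripting a package-wide rename such as
# ml_pipelines.logic -> ml_pipelines.core (illustrative, not the PR's method).
# Step 1 (outside this script): git mv ml_pipelines/logic ml_pipelines/core
# Step 2: rewrite the old dotted path in every .py file under the repo root.
from pathlib import Path


def rename_package_imports(root: Path, old: str, new: str) -> int:
    """Replace `old` with `new` in every .py file under root; return files touched."""
    touched = 0
    for path in root.rglob("*.py"):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            touched += 1
    return touched
```

Running `rename_package_imports(Path("."), "ml_pipelines.logic", "ml_pipelines.core")` after the `git mv` would produce exactly the kind of two-line import diffs shown below.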
4 changes: 2 additions & 2 deletions ml_pipelines/deployment/aws/io.py
@@ -10,8 +10,8 @@
from matplotlib.figure import Figure
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import FeatureEngineeringParams
-from ml_pipelines.logic.serve.serve import Point
+from ml_pipelines.core.common.feature_eng import FeatureEngineeringParams
+from ml_pipelines.core.serve.serve import Point
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


2 changes: 1 addition & 1 deletion ml_pipelines/deployment/common/serve.py
@@ -3,7 +3,7 @@

import uvicorn

-from ml_pipelines.logic.serve.serve import PredictionLoggingFunc, create_fastapi_app
+from ml_pipelines.core.serve.serve import PredictionLoggingFunc, create_fastapi_app
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


4 changes: 2 additions & 2 deletions ml_pipelines/deployment/local/io.py
@@ -8,8 +8,8 @@
from matplotlib.figure import Figure
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import FeatureEngineeringParams
-from ml_pipelines.logic.serve.serve import Point
+from ml_pipelines.core.common.feature_eng import FeatureEngineeringParams
+from ml_pipelines.core.serve.serve import Point
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


2 changes: 1 addition & 1 deletion ml_pipelines/pipeline/data_gen_pipeline.py
@@ -1,4 +1,4 @@
-from ml_pipelines.logic.common.dgp import generate_raw_data
+from ml_pipelines.core.common.dgp import generate_raw_data

data = generate_raw_data(10_000, 813)
data.to_csv("raw_data.csv", index=False)
6 changes: 3 additions & 3 deletions ml_pipelines/pipeline/eval_pipeline.py
@@ -4,12 +4,12 @@
import pandas as pd
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
transform_features,
)
-from ml_pipelines.logic.common.model import predict
-from ml_pipelines.logic.eval.eval import (
+from ml_pipelines.core.common.model import predict
+from ml_pipelines.core.eval.eval import (
calculate_metrics,
make_calibration_plot,
make_roc_plot,
4 changes: 2 additions & 2 deletions ml_pipelines/pipeline/train_pipeline.py
@@ -4,12 +4,12 @@
import pandas as pd
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
fit_feature_transform,
transform_features,
)
-from ml_pipelines.logic.train.train import split_data, train_model
+from ml_pipelines.core.train.train import split_data, train_model


class TrainArtifacts(TypedDict):