Update README + renaming #21

Merged: 3 commits, Feb 13, 2024
38 changes: 25 additions & 13 deletions README.md
Original file line number Diff line number Diff line change
@@ -2,23 +2,35 @@

Simple ML pipeline repo for experimenting with CI/CD / DevOps / MLOps.

## The idea

This repo contains code to train, evaluate, and serve a simple machine learning model. The objective of this project is to work on my MLOps and ML engineering skills.

Some ideas I am implementing in this repo are:

- Do things as simply but professionally as possible. A simple working solution is better than a sophisticated solution that isn't deployed. ([Agile is the only thing that works](https://www.youtube.com/watch?v=9K20e7jlQPA).)
- [Oneflow](https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow) as the Git branching strategy.
- Cleanish architecture. For now, this means that code is separated between "core" and "non-core" tasks and data structures, and "core" code doesn't depend on "non-core" code.
- One repo and one docker image for train, eval, and serve. IMO, this makes sharing functionality across tasks easier and artifact versioning simpler. (But I'm interested in hearing about drawbacks, too.)
- Use Python standard tooling to make collaboration easier. In particular, the project is a Python package with [poetry](https://python-poetry.org/) as the build backend.
- CI is done using [pre-commit](https://pre-commit.com/) and GitHub Actions (since we're on GitHub).
- CD should be done depending on how the project is to be deployed. Currently, I'm experimenting with AWS for deployment, so I also use it for CD.
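The "cleanish architecture" point above can be illustrated with a minimal sketch (the names below are illustrative, not this repo's actual modules): "core" code is pure functions and data structures, while "non-core" code handles parsing, I/O, and wiring, and only ever depends on core.

```python
# Hypothetical sketch of the core / non-core split (names are illustrative).
from dataclasses import dataclass


# --- "core": pure logic, no I/O, no framework imports ---
@dataclass
class Model:
    weight: float
    bias: float


def predict(model: Model, x: float) -> float:
    # Pure function: output depends only on its inputs.
    return model.weight * x + model.bias


# --- "non-core": parsing and wiring; imports core, never the reverse ---
def serve_prediction(model: Model, raw_request: str) -> str:
    x = float(raw_request)          # request parsing lives outside core
    return str(predict(model, x))   # core does the actual work
```

For example, `serve_prediction(Model(2.0, 1.0), "3")` returns `"7.0"`; swapping FastAPI for another server would only touch the non-core layer.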

Since the point of this project is _not_ to sharpen my data analysis/science skills, the actual data for the project is completely simulated. Later, I may adapt it to solve a real problem.
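For a sense of what "completely simulated" data can look like, here is a toy sketch (this is an illustration under assumed names, not the repo's actual data-generating process): features drawn from normal distributions with a label driven by a noisy linear score, so a logistic regression can fit it.

```python
# Illustrative only: a toy simulated binary-classification dataset,
# not the actual data-generating process used in this repo.
import numpy as np
import pandas as pd


def generate_toy_data(n: int, seed: int) -> pd.DataFrame:
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Label depends on a noisy linear score, so a logistic model can learn it.
    score = 1.5 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=n)
    return pd.DataFrame({"X1": x1, "X2": x2, "Y": (score > 0).astype(int)})


df = generate_toy_data(1_000, 813)
```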

## To do

- ~~Use Python 3.11 in CI github action~~
- ~~make pipelines functions~~
- ~~add loggers to stuff~~
- ~~add local deployment code...~~
- ~~add versioning to training... in deployment?~~
- ~~add eval pipeline, model comparison~~
- ~~add "best model" mark. add "get_best_model"~~
- ~~add Dockerfile~~
- Add section detailing v1 CD + deployment on AWS (with CodeBuild, ECR, Fargate ECS tasks and services, and ELB).
- Create deployment stack using IaC tool (could be AWS CloudFormation).
- Add real prediction logging func
- Add simple demo unit tests
- Add db conn / func to save inference cases
- Add build script to push to ECR (AWS deployment)

# Commands to remember

This is a bit inelegant. Sorry.

- python ml_pipelines/deployment/local/train.py
- python ml_pipelines/deployment/local/eval.py
- python ml_pipelines/deployment/local/serve.py
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -6,11 +6,11 @@
from pydantic import BaseModel, Field
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
transform_features,
)
-from ml_pipelines.logic.common.model import predict
+from ml_pipelines.core.common.model import predict


class Point(BaseModel):
File renamed without changes.
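The `logic` → `core` rename running through the hunks below touches the same import pattern in many files. As a hypothetical sketch (the PR may well have used `git mv` plus an editor's search-and-replace instead), such a rename could be scripted like this: move the package directory first, then rewrite every import that references the old dotted path.

```python
# Hypothetical sketch of scripting a package-wide rename such as
# ml_pipelines.logic -> ml_pipelines.core (illustrative, not the PR's method).
# Step 1 (outside this script): git mv ml_pipelines/logic ml_pipelines/core
# Step 2: rewrite the old dotted path in every .py file under the repo root.
from pathlib import Path


def rename_package_imports(root: Path, old: str, new: str) -> int:
    """Replace `old` with `new` in every .py file under root; return files touched."""
    touched = 0
    for path in root.rglob("*.py"):
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            touched += 1
    return touched
```

Running `rename_package_imports(Path("."), "ml_pipelines.logic", "ml_pipelines.core")` after the `git mv` would produce exactly the kind of two-line import diffs shown below.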
4 changes: 2 additions & 2 deletions ml_pipelines/deployment/aws/io.py
@@ -10,8 +10,8 @@
from matplotlib.figure import Figure
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import FeatureEngineeringParams
-from ml_pipelines.logic.serve.serve import Point
+from ml_pipelines.core.common.feature_eng import FeatureEngineeringParams
+from ml_pipelines.core.serve.serve import Point
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


2 changes: 1 addition & 1 deletion ml_pipelines/deployment/common/serve.py
@@ -3,7 +3,7 @@

import uvicorn

-from ml_pipelines.logic.serve.serve import PredictionLoggingFunc, create_fastapi_app
+from ml_pipelines.core.serve.serve import PredictionLoggingFunc, create_fastapi_app
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


4 changes: 2 additions & 2 deletions ml_pipelines/deployment/local/io.py
@@ -8,8 +8,8 @@
from matplotlib.figure import Figure
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import FeatureEngineeringParams
-from ml_pipelines.logic.serve.serve import Point
+from ml_pipelines.core.common.feature_eng import FeatureEngineeringParams
+from ml_pipelines.core.serve.serve import Point
from ml_pipelines.pipeline.train_pipeline import TrainArtifacts


2 changes: 1 addition & 1 deletion ml_pipelines/pipeline/data_gen_pipeline.py
@@ -1,4 +1,4 @@
-from ml_pipelines.logic.common.dgp import generate_raw_data
+from ml_pipelines.core.common.dgp import generate_raw_data

data = generate_raw_data(10_000, 813)
data.to_csv("raw_data.csv", index=False)
6 changes: 3 additions & 3 deletions ml_pipelines/pipeline/eval_pipeline.py
@@ -4,12 +4,12 @@
import pandas as pd
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
transform_features,
)
-from ml_pipelines.logic.common.model import predict
-from ml_pipelines.logic.eval.eval import (
+from ml_pipelines.core.common.model import predict
+from ml_pipelines.core.eval.eval import (
calculate_metrics,
make_calibration_plot,
make_roc_plot,
4 changes: 2 additions & 2 deletions ml_pipelines/pipeline/train_pipeline.py
@@ -4,12 +4,12 @@
import pandas as pd
from sklearn.linear_model import LogisticRegression

-from ml_pipelines.logic.common.feature_eng import (
+from ml_pipelines.core.common.feature_eng import (
FeatureEngineeringParams,
fit_feature_transform,
transform_features,
)
-from ml_pipelines.logic.train.train import split_data, train_model
+from ml_pipelines.core.train.train import split_data, train_model


class TrainArtifacts(TypedDict):