diff --git a/examples/machine-learning_ClassicAI/mlflow-server/README.md b/examples/machine-learning_ClassicAI/mlflow-server/README.md new file mode 100644 index 00000000..66dadb90 --- /dev/null +++ b/examples/machine-learning_ClassicAI/mlflow-server/README.md @@ -0,0 +1,13 @@ +# MLflow Tracking Server Template + +This template demonstrates how to use MLflow to track, log, and manage machine learning experiments in a single notebook. It trains a Random Forest model on the Diabetes dataset, logs parameters, metrics, and artifacts, and shows how to view runs and reload logged models from a local store or a remote MLflow tracking server. + +You can deploy MLflow-tracked models on platforms like **Saturn Cloud**; refer to the Saturn Cloud documentation for deployment guidance. + +--- + +## References + +* [MLflow Documentation](https://mlflow.org/docs/latest/index.html) +* [Saturn Cloud Docs](https://saturncloud.io/docs/) +* [Scikit-learn RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) \ No newline at end of file diff --git a/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb b/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb new file mode 100644 index 00000000..8497df8f --- /dev/null +++ b/examples/machine-learning_ClassicAI/mlflow-server/mlflow-tracking.ipynb @@ -0,0 +1,237 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "JwE9FH9oFOMb" + }, + "source": [ + "# MLflow Tracking Server\n", + "\n", + "**MLflow** is an open-source platform that simplifies the tracking, comparison, and deployment of machine learning experiments.\n", + "\n", + "In this example template, you’ll use MLflow to **track training runs**, **log parameters and metrics**, and **store models** for future reuse, all within a single notebook.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oV2djjnvFOMj" + }, + "source": [ + "Install **MLflow**, 
**Gradio**, **PyTorch**, and supporting libraries including **scikit-learn**, **matplotlib**, and **pandas**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "b5yEqAXTFOMk" + }, + "outputs": [], + "source": [ + "!pip install -q mlflow scikit-learn matplotlib pandas gradio torch\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IppJqmrgFOMm" + }, + "source": [ + "Import MLflow, run a quick GPU check with PyTorch, and load the helper libraries used throughout.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "F31VfPfuFOMn" + }, + "outputs": [], + "source": [ + "import mlflow, os, torch, pandas as pd, matplotlib.pyplot as plt, gradio as gr\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.datasets import load_diabetes\n", + "from sklearn.ensemble import RandomForestRegressor\n", + "\n", + "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", + "print(f'✅ Using device: {device}')\n", + "if device == 'cpu':\n", + " print('⚠️ Running on CPU. That is fine here: scikit-learn trains the Random Forest on the CPU.')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "fyS8Q66nFOMp" + }, + "source": [ + "By default, MLflow saves runs to the local **`mlruns/`** directory. 
You can switch to a **remote tracking server** later by setting a different tracking URI.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Le6Ec_F_FOMq" + }, + "outputs": [], + "source": [ + "mlflow.set_tracking_uri('file:///content/mlruns')\n", + "mlflow.set_experiment('mlflow_tracking_demo')\n", + "print('🎯 Tracking URI:', mlflow.get_tracking_uri())\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eTSpUqXLFOMs" + }, + "source": [ + "The helper below fetches experiment metadata, parameters, and metrics from your local `mlruns/` directory (or from a remote server if one is configured) and returns them as a table.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tNSyaQmHFOMu" + }, + "outputs": [], + "source": [ + "from mlflow.tracking import MlflowClient\n", + "\n", + "def show_mlflow_runs_table(experiment_name=\"mlflow_tracking_demo\"):\n", + " \"\"\"Return all MLflow runs as a DataFrame (similar to the MLflow UI table).\"\"\"\n", + " client = MlflowClient()\n", + " experiment = client.get_experiment_by_name(experiment_name)\n", + "\n", + " if not experiment:\n", + " return pd.DataFrame({\"Info\": [\"No experiment found. 
Run a training cell first.\"]})\n", + " runs = client.search_runs([experiment.experiment_id])\n", + " if not runs:\n", + " return pd.DataFrame({\"Info\": [\"No runs logged yet.\"]})\n", + "\n", + " rows = []\n", + " for r in runs:\n", + " row = {\n", + " \"Run ID\": r.info.run_id,\n", + " \"Status\": r.info.status,\n", + " \"Start Time\": pd.to_datetime(r.info.start_time, unit=\"ms\"),\n", + " \"End Time\": pd.to_datetime(r.info.end_time, unit=\"ms\"),\n", + " \"Duration (s)\": round((r.info.end_time - r.info.start_time) / 1000, 2)\n", + " if r.info.end_time else None,\n", + " }\n", + " row.update(r.data.params)\n", + " row.update(r.data.metrics)\n", + " rows.append(row)\n", + "\n", + " df = pd.DataFrame(rows)\n", + " main_cols = [\"Run ID\", \"Status\", \"Start Time\", \"End Time\", \"Duration (s)\"]\n", + " other_cols = [c for c in df.columns if c not in main_cols]\n", + " df = df[main_cols + other_cols]\n", + " print(f\"✅ Showing {len(df)} runs from experiment '{experiment_name}'\")\n", + " return df\n", + "\n", + "runs_df = show_mlflow_runs_table()\n", + "display(runs_df)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oMeP7H8qFOMw" + }, + "source": [ + "Let's train a small **Random Forest** on the Diabetes dataset and log its parameters, metrics, and the model artifact to MLflow.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "tqqgobq6FOMy" + }, + "outputs": [], + "source": [ + "from mlflow.models.signature import infer_signature\n", + "\n", + "with mlflow.start_run() as run:\n", + " db = load_diabetes()\n", + " X_train, X_test, y_train, y_test = train_test_split(db.data, db.target, test_size=0.2, random_state=42)\n", + "\n", + " model = RandomForestRegressor(n_estimators=100, max_depth=6, random_state=42)\n", + " model.fit(X_train, y_train)\n", + " preds = model.predict(X_test)\n", + " signature = infer_signature(X_test, preds)\n", + "\n", + " mlflow.log_params({'n_estimators': 100, 'max_depth': 
6})\n", + " mlflow.log_metric('mean_prediction', float(preds.mean()))\n", + " mlflow.sklearn.log_model(model, 'model', signature=signature)\n", + "\n", + " print(f'Run ID: {run.info.run_id}')\n", + " print('✅ Training and logging complete!')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KwHrJpK9FOMz" + }, + "source": [ + "Use the run ID to load the stored model for inference.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "iB6HPf3kFOM0" + }, + "outputs": [], + "source": [ + "run_id = run.info.run_id\n", + "loaded_model = mlflow.sklearn.load_model(f'runs:/{run_id}/model')\n", + "print('✅ Model loaded successfully!')\n", + "print('Sample predictions:', loaded_model.predict(X_test[:5]))\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7_ULM2cqFOM0" + }, + "source": [ + "You've now configured MLflow tracking locally (the same setup can be configured for a remote MLflow tracking server) and logged parameters, metrics, and model artifacts.\n", + "\n", + "You can also reload a trained model from a specific run using its run ID. For guidance on deploying to Saturn Cloud, see the [Saturn Cloud documentation](https://saturncloud.io/docs)." + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.7" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}