KServe ML Deployment Project

This repository provides a batteries-included template for serving machine-learning models on Kubernetes with the following stack:

Istio for service mesh & ingress
Knative Serving for serverless autoscaling
KServe for model management & prediction endpoints

The goal is to give you an opinionated, yet extensible starting point that you can run locally on Kind or promote to any managed Kubernetes service.

Architecture Overview

The following diagram illustrates the complete request flow when a user uploads an image for circular object detection:

flowchart TB
    User[User/Client] 
    
    subgraph K8s["Kubernetes Cluster"]
        Gateway[Istio Gateway :80/:443]
        
        subgraph NS1["Namespace: aiq-backend"]
            Service[Backend Service]
            
            subgraph BackendPod["Backend Pod"]
                Envoy1[Envoy Proxy]
                FastAPI[FastAPI :8000]
            end
            
            Storage[(PVC Storage)]
            DB[(SQLite DB)]
        end
        
        subgraph NS2["Namespace: aiq-detector"]
            ModelService[Model Service]
            
            subgraph ModelDeployment["KServe Deployment"]
                Knative[Knative Autoscaler]
                
                subgraph ModelPod["Model Pod"]
                    Envoy2[Envoy Proxy]
                    Model[Model Server :8080]
                end
            end
        end
        
        Gateway <--> Service
        Service <--> Envoy1
        Envoy1 <--> FastAPI
        FastAPI <--> Storage
        FastAPI <--> DB
        FastAPI <--> ModelService
        ModelService <--> Knative
        Knative <--> ModelPod
        Envoy2 <--> Model
    end
    
    User <--> Gateway
    
    style Gateway fill:#4285F4,color:#fff
    style FastAPI fill:#34A853,color:#fff
    style Model fill:#FBBC04,color:#000
    style Knative fill:#EA4335,color:#fff

Request Flow Steps

User → Istio Gateway: Client sends a POST request to the /images/ endpoint
Gateway → Backend Service: The VirtualService routes the request to the backend service
Service → Pod: The Kubernetes Service load-balances the request to a backend pod
Envoy → FastAPI: The sidecar proxy terminates mTLS and forwards the request to FastAPI over localhost
FastAPI Processing:
- Stores image in PVC-backed storage
- Saves metadata to SQLite database
- Prepares inference request
FastAPI → Model Service: Sends the base64-encoded image for inference
Knative Autoscaling: Scales model pod from 0 to 1 if needed
Model Inference: Detects circles and returns bounding boxes
Response Flow: Results flow back through the same path
User Response: Client receives JSON with detected objects
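The FastAPI → Model Service step can be sketched with the standard library as below. This assumes the model server follows KServe's V1 inference protocol (POST /v1/models/<name>:predict with an "instances" list) and accepts the image as a {"b64": ...} object; the payload shape, the model name aiq-detector, and the helper names are hypothetical, and base_url would typically come from MODEL_SERVER_URL.

```python
import base64
import json
import urllib.request


# Assumption: the model server speaks KServe's V1 protocol and accepts images
# as {"b64": ...} objects inside "instances"; adjust to the real schema.
def build_predict_request(base_url: str, model_name: str, image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a V1 predict request carrying a base64-encoded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    body = json.dumps({"instances": [{"image": {"b64": encoded}}]}).encode("utf-8")
    url = f"{base_url.rstrip('/')}/v1/models/{model_name}:predict"
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(req: urllib.request.Request, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON reply (timeout is a guess to cover cold starts)."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

The generous default timeout is an assumption meant to absorb a Knative cold start, when the model pod is scaled from 0 to 1 before it can answer.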

Detailed Request Flow

sequenceDiagram
    box External
        participant U as User
    end
    
    box Kubernetes Cluster
        participant IG as Istio Gateway
        participant VS as VirtualService
        participant EP as Envoy Proxy
        participant API as FastAPI Backend
        participant S as Storage PVC
        participant DB as Database
        participant KS as KServe
        participant M as Model Server
    end
    
    U->>IG: POST /images/ with image file
    IG->>VS: Route based on path
    VS->>EP: Forward to backend service
    EP->>API: Forward request after mTLS termination
    API->>S: Store image file
    API->>DB: Save image metadata
    API->>KS: Request inference with base64 image
    Note over KS: Knative scales from 0 to 1
    KS->>M: Forward to model pod
    M->>M: Detect circles
    M->>KS: Return detections
    KS->>API: JSON response
    API->>DB: Save detected objects
    API->>EP: Response with detections
    EP->>VS: Return via mesh
    VS->>IG: Route response
    IG->>U: JSON result

Workflow Description

User Upload: Client sends a POST request to /api/v1/images/ with an image file
Istio Ingress: Request enters through Istio Gateway and is routed by VirtualService rules
Service Mesh: Envoy sidecar proxy handles mTLS, observability, and load balancing
FastAPI Backend:
- Receives and validates the image
- Stores image in persistent storage (PVC-backed filesystem)
- Registers metadata in database (image ID, path, timestamp)
Model Inference Request: Backend prepares inference request with base64-encoded image
KServe/Knative Autoscaling:
- Knative receives the request (via its activator while the model is scaled to zero)
- Scales model pod from 0 to 1 (cold start) or routes to existing pod
- Creates revision-specific pods on demand
Model Processing:
- Model server receives image
- PyTorch/TensorFlow model detects circular objects
- Returns bounding boxes, centroids, and confidence scores
Response Processing:
- Backend receives detection results
- Saves detected objects to database
- Returns a JSON response with the detected objects to the user
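The backend steps above can be summarized in a minimal sketch, assuming a hypothetical SQLite schema (images and objects tables) and a detect callable that stands in for either dummy mode or the real KServe call. The table layout, file naming, and function names are illustrative, not the backend's actual code.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable

# Hypothetical schema; the real backend's tables and columns may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    id TEXT PRIMARY KEY,
    path TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS objects (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_id TEXT NOT NULL REFERENCES images(id),
    bbox TEXT NOT NULL,
    score REAL NOT NULL
);
"""


def handle_upload(
    conn: sqlite3.Connection,
    storage_dir: Path,
    data: bytes,
    detect: Callable[[bytes], list[dict]],
) -> dict:
    """Store the image, register its metadata, run detection, and persist the objects."""
    image_id = uuid.uuid4().hex
    path = storage_dir / f"{image_id}.bin"
    path.write_bytes(data)  # on the cluster this directory would be the PVC mount
    conn.execute(
        "INSERT INTO images (id, path, created_at) VALUES (?, ?, ?)",
        (image_id, str(path), datetime.now(timezone.utc).isoformat()),
    )
    detections = detect(data)  # in real mode this would call the KServe endpoint
    conn.executemany(
        "INSERT INTO objects (image_id, bbox, score) VALUES (?, ?, ?)",
        [(image_id, json.dumps(d["bbox"]), d["score"]) for d in detections],
    )
    conn.commit()
    return {"image_id": image_id, "objects": detections}


def dummy_detect(_: bytes) -> list[dict]:
    """Stand-in for dummy mode: always returns one fixed detection."""
    return [{"bbox": [10, 10, 50, 50], "score": 0.9}]
```

Keeping detect injectable lets the same handler run in dummy mode and in real mode without code changes.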

Key Components

Istio Gateway: Entry point for all external traffic, handles TLS termination
VirtualService: Defines routing rules for different endpoints
Envoy Proxy: Sidecar container providing service mesh capabilities
Knative Serving: Provides serverless scaling, including scale-to-zero
KServe: Manages model deployment, versioning, and inference endpoints
Persistent Storage: Ensures data survives pod restarts (critical for SQLite)

Project Structure

.
├── environments/           # Environment-specific Kubernetes manifests
│   ├── dev/                # Development environment configs
│   ├── stage/              # Staging environment configs
│   ├── prod/               # Production configs (Helm values, manifests)
│   └── local/              # Local Kind cluster setup
│       ├── aiq_detector/   # Model server deployment manifests
│       ├── backend/        # Backend service K8s manifests
│       │   ├── *.yaml      # Deployments, services, storage, Istio routing
│       │   ├── deploy.sh   # Automated deployment script
│       │   └── test-istio.sh # Test script with Istio integration
│       ├── test/           # Demo model and test payloads
│       ├── install_kserve_knative.sh  # KServe/Knative installation
│       ├── setup_ingress_routing.sh   # Ingress configuration
│       ├── setup_kind.sh   # Kind cluster setup
│       └── README.md       # Local environment documentation
├── services/               # Microservices
│   ├── backend/            # FastAPI backend service
│   │   ├── aiq_circular_detection/  # Main application package
│   │   ├── config/         # Configuration management
│   │   ├── tests/          # Unit and integration tests
│   │   ├── data/           # Backend data storage
│   │   ├── Dockerfile      # Optimized multi-stage build
│   │   ├── docker-compose.yml  # Docker Compose configuration
│   │   ├── pyproject.toml  # Python project configuration
│   │   ├── uv.lock         # Dependency lock file
│   │   ├── start-dev.sh    # Development server startup
│   │   ├── start-dev-real.sh  # Real mode development server
│   │   └── test_full_integration.sh  # Integration tests
│   └── evaluation/         # Model evaluation module
│       ├── dataset/        # Evaluation dataset
│       ├── output/         # Evaluation results
│       ├── evaluate_model.py  # Evaluation script
│       ├── requirements.txt   # Evaluation dependencies
│       ├── run_evaluation.sh  # Execution script
│       └── README.md       # Evaluation documentation
├── run_all.sh              # Unified runner script (local/kind modes)
├── run_all_k8s.sh          # Kubernetes deployment script
└── README.md               # This file

Getting Started

Prerequisites

Docker Desktop with Kubernetes enabled
kind - Kubernetes in Docker
kubectl - Kubernetes CLI
kustomize - Tool for customizing Kubernetes YAML configurations
jq - JSON processor
uv - Fast Python package manager (installation guide)
just - Modern command runner (installation guide) - Required

Quick Start

Option 1: Local Development (No Kubernetes)

For rapid development and testing without Kubernetes or Docker, run all services locally on your machine:

# Run all services locally (model server + backend + tests)
just dev

This command will:

Start the AI model server on port 9090
Start the backend API service on port 8000
Run integration tests automatically
Run model evaluation (if dataset is available)
Display performance metrics summary
Keep services running for manual testing

Option 2: Kubernetes Development (Kind)

For testing in a real Kubernetes environment:

# Set up Kubernetes development environment (one-time setup)
just dev --k8s          # or just dev -k

# Run tests on Kubernetes (auto-detects and sets up if needed)
just test --k8s         # or just test -k

# Clean up when done
just clean --k8s        # or just clean -k

Advanced Options:

# Clean first, then setup fresh infrastructure
just dev --k8s --clean

# Force delete entire cluster
just clean --k8s --force

Key Features:

just test --k8s automatically sets up infrastructure if it doesn't exist
just dev --k8s sets up and keeps infrastructure running for development
Infrastructure persists between test runs for faster iteration
Smart auto-detection prevents redundant setup

What happens during setup:

Creates a Kind cluster with proper configuration
Installs KServe, Knative, Istio, and Cert-Manager
Sets up ingress routing for Kind
Builds and deploys the model server
Builds and deploys the backend service
Runs integration tests
Performs model evaluation (optional)

To enable automatic model evaluation:

Place your evaluation dataset in services/evaluation/dataset/
Include the _annotations.coco.json file alongside the image files
Evaluation then runs automatically and its results are displayed

Manual Local Testing

If you prefer to run services individually:

Start the Model Server:

# Interactive mode (blocks terminal)
just model-server

# Background mode (non-blocking)
just model-server --background
just model-server -b            # Short flag
# Model server will run on http://localhost:9090
# Swagger UI: http://localhost:9090/docs

Start the Backend Service:

# Dummy mode (no model server required)
just backend
just backend --background       # Background mode
just backend -b                 # Background mode (short)

# Real mode (requires model server running)
just backend --real             # Interactive
just backend --real --background # Background
just backend -r -b              # Real + background (short flags)
# Backend API will run on http://localhost:8000
# API Docs: http://localhost:8000/docs

Run Integration Tests:

# Automatic mode (starts services, runs tests, cleans up)
just test

# Manual mode (assumes services already running)
just test --manual
just test -m                    # Short flag

# Run tests on Kubernetes (auto-setup if needed)
just test --k8s
just test -k                    # Short flag

Cleanup:

just clean                      # Stop all local services
just clean --k8s                # Clean up Kubernetes resources
just clean -k                   # Clean up Kubernetes (short)

Troubleshooting Local Development

Port conflicts: Use just clean to stop all services and free ports
Model download: The first run downloads the AI model (~300 MB)
Logs: Use just logs to view service logs, or check the logs/ directory
Service status: Use just status and just health to check service state
Dependencies: Use just check-deps to verify that the required tools are installed
Cleanup: Use just clean for local services and just clean --k8s for Kubernetes
Help: Run just with no arguments to list all available commands, grouped by category

API Endpoints

Once deployed, the backend service provides:

POST /images/ - Upload image for circle detection
GET /images/{image_id}/objects - List detected objects for an image
GET /images/{image_id}/objects/{object_id} - Get object details
GET /health - Health check endpoint
GET /docs - Interactive API documentation (Swagger UI)
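A minimal standard-library client for these endpoints might look like this. It assumes the upload endpoint takes a multipart form field named file (a common FastAPI UploadFile convention, not confirmed here) and that responses are JSON; the helper names are hypothetical.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Any


def encode_multipart(field: str, filename: str, data: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(base_url: str, image_path: Path, field: str = "file") -> Any:
    """POST an image to /images/ (the form field name is an assumption)."""
    body, content_type = encode_multipart(field, image_path.name, image_path.read_bytes())
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/images/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def list_objects(base_url: str, image_id: str) -> Any:
    """GET /images/{image_id}/objects and decode the JSON body."""
    url = f"{base_url.rstrip('/')}/images/{image_id}/objects"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, upload_image("http://localhost:8000", Path("circles.png")) would target a backend started locally with just backend (circles.png is a placeholder file name).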

Configuration

The backend service can operate in two modes:

Dummy Mode (default): Returns mock detection results for testing
Real Mode: Connects to actual KServe model endpoint

Configure via environment variables:

MODE: "dummy"  # or "real"
MODEL_SERVER_URL: "http://model-service.namespace.svc.cluster.local"
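A sketch of how these two variables might be read and validated. The Settings dataclass and load_settings function are hypothetical (the backend's config/ package is not shown here), and treating a missing MODEL_SERVER_URL as an error in real mode is an assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"dummy", "real"}


@dataclass(frozen=True)
class Settings:
    mode: str
    model_server_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read MODE and MODEL_SERVER_URL, defaulting to dummy mode."""
    mode = env.get("MODE", "dummy").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    url = env.get("MODEL_SERVER_URL") or None
    # Assumption: real mode cannot work without a model endpoint.
    if mode == "real" and url is None:
        raise ValueError("MODEL_SERVER_URL is required when MODE=real")
    return Settings(mode=mode, model_server_url=url)
```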

Production Deployment

For production environments:

Replace SQLite with PostgreSQL or MySQL for multi-replica support
Use cloud storage (S3, GCS, Azure Storage) instead of local filesystem
Configure proper ingress with TLS certificates and domain names
Set resource limits and autoscaling policies
Enable monitoring with Prometheus and distributed tracing

See environments/local/backend/DEPLOYMENT.md for detailed deployment instructions.

Model Evaluation

The project includes a comprehensive evaluation module to assess model performance on circular object detection tasks.

Evaluation Metrics

The evaluation module uses industry-standard computer vision metrics:

Jaccard Index (IoU): Measures the overlap (intersection over union) between predicted and ground-truth regions
- Both simple and weighted averages are computed
- Range: 0 to 1 (higher is better)
F1 Score: The harmonic mean of precision and recall
- Precision: Ratio of correct detections to total detections
- Recall: Ratio of detected objects to total ground-truth objects
- Uses an IoU threshold of 0.5 for matching
Hungarian Assignment: Optimally matches predictions to ground-truth objects one-to-one
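The metrics above fit together roughly as follows. This sketch assumes axis-aligned boxes in (x_min, y_min, x_max, y_max) form and, for readability, finds the optimal one-to-one matching by exhaustive search, which reaches the same maximum total IoU as the Hungarian algorithm but only scales to a few boxes per image; a real evaluator would typically call scipy.optimize.linear_sum_assignment instead. The function names are hypothetical.

```python
from itertools import permutations

Box = tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Jaccard index (intersection over union) of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def optimal_matches(preds: list[Box], gts: list[Box]) -> list[tuple[int, int]]:
    """One-to-one (pred_idx, gt_idx) pairs maximizing total IoU, by exhaustive search."""
    swap = len(preds) > len(gts)
    small, large = (gts, preds) if swap else (preds, gts)
    best: list[tuple[int, int]] = []
    best_total = -1.0
    # Try every assignment of the smaller set onto distinct items of the larger set.
    for perm in permutations(range(len(large)), len(small)):
        total = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if total > best_total:
            best_total = total
            best = [(j, i) if swap else (i, j) for i, j in enumerate(perm)]
    return best


def precision_recall_f1(preds: list[Box], gts: list[Box], threshold: float = 0.5) -> tuple[float, float, float]:
    """Matched pairs with IoU >= threshold count as true positives."""
    tp = sum(1 for p, g in optimal_matches(preds, gts) if iou(preds[p], gts[g]) >= threshold)
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For instance, one perfect hit plus one spurious prediction against a single ground-truth box gives precision 0.5, recall 1.0, and F1 = 2 · 0.5 · 1.0 / 1.5 ≈ 0.667.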

Running Evaluation

Prepare dataset in COCO format:

cd services/evaluation
mkdir -p dataset
# Copy COCO annotations and images
cp /path/to/_annotations.coco.json dataset/
cp /path/to/images/*.jpg dataset/
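Before running the evaluation, a quick sanity check of the dataset can catch missing files early. This sketch assumes the standard COCO layout (images, annotations, and a bbox given as [x, y, width, height]) and converts each box to corner form; load_coco is a hypothetical helper, not something this README attributes to evaluate_model.py.

```python
import json
from collections import defaultdict
from pathlib import Path

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def load_coco(dataset_dir: Path, ann_name: str = "_annotations.coco.json") -> dict[str, list[Box]]:
    """Map each image file name to its ground-truth boxes in corner form.

    COCO stores bbox as [x, y, width, height]; image files listed in the
    annotations but missing on disk raise FileNotFoundError.
    """
    coco = json.loads((dataset_dir / ann_name).read_text(encoding="utf-8"))
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    missing = [n for n in names.values() if not (dataset_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"{len(missing)} image(s) listed in {ann_name} not found, e.g. {missing[0]}")
    boxes: dict[str, list[Box]] = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]
        boxes[names[ann["image_id"]]].append((x, y, x + w, y + h))
    # Images without annotations still appear, with an empty box list.
    return {name: boxes.get(name, []) for name in names.values()}
```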

Run evaluation:

cd services/evaluation
./run_evaluation.sh

The evaluation generates:

Detailed metrics report (precision, recall, F1 score, Jaccard Index)
Annotated images showing predictions (red) vs ground truth (green)
Per-image performance breakdowns

See services/evaluation/README.md for detailed documentation.

Development

Quick Commands

All development tasks are managed through Just commands:

# View all available commands organized by category
just

# Development workflow
just dev              # Start local development environment
just dev --k8s        # Start Kubernetes development environment
just test             # Run integration tests (auto-manages services)
just test --manual    # Run tests (assumes services running)
just test --k8s       # Run tests on Kubernetes (auto-setup if needed)

# Service management
just model-server               # Start model server (interactive)
just model-server --background  # Start model server (background)
just model-server -b            # Start model server (background, short)
just backend                    # Start backend (dummy mode)
just backend --real             # Start backend (real mode)
just backend --real --background # Real mode + background
just backend -r -b              # Real mode + background (short flags)
just clean                      # Clean up all local services
just clean --k8s                # Clean up Kubernetes resources

# Evaluation and testing
just eval             # Run model evaluation (requires dataset)
just lint             # Run code linting
just pytest           # Run unit tests only

# Utilities
just status           # Check service status
just health           # Check service health
just logs             # View local service logs
just k8s-logs         # View Kubernetes service logs
just endpoints        # Show service endpoints
just install-deps     # Install development dependencies
just check-deps       # Verify required tools are installed

Why Just? Modern Advantages

Just offers several advantages over traditional build tools such as Make:

🚀 Intuitive Flags: just model-server --background instead of a separate command for each variant
📏 Short Flags: just backend -r -b (real + background), just test -k (Kubernetes)
🔗 Flag Combinations: Mix and match flags naturally (--k8s --clean)
🧠 Smart Auto-Detection: just test --k8s sets up infrastructure if needed
🌍 Cross-Platform: Works consistently across macOS, Linux, and Windows
⚡ Modern Syntax: Cleaner, more readable command definitions
📋 Organized Help: Commands grouped by category with just

Traditional Commands (if needed)

# Running Tests
cd services/backend && pytest

# Building Docker Image
cd services/backend && docker build -t aiq-circular-detection:latest .

# Local Development
cd services/backend && ./start-dev.sh

Additional Resources

For detailed development instructions, see the individual README files in:

environments/local/aiq_detector/README.md - Model server details
environments/local/backend/DEPLOYMENT.md - Backend deployment guide
services/backend/README.md - Backend development guide
services/evaluation/README.md - Evaluation tools guide

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests and ensure the deployment works
Submit a pull request
