This repository provides a batteries-included template for serving machine-learning models on Kubernetes with the following stack:
- Istio for service mesh & ingress
- Knative Serving for serverless autoscaling
- KServe for model management & prediction endpoints
The goal is to give you an opinionated yet extensible starting point that you can run locally on Kind or promote to a managed Kubernetes service.
The following diagram illustrates the complete request flow when a user uploads an image for circular object detection:
flowchart TB
User[User/Client]
subgraph K8s["Kubernetes Cluster"]
Gateway[Istio Gateway :80/:443]
subgraph NS1["Namespace: aiq-backend"]
Service[Backend Service]
subgraph BackendPod["Backend Pod"]
Envoy1[Envoy Proxy]
FastAPI[FastAPI :8000]
end
Storage[(PVC Storage)]
DB[(SQLite DB)]
end
subgraph NS2["Namespace: aiq-detector"]
ModelService[Model Service]
subgraph ModelDeployment["KServe Deployment"]
Knative[Knative Autoscaler]
subgraph ModelPod["Model Pod"]
Envoy2[Envoy Proxy]
Model[Model Server :8080]
end
end
end
Gateway <--> Service
Service <--> Envoy1
Envoy1 <--> FastAPI
FastAPI <--> Storage
FastAPI <--> DB
FastAPI <--> ModelService
ModelService <--> Knative
Knative <--> ModelPod
Envoy2 <--> Model
end
User <--> Gateway
style Gateway fill:#4285F4,color:#fff
style FastAPI fill:#34A853,color:#fff
style Model fill:#FBBC04,color:#000
style Knative fill:#EA4335,color:#fff
- User → Istio Gateway: Client sends a POST request to the /images/ endpoint
- Gateway → Backend Service: VirtualService routes the request to the backend service
- Service → Pod: The Kubernetes Service load-balances the request to a backend pod
- Envoy → FastAPI: The sidecar proxy terminates mTLS and forwards the request to the FastAPI container
- FastAPI Processing:
- Stores image in PVC-backed storage
- Saves metadata to SQLite database
- Prepares inference request
- FastAPI → Model Service: Sends the base64-encoded image for inference
- Knative Autoscaling: Scales model pod from 0 to 1 if needed
- Model Inference: Detects circles and returns bounding boxes
- Response Flow: Results flow back through the same path
- User Response: Client receives JSON with detected objects
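The FastAPI → Model Service hop above sends the image as base64 inside a JSON body. KServe's V1 inference protocol accepts `POST /v1/models/<model-name>:predict` with an `{"instances": [...]}` body; the per-instance layout used below (`{"image": {"b64": ...}}`) is an assumption borrowed from the TensorFlow Serving convention, not necessarily what this repository's model server reads. A minimal sketch of building and decoding such a payload:

```python
import base64


def build_inference_payload(image_bytes: bytes) -> dict:
    """Wrap raw image bytes in a KServe V1-style request body.

    Assumption: the model server reads {"instances": [{"image": {"b64": ...}}]}.
    The field names used by this repository may differ.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"instances": [{"image": {"b64": encoded}}]}


def decode_inference_payload(payload: dict) -> bytes:
    """Inverse of build_inference_payload, e.g. inside a model-side handler."""
    return base64.b64decode(payload["instances"][0]["image"]["b64"])
```

Base64 inflates the payload by about a third, which is the price of keeping the whole request in JSON rather than switching to a binary or multipart body.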
sequenceDiagram
box External
participant U as User
end
box Kubernetes Cluster
participant IG as Istio Gateway
participant VS as VirtualService
participant EP as Envoy Proxy
participant API as FastAPI Backend
participant S as Storage PVC
participant DB as Database
participant KS as KServe
participant M as Model Server
end
U->>IG: POST /images/ with image file
IG->>VS: Route based on path
VS->>EP: Forward to backend service
EP->>API: mTLS secured request
API->>S: Store image file
API->>DB: Save image metadata
API->>KS: Request inference with base64 image
Note over KS: Knative scales from 0 to 1
KS->>M: Forward to model pod
M->>M: Detect circles
M->>KS: Return detections
KS->>API: JSON response
API->>DB: Save detected objects
API->>EP: Response with detections
EP->>VS: Return via mesh
VS->>IG: Route response
IG->>U: JSON result
- User Upload: Client sends a POST request to /api/v1/images/ with an image file
- Istio Ingress: Request enters through the Istio Gateway and is routed by VirtualService rules
- Service Mesh: Envoy sidecar proxy handles mTLS, observability, and load balancing
- FastAPI Backend:
- Receives and validates the image
- Stores image in persistent storage (PVC-backed filesystem)
- Registers metadata in database (image ID, path, timestamp)
- Model Inference Request: Backend prepares inference request with base64-encoded image
- KServe/Knative Autoscaling:
- Knative autoscaler receives request
- Scales model pod from 0 to 1 (cold start) or routes to existing pod
- Creates revision-specific pods on demand
- Model Processing:
- Model server receives image
- PyTorch/TensorFlow model detects circular objects
- Returns bounding boxes, centroids, and confidence scores
- Response Processing:
- Backend receives detection results
- Saves detected objects to database
- Returns comprehensive response to user
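The backend has to turn the model's bounding boxes and confidence scores into records it can store. The sketch below assumes a hypothetical response shape `{"detections": [{"bbox": [x, y, w, h], "score": ...}]}` with COCO-style boxes; the real schema of the model server is not specified here. It derives the centroid from the box and estimates the radius as half the mean box side:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected circular object (hypothetical schema).

    bbox is (x_min, y_min, width, height) in pixels, COCO-style.
    """

    bbox: tuple[float, float, float, float]
    confidence: float

    @property
    def centroid(self) -> tuple[float, float]:
        x, y, w, h = self.bbox
        return (x + w / 2, y + h / 2)

    @property
    def radius(self) -> float:
        # A circle's bounding box is square; averaging both sides tolerates small skew.
        _, _, w, h = self.bbox
        return (w + h) / 4


def parse_detections(response: dict, min_confidence: float = 0.0) -> list[Detection]:
    """Convert a (hypothetical) model response into Detection objects.

    Detections below min_confidence are dropped.
    """
    results = []
    for item in response.get("detections", []):
        det = Detection(bbox=tuple(item["bbox"]), confidence=float(item["score"]))
        if det.confidence >= min_confidence:
            results.append(det)
    return results
```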
- Istio Gateway: Entry point for all external traffic, handles TLS termination
- VirtualService: Defines routing rules for different endpoints
- Envoy Proxy: Sidecar container providing service mesh capabilities
- Knative Serving: Provides serverless scaling, including scale-to-zero
- KServe: Manages model deployment, versioning, and inference endpoints
- Persistent Storage: Ensures data survives pod restarts (critical for SQLite)
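Knative's default autoscaler (the KPA) sizes a revision from the observed request concurrency. A simplified version of its core calculation is `ceil(observed / (target × utilization))`, using Knative's defaults of 100 concurrent requests per pod and 70% target utilization, clamped to the min/max scale bounds. The sketch below ignores the stable/panic windows, activator buffering and the scale-to-zero grace period, so treat it as an approximation for reasoning about cold starts, not Knative's actual implementation:

```python
import math


def desired_pods(
    observed_concurrency: float,
    target_concurrency: float = 100.0,
    target_utilization: float = 0.70,
    min_scale: int = 0,
    max_scale: int = 0,
) -> int:
    """Simplified KPA decision: pods needed for the observed load.

    observed_concurrency: average in-flight requests across the revision.
    target_concurrency:   desired in-flight requests per pod (Knative default 100).
    target_utilization:   fraction of the target actually aimed for (default 70%).
    min_scale/max_scale:  bounds; max_scale == 0 means "no upper bound".
    """
    if observed_concurrency <= 0:
        pods = 0  # idle revision: eligible for scale-to-zero
    else:
        pods = math.ceil(observed_concurrency / (target_concurrency * target_utilization))
    pods = max(pods, min_scale)
    if max_scale > 0:
        pods = min(pods, max_scale)
    return pods
```

Setting `min_scale=1` is the usual trade-off when the first-request cold start of a model pod is too slow: it gives up scale-to-zero in exchange for a warm pod.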
.
├── environments/                     # Environment-specific Kubernetes manifests
│   ├── dev/                          # Development environment configs
│   ├── stage/                        # Staging environment configs
│   ├── prod/                         # Production configs (Helm values, manifests)
│   └── local/                        # Local Kind cluster setup
│       ├── aiq_detector/             # Model server deployment manifests
│       ├── backend/                  # Backend service K8s manifests
│       │   ├── *.yaml                # Deployments, services, storage, Istio routing
│       │   ├── deploy.sh             # Automated deployment script
│       │   └── test-istio.sh         # Test script with Istio integration
│       ├── test/                     # Demo model and test payloads
│       ├── install_kserve_knative.sh # KServe/Knative installation
│       ├── setup_ingress_routing.sh  # Ingress configuration
│       ├── setup_kind.sh             # Kind cluster setup
│       └── README.md                 # Local environment documentation
├── services/                         # Microservices
│   ├── backend/                      # FastAPI backend service
│   │   ├── aiq_circular_detection/   # Main application package
│   │   ├── config/                   # Configuration management
│   │   ├── tests/                    # Unit and integration tests
│   │   ├── data/                     # Backend data storage
│   │   ├── Dockerfile                # Optimized multi-stage build
│   │   ├── docker-compose.yml        # Docker Compose configuration
│   │   ├── pyproject.toml            # Python project configuration
│   │   ├── uv.lock                   # Dependency lock file
│   │   ├── start-dev.sh              # Development server startup
│   │   ├── start-dev-real.sh         # Real mode development server
│   │   └── test_full_integration.sh  # Integration tests
│   └── evaluation/                   # Model evaluation module
│       ├── dataset/                  # Evaluation dataset
│       ├── output/                   # Evaluation results
│       ├── evaluate_model.py         # Evaluation script
│       ├── requirements.txt          # Evaluation dependencies
│       ├── run_evaluation.sh         # Execution script
│       └── README.md                 # Evaluation documentation
├── run_all.sh                        # Unified runner script (local/kind modes)
├── run_all_k8s.sh                    # Kubernetes deployment script
└── README.md                         # This file
- Docker Desktop with Kubernetes enabled
- kind: Kubernetes in Docker
- kubectl: Kubernetes CLI
- kustomize: Tool for customizing Kubernetes YAML configurations
- jq: JSON processor
- uv: Fast Python package manager
- just: Modern command runner (required)
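The repository's own `just check-deps` recipe verifies these tools. As an illustration only (a hypothetical Python equivalent, not that recipe's implementation), the check amounts to looking each executable up on `PATH`:

```python
import shutil
from collections.abc import Sequence

# Executable names taken from the prerequisites list; "docker" is the CLI
# installed with Docker Desktop.
REQUIRED_TOOLS = ("docker", "kind", "kubectl", "kustomize", "jq", "uv", "just")


def missing_tools(tools: Sequence[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A caller would print `missing_tools()` and exit with a non-zero status when the list is non-empty, so CI or a setup script stops before a half-finished install.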
For rapid development and testing without Kubernetes or Docker, run all services locally on your machine:
# Run all services locally (model server + backend + tests)
just dev
This command will:
- Start the AI model server on port 9090
- Start the backend API service on port 8000
- Run integration tests automatically
- Run model evaluation (if dataset is available)
- Display performance metrics summary
- Keep services running for manual testing
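Because `just dev` starts the model server on port 9090 and the backend on port 8000 before testing, anything that drives those services needs to wait until they accept connections. A minimal sketch of such a wait, assuming a plain TCP check is enough (an open port only means the process is listening; the backend's `GET /health` endpoint is the stronger readiness signal):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 9090, timeout=60)` before sending the first inference request gives the model server time to download and load the model on first run.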
For testing in a real Kubernetes environment:
# Set up Kubernetes development environment (one-time setup)
just dev --k8s # or just dev -k
# Run tests on Kubernetes (auto-detects and sets up if needed)
just test --k8s # or just test -k
# Clean up when done
just clean --k8s # or just clean -k
Advanced Options:
# Clean first, then setup fresh infrastructure
just dev --k8s --clean
# Force delete entire cluster
just clean --k8s --force
Key Features:
- just test --k8s automatically sets up infrastructure if it doesn't exist
- just dev --k8s sets up and keeps infrastructure running for development
- Infrastructure persists between test runs for faster iteration
- Smart auto-detection prevents redundant setup
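Part of that auto-detection is knowing whether the Kind cluster already exists. `kind get clusters` lists existing cluster names one per line, so a check can be sketched as below. The cluster name the project's scripts use is an assumption here (check `setup_kind.sh`); `kind` is Kind's default name:

```python
import subprocess


def parse_kind_clusters(stdout: str) -> set[str]:
    """Cluster names from `kind get clusters` output (one name per line)."""
    return {line.strip() for line in stdout.splitlines() if line.strip()}


def kind_cluster_exists(name: str = "kind") -> bool:
    """True if a Kind cluster called `name` exists; False if none or kind is missing."""
    try:
        result = subprocess.run(
            ["kind", "get", "clusters"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:  # kind binary not on PATH
        return False
    return name in parse_kind_clusters(result.stdout)
```

Setup logic can then skip cluster creation when `kind_cluster_exists(...)` is true and only redeploy the workloads, which is what makes repeated test runs faster.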
What happens during setup:
- Creates a Kind cluster with proper configuration
- Installs KServe, Knative, Istio, and Cert-Manager
- Sets up ingress routing for Kind
- Builds and deploys the model server
- Builds and deploys the backend service
- Runs integration tests
- Performs model evaluation (optional)
To enable the optional evaluation step:
- Place your evaluation dataset in services/evaluation/dataset/
- Include _annotations.coco.json and the image files
- The script will automatically run evaluation and display results
If you prefer to run services individually:
- Start the Model Server:
# Interactive mode (blocks terminal)
just model-server
# Background mode (non-blocking)
just model-server --background
just model-server -b # Short flag
# Model server will run on http://localhost:9090
# Swagger UI: http://localhost:9090/docs
- Start the Backend Service:
# Dummy mode (no model server required)
just backend
just backend --background # Background mode
just backend -b # Background mode (short)
# Real mode (requires model server running)
just backend --real # Interactive
just backend --real --background # Background
just backend -r -b # Real + background (short flags)
# Backend API will run on http://localhost:8000
# API Docs: http://localhost:8000/docs
- Run Integration Tests:
# Automatic mode (starts services, runs tests, cleans up)
just test
# Manual mode (assumes services already running)
just test --manual
just test -m # Short flag
# Run tests on Kubernetes (auto-setup if needed)
just test --k8s
just test -k # Short flag
- Cleanup:
just clean # Stop all local services
just clean --k8s # Clean up Kubernetes resources
just clean -k # Clean up Kubernetes (short)
- Port conflicts: Use just clean to stop all services and free ports
- Model download: The first run downloads the AI model (~300MB)
- Logs: Use just logs to view service logs, or check the logs/ directory
- Service status: Use just status and just health to check service state
- Dependencies: Use just check-deps to verify required tools are installed
- Cleanup: Use just clean for local services and just clean --k8s for Kubernetes
- Help: Run just to see all available commands organized by category
Once deployed, the backend service provides:
- POST /images/ - Upload an image for circle detection
- GET /images/{image_id}/objects - List detected objects for an image
- GET /images/{image_id}/objects/{object_id} - Get object details
- GET /health - Health check endpoint
- GET /docs - Interactive API documentation (Swagger UI)
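For scripting against `POST /images/` without extra dependencies, an upload can be sent as `multipart/form-data` using only the standard library. This is a sketch, not the project's client: the form field name `"file"` and the JSON response are assumptions about the backend, so check the Swagger UI at `/docs` for the actual parameter name:

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/octet-stream") -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_image(base_url: str, path: str, field: str = "file") -> dict:
    """POST an image to {base_url}/images/ and return the decoded JSON response.

    Assumption: the endpoint expects the file under the form field "file".
    """
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as fh:
        body, header = build_multipart(field, os.path.basename(path), fh.read(), content_type)
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/images/",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())
```

Against a local backend this would be called as `upload_image("http://localhost:8000", "sample.jpg")`, where `sample.jpg` is a placeholder for any test image.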
The backend service can operate in two modes:
- Dummy Mode (default): Returns mock detection results for testing
- Real Mode: Connects to actual KServe model endpoint
Configure via environment variables:
MODE: "dummy" # or "real"
MODEL_SERVER_URL: "http://model-service.namespace.svc.cluster.local"
For production environments:
- Replace SQLite with PostgreSQL or MySQL for multi-replica support
- Use cloud storage (S3, GCS, Azure Storage) instead of local filesystem
- Configure proper ingress with TLS certificates and domain names
- Set resource limits and autoscaling policies
- Enable monitoring with Prometheus and distributed tracing
See environments/local/backend/DEPLOYMENT.md for detailed deployment instructions.
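The MODE and MODEL_SERVER_URL variables described above are easiest to get right if they are validated once at startup. The loader below is a hypothetical sketch, not the code in services/backend/config/; rejecting unknown modes and requiring a URL in real mode are assumed rules:

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

VALID_MODES = {"dummy", "real"}


@dataclass(frozen=True)
class BackendSettings:
    mode: str
    model_server_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> BackendSettings:
    """Read MODE and MODEL_SERVER_URL, failing fast on inconsistent values."""
    mode = env.get("MODE", "dummy").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"MODE must be one of {sorted(VALID_MODES)}, got {mode!r}")
    url = env.get("MODEL_SERVER_URL") or None  # treat empty string as unset
    if mode == "real" and url is None:
        raise ValueError("MODEL_SERVER_URL is required when MODE=real")
    return BackendSettings(mode=mode, model_server_url=url)
```

Failing at startup surfaces a missing URL as a crash-looping pod with a clear message, instead of as errors on the first upload.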
The project includes a comprehensive evaluation module to assess model performance on circular object detection tasks.
The evaluation module uses industry-standard computer vision metrics:
- Jaccard Index (IoU): Measures overlap between predicted and ground truth regions
- Both simple and weighted averages are computed
- Range: 0 to 1 (higher is better)
- F1 Score: Harmonic mean of precision and recall
- Precision: Ratio of correct detections to total detections
- Recall: Ratio of correctly detected objects to total ground truth objects
- Uses an IoU threshold of 0.5 for matching
- Hungarian Assignment: Optimally matches predictions to ground truth objects
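The three metrics fit together as follows: compute IoU for every prediction/ground-truth pair, find the one-to-one assignment with the highest total IoU, keep only pairs at or above the 0.5 threshold, and derive precision, recall and F1 from the match count. The sketch assumes boxes as `(x_min, y_min, x_max, y_max)` and is not a copy of `evaluate_model.py`. For brevity it finds the optimal assignment by exhaustive search, which gives the same result as the Hungarian algorithm but is only practical for a handful of boxes per image; `scipy.optimize.linear_sum_assignment` is the usual polynomial-time implementation:

```python
from itertools import permutations

# Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Jaccard index of two axis-aligned boxes: area(A ∩ B) / area(A ∪ B)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def optimal_matches(preds: list[Box], truths: list[Box], threshold: float = 0.5) -> list[tuple[int, int]]:
    """One-to-one (pred_idx, truth_idx) pairs maximizing total IoU, filtered by threshold.

    Exhaustive search stands in for the Hungarian algorithm here.
    """
    if not preds or not truths:
        return []
    flipped = len(preds) > len(truths)
    small, large = (truths, preds) if flipped else (preds, truths)
    best, best_score = (), -1.0
    for perm in permutations(range(len(large)), len(small)):
        score = sum(iou(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    pairs = [(j, i) if flipped else (i, j) for i, j in enumerate(best)]
    return [(p, t) for p, t in pairs if iou(preds[p], truths[t]) >= threshold]


def precision_recall_f1(n_preds: int, n_truths: int, n_matched: int) -> tuple[float, float, float]:
    """Detection precision, recall and F1 from match counts."""
    precision = n_matched / n_preds if n_preds else 0.0
    recall = n_matched / n_truths if n_truths else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

With one correct and one spurious prediction against a single ground-truth circle, `optimal_matches` returns `[(1, 0)]`, giving precision 0.5, recall 1.0 and F1 of about 0.667.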
- Prepare dataset in COCO format:
cd services/evaluation
mkdir -p dataset
# Copy COCO annotations and images
cp /path/to/_annotations.coco.json dataset/
cp /path/to/images/*.jpg dataset/
- Run evaluation:
cd services/evaluation
./run_evaluation.sh
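In the COCO annotation file, `images` lists each image's `id` and `file_name`, and each entry in `annotations` points at an image via `image_id` and stores its box as `[x, y, width, height]`. A small loader that groups ground-truth boxes per image file and converts them to corner form `(x_min, y_min, x_max, y_max)` can be sketched as follows (a sketch for inspecting a dataset, not the logic inside `evaluate_model.py`):

```python
import json
from collections import defaultdict
from pathlib import Path


def group_boxes(coco: dict) -> dict[str, list[tuple[float, float, float, float]]]:
    """Map each image file name to its ground-truth boxes in corner form."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    boxes = defaultdict(list)
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO stores [x, y, width, height]
        boxes[names[ann["image_id"]]].append((x, y, x + w, y + h))
    # Keep images without annotations: any prediction on them is a false positive.
    return {name: boxes.get(name, []) for name in names.values()}


def load_coco_boxes(annotation_path: str) -> dict[str, list[tuple[float, float, float, float]]]:
    """Read a COCO JSON file (e.g. dataset/_annotations.coco.json) and group its boxes."""
    return group_boxes(json.loads(Path(annotation_path).read_text()))
```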
The evaluation generates:
- Detailed metrics report (precision, recall, F1 score, Jaccard Index)
- Annotated images showing predictions (red) vs ground truth (green)
- Per-image performance breakdowns
See services/evaluation/README.md for detailed documentation.
All development tasks are managed through Just commands:
# View all available commands organized by category
just
# Development workflow
just dev # Start local development environment
just dev --k8s # Start Kubernetes development environment
just test # Run integration tests (auto-manages services)
just test --manual # Run tests (assumes services running)
just test --k8s # Run tests on Kubernetes (auto-setup if needed)
# Service management
just model-server # Start model server (interactive)
just model-server --background # Start model server (background)
just model-server -b # Start model server (background, short)
just backend # Start backend (dummy mode)
just backend --real # Start backend (real mode)
just backend --real --background # Real mode + background
just backend -r -b # Real mode + background (short flags)
just clean # Clean up all local services
just clean --k8s # Clean up Kubernetes resources
# Evaluation and testing
just eval # Run model evaluation (requires dataset)
just lint # Run code linting
just pytest # Run unit tests only
# Utilities
just status # Check service status
just health # Check service health
just logs # View local service logs
just k8s-logs # View Kubernetes service logs
just endpoints # Show service endpoints
just install-deps # Install development dependencies
just check-deps # Verify required tools are installed
Just offers several advantages over traditional build tools such as Make:
- Intuitive Flags: just model-server --background instead of separate commands
- Short Flags: just backend -r -b (real + background), just test -k (Kubernetes)
- Flag Combinations: Mix and match flags naturally (--k8s --clean)
- Smart Auto-Detection: just test --k8s sets up infrastructure if needed
- Cross-Platform: Just itself runs on macOS, Linux, and Windows
- Modern Syntax: Cleaner, more readable command definitions
- Organized Help: Running just lists commands grouped by category
# Running Tests
cd services/backend && pytest
# Building Docker Image
cd services/backend && docker build -t aiq-circular-detection:latest .
# Local Development
cd services/backend && ./start-dev.sh
For detailed development instructions, see the individual README files in:
- environments/local/aiq_detector/README.md - Model server details
- environments/local/backend/DEPLOYMENT.md - Backend deployment guide
- services/backend/README.md - Backend development guide
- services/evaluation/README.md - Evaluation tools guide
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests and ensure the deployment works
- Submit a pull request