This is a custom Kubernetes controller (operator) written in Go that monitors pod health across the cluster. It detects when a pod enters the CrashLoopBackOff state and takes corrective action by deleting the pod so that its controller recreates it.
This project demonstrates cloud-native systems engineering and familiarity with Kubernetes internals, client-go, and reconciliation loops.
- Authenticates and connects to a Kubernetes cluster (in-cluster or via kubeconfig).
- Polls all namespaces for Pods on a periodic basis.
- Detects when a Container falls into CrashLoopBackOff.
- Automatically deletes the problematic Pod so that the backing ReplicaSet/Deployment starts a fresh instance.
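The detection step in the list above can be sketched as follows. This is a minimal illustration, not the project's actual code: the structs are local stand-ins that mirror the field layout of the `k8s.io/api/core/v1` pod types so the example compiles with the standard library alone, and `isCrashLooping` and `healthy-pod` are hypothetical names.

```go
package main

import "fmt"

// Local stand-ins mirroring the shape of the k8s.io/api/core/v1 types
// (Pod.Status.ContainerStatuses[i].State.Waiting.Reason). They exist only
// so this sketch builds without external modules; a real operator would
// use corev1.Pod from client-go's dependencies instead.
type ContainerStateWaiting struct{ Reason string }

type ContainerState struct{ Waiting *ContainerStateWaiting }

type ContainerStatus struct {
	Name  string
	State ContainerState
}

type PodStatus struct{ ContainerStatuses []ContainerStatus }

type Pod struct {
	Namespace, Name string
	Status          PodStatus
}

// crashLoopReason is the waiting reason reported for a container that
// keeps crashing and is being restarted with back-off.
const crashLoopReason = "CrashLoopBackOff"

// isCrashLooping (hypothetical helper) reports whether any container in
// the pod is currently waiting with the CrashLoopBackOff reason.
func isCrashLooping(p Pod) bool {
	for _, cs := range p.Status.ContainerStatuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == crashLoopReason {
			return true
		}
	}
	return false
}

func main() {
	pods := []Pod{
		{Namespace: "default", Name: "broken-pod-xxxx", Status: PodStatus{
			ContainerStatuses: []ContainerStatus{{
				Name:  "app",
				State: ContainerState{Waiting: &ContainerStateWaiting{Reason: crashLoopReason}},
			}},
		}},
		{Namespace: "default", Name: "healthy-pod"}, // no waiting containers
	}
	for _, p := range pods {
		if isCrashLooping(p) {
			fmt.Printf("%s/%s is in %s\n", p.Namespace, p.Name, crashLoopReason)
		}
	}
}
```

Checking the waiting reason on each container status, rather than the pod phase, matters because a crash-looping pod usually still reports the `Running` phase.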
- Go 1.20+
- A running Kubernetes cluster (e.g., kind, minikube, or Docker Desktop Kubernetes)
- kubectl configured to communicate with your cluster
Clone this repository and run the operator locally using your local .kube/config:
go run main.go
(You should see: Starting K8s AutoHeal Operator...)
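How an operator might choose between in-cluster credentials and a local kubeconfig can be sketched as below. This is an assumption about the general approach, not the project's actual startup code. The function `configSource` is hypothetical, and honoring the `KUBECONFIG` variable is an added assumption. The two `KUBERNETES_SERVICE_*` variables are the ones client-go's `rest.InClusterConfig` requires to be set.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// configSource (hypothetical helper) decides where cluster credentials
// would come from. getenv is injected so the logic can be exercised
// without modifying the real environment; home is the user's home dir.
func configSource(getenv func(string) string, home string) string {
	// Kubernetes sets these variables inside every pod, so their presence
	// indicates the process is running in-cluster.
	if getenv("KUBERNETES_SERVICE_HOST") != "" && getenv("KUBERNETES_SERVICE_PORT") != "" {
		return "in-cluster"
	}
	// Assumption: respect an explicit KUBECONFIG. The variable may hold a
	// list of paths; this sketch treats it as a single path.
	if p := getenv("KUBECONFIG"); p != "" {
		return p
	}
	// Fall back to the conventional local kubeconfig location.
	return filepath.Join(home, ".kube", "config")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "." // best-effort fallback for the sketch
	}
	fmt.Println("cluster config source:", configSource(os.Getenv, home))
}
```

With client-go, the "in-cluster" result would correspond to calling `rest.InClusterConfig()`, and a file path would be passed to `clientcmd.BuildConfigFromFlags("", path)`.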
You can apply a faulty pod to see the auto-healing in action. In a new terminal, create a broken deployment:
kubectl apply -f manifest/broken-deployment.yaml
Wait a few minutes, then run kubectl get pods. The pod should show a CrashLoopBackOff status.
Within ~15 seconds of the pod hitting CrashLoopBackOff, the operator will display:
[ALERT] Pod broken-pod-xxxx in namespace default is in CrashLoopBackOff. Proceeding to heal...
[ACTION] Deleting Pod broken-pod-xxxx to force restart...
[SUCCESS] Pod broken-pod-xxxx successfully deleted for restart.
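The heal step that produces the log lines above might look roughly like this sketch. It assumes a hypothetical `PodDeleter` interface in place of the real cluster client, and the `[ERROR]` line is an illustrative addition that does not appear in the output shown above.

```go
package main

import (
	"errors"
	"fmt"
)

// PodRef identifies a pod that has already been flagged as crash-looping.
type PodRef struct{ Namespace, Name string }

// PodDeleter is a hypothetical abstraction over the cluster API. With
// client-go it would wrap
// clientset.CoreV1().Pods(ns).Delete(ctx, name, metav1.DeleteOptions{}).
type PodDeleter interface {
	DeletePod(namespace, name string) error
}

// heal deletes every flagged pod and returns the log lines it produced,
// using the message format from the sample output.
func heal(d PodDeleter, flagged []PodRef) []string {
	var logs []string
	for _, p := range flagged {
		logs = append(logs,
			fmt.Sprintf("[ALERT] Pod %s in namespace %s is in CrashLoopBackOff. Proceeding to heal...", p.Name, p.Namespace),
			fmt.Sprintf("[ACTION] Deleting Pod %s to force restart...", p.Name))
		if err := d.DeletePod(p.Namespace, p.Name); err != nil {
			// Assumption: the [ERROR] format is illustrative only.
			logs = append(logs, fmt.Sprintf("[ERROR] Failed to delete Pod %s: %v", p.Name, err))
			continue
		}
		logs = append(logs, fmt.Sprintf("[SUCCESS] Pod %s successfully deleted for restart.", p.Name))
	}
	return logs
}

// fakeDeleter records deletions in memory so the sketch runs without a cluster.
type fakeDeleter struct {
	deleted []string
	failOn  string // pod name for which deletion should fail
}

func (f *fakeDeleter) DeletePod(namespace, name string) error {
	if f.failOn != "" && name == f.failOn {
		return errors.New("forbidden")
	}
	f.deleted = append(f.deleted, namespace+"/"+name)
	return nil
}

func main() {
	d := &fakeDeleter{}
	flagged := []PodRef{{Namespace: "default", Name: "broken-pod-xxxx"}}
	for _, line := range heal(d, flagged) {
		fmt.Println(line)
	}
}
```

In a running operator, a function like `heal` would typically be invoked on each tick of a `time.Ticker`; the polling interval is not stated here beyond the roughly 15-second reaction time mentioned above.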
Build the Docker image:
docker build -t k8podguard:v1 .
Role-Based Access Control (RBAC) must be set up so the operator has permission to list and delete pods inside the cluster.