Kubernetes AutoHeal Operator

This is a custom Kubernetes controller (operator) written in Go that monitors pod health across the cluster. When a container enters the CrashLoopBackOff state, the operator deletes the affected pod so that the pod's owning controller creates a replacement.

Why this exists

This project demonstrates cloud-native systems engineering and working familiarity with Kubernetes internals, client-go, and reconciliation loops.

MVP Phase 1 Features

Authenticates and connects to a Kubernetes cluster (in-cluster or via kubeconfig).
Polls all namespaces for Pods on a periodic basis.
Detects when a Container falls into CrashLoopBackOff.
Automatically deletes the problematic Pod so that the backing ReplicaSet/Deployment starts a fresh instance.
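
The detection step can be illustrated with a minimal sketch. It assumes the check looks at each container's waiting reason, which is where Kubernetes reports CrashLoopBackOff (in client-go this is pod.Status.ContainerStatuses[i].State.Waiting.Reason). The struct types below are local stand-ins that mirror only those fields, and crashLoopingContainers is a hypothetical helper name, not necessarily what main.go uses.

```go
package main

import "fmt"

// Local stand-ins mirroring the shape of the core/v1 status fields.
// They are assumptions for illustration, not the client-go types.
type ContainerStateWaiting struct{ Reason string }

type ContainerState struct{ Waiting *ContainerStateWaiting }

type ContainerStatus struct {
	Name  string
	State ContainerState
}

// crashLoopReason is the waiting reason Kubernetes reports for a
// container that keeps crashing and is being restarted with back-off.
const crashLoopReason = "CrashLoopBackOff"

// crashLoopingContainers returns the names of containers whose current
// state is Waiting with reason CrashLoopBackOff.
func crashLoopingContainers(statuses []ContainerStatus) []string {
	var names []string
	for _, s := range statuses {
		if s.State.Waiting != nil && s.State.Waiting.Reason == crashLoopReason {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	statuses := []ContainerStatus{
		{Name: "app", State: ContainerState{Waiting: &ContainerStateWaiting{Reason: crashLoopReason}}},
		{Name: "sidecar"}, // no Waiting state, so it is treated as healthy
	}
	fmt.Println(crashLoopingContainers(statuses)) // prints [app]
}
```

A pod would be a candidate for deletion when this helper returns at least one name.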

Testing Locally

Prerequisites

Go 1.20+
A running Kubernetes cluster (e.g., kind, minikube, or Docker Desktop Kubernetes)
kubectl configured to communicate with your cluster

1. Build and Run the Operator

Clone this repository and run the operator locally using your local .kube/config:

go run main.go

(You should see: Starting K8s AutoHeal Operator...)

2. Simulate a Failing Pod

You can apply a faulty pod to see the auto-healing in action. In a new terminal, create a broken deployment:

kubectl apply -f manifest/broken-deployment.yaml

Wait a few minutes, then run kubectl get pods; the pod should show a CrashLoopBackOff status. Within ~15 seconds of the pod entering CrashLoopBackOff, the operator prints:

[ALERT] Pod broken-pod-xxxx in namespace default is in CrashLoopBackOff. Proceeding to heal...
[ACTION] Deleting Pod broken-pod-xxxx to force restart...
[SUCCESS] Pod broken-pod-xxxx successfully deleted for restart.

3. Deploy in-cluster (Future)

Build the Docker image:

docker build -t k8podguard:v1 .

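Role-Based Access Control (RBAC) must be set up so the operator has permission to list and delete pods inside the cluster. Because the operator polls all namespaces, these permissions need to be cluster-wide, for example a ClusterRole bound to the operator's ServiceAccount.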
