AI-powered Kubernetes monitoring and auto-healing platform. OpsAgent watches your cluster, diagnoses issues using LLMs, and heals crashed pods automatically — with Slack alerts delivered in real time.
Live Demo → opsagent-five.vercel.app
OpsAgent continuously monitors your Kubernetes cluster and uses Groq's Llama 3 model to diagnose pod failures. When a pod crashes, OpsAgent restarts it automatically and sends a Slack notification — no manual intervention needed.
- 🔍 Real-time pod monitoring via Kubernetes API
- 🧠 AI diagnostics powered by Groq / Llama 3.3 70B
- 🔧 Auto-healing — detects crashed pods and restarts them automatically
- 💬 Slack alerts on every heal action
- 📊 React dashboard with live event stream and cluster health overview
- 🖥️ Textual TUI for terminal-based monitoring
| Layer | Tech |
|---|---|
| Backend | Python, FastAPI |
| AI | Groq API (llama-3.3-70b-versatile) |
| Kubernetes | minikube, kubectl, Python k8s client |
| Frontend | React, Tailwind CSS, Framer Motion |
| Notifications | Slack Webhooks |
| Containerization | Docker |
- Python 3.10+
- Node.js 18+
- Docker Desktop
- minikube
- A Groq API key → console.groq.com
- A Slack Webhook URL → Slack Incoming Webhooks
git clone https://github.com/shayannab/opsagent.git
cd opsagentCreate a .env file in the backend/ directory:
GROQ_API_KEY=your_groq_api_key
SLACK_WEBHOOK_URL=your_slack_webhook_urlminikube startcd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000cd frontend
npm install
npm run devFrontend runs at http://localhost:3000
- OpsAgent polls your minikube cluster every few seconds
- If a pod enters
CrashLoopBackOfforFailedstate, it's flagged - Groq / Llama 3 diagnoses the failure and generates a summary
- OpsAgent restarts the pod via the Kubernetes API
- A Slack notification is sent with pod name, status, and AI diagnosis
- Go to api.slack.com/apps and create an app
- Enable Incoming Webhooks and add it to your workspace
- Copy the webhook URL
- Paste it as
SLACK_WEBHOOK_URLin your.envfile
opsagent/
├── frontend/ # React App
│ └── src/
├── charts/ # Helm charts
├── models/ # Data models
├── routes/ # FastAPI route handlers
├── services/ # Business logic & integrations
├── tests/ # Test suite
├── main.py # FastAPI app entrypoint
├── worker.py # Background worker
├── start.py # Startup script
├── Dockerfile
├── requirements.txt
└── README.md
Built by Shayanna
MIT