G-RAT — GitHub Review Agent Tool
An automated PR code reviewer powered by LLMs, RAG, and LangChain. G-RAT listens to GitHub webhooks, analyzes diffs for security vulnerabilities (OWASP Top 10), performance regressions, and refactoring opportunities, then posts inline review comments.
        GitHub PR Event
               │
               ▼
         POST /webhook ──► HMAC-SHA256 verification ──► 202 Accepted
               │
               ▼ (background task)
      ReviewOrchestrator
               │
          ┌────┴────────────────────────────┐
          │  Parse diff → FileDiff objects  │
          └────┬────────────────────────────┘
               │
          ┌────┴────────────────────────────┐
          │  Pinecone RAG (3 retrievals)    │
          └────┬────────────────────────────┘
               │
    ┌──────────┼──────────┐
    ▼          ▼          ▼
Security  Performance  Refactor
Analyzer   Analyzer    Analyzer
(GPT-4o)   (GPT-4o)    (GPT-4o)
 with_structured_output() × 3
    │          │          │
    └──────────┼──────────┘
               ▼
  Merge + Deduplicate + Sort
               │
               ▼
POST /repos/{repo}/pulls/{pr}/reviews
    (single atomic review)
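The fan-out and merge stages in the diagram can be sketched with the standard library alone. This is a minimal illustration, not G-RAT's actual code: `Finding`, `run_analyzer`, `merge_findings`, and `review` are hypothetical names, the analyzers are stubbed rather than calling GPT-4o, and both the deduplication key (path, line, category) and the severity-first sort order are assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical shape of one analyzer finding."""
    path: str
    line: int
    category: str      # "security", "performance", or "refactor"
    severity: int      # assumed scale: higher is more severe
    confidence: float  # 0.0 to 1.0
    message: str


async def run_analyzer(category: str, canned: list[Finding]) -> list[Finding]:
    """Stand-in for one LLM-backed analyzer; returns canned findings."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [f for f in canned if f.category == category]


def merge_findings(batches: list[list[Finding]], threshold: float = 0.6) -> list[Finding]:
    """Flatten, drop low-confidence findings, deduplicate, and sort."""
    best: dict[tuple[str, int, str], Finding] = {}
    for batch in batches:
        for f in batch:
            if f.confidence < threshold:
                continue
            key = (f.path, f.line, f.category)
            # Keep the most confident finding per (path, line, category).
            if key not in best or f.confidence > best[key].confidence:
                best[key] = f
    # Most severe first, then by file and line for stable output.
    return sorted(best.values(), key=lambda f: (-f.severity, f.path, f.line))


async def review(canned: list[Finding]) -> list[Finding]:
    """Run the three analyzers concurrently and merge their results."""
    batches = await asyncio.gather(
        run_analyzer("security", canned),
        run_analyzer("performance", canned),
        run_analyzer("refactor", canned),
    )
    return merge_findings(list(batches))


SAMPLE = [
    Finding("app.py", 10, "security", 3, 0.9, "SQL built by string concatenation"),
    Finding("app.py", 10, "security", 3, 0.7, "Possible SQL injection"),
    Finding("app.py", 42, "performance", 2, 0.8, "Query inside a loop (N+1)"),
    Finding("util.py", 5, "refactor", 1, 0.4, "Magic number"),
]
```

Calling `asyncio.run(review(SAMPLE))` on this sample keeps one security finding and one performance finding: the duplicate on `app.py` line 10 collapses to its higher-confidence entry, and the 0.4-confidence refactor finding falls below the threshold.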
| Component | Technology |
|---|---|
| API server | FastAPI + uvicorn |
| HTTP client | httpx (async) |
| LLM | OpenAI GPT-4o via LangChain |
| Structured output | ChatModel.with_structured_output() |
| Vector DB | Pinecone |
| Embeddings | OpenAI text-embedding-3-small |
| Observability | LangSmith + Prometheus |
| Packaging | Docker (multi-stage) |
| CI/CD | GitHub Actions |
1. Clone and configure
git clone https://github.com/yourname/g-rat.git
cd g-rat
cp .env.example .env
# Edit .env with your API keys
2. Install dependencies
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
3. Seed the RAG knowledge base
python scripts/seed_patterns.py
4. Start the server
# With Docker Compose (recommended)
docker compose -f docker/docker-compose.yml up
# Or directly
python -m src.main
5. Test with a simulated webhook
python scripts/simulate_webhook.py
| Variable | Required | Description |
|---|---|---|
| GITHUB_WEBHOOK_SECRET | ✓ | HMAC secret configured in GitHub webhook settings |
| GITHUB_TOKEN | ✓ | PAT with pull_requests: write scope |
| OPENAI_API_KEY | ✓ | OpenAI API key |
| PINECONE_API_KEY | ✓ | Pinecone API key |
| PINECONE_INDEX_NAME | | Index name (default: g-rat-patterns) |
| LANGCHAIN_API_KEY | | LangSmith key (for tracing) |
| LANGCHAIN_TRACING_V2 | | true to enable LangSmith |
| CONFIDENCE_THRESHOLD | | Min confidence to report finding (default: 0.6) |
| MAX_DIFF_BYTES | | Max diff size (default: 500000) |
| RAG_TOP_K | | Patterns to retrieve per query (default: 5) |
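One way to read these variables is a small settings object that applies the documented defaults and fails fast when a required value is missing. This is a sketch under assumptions: `Settings` and `load_settings` are hypothetical names, not G-RAT's configuration module, and the two LangSmith variables are left out because LangChain reads them from the environment itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# The four variables marked as required in the table above.
REQUIRED = (
    "GITHUB_WEBHOOK_SECRET",
    "GITHUB_TOKEN",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for G-RAT's environment configuration."""
    github_webhook_secret: str
    github_token: str
    openai_api_key: str
    pinecone_api_key: str
    pinecone_index_name: str = "g-rat-patterns"
    confidence_threshold: float = 0.6
    max_diff_bytes: int = 500_000
    rag_top_k: int = 5


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping, raising if a required key is absent."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        github_webhook_secret=env["GITHUB_WEBHOOK_SECRET"],
        github_token=env["GITHUB_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "g-rat-patterns"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.6")),
        max_diff_bytes=int(env.get("MAX_DIFF_BYTES", "500000")),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
    )
```

Passing the mapping as an argument, rather than reading `os.environ` inside the function, keeps the loader easy to exercise in unit tests.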
Go to your repository → Settings → Webhooks → Add webhook
Set Payload URL to https://your-domain.com/webhook
Content type: application/json
Secret: same value as GITHUB_WEBHOOK_SECRET
Events: select Pull requests
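GitHub signs each delivery with HMAC-SHA256 over the raw request body, keyed by the webhook secret, and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The check on the receiving side can be sketched as follows; `sign_payload` and `verify_signature` are hypothetical helper names, not necessarily what G-RAT uses.

```python
import hashlib
import hmac
from typing import Optional


def sign_payload(secret: str, body: bytes) -> str:
    """Return the header value GitHub would send for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def verify_signature(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Check an X-Hub-Signature-256 header against the raw request body."""
    if not header or not header.startswith("sha256="):
        return False
    expected = sign_payload(secret, body)
    # compare_digest takes constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), header.encode("utf-8"))
```

The signature must be computed over the bytes exactly as received. Re-serializing the parsed JSON can change whitespace or key order and make a valid delivery fail verification.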
# All tests with coverage
pytest tests/ -v
# Unit tests only
pytest tests/unit/ -v
# Integration tests only
pytest tests/integration/ -v
| Method | Path | Description |
|---|---|---|
| POST | /webhook | GitHub PR webhook receiver |
| GET | /health | Liveness check |
| GET | /metrics | Prometheus metrics |
Security (OWASP Top 10):
SQL/LDAP/command injection
XSS (reflected and stored)
Hardcoded secrets and weak cryptography
Broken access control
SSRF vulnerabilities
Security misconfiguration
Performance:
N+1 database queries
Synchronous I/O in async contexts
Unbounded loops and recursion
Missing pagination
Inefficient data structures
Refactoring:
DRY violations
Magic numbers/strings
Deep nesting (use guard clauses)
Long functions
Missing type hints
Return 202 immediately — GitHub expects responses within 10s; LLM review takes 15–60s
Single atomic review — One GitHub "Create a review" API call posts all comments together (no notification spam)
with_structured_output() — Eliminates brittle JSON parsing; LLM returns typed Pydantic objects
Parallel analyzer fan-out — asyncio.gather runs all 3 analyzers concurrently
Confidence threshold — Configurable filter (default 0.6) to reduce hallucination noise
Diff line validation — Drops comments on lines not in the diff to avoid GitHub 422 errors
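The diff line validation step can be illustrated by collecting the new-file line numbers that appear in a unified diff and discarding comments that point anywhere else. This is a sketch, not G-RAT's implementation: `commentable_lines` and `drop_invalid` are hypothetical names, the input is assumed to be the patch for a single file, and comments are assumed to target the new (right-hand) side of the diff.

```python
import re

# Matches hunk headers such as "@@ -1,3 +1,4 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(patch: str) -> set[int]:
    """Return the new-file line numbers present in a single-file unified diff."""
    valid: set[int] = set()
    new_line = None  # stays None until the first hunk header is seen
    for raw in patch.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            continue
        if new_line is None:
            continue  # skip "---" / "+++" file headers before the first hunk
        if raw.startswith("-"):
            continue  # removed lines have no new-file line number
        if raw.startswith("\\"):
            continue  # "\ No newline at end of file" marker
        # Added ("+") and context (" ") lines both occupy a new-file line.
        valid.add(new_line)
        new_line += 1
    return valid


def drop_invalid(comments: list[dict], patch: str) -> list[dict]:
    """Keep only the comments whose "line" falls inside the diff."""
    valid = commentable_lines(patch)
    return [c for c in comments if c["line"] in valid]


PATCH = (
    "@@ -1,3 +1,4 @@\n"
    " import os\n"
    "-print('a')\n"
    "+print('b')\n"
    "+print('c')\n"
    " done()\n"
    "@@ -20,2 +21,2 @@\n"
    " x = 1\n"
    "-y = 2\n"
    "+y = 3\n"
)
```

For this sample patch the commentable lines are 1 to 4 and 21 to 22, so a comment aimed at line 10 would be dropped before the review is submitted.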