sourrris/g-rat

G-RAT — GitHub Review Agent Tool

An automated PR code reviewer powered by LLMs, RAG, and LangChain. G-RAT listens to GitHub webhooks, analyzes diffs for security vulnerabilities (OWASP Top 10), performance regressions, and refactoring opportunities, then posts inline review comments.

Architecture

GitHub PR Event
      │
      ▼
POST /webhook  ──► HMAC-SHA256 verification ──► 202 Accepted
                              │
                              ▼ (background task)
                    ReviewOrchestrator
                         │
                    ┌────┴────────────────────────────┐
                    │   Parse diff → FileDiff objects  │
                    └────┬────────────────────────────┘
                         │
                    ┌────┴────────────────────────────┐
                    │   Pinecone RAG (3 retrievals)    │
                    └────┬────────────────────────────┘
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
        Security    Performance  Refactor
        Analyzer    Analyzer     Analyzer
       (GPT-4o)    (GPT-4o)     (GPT-4o)
       with_structured_output() × 3
              │          │          │
              └──────────┼──────────┘
                         ▼
              Merge + Deduplicate + Sort
                         │
                         ▼
            POST /repos/{repo}/pulls/{pr}/reviews
                 (single atomic review)
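The fan-out, merge, deduplicate and sort stages at the bottom of the diagram can be sketched with plain asyncio. This is a minimal illustration, not G-RAT's orchestrator. The Finding fields, the stub analyzers (which stand in for the three GPT-4o with_structured_output() calls) and the dedupe rule (same path, line and message, keeping the most confident copy) are all assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical finding shape; G-RAT's real Pydantic models may differ.
    path: str
    line: int
    category: str
    message: str
    confidence: float


# Stub analyzers standing in for the three GPT-4o calls.
async def security_analyzer(diff: str) -> list[Finding]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [Finding("app/db.py", 12, "security", "Possible SQL injection", 0.9)]


async def performance_analyzer(diff: str) -> list[Finding]:
    await asyncio.sleep(0.01)
    return [Finding("app/db.py", 30, "performance", "Query inside a loop (N+1)", 0.8)]


async def refactor_analyzer(diff: str) -> list[Finding]:
    await asyncio.sleep(0.01)
    # The same issue reported twice, which deduplication should collapse.
    return [
        Finding("app/api.py", 5, "refactor", "Name the magic number 100", 0.7),
        Finding("app/api.py", 5, "refactor", "Name the magic number 100", 0.65),
    ]


def merge_findings(batches: list[list[Finding]]) -> list[Finding]:
    """Deduplicate on (path, line, message), keep the most confident copy, sort by location."""
    best: dict[tuple[str, int, str], Finding] = {}
    for batch in batches:
        for finding in batch:
            key = (finding.path, finding.line, finding.message)
            if key not in best or finding.confidence > best[key].confidence:
                best[key] = finding
    return sorted(best.values(), key=lambda f: (f.path, f.line))


async def review(diff: str) -> list[Finding]:
    # asyncio.gather runs the coroutines concurrently and returns results in argument order.
    batches = await asyncio.gather(
        security_analyzer(diff),
        performance_analyzer(diff),
        refactor_analyzer(diff),
    )
    return merge_findings(list(batches))


if __name__ == "__main__":
    for f in asyncio.run(review("example diff")):
        print(f.path, f.line, f.category, f.confidence)
```

Because asyncio.gather preserves argument order, the merged result does not depend on which analyzer finishes first.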

Tech Stack

API server: FastAPI + uvicorn
HTTP client: httpx (async)
LLM: OpenAI GPT-4o via LangChain
Structured output: ChatModel.with_structured_output()
Vector DB: Pinecone
Embeddings: OpenAI text-embedding-3-small
Observability: LangSmith + Prometheus
Packaging: Docker (multi-stage)
CI/CD: GitHub Actions

Quick Start

1. Clone and configure

git clone https://github.com/yourname/g-rat.git
cd g-rat
cp .env.example .env
# Edit .env with your API keys

2. Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

3. Seed the RAG knowledge base

python scripts/seed_patterns.py

4. Run locally

# With Docker Compose (recommended)
docker compose -f docker/docker-compose.yml up

# Or directly
python -m src.main

5. Test with a simulated webhook

python scripts/simulate_webhook.py
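Step 3 loads reference patterns into Pinecone; during a review, the diff is embedded and the closest patterns are retrieved (RAG_TOP_K per query). The retrieval idea can be illustrated with an in-memory cosine-similarity search standing in for a Pinecone query, assuming a cosine-metric index. The pattern texts and 3-dimensional vectors below are invented for the example; text-embedding-3-small produces 1536-dimensional vectors by default.

```python
import math

# Toy knowledge base: pattern text -> embedding. These vectors are made up;
# real ones would come from text-embedding-3-small and live in Pinecone.
PATTERNS: dict[str, list[float]] = {
    "SQL built with string formatting": [0.9, 0.1, 0.0],
    "Query executed inside a loop (N+1)": [0.1, 0.9, 0.1],
    "Magic number in business logic": [0.0, 0.2, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k most similar patterns, best first (the role RAG_TOP_K plays)."""
    scored = [(text, cosine(query, vec)) for text, vec in PATTERNS.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # A query vector that sits close to the SQL injection pattern.
    for text, score in top_k([0.8, 0.2, 0.0], k=2):
        print(f"{score:.3f}  {text}")
```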

Environment Variables

GITHUB_WEBHOOK_SECRET: HMAC secret configured in the GitHub webhook settings
GITHUB_TOKEN: Personal access token (PAT) with write permission on pull requests
OPENAI_API_KEY: OpenAI API key
PINECONE_API_KEY: Pinecone API key
PINECONE_INDEX_NAME: Pinecone index name (default: g-rat-patterns)
LANGCHAIN_API_KEY: LangSmith API key (for tracing)
LANGCHAIN_TRACING_V2: Set to true to enable LangSmith tracing
CONFIDENCE_THRESHOLD: Minimum confidence for a finding to be reported (default: 0.6)
MAX_DIFF_BYTES: Maximum diff size in bytes (default: 500000)
RAG_TOP_K: Number of patterns retrieved per RAG query (default: 5)
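The README does not show how these variables are parsed. Below is a stdlib-only sketch of loading them with the defaults listed above. The Settings class, load_settings() and the choice of which variables are treated as required are assumptions, and the real project may use a settings library instead.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Assumption: the variables without a documented default are required.
REQUIRED = ("GITHUB_WEBHOOK_SECRET", "GITHUB_TOKEN", "OPENAI_API_KEY", "PINECONE_API_KEY")


@dataclass(frozen=True)
class Settings:
    github_webhook_secret: str
    github_token: str
    openai_api_key: str
    pinecone_api_key: str
    pinecone_index_name: str
    confidence_threshold: float
    max_diff_bytes: int
    rag_top_k: int
    langchain_tracing_v2: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying the README defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        github_webhook_secret=env["GITHUB_WEBHOOK_SECRET"],
        github_token=env["GITHUB_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "g-rat-patterns"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.6")),
        max_diff_bytes=int(env.get("MAX_DIFF_BYTES", "500000")),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
        # Only the literal string "true" (any case) enables tracing.
        langchain_tracing_v2=env.get("LANGCHAIN_TRACING_V2", "false").lower() == "true",
    )
```

Failing fast on missing keys at startup surfaces a misconfigured .env before the first webhook arrives, rather than in the middle of a background review.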

GitHub Webhook Setup

  1. Go to your repository → Settings → Webhooks → Add webhook
  2. Set Payload URL to https://your-domain.com/webhook
  3. Content type: application/json
  4. Secret: same value as GITHUB_WEBHOOK_SECRET
  5. Events: choose "Let me select individual events", then check Pull requests
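GitHub signs each delivery with the secret from step 4 and sends the result in the X-Hub-Signature-256 header as sha256= followed by the hex HMAC-SHA256 of the raw request body. A minimal stdlib sketch of the check behind "HMAC-SHA256 verification" in the architecture diagram; the function names are hypothetical. The body must be the exact bytes received, not re-serialized JSON.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: str, body: bytes) -> str:
    """Compute the value GitHub puts in the X-Hub-Signature-256 header."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def verify_signature(secret: str, body: bytes, header: Optional[str]) -> bool:
    """Return True only if the header matches the HMAC of the raw request body."""
    if not header or not header.startswith("sha256="):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sign(secret, body).encode("utf-8"), header.encode("utf-8"))


if __name__ == "__main__":
    body = b'{"action": "opened"}'
    header = sign("my-webhook-secret", body)
    print(verify_signature("my-webhook-secret", body, header))          # True
    print(verify_signature("my-webhook-secret", body + b" ", header))   # False
```

A request that fails this check would typically be rejected before any review work is scheduled.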

Running Tests

# All tests
pytest tests/ -v

# Unit tests only
pytest tests/unit/ -v

# Integration tests only
pytest tests/integration/ -v

API Endpoints

POST /webhook: GitHub PR webhook receiver
GET /health: Liveness check
GET /metrics: Prometheus metrics
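The /webhook contract (acknowledge first, review later) can be illustrated without FastAPI using plain asyncio. This is a sketch under assumptions: per the architecture diagram G-RAT runs the review as a background task behind its FastAPI endpoint, and run_review(), handle_webhook() and demo() here are hypothetical stand-ins. Keeping each task in a set follows the asyncio documentation's advice, since the event loop holds only weak references to tasks.

```python
import asyncio

# Strong references to in-flight tasks; the event loop only keeps weak ones.
_background_tasks: set[asyncio.Task] = set()


async def run_review(pr_number: int, done: list[int]) -> None:
    # Stand-in for ReviewOrchestrator; a real review takes 15-60 s of LLM calls.
    await asyncio.sleep(0.05)
    done.append(pr_number)


async def handle_webhook(pr_number: int, done: list[int]) -> int:
    """Schedule the review and return a status code without waiting for it."""
    task = asyncio.create_task(run_review(pr_number, done))
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return 202


async def demo() -> tuple[int, list[int], list[int]]:
    done: list[int] = []
    status = await handle_webhook(42, done)
    at_response_time = list(done)  # still empty: the review has not started yet
    await asyncio.gather(*_background_tasks)  # a real server would simply keep running
    return status, at_response_time, done


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (202, [], [42])
```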

What G-RAT Detects

Security (OWASP Top 10)

  • SQL/LDAP/command injection
  • XSS (reflected and stored)
  • Hardcoded secrets and weak cryptography
  • Broken access control
  • SSRF vulnerabilities
  • Security misconfiguration

Performance

  • N+1 database queries
  • Synchronous I/O in async contexts
  • Unbounded loops and recursion
  • Missing pagination
  • Inefficient data structures

Refactoring

  • DRY violations
  • Magic numbers/strings
  • Deep nesting (use guard clauses)
  • Long functions
  • Missing type hints
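To make the N+1 item concrete, here is the kind of change the performance analyzer is meant to flag, shown with the stdlib sqlite3 module and a made-up two-table schema. It illustrates the pattern only; it is not output from G-RAT.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory database with a made-up authors/posts schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
        INSERT INTO authors VALUES (1, 'Ada'), (2, 'Linus');
        INSERT INTO posts VALUES (1, 1, 'Engines'), (2, 2, 'Kernels'), (3, 1, 'Notes');
        """
    )
    return conn


def titles_n_plus_1(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    # The flagged pattern: 1 query for the posts, then 1 extra query per post.
    result = []
    for title, author_id in conn.execute("SELECT title, author_id FROM posts ORDER BY id").fetchall():
        (name,) = conn.execute("SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()
        result.append((title, name))
    return result


def titles_joined(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    # The usual fix: one JOIN returns the same rows in a single query.
    return conn.execute(
        "SELECT posts.title, authors.name FROM posts "
        "JOIN authors ON authors.id = posts.author_id ORDER BY posts.id"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(titles_n_plus_1(conn) == titles_joined(conn))  # True
```

With three posts the first version issues four queries and the second issues one; the gap grows linearly with the number of rows.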

Key Design Decisions

  1. Return 202 immediately: GitHub expects a webhook response within 10 seconds, while an LLM review takes 15–60 seconds.
  2. Single atomic review: one call to GitHub's "Create a review" API posts all comments together, avoiding a separate notification per comment.
  3. with_structured_output(): the LLM returns typed Pydantic objects, which removes brittle hand-written JSON parsing.
  4. Parallel analyzer fan-out: asyncio.gather runs all three analyzers concurrently.
  5. Confidence threshold: a configurable filter (default 0.6) drops low-confidence findings to reduce hallucination noise.
  6. Diff line validation: comments on lines that are not part of the diff are dropped, since GitHub rejects them with a 422 error.
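Design decisions 5 and 6 amount to a filter applied before the review is posted. A minimal sketch, assuming each finding carries a file path, a new-file line number and a confidence score (the Finding fields and helper names are hypothetical), and that comments target the new (RIGHT) side of the diff, where added and unchanged context lines inside a hunk are commentable.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches a hunk header such as "@@ -10,3 +10,4 @@" and captures the new-file start line.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


@dataclass(frozen=True)
class Finding:
    # Hypothetical fields: file path, new-file line number, text, model confidence.
    path: str
    line: int
    message: str
    confidence: float


def commentable_lines(patch: str) -> set[int]:
    """Return new-file (RIGHT side) line numbers covered by a unified-diff patch."""
    lines: set[int] = set()
    current: Optional[int] = None  # next new-file line number; None before the first hunk
    for raw in patch.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            current = int(match.group(1))
        elif current is None:
            continue  # file headers ("--- a/...", "+++ b/...") precede the first hunk
        elif raw.startswith(("+", " ")):
            lines.add(current)  # added or unchanged context line
            current += 1
        # "-" lines exist only in the old file; "\ No newline..." markers are skipped
    return lines


def filter_findings(findings: list[Finding], patches: dict[str, str], threshold: float = 0.6) -> list[Finding]:
    """Keep findings at or above the threshold that point at a line inside the diff."""
    valid = {path: commentable_lines(patch) for path, patch in patches.items()}
    return [
        f for f in findings
        if f.confidence >= threshold and f.line in valid.get(f.path, set())
    ]


PATCH = "\n".join([
    "@@ -10,3 +10,4 @@",
    " import os",
    "-x = 1",
    "+x = 2",
    "+y = 3",
    " print(x)",
])

if __name__ == "__main__":
    findings = [
        Finding("app.py", 11, "Name the magic number 2", 0.8),  # kept
        Finding("app.py", 50, "Unused import", 0.9),            # dropped: line not in diff
        Finding("app.py", 12, "Rename y", 0.4),                 # dropped: below threshold
        Finding("other.py", 1, "Missing type hints", 0.9),      # dropped: file not in diff
    ]
    print(commentable_lines(PATCH))  # {10, 11, 12, 13}
    print(filter_findings(findings, {"app.py": PATCH}))
```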
