G-RAT — GitHub Review Agent Tool

An automated PR code reviewer powered by LLMs, RAG, and LangChain. G-RAT listens to GitHub webhooks, analyzes diffs for security vulnerabilities (OWASP Top 10), performance regressions, and refactoring opportunities, then posts inline review comments.

Architecture

GitHub PR Event
      │
      ▼
POST /webhook  ──► HMAC-SHA256 verification ──► 202 Accepted
                              │
                              ▼ (background task)
                    ReviewOrchestrator
                         │
                    ┌────┴────────────────────────────┐
                    │   Parse diff → FileDiff objects  │
                    └────┬────────────────────────────┘
                         │
                    ┌────┴────────────────────────────┐
                    │   Pinecone RAG (3 retrievals)    │
                    └────┬────────────────────────────┘
                         │
              ┌──────────┼──────────┐
              ▼          ▼          ▼
        Security    Performance  Refactor
        Analyzer    Analyzer     Analyzer
       (GPT-4o)    (GPT-4o)     (GPT-4o)
       with_structured_output() × 3
              │          │          │
              └──────────┼──────────┘
                         ▼
              Merge + Deduplicate + Sort
                         │
                         ▼
            POST /repos/{repo}/pulls/{pr}/reviews
                 (single atomic review)
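
The fan-out and merge stages of the diagram can be sketched in a few lines. This is a minimal illustration, not the project's code: `Finding`, `fake_analyzer`, `merge_findings`, and the deduplication key `(path, line, message)` are all assumptions, and the stub analyzer stands in for the GPT-4o calls so the example runs offline.

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape (the real project may use Pydantic models)."""
    path: str
    line: int
    category: str  # "security", "performance", or "refactor"
    message: str
    confidence: float


async def fake_analyzer(category: str, diff: str) -> list[Finding]:
    """Stand-in for one LLM-backed analyzer; makes no network calls."""
    await asyncio.sleep(0)  # yield control, as an awaited API call would
    return [Finding("app.py", 10, category, f"{category} issue", 0.9)]


def merge_findings(groups: list[list[Finding]], threshold: float = 0.6) -> list[Finding]:
    """Flatten analyzer results, drop low-confidence items, deduplicate, sort."""
    seen: set[tuple[str, int, str]] = set()
    merged: list[Finding] = []
    for finding in (f for group in groups for f in group):
        key = (finding.path, finding.line, finding.message)  # assumed dedup key
        if finding.confidence < threshold or key in seen:
            continue
        seen.add(key)
        merged.append(finding)
    return sorted(merged, key=lambda f: (f.path, f.line, -f.confidence))


async def review(diff: str) -> list[Finding]:
    # asyncio.gather runs the three analyzers concurrently and keeps their order.
    groups = await asyncio.gather(
        *(fake_analyzer(c, diff) for c in ("security", "performance", "refactor"))
    )
    return merge_findings(list(groups))
```

Calling `asyncio.run(review(diff_text))` returns one sorted list that a later stage could turn into review comments.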

Tech Stack

Component	Technology
API server	FastAPI + uvicorn
HTTP client	httpx (async)
LLM	OpenAI GPT-4o via LangChain
Structured output	`ChatModel.with_structured_output()`
Vector DB	Pinecone
Embeddings	OpenAI `text-embedding-3-small`
Observability	LangSmith + Prometheus
Packaging	Docker (multi-stage)
CI/CD	GitHub Actions

Quick Start

1. Clone and configure

git clone https://github.com/yourname/g-rat.git
cd g-rat
cp .env.example .env
# Edit .env with your API keys

2. Install dependencies

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

3. Seed the RAG knowledge base

python scripts/seed_patterns.py

4. Run locally

# With Docker Compose (recommended)
docker compose -f docker/docker-compose.yml up

# Or directly
python -m src.main

5. Test with a simulated webhook

python scripts/simulate_webhook.py

Environment Variables

Variable	Required	Description
`GITHUB_WEBHOOK_SECRET`	✓	HMAC secret configured in GitHub webhook settings
`GITHUB_TOKEN`	✓	PAT that can write pull request reviews (fine-grained: `Pull requests: Read and write`; classic: `repo` scope)
`OPENAI_API_KEY`	✓	OpenAI API key
`PINECONE_API_KEY`	✓	Pinecone API key
`PINECONE_INDEX_NAME`		Index name (default: `g-rat-patterns`)
`LANGCHAIN_API_KEY`		LangSmith key (for tracing)
`LANGCHAIN_TRACING_V2`		`true` to enable LangSmith
`CONFIDENCE_THRESHOLD`		Min confidence to report finding (default: `0.6`)
`MAX_DIFF_BYTES`		Max diff size (default: `500000`)
`RAG_TOP_K`		Patterns to retrieve per query (default: `5`)
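
As an illustration of how the table above maps to typed configuration, here is a minimal sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical names; the project may load its configuration differently (for example with a settings library).

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "GITHUB_WEBHOOK_SECRET",
    "GITHUB_TOKEN",
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed in the table."""
    github_webhook_secret: str
    github_token: str
    openai_api_key: str
    pinecone_api_key: str
    pinecone_index_name: str
    confidence_threshold: float
    max_diff_bytes: int
    rag_top_k: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping, failing fast on missing required keys."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        github_webhook_secret=env["GITHUB_WEBHOOK_SECRET"],
        github_token=env["GITHUB_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        pinecone_api_key=env["PINECONE_API_KEY"],
        # Optional values fall back to the documented defaults.
        pinecone_index_name=env.get("PINECONE_INDEX_NAME", "g-rat-patterns"),
        confidence_threshold=float(env.get("CONFIDENCE_THRESHOLD", "0.6")),
        max_diff_bytes=int(env.get("MAX_DIFF_BYTES", "500000")),
        rag_top_k=int(env.get("RAG_TOP_K", "5")),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in unit tests.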

GitHub Webhook Setup

1. Go to your repository → Settings → Webhooks → Add webhook
2. Set Payload URL to `https://your-domain.com/webhook`
3. Content type: `application/json`
4. Secret: same value as `GITHUB_WEBHOOK_SECRET`
5. Events: select Pull requests
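
With a secret configured, GitHub signs each delivery and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The check on the receiving side can be sketched as follows; `verify_signature` is a hypothetical helper name, not necessarily what G-RAT's source uses.

```python
import hashlib
import hmac
from typing import Optional


def verify_signature(secret: str, body: bytes, signature_header: Optional[str]) -> bool:
    """Validate the X-Hub-Signature-256 header against the raw request body."""
    if not signature_header or not signature_header.startswith("sha256="):
        return False
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(expected.encode("utf-8"), signature_header.encode("utf-8"))
```

The digest must be computed over the raw request bytes, before any JSON parsing, because re-serializing the payload can change its byte representation.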

Running Tests

# All tests (coverage is reported only if pytest-cov is configured, e.g. --cov=src)
pytest tests/ -v

# Unit tests only
pytest tests/unit/ -v

# Integration tests only
pytest tests/integration/ -v

API Endpoints

Method	Path	Description
`POST`	`/webhook`	GitHub PR webhook receiver
`GET`	`/health`	Liveness check
`GET`	`/metrics`	Prometheus metrics
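
The `/webhook` receiver acknowledges a delivery before the review itself runs. A framework-free sketch of that accept-then-process pattern is shown below, using plain `asyncio` rather than FastAPI's background-task support. The names `handle_webhook`, `run_review`, and `background_tasks` are hypothetical, and the short sleep stands in for the slow LLM review.

```python
import asyncio

# Strong references keep scheduled tasks alive until they finish.
background_tasks: set = set()


async def run_review(payload: dict, results: list) -> None:
    """Placeholder for the slow review pipeline."""
    await asyncio.sleep(0.01)
    results.append(payload["number"])


async def handle_webhook(payload: dict, results: list) -> int:
    """Schedule the review and return the HTTP status without waiting for it."""
    task = asyncio.create_task(run_review(payload, results))
    background_tasks.add(task)
    task.add_done_callback(background_tasks.discard)
    return 202  # Accepted: work continues after the response is sent
```

Because the handler never awaits the task, its response time does not depend on how long the review takes.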

What G-RAT Detects

Security (OWASP Top 10)

SQL/LDAP/command injection
XSS (reflected and stored)
Hardcoded secrets and weak cryptography
Broken access control
SSRF vulnerabilities
Security misconfiguration

Performance

N+1 database queries
Synchronous I/O in async contexts
Unbounded loops and recursion
Missing pagination
Inefficient data structures

Refactoring

DRY violations
Magic numbers/strings
Deep nesting (use guard clauses)
Long functions
Missing type hints

Key Design Decisions

Return 202 immediately — GitHub expects responses within 10s; LLM review takes 15–60s
Single atomic review — One GitHub "Create a review" API call posts all comments together (no notification spam)
with_structured_output() — Eliminates brittle JSON parsing; LLM returns typed Pydantic objects
Parallel analyzer fan-out — asyncio.gather runs all 3 analyzers concurrently
Confidence threshold — Configurable filter (default 0.6) to reduce hallucination noise
Diff line validation — Drops comments on lines not in the diff to avoid GitHub 422 errors
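
Two of the decisions above, diff line validation and the single atomic review, can be illustrated together. The sketch below assumes each file's diff is available as a per-file unified-diff patch that starts at its first `@@` hunk header, and that findings are plain dicts with `path`, `line`, and `body` keys. `commentable_lines` and `build_review` are hypothetical names. The resulting dict has the shape of a request body for GitHub's "Create a review" endpoint.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def commentable_lines(patch: str) -> set[int]:
    """Return new-file line numbers that appear inside the patch's hunks."""
    lines: set[int] = set()
    new_line = 0
    in_hunk = False
    for raw in patch.splitlines():
        match = HUNK_HEADER.match(raw)
        if match:
            new_line = int(match.group(1))
            in_hunk = True
        elif in_hunk and (raw.startswith("+") or raw.startswith(" ")):
            lines.add(new_line)  # added or context line on the new side
            new_line += 1
        # "-" lines exist only in the old file, so the counter does not advance.
    return lines


def build_review(findings: list[dict], patches: dict[str, str]) -> dict:
    """Drop findings that fall outside the diff and build one review body."""
    valid = {path: commentable_lines(patch) for path, patch in patches.items()}
    comments = [
        {"path": f["path"], "line": f["line"], "side": "RIGHT", "body": f["body"]}
        for f in findings
        if f["line"] in valid.get(f["path"], set())
    ]
    return {
        "event": "COMMENT",
        "body": f"G-RAT found {len(comments)} issue(s).",
        "comments": comments,
    }
```

Filtering before the request matters because one comment on an invalid line can cause the whole review call to be rejected, which would also discard the valid comments.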
