🦅 MergeHawk

AI-powered pull request reviewer that watches every PR from above. It automatically analyzes diffs using Google Gemini and posts clear, actionable feedback as native GitHub PR Reviews, like having a senior engineer review every PR in seconds.

Self-hosted, open-source, and free. Deploy your own instance, use your own API key, and keep your code private.



Why MergeHawk?

| | MergeHawk | CodeRabbit / Others |
|---|---|---|
| Cost | Free: you only pay for Gemini API usage (near-free with Flash Lite) | $15–30/seat/month |
| Privacy | Self-hosted: diffs go only to GitHub and to the Gemini API under your own key | Code sent to third-party servers |
| Model | Google Gemini (configurable) | Locked to one provider |
| Control | Full control over rate limits, model, and behavior | Vendor-controlled |
| Setup | Deploy once, install on any repo | Per-repo onboarding |

Features

  • 🦅 Native PR Reviews: Posts formal Pull Request Reviews (not issue comments), integrating into GitHub's review UI
  • ✅ Skip Indicators: Posts visible "Skipped (Clean)" or "Skipped (Rate Limited)" reviews so you always know the bot ran
  • 🔄 Auto-Approve (opt-in): Clean PRs can receive a formal Approve review to satisfy branch protection rules
  • 🤖 AI Code Review: Analyzes diffs using Google Gemini with structured, severity-tagged feedback (⚠️ Critical · 🔧 Suggestion · 💡 Nit)
  • 🔇 5-Layer Noise Reduction: Webhook debouncing, diff-hash dedup, bot comment awareness, AI opt-out, and file filtering
  • ⚡ Fast: Reviews typically complete in under 30 seconds
  • 🔒 Secure: HMAC-SHA256 webhook validation, GitHub App JWT auth, scoped installation tokens
  • 📊 Rate Limiting: Per-installation hourly caps protect your API budget
  • 🔁 Incremental Reviews: On new pushes, only reviews what changed and avoids repeating feedback
  • 🛡️ Retry with Backoff: Exponential backoff on transient network errors for reliability

Self-Hosting Guide

MergeHawk is designed to be self-hosted. You deploy your own instance, use your own API keys, and have full control. Here's how:

Prerequisites

Step 1: Create a GitHub App

  1. Go to GitHub Settings → Developer Settings → GitHub Apps → New GitHub App
  2. Fill in:
    • App name: Choose a unique name (e.g., YourName-MergeHawk)
    • Homepage URL: Your repo URL
    • Webhook URL: https://your-cloud-run-url/api/github/webhooks (update after deploying)
    • Webhook secret: Generate a strong random string (save it!)
  3. Permissions:
    • Pull Requests: Read & Write
    • Contents: Read
  4. Subscribe to events:
    • Pull request
  5. Click Create GitHub App
  6. Note your App ID from the app settings page
  7. Generate a Private Key (downloads a .pem file; keep it safe!)
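
The webhook secret chosen in Step 1 is what allows the service to check that a delivery really came from GitHub. GitHub signs each payload with HMAC-SHA256 and sends the result in the `X-Hub-Signature-256` header as `sha256=<hex digest>`. The following is a minimal Python sketch of that check, shown only to illustrate the idea; it is not MergeHawk's C# code, and the function names `is_valid_signature` and `sign` are hypothetical.

```python
import hashlib
import hmac


def is_valid_signature(secret: str, payload: bytes, signature_header: str) -> bool:
    """Return True if the header equals 'sha256=' + HMAC-SHA256(secret, payload)."""
    prefix = "sha256="
    if not signature_header.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    received = signature_header[len(prefix):]
    # compare_digest takes time independent of where the first mismatch occurs,
    # so an attacker cannot learn the digest one character at a time.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))


def sign(secret: str, payload: bytes) -> str:
    """Build the header value a sender would attach (useful for local testing)."""
    digest = hmac.new(secret.encode("utf-8"), payload, hashlib.sha256).hexdigest()
    return "sha256=" + digest
```

The signature must be computed over the raw request body bytes, before any JSON parsing or re-serialization, otherwise the digests will not match.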

Step 2: Set Up GCP

# Create a project (or use an existing one)
gcloud projects create your-project-id
gcloud config set project your-project-id

# Enable required APIs
gcloud services enable run.googleapis.com
gcloud services enable cloudbuild.googleapis.com
gcloud services enable secretmanager.googleapis.com
gcloud services enable artifactregistry.googleapis.com

# Create Artifact Registry repository
gcloud artifacts repositories create mergehawk \
    --repository-format=docker \
    --location=us-east1

# Store secrets
echo -n "your-webhook-secret" | gcloud secrets create github-webhook-secret --data-file=-
echo -n "your-app-id" | gcloud secrets create github-app-id --data-file=-
gcloud secrets create github-app-private-key --data-file=path/to/your-private-key.pem
echo -n "your-gemini-api-key" | gcloud secrets create google-api-key --data-file=-

Step 3: Deploy

# Clone the repo
git clone https://github.com/austinchima/PRPilot.git
cd PRPilot

# Build & push the container image
gcloud builds submit \
    --tag us-east1-docker.pkg.dev/your-project-id/mergehawk/mergehawk-api:latest .

# Deploy to Cloud Run
gcloud run deploy mergehawk-api \
    --image us-east1-docker.pkg.dev/your-project-id/mergehawk/mergehawk-api:latest \
    --region us-east1 \
    --platform managed \
    --allow-unauthenticated \
    --port 8080 \
    --min-instances 0 \
    --max-instances 10 \
    --memory 1Gi \
    --cpu 1 \
    --timeout 300 \
    --concurrency 80 \
    --set-env-vars="ASPNETCORE_ENVIRONMENT=Production,Gemini__Model=gemini-3.1-flash-lite" \
    --set-secrets="GitHub__WebhookSecret=github-webhook-secret:latest,GitHub__AppId=github-app-id:latest,GitHub__PrivateKey=github-app-private-key:latest,Gemini__ApiKey=google-api-key:latest"

Step 4: Connect

  1. Copy your Cloud Run service URL from the deploy output
  2. Go back to your GitHub App settings
  3. Update the Webhook URL to: https://your-service-url/api/github/webhooks
  4. Install the app on your repositories (Settings → Install App → select repos)
  5. Open a PR and watch MergeHawk review it! 🦅

Cost Estimate

| Component | Cost |
|---|---|
| Cloud Run | Free tier: 2M requests/month, 360K vCPU-seconds |
| Gemini Flash Lite | Free tier: 1500 requests/day. Paid: ~$0.01 per review |
| Cloud Build | Free tier: 120 build-minutes/day |
| Secret Manager | Free tier: 10K access operations/month |

For a small team (< 50 PRs/day), your total cost is effectively $0.


Run Locally

# 1. Configure secrets
cd src/PRPilot.API
dotnet user-secrets set "GitHub:WebhookSecret" "your-webhook-secret"
dotnet user-secrets set "GitHub:AppId" "your-github-app-id"
dotnet user-secrets set "GitHub:PrivateKey" "your-github-private-key-pem"
dotnet user-secrets set "Gemini:ApiKey" "your-gemini-key"

# 2. Run
dotnet run

The API starts at http://localhost:5267. Use ngrok to expose it for GitHub webhook testing:

ngrok http 5267

Run Tests

# Unit + infrastructure tests
dotnet test

# Integration tests (requires live API credentials)
export GitHub__AppId="your-app-id"
export GitHub__PrivateKey="your-private-key-pem"
export GitHub__InstallationId="your-installation-id"
export TEST_REPO="owner/repo"
export TEST_PR_NUMBER="1"
export Gemini__ApiKey="your-gemini-key"
dotnet test tests/PRPilot.IntegrationTests

Build & Run with Docker

# Build image
docker build -t mergehawk:latest .

# Run container
docker run -p 8080:8080 \
  -e ASPNETCORE_ENVIRONMENT=Development \
  -e GitHub__AppId=your-app-id \
  -e GitHub__PrivateKey="your-private-key" \
  -e GitHub__WebhookSecret=your-secret \
  -e Gemini__ApiKey=your-key \
  mergehawk:latest
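
The commands in this README write the same settings two ways: `GitHub:AppId` for `dotnet user-secrets` and `GitHub__AppId` for environment variables. That is the standard .NET configuration convention, where a double underscore in an environment variable name stands for the `:` section separator. The Python sketch below only illustrates the mapping; the helper names are hypothetical, and it does not model .NET's case-insensitive keys.

```python
def to_config_key(env_name: str) -> str:
    """Translate an environment variable name to a .NET-style config key."""
    return env_name.replace("__", ":")


def build_config(env: dict) -> dict:
    """Fold flat NAME__CHILD=value pairs into nested sections."""
    config = {}
    for name, value in env.items():
        parts = name.split("__")
        node = config
        # Walk (or create) one nested dict per section name.
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return config
```

For example, `build_config({"GitHub__AppId": "123"})` yields `{"GitHub": {"AppId": "123"}}`, the same shape as the `"GitHub"` section of a settings file.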

Configuration

MergeHawk supports the following configuration options in appsettings.json or via environment variables:

"PRPilot": {
  "PostSkipReviews": true,          // Post visible "Skipped" reviews when no issues found
  "AutoApprove": false,             // Auto-approve clean PRs (formal Approve event)
  "MaxReviewsPerInstallationPerHour": 10  // Rate limit per GitHub App installation
},
"Gemini": {
  "Model": "gemini-3.1-flash-lite", // AI model (configurable)
  "ApiKey": ""                      // Inject via secrets
}
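
To make `MaxReviewsPerInstallationPerHour` concrete, here is a minimal Python sketch of a per-installation hourly cap. It assumes a fixed one-hour window that starts at an installation's first review; the README only says the project uses `IMemoryCache` counters, so the window policy and the `HourlyRateLimiter` name are assumptions, not the actual implementation.

```python
import time


class HourlyRateLimiter:
    """Fixed-window counter: at most max_per_hour reviews per installation."""

    def __init__(self, max_per_hour: int = 10, clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self._clock = clock
        self._windows = {}  # installation_id -> (window_start, count)

    def try_acquire(self, installation_id: int) -> bool:
        """Return True and count the review, or False if the cap is reached."""
        now = self._clock()
        start, count = self._windows.get(installation_id, (now, 0))
        if now - start >= 3600:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.max_per_hour:
            self._windows[installation_id] = (start, count)
            return False
        self._windows[installation_id] = (start, count + 1)
        return True
```

A worker would call `try_acquire(installation_id)` before invoking the AI and post a "Skipped (Rate Limited)" review when it returns `False`.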

Noise Reduction

MergeHawk implements five layers of intelligent noise reduction to avoid spamming pull requests:

| Layer | Mechanism | Location |
|---|---|---|
| 1. Webhook Debouncing | Rapid successive synchronize events for the same PR within a 30-second window are collapsed into one | ReviewProcessor |
| 2. Diff-Hash Deduplication | SHA-256 hash of the diff is cached (24h TTL); identical diffs from force-pushes are skipped | ReviewBackgroundWorker |
| 3. Bot Comment Awareness | On updated PRs, previous bot comments are fetched and passed to the AI to avoid repeating feedback | ReviewBackgroundWorker + GitHubClientAdapter |
| 4. AI Opt-Out | The AI can return a [NO_COMMENT] sentinel to skip posting on trivial changes (typo fixes, whitespace, import reordering) | GeminiCodeReviewService |
| 5. Diff Filtering | Lockfiles, generated code, .min.js, protobuf output, and designer files are stripped before AI analysis | GeminiCodeReviewService |
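
Layers 1 and 2 both reduce to "have I seen this key recently?", so one small time-to-live set can illustrate them. The Python sketch below is an illustration under assumptions: it debounces on the leading edge (the first event passes, later ones inside the window are dropped), which the README does not specify, and `TtlSet` and `diff_hash` are hypothetical names rather than MergeHawk's code.

```python
import hashlib
import time


class TtlSet:
    """Remembers each key for ttl_seconds after it was recorded."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires = {}  # key -> expiry time (expired keys are never purged here)

    def add_if_absent(self, key) -> bool:
        """Return True if key was newly recorded, False if it is still live."""
        now = self._clock()
        expiry = self._expires.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires[key] = now + self._ttl
        return True


def diff_hash(diff_text: str) -> str:
    """SHA-256 hex digest of the diff, used as the dedup key."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()


# Layer 1: one entry per (repo, PR number), live for 30 seconds.
debounce = TtlSet(30)
# Layer 2: one entry per diff hash, live for 24 hours.
seen_diffs = TtlSet(24 * 3600)
```

A producer would enqueue an event only when `debounce.add_if_absent((repo, pr_number))` returns `True`, and a worker would call the AI only when `seen_diffs.add_if_absent(diff_hash(diff))` returns `True`.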

Architecture

flowchart TB
 subgraph GitHub["GitHub"]
        GH["GitHub PR Event"]
        GC_GH["Post PR Review"]
  end
 subgraph API["MergeHawk.API"]
        WH["Webhook Endpoint<br>/api/github/webhooks"]
        HC["Health Check<br>/health"]
  end
 subgraph Infra["MergeHawk.Infrastructure"]
        RP["ReviewProcessor<br>(WebhookEventProcessor)"]
        CH["Channel&lt;ReviewRequest&gt;<br>(Unbounded Queue)"]
        BW["ReviewBackgroundWorker<br>(BackgroundService)"]
        GC["GeminiCodeReviewService<br>(Gemini 3.1 Flash Lite)"]
        AA["GitHubAppAuthenticator<br>(JWT + Installation Tokens)"]
        GA["GitHubClientAdapter<br>(Octokit)"]
  end
 subgraph Core["MergeHawk.Core"]
        RR["ReviewRequest Model<br>Repo · PR # · URL · Action"]
  end
 subgraph GCP["GCP Cloud Run"]
    direction TB
        API
        Infra
        Core
  end
 subgraph Secrets["GCP Secret Manager"]
        S1["github-webhook-secret"]
        S2["github-app-credentials"]
        S3["google-api-key"]
  end
    GH -- POST webhook --> WH
    WH --> RP
    RP -- Validate + Debounce --> CH
    CH -- Dequeue --> BW
    BW -- Analyze diff --> GC
    GC -- Review or NO_COMMENT --> BW
    BW -- Request Auth --> AA
    AA -- Create App Client --> GA
    BW -- Post review --> GA
    GA -- API Call --> GC_GH
    S1 -. Env Var .-> WH
    S2 -. Env Var .-> AA
    S3 -. Env Var .-> GC
    RP --> RR
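
The core of the diagram is a producer/consumer hand-off: the webhook handler enqueues a request and returns at once, while a background worker drains the queue. The Python sketch below mirrors that shape with `asyncio.Queue` standing in for `Channel<ReviewRequest>`. It is an analogy rather than the project's code, and `handle_webhook`, `review_worker`, `fake_review`, and `demo` are hypothetical names.

```python
import asyncio


async def handle_webhook(queue: asyncio.Queue, request: dict) -> int:
    """Producer: enqueue the request and answer with HTTP 200 immediately."""
    queue.put_nowait(request)  # the queue is unbounded, so this never blocks
    return 200


async def review_worker(queue: asyncio.Queue, review, results: list) -> None:
    """Consumer: process queued requests one at a time, forever."""
    while True:
        request = await queue.get()
        try:
            results.append(await review(request))
        except Exception as exc:  # one failed review must not stop the loop
            results.append(f"failed: {exc}")
        finally:
            queue.task_done()


async def fake_review(request: dict) -> str:
    await asyncio.sleep(0)  # stands in for the slow AI and GitHub calls
    return f"reviewed {request['repo']}#{request['pr']}"


async def demo():
    queue = asyncio.Queue()
    results = []
    worker = asyncio.create_task(review_worker(queue, fake_review, results))
    status = await handle_webhook(queue, {"repo": "owner/repo", "pr": 1})
    await queue.join()  # wait until the worker has finished the queued item
    worker.cancel()
    try:
        await worker
    except asyncio.CancelledError:
        pass
    return status, results
```

Running `asyncio.run(demo())` returns the 200 status together with the list of completed reviews. As with any in-memory queue, items still waiting are lost if the process stops.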

Tech Stack

| Layer | Technology |
|---|---|
| Runtime | .NET 10 · ASP.NET Core Minimal APIs |
| AI | Gemini 3.1 Flash Lite (via Google GenAI SDK, configurable) |
| Auth | GitHub App JWT + Installation Tokens (RSA-signed) |
| Webhooks | Octokit.Webhooks.AspNetCore |
| GitHub API | Octokit.net |
| Logging | Serilog (Console sink for Cloud Run) |
| Queue | System.Threading.Channels |
| Caching | IMemoryCache (debounce, dedup, token cache, rate limiting) |
| Testing | xUnit · Moq · WebApplicationFactory · BenchmarkDotNet |
| Hosting | GCP Cloud Run (serverless) |
| CI/CD | GitHub Actions (CI: build + test, CD: deploy to Cloud Run) |
| Build | GCP Cloud Build / Docker → Artifact Registry |
| Secrets | GCP Secret Manager |
| Container | Docker multi-stage build |

Project Structure

MergeHawk/
├── src/
│   ├── PRPilot.API/              # Entrypoint: webhook routes, health check, DI
│   │   └── Program.cs
│   ├── PRPilot.Core/             # Domain models & shared utilities
│   │   ├── Models/
│   │   │   └── ReviewRequest.cs
│   │   └── Utilities/
│   │       └── LogSanitizer.cs
│   └── PRPilot.Infrastructure/   # Core logic & external integrations
│       ├── Interfaces/
│       │   ├── ICodeReviewService.cs
│       │   ├── IGitHubAppAuthenticator.cs
│       │   └── IGitHubClient.cs
│       ├── GeminiCodeReviewService.cs  # AI review generation + prompt engineering
│       ├── GitHubAppAuthenticator.cs   # JWT auth + installation token caching + retry
│       ├── GitHubClientAdapter.cs      # GitHub API communication (Adapter Pattern)
│       ├── ReviewBackgroundWorker.cs   # Async consumer: orchestration, dedup, rate limiting
│       └── ReviewProcessor.cs          # Webhook producer: filtering, debouncing
├── tests/
│   ├── PRPilot.Core.Tests/            # LogSanitizer tests
│   ├── PRPilot.Infrastructure.Tests/  # ReviewProcessor action filtering, debounce, null-safety
│   ├── PRPilot.UnitTests/             # BackgroundWorker pipeline, error resilience
│   ├── PRPilot.IntegrationTests/      # Live GitHub auth, diff fetch, review post, webhook signatures
│   └── PRPilot.Benchmarks/            # Logger guard-clause perf benchmarks
├── .github/workflows/
│   ├── ci.yml                         # CI: build + test on PR & push to main
│   └── cd.yml                         # CD: build container, deploy to Cloud Run
├── Dockerfile
├── deploy.ps1
└── PRPilot.slnx

Design Decisions

  • Channel<T> for Queueing: Used System.Threading.Channels instead of a heavy message broker (like RabbitMQ or Pub/Sub) to keep the architecture simple and cost-effective. It provides a lightweight, thread-safe, in-memory queue suited to a single-instance Cloud Run deployment that processes webhooks asynchronously. Because the queue and the IMemoryCache entries live in process memory, they are not shared if Cloud Run scales out to more than one instance.
  • sealed Classes: The ReviewProcessor and ReviewRequest classes are marked as sealed to prevent unintended inheritance. This improves runtime performance slightly (due to devirtualization) and clearly communicates the design intent that these components are leaf nodes in the hierarchy.
  • Adapter Pattern: Implemented GitHubClientAdapter and GeminiCodeReviewService behind interfaces (IGitHubClient, ICodeReviewService) to decouple the infrastructure from specific SDKs, making it easier to swap out providers in the future.
  • Hosted Service: The ReviewBackgroundWorker runs as a long-running BackgroundService, ensuring that the API returns a 200 OK to GitHub immediately after queueing, while the heavy processing happens out-of-band.
  • Webhook Signature Validation: All incoming GitHub webhooks are validated with HMAC-SHA256 to ensure authenticity and prevent spoofing.
  • RSA Key Handling: The JWT signing flow exports RSA parameters and creates a fresh key instance to avoid ObjectDisposedException caused by RsaSecurityKey holding a reference to a disposed RSA object.
  • Per-Installation Rate Limiting: Each GitHub App installation is capped at a configurable number of reviews/hour (default: 10) using IMemoryCache counters to prevent runaway Gemini API costs from external installations.
  • Native PR Reviews over Issue Comments: Using GitHub's Pull Request Review API instead of issue comments ensures the bot integrates into the standard review workflow, shows up in the "Reviews" tab, and can satisfy branch protection rules.
  • Exponential Backoff Retry: The GitHub token exchange includes retry with exponential backoff (1s, 2s, 4s) to handle transient network errors on Cloud Run cold starts.
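
The retry schedule in the last bullet (1s, 2s, 4s) follows from doubling a one-second base delay after each failed attempt. The Python sketch below illustrates that schedule. It assumes three retries after the initial attempt and treats only connection and timeout errors as transient; both are assumptions, and `retry_with_backoff` is a hypothetical helper, not the project's C# code.

```python
import time


def retry_with_backoff(operation, retries=3, base_delay=1.0, sleep=time.sleep,
                       transient=(ConnectionError, TimeoutError)):
    """Call operation(); on a transient error wait 1s, 2s, 4s and try again."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except transient:
            if attempt == retries:
                raise  # retries exhausted: surface the last error to the caller
            # Delay doubles each time: base_delay * 2**0, 2**1, 2**2, ...
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to test, since a test can record the delays instead of actually waiting. Errors outside the `transient` tuple propagate immediately without a retry.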

Roadmap

  • AI review pipeline with Gemini integration
  • Webhook signature validation
  • Background async processing
  • GitHub App authentication (JWT + installation tokens)
  • Webhook debouncing & diff-hash deduplication
  • AI-driven opt-out for trivial changes
  • Bot comment awareness (incremental review context)
  • Per-installation rate limiting
  • Native Pull Request Reviews (CodeRabbit-style)
  • Skip indicators & auto-approve
  • CI/CD pipelines (GitHub Actions)
  • Exponential backoff retry for transient errors
  • Support for multiple AI providers (OpenAI, Anthropic)
  • Multi-file context analysis
  • Custom review rules per repository
  • SignalR dashboard for real-time review monitoring

License

MIT
