feat: Scout — batch territory qualification pipeline by DarlingtonDeveloper · Pull Request #6 · DarlingtonDeveloper/lighthouse

DarlingtonDeveloper · 2026-03-24T23:13:35Z

Summary

- Scout pipeline: a three-tier funnel (header scan → Gemini qualification → full Lighthouse analysis) that qualifies a territory list in minutes instead of days
- HTML sanitisation: strips prompt-injection surfaces (inline scripts, event handlers, comments, data URIs) before any LLM call; also retrofitted into the existing pipeline
- Prospect discovery: a search bar that uses Gemini to find prospect URLs from a vertical + geography query (e.g. "UK e-commerce companies")
- Scout UI: streaming results table with live progress, skip summaries, and links to full reports
- Tests: 57 unit tests covering sanitise, tier1, tier2, and the pipeline orchestrator
- Pitch script: `docs/PITCH.md` for Lighthouse positioning
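
To make the sanitisation step concrete, here is a minimal sketch of stripping the four surfaces named above (comments, script/style blocks, inline event handlers, data URIs) before HTML reaches an LLM. The function name `sanitiseHtml` and the regex-based approach are assumptions for illustration; the actual `lib/sanitise.ts` in this PR is not shown here and may work differently.

```typescript
// Hypothetical sketch only; not the PR's actual lib/sanitise.ts.
function sanitiseHtml(html: string): string {
  return html
    // Drop HTML comments, which can hide instructions aimed at the model.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Drop inline event handler attributes such as onclick="...".
    .replace(/\son\w+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "")
    // Empty out quoted attribute values that carry data: URIs.
    .replace(/(["'])data:[^"']*\1/gi, "$1$1");
}
```

Regexes are a deliberate simplification here; a production sanitiser would more likely walk a parsed DOM, since regexes can miss malformed or nested markup.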

Test plan

- Run `npm run test`; the 57 new tests should pass
- Go to `/scout`, search "UK e-commerce companies", verify URLs populate
- Run Scout on populated URLs, verify tier1 → tier2 streaming results
- Verify tier3 full analysis links work for top prospects
- Verify M&S-style detection accuracy (Next.js, Azure, Akamai, Contentful)
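
As an illustration of what the LLM-free header scan in tier 1 might check for the stack in the last item, the sketch below looks for a few commonly seen signals: Next.js's `X-Powered-By` header or `__NEXT_DATA__` marker, an Akamai `Server` header, the Azure Front Door `X-Azure-Ref` header, and Contentful's `ctfassets.net` asset host. The `Tier1Signals` type and `detectTier1Signals` function are hypothetical names, and the real `lib/scout/tier1.ts` may use different or additional signals.

```typescript
// Hypothetical result shape for illustration; not the PR's Tier1Result.
interface Tier1Signals {
  framework: string | null;
  hosting: string | null;
  cdn: string | null;
  cms: string | null;
}

function detectTier1Signals(headers: Record<string, string>, html: string): Tier1Signals {
  // Header names are case-insensitive, so normalise keys and values first.
  const h: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = value.toLowerCase();
  }
  const poweredBy = h["x-powered-by"] ?? "";
  const server = h["server"] ?? "";

  return {
    // Next.js sets X-Powered-By by default; Pages Router output embeds __NEXT_DATA__.
    framework: poweredBy.includes("next.js") || html.includes("__NEXT_DATA__") ? "Next.js" : null,
    // Azure Front Door adds an X-Azure-Ref header to responses.
    hosting: "x-azure-ref" in h ? "Azure" : null,
    // Akamai edge servers often identify themselves in the Server header.
    cdn: server.includes("akamai") ? "Akamai" : null,
    // Contentful serves assets from the ctfassets.net domain.
    cms: html.includes("ctfassets.net") ? "Contentful" : null,
  };
}
```

Any of these signals can be hidden or spoofed by the site operator, so a null result means "not detected" rather than "not in use".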

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Launched Scout—a batch territory qualification tool for scanning multiple URLs and analyzing prospect potential
- Added prospect discovery search to identify companies matching specific criteria
- Introduced three-tier scanning pipeline delivering domain assessment, qualification scores, and tech stack analysis with real-time progress tracking
- Scout results now display comprehensive analysis including deal scores, detected technologies, and analysis readiness status
Documentation
- Added pitch script for Scout feature demonstrations

Three-tier funnel: header scan (no LLM), quick Gemini qualification, and full Lighthouse analysis for top prospects. Includes HTML sanitisation for LLM safety, SSE streaming API, and Scout UI. 57 unit tests covering sanitise, tier1, tier2, and pipeline.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The preview model was deprecated. Also adds /api/scout/discover endpoint that uses Gemini to find prospect URLs from a vertical + geography query (e.g. "UK e-commerce companies").
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Search bar lets users type a query like "UK e-commerce companies" instead of manually pasting URLs. Hits the discover endpoint and populates the URL textarea with results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel · 2026-03-24T23:13:40Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| lighthouse | Ready | Preview, Comment | Mar 24, 2026 11:14pm |

coderabbitai · 2026-03-24T23:18:35Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a172ecc-2727-4dc9-b6fa-aadb6312cf41

📥 Commits

Reviewing files that changed from the base of the PR and between c8d0d57 and cc83db9.

📒 Files selected for processing (21)

app/api/scout/discover/route.ts
app/api/scout/route.ts
app/layout.tsx
app/page.tsx
app/scout/page.tsx
components/scout-input.tsx
components/scout-progress.tsx
components/scout-results-table.tsx
docs/PITCH.md
lib/__tests__/sanitise.test.ts
lib/gemini/detect-tech-stack.ts
lib/gemini/qualify-prospect.ts
lib/sanitise.ts
lib/scout/__tests__/pipeline.test.ts
lib/scout/__tests__/tier1.test.ts
lib/scout/__tests__/tier2.test.ts
lib/scout/pipeline.ts
lib/scout/tier1.ts
lib/scout/tier2-schema.ts
lib/scout/tier2.ts
lib/scout/types.ts

📝 Walkthrough

Introduces a complete three-tier territory qualification pipeline ("Scout") comprising Tier 1 framework detection, Tier 2 Gemini-based deal scoring, and optional Tier 3 deep analysis. Adds API endpoints for discovery and scanning, React UI components for input/progress/results display, HTML sanitization utilities for LLM processing, and extensive test coverage.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| API Routes `app/api/scout/discover/route.ts`, `app/api/scout/route.ts` | POST endpoint for company discovery via Gemini; SSE-streaming route for multi-tier prospect scanning with configurable options and real-time progress emission. |
| Scout Pipeline Core `lib/scout/pipeline.ts`, `lib/scout/tier1.ts`, `lib/scout/tier2.ts` | Three-tier async generator implementing URL deduplication, Tier 1 framework detection via HTTP headers/HTML signals, Tier 2 Gemini-based qualification, and optional Tier 3 deep analysis with Cortex persistence. |
| Scout Data Structures `lib/scout/types.ts`, `lib/scout/tier2-schema.ts` | TypeScript interfaces for `Tier1Result`, `Tier2Result`, `ScoutResult`, and `ScoutStreamEvent`; Zod schema for Gemini Tier 2 output validation. |
| Frontend Pages & Components `app/scout/page.tsx`, `components/scout-*.tsx` | Scout page orchestrating discovery and scanning workflow; input component with URL/query entry and discovery; progress tracker with pulsing indicators; results table with skip summary and link-based navigation. |
| HTML Sanitization `lib/sanitise.ts`, `lib/gemini/detect-tech-stack.ts`, `lib/gemini/qualify-prospect.ts` | HTML sanitization module removing comments, event handlers, scripts/styles, and data URIs; integrated into tech stack detection and careers-page scraping for safe LLM input. |
| Navigation & Data Model `app/layout.tsx`, `app/page.tsx` | Added Scout link to navbar; extended `ProspectNode` with optional `id` field and improved key derivation in list rendering. |
| Test Coverage `lib/__tests__/sanitise.test.ts`, `lib/scout/__tests__/*.test.ts` | Comprehensive test suites for HTML sanitization, Tier 1/2 qualification logic, and full pipeline event streaming and Cortex interaction. |
| Documentation `docs/PITCH.md` | Scripted pitch narrative describing end-to-end product flow, value propositions, and objection handling. |
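
The Tier 2 schema row describes validating Gemini's structured output before it is trusted. The PR does this with a Zod schema; the sketch below swaps in a hand-written type guard so it runs without dependencies. The field names come from the sequence diagram in this review (`framework`, `deal_score`, `promote_to_tier3`); the `Tier2Output` and `parseTier2Output` names and the 0 to 100 score range are assumptions.

```typescript
// Hypothetical shape; field names taken from the sequence diagram, not from tier2-schema.ts.
interface Tier2Output {
  framework: string;
  deal_score: number;
  promote_to_tier3: boolean;
}

// Returns the validated object, or null if the model output is malformed.
function parseTier2Output(raw: unknown): Tier2Output | null {
  if (typeof raw !== "object" || raw === null) return null;
  const { framework, deal_score, promote_to_tier3 } = raw as Record<string, unknown>;

  if (typeof framework !== "string") return null;
  if (typeof deal_score !== "number" || !Number.isFinite(deal_score)) return null;
  // Assumed range: the real schema's bounds are not shown in this PR page.
  if (deal_score < 0 || deal_score > 100) return null;
  if (typeof promote_to_tier3 !== "boolean") return null;

  return { framework, deal_score, promote_to_tier3 };
}
```

Returning null instead of throwing lets a batch pipeline skip one bad response and carry on with the remaining URLs.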

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as POST /api/scout
    participant Tier1 as Tier 1 Scan
    participant Cortex as Cortex DB
    participant Tier2 as Tier 2 Qualify
    participant Gemini as Gemini LLM
    participant PageSpeed as PageSpeed API
    participant Tier3 as Tier 3 Fetch/Analyze

    Client->>API: POST { urls, tier3_limit, ... }
    API->>API: Normalize & deduplicate URLs
    API->>Cortex: Query previously analyzed prospects
    API->>Tier1: scanTier1Batch(urls)
    Tier1->>Tier1: Fetch & detect frameworks
    Tier1->>API: Emit tier1 events
    
    API->>Tier2: qualifyTier2(tier1 results)
    Tier2->>Tier2: Sanitize HTML
    Tier2->>Gemini: generateObject(sanitized_html)
    Gemini->>Tier2: {framework, deal_score, promote_to_tier3}
    Tier2->>API: Emit tier2 events
    
    API->>Cortex: Store qualified prospects (score≥50)
    
    alt Tier 3 Enabled & Candidates
        API->>Tier3: Fetch pages for promotion candidates
        Tier3->>PageSpeed: Fetch performance metrics
        Tier3->>Gemini: Tech stack & value analysis
        Tier3->>Cortex: Store Tier 3 results
        Tier3->>API: Emit tier3 events
    end
    
    API->>Cortex: Store scan summary
    API->>Client: Emit complete event (SSE)
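
The flow in the diagram can be sketched as an async generator that yields one event per step, with each event then framed as a Server-Sent Events message. Everything named here (`ScoutEvent`, `ScoutDeps`, `runScout`, `toSseFrame`, the `reachable` flag, the default `tier3Limit` of 3) is hypothetical; the Cortex reads and writes shown in the diagram are omitted, and the real `lib/scout/pipeline.ts` may be structured differently.

```typescript
type ScoutEvent =
  | { type: "tier1"; url: string; reachable: boolean }
  | { type: "tier2"; url: string; dealScore: number; promote: boolean }
  | { type: "tier3"; url: string }
  | { type: "complete"; scanned: number };

// Hypothetical dependency shape, injected so the sketch runs without network access.
interface ScoutDeps {
  scanTier1(url: string): Promise<{ reachable: boolean }>;
  qualifyTier2(url: string): Promise<{ dealScore: number; promote: boolean }>;
  analyseTier3(url: string): Promise<void>;
}

async function* runScout(
  urls: string[],
  deps: ScoutDeps,
  tier3Limit = 3
): AsyncGenerator<ScoutEvent> {
  // Trim and deduplicate before doing any network work.
  const unique = Array.from(new Set(urls.map((u) => u.trim()))).filter((u) => u.length > 0);

  // Tier 1: cheap scan of every URL, no LLM involved.
  const survivors: string[] = [];
  for (const url of unique) {
    const t1 = await deps.scanTier1(url);
    yield { type: "tier1", url, reachable: t1.reachable };
    if (t1.reachable) survivors.push(url);
  }

  // Tier 2: LLM qualification only for URLs that survived tier 1.
  const candidates: { url: string; dealScore: number }[] = [];
  for (const url of survivors) {
    const t2 = await deps.qualifyTier2(url);
    yield { type: "tier2", url, dealScore: t2.dealScore, promote: t2.promote };
    if (t2.promote) candidates.push({ url, dealScore: t2.dealScore });
  }

  // Tier 3: full analysis for the highest-scoring promoted prospects only.
  candidates.sort((a, b) => b.dealScore - a.dealScore);
  for (const { url } of candidates.slice(0, tier3Limit)) {
    await deps.analyseTier3(url);
    yield { type: "tier3", url };
  }

  yield { type: "complete", scanned: unique.length };
}

// Frame one event as an SSE message: a "data:" line followed by a blank line.
function toSseFrame(event: ScoutEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}
```

A route handler could iterate the generator with `for await` and write each `toSseFrame(event)` to the response stream, which is what lets the UI show results before the whole batch finishes.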

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hops through prospects with Scout so keen,
Three tiers deep where signals gleam—
Framework frames and scores divine,
Gemini whispers, deal lines shine,
From chaos, qualified prospects spring! ✨

DarlingtonDeveloper and others added 4 commits March 24, 2026 23:13

feat: add prospect search bar to Scout UI

fc03bdd

Search bar lets users type a query like "UK e-commerce companies" instead of manually pasting URLs. Hits the discover endpoint and populates the URL textarea with results.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: add pitch script for Lighthouse

cc83db9

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

DarlingtonDeveloper merged commit aaae722 into main Mar 24, 2026
2 of 5 checks passed

vercel Bot deployed to Preview March 24, 2026 23:14

DarlingtonDeveloper merged 4 commits into main from feat/scout
