
feat: Scout — batch territory qualification pipeline #6

Merged: DarlingtonDeveloper merged 4 commits into main from feat/scout on Mar 24, 2026

@DarlingtonDeveloper DarlingtonDeveloper commented Mar 24, 2026

Summary

  • Scout pipeline: Three-tier funnel (header scan → Gemini qualification → full Lighthouse analysis) that qualifies a territory list in minutes instead of days
  • HTML sanitisation: Strips prompt injection surfaces (inline scripts, event handlers, comments, data URIs) before any LLM call — retrofitted into existing pipeline too
  • Prospect discovery: Search bar that uses Gemini to find prospect URLs from a vertical + geography query (e.g. "UK e-commerce companies")
  • Scout UI: Streaming results table with live progress, skip summaries, and links to full reports
  • 57 unit tests covering sanitise, tier1, tier2, and pipeline orchestrator
  • Pitch script: docs/PITCH.md for Lighthouse positioning
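
The HTML sanitisation bullet lists four injection surfaces: inline scripts, event handlers, comments, and data URIs. A minimal sketch of how such a pass might look follows. It is not the PR's lib/sanitise.ts, which is not shown on this page; the function name `sanitiseHtml` and the regex-based approach are assumptions, and a production sanitiser may well use a real HTML parser instead.

```typescript
// Hypothetical sketch: NOT the actual lib/sanitise.ts from this PR.
// Regex-based stripping is an assumption chosen to keep the example short.
function sanitiseHtml(html: string): string {
  return html
    // Remove HTML comments, which can hide instructions aimed at the model.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Remove <script> and <style> elements together with their contents.
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    // Remove inline event handler attributes such as onclick="..." or onload='...'.
    .replace(/\s+on[a-z]+\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)/gi, "")
    // Remove quoted attributes whose value is a data: URI.
    .replace(/\s+[a-z-]+\s*=\s*(?:"\s*data:[^"]*"|'\s*data:[^']*')/gi, "");
}
```

Visible text and ordinary attributes pass through untouched, so the model still sees the page structure it needs for qualification.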

Test plan

  • Run npm run test — 57 new tests should pass
  • Go to /scout, search "UK e-commerce companies", verify URLs populate
  • Run Scout on populated URLs, verify tier1 → tier2 streaming results
  • Verify tier3 full analysis links work for top prospects
  • Verify M&S-style detection accuracy (Next.js, Azure, Akamai, Contentful)
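
The last test-plan item checks detection of Next.js, Azure, Akamai, and Contentful. As a rough illustration of how a no-LLM header scan could surface those names, the sketch below maps a few well-known response headers and HTML markers to labels. The signal table and the names `HeaderMap` and `detectSignals` are assumptions for illustration; the actual rules in lib/scout/tier1.ts are not shown on this page.

```typescript
// Hypothetical signal table: the real rules in lib/scout/tier1.ts are not shown here.
type HeaderMap = Record<string, string>;

function detectSignals(headers: HeaderMap, html: string): string[] {
  // Lower-case header names so lookups are case-insensitive.
  const h: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    h[name.toLowerCase()] = value;
  }
  const found: string[] = [];
  // Next.js commonly sends "x-powered-by: Next.js" and serves assets under /_next/static/.
  if (/next\.js/i.test(h["x-powered-by"] ?? "") || html.includes("/_next/static/")) {
    found.push("Next.js");
  }
  // Azure Front Door adds an x-azure-ref header to responses.
  if ("x-azure-ref" in h) found.push("Azure");
  // Akamai edges often identify themselves in the server header.
  if (/akamai/i.test(h["server"] ?? "") || "x-akamai-transformed" in h) found.push("Akamai");
  // Contentful-hosted assets are served from ctfassets.net.
  if (html.includes("ctfassets.net")) found.push("Contentful");
  return found;
}
```

Headers can be stripped or rewritten by intermediaries, so a scan like this is best treated as a cheap first filter rather than a definitive fingerprint.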

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Launched Scout—a batch territory qualification tool for scanning multiple URLs and analyzing prospect potential
    • Added prospect discovery search to identify companies matching specific criteria
    • Introduced three-tier scanning pipeline delivering domain assessment, qualification scores, and tech stack analysis with real-time progress tracking
    • Scout results now display comprehensive analysis including deal scores, detected technologies, and analysis readiness status
  • Documentation

    • Added pitch script for Scout feature demonstrations

DarlingtonDeveloper and others added 4 commits March 24, 2026 23:13
Three-tier funnel: header scan (no LLM), quick Gemini qualification,
and full Lighthouse analysis for top prospects. Includes HTML
sanitisation for LLM safety, SSE streaming API, and Scout UI.
57 unit tests covering sanitise, tier1, tier2, and pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
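
The first commit mentions an SSE streaming API. For readers unfamiliar with the wire format, here is a small sketch of how events might be framed and parsed. The event shape `ScoutEventSketch` and both function names are hypothetical; the PR's real `ScoutStreamEvent` type lives in lib/scout/types.ts and is not shown on this page.

```typescript
// Hypothetical event shape: a stand-in for the PR's ScoutStreamEvent.
type ScoutEventSketch = { type: string; url?: string; payload?: unknown };

// Serialise one event as a Server-Sent Events frame:
// a single "data:" line followed by a blank line.
function encodeSse(event: ScoutEventSketch): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Parse buffered SSE text back into events. Any incomplete trailing frame is
// returned as `rest` so the caller can prepend it to the next network chunk.
function decodeSse(buffer: string): { events: ScoutEventSketch[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events = frames
    .filter((frame) => frame.startsWith("data: "))
    .map((frame) => JSON.parse(frame.slice("data: ".length)) as ScoutEventSketch);
  return { events, rest };
}
```

Because `JSON.stringify` never emits raw newlines, each event fits on one `data:` line, which keeps the parser this simple.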
The preview model was deprecated. Also adds /api/scout/discover
endpoint that uses Gemini to find prospect URLs from a vertical +
geography query (e.g. "UK e-commerce companies").

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Search bar lets users type a query like "UK e-commerce companies"
instead of manually pasting URLs. Hits the discover endpoint and
populates the URL textarea with results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
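
The search bar commit above populates the URL textarea, and the walkthrough later says the scan route normalises and deduplicates URLs. A minimal sketch of that step, assuming one URL per line, a default of https for bare domains, and deduplication by origin. None of these rules are confirmed by the PR, and `normaliseUrls` is a hypothetical name.

```typescript
// Hypothetical helper: the PR's actual normalisation rules are not shown.
function normaliseUrls(textareaValue: string): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const line of textareaValue.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    // Assumption: bare domains default to https.
    const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
    let url: URL;
    try {
      url = new URL(withScheme);
    } catch {
      continue; // Skip lines that do not parse as URLs.
    }
    // Assumption: one scan per site, keyed on origin (scheme + lower-cased host).
    const key = url.origin;
    if (!seen.has(key)) {
      seen.add(key);
      result.push(key);
    }
  }
  return result;
}
```

Keying on the origin means `example.com` and `https://Example.com/shop` collapse into a single entry, which avoids spending an LLM call on the same site twice.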

vercel Bot commented Mar 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: lighthouse | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 24, 2026 11:14pm

@DarlingtonDeveloper DarlingtonDeveloper merged commit aaae722 into main Mar 24, 2026
2 of 5 checks passed

coderabbitai Bot commented Mar 24, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a172ecc-2727-4dc9-b6fa-aadb6312cf41

📥 Commits

Reviewing files that changed from the base of the PR and between c8d0d57 and cc83db9.

📒 Files selected for processing (21)
  • app/api/scout/discover/route.ts
  • app/api/scout/route.ts
  • app/layout.tsx
  • app/page.tsx
  • app/scout/page.tsx
  • components/scout-input.tsx
  • components/scout-progress.tsx
  • components/scout-results-table.tsx
  • docs/PITCH.md
  • lib/__tests__/sanitise.test.ts
  • lib/gemini/detect-tech-stack.ts
  • lib/gemini/qualify-prospect.ts
  • lib/sanitise.ts
  • lib/scout/__tests__/pipeline.test.ts
  • lib/scout/__tests__/tier1.test.ts
  • lib/scout/__tests__/tier2.test.ts
  • lib/scout/pipeline.ts
  • lib/scout/tier1.ts
  • lib/scout/tier2-schema.ts
  • lib/scout/tier2.ts
  • lib/scout/types.ts

Walkthrough

Introduces a complete three-tier territory qualification pipeline ("Scout") comprising Tier 1 framework detection, Tier 2 Gemini-based deal scoring, and optional Tier 3 deep analysis. Adds API endpoints for discovery and scanning, React UI components for input/progress/results display, HTML sanitization utilities for LLM processing, and extensive test coverage.
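
The walkthrough describes the pipeline as an async generator that emits events per tier. The sketch below shows that general shape with the tier functions injected as parameters so it runs without network access. All type and function names here are hypothetical, Tier 3 is omitted for brevity, and the rule that only sites with a detected framework reach Tier 2 is an assumption rather than something the PR states.

```typescript
// Hypothetical types and tier functions: stand-ins for pipeline.ts, tier1.ts and tier2.ts.
type Tier1Sketch = { url: string; framework: string | null };
type Tier2Sketch = { url: string; dealScore: number };
type PipelineEventSketch =
  | { type: "tier1"; result: Tier1Sketch }
  | { type: "tier2"; result: Tier2Sketch }
  | { type: "complete"; scanned: number; qualified: number };

async function* runScoutSketch(
  urls: string[],
  scan: (url: string) => Promise<Tier1Sketch>,
  qualify: (t1: Tier1Sketch) => Promise<Tier2Sketch>,
): AsyncGenerator<PipelineEventSketch> {
  // Deduplicate before doing any work.
  const unique = Array.from(new Set(urls));
  const tier1: Tier1Sketch[] = [];
  for (const url of unique) {
    const result = await scan(url);
    tier1.push(result);
    yield { type: "tier1", result }; // Stream each Tier 1 result as soon as it is ready.
  }
  let qualified = 0;
  for (const t1 of tier1) {
    // Assumption: only sites with a detected framework go on to the LLM tier.
    if (t1.framework === null) continue;
    const result = await qualify(t1);
    qualified += 1;
    yield { type: "tier2", result };
  }
  yield { type: "complete", scanned: unique.length, qualified };
}
```

A generator suits this flow because the route handler can forward each yielded event to the client immediately instead of waiting for the whole batch.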

Changes

Cohort / File(s) and Summary

  • API Routes (app/api/scout/discover/route.ts, app/api/scout/route.ts): POST endpoint for company discovery via Gemini; SSE-streaming route for multi-tier prospect scanning with configurable options and real-time progress emission.
  • Scout Pipeline Core (lib/scout/pipeline.ts, lib/scout/tier1.ts, lib/scout/tier2.ts): Three-tier async generator implementing URL deduplication, Tier 1 framework detection via HTTP headers/HTML signals, Tier 2 Gemini-based qualification, and optional Tier 3 deep analysis with Cortex persistence.
  • Scout Data Structures (lib/scout/types.ts, lib/scout/tier2-schema.ts): TypeScript interfaces for Tier1Result, Tier2Result, ScoutResult, and ScoutStreamEvent; Zod schema for Gemini Tier 2 output validation.
  • Frontend Pages & Components (app/scout/page.tsx, components/scout-*.tsx): Scout page orchestrating discovery and scanning workflow; input component with URL/query entry and discovery; progress tracker with pulsing indicators; results table with skip summary and link-based navigation.
  • HTML Sanitization (lib/sanitise.ts, lib/gemini/detect-tech-stack.ts, lib/gemini/qualify-prospect.ts): HTML sanitization module removing comments, event handlers, scripts/styles, and data URIs; integrated into tech stack detection and careers-page scraping for safe LLM input.
  • Navigation & Data Model (app/layout.tsx, app/page.tsx): Added Scout link to navbar; extended ProspectNode with optional id field and improved key derivation in list rendering.
  • Test Coverage (lib/__tests__/sanitise.test.ts, lib/scout/__tests__/*.test.ts): Comprehensive test suites for HTML sanitization, Tier 1/2 qualification logic, and full pipeline event streaming and Cortex interaction.
  • Documentation (docs/PITCH.md): Scripted pitch narrative describing end-to-end product flow, value propositions, and objection handling.
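
The Scout Data Structures entry mentions a Zod schema that validates Gemini's Tier 2 output. To show the kind of check involved without pulling in a dependency, the sketch below uses a hand-written type guard in place of Zod. The three field names come from the sequence diagram on this page; the 0 to 100 score range and the names `Tier2OutputSketch` and `isTier2Output` are assumptions.

```typescript
// Field names are taken from the sequence diagram; the 0-100 range is an assumption.
// The PR itself validates with a Zod schema (lib/scout/tier2-schema.ts), not shown here.
interface Tier2OutputSketch {
  framework: string;
  deal_score: number;
  promote_to_tier3: boolean;
}

function isTier2Output(value: unknown): value is Tier2OutputSketch {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const score = v["deal_score"];
  return (
    typeof v["framework"] === "string" &&
    typeof score === "number" &&
    Number.isFinite(score) &&
    score >= 0 &&
    score <= 100 &&
    typeof v["promote_to_tier3"] === "boolean"
  );
}
```

Validating model output before it reaches the rest of the pipeline means a malformed or manipulated response is rejected rather than stored.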

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as POST /api/scout
    participant Tier1 as Tier 1 Scan
    participant Cortex as Cortex DB
    participant Tier2 as Tier 2 Qualify
    participant Gemini as Gemini LLM
    participant PageSpeed as PageSpeed API
    participant Tier3 as Tier 3 Fetch/Analyze

    Client->>API: POST { urls, tier3_limit, ... }
    API->>API: Normalize & deduplicate URLs
    API->>Cortex: Query previously analyzed prospects
    API->>Tier1: scanTier1Batch(urls)
    Tier1->>Tier1: Fetch & detect frameworks
    Tier1->>API: Emit tier1 events
    
    API->>Tier2: qualifyTier2(tier1 results)
    Tier2->>Tier2: Sanitize HTML
    Tier2->>Gemini: generateObject(sanitized_html)
    Gemini->>Tier2: {framework, deal_score, promote_to_tier3}
    Tier2->>API: Emit tier2 events
    
    API->>Cortex: Store qualified prospects (score≥50)
    
    alt Tier 3 Enabled & Candidates
        API->>Tier3: Fetch pages for promotion candidates
        Tier3->>PageSpeed: Fetch performance metrics
        Tier3->>Gemini: Tech stack & value analysis
        Tier3->>Cortex: Store Tier 3 results
        Tier3->>API: Emit tier3 events
    end
    
    API->>Cortex: Store scan summary
    API->>Client: Emit complete event (SSE)
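
The diagram shows three decision inputs between Tier 2 and Tier 3: the score≥50 storage threshold, the `promote_to_tier3` flag returned by Gemini, and the `tier3_limit` request option. One plausible way to combine them is sketched below. Ranking candidates by score before applying the limit is an assumption, as are the names `QualifiedSketch` and `planTier3`; the PR's pipeline may order or filter differently.

```typescript
// Hypothetical shape and selection rule. Only the field names, the score >= 50
// threshold and the tier3_limit option appear in the PR.
type QualifiedSketch = { url: string; deal_score: number; promote_to_tier3: boolean };

function planTier3(
  results: QualifiedSketch[],
  tier3Limit: number,
): { toStore: QualifiedSketch[]; candidates: QualifiedSketch[] } {
  // Prospects at or above the threshold are persisted.
  const toStore = results.filter((r) => r.deal_score >= 50);
  // Assumption: promoted prospects are ranked by score, then capped at the limit.
  const candidates = results
    .filter((r) => r.promote_to_tier3)
    .sort((a, b) => b.deal_score - a.deal_score)
    .slice(0, Math.max(0, tier3Limit));
  return { toStore, candidates };
}
```

Capping Tier 3 keeps the expensive full analysis bounded regardless of how many URLs enter the funnel.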

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 Hops through prospects with Scout so keen,
Three tiers deep where signals gleam—
Framework frames and scores divine,
Gemini whispers, deal lines shine,
From chaos, qualified prospects spring!
