
Conversation

@brendan-kellam (Contributor) commented Nov 20, 2025

Summary by CodeRabbit

  • New Features

    • Added regex pattern and case sensitivity toggle controls to search interface
    • Implemented streaming search results with real-time updates
    • Added comprehensive query language parser with advanced search operators
    • Enhanced search with visibility and fork filtering options
  • Bug Fixes

    • Fixed branch name display by removing ref path prefixes
  • Chores

    • Updated MCP package with API changes
    • Added gRPC integration for improved search backend communication


coderabbitai bot commented Nov 20, 2025

Walkthrough

This PR introduces a Lezer-based query language parser, converts the search API from string-based queries to intermediate representation (IR), implements streaming search via SSE, adds Zoekt gRPC proto definitions, updates search UI for regex and case-sensitivity flags, and refactors Prisma permission filtering to account-based scoping.

Changes

Cohort / File(s) | Summary
Query Language Package
packages/queryLanguage/*
New @sourcebot/query-language workspace package with Lezer grammar (query.grammar), tokenizer (tokens.ts), generated parser (parser.ts, parser.terms.ts), index exports, and comprehensive test suites covering basic, grouping, negation, operators, precedence, prefixes, and quoted expressions.
Search API & IR Foundation
packages/web/src/features/search/index.ts, packages/web/src/features/search/types.ts, packages/web/src/features/search/ir.ts, packages/web/src/features/search/parser.ts, packages/web/src/features/search/searchApi.ts
New/refactored search feature exports, unified type definitions (SearchRequest, SearchResponse, SearchStats, etc.), QueryIR visitor/traversal utilities, parseQuerySyntaxIntoIR for converting Lezer trees to IR, and refactored search/streamSearch functions using IR and permission filters.
Zoekt gRPC Integration
packages/web/src/features/search/zoektSearcher.ts, packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts, packages/web/src/proto/google/protobuf/*
New zoektSearcher with createZoektSearchRequest, zoektSearch, zoektStreamSearch functions, and comprehensive proto type definitions for Zoekt gRPC (And, Boost, Branch, FileMatch, Q, SearchResponse, WebserverService, etc.) and Google Protobuf types.
Search UI Components
packages/web/src/app/[domain]/components/searchBar/*, packages/web/src/app/[domain]/browse/layout.tsx
Updated SearchBar to accept defaults object with query, isRegexEnabled, isCaseSensitivityEnabled; added case-sensitivity and regex toggles; refactored suggestion mode from "case"/"public" to "visibility"; updated search prefix constants and logic.
Streaming Search & Server Routes
packages/web/src/app/[domain]/search/useStreamedSearch.ts, packages/web/src/app/api/(server)/stream_search/route.ts, packages/web/src/app/[domain]/search/page.tsx, packages/web/src/app/[domain]/search/components/searchResultsPage.tsx
New useStreamedSearch hook for SSE-based search with caching and cancellation; POST /stream_search route handler; updated SearchResultsPage to use streaming data (isStreaming, files, stats, timeToFirstSearchResultMs, timeToSearchCompletionMs); added isRegexEnabled/isCaseSensitivityEnabled to search params.
Search Results UI
packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx, packages/web/src/app/[domain]/search/components/filterPanel/*
Converted SearchResultsPanel to forwardRef with resetScroll handle; refactored "show all matches" state to per-file-match map; added isStreaming and onFilterChange to FilterPanel; updated skeleton rendering during stream.
Code Navigation
packages/web/src/features/codeNav/api.ts, packages/web/src/features/codeNav/types.ts
Refactored findSearchBasedSymbolReferences/findSearchBasedSymbolDefinitions to use IR queries (queryType: 'ir') instead of strings; updated language filter builder to return QueryIR; adjusted parse logic and file structure mapping.
File Source API & Chat Tools
packages/web/src/features/search/fileSourceApi.ts, packages/web/src/features/chat/tools.ts, packages/web/src/app/api/(client)/client.ts
Updated fileSourceApi to construct IR queries; refactored search tool in chat/tools.ts to use new searchOptions structure; adjusted client API imports.
Database & Auth
packages/db/src/index.ts, packages/web/src/prisma.ts
Added UserWithAccounts type alias (User & { accounts: Account[] }); refactored userScopedPrismaClientExtension to accept user parameter and use new getRepoPermissionFilterForUser helper for account-based permission filtering.
Proto & Type Utilities
packages/web/src/lib/utils.ts, packages/web/src/lib/errorCodes.ts, packages/web/src/lib/types.ts, packages/web/src/lib/posthogEvents.ts
Added getCodeHostBrowseFileAtBranchUrl URL builder; added FAILED_TO_PARSE_QUERY error code; added SearchQueryParams enum members (isRegexEnabled, isCaseSensitivityEnabled); updated search_finished PosthogEventMap with new timing fields and flushReason type.
MCP Package
packages/mcp/src/index.ts, packages/mcp/src/schemas.ts, packages/mcp/CHANGELOG.md
Updated search call to use isRegexEnabled and isCaseSensitivityEnabled flags instead of case: query suffix; refactored searchRequestSchema with searchOptionsSchema composition; updated changelog.
Web Package & Build
package.json, packages/web/package.json, Dockerfile
Added @sourcebot/query-language workspace and build dependencies; added gRPC dependencies (@grpc/grpc-js, @grpc/proto-loader); added generate:protos script; updated Docker build stages for queryLanguage; removed SRC_TENANT_ENFORCEMENT_MODE=strict; added @lezer/common resolution pin.
Infrastructure & Config
.env.development, CHANGELOG.md, packages/queryLanguage/tsconfig.json, packages/queryLanguage/vitest.config.ts, packages/queryLanguage/.gitignore, packages/web/.eslintignore, packages/web/src/features/search/README.md
Removed strict tenant enforcement mode; added MCP changelog entry; added TypeScript and Vitest configs for queryLanguage; added ESLint ignores for proto output; added search feature README documentation.
Removed/Deleted Files
packages/web/src/features/search/schemas.ts, packages/web/src/features/search/zoektClient.ts, packages/web/src/features/search/zoektSchema.ts
Removed centralized Zod schemas file, zoektClient fetch wrapper, and Zoekt-specific schema definitions (moved to types.ts and new IR-based approach).
Type/Import Path Updates
packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx, packages/web/src/app/[domain]/search/components/codePreviewPanel/*, packages/web/src/app/[domain]/search/components/searchResultsPanel/*, packages/web/src/features/codeNav/types.ts, packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts, packages/web/src/lib/extensions/searchResultHighlightExtension.ts, packages/web/src/ee/features/codeNav/components/*
Consolidated import paths for search types (SourceRange, SearchResultFile, RepositoryInfo, etc.) from @/features/search/types to @/features/search; updated fileSourceRequestSchema imports to types subpath.
Backend
packages/backend/src/index.ts
Removed debug log line for repeated shutdown signals.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant SearchPage as SearchPage
    participant useStreamedSearch as useStreamedSearch
    participant SearchAPI as /api/stream_search
    participant zoektSearcher as zoektSearcher
    participant ZoektGRPC as Zoekt gRPC

    Client->>SearchPage: Enter query + flags
    SearchPage->>useStreamedSearch: query, isRegexEnabled, isCaseSensitivityEnabled
    
    alt Cache Hit
        useStreamedSearch-->>SearchPage: Return cached results
    else Cache Miss
        useStreamedSearch->>SearchAPI: POST (SSE)
        SearchAPI->>SearchAPI: Parse query → IR (parseQuerySyntaxIntoIR)
        SearchAPI->>zoektSearcher: createZoektSearchRequest(IR)
        zoektSearcher->>ZoektGRPC: StreamSearch RPC
        
        ZoektGRPC-->>zoektSearcher: Stream chunk
        zoektSearcher->>zoektSearcher: Transform chunk + enrich repos
        zoektSearcher-->>SearchAPI: data: {files, stats, ...}
        SearchAPI-->>useStreamedSearch: SSE chunk
        useStreamedSearch->>useStreamedSearch: Update state
        useStreamedSearch-->>SearchPage: Partial results
        
        ZoektGRPC-->>zoektSearcher: Final/Error
        zoektSearcher-->>SearchAPI: Final/Error response
        SearchAPI-->>useStreamedSearch: SSE final
        useStreamedSearch->>useStreamedSearch: Cache results
        useStreamedSearch-->>SearchPage: Final results
    end
    
    SearchPage->>SearchPage: Render results
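The server side of the streaming flow above can be sketched in a few lines. This is a hypothetical illustration, not the actual route handler: `toSSEResponse` and `streamChunks` are made-up names standing in for the real `/api/stream_search` route and the `zoektStreamSearch` output, and only the `data: <json>` framing plus a `[DONE]` sentinel are assumed.

```typescript
// Hypothetical sketch: encode one search chunk as a single SSE frame.
function encodeSSEFrame(chunk: unknown): string {
    return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Wrap an async iterable of search chunks in an SSE response, emitting one
// frame per chunk and a [DONE] sentinel when the stream finishes.
function toSSEResponse(streamChunks: AsyncIterable<unknown>): Response {
    const encoder = new TextEncoder();
    const body = new ReadableStream<Uint8Array>({
        async start(controller) {
            try {
                for await (const chunk of streamChunks) {
                    controller.enqueue(encoder.encode(encodeSSEFrame(chunk)));
                }
                controller.enqueue(encoder.encode("data: [DONE]\n\n"));
            } finally {
                controller.close();
            }
        },
    });
    return new Response(body, {
        headers: {
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache",
        },
    });
}
```

The real implementation additionally has to interleave error chunks and final stats, but the framing contract between server and `useStreamedSearch` is what matters for the diagram.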
sequenceDiagram
    participant Input as Query Input
    participant Lezer as Lezer Parser
    participant transformTreeToIR as transformTreeToIR
    participant expandContext as expandSearchContext
    participant IROutput as QueryIR Output

    Input->>Lezer: "file:test.ts content:bug -archived:yes"
    Lezer-->>transformTreeToIR: Parse tree (PrefixExpr, Term, NegateExpr, etc.)
    
    transformTreeToIR->>transformTreeToIR: Walk tree recursively
    
    alt PrefixExpr
        transformTreeToIR->>transformTreeToIR: Extract prefix type (FileExpr, ContentExpr, etc.)
        transformTreeToIR->>transformTreeToIR: Build IR node (regexp, substring, raw_config, etc.)
    else Term
        transformTreeToIR->>transformTreeToIR: Check isRegexEnabled / isCaseSensitivityEnabled
        transformTreeToIR->>transformTreeToIR: Create regexp or substring query
    else ContextExpr
        transformTreeToIR->>expandContext: Resolve context name
        expandContext-->>transformTreeToIR: Repository IDs
        transformTreeToIR->>transformTreeToIR: Build repo_set query
    end
    
    transformTreeToIR-->>IROutput: QueryIR (And / Or / Not / PrefixExpr nodes)
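The tree-to-IR walk in the second diagram can be illustrated with a toy version. This is a simplified sketch with assumed shapes: the real `parseQuerySyntaxIntoIR` walks Lezer `SyntaxNode`s and emits the richer `QueryIR` from `ir.ts`, whereas the `Ast` and `QueryIR` types below are invented for illustration.

```typescript
// Toy AST, loosely mirroring Term / PrefixExpr / NegateExpr / conjunction.
type Ast =
    | { kind: "term"; text: string }
    | { kind: "prefix"; name: string; value: string }
    | { kind: "negate"; child: Ast }
    | { kind: "and"; children: Ast[] };

// Trimmed-down IR shape (the real QueryIR has many more node types).
type QueryIR =
    | { substring: { pattern: string; caseSensitive: boolean } }
    | { regexp: { regexp: string } }
    | { not: { child: QueryIR } }
    | { and: { children: QueryIR[] } };

function toIR(node: Ast, isRegexEnabled: boolean): QueryIR {
    switch (node.kind) {
        case "term":
            // Bare terms become regexp or substring queries depending on the flag.
            return isRegexEnabled
                ? { regexp: { regexp: node.text } }
                : { substring: { pattern: node.text, caseSensitive: false } };
        case "prefix":
            // e.g. file:test\.ts becomes a regexp over the prefixed field.
            return { regexp: { regexp: node.value } };
        case "negate":
            return { not: { child: toIR(node.child, isRegexEnabled) } };
        case "and":
            return { and: { children: node.children.map((c) => toIR(c, isRegexEnabled)) } };
    }
}
```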

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Query language grammar and parser (packages/queryLanguage/src/query.grammar, packages/queryLanguage/src/parser.ts, packages/queryLanguage/src/tokens.ts): Complex Lezer-based grammar with tokenization logic; verify precedence, operator handling, and edge cases in negation/prefix parsing.
  • parseQuerySyntaxIntoIR implementation (packages/web/src/features/search/parser.ts): Dense transformation logic with context expansion, case/regex sensitivity handling, and error path complexity; review all prefix transformations and error codes.
  • Zoekt gRPC integration (packages/web/src/features/search/zoektSearcher.ts): Streaming error handling, chunk accumulation, repository enrichment, and gRPC client lifecycle; verify SSE encoding and chunk/final/error response ordering.
  • Streaming search hook (packages/web/src/app/[domain]/search/useStreamedSearch.ts): SSE parsing, cache TTL validation, AbortController cancellation, and error handling paths; review cache key construction and error wrapping.
  • SearchResultsPanel refactoring (packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx): Conversion to forwardRef, per-file-match state map persistence, and debounce logic changes; verify resetScroll behavior and history restoration.
  • Prisma permission filtering (packages/web/src/prisma.ts): Shift from account-ID arrays to account-aware user scoping; verify permission edge cases and fallback to public repos logic.
  • Proto definitions (packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts): Large number of generated type definitions; spot-check a few for correctness of field types and nested structures.


Suggested labels

sourcebot-team

Suggested reviewers

  • msukkari

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 60.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve it.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title 'feat(web): Streamed code search' clearly describes the main feature being introduced: streaming functionality for code search in the web package.

@brendan-kellam force-pushed the bkellam/streamed_search branch from c137be2 to aad3507 on November 20, 2025 at 06:15
@brendan-kellam changed the title from '[wip] Streamed search' to 'feat(web): Streamed code search' on Nov 21, 2025
@brendan-kellam brendan-kellam marked this pull request as ready for review November 21, 2025 02:45
@brendan-kellam (Contributor, Author) commented:

@coderabbitai review

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (28)
packages/web/src/app/[domain]/components/pathHeader.tsx (1)

236-236: LGTM! Clean branch name display.

The regex transformation correctly strips Git's symbolic reference prefixes (refs/heads/ or refs/tags/) to display user-friendly branch names. The implementation is safe and follows standard Git UI conventions.

Recommend extracting to a utility function—this pattern is already duplicated.

Verification found the same transformation in packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39). Consider extracting to a shared utility to reduce duplication:

// In lib/utils.ts or similar
export const formatBranchDisplayName = (branchName: string): string => {
  return branchName.replace(/^refs\/(heads|tags)\//, '');
};

Update both locations:

  • packages/web/src/app/[domain]/components/pathHeader.tsx (line 236)
  • packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39)
packages/mcp/src/schemas.ts (1)

32-35: Consider validating the query field.

The query field accepts any string including empty strings. Depending on how queries are processed, you may want to ensure non-empty queries or add appropriate validation.

If empty queries should be rejected, apply this diff:

 export const searchRequestSchema = z.object({
-    query: z.string(),                                // The zoekt query to execute.
+    query: z.string().min(1),                         // The zoekt query to execute.
     ...searchOptionsSchema.shape,
 });
packages/queryLanguage/tsconfig.json (1)

21-21: Consider broadening the include pattern.

The current configuration only includes src/index.ts, which means TypeScript will only compile files transitively imported from the index. Generated parser files in src/parser/ may not be fully type-checked if they contain exports that aren't used.

Consider changing to:

-    "include": ["src/index.ts"],
+    "include": ["src/**/*.ts"],

This ensures all TypeScript files, including generated parser code, are properly type-checked during compilation.

packages/queryLanguage/src/query.grammar (1)

23-30: Negation currently cannot target bare terms (only prefixes/paren groups)

NegateExpr only accepts negate before PrefixExpr or ParenExpr, so something like -foo won’t ever become a NegateExpr node; it’ll only work if you always express exclusions as -file:..., -repo:..., or -(...). If the intended UX is that -foo excludes results matching a plain term, consider broadening this rule:

-NegateExpr { !negate negate (PrefixExpr | ParenExpr) }
+NegateExpr { !negate negate expr }

This would allow negation of any expr (including bare Term) while still supporting prefixes and grouped subexpressions.

packages/queryLanguage/test/prefixes.txt (1)

1-335: Prefix test coverage is strong; consider a few more operator combinations

This suite hits a wide range of prefix/value shapes (short forms, wildcards, regexy patterns, invalid values), which is great. If you expect users to mix prefixes heavily with or and negation, it could be worth adding a couple of cases like:

  • file:test.js or repo:myproject
  • -file:test.js lang:typescript
  • (file:*.ts or file:*.tsx) lang:typescript

to lock in parse shapes for those combinations.

packages/queryLanguage/test/basic.txt (1)

1-71: Solid basic coverage; a couple of extra lexical edge cases might help

These cases do a nice job exercising plain terms, adjacency, and regex-like patterns. If you want to harden the lexer further, you could optionally add inputs like:

  • order or or1 (ensure only standalone or becomes the operator)
  • -foo bar (whatever behavior you intend around leading dashes)
  • hello_or_world

to document how the grammar treats borderline operator/word collisions.

packages/queryLanguage/test.ts (1)

1-46: Turn this into an automated test and use the public parser entrypoint

The pretty‑print/reconstruction logic is fine for a quick sanity check, but as a committed file it’s more useful if:

  • It’s wired into your actual test runner (e.g., moved under test/ with an assertion instead of console logging).
  • It imports the parser from the package’s public surface (e.g., ./src/index or the published module name) rather than the generated ./src/parser, so it exercises what consumers will use.

Doing that would turn this from a one‑off debug script into a stable regression test for the grammar.

packages/queryLanguage/src/tokens.ts (1)

27-58: Remove duplicate comment.

Lines 28 and 32 contain identical comments. Remove the duplicate to improve code clarity.

Apply this diff:

 // Check if followed by a prefix keyword (by checking for keyword followed by colon)
-// Look ahead until we hit a delimiter or colon
 const checkPos = input.pos;
 let foundColon = false;
 
 // Look ahead until we hit a delimiter or colon
 while (ch >= 0) {

Consider clarifying EOF handling.

The lookahead loop condition ch >= 0 (Line 33) correctly handles EOF (which returns a negative value in Lezer), but this behavior isn't documented. Consider adding a comment to make the EOF handling explicit.

Example:

 // Look ahead until we hit a delimiter or colon
-while (ch >= 0) {
+while (ch >= 0) { // ch < 0 indicates EOF
packages/web/src/features/search/zoektSearcher.ts (6)

171-318: Streaming lifecycle mostly robust; consider explicit guards around controller usage

The SSE streaming implementation correctly:

  • Tracks in-flight chunks (pendingChunks) and defers closing until both isStreamActive is false and pendingChunks === 0.
  • Pauses/resumes the gRPC stream to avoid completing before async processing is done.
  • Converts processing errors into typed error chunks rather than tearing down the stream abruptly.

Two small robustness considerations:

  1. Post-error gRPC stream lifetime
    In the per-chunk catch, isStreamActive is set to false and an error chunk is enqueued, but the underlying gRPC stream is neither cancelled nor closed explicitly. You do close the client via tryCloseController, so this is likely fine, but explicitly cancelling the stream can free resources sooner:

    } catch (error) {
      // ...
      isStreamActive = false;
+     grpcStream?.cancel();
    }

  2. Cancel callback and finalization

    The cancel() handler sets isStreamActive = false and calls grpcStream.cancel() / client.close(), but doesn’t invoke tryCloseController. That’s acceptable (the consumer has cancelled), but if you want symmetric behavior (e.g., always emitting a final “done” marker on server-side aborts), consider calling tryCloseController() once in-flight chunks drain.

Both are optional refinements; the current implementation should behave correctly for the common happy-path and error-path cases.

329-368: Per-chunk repo lookup and caching are solid; consider minor ergonomics

The repo resolution logic per chunk behaves well:

  • Per-request cache (_reposMapCache) avoids redundant DB hits across chunks.
  • Per-chunk reposMap limits lookups to just the repositories actually present in that response.
  • Numeric vs. string IDs are handled via getRepoIdForFile.

Two optional improvements:

  • You could avoid the extra reposMap Map and reuse the shared cache for lookups, since you already filter by presence before transforming files, but the current separation is clear and not problematic.
  • If Prisma throws (e.g., transient DB issues), that error will bubble into zoektStreamSearch’s per-chunk catch, which is good; combined with the fix suggested for zoektSearch, error propagation is consistent across both search modes.

No required changes here.

370-446: Strict “repo not found” behavior can hard-fail searches

In transformZoektSearchResponse, if a file’s repository can’t be found in the DB/cache, you throw:

if (!repo) {
    throw new Error(`Repository not found for file: ${file.file_name}`);
}

Given that out-of-sync states between Zoekt indices and your Repo table are possible (e.g., a repo deleted in the DB but still indexed, or temporary ingestion lag), this will hard-fail the whole search.

Consider a more forgiving behavior, e.g.:

  • Skip such files and continue, or
  • Emit them with a sentinel repo name/id and surface a warning elsewhere.

Example adjustment:

-        if (!repo) {
-            throw new Error(`Repository not found for file: ${file.file_name}`);
-        }
+        if (!repo) {
+            // Skip files whose repositories are missing in the DB to avoid failing the entire search.
+            return undefined;
+        }
...
-    }).filter(file => file !== undefined);
+    }).filter((file): file is SearchResultFile => file !== undefined);

This keeps the search resilient to transient data inconsistencies.


448-480: Double-check units and semantics for duration/wait and flushReason

The stats mapping looks coherent, but relies on external Zoekt semantics:

  • duration, wait, matchTree* fields are populated from the .nanos part only. If Zoekt ever returns non-zero seconds, these will under-report timings.
  • flushReason is set via response.stats?.flush_reason?.toString() and then used as a string; this matches the z.string() type, but you might want to standardize on the enum values directly (no .toString()) for easier downstream handling.

Both are minor and can be left as-is, but worth verifying against Zoekt’s current SearchStats definition and the behavior of your gRPC/proto loader.


544-572: Stats accumulation across stream chunks may overcount depending on Zoekt semantics

accumulateStats currently adds all numeric fields (including totalMatchCount, fileCount, duration fields, etc.) across chunks, and keeps the first non-unknown flushReason.

This is correct only if Zoekt’s streamed SearchStats are per-chunk deltas. If Zoekt instead sends cumulative stats (“so far”) for each flush, summing will overcount and could:

  • Inflate totalMatchCount, fileCount, etc.
  • Distort isSearchExhaustive in the streaming final response, which compares accumulatedStats.totalMatchCount to accumulatedStats.actualMatchCount.

If Zoekt’s stats are cumulative, the accumulator should instead take the last stats object (or max per field) rather than summing.

Please confirm from the Zoekt docs or implementation whether SearchStats in streamed responses are deltas or cumulative, and adjust accumulateStats accordingly.
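To make the delta-vs-cumulative distinction concrete, here is a sketch of both strategies over a trimmed-down stats shape. The field names are borrowed from the schema, but this is an illustrative sketch, not the real accumulateStats (which handles many more fields).

```typescript
// Trimmed-down stats shape for illustration.
interface Stats {
    totalMatchCount: number;
    fileCount: number;
}

// If streamed chunks carry per-chunk deltas, summing is correct:
function accumulateDeltas(chunks: Stats[]): Stats {
    return chunks.reduce(
        (acc, s) => ({
            totalMatchCount: acc.totalMatchCount + s.totalMatchCount,
            fileCount: acc.fileCount + s.fileCount,
        }),
        { totalMatchCount: 0, fileCount: 0 }
    );
}

// If chunks carry cumulative ("so far") stats, only the last chunk is
// authoritative, and summing would overcount:
function accumulateCumulative(chunks: Stats[]): Stats {
    return chunks[chunks.length - 1] ?? { totalMatchCount: 0, fileCount: 0 };
}
```

Whichever semantics Zoekt actually uses, pinning it down with a test against a recorded stream would prevent the two strategies from drifting.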


513-541: gRPC client construction: URL parsing and path robustness

Overall the client setup is clear and matches the generated types. Two small robustness points:

  1. Port extraction from ZOEKT_WEBSERVER_URL

    If the URL omits an explicit port, zoektUrl.port will be '', leading to an address like hostname:. You might want a fallback:

-    const grpcAddress = `${zoektUrl.hostname}:${zoektUrl.port}`;
+    const port = zoektUrl.port || (zoektUrl.protocol === 'https:' ? '443' : '80');
+    const grpcAddress = `${zoektUrl.hostname}:${port}`;

  2. Proto path based on process.cwd()

    Using process.cwd() with '../../vendor/zoekt/grpc/protos' assumes a specific working-directory layout at runtime. If this ever runs from a different cwd (e.g., the Next.js production server root), proto loading will fail. Consider anchoring paths via a config/env var, or document the assumption clearly.

Both are defensive improvements; the current code is fine if your deployment environment guarantees cwd and URL shape.

packages/web/src/features/search/types.ts (1)

33-56: SearchStats shape matches Zoekt stats mapping

The searchStatsSchema fields and comments mirror the Zoekt SearchStats fields you populate in transformZoektSearchResponse. That alignment is good.

Optional improvement: if you want stronger validation, you could constrain these to non-negative integers, e.g.:

actualMatchCount: z.number().int().nonnegative(),

for the count fields and similar for the byte counters, while keeping simple z.number() for timing values if they might be fractional in the future.

packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)

72-78: Cosmetic dependency array reordering.

The dependency array for branchDisplayName was reordered from [isBranchFilteringEnabled, branches] to [branches, isBranchFilteringEnabled]. While this doesn't affect React's memoization behavior, it may be for consistency with a linting rule or code style.

packages/web/src/features/search/fileSourceApi.ts (1)

8-38: Validate IR shape and regex escaping for repository / fileName

The QueryIR shape here looks reasonable, but both repo.regexp and the file regexp.regexp are being populated directly from repository and fileName. If these values can contain regex metacharacters (e.g. +, ?, [], etc.), this will change the match semantics compared to a literal path/repo match and could even produce invalid patterns, depending on Zoekt’s expectations.

If the intent is “exact repo” and “exact file path”:

  • Consider normalizing both to anchored, escaped patterns, e.g. ^<escapedRepo>$ / ^<escapedPath>$, or
  • Use an IR helper that treats these as literal names rather than raw regexes (if such an abstraction exists in your new IR layer).

This is especially important for getFileSource, which is effectively a “fetch exact file” path rather than a fuzzy search.

Also applies to: 41-47

packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (1)

15-20: onFilterChange / isStreaming integration looks solid

The new props are plumbed cleanly: onFilterChange is only invoked when query params actually change, and isStreaming is consistently passed down to both Filter components. This should make it straightforward for the parent to react to filter changes during streaming without introducing extra renders.

If you find the repo/language onEntryClicked blocks diverging over time, consider extracting the “toggle and sync URL + fire onFilterChange” logic into a small shared helper, but that’s optional.

Also applies to: 36-38, 42-44, 137-187

packages/web/src/prisma.ts (1)

3-3: Repo permission filter extraction looks correct; consider safer where composition

The refactor to accept user?: UserWithAccounts and delegate repo scoping to getRepoPermissionFilterForUser keeps the semantics clear: repo queries under the feature flag are constrained to repositories that are either permitted to one of the user’s accounts or public, with unauthenticated/no-account users seeing only public repos.

One nuance to be aware of: in repo.$allOperations, the new filter is merged via:

argsWithWhere.where = {
  ...(argsWithWhere.where || {}),
  ...getRepoPermissionFilterForUser(user),
};

If callers ever provide their own OR in argsWithWhere.where, this spread will overwrite it with the permission filter’s OR. Today that may be fine (or pre-existing), but if you expect more complex repo predicates in the future, it might be safer to wrap these in an explicit AND: [originalWhere, permissionFilter] instead.

Not a blocker, but worth keeping in mind as you expand repo query usage.

Also applies to: 35-52, 61-85

packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (1)

95-104: Consider syncing regex/case flags from updated defaults like query

defaultIsRegexEnabled / defaultIsCaseSensitivityEnabled are only used as initial state; if the parent updates defaults (e.g. when navigating back/forward and rebuilding search params), the toggles won’t follow, while query is explicitly resynced via useEffect. If the intent is for the UI to always reflect the URL/props, consider adding similar effects for the flags:

useEffect(() => {
    setIsRegexEnabled(defaultIsRegexEnabled);
}, [defaultIsRegexEnabled]);

useEffect(() => {
    setIsCaseSensitivityEnabled(defaultIsCaseSensitivityEnabled);
}, [defaultIsCaseSensitivityEnabled]);

Or confirm that flags are intentionally “sticky” across navigation and don’t need to follow defaults.

Also applies to: 113-115, 119-133

packages/web/src/app/[domain]/search/useStreamedSearch.ts (2)

9-35: Cache key & entry shape are tightly coupled to current SearchRequest and omit stats

Two small things to consider:

  • createCacheKey manually lists a subset of SearchRequest fields. If SearchRequest later grows with result-affecting fields (e.g. repo scope, domain, filters), the cache will start cross-contaminating queries unless this function is updated in lockstep. You might want to either:

    • Base the key on the entire SearchRequest object (e.g. JSON.stringify(params)), or
    • Add a unit test that fails when SearchRequest changes without updating createCacheKey.
  • CacheEntry doesn’t store stats, and the cached setState path also omits stats, so repeated identical queries served from cache will never have stats populated even if the original streamed run did. If the UI relies on stats, consider adding it to CacheEntry and wiring it through in both the cache write and read paths.

Also applies to: 91-106, 233-245


152-231: SSE parsing assumes single-line data: events and only breaks inner loop on [DONE]

The SSE loop works for the current “data: <JSON>\n\n per event” shape, but it has a few assumptions:

  • message.match(/^data: (.+)$/) only handles a single data: line per event and ignores any additional lines/fields. If the server ever emits multi-line data: payloads or other SSE fields, this parser will start dropping information.
  • When encountering [DONE], you break only the inner for (const message of messages) loop; the outer while continues reading until the stream naturally ends. If the server keeps the SSE connection open after sending [DONE], this hook would never finalize/search-cache until the connection closes.

If you expect to rely on a [DONE] sentinel, consider:

  • Tracking a doneStreaming flag that breaks the outer loop when [DONE] is seen, and
  • Optionally making the parsing a bit more tolerant to additional data: lines if the backend ever evolves.
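A more tolerant parser along those lines can be sketched as a pure helper. This is a hypothetical function (not the hook’s actual code) assuming the current `data: <json>` framing: it joins multiple data: lines per event, as the SSE spec allows, and surfaces a done flag the caller can use to break the outer read loop.

```typescript
// Hypothetical helper: parse a buffered SSE string into JSON events.
interface ParsedSSE<T> {
    events: T[];
    done: boolean;      // true once the [DONE] sentinel is seen
    remainder: string;  // incomplete trailing event, carried to the next read
}

function parseSSEEvents<T>(buffer: string): ParsedSSE<T> {
    const rawEvents = buffer.split("\n\n");
    // The last element may be an incomplete event; keep it for the next chunk.
    const remainder = rawEvents.pop() ?? "";
    const events: T[] = [];
    let done = false;

    for (const rawEvent of rawEvents) {
        // Join all data: lines of the event (multi-line payloads are legal SSE).
        const data = rawEvent
            .split("\n")
            .filter((line) => line.startsWith("data:"))
            .map((line) => line.slice("data:".length).trimStart())
            .join("\n");
        if (!data) continue;
        if (data === "[DONE]") {
            done = true;
            break;
        }
        events.push(JSON.parse(data) as T);
    }
    return { events, done, remainder };
}
```

The caller would check `done` after each read and break its outer `while` loop, rather than waiting for the connection to close.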
packages/web/src/features/codeNav/api.ts (3)

23-45: Consider escaping symbolName in the IR regexp to avoid unintended regex behavior

Both reference and definition queries build a regexp around symbolName with \\b${symbolName}\\b, but symbolName is interpolated raw. If it can contain regex metacharacters (e.g., ., *, +, ?, []), this will change the meaning of the pattern and could either over‑match or fail to match the intended symbol.

If you want literal symbol lookup, consider escaping the name before interpolation, e.g.:

+const escapeRegex = (value: string) =>
+  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

- regexp: `\\b${symbolName}\\b`,
+ regexp: `\\b${escapeRegex(symbolName)}\\b`,

If, instead, you intentionally rely on regex semantics here, it would be good to document that assumption and add tests for symbols containing regex‑special characters.

Also applies to: 74-88


116-140: Simplified response parsing is good; consider map instead of flatMap on files

The new parseRelatedSymbolsSearchResponse that directly maps SearchResponseFindRelatedSymbolsResponse is clear, and the inner chunks.flatMap(...matchRanges...) with a final .filter(file => file.matches.length > 0) makes sense.

Minor nit: searchResult.files.flatMap((file) => { return { ... } }) always returns a single object per file, so flatMap behaves like map here and is a bit surprising to readers. Switching to map would better express intent without behavior change:

- files: searchResult.files.flatMap((file) => {
+ files: searchResult.files.map((file) => {
   // ...
- }).filter((file) => file.matches.length > 0),
+ }).filter((file) => file.matches.length > 0),

143-181: Language IR filter expansion looks good; just confirm language values match Zoekt’s language names

The getExpandedLanguageFilter helper that builds an or over TypeScript/JavaScript/JSX/TSX (and otherwise a single language node) is a nice improvement over string‑based filters. Please just ensure the language strings here exactly match the language identifiers emitted by your indexer/Zoekt configuration so the filters don’t silently miss matches.

packages/web/src/features/search/parser.ts (1)

126-203: Unknown AST node types defaulting to const: true can silently broaden queries

In transformTreeToIR’s default branch, unknown node types log a warning and return { const: true, query: "const" }, effectively “match all”. That’s safe in the sense of not throwing, but if the grammar evolves and new nodes appear, this will silently broaden user queries instead of failing fast.

Consider failing closed by throwing (or converting to a parse error as per the previous comment), so unsupported syntax doesn’t produce unexpectedly wide result sets.
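A fail-closed default branch could look like this (QueryParseError and the node shape are illustrative names, not the parser's real types):

```typescript
// Illustrative error type so the caller can map parse failures to a 400
// instead of letting them surface as generic 500s.
class QueryParseError extends Error {
    constructor(message: string) {
        super(message);
        this.name = "QueryParseError";
    }
}

type AstNode = { type: string; text?: string };

function transformNode(node: AstNode): { const?: boolean; regexp?: string } {
    switch (node.type) {
        case "Term":
            return { regexp: node.text ?? "" }; // real transform elided
        default:
            // Fail closed: surface unsupported syntax rather than silently
            // widening the query to "match all".
            throw new QueryParseError(`Unsupported query node type: "${node.type}"`);
    }
}
```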

packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (2)

80-86: Factor out error message construction to keep UI and toast in sync

Both the toast and inline error message compute the same ternary expression for ServiceErrorException vs generic Error. To keep things DRY and ensure they never diverge, you could precompute errorMessage once:

-    useEffect(() => {
-        if (error) {
-            toast({
-                description: `❌ Search failed. Reason: ${error instanceof ServiceErrorException ? error.serviceError.message : error.message}`,
-            });
-        }
-    }, [error, toast]);
+    const errorMessage =
+        error instanceof ServiceErrorException
+            ? error.serviceError.message
+            : error?.message;
+
+    useEffect(() => {
+        if (errorMessage) {
+            toast({
+                description: `❌ Search failed. Reason: ${errorMessage}`,
+            });
+        }
+    }, [errorMessage, toast]);
@@
-                    <p className="text-sm text-center">{error instanceof ServiceErrorException ? error.serviceError.message : error.message}</p>
+                    <p className="text-sm text-center">{errorMessage}</p>

This keeps the error handling consistent and slightly simplifies the JSX.

Also applies to: 184-189


105-114: Consider gating console.debug timing logs to non‑production environments

The console.debug calls for timeToFirstSearchResultMs and timeToSearchCompletionMs are useful during development, but may add noise in production consoles.

If you don’t rely on them in prod, consider wrapping them in an environment check or removing them once metrics are fully wired into analytics:

-        console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
-        console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        if (process.env.NODE_ENV !== 'production') {
+            console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
+            console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09507d3 and 2f6fc04.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (107)
  • .env.development (0 hunks)
  • CHANGELOG.md (1 hunks)
  • Dockerfile (5 hunks)
  • package.json (2 hunks)
  • packages/backend/src/index.ts (0 hunks)
  • packages/db/src/index.ts (1 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/src/index.ts (1 hunks)
  • packages/mcp/src/schemas.ts (2 hunks)
  • packages/queryLanguage/.gitignore (1 hunks)
  • packages/queryLanguage/package.json (1 hunks)
  • packages/queryLanguage/src/index.ts (1 hunks)
  • packages/queryLanguage/src/parser.terms.ts (1 hunks)
  • packages/queryLanguage/src/parser.ts (1 hunks)
  • packages/queryLanguage/src/query.grammar (1 hunks)
  • packages/queryLanguage/src/tokens.ts (1 hunks)
  • packages/queryLanguage/test.ts (1 hunks)
  • packages/queryLanguage/test/basic.txt (1 hunks)
  • packages/queryLanguage/test/grammar.test.ts (1 hunks)
  • packages/queryLanguage/test/grouping.txt (1 hunks)
  • packages/queryLanguage/test/negation.txt (1 hunks)
  • packages/queryLanguage/test/operators.txt (1 hunks)
  • packages/queryLanguage/test/precedence.txt (1 hunks)
  • packages/queryLanguage/test/prefixes.txt (1 hunks)
  • packages/queryLanguage/test/quoted.txt (1 hunks)
  • packages/queryLanguage/tsconfig.json (1 hunks)
  • packages/queryLanguage/vitest.config.ts (1 hunks)
  • packages/web/.eslintignore (1 hunks)
  • packages/web/package.json (3 hunks)
  • packages/web/src/actions.ts (1 hunks)
  • packages/web/src/app/[domain]/browse/layout.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/pathHeader.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/constants.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (5 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchSuggestionsBox.tsx (4 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useRefineModeSuggestions.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionModeMappings.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionsData.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/zoektLanguageExtension.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/codePreview.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/filter.tsx (3 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (5 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/useFilterMatches.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (9 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatch.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (7 hunks)
  • packages/web/src/app/[domain]/search/page.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/useStreamedSearch.ts (1 hunks)
  • packages/web/src/app/api/(client)/client.ts (1 hunks)
  • packages/web/src/app/api/(server)/search/route.ts (2 hunks)
  • packages/web/src/app/api/(server)/source/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/stream_search/route.ts (1 hunks)
  • packages/web/src/ee/features/codeNav/components/exploreMenu/referenceList.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/symbolDefinitionPreview.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/useHoveredOverSymbolInfo.ts (1 hunks)
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1 hunks)
  • packages/web/src/features/chat/tools.ts (5 hunks)
  • packages/web/src/features/codeNav/api.ts (4 hunks)
  • packages/web/src/features/codeNav/types.ts (1 hunks)
  • packages/web/src/features/search/README.md (1 hunks)
  • packages/web/src/features/search/fileSourceApi.ts (1 hunks)
  • packages/web/src/features/search/index.ts (1 hunks)
  • packages/web/src/features/search/ir.ts (1 hunks)
  • packages/web/src/features/search/parser.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (0 hunks)
  • packages/web/src/features/search/searchApi.ts (1 hunks)
  • packages/web/src/features/search/types.ts (1 hunks)
  • packages/web/src/features/search/zoektClient.ts (0 hunks)
  • packages/web/src/features/search/zoektSchema.ts (0 hunks)
  • packages/web/src/features/search/zoektSearcher.ts (1 hunks)
  • packages/web/src/lib/errorCodes.ts (1 hunks)
  • packages/web/src/lib/extensions/searchResultHighlightExtension.ts (1 hunks)
  • packages/web/src/lib/posthogEvents.ts (2 hunks)
  • packages/web/src/lib/types.ts (1 hunks)
  • packages/web/src/lib/utils.ts (1 hunks)
  • packages/web/src/prisma.ts (4 hunks)
  • packages/web/src/proto/google/protobuf/Duration.ts (1 hunks)
  • packages/web/src/proto/google/protobuf/Timestamp.ts (1 hunks)
  • packages/web/src/proto/query.ts (1 hunks)
  • packages/web/src/proto/webserver.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/And.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Boost.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Branch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchesRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ChunkMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileNameSet.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FlushReason.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/IndexMetadata.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Language.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineFragmentMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListOptions.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListRequest.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListResponse.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Location.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/MinimalRepoListEntry.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Not.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Or.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Progress.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Q.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Range.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/RawConfig.ts (1 hunks)
⛔ Files not processed due to max files limit (21)
  • packages/web/src/proto/zoekt/webserver/v1/Regexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repo.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoIds.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoListEntry.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoRegexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoSet.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoStats.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repository.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepositoryBranch.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchOptions.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Stats.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Substring.ts
  • packages/web/src/proto/zoekt/webserver/v1/Symbol.ts
  • packages/web/src/proto/zoekt/webserver/v1/SymbolInfo.ts
  • packages/web/src/proto/zoekt/webserver/v1/Type.ts
  • packages/web/src/proto/zoekt/webserver/v1/WebserverService.ts
  • packages/web/src/withAuthV2.ts
💤 Files with no reviewable changes (5)
  • packages/backend/src/index.ts
  • .env.development
  • packages/web/src/features/search/zoektClient.ts
  • packages/web/src/features/search/schemas.ts
  • packages/web/src/features/search/zoektSchema.ts
🧰 Additional context used
🪛 Biome (2.1.2)
packages/web/src/features/search/ir.ts

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🪛 LanguageTool
packages/queryLanguage/test/negation.txt

[grammar] ~193-~193: Use a hyphen to join words.
Context: ...PrefixExpr(FileExpr))))) # Negate short form prefix -f:test.js ==> Program(Ne...

(QB_NEW_EN_HYPHEN)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...r(PrefixExpr(RepoExpr))) # Negate short form content -c:console ==> Program(N...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/basic.txt

[typographical] ~10-~10: This greeting should probably end with a comma.
Context: ... ==> Program(Term) # Multiple terms hello world ==> Program(AndExpr(Term,Term)) # Mu...

(EN_GREETING_WITHOUT_COMMA)

packages/queryLanguage/test/quoted.txt

[grammar] ~465-~465: Use a hyphen to join words.
Context: ...m,PrefixExpr(FileExpr))) # Quoted short form prefix "f:test" ==> Program(Term...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/operators.txt

[grammar] ~97-~97: Use a hyphen to join words.
Context: ...ixExpr(FileExpr),Term)) # OR with short form prefixes f:test.js or r:myrepo ==...

(QB_NEW_EN_HYPHEN)

Comment on lines +24 to +30
export const searchOptionsSchema = z.object({
matches: z.number(), // The number of matches to return.
contextLines: z.number().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});

🛠️ Refactor suggestion | 🟠 Major

Add validation constraints to numeric fields.

The matches field accepts any number, including negative, zero, or unreasonably large values, which could lead to unexpected behavior or resource issues downstream. Similarly, contextLines lacks constraints when provided.

Apply this diff to add sensible constraints:

 export const searchOptionsSchema = z.object({
-    matches: z.number(),                              // The number of matches to return.
-    contextLines: z.number().optional(),              // The number of context lines to return.
+    matches: z.number().int().positive(),             // The number of matches to return.
+    contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
     whole: z.boolean().optional(),                    // Whether to return the whole file as part of the response.
     isRegexEnabled: z.boolean().optional(),           // Whether to enable regular expression search.
     isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const searchOptionsSchema = z.object({
matches: z.number(), // The number of matches to return.
contextLines: z.number().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
export const searchOptionsSchema = z.object({
matches: z.number().int().positive(), // The number of matches to return.
contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
🤖 Prompt for AI Agents
In packages/mcp/src/schemas.ts around lines 24 to 30, the numeric fields lack
validation constraints: `matches` currently allows negative, zero, or huge
numbers and `contextLines` has no bounds when present. Change `matches` to an
integer with a sensible lower and upper bound (e.g., .int().min(1).max(100)) and
change `contextLines` to an optional integer with a non-negative upper bound
(e.g., .optional().int().min(0).max(50)); keep other fields unchanged and ensure
the schema rejects non-integers and out-of-range values.

"start": "next start",
"lint": "cross-env SKIP_ENV_VALIDATION=1 eslint .",
"test": "cross-env SKIP_ENV_VALIDATION=1 vitest",
"generate:protos": "proto-loader-gen-types --includeComments --longs=Number --enums=String --defaults --oneofs --grpcLib=@grpc/grpc-js --keepCase --includeDirs=../../vendor/zoekt/grpc/protos --outDir=src/proto zoekt/webserver/v1/webserver.proto zoekt/webserver/v1/query.proto",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify proto file paths and script correctness.

The generate:protos script references proto files at ../../vendor/zoekt/grpc/protos. Ensure these files exist at:

  • vendor/zoekt/grpc/protos/zoekt/webserver/v1/webserver.proto
  • vendor/zoekt/grpc/protos/zoekt/webserver/v1/query.proto

The script itself appears well-configured with appropriate flags for TypeScript generation.


🏁 Script executed:

#!/bin/bash
# Verify proto files exist at the expected paths
echo "Checking for proto files..."
fd -t f "webserver.proto|query.proto" vendor/zoekt/grpc/protos/

Length of output: 140



🏁 Script executed:

#!/bin/bash
# Search for proto files anywhere in the repository
echo "=== Searching for proto files in entire repo ==="
find . -name "webserver.proto" -o -name "query.proto" 2>/dev/null

echo ""
echo "=== Checking if vendor directory exists ==="
ls -la vendor/ 2>&1 | head -20

echo ""
echo "=== Checking if zoekt directory exists anywhere ==="
find . -type d -name "zoekt" 2>/dev/null | head -20

Length of output: 358



🏁 Script executed:

#!/bin/bash
# Check vendor/zoekt structure
echo "=== Contents of vendor/zoekt ==="
find vendor/zoekt -type f 2>/dev/null | head -30

echo ""
echo "=== Contents of vendor/zoekt (tree view) ==="
ls -laR vendor/zoekt/ 2>&1 | head -50

echo ""
echo "=== Check .gitignore for proto file patterns ==="
rg -i "proto|vendor" .gitignore 2>/dev/null

Length of output: 250


Proto files referenced in the script do not exist in the repository.

The generate:protos script references source proto files that are missing:

  • Expected path: vendor/zoekt/grpc/protos/zoekt/webserver/v1/
  • Expected files: webserver.proto, query.proto
  • Actual state: The vendor/zoekt/ directory exists but is empty; proto files are not present anywhere in the repository

This will cause the script to fail when executed. Either:

  1. Add the missing proto files to vendor/zoekt/grpc/protos/zoekt/webserver/v1/
  2. Update the script to reference the correct proto file location
  3. Remove the script if proto generation is not needed for this PR
🤖 Prompt for AI Agents
In packages/web/package.json around line 11, the "generate:protos" npm script
points at proto files that are not present in the repo
(vendor/zoekt/grpc/protos/zoekt/webserver/v1/{webserver.proto,query.proto}); fix
by either (A) adding the missing proto files to
vendor/zoekt/grpc/protos/zoekt/webserver/v1/ in this PR, (B) updating the script
to point to the actual proto location in the repo (or an external fetch step) so
the referenced files exist at runtime, or (C) remove the generate:protos script
entirely if proto generation is not required for this change — ensure whichever
option you pick is reflected in package.json and any CI scripts that call it.

Comment on lines 55 to 62
const {
scrollOffset: restoreOffset,
measurementsCache: restoreMeasurementsCache,
showAllMatchesStates: restoreShowAllMatchesStates,
showAllMatchesMap: restoreShowAllMatchesStates,
} = history.state as ScrollHistoryState;

const [showAllMatchesStates, setShowAllMatchesStates] = useState(restoreShowAllMatchesStates || Array(fileMatches.length).fill(false));
const [showAllMatchesMap, showAllMatchesActions] = useMap<string, boolean>(restoreShowAllMatchesStates || []);


⚠️ Potential issue | 🔴 Critical

Guard against history.state being null on initial load

const { scrollOffset, measurementsCache, showAllMatchesMap } = history.state as ScrollHistoryState; will throw at runtime when history.state is null/undefined (which is the default before any replaceState), since the destructuring happens unguarded.

Recommend defaulting to an empty object before casting so the initial render is safe:

-    const {
-        scrollOffset: restoreOffset,
-        measurementsCache: restoreMeasurementsCache,
-        showAllMatchesMap: restoreShowAllMatchesStates,
-    } = history.state as ScrollHistoryState;
+    const {
+        scrollOffset: restoreOffset,
+        measurementsCache: restoreMeasurementsCache,
+        showAllMatchesMap: restoreShowAllMatchesStates,
+    } = (history.state ?? {}) as ScrollHistoryState;
🤖 Prompt for AI Agents
In packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx
around lines 55 to 62, the code destructures history.state directly which will
throw when history.state is null/undefined on initial load; change the
destructuring to use a safe default object before casting (e.g., use
(history.state ?? {}) as ScrollHistoryState) so
scrollOffset/measurementsCache/showAllMatchesMap default to undefined rather
than causing a runtime error, and keep passing restoreShowAllMatchesStates || []
into useMap to preserve the existing fallback.

Comment on lines +1 to +8
import { Q as QueryIR } from '@/proto/zoekt/webserver/v1/Q';
import { RawConfig } from '@/proto/zoekt/webserver/v1/RawConfig';
import { Regexp } from '@/proto/zoekt/webserver/v1/Regexp';
import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
import { Language } from '@/proto/zoekt/webserver/v1/Language';
import { Repo } from '@/proto/zoekt/webserver/v1/Repo';
import { RepoRegexp } from '@/proto/zoekt/webserver/v1/RepoRegexp';
import { BranchesRepos } from '@/proto/zoekt/webserver/v1/BranchesRepos';

⚠️ Potential issue | 🔴 Critical

Rename imported Symbol to avoid shadowing the global Symbol (lint error)

The import import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol'; shadows the built‑in global Symbol, which static analysis correctly flags. While it works at runtime, it’s confusing and will keep lint failing.

You can fix this by renaming the import and updating the isSymbolQuery type guard:

-import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
+import { Symbol as ZoektSymbol } from '@/proto/zoekt/webserver/v1/Symbol';
@@
-export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: Symbol } => query.symbol != null;
+export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: ZoektSymbol } => query.symbol != null;

Also applies to: 27-28

🧰 Tools
🪛 Biome (2.1.2)

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🤖 Prompt for AI Agents
In packages/web/src/features/search/ir.ts around lines 1 to 8 (and also apply
changes at lines 27-28), the imported name "Symbol" shadows the global Symbol
causing lint errors; rename the import (e.g., import { Symbol as ProtoSymbol }
from '@/proto/zoekt/webserver/v1/Symbol') and update all uses and the
isSymbolQuery type guard to refer to the renamed identifier (ProtoSymbol) so the
global Symbol is not shadowed and the type guard compiles/lints cleanly.

Comment on lines +55 to +107
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {

try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);

// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});

if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}

return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof SyntaxError) {
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${error.message}`,
});
}
throw error;
}

⚠️ Potential issue | 🟠 Major

Normalize all parse/validation failures to a consistent ServiceErrorException

parseQuerySyntaxIntoIR only maps SyntaxError to ServiceErrorException, but transformTreeToIR and transformPrefixExpr throw plain Error for many user-facing validation issues (e.g., invalid visibility:, archived:, fork: values, missing colons, unknown prefixes, missing search contexts). Those will currently bubble out as generic 500s instead of a structured BAD_REQUEST with FAILED_TO_PARSE_QUERY.

Consider centralizing the mapping here so any non-ServiceErrorException from parsing/transformation yields a consistent 400:

-    } catch (error) {
-        if (error instanceof SyntaxError) {
-            throw new ServiceErrorException({
-                statusCode: StatusCodes.BAD_REQUEST,
-                errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
-                message: `Failed to parse query "${query}" with message: ${error.message}`,
-            });
-        }
-        throw error;
-    }
+    } catch (error) {
+        if (error instanceof ServiceErrorException) {
+            // Already normalized elsewhere; just propagate.
+            throw error;
+        }
+
+        const message = error instanceof Error ? error.message : String(error);
+        throw new ServiceErrorException({
+            statusCode: StatusCodes.BAD_REQUEST,
+            errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
+            message: `Failed to parse query "${query}" with message: ${message}`,
+        });
+    }

Optionally, if you still want to distinguish “internal” errors, you could gate this on a dedicated QueryParseError subclass and throw that from the various validation branches.

📝 Committable suggestion


Suggested change
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {
try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);
// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});
if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}
return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof SyntaxError) {
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${error.message}`,
});
}
throw error;
}
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {
try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);
// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});
if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}
return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof ServiceErrorException) {
// Already normalized elsewhere; just propagate.
throw error;
}
const message = error instanceof Error ? error.message : String(error);
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${message}`,
});
}
}
🤖 Prompt for AI Agents
packages/web/src/features/search/parser.ts around lines 55 to 107:
parsing/validation errors other than SyntaxError (e.g., plain Error from
transformTreeToIR/transformPrefixExpr) are allowed to bubble out as generic
500s; wrap any non-ServiceErrorException thrown during parsing/transform into a
ServiceErrorException with status BAD_REQUEST and
ErrorCode.FAILED_TO_PARSE_QUERY, preserving the original error message (and
optionally stack) so user-facing validation errors return a consistent 400, and
rethrow ServiceErrorException unchanged.

Comment on lines +45 to +47
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);

if (!searchResponse.ok) {
return invalidZoektResponse(searchResponse);
}

const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
// @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
// which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
// metadata when transforming (potentially thousands) of file matches, we aggregate a unique
// set of repository ids* and map them to their corresponding Repo record.
//
// *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
// A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
// always undefined. To make this a non-breaking change, we fallback to using the repository's name
// (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
// practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
// 'gitea.com/org/repo', etc.).
//
// Note: When a repository is re-indexed (every hour) this ID will be populated.
// @see: https://github.com/sourcebot-dev/zoekt/pull/6
const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
const repos = new Map<string | number, Repo>();

(await prisma.repo.findMany({
where: {
id: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.id, repo));

(await prisma.repo.findMany({
where: {
name: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.name, repo));

const files = Result.Files?.map((file) => {
const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);

const webUrl = (() => {
const template: string | undefined = Result.RepoURLs[file.Repository];
if (!template) {
return undefined;
}

// If there are multiple branches pointing to the same revision of this file, it doesn't
// matter which branch we use here, so use the first one.
const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
return getFileWebUrl(template, branch, file.FileName);
})();

const identifier = file.RepositoryID ?? file.Repository;
const repo = repos.get(identifier);
return zoektSearch(zoektSearchRequest, prisma);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Avoid logging full Zoekt requests in production for privacy and performance

logger.debug(JSON.stringify(zoektSearchRequest, null, 2)) in both search and streamSearch will emit the full Zoekt request, including user search text and repo names. This can:

  • Leak potentially sensitive search terms and repo metadata into logs.
  • Produce very large log entries for complex queries.

Consider either:

  • Redacting/summarizing the request (e.g., log only top-level flags and counts), or
  • Gating full-request logging behind a dedicated debug flag or non-production check.

```diff
-        logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        if (env.NODE_ENV !== 'production') {
+            logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        }
```

(and similarly for zoektStreamSearchRequest), or introduce a more targeted debug switch if available.

Also applies to: 67-69



---

🏁 Script executed:

```shell
#!/bin/bash
# Inspect the searchApi.ts file to verify the logger.debug calls and surrounding context
cat packages/web/src/features/search/searchApi.ts | head -100
```

Length of output: 3235


🏁 Script executed:

```shell
#!/bin/bash
# Search for other debug logging patterns in the codebase to understand conventions
rg -nP 'logger\.(debug|log)\(' packages/web/src --type=ts -C2 -m 20
```

Length of output: 9804


🏁 Script executed:

#!/bin/bash
# Check for environment variable checks (NODE_ENV) in the codebase
rg -nP 'NODE_ENV|env\.NODE_ENV' packages/web/src --type=ts -C2

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Check for env imports/exports in the file or nearby files
rg -nP 'import.*env|export.*env' packages/web/src/features/search --type=ts

Length of output: 226


Wrap debug logging to prevent leaking sensitive search data and emitting large log entries in production

The logger.debug() calls at lines 45 and 67 serialize the full zoektSearchRequest objects, which contain user search terms and repository metadata. In production, this:

  • Leaks potentially sensitive search queries and repo names into logs
  • Creates unnecessarily large log entries for complex queries

Since env is already imported in the file, wrap both debug statements with an environment check:

```diff
-        logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        if (env.NODE_ENV !== 'production') {
+            logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        }
```

Apply the same fix to line 67 (zoektStreamSearchRequest).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
if (!searchResponse.ok) {
return invalidZoektResponse(searchResponse);
}
const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
// @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
// which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
// metadata when transforming (potentially thousands) of file matches, we aggregate a unique
// set of repository ids* and map them to their corresponding Repo record.
//
// *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
// A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
// always undefined. To make this a non-breaking change, we fallback to using the repository's name
// (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
// practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
// 'gitea.com/org/repo', etc.).
//
// Note: When a repository is re-indexed (every hour) this ID will be populated.
// @see: https://github.com/sourcebot-dev/zoekt/pull/6
const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
const repos = new Map<string | number, Repo>();
(await prisma.repo.findMany({
where: {
id: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.id, repo));
(await prisma.repo.findMany({
where: {
name: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.name, repo));
const files = Result.Files?.map((file) => {
const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);
const webUrl = (() => {
const template: string | undefined = Result.RepoURLs[file.Repository];
if (!template) {
return undefined;
}
// If there are multiple branches pointing to the same revision of this file, it doesn't
// matter which branch we use here, so use the first one.
const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
return getFileWebUrl(template, branch, file.FileName);
})();
const identifier = file.RepositoryID ?? file.Repository;
const repo = repos.get(identifier);
return zoektSearch(zoektSearchRequest, prisma);
if (env.NODE_ENV !== 'production') {
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
}
return zoektSearch(zoektSearchRequest, prisma);
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.ts around lines 45-47 (and
additionally apply to line 67), the code currently does
logger.debug(JSON.stringify(zoektSearchRequest,...)) which can leak sensitive
search terms and create huge logs; guard these debug calls with the existing env
check (only log when not in production, e.g. if (!env.isProduction /
env.NODE_ENV !== 'production')), and apply the same guard to the logger.debug
for zoektStreamSearchRequest at line 67; if desired, replace full JSON.stringify
with a redacted or minimal representation (search term and repo count) inside
the guarded block.
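As a sketch of the gated-logging pattern suggested above — the function name, the injected `nodeEnv` parameter, and the summary shape are all illustrative, not Sourcebot's actual logger API:

```typescript
// Gate full-payload debug logging behind the environment, emitting only a
// compact, redacted summary in production. `nodeEnv` is injected for testability.
function debugLogSearchRequest(
    log: (msg: string) => void,
    request: { queryText: string; repoCount: number },
    nodeEnv: string | undefined,
): void {
    if (nodeEnv !== "production") {
        // Full request only outside production.
        log(`zoektSearchRequest:\n${JSON.stringify(request, null, 2)}`);
    } else {
        // Redacted summary: no search terms or repo names reach the logs.
        log(`zoektSearchRequest: queryLength=${request.queryText.length} repos=${request.repoCount}`);
    }
}
```

The same helper could guard both the `zoektSearchRequest` and `zoektStreamSearchRequest` call sites.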

Comment on lines +66 to +106
opts: {
chunk_matches: true,
// @note: Zoekt has several different ways to limit a given search. The two that
// we care about are `MaxMatchDisplayCount` and `TotalMaxMatchCount`:
// - `MaxMatchDisplayCount` truncates the number of matches AFTER performing
// a search (specifically, after collating and sorting the results). The number of
// results returned by the API will be less than or equal to this value.
//
// - `TotalMaxMatchCount` truncates the number of matches DURING a search. The results
// returned by the API can be less than, equal to, or greater than this value.
// Why greater? Because this value is compared _after_ a given shard has finished
// being processed, the number of matches returned by the last shard may have exceeded
// this value.
//
// Let's define two variables:
// - `actualMatchCount` : The number of matches that are returned by the API. This is
// always less than or equal to `MaxMatchDisplayCount`.
// - `totalMatchCount` : The number of matches that zoekt found before it either
// 1) found all matches or 2) hit the `TotalMaxMatchCount` limit. This number is
// not bounded and can be less than, equal to, or greater than both `TotalMaxMatchCount`
// and `MaxMatchDisplayCount`.
//
//
// Our challenge is to determine whether or not the search returned all possible matches
// (it was exhaustive) or if it was truncated. By setting the `TotalMaxMatchCount` to
// `MaxMatchDisplayCount + 1`, we can determine which of these occurred by comparing
// `totalMatchCount` to `MaxMatchDisplayCount`.
//
// if (totalMatchCount ≤ actualMatchCount):
// Search is EXHAUSTIVE (found all possible matches)
// Proof: totalMatchCount ≤ MaxMatchDisplayCount < TotalMaxMatchCount
// Therefore Zoekt stopped naturally, not due to limit
//
// if (totalMatchCount > actualMatchCount):
// Search is TRUNCATED (more matches exist)
// Proof: totalMatchCount > MaxMatchDisplayCount + 1 = TotalMaxMatchCount
// Therefore Zoekt hit the limit and stopped searching
//
max_match_display_count: options.matches,
total_max_match_count: options.matches + 1,
num_context_lines: options.contextLines ?? 0,

⚠️ Potential issue | 🟠 Major

isSearchExhaustive condition is inverted in non-streaming search

The explanatory comment says exhaustiveness should be determined by comparing totalMatchCount to actualMatchCount:

  • Exhaustive when totalMatchCount <= actualMatchCount
  • Truncated when totalMatchCount > actualMatchCount

In zoektSearch, the check is currently:

isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,

This inequality will almost always be true (returned matches can’t exceed total matches), so truncated searches will be incorrectly reported as exhaustive. The streaming final response uses the correct direction:

isSearchExhaustive: accumulatedStats.totalMatchCount <= accumulatedStats.actualMatchCount,

I recommend flipping the condition in zoektSearch to align with the documented logic and the streaming path:

```diff
-            resolve({
-                stats,
-                files,
-                repositoryInfo,
-                isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
-            } satisfies SearchResponse);
+            resolve({
+                stats,
+                files,
+                repositoryInfo,
+                isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+            } satisfies SearchResponse);
```

Also applies to: 136-137

🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 66-106 (and
also fix lines 136-137), the isSearchExhaustive comparison is inverted: change
the boolean to reflect "exhaustive when totalMatchCount <= actualMatchCount"
(i.e. return totalMatchCount <= actualMatchCount instead of actualMatchCount <=
totalMatchCount) so the non-streaming path matches the documented logic and the
streaming path; update both occurrences accordingly and run tests.
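The documented rule reduces to a one-line predicate. A minimal sketch (with illustrative names) of both the limit setup and the exhaustiveness check:

```typescript
// Given a display limit, set the internal search limit one higher so truncation
// is detectable: zoekt only stops early if it finds MORE than the display limit.
function zoektLimits(maxMatchDisplayCount: number) {
    return {
        max_match_display_count: maxMatchDisplayCount,
        total_max_match_count: maxMatchDisplayCount + 1,
    };
}

// Exhaustive iff zoekt's total discovered matches fit within what was returned.
function isSearchExhaustive(totalMatchCount: number, actualMatchCount: number): boolean {
    return totalMatchCount <= actualMatchCount;
}
```

Note the argument order matters: flipping it (as in the non-streaming path above) makes the predicate trivially true whenever the returned count is bounded by the total.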

Comment on lines +118 to +139
export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
const client = createGrpcClient();
const metadata = new grpc.Metadata();

return new Promise((resolve, reject) => {
client.Search(searchRequest, metadata, async (error, response) => {
if (error || !response) {
reject(error || new Error('No response received'));
return;
}

const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);

resolve({
stats,
files,
repositoryInfo,
isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
} satisfies SearchResponse);
});
});

⚠️ Potential issue | 🟠 Major

Errors in async callback can leave zoektSearch’s Promise hanging

The client.Search callback is marked async and uses await, but thrown/rejected errors inside that callback (e.g. from Prisma in createReposMapForChunk or from transformZoektSearchResponse) won’t be wired to the outer Promise unless you manually reject them. In the current form, such failures can result in a Promise that never resolves or rejects.

Refactor the callback to keep it synchronous and move the async work into an IIFE with explicit try/catch:

 export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
   const client = createGrpcClient();
   const metadata = new grpc.Metadata();

   return new Promise((resolve, reject) => {
-    client.Search(searchRequest, metadata, async (error, response) => {
-      if (error || !response) {
-        reject(error || new Error('No response received'));
-        return;
-      }
-
-      const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
-      const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);
-
-      resolve({
-        stats,
-        files,
-        repositoryInfo,
-        isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
-      } satisfies SearchResponse);
-    });
+    client.Search(searchRequest, metadata, (error, response) => {
+      if (error || !response) {
+        reject(error || new Error('No response received'));
+        return;
+      }
+
+      (async () => {
+        try {
+          const reposMapCache = await createReposMapForChunk(
+            response,
+            new Map<string | number, Repo>(),
+            prisma,
+          );
+          const { stats, files, repositoryInfo } = await transformZoektSearchResponse(
+            response,
+            reposMapCache,
+          );
+
+          resolve({
+            stats,
+            files,
+            repositoryInfo,
+            isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+          } satisfies SearchResponse);
+        } catch (err) {
+          reject(err);
+        }
+      })();
+    });
   });
 }

This ensures all downstream failures either resolve or reject the Promise instead of silently leaking.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 118 to 139,
the grpc callback is declared async so any thrown/rejected errors from awaited
calls (createReposMapForChunk, transformZoektSearchResponse) are not propagated
to the outer Promise and can leave it hanging; change the client.Search callback
to a synchronous function and immediately invoke an async IIFE inside it that
wraps the awaited work in try/catch, calling resolve(...) on success and
reject(err) on any thrown/rejected error, and ensure the original
error/no-response branch still rejects and returns so the Promise is always
settled.

Comment on lines +379 to +413
export const getCodeHostBrowseFileAtBranchUrl = ({
webUrl,
codeHostType,
branchName,
filePath,
}: {
webUrl?: string | null,
codeHostType: CodeHostType,
branchName: string,
filePath: string,
}) => {
if (!webUrl) {
return undefined;
}

switch (codeHostType) {
case 'github':
return `${webUrl}/blob/${branchName}/${filePath}`;
case 'gitlab':
return `${webUrl}/-/blob/${branchName}/${filePath}`;
case 'gitea':
return `${webUrl}/src/branch/${branchName}/${filePath}`;
case 'azuredevops':
return `${webUrl}?path=${filePath}&version=${branchName}`;
case 'bitbucketCloud':
return `${webUrl}/src/${branchName}/${filePath}`;
case 'bitbucketServer':
return `${webUrl}/browse/${filePath}?at=${branchName}`;
case 'gerrit':
return `${webUrl}/+/${branchName}/${filePath}`;
case 'genericGitHost':
return undefined;
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify URL patterns for code host file browsing.

The new function follows a consistent pattern with getCodeHostBrowseAtBranchUrl. However, please verify:

  1. Azure DevOps URL format: The pattern ?path={file}&version={branch} differs from the branch-only pattern. Confirm this matches Azure DevOps's file browsing URL structure.
  2. File path encoding: Consider whether filePath should be URL-encoded, especially for paths containing spaces or special characters (e.g., encodeURIComponent(filePath) or proper path segment encoding).

Test with a few sample file paths containing spaces or special characters to ensure the URLs work correctly across different code hosts.


🏁 Script executed:

```shell
#!/bin/bash
# Check if Azure DevOps and other code hosts handle file paths with special characters correctly
# Note: This is a manual verification prompt - actual testing should be done in a browser

echo "Sample file paths to test:"
echo "1. src/components/Button.tsx"
echo "2. src/utils/file name with spaces.ts"
echo "3. src/test/special-chars-#@!.js"
echo ""
echo "Verify these URL patterns work:"
echo "- Azure DevOps: {webUrl}?path={file}&version={branch}"
echo "- GitHub: {webUrl}/blob/{branch}/{file}"
echo "- GitLab: {webUrl}/-/blob/{branch}/{file}"
```

Length of output: 295


I need to verify the concerns raised in the review comment. Let me search for information about the correct URL formats for these code hosting platforms and best practices for URL encoding file paths.

Now let me search for more specific information about how different code hosts handle file paths in URLs, and check if there are URL encoding utilities in the codebase.

Now let me examine the codebase to see if URL encoding is already being handled or if this is indeed a missing concern.

I'll also search for any URL encoding utilities in the codebase:


🏁 Script executed:

```shell
#!/bin/bash
# Search for the function and its usage context
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/web/src --context=5
```

Length of output: 2471


🏁 Script executed:

```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI|URLEncode" packages/web/src --type=ts --type=tsx | head -20
```

Length of output: 27


Let me retry the URL encoding search with corrected syntax:

Let me also check where `fileName` comes from in the zoektSearcher:


🏁 Script executed:

```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI" packages/web/src
```

Length of output: 1341


🏁 Script executed:

```shell
#!/bin/bash
# Check context around fileName variable in zoektSearcher.ts
rg -n "fileName" packages/web/src/features/search/zoektSearcher.ts -B5 -A2 | head -40
```

Length of output: 1618


Apply URL encoding to file paths across all code host URL patterns.

The review concerns are valid. File paths from Zoekt search results are raw and unencoded, but are directly interpolated into URLs without encoding. This will break URLs containing spaces or special characters (e.g., `file name with spaces.ts`, `special-#@!.js`).

The codebase already uses encodeURIComponent() elsewhere for similar URL component encoding (e.g., packages/web/src/app/[domain]/browse/hooks/utils.ts:85). Apply this pattern consistently:

  • Wrap all filePath and branchName parameters with encodeURIComponent() in each case statement
  • Example: ${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)} for GitHub
  • Apply to all six code host patterns: GitHub, GitLab, Gitea, Azure DevOps, Bitbucket Cloud/Server, and Gerrit
🤖 Prompt for AI Agents
In packages/web/src/lib/utils.ts around lines 379 to 413, filePath (and
branchName where used in path components) is inserted into host URLs unencoded
which breaks URLs containing spaces or special characters; update each case to
wrap branchName and filePath (where they appear in the path or query values)
with encodeURIComponent() — e.g. for GitHub use
`${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)}`,
apply the same encoding pattern for GitLab, Gitea, Azure DevOps (encode query
values), Bitbucket Cloud/Server, and Gerrit, leaving genericGitHost behavior
unchanged.
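One way to apply the encoding without escaping the `/` separators (a sketch — note that `encodeURIComponent` over the whole path would also escape the slashes, which the review's example glosses over) is to encode per segment:

```typescript
// Encode each path segment so spaces, "#", "?", etc. are escaped,
// but the "/" separators between segments survive.
function encodePathSegments(filePath: string): string {
    return filePath.split("/").map(encodeURIComponent).join("/");
}

// Example applied to the GitHub pattern from the function above.
function gitHubBlobUrl(webUrl: string, branchName: string, filePath: string): string {
    return `${webUrl}/blob/${encodeURIComponent(branchName)}/${encodePathSegments(filePath)}`;
}
```

Branch names containing `/` (e.g. `feature/foo`) would need the same per-segment treatment on hosts that accept them; encoding the whole branch name, as sketched here, escapes that separator too.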

Comment on lines +24 to +26
type SubtypeConstructor<Constructor extends new (...args: any) => any, Subtype> = {
new(...args: ConstructorParameters<Constructor>): Subtype;
};

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify usage of SubtypeConstructor helper type.

The SubtypeConstructor type is defined but not used within this file. Please verify that it's actually used elsewhere in the codebase.


🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase

# Search for imports and usage of SubtypeConstructor
rg -n --type=ts --type=tsx 'SubtypeConstructor' -g '!packages/web/src/proto/query.ts'
```

Length of output: 27


🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase
# Using glob patterns to match both .ts and .tsx files

# Search for SubtypeConstructor in all TypeScript files (excluding query.ts)
rg -n 'SubtypeConstructor' --glob '!packages/web/src/proto/query.ts' -g '*.ts' -g '*.tsx'
```

Length of output: 452


🏁 Script executed:

```shell
#!/bin/bash
# Description: Check imports in webserver.ts to understand SubtypeConstructor relationship

# Show imports/top of webserver.ts
head -60 packages/web/src/proto/webserver.ts | grep -n -A5 -B5 'SubtypeConstructor\|import'

# Also check if query.ts exports SubtypeConstructor explicitly
echo "=== Checking query.ts exports ==="
head -30 packages/web/src/proto/query.ts | grep -n -A2 -B2 'export\|SubtypeConstructor'
```

Length of output: 8133


Remove unused SubtypeConstructor type definition.

Verification confirms that SubtypeConstructor at lines 24-26 in packages/web/src/proto/query.ts is unused. While webserver.ts defines and uses an identical helper type, it maintains its own independent definition and does not import from query.ts. Delete this unused type from query.ts.

🤖 Prompt for AI Agents
In packages/web/src/proto/query.ts around lines 24 to 26, remove the unused
TypeScript type alias SubtypeConstructor (the constructor-to-subtype helper)
because it's not referenced anywhere in this file; delete the three-line type
declaration and ensure no imports/exports depend on it so the file compiles
cleanly after removal.
