
Conversation

@brendan-kellam (Contributor) commented Nov 20, 2025

Summary by CodeRabbit

  • New Features

    • Added regex pattern and case sensitivity toggle controls to search interface
    • Implemented streaming search results with real-time updates
    • Added comprehensive query language parser with advanced search operators
    • Enhanced search with visibility and fork filtering options
  • Bug Fixes

    • Fixed branch name display by removing ref path prefixes
  • Chores

    • Updated MCP package with API changes
    • Added gRPC integration for improved search backend communication


coderabbitai bot commented Nov 20, 2025

Walkthrough

This PR introduces a Lezer-based query language parser, converts the search API from string-based queries to intermediate representation (IR), implements streaming search via SSE, adds Zoekt gRPC proto definitions, updates search UI for regex and case-sensitivity flags, and refactors Prisma permission filtering to account-based scoping.

Changes

Cohort / File(s) | Summary
Query Language Package
packages/queryLanguage/*
New @sourcebot/query-language workspace package with Lezer grammar (query.grammar), tokenizer (tokens.ts), generated parser (parser.ts, parser.terms.ts), index exports, and comprehensive test suites covering basic, grouping, negation, operators, precedence, prefixes, and quoted expressions.
Search API & IR Foundation
packages/web/src/features/search/index.ts, packages/web/src/features/search/types.ts, packages/web/src/features/search/ir.ts, packages/web/src/features/search/parser.ts, packages/web/src/features/search/searchApi.ts
New/refactored search feature exports, unified type definitions (SearchRequest, SearchResponse, SearchStats, etc.), QueryIR visitor/traversal utilities, parseQuerySyntaxIntoIR for converting Lezer trees to IR, and refactored search/streamSearch functions using IR and permission filters.
Zoekt gRPC Integration
packages/web/src/features/search/zoektSearcher.ts, packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts, packages/web/src/proto/google/protobuf/*
New zoektSearcher with createZoektSearchRequest, zoektSearch, zoektStreamSearch functions, and comprehensive proto type definitions for Zoekt gRPC (And, Boost, Branch, FileMatch, Q, SearchResponse, WebserverService, etc.) and Google Protobuf types.
Search UI Components
packages/web/src/app/[domain]/components/searchBar/*, packages/web/src/app/[domain]/browse/layout.tsx
Updated SearchBar to accept defaults object with query, isRegexEnabled, isCaseSensitivityEnabled; added case-sensitivity and regex toggles; refactored suggestion mode from "case"/"public" to "visibility"; updated search prefix constants and logic.
Streaming Search & Server Routes
packages/web/src/app/[domain]/search/useStreamedSearch.ts, packages/web/src/app/api/(server)/stream_search/route.ts, packages/web/src/app/[domain]/search/page.tsx, packages/web/src/app/[domain]/search/components/searchResultsPage.tsx
New useStreamedSearch hook for SSE-based search with caching and cancellation; POST /stream_search route handler; updated SearchResultsPage to use streaming data (isStreaming, files, stats, timeToFirstSearchResultMs, timeToSearchCompletionMs); added isRegexEnabled/isCaseSensitivityEnabled to search params.
Search Results UI
packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx, packages/web/src/app/[domain]/search/components/filterPanel/*
Converted SearchResultsPanel to forwardRef with resetScroll handle; refactored "show all matches" state to per-file-match map; added isStreaming and onFilterChange to FilterPanel; updated skeleton rendering during stream.
Code Navigation
packages/web/src/features/codeNav/api.ts, packages/web/src/features/codeNav/types.ts
Refactored findSearchBasedSymbolReferences/findSearchBasedSymbolDefinitions to use IR queries (queryType: 'ir') instead of strings; updated language filter builder to return QueryIR; adjusted parse logic and file structure mapping.
File Source API & Chat Tools
packages/web/src/features/search/fileSourceApi.ts, packages/web/src/features/chat/tools.ts, packages/web/src/app/api/(client)/client.ts
Updated fileSourceApi to construct IR queries; refactored search tool in chat/tools.ts to use new searchOptions structure; adjusted client API imports.
Database & Auth
packages/db/src/index.ts, packages/web/src/prisma.ts
Added UserWithAccounts type alias (User & { accounts: Account[] }); refactored userScopedPrismaClientExtension to accept user parameter and use new getRepoPermissionFilterForUser helper for account-based permission filtering.
Proto & Type Utilities
packages/web/src/lib/utils.ts, packages/web/src/lib/errorCodes.ts, packages/web/src/lib/types.ts, packages/web/src/lib/posthogEvents.ts
Added getCodeHostBrowseFileAtBranchUrl URL builder; added FAILED_TO_PARSE_QUERY error code; added SearchQueryParams enum members (isRegexEnabled, isCaseSensitivityEnabled); updated search_finished PosthogEventMap with new timing fields and flushReason type.
MCP Package
packages/mcp/src/index.ts, packages/mcp/src/schemas.ts, packages/mcp/CHANGELOG.md
Updated search call to use isRegexEnabled and isCaseSensitivityEnabled flags instead of case: query suffix; refactored searchRequestSchema with searchOptionsSchema composition; updated changelog.
Web Package & Build
package.json, packages/web/package.json, Dockerfile
Added @sourcebot/query-language workspace and build dependencies; added gRPC dependencies (@grpc/grpc-js, @grpc/proto-loader); added generate:protos script; updated Docker build stages for queryLanguage; removed SRC_TENANT_ENFORCEMENT_MODE=strict; added @lezer/common resolution pin.
Infrastructure & Config
.env.development, CHANGELOG.md, packages/queryLanguage/tsconfig.json, packages/queryLanguage/vitest.config.ts, packages/queryLanguage/.gitignore, packages/web/.eslintignore, packages/web/src/features/search/README.md
Removed strict tenant enforcement mode; added MCP changelog entry; added TypeScript and Vitest configs for queryLanguage; added ESLint ignores for proto output; added search feature README documentation.
Removed/Deleted Files
packages/web/src/features/search/schemas.ts, packages/web/src/features/search/zoektClient.ts, packages/web/src/features/search/zoektSchema.ts
Removed centralized Zod schemas file, zoektClient fetch wrapper, and Zoekt-specific schema definitions (moved to types.ts and new IR-based approach).
Type/Import Path Updates
packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx, packages/web/src/app/[domain]/search/components/codePreviewPanel/*, packages/web/src/app/[domain]/search/components/searchResultsPanel/*, packages/web/src/features/codeNav/types.ts, packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts, packages/web/src/lib/extensions/searchResultHighlightExtension.ts, packages/web/src/ee/features/codeNav/components/*
Consolidated import paths for search types (SourceRange, SearchResultFile, RepositoryInfo, etc.) from @/features/search/types to @/features/search; updated fileSourceRequestSchema imports to types subpath.
Backend
packages/backend/src/index.ts
Removed debug log line for repeated shutdown signals.

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant SearchPage as SearchPage
    participant useStreamedSearch as useStreamedSearch
    participant SearchAPI as /api/stream_search
    participant zoektSearcher as zoektSearcher
    participant ZoektGRPC as Zoekt gRPC

    Client->>SearchPage: Enter query + flags
    SearchPage->>useStreamedSearch: query, isRegexEnabled, isCaseSensitivityEnabled
    
    alt Cache Hit
        useStreamedSearch-->>SearchPage: Return cached results
    else Cache Miss
        useStreamedSearch->>SearchAPI: POST (SSE)
        SearchAPI->>SearchAPI: Parse query → IR (parseQuerySyntaxIntoIR)
        SearchAPI->>zoektSearcher: createZoektSearchRequest(IR)
        zoektSearcher->>ZoektGRPC: StreamSearch RPC
        
        ZoektGRPC-->>zoektSearcher: Stream chunk
        zoektSearcher->>zoektSearcher: Transform chunk + enrich repos
        zoektSearcher-->>SearchAPI: data: {files, stats, ...}
        SearchAPI-->>useStreamedSearch: SSE chunk
        useStreamedSearch->>useStreamedSearch: Update state
        useStreamedSearch-->>SearchPage: Partial results
        
        ZoektGRPC-->>zoektSearcher: Final/Error
        zoektSearcher-->>SearchAPI: Final/Error response
        SearchAPI-->>useStreamedSearch: SSE final
        useStreamedSearch->>useStreamedSearch: Cache results
        useStreamedSearch-->>SearchPage: Final results
    end
    
    SearchPage->>SearchPage: Render results
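The server side of the streaming flow above can be sketched in a few lines. This is a hypothetical illustration, not the actual route handler: `toSSEResponse` and `streamChunks` are made-up names standing in for the real `/api/stream_search` route and the `zoektStreamSearch` output, and only the `data: <json>` framing plus a `[DONE]` sentinel are assumed.

```typescript
// Hypothetical sketch: encode one search chunk as a single SSE frame.
function encodeSSEFrame(chunk: unknown): string {
    return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Wrap an async iterable of search chunks in an SSE response, emitting one
// frame per chunk and a [DONE] sentinel when the stream finishes.
function toSSEResponse(streamChunks: AsyncIterable<unknown>): Response {
    const encoder = new TextEncoder();
    const body = new ReadableStream<Uint8Array>({
        async start(controller) {
            try {
                for await (const chunk of streamChunks) {
                    controller.enqueue(encoder.encode(encodeSSEFrame(chunk)));
                }
                controller.enqueue(encoder.encode("data: [DONE]\n\n"));
            } finally {
                controller.close();
            }
        },
    });
    return new Response(body, {
        headers: {
            "Content-Type": "text/event-stream",
            "Cache-Control": "no-cache",
        },
    });
}
```

The real implementation additionally has to interleave error chunks and final stats, but the framing contract between server and `useStreamedSearch` is what matters for the diagram.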
sequenceDiagram
    participant Input as Query Input
    participant Lezer as Lezer Parser
    participant transformTreeToIR as transformTreeToIR
    participant expandContext as expandSearchContext
    participant IROutput as QueryIR Output

    Input->>Lezer: "file:test.ts content:bug -archived:yes"
    Lezer-->>transformTreeToIR: Parse tree (PrefixExpr, Term, NegateExpr, etc.)
    
    transformTreeToIR->>transformTreeToIR: Walk tree recursively
    
    alt PrefixExpr
        transformTreeToIR->>transformTreeToIR: Extract prefix type (FileExpr, ContentExpr, etc.)
        transformTreeToIR->>transformTreeToIR: Build IR node (regexp, substring, raw_config, etc.)
    else Term
        transformTreeToIR->>transformTreeToIR: Check isRegexEnabled / isCaseSensitivityEnabled
        transformTreeToIR->>transformTreeToIR: Create regexp or substring query
    else ContextExpr
        transformTreeToIR->>expandContext: Resolve context name
        expandContext-->>transformTreeToIR: Repository IDs
        transformTreeToIR->>transformTreeToIR: Build repo_set query
    end
    
    transformTreeToIR-->>IROutput: QueryIR (And / Or / Not / PrefixExpr nodes)
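The tree-to-IR walk in the second diagram can be illustrated with a toy version. This is a simplified sketch with assumed shapes: the real `parseQuerySyntaxIntoIR` walks Lezer `SyntaxNode`s and emits the richer `QueryIR` from `ir.ts`, whereas the `Ast` and `QueryIR` types below are invented for illustration.

```typescript
// Toy AST, loosely mirroring Term / PrefixExpr / NegateExpr / conjunction.
type Ast =
    | { kind: "term"; text: string }
    | { kind: "prefix"; name: string; value: string }
    | { kind: "negate"; child: Ast }
    | { kind: "and"; children: Ast[] };

// Trimmed-down IR shape (the real QueryIR has many more node types).
type QueryIR =
    | { substring: { pattern: string; caseSensitive: boolean } }
    | { regexp: { regexp: string } }
    | { not: { child: QueryIR } }
    | { and: { children: QueryIR[] } };

function toIR(node: Ast, isRegexEnabled: boolean): QueryIR {
    switch (node.kind) {
        case "term":
            // Bare terms become regexp or substring queries depending on the flag.
            return isRegexEnabled
                ? { regexp: { regexp: node.text } }
                : { substring: { pattern: node.text, caseSensitive: false } };
        case "prefix":
            // e.g. file:test\.ts becomes a regexp over the prefixed field.
            return { regexp: { regexp: node.value } };
        case "negate":
            return { not: { child: toIR(node.child, isRegexEnabled) } };
        case "and":
            return { and: { children: node.children.map((c) => toIR(c, isRegexEnabled)) } };
    }
}
```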

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • Query language grammar and parser (packages/queryLanguage/src/query.grammar, packages/queryLanguage/src/parser.ts, packages/queryLanguage/src/tokens.ts): Complex Lezer-based grammar with tokenization logic; verify precedence, operator handling, and edge cases in negation/prefix parsing.
  • parseQuerySyntaxIntoIR implementation (packages/web/src/features/search/parser.ts): Dense transformation logic with context expansion, case/regex sensitivity handling, and error path complexity; review all prefix transformations and error codes.
  • Zoekt gRPC integration (packages/web/src/features/search/zoektSearcher.ts): Streaming error handling, chunk accumulation, repository enrichment, and gRPC client lifecycle; verify SSE encoding and chunk/final/error response ordering.
  • Streaming search hook (packages/web/src/app/[domain]/search/useStreamedSearch.ts): SSE parsing, cache TTL validation, AbortController cancellation, and error handling paths; review cache key construction and error wrapping.
  • SearchResultsPanel refactoring (packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx): Conversion to forwardRef, per-file-match state map persistence, and debounce logic changes; verify resetScroll behavior and history restoration.
  • Prisma permission filtering (packages/web/src/prisma.ts): Shift from account-ID arrays to account-aware user scoping; verify permission edge cases and fallback to public repos logic.
  • Proto definitions (packages/web/src/proto/zoekt/webserver/v1/*, packages/web/src/proto/webserver.ts): Large number of generated type definitions; spot-check a few for correctness of field types and nested structures.


Suggested labels

sourcebot-team

Suggested reviewers

  • msukkari

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): docstring coverage is 60.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve it.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title 'feat(web): Streamed code search' clearly describes the main feature being introduced: streaming functionality for code search in the web package.

@brendan-kellam force-pushed the bkellam/streamed_search branch from c137be2 to aad3507 on November 20, 2025 at 06:15
@brendan-kellam changed the title from '[wip] Streamed search' to 'feat(web): Streamed code search' on Nov 21, 2025
@brendan-kellam brendan-kellam marked this pull request as ready for review November 21, 2025 02:45
@brendan-kellam (Contributor, Author) commented:

@coderabbitai review

coderabbitai bot commented Nov 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (28)
packages/web/src/app/[domain]/components/pathHeader.tsx (1)

236-236: LGTM! Clean branch name display.

The regex transformation correctly strips Git's symbolic reference prefixes (refs/heads/ or refs/tags/) to display user-friendly branch names. The implementation is safe and follows standard Git UI conventions.

Recommend extracting to a utility function—this pattern is already duplicated.

Verification found the same transformation in packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39). Consider extracting to a shared utility to reduce duplication:

// In lib/utils.ts or similar
export const formatBranchDisplayName = (branchName: string): string => {
  return branchName.replace(/^refs\/(heads|tags)\//, '');
};

Update both locations:

  • packages/web/src/app/[domain]/components/pathHeader.tsx (line 236)
  • packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx (line 39)
packages/mcp/src/schemas.ts (1)

32-35: Consider validating the query field.

The query field accepts any string including empty strings. Depending on how queries are processed, you may want to ensure non-empty queries or add appropriate validation.

If empty queries should be rejected, apply this diff:

 export const searchRequestSchema = z.object({
-    query: z.string(),                                // The zoekt query to execute.
+    query: z.string().min(1),                         // The zoekt query to execute.
     ...searchOptionsSchema.shape,
 });
packages/queryLanguage/tsconfig.json (1)

21-21: Consider broadening the include pattern.

The current configuration only includes src/index.ts, which means TypeScript will only compile files transitively imported from the index. Generated parser files in src/parser/ may not be fully type-checked if they contain exports that aren't used.

Consider changing to:

-    "include": ["src/index.ts"],
+    "include": ["src/**/*.ts"],

This ensures all TypeScript files, including generated parser code, are properly type-checked during compilation.

packages/queryLanguage/src/query.grammar (1)

23-30: Negation currently cannot target bare terms (only prefixes/paren groups)

NegateExpr only accepts negate before PrefixExpr or ParenExpr, so something like -foo won’t ever become a NegateExpr node; it’ll only work if you always express exclusions as -file:..., -repo:..., or -(...). If the intended UX is that -foo excludes results matching a plain term, consider broadening this rule:

-NegateExpr { !negate negate (PrefixExpr | ParenExpr) }
+NegateExpr { !negate negate expr }

This would allow negation of any expr (including bare Term) while still supporting prefixes and grouped subexpressions.

packages/queryLanguage/test/prefixes.txt (1)

1-335: Prefix test coverage is strong; consider a few more operator combinations

This suite hits a wide range of prefix/value shapes (short forms, wildcards, regexy patterns, invalid values), which is great. If you expect users to mix prefixes heavily with or and negation, it could be worth adding a couple of cases like:

  • file:test.js or repo:myproject
  • -file:test.js lang:typescript
  • (file:*.ts or file:*.tsx) lang:typescript

to lock in parse shapes for those combinations.

packages/queryLanguage/test/basic.txt (1)

1-71: Solid basic coverage; a couple of extra lexical edge cases might help

These cases do a nice job exercising plain terms, adjacency, and regex-like patterns. If you want to harden the lexer further, you could optionally add inputs like:

  • order or or1 (ensure only standalone or becomes the operator)
  • -foo bar (whatever behavior you intend around leading dashes)
  • hello_or_world

to document how the grammar treats borderline operator/word collisions.

packages/queryLanguage/test.ts (1)

1-46: Turn this into an automated test and use the public parser entrypoint

The pretty‑print/reconstruction logic is fine for a quick sanity check, but as a committed file it’s more useful if:

  • It’s wired into your actual test runner (e.g., moved under test/ with an assertion instead of console logging).
  • It imports the parser from the package’s public surface (e.g., ./src/index or the published module name) rather than the generated ./src/parser, so it exercises what consumers will use.

Doing that would turn this from a one‑off debug script into a stable regression test for the grammar.

packages/queryLanguage/src/tokens.ts (1)

27-58: Remove duplicate comment.

Lines 28 and 32 contain identical comments. Remove the duplicate to improve code clarity.

Apply this diff:

 // Check if followed by a prefix keyword (by checking for keyword followed by colon)
-// Look ahead until we hit a delimiter or colon
 const checkPos = input.pos;
 let foundColon = false;
 
 // Look ahead until we hit a delimiter or colon
 while (ch >= 0) {

Consider clarifying EOF handling.

The lookahead loop condition ch >= 0 (Line 33) correctly handles EOF (which returns a negative value in Lezer), but this behavior isn't documented. Consider adding a comment to make the EOF handling explicit.

Example:

 // Look ahead until we hit a delimiter or colon
-while (ch >= 0) {
+while (ch >= 0) { // ch < 0 indicates EOF
packages/web/src/features/search/zoektSearcher.ts (6)

171-318: Streaming lifecycle mostly robust; consider explicit guards around controller usage

The SSE streaming implementation correctly:

  • Tracks in-flight chunks (pendingChunks) and defers closing until both isStreamActive is false and pendingChunks === 0.
  • Pauses/resumes the gRPC stream to avoid completing before async processing is done.
  • Converts processing errors into typed error chunks rather than tearing down the stream abruptly.

Two small robustness considerations:

  1. Post-error gRPC stream lifetime
    In the per-chunk catch, isStreamActive is set to false and an error chunk is enqueued, but the underlying gRPC stream is neither cancelled nor closed explicitly. You do close the client via tryCloseController, so this is likely fine, but explicitly cancelling the stream can free resources sooner:

    } catch (error) {
      // ...
      isStreamActive = false;
+     grpcStream?.cancel();
    }

  2. Cancel callback and finalization

    The cancel() handler sets isStreamActive = false and calls grpcStream.cancel() / client.close(), but doesn’t invoke tryCloseController. That’s acceptable (the consumer has cancelled), but if you want symmetric behavior (e.g., always emitting a final “done” marker on server-side aborts), consider calling tryCloseController() once in-flight chunks drain.

Both are optional refinements; the current implementation should behave correctly for the common happy-path and error-path cases.

329-368: Per-chunk repo lookup and caching are solid; consider minor ergonomics

The repo resolution logic per chunk behaves well:

  • Per-request cache (_reposMapCache) avoids redundant DB hits across chunks.
  • Per-chunk reposMap limits lookups to just the repositories actually present in that response.
  • Numeric vs. string IDs are handled via getRepoIdForFile.

Two optional improvements:

  • You could avoid the extra reposMap Map and reuse the shared cache for lookups, since you already filter by presence before transforming files, but the current separation is clear and not problematic.
  • If Prisma throws (e.g., transient DB issues), that error will bubble into zoektStreamSearch’s per-chunk catch, which is good; combined with the fix suggested for zoektSearch, error propagation is consistent across both search modes.

No required changes here.

370-446: Strict “repo not found” behavior can hard-fail searches

In transformZoektSearchResponse, if a file’s repository can’t be found in the DB/cache, you throw:

if (!repo) {
    throw new Error(`Repository not found for file: ${file.file_name}`);
}

Given that out-of-sync states between Zoekt indices and your Repo table are possible (e.g., a repo deleted in the DB but still indexed, or temporary ingestion lag), this will hard-fail the whole search.

Consider a more forgiving behavior, e.g.:

  • Skip such files and continue, or
  • Emit them with a sentinel repo name/id and surface a warning elsewhere.

Example adjustment:

-        if (!repo) {
-            throw new Error(`Repository not found for file: ${file.file_name}`);
-        }
+        if (!repo) {
+            // Skip files whose repositories are missing in the DB to avoid failing the entire search.
+            return undefined;
+        }
...
-    }).filter(file => file !== undefined);
+    }).filter((file): file is SearchResultFile => file !== undefined);

This keeps the search resilient to transient data inconsistencies.


448-480: Double-check units and semantics for duration/wait and flushReason

The stats mapping looks coherent, but relies on external Zoekt semantics:

  • duration, wait, matchTree* fields are populated from the .nanos part only. If Zoekt ever returns non-zero seconds, these will under-report timings.
  • flushReason is set via response.stats?.flush_reason?.toString() and then used as a string; this matches the z.string() type, but you might want to standardize on the enum values directly (no .toString()) for easier downstream handling.

Both are minor and can be left as-is, but worth verifying against Zoekt’s current SearchStats definition and the behavior of your gRPC/proto loader.


544-572: Stats accumulation across stream chunks may overcount depending on Zoekt semantics

accumulateStats currently adds all numeric fields (including totalMatchCount, fileCount, duration fields, etc.) across chunks, and keeps the first non-unknown flushReason.

This is correct only if Zoekt’s streamed SearchStats are per-chunk deltas. If Zoekt instead sends cumulative stats (“so far”) for each flush, summing will overcount and could:

  • Inflate totalMatchCount, fileCount, etc.
  • Distort isSearchExhaustive in the streaming final response, which compares accumulatedStats.totalMatchCount to accumulatedStats.actualMatchCount.

If Zoekt’s stats are cumulative, the accumulator should instead take the last stats object (or max per field) rather than summing.

Please confirm from the Zoekt docs or implementation whether SearchStats in streamed responses are deltas or cumulative, and adjust accumulateStats accordingly.
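To make the delta-vs-cumulative distinction concrete, here is a sketch of both strategies over a trimmed-down stats shape. The field names are borrowed from the schema, but this is an illustrative sketch, not the real accumulateStats (which handles many more fields).

```typescript
// Trimmed-down stats shape for illustration.
interface Stats {
    totalMatchCount: number;
    fileCount: number;
}

// If streamed chunks carry per-chunk deltas, summing is correct:
function accumulateDeltas(chunks: Stats[]): Stats {
    return chunks.reduce(
        (acc, s) => ({
            totalMatchCount: acc.totalMatchCount + s.totalMatchCount,
            fileCount: acc.fileCount + s.fileCount,
        }),
        { totalMatchCount: 0, fileCount: 0 }
    );
}

// If chunks carry cumulative ("so far") stats, only the last chunk is
// authoritative, and summing would overcount:
function accumulateCumulative(chunks: Stats[]): Stats {
    return chunks[chunks.length - 1] ?? { totalMatchCount: 0, fileCount: 0 };
}
```

Whichever semantics Zoekt actually uses, pinning it down with a test against a recorded stream would prevent the two strategies from drifting.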


513-541: gRPC client construction: URL parsing and path robustness

Overall the client setup is clear and matches the generated types. Two small robustness points:

  1. Port extraction from ZOEKT_WEBSERVER_URL

    If the URL omits an explicit port, zoektUrl.port will be '', leading to an address like hostname:. You might want a fallback:

-    const grpcAddress = `${zoektUrl.hostname}:${zoektUrl.port}`;
+    const port = zoektUrl.port || (zoektUrl.protocol === 'https:' ? '443' : '80');
+    const grpcAddress = `${zoektUrl.hostname}:${port}`;

  2. Proto path based on process.cwd()

    Using process.cwd() with '../../vendor/zoekt/grpc/protos' assumes a specific working-directory layout at runtime. If this ever runs from a different cwd (e.g., the Next.js production server root), proto loading will fail. Consider anchoring paths via a config/env var, or document the assumption clearly.

Both are defensive improvements; the current code is fine if your deployment environment guarantees cwd and URL shape.

packages/web/src/features/search/types.ts (1)

33-56: SearchStats shape matches Zoekt stats mapping

The searchStatsSchema fields and comments mirror the Zoekt SearchStats fields you populate in transformZoektSearchResponse. That alignment is good.

Optional improvement: if you want stronger validation, you could constrain these to non-negative integers, e.g.:

actualMatchCount: z.number().int().nonnegative(),

for the count fields and similar for the byte counters, while keeping simple z.number() for timing values if they might be fractional in the future.

packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)

72-78: Cosmetic dependency array reordering.

The dependency array for branchDisplayName was reordered from [isBranchFilteringEnabled, branches] to [branches, isBranchFilteringEnabled]. While this doesn't affect React's memoization behavior, it may be for consistency with a linting rule or code style.

packages/web/src/features/search/fileSourceApi.ts (1)

8-38: Validate IR shape and regex escaping for repository / fileName

The QueryIR shape here looks reasonable, but both repo.regexp and the file regexp.regexp are being populated directly from repository and fileName. If these values can contain regex metacharacters (e.g. +, ?, [], etc.), this will change the match semantics compared to a literal path/repo match and could even produce invalid patterns, depending on Zoekt’s expectations.

If the intent is “exact repo” and “exact file path”:

  • Consider normalizing both to anchored, escaped patterns, e.g. ^<escapedRepo>$ / ^<escapedPath>$, or
  • Use an IR helper that treats these as literal names rather than raw regexes (if such an abstraction exists in your new IR layer).

This is especially important for getFileSource, which is effectively a “fetch exact file” path rather than a fuzzy search.

Also applies to: 41-47

packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (1)

15-20: onFilterChange / isStreaming integration looks solid

The new props are plumbed cleanly: onFilterChange is only invoked when query params actually change, and isStreaming is consistently passed down to both Filter components. This should make it straightforward for the parent to react to filter changes during streaming without introducing extra renders.

If you find the repo/language onEntryClicked blocks diverging over time, consider extracting the “toggle and sync URL + fire onFilterChange” logic into a small shared helper, but that’s optional.

Also applies to: 36-38, 42-44, 137-187

packages/web/src/prisma.ts (1)

3-3: Repo permission filter extraction looks correct; consider safer where composition

The refactor to accept user?: UserWithAccounts and delegate repo scoping to getRepoPermissionFilterForUser keeps the semantics clear: repo queries under the feature flag are constrained to repositories that are either permitted to one of the user’s accounts or public, with unauthenticated/no-account users seeing only public repos.

One nuance to be aware of: in repo.$allOperations, the new filter is merged via:

argsWithWhere.where = {
  ...(argsWithWhere.where || {}),
  ...getRepoPermissionFilterForUser(user),
};

If callers ever provide their own OR in argsWithWhere.where, this spread will overwrite it with the permission filter’s OR. Today that may be fine (or pre-existing), but if you expect more complex repo predicates in the future, it might be safer to wrap these in an explicit AND: [originalWhere, permissionFilter] instead.

Not a blocker, but worth keeping in mind as you expand repo query usage.

Also applies to: 35-52, 61-85

packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (1)

95-104: Consider syncing regex/case flags from updated defaults like query

defaultIsRegexEnabled / defaultIsCaseSensitivityEnabled are only used as initial state; if the parent updates defaults (e.g. when navigating back/forward and rebuilding search params), the toggles won’t follow, while query is explicitly resynced via useEffect. If the intent is for the UI to always reflect the URL/props, consider adding similar effects for the flags:

useEffect(() => {
    setIsRegexEnabled(defaultIsRegexEnabled);
}, [defaultIsRegexEnabled]);

useEffect(() => {
    setIsCaseSensitivityEnabled(defaultIsCaseSensitivityEnabled);
}, [defaultIsCaseSensitivityEnabled]);

Or confirm that flags are intentionally “sticky” across navigation and don’t need to follow defaults.

Also applies to: 113-115, 119-133

packages/web/src/app/[domain]/search/useStreamedSearch.ts (2)

9-35: Cache key & entry shape are tightly coupled to current SearchRequest and omit stats

Two small things to consider:

  • createCacheKey manually lists a subset of SearchRequest fields. If SearchRequest later grows with result-affecting fields (e.g. repo scope, domain, filters), the cache will start cross-contaminating queries unless this function is updated in lockstep. You might want to either:

    • Base the key on the entire SearchRequest object (e.g. JSON.stringify(params)), or
    • Add a unit test that fails when SearchRequest changes without updating createCacheKey.
  • CacheEntry doesn’t store stats, and the cached setState path also omits stats, so repeated identical queries served from cache will never have stats populated even if the original streamed run did. If the UI relies on stats, consider adding it to CacheEntry and wiring it through in both the cache write and read paths.

Also applies to: 91-106, 233-245


152-231: SSE parsing assumes single-line data: events and only breaks inner loop on [DONE]

The SSE loop works for the current “data: <JSON>\n\n per event” shape, but it has a few assumptions:

  • message.match(/^data: (.+)$/) only handles a single data: line per event and ignores any additional lines/fields. If the server ever emits multi-line data: payloads or other SSE fields, this parser will start dropping information.
  • When encountering [DONE], you break only the inner for (const message of messages) loop; the outer while continues reading until the stream naturally ends. If the server keeps the SSE connection open after sending [DONE], this hook would never finalize/search-cache until the connection closes.

If you expect to rely on a [DONE] sentinel, consider:

  • Tracking a doneStreaming flag that breaks the outer loop when [DONE] is seen, and
  • Optionally making the parsing a bit more tolerant to additional data: lines if the backend ever evolves.
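A more tolerant parser along those lines can be sketched as a pure helper. This is a hypothetical function (not the hook’s actual code) assuming the current `data: <json>` framing: it joins multiple data: lines per event, as the SSE spec allows, and surfaces a done flag the caller can use to break the outer read loop.

```typescript
// Hypothetical helper: parse a buffered SSE string into JSON events.
interface ParsedSSE<T> {
    events: T[];
    done: boolean;      // true once the [DONE] sentinel is seen
    remainder: string;  // incomplete trailing event, carried to the next read
}

function parseSSEEvents<T>(buffer: string): ParsedSSE<T> {
    const rawEvents = buffer.split("\n\n");
    // The last element may be an incomplete event; keep it for the next chunk.
    const remainder = rawEvents.pop() ?? "";
    const events: T[] = [];
    let done = false;

    for (const rawEvent of rawEvents) {
        // Join all data: lines of the event (multi-line payloads are legal SSE).
        const data = rawEvent
            .split("\n")
            .filter((line) => line.startsWith("data:"))
            .map((line) => line.slice("data:".length).trimStart())
            .join("\n");
        if (!data) continue;
        if (data === "[DONE]") {
            done = true;
            break;
        }
        events.push(JSON.parse(data) as T);
    }
    return { events, done, remainder };
}
```

The caller would check `done` after each read and break its outer `while` loop, rather than waiting for the connection to close.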
packages/web/src/features/codeNav/api.ts (3)

23-45: Consider escaping symbolName in the IR regexp to avoid unintended regex behavior

Both reference and definition queries build a regexp around symbolName with \\b${symbolName}\\b, but symbolName is interpolated raw. If it can contain regex metacharacters (e.g., ., *, +, ?, []), this will change the meaning of the pattern and could either over‑match or fail to match the intended symbol.

If you want literal symbol lookup, consider escaping the name before interpolation, e.g.:

+const escapeRegex = (value: string) =>
+  value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

- regexp: `\\b${symbolName}\\b`,
+ regexp: `\\b${escapeRegex(symbolName)}\\b`,

If, instead, you intentionally rely on regex semantics here, it would be good to document that assumption and add tests for symbols containing regex‑special characters.

Also applies to: 74-88


116-140: Simplified response parsing is good; consider map instead of flatMap on files

The new parseRelatedSymbolsSearchResponse that directly maps SearchResponseFindRelatedSymbolsResponse is clear, and the inner chunks.flatMap(...matchRanges...) with a final .filter(file => file.matches.length > 0) makes sense.

Minor nit: searchResult.files.flatMap((file) => { return { ... } }) always returns a single object per file, so flatMap behaves like map here and is a bit surprising to readers. Switching to map would better express intent without behavior change:

- files: searchResult.files.flatMap((file) => {
+ files: searchResult.files.map((file) => {
   // ...
- }).filter((file) => file.matches.length > 0),
+ }).filter((file) => file.matches.length > 0),

143-181: Language IR filter expansion looks good; just confirm language values match Zoekt’s language names

The getExpandedLanguageFilter helper that builds an or over TypeScript/JavaScript/JSX/TSX (and otherwise a single language node) is a nice improvement over string‑based filters. Please just ensure the language strings here exactly match the language identifiers emitted by your indexer/Zoekt configuration so the filters don’t silently miss matches.

packages/web/src/features/search/parser.ts (1)

126-203: Unknown AST node types defaulting to const: true can silently broaden queries

In transformTreeToIR’s default branch, unknown node types log a warning and return { const: true, query: "const" }, effectively “match all”. That’s safe in the sense of not throwing, but if the grammar evolves and new nodes appear, this will silently broaden user queries instead of failing fast.

Consider failing closed by throwing (or converting to a parse error as per the previous comment), so unsupported syntax doesn’t produce unexpectedly wide result sets.
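A fail-closed default branch could look like this (QueryParseError and the node shape are illustrative names, not the parser's real types):

```typescript
// Illustrative error type so the caller can map parse failures to a 400
// instead of letting them surface as generic 500s.
class QueryParseError extends Error {
    constructor(message: string) {
        super(message);
        this.name = "QueryParseError";
    }
}

type AstNode = { type: string; text?: string };

function transformNode(node: AstNode): { const?: boolean; regexp?: string } {
    switch (node.type) {
        case "Term":
            return { regexp: node.text ?? "" }; // real transform elided
        default:
            // Fail closed: surface unsupported syntax rather than silently
            // widening the query to "match all".
            throw new QueryParseError(`Unsupported query node type: "${node.type}"`);
    }
}
```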

packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (2)

80-86: Factor out error message construction to keep UI and toast in sync

Both the toast and inline error message compute the same ternary expression for ServiceErrorException vs generic Error. To keep things DRY and ensure they never diverge, you could precompute errorMessage once:

-    useEffect(() => {
-        if (error) {
-            toast({
-                description: `❌ Search failed. Reason: ${error instanceof ServiceErrorException ? error.serviceError.message : error.message}`,
-            });
-        }
-    }, [error, toast]);
+    const errorMessage =
+        error instanceof ServiceErrorException
+            ? error.serviceError.message
+            : error?.message;
+
+    useEffect(() => {
+        if (errorMessage) {
+            toast({
+                description: `❌ Search failed. Reason: ${errorMessage}`,
+            });
+        }
+    }, [errorMessage, toast]);
@@
-                    <p className="text-sm text-center">{error instanceof ServiceErrorException ? error.serviceError.message : error.message}</p>
+                    <p className="text-sm text-center">{errorMessage}</p>

This keeps the error handling consistent and slightly simplifies the JSX.

Also applies to: 184-189


105-114: Consider gating console.debug timing logs to non‑production environments

The console.debug calls for timeToFirstSearchResultMs and timeToSearchCompletionMs are useful during development, but may add noise in production consoles.

If you don’t rely on them in prod, consider wrapping them in an environment check or removing them once metrics are fully wired into analytics:

-        console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
-        console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        if (process.env.NODE_ENV !== 'production') {
+            console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
+            console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+        }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 09507d3 and 2f6fc04.

⛔ Files ignored due to path filters (1)
  • yarn.lock is excluded by !**/yarn.lock, !**/*.lock
📒 Files selected for processing (107)
  • .env.development (0 hunks)
  • CHANGELOG.md (1 hunks)
  • Dockerfile (5 hunks)
  • package.json (2 hunks)
  • packages/backend/src/index.ts (0 hunks)
  • packages/db/src/index.ts (1 hunks)
  • packages/mcp/CHANGELOG.md (1 hunks)
  • packages/mcp/src/index.ts (1 hunks)
  • packages/mcp/src/schemas.ts (2 hunks)
  • packages/queryLanguage/.gitignore (1 hunks)
  • packages/queryLanguage/package.json (1 hunks)
  • packages/queryLanguage/src/index.ts (1 hunks)
  • packages/queryLanguage/src/parser.terms.ts (1 hunks)
  • packages/queryLanguage/src/parser.ts (1 hunks)
  • packages/queryLanguage/src/query.grammar (1 hunks)
  • packages/queryLanguage/src/tokens.ts (1 hunks)
  • packages/queryLanguage/test.ts (1 hunks)
  • packages/queryLanguage/test/basic.txt (1 hunks)
  • packages/queryLanguage/test/grammar.test.ts (1 hunks)
  • packages/queryLanguage/test/grouping.txt (1 hunks)
  • packages/queryLanguage/test/negation.txt (1 hunks)
  • packages/queryLanguage/test/operators.txt (1 hunks)
  • packages/queryLanguage/test/precedence.txt (1 hunks)
  • packages/queryLanguage/test/prefixes.txt (1 hunks)
  • packages/queryLanguage/test/quoted.txt (1 hunks)
  • packages/queryLanguage/tsconfig.json (1 hunks)
  • packages/queryLanguage/vitest.config.ts (1 hunks)
  • packages/web/.eslintignore (1 hunks)
  • packages/web/package.json (3 hunks)
  • packages/web/src/actions.ts (1 hunks)
  • packages/web/src/app/[domain]/browse/layout.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/pathHeader.tsx (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/constants.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (5 hunks)
  • packages/web/src/app/[domain]/components/searchBar/searchSuggestionsBox.tsx (4 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useRefineModeSuggestions.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionModeMappings.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/useSuggestionsData.ts (1 hunks)
  • packages/web/src/app/[domain]/components/searchBar/zoektLanguageExtension.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/codePreview.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/filter.tsx (3 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (5 hunks)
  • packages/web/src/app/[domain]/search/components/filterPanel/useFilterMatches.ts (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (9 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatch.tsx (1 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx (7 hunks)
  • packages/web/src/app/[domain]/search/page.tsx (2 hunks)
  • packages/web/src/app/[domain]/search/useStreamedSearch.ts (1 hunks)
  • packages/web/src/app/api/(client)/client.ts (1 hunks)
  • packages/web/src/app/api/(server)/search/route.ts (2 hunks)
  • packages/web/src/app/api/(server)/source/route.ts (1 hunks)
  • packages/web/src/app/api/(server)/stream_search/route.ts (1 hunks)
  • packages/web/src/ee/features/codeNav/components/exploreMenu/referenceList.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/symbolDefinitionPreview.tsx (1 hunks)
  • packages/web/src/ee/features/codeNav/components/symbolHoverPopup/useHoveredOverSymbolInfo.ts (1 hunks)
  • packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts (1 hunks)
  • packages/web/src/features/chat/tools.ts (5 hunks)
  • packages/web/src/features/codeNav/api.ts (4 hunks)
  • packages/web/src/features/codeNav/types.ts (1 hunks)
  • packages/web/src/features/search/README.md (1 hunks)
  • packages/web/src/features/search/fileSourceApi.ts (1 hunks)
  • packages/web/src/features/search/index.ts (1 hunks)
  • packages/web/src/features/search/ir.ts (1 hunks)
  • packages/web/src/features/search/parser.ts (1 hunks)
  • packages/web/src/features/search/schemas.ts (0 hunks)
  • packages/web/src/features/search/searchApi.ts (1 hunks)
  • packages/web/src/features/search/types.ts (1 hunks)
  • packages/web/src/features/search/zoektClient.ts (0 hunks)
  • packages/web/src/features/search/zoektSchema.ts (0 hunks)
  • packages/web/src/features/search/zoektSearcher.ts (1 hunks)
  • packages/web/src/lib/errorCodes.ts (1 hunks)
  • packages/web/src/lib/extensions/searchResultHighlightExtension.ts (1 hunks)
  • packages/web/src/lib/posthogEvents.ts (2 hunks)
  • packages/web/src/lib/types.ts (1 hunks)
  • packages/web/src/lib/utils.ts (1 hunks)
  • packages/web/src/prisma.ts (4 hunks)
  • packages/web/src/proto/google/protobuf/Duration.ts (1 hunks)
  • packages/web/src/proto/google/protobuf/Timestamp.ts (1 hunks)
  • packages/web/src/proto/query.ts (1 hunks)
  • packages/web/src/proto/webserver.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/And.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Boost.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Branch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/BranchesRepos.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ChunkMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FileNameSet.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/FlushReason.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/IndexMetadata.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Language.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineFragmentMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/LineMatch.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListOptions.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListRequest.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/ListResponse.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Location.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/MinimalRepoListEntry.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Not.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Or.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Progress.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Q.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/Range.ts (1 hunks)
  • packages/web/src/proto/zoekt/webserver/v1/RawConfig.ts (1 hunks)
⛔ Files not processed due to max files limit (21)
  • packages/web/src/proto/zoekt/webserver/v1/Regexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repo.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoIds.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoListEntry.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoRegexp.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoSet.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepoStats.ts
  • packages/web/src/proto/zoekt/webserver/v1/Repository.ts
  • packages/web/src/proto/zoekt/webserver/v1/RepositoryBranch.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchOptions.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/SearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Stats.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchRequest.ts
  • packages/web/src/proto/zoekt/webserver/v1/StreamSearchResponse.ts
  • packages/web/src/proto/zoekt/webserver/v1/Substring.ts
  • packages/web/src/proto/zoekt/webserver/v1/Symbol.ts
  • packages/web/src/proto/zoekt/webserver/v1/SymbolInfo.ts
  • packages/web/src/proto/zoekt/webserver/v1/Type.ts
  • packages/web/src/proto/zoekt/webserver/v1/WebserverService.ts
  • packages/web/src/withAuthV2.ts
💤 Files with no reviewable changes (5)
  • packages/backend/src/index.ts
  • .env.development
  • packages/web/src/features/search/zoektClient.ts
  • packages/web/src/features/search/schemas.ts
  • packages/web/src/features/search/zoektSchema.ts
🧰 Additional context used
🪛 Biome (2.1.2)
packages/web/src/features/search/ir.ts

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🪛 LanguageTool
packages/queryLanguage/test/negation.txt

[grammar] ~193-~193: Use a hyphen to join words.
Context: ...PrefixExpr(FileExpr))))) # Negate short form prefix -f:test.js ==> Program(Ne...

(QB_NEW_EN_HYPHEN)


[grammar] ~209-~209: Use a hyphen to join words.
Context: ...r(PrefixExpr(RepoExpr))) # Negate short form content -c:console ==> Program(N...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/basic.txt

[typographical] ~10-~10: This greeting should probably end with a comma.
Context: ... ==> Program(Term) # Multiple terms hello world ==> Program(AndExpr(Term,Term)) # Mu...

(EN_GREETING_WITHOUT_COMMA)

packages/queryLanguage/test/quoted.txt

[grammar] ~465-~465: Use a hyphen to join words.
Context: ...m,PrefixExpr(FileExpr))) # Quoted short form prefix "f:test" ==> Program(Term...

(QB_NEW_EN_HYPHEN)

packages/queryLanguage/test/operators.txt

[grammar] ~97-~97: Use a hyphen to join words.
Context: ...ixExpr(FileExpr),Term)) # OR with short form prefixes f:test.js or r:myrepo ==...

(QB_NEW_EN_HYPHEN)

Comment on lines +24 to +30
export const searchOptionsSchema = z.object({
matches: z.number(), // The number of matches to return.
contextLines: z.number().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});

🛠️ Refactor suggestion | 🟠 Major

Add validation constraints to numeric fields.

The matches field accepts any number, including negative, zero, or unreasonably large values, which could lead to unexpected behavior or resource issues downstream. Similarly, contextLines lacks constraints when provided.

Apply this diff to add sensible constraints:

 export const searchOptionsSchema = z.object({
-    matches: z.number(),                              // The number of matches to return.
-    contextLines: z.number().optional(),              // The number of context lines to return.
+    matches: z.number().int().positive(),             // The number of matches to return.
+    contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
     whole: z.boolean().optional(),                    // Whether to return the whole file as part of the response.
     isRegexEnabled: z.boolean().optional(),           // Whether to enable regular expression search.
     isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const searchOptionsSchema = z.object({
matches: z.number(), // The number of matches to return.
contextLines: z.number().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
export const searchOptionsSchema = z.object({
matches: z.number().int().positive(), // The number of matches to return.
contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
🤖 Prompt for AI Agents
In packages/mcp/src/schemas.ts around lines 24 to 30, the numeric fields lack
validation constraints: `matches` currently allows negative, zero, or huge
numbers and `contextLines` has no bounds when present. Change `matches` to an
integer with a sensible lower and upper bound (e.g., .int().min(1).max(100)) and
change `contextLines` to an optional integer with a non-negative upper bound
(e.g., .optional().int().min(0).max(50)); keep other fields unchanged and ensure
the schema rejects non-integers and out-of-range values.

"start": "next start",
"lint": "cross-env SKIP_ENV_VALIDATION=1 eslint .",
"test": "cross-env SKIP_ENV_VALIDATION=1 vitest",
"generate:protos": "proto-loader-gen-types --includeComments --longs=Number --enums=String --defaults --oneofs --grpcLib=@grpc/grpc-js --keepCase --includeDirs=../../vendor/zoekt/grpc/protos --outDir=src/proto zoekt/webserver/v1/webserver.proto zoekt/webserver/v1/query.proto",

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify proto file paths and script correctness.

The generate:protos script references proto files at ../../vendor/zoekt/grpc/protos. Ensure these files exist at:

  • vendor/zoekt/grpc/protos/zoekt/webserver/v1/webserver.proto
  • vendor/zoekt/grpc/protos/zoekt/webserver/v1/query.proto

The script itself appears well-configured with appropriate flags for TypeScript generation.


🏁 Script executed:

#!/bin/bash
# Verify proto files exist at the expected paths
echo "Checking for proto files..."
fd -t f "webserver.proto|query.proto" vendor/zoekt/grpc/protos/

Length of output: 140



🏁 Script executed:

#!/bin/bash
# Search for proto files anywhere in the repository
echo "=== Searching for proto files in entire repo ==="
find . -name "webserver.proto" -o -name "query.proto" 2>/dev/null

echo ""
echo "=== Checking if vendor directory exists ==="
ls -la vendor/ 2>&1 | head -20

echo ""
echo "=== Checking if zoekt directory exists anywhere ==="
find . -type d -name "zoekt" 2>/dev/null | head -20

Length of output: 358



🏁 Script executed:

#!/bin/bash
# Check vendor/zoekt structure
echo "=== Contents of vendor/zoekt ==="
find vendor/zoekt -type f 2>/dev/null | head -30

echo ""
echo "=== Contents of vendor/zoekt (tree view) ==="
ls -laR vendor/zoekt/ 2>&1 | head -50

echo ""
echo "=== Check .gitignore for proto file patterns ==="
rg -i "proto|vendor" .gitignore 2>/dev/null

Length of output: 250


Proto files referenced in the script do not exist in the repository.

The generate:protos script references source proto files that are missing:

  • Expected path: vendor/zoekt/grpc/protos/zoekt/webserver/v1/
  • Expected files: webserver.proto, query.proto
  • Actual state: The vendor/zoekt/ directory exists but is empty; proto files are not present anywhere in the repository

This will cause the script to fail when executed. Either:

  1. Add the missing proto files to vendor/zoekt/grpc/protos/zoekt/webserver/v1/
  2. Update the script to reference the correct proto file location
  3. Remove the script if proto generation is not needed for this PR
🤖 Prompt for AI Agents
In packages/web/package.json around line 11, the "generate:protos" npm script
points at proto files that are not present in the repo
(vendor/zoekt/grpc/protos/zoekt/webserver/v1/{webserver.proto,query.proto}); fix
by either (A) adding the missing proto files to
vendor/zoekt/grpc/protos/zoekt/webserver/v1/ in this PR, (B) updating the script
to point to the actual proto location in the repo (or an external fetch step) so
the referenced files exist at runtime, or (C) remove the generate:protos script
entirely if proto generation is not required for this change — ensure whichever
option you pick is reflected in package.json and any CI scripts that call it.

Comment on lines 55 to 62
const {
scrollOffset: restoreOffset,
measurementsCache: restoreMeasurementsCache,
showAllMatchesStates: restoreShowAllMatchesStates,
showAllMatchesMap: restoreShowAllMatchesStates,
} = history.state as ScrollHistoryState;

const [showAllMatchesStates, setShowAllMatchesStates] = useState(restoreShowAllMatchesStates || Array(fileMatches.length).fill(false));
const [showAllMatchesMap, showAllMatchesActions] = useMap<string, boolean>(restoreShowAllMatchesStates || []);


⚠️ Potential issue | 🔴 Critical

Guard against history.state being null on initial load

const { scrollOffset, measurementsCache, showAllMatchesMap } = history.state as ScrollHistoryState; will throw at runtime when history.state is null/undefined (which is the default before any replaceState), since the destructuring happens unguarded.

Recommend defaulting to an empty object before casting so the initial render is safe:

-    const {
-        scrollOffset: restoreOffset,
-        measurementsCache: restoreMeasurementsCache,
-        showAllMatchesMap: restoreShowAllMatchesStates,
-    } = history.state as ScrollHistoryState;
+    const {
+        scrollOffset: restoreOffset,
+        measurementsCache: restoreMeasurementsCache,
+        showAllMatchesMap: restoreShowAllMatchesStates,
+    } = (history.state ?? {}) as ScrollHistoryState;
🤖 Prompt for AI Agents
In packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx
around lines 55 to 62, the code destructures history.state directly which will
throw when history.state is null/undefined on initial load; change the
destructuring to use a safe default object before casting (e.g., use
(history.state ?? {}) as ScrollHistoryState) so
scrollOffset/measurementsCache/showAllMatchesMap default to undefined rather
than causing a runtime error, and keep passing restoreShowAllMatchesStates || []
into useMap to preserve the existing fallback.

Comment on lines +1 to +8
import { Q as QueryIR } from '@/proto/zoekt/webserver/v1/Q';
import { RawConfig } from '@/proto/zoekt/webserver/v1/RawConfig';
import { Regexp } from '@/proto/zoekt/webserver/v1/Regexp';
import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
import { Language } from '@/proto/zoekt/webserver/v1/Language';
import { Repo } from '@/proto/zoekt/webserver/v1/Repo';
import { RepoRegexp } from '@/proto/zoekt/webserver/v1/RepoRegexp';
import { BranchesRepos } from '@/proto/zoekt/webserver/v1/BranchesRepos';

⚠️ Potential issue | 🔴 Critical

Rename imported Symbol to avoid shadowing the global Symbol (lint error)

The import import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol'; shadows the built‑in global Symbol, which static analysis correctly flags. While it works at runtime, it’s confusing and will keep lint failing.

You can fix this by renaming the import and updating the isSymbolQuery type guard:

-import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
+import { Symbol as ZoektSymbol } from '@/proto/zoekt/webserver/v1/Symbol';
@@
-export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: Symbol } => query.symbol != null;
+export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: ZoektSymbol } => query.symbol != null;

Also applies to: 27-28

🧰 Tools
🪛 Biome (2.1.2)

[error] 4-4: Do not shadow the global "Symbol" property.

Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.

(lint/suspicious/noShadowRestrictedNames)

🤖 Prompt for AI Agents
In packages/web/src/features/search/ir.ts around lines 1 to 8 (and also apply
changes at lines 27-28), the imported name "Symbol" shadows the global Symbol
causing lint errors; rename the import (e.g., import { Symbol as ProtoSymbol }
from '@/proto/zoekt/webserver/v1/Symbol') and update all uses and the
isSymbolQuery type guard to refer to the renamed identifier (ProtoSymbol) so the
global Symbol is not shadowed and the type guard compiles/lints cleanly.

Comment on lines +55 to +107
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {

try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);

// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});

if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}

return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof SyntaxError) {
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${error.message}`,
});
}
throw error;
}

⚠️ Potential issue | 🟠 Major

Normalize all parse/validation failures to a consistent ServiceErrorException

parseQuerySyntaxIntoIR only maps SyntaxError to ServiceErrorException, but transformTreeToIR and transformPrefixExpr throw plain Error for many user-facing validation issues (e.g., invalid visibility:, archived:, fork: values, missing colons, unknown prefixes, missing search contexts). Those will currently bubble out as generic 500s instead of a structured BAD_REQUEST with FAILED_TO_PARSE_QUERY.

Consider centralizing the mapping here so any non-ServiceErrorException from parsing/transformation yields a consistent 400:

-    } catch (error) {
-        if (error instanceof SyntaxError) {
-            throw new ServiceErrorException({
-                statusCode: StatusCodes.BAD_REQUEST,
-                errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
-                message: `Failed to parse query "${query}" with message: ${error.message}`,
-            });
-        }
-        throw error;
-    }
+    } catch (error) {
+        if (error instanceof ServiceErrorException) {
+            // Already normalized elsewhere; just propagate.
+            throw error;
+        }
+
+        const message = error instanceof Error ? error.message : String(error);
+        throw new ServiceErrorException({
+            statusCode: StatusCodes.BAD_REQUEST,
+            errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
+            message: `Failed to parse query "${query}" with message: ${message}`,
+        });
+    }

Optionally, if you still want to distinguish “internal” errors, you could gate this on a dedicated QueryParseError subclass and throw that from the various validation branches.

📝 Committable suggestion


Suggested change
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {
try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);
// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});
if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}
return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof SyntaxError) {
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${error.message}`,
});
}
throw error;
}
export const parseQuerySyntaxIntoIR = async ({
query,
options,
prisma,
}: {
query: string,
options: {
isCaseSensitivityEnabled?: boolean;
isRegexEnabled?: boolean;
},
prisma: PrismaClient,
}): Promise<QueryIR> => {
try {
// First parse the query into a Lezer tree.
const tree = parser.parse(query);
// Then transform the tree into the intermediate representation.
return transformTreeToIR({
tree,
input: query,
isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
isRegexEnabled: options.isRegexEnabled ?? false,
onExpandSearchContext: async (contextName: string) => {
const context = await prisma.searchContext.findUnique({
where: {
name_orgId: {
name: contextName,
orgId: SINGLE_TENANT_ORG_ID,
}
},
include: {
repos: true,
}
});
if (!context) {
throw new Error(`Search context "${contextName}" not found`);
}
return context.repos.map((repo) => repo.name);
},
});
} catch (error) {
if (error instanceof ServiceErrorException) {
// Already normalized elsewhere; just propagate.
throw error;
}
const message = error instanceof Error ? error.message : String(error);
throw new ServiceErrorException({
statusCode: StatusCodes.BAD_REQUEST,
errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
message: `Failed to parse query "${query}" with message: ${message}`,
});
}
}
🤖 Prompt for AI Agents
packages/web/src/features/search/parser.ts around lines 55 to 107:
parsing/validation errors other than SyntaxError (e.g., plain Error from
transformTreeToIR/transformPrefixExpr) are allowed to bubble out as generic
500s; wrap any non-ServiceErrorException thrown during parsing/transform into a
ServiceErrorException with status BAD_REQUEST and
ErrorCode.FAILED_TO_PARSE_QUERY, preserving the original error message (and
optionally stack) so user-facing validation errors return a consistent 400, and
rethrow ServiceErrorException unchanged.

Comment on lines +45 to +47
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);

if (!searchResponse.ok) {
return invalidZoektResponse(searchResponse);
}

const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
// @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
// which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
// metadata when transforming (potentially thousands) of file matches, we aggregate a unique
// set of repository ids* and map them to their corresponding Repo record.
//
// *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
// A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
// always undefined. To make this a non-breaking change, we fallback to using the repository's name
// (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
// practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
// 'gitea.com/org/repo', etc.).
//
// Note: When a repository is re-indexed (every hour) this ID will be populated.
// @see: https://github.com/sourcebot-dev/zoekt/pull/6
const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
const repos = new Map<string | number, Repo>();

(await prisma.repo.findMany({
where: {
id: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.id, repo));

(await prisma.repo.findMany({
where: {
name: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.name, repo));

const files = Result.Files?.map((file) => {
const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);

const webUrl = (() => {
const template: string | undefined = Result.RepoURLs[file.Repository];
if (!template) {
return undefined;
}

// If there are multiple branches pointing to the same revision of this file, it doesn't
// matter which branch we use here, so use the first one.
const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
return getFileWebUrl(template, branch, file.FileName);
})();

const identifier = file.RepositoryID ?? file.Repository;
const repo = repos.get(identifier);
return zoektSearch(zoektSearchRequest, prisma);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Avoid logging full Zoekt requests in production for privacy and performance

logger.debug(JSON.stringify(zoektSearchRequest, null, 2)) in both search and streamSearch will emit the full Zoekt request, including user search text and repo names. This can:

  • Leak potentially sensitive search terms and repo metadata into logs.
  • Produce very large log entries for complex queries.

Consider either:

  • Redacting/summarizing the request (e.g., log only top-level flags and counts), or
  • Gating full-request logging behind a dedicated debug flag or non-production check.

```diff
-        logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        if (env.NODE_ENV !== 'production') {
+            logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        }
```

(and similarly for zoektStreamSearchRequest), or introduce a more targeted debug switch if available.

Also applies to: 67-69



---

🏁 Script executed:

```shell
#!/bin/bash
# Inspect the searchApi.ts file to verify the logger.debug calls and surrounding context
cat packages/web/src/features/search/searchApi.ts | head -100
```

Length of output: 3235


🏁 Script executed:

```shell
#!/bin/bash
# Search for other debug logging patterns in the codebase to understand conventions
rg -nP 'logger\.(debug|log)\(' packages/web/src --type=ts -C2 -m 20
```

Length of output: 9804


🏁 Script executed:

#!/bin/bash
# Check for environment variable checks (NODE_ENV) in the codebase
rg -nP 'NODE_ENV|env\.NODE_ENV' packages/web/src --type=ts -C2

Length of output: 237


🏁 Script executed:

#!/bin/bash
# Check for env imports/exports in the file or nearby files
rg -nP 'import.*env|export.*env' packages/web/src/features/search --type=ts

Length of output: 226


Wrap debug logging to prevent leaking sensitive search data and emitting large log entries in production

The logger.debug() calls at lines 45 and 67 serialize the full zoektSearchRequest objects, which contain user search terms and repository metadata. In production, this:

  • Leaks potentially sensitive search queries and repo names into logs
  • Creates unnecessarily large log entries for complex queries

Since env is already imported in the file, wrap both debug statements with an environment check:

```diff
-        logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        if (env.NODE_ENV !== 'production') {
+            logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+        }
```

Apply the same fix to line 67 (zoektStreamSearchRequest).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
if (!searchResponse.ok) {
return invalidZoektResponse(searchResponse);
}
const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
// @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
// which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
// metadata when transforming (potentially thousands) of file matches, we aggregate a unique
// set of repository ids* and map them to their corresponding Repo record.
//
// *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
// A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
// always undefined. To make this a non-breaking change, we fallback to using the repository's name
// (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
// practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
// 'gitea.com/org/repo', etc.).
//
// Note: When a repository is re-indexed (every hour) this ID will be populated.
// @see: https://github.com/sourcebot-dev/zoekt/pull/6
const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
const repos = new Map<string | number, Repo>();
(await prisma.repo.findMany({
where: {
id: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.id, repo));
(await prisma.repo.findMany({
where: {
name: {
in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
},
orgId: org.id,
}
})).forEach(repo => repos.set(repo.name, repo));
const files = Result.Files?.map((file) => {
const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);
const webUrl = (() => {
const template: string | undefined = Result.RepoURLs[file.Repository];
if (!template) {
return undefined;
}
// If there are multiple branches pointing to the same revision of this file, it doesn't
// matter which branch we use here, so use the first one.
const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
return getFileWebUrl(template, branch, file.FileName);
})();
const identifier = file.RepositoryID ?? file.Repository;
const repo = repos.get(identifier);
return zoektSearch(zoektSearchRequest, prisma);
if (env.NODE_ENV !== 'production') {
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
}
return zoektSearch(zoektSearchRequest, prisma);
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.ts around lines 45-47 (and
additionally apply to line 67), the code currently does
logger.debug(JSON.stringify(zoektSearchRequest,...)) which can leak sensitive
search terms and create huge logs; guard these debug calls with the existing env
check (only log when not in production, e.g. if (!env.isProduction /
env.NODE_ENV !== 'production')), and apply the same guard to the logger.debug
for zoektStreamSearchRequest at line 67; if desired, replace full JSON.stringify
with a redacted or minimal representation (search term and repo count) inside
the guarded block.
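As a sketch of the gated-logging pattern suggested above — the function name, the injected `nodeEnv` parameter, and the summary shape are all illustrative, not Sourcebot's actual logger API:

```typescript
// Gate full-payload debug logging behind the environment, emitting only a
// compact, redacted summary in production. `nodeEnv` is injected for testability.
function debugLogSearchRequest(
    log: (msg: string) => void,
    request: { queryText: string; repoCount: number },
    nodeEnv: string | undefined,
): void {
    if (nodeEnv !== "production") {
        // Full request only outside production.
        log(`zoektSearchRequest:\n${JSON.stringify(request, null, 2)}`);
    } else {
        // Redacted summary: no search terms or repo names reach the logs.
        log(`zoektSearchRequest: queryLength=${request.queryText.length} repos=${request.repoCount}`);
    }
}
```

The same helper could guard both the `zoektSearchRequest` and `zoektStreamSearchRequest` call sites.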

Comment on lines +66 to +106
opts: {
chunk_matches: true,
// @note: Zoekt has several different ways to limit a given search. The two that
// we care about are `MaxMatchDisplayCount` and `TotalMaxMatchCount`:
// - `MaxMatchDisplayCount` truncates the number of matches AFTER performing
// a search (specifically, after collating and sorting the results). The number of
// results returned by the API will be less than or equal to this value.
//
// - `TotalMaxMatchCount` truncates the number of matches DURING a search. The results
// returned by the API can be less than, equal to, or greater than this value.
// Why greater? Because this value is compared _after_ a given shard has finished
// being processed, the number of matches returned by the last shard may have exceeded
// this value.
//
// Let's define two variables:
// - `actualMatchCount` : The number of matches that are returned by the API. This is
// always less than or equal to `MaxMatchDisplayCount`.
// - `totalMatchCount` : The number of matches that zoekt found before it either
// 1) found all matches or 2) hit the `TotalMaxMatchCount` limit. This number is
// not bounded and can be less than, equal to, or greater than both `TotalMaxMatchCount`
// and `MaxMatchDisplayCount`.
//
//
// Our challenge is to determine whether or not the search returned all possible matches
// (it was exhaustive) or if it was truncated. By setting the `TotalMaxMatchCount` to
// `MaxMatchDisplayCount + 1`, we can determine which of these occurred by comparing
// `totalMatchCount` to `MaxMatchDisplayCount`.
//
// if (totalMatchCount ≤ actualMatchCount):
// Search is EXHAUSTIVE (found all possible matches)
// Proof: totalMatchCount ≤ MaxMatchDisplayCount < TotalMaxMatchCount
// Therefore Zoekt stopped naturally, not due to limit
//
// if (totalMatchCount > actualMatchCount):
// Search is TRUNCATED (more matches exist)
// Proof: totalMatchCount > MaxMatchDisplayCount + 1 = TotalMaxMatchCount
// Therefore Zoekt hit the limit and stopped searching
//
max_match_display_count: options.matches,
total_max_match_count: options.matches + 1,
num_context_lines: options.contextLines ?? 0,

⚠️ Potential issue | 🟠 Major

isSearchExhaustive condition is inverted in non-streaming search

The explanatory comment says exhaustiveness should be determined by comparing totalMatchCount to actualMatchCount:

  • Exhaustive when totalMatchCount <= actualMatchCount
  • Truncated when totalMatchCount > actualMatchCount

In zoektSearch, the check is currently:

isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,

This inequality will almost always be true (returned matches can’t exceed total matches), so truncated searches will be incorrectly reported as exhaustive. The streaming final response uses the correct direction:

isSearchExhaustive: accumulatedStats.totalMatchCount <= accumulatedStats.actualMatchCount,

I recommend flipping the condition in zoektSearch to align with the documented logic and the streaming path:

```diff
-            resolve({
-                stats,
-                files,
-                repositoryInfo,
-                isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
-            } satisfies SearchResponse);
+            resolve({
+                stats,
+                files,
+                repositoryInfo,
+                isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+            } satisfies SearchResponse);
```

Also applies to: 136-137

🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 66-106 (and
also fix lines 136-137), the isSearchExhaustive comparison is inverted: change
the boolean to reflect "exhaustive when totalMatchCount <= actualMatchCount"
(i.e. return totalMatchCount <= actualMatchCount instead of actualMatchCount <=
totalMatchCount) so the non-streaming path matches the documented logic and the
streaming path; update both occurrences accordingly and run tests.
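The documented rule reduces to a one-line predicate. A minimal sketch (with illustrative names) of both the limit setup and the exhaustiveness check:

```typescript
// Given a display limit, set the internal search limit one higher so truncation
// is detectable: zoekt only stops early if it finds MORE than the display limit.
function zoektLimits(maxMatchDisplayCount: number) {
    return {
        max_match_display_count: maxMatchDisplayCount,
        total_max_match_count: maxMatchDisplayCount + 1,
    };
}

// Exhaustive iff zoekt's total discovered matches fit within what was returned.
function isSearchExhaustive(totalMatchCount: number, actualMatchCount: number): boolean {
    return totalMatchCount <= actualMatchCount;
}
```

Note the argument order matters: flipping it (as in the non-streaming path above) makes the predicate trivially true whenever the returned count is bounded by the total.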

Comment on lines +118 to +139
export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
const client = createGrpcClient();
const metadata = new grpc.Metadata();

return new Promise((resolve, reject) => {
client.Search(searchRequest, metadata, async (error, response) => {
if (error || !response) {
reject(error || new Error('No response received'));
return;
}

const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);

resolve({
stats,
files,
repositoryInfo,
isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
} satisfies SearchResponse);
});
});

⚠️ Potential issue | 🟠 Major

Errors in async callback can leave zoektSearch’s Promise hanging

The client.Search callback is marked async and uses await, but thrown/rejected errors inside that callback (e.g. from Prisma in createReposMapForChunk or from transformZoektSearchResponse) won’t be wired to the outer Promise unless you manually reject them. In the current form, such failures can result in a Promise that never resolves or rejects.

Refactor the callback to keep it synchronous and move the async work into an IIFE with explicit try/catch:

 export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
   const client = createGrpcClient();
   const metadata = new grpc.Metadata();

   return new Promise((resolve, reject) => {
-    client.Search(searchRequest, metadata, async (error, response) => {
-      if (error || !response) {
-        reject(error || new Error('No response received'));
-        return;
-      }
-
-      const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
-      const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);
-
-      resolve({
-        stats,
-        files,
-        repositoryInfo,
-        isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
-      } satisfies SearchResponse);
-    });
+    client.Search(searchRequest, metadata, (error, response) => {
+      if (error || !response) {
+        reject(error || new Error('No response received'));
+        return;
+      }
+
+      (async () => {
+        try {
+          const reposMapCache = await createReposMapForChunk(
+            response,
+            new Map<string | number, Repo>(),
+            prisma,
+          );
+          const { stats, files, repositoryInfo } = await transformZoektSearchResponse(
+            response,
+            reposMapCache,
+          );
+
+          resolve({
+            stats,
+            files,
+            repositoryInfo,
+            isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+          } satisfies SearchResponse);
+        } catch (err) {
+          reject(err);
+        }
+      })();
+    });
   });
 }

This ensures all downstream failures either resolve or reject the Promise instead of silently leaking.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 118 to 139,
the grpc callback is declared async so any thrown/rejected errors from awaited
calls (createReposMapForChunk, transformZoektSearchResponse) are not propagated
to the outer Promise and can leave it hanging; change the client.Search callback
to a synchronous function and immediately invoke an async IIFE inside it that
wraps the awaited work in try/catch, calling resolve(...) on success and
reject(err) on any thrown/rejected error, and ensure the original
error/no-response branch still rejects and returns so the Promise is always
settled.

Comment on lines +379 to +413
export const getCodeHostBrowseFileAtBranchUrl = ({
webUrl,
codeHostType,
branchName,
filePath,
}: {
webUrl?: string | null,
codeHostType: CodeHostType,
branchName: string,
filePath: string,
}) => {
if (!webUrl) {
return undefined;
}

switch (codeHostType) {
case 'github':
return `${webUrl}/blob/${branchName}/${filePath}`;
case 'gitlab':
return `${webUrl}/-/blob/${branchName}/${filePath}`;
case 'gitea':
return `${webUrl}/src/branch/${branchName}/${filePath}`;
case 'azuredevops':
return `${webUrl}?path=${filePath}&version=${branchName}`;
case 'bitbucketCloud':
return `${webUrl}/src/${branchName}/${filePath}`;
case 'bitbucketServer':
return `${webUrl}/browse/${filePath}?at=${branchName}`;
case 'gerrit':
return `${webUrl}/+/${branchName}/${filePath}`;
case 'genericGitHost':
return undefined;
}
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

Verify URL patterns for code host file browsing.

The new function follows a consistent pattern with getCodeHostBrowseAtBranchUrl. However, please verify:

  1. Azure DevOps URL format: The pattern ?path={file}&version={branch} differs from the branch-only pattern. Confirm this matches Azure DevOps's file browsing URL structure.
  2. File path encoding: Consider whether filePath should be URL-encoded, especially for paths containing spaces or special characters (e.g., encodeURIComponent(filePath) or proper path segment encoding).

Test with a few sample file paths containing spaces or special characters to ensure the URLs work correctly across different code hosts.


🏁 Script executed:

```shell
#!/bin/bash
# Check if Azure DevOps and other code hosts handle file paths with special characters correctly
# Note: This is a manual verification prompt - actual testing should be done in a browser

echo "Sample file paths to test:"
echo "1. src/components/Button.tsx"
echo "2. src/utils/file name with spaces.ts"
echo "3. src/test/special-chars-#@!.js"
echo ""
echo "Verify these URL patterns work:"
echo "- Azure DevOps: {webUrl}?path={file}&version={branch}"
echo "- GitHub: {webUrl}/blob/{branch}/{file}"
echo "- GitLab: {webUrl}/-/blob/{branch}/{file}"
```

Length of output: 295


I need to verify the concerns raised in the review comment. Let me search for information about the correct URL formats for these code hosting platforms and best practices for URL encoding file paths.

Now let me search for more specific information about how different code hosts handle file paths in URLs, and check if there are URL encoding utilities in the codebase.

Now let me examine the codebase to see if URL encoding is already being handled or if this is indeed a missing concern.

I'll also search for any URL encoding utilities in the codebase:


🏁 Script executed:

```shell
#!/bin/bash
# Search for the function and its usage context
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/web/src --context=5
```

Length of output: 2471


🏁 Script executed:

```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI|URLEncode" packages/web/src --type=ts --type=tsx | head -20
```

Length of output: 27


Let me retry the URL encoding search with corrected syntax:

Let me also check where `fileName` comes from in the zoektSearcher:


🏁 Script executed:

```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI" packages/web/src
```

Length of output: 1341


🏁 Script executed:

```shell
#!/bin/bash
# Check context around fileName variable in zoektSearcher.ts
rg -n "fileName" packages/web/src/features/search/zoektSearcher.ts -B5 -A2 | head -40
```

Length of output: 1618


Apply URL encoding to file paths across all code host URL patterns.

The review concerns are valid. File paths from Zoekt search results are raw and unencoded, but are directly interpolated into URLs without encoding. This will break URLs containing spaces or special characters (e.g., `file name with spaces.ts`, `special-#@!.js`).

The codebase already uses encodeURIComponent() elsewhere for similar URL component encoding (e.g., packages/web/src/app/[domain]/browse/hooks/utils.ts:85). Apply this pattern consistently:

  • Wrap all filePath and branchName parameters with encodeURIComponent() in each case statement
  • Example: ${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)} for GitHub
  • Apply to all six code host patterns: GitHub, GitLab, Gitea, Azure DevOps, Bitbucket Cloud/Server, and Gerrit
🤖 Prompt for AI Agents
In packages/web/src/lib/utils.ts around lines 379 to 413, filePath (and
branchName where used in path components) is inserted into host URLs unencoded
which breaks URLs containing spaces or special characters; update each case to
wrap branchName and filePath (where they appear in the path or query values)
with encodeURIComponent() — e.g. for GitHub use
`${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)}`,
apply the same encoding pattern for GitLab, Gitea, Azure DevOps (encode query
values), Bitbucket Cloud/Server, and Gerrit, leaving genericGitHost behavior
unchanged.
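One way to apply the encoding without escaping the `/` separators (a sketch — note that `encodeURIComponent` over the whole path would also escape the slashes, which the review's example glosses over) is to encode per segment:

```typescript
// Encode each path segment so spaces, "#", "?", etc. are escaped,
// but the "/" separators between segments survive.
function encodePathSegments(filePath: string): string {
    return filePath.split("/").map(encodeURIComponent).join("/");
}

// Example applied to the GitHub pattern from the function above.
function gitHubBlobUrl(webUrl: string, branchName: string, filePath: string): string {
    return `${webUrl}/blob/${encodeURIComponent(branchName)}/${encodePathSegments(filePath)}`;
}
```

Branch names containing `/` (e.g. `feature/foo`) would need the same per-segment treatment on hosts that accept them; encoding the whole branch name, as sketched here, escapes that separator too.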

Comment on lines +24 to +26
type SubtypeConstructor<Constructor extends new (...args: any) => any, Subtype> = {
new(...args: ConstructorParameters<Constructor>): Subtype;
};

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

Verify usage of SubtypeConstructor helper type.

The SubtypeConstructor type is defined but not used within this file. Please verify that it's actually used elsewhere in the codebase.


🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase

# Search for imports and usage of SubtypeConstructor
rg -n --type=ts --type=tsx 'SubtypeConstructor' -g '!packages/web/src/proto/query.ts'
```

Length of output: 27


🏁 Script executed:

```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase
# Using glob patterns to match both .ts and .tsx files

# Search for SubtypeConstructor in all TypeScript files (excluding query.ts)
rg -n 'SubtypeConstructor' --glob '!packages/web/src/proto/query.ts' -g '*.ts' -g '*.tsx'
```

Length of output: 452


🏁 Script executed:

```shell
#!/bin/bash
# Description: Check imports in webserver.ts to understand SubtypeConstructor relationship

# Show imports/top of webserver.ts
head -60 packages/web/src/proto/webserver.ts | grep -n -A5 -B5 'SubtypeConstructor\|import'

# Also check if query.ts exports SubtypeConstructor explicitly
echo "=== Checking query.ts exports ==="
head -30 packages/web/src/proto/query.ts | grep -n -A2 -B2 'export\|SubtypeConstructor'
```

Length of output: 8133


Remove unused SubtypeConstructor type definition.

Verification confirms that SubtypeConstructor at lines 24-26 in packages/web/src/proto/query.ts is unused. While webserver.ts defines and uses an identical helper type, it maintains its own independent definition and does not import from query.ts. Delete this unused type from query.ts.

🤖 Prompt for AI Agents
In packages/web/src/proto/query.ts around lines 24 to 26, remove the unused
TypeScript type alias SubtypeConstructor (the constructor-to-subtype helper)
because it's not referenced anywhere in this file; delete the three-line type
declaration and ensure no imports/exports depend on it so the file compiles
cleanly after removal.
