feat(web): Streamed code search #623
Conversation
Walkthrough

This PR introduces a Lezer-based query language parser, converts the search API from string-based queries to an intermediate representation (IR), implements streaming search via SSE, adds Zoekt gRPC proto definitions, updates the search UI with regex and case-sensitivity flags, and refactors Prisma permission filtering to account-based scoping.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client
    participant SearchPage as SearchPage
    participant useStreamedSearch as useStreamedSearch
    participant SearchAPI as /api/stream_search
    participant zoektSearcher as zoektSearcher
    participant ZoektGRPC as Zoekt gRPC

    Client->>SearchPage: Enter query + flags
    SearchPage->>useStreamedSearch: query, isRegexEnabled, isCaseSensitivityEnabled
    alt Cache Hit
        useStreamedSearch-->>SearchPage: Return cached results
    else Cache Miss
        useStreamedSearch->>SearchAPI: POST (SSE)
        SearchAPI->>SearchAPI: Parse query → IR (parseQuerySyntaxIntoIR)
        SearchAPI->>zoektSearcher: createZoektSearchRequest(IR)
        zoektSearcher->>ZoektGRPC: StreamSearch RPC
        ZoektGRPC-->>zoektSearcher: Stream chunk
        zoektSearcher->>zoektSearcher: Transform chunk + enrich repos
        zoektSearcher-->>SearchAPI: data: {files, stats, ...}
        SearchAPI-->>useStreamedSearch: SSE chunk
        useStreamedSearch->>useStreamedSearch: Update state
        useStreamedSearch-->>SearchPage: Partial results
        ZoektGRPC-->>zoektSearcher: Final/Error
        zoektSearcher-->>SearchAPI: Final/Error response
        SearchAPI-->>useStreamedSearch: SSE final
        useStreamedSearch->>useStreamedSearch: Cache results
        useStreamedSearch-->>SearchPage: Final results
    end
    SearchPage->>SearchPage: Render results
```
```mermaid
sequenceDiagram
    participant Input as Query Input
    participant Lezer as Lezer Parser
    participant transformTreeToIR as transformTreeToIR
    participant expandContext as expandSearchContext
    participant IROutput as QueryIR Output

    Input->>Lezer: "file:test.ts content:bug -archived:yes"
    Lezer-->>transformTreeToIR: Parse tree (PrefixExpr, Term, NegateExpr, etc.)
    transformTreeToIR->>transformTreeToIR: Walk tree recursively
    alt PrefixExpr
        transformTreeToIR->>transformTreeToIR: Extract prefix type (FileExpr, ContentExpr, etc.)
        transformTreeToIR->>transformTreeToIR: Build IR node (regexp, substring, raw_config, etc.)
    else Term
        transformTreeToIR->>transformTreeToIR: Check isRegexEnabled / isCaseSensitivityEnabled
        transformTreeToIR->>transformTreeToIR: Create regexp or substring query
    else ContextExpr
        transformTreeToIR->>expandContext: Resolve context name
        expandContext-->>transformTreeToIR: Repository IDs
        transformTreeToIR->>transformTreeToIR: Build repo_set query
    end
    transformTreeToIR-->>IROutput: QueryIR (And / Or / Not / PrefixExpr nodes)
```
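As a concrete illustration of the pipeline in the second diagram, a query like `file:test.ts content:bug` could map to an IR along these lines (the shape is illustrative only; the exact field names come from the generated Zoekt proto types):

```ts
const exampleIR: QueryIR = {
    and: {
        children: [
            // file:test.ts → a pattern scoped to file names
            { regexp: { regexp: 'test.ts', file_name: true } },
            // content:bug → a pattern scoped to file contents
            { substring: { pattern: 'bug', content: true } },
        ],
    },
};
```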
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
c137be2 to aad3507

@coderabbitai review

✅ Actions performed: Review triggered.
Actionable comments posted: 10
🧹 Nitpick comments (28)
packages/web/src/app/[domain]/components/pathHeader.tsx (1)
236-236: LGTM! Clean branch name display.

The regex transformation correctly strips Git's symbolic reference prefixes (`refs/heads/` or `refs/tags/`) to display user-friendly branch names. The implementation is safe and follows standard Git UI conventions.

Recommend extracting to a utility function; this pattern is already duplicated. Verification found the same transformation in `packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx` (line 39). Consider extracting to a shared utility to reduce duplication:

```ts
// In lib/utils.ts or similar
export const formatBranchDisplayName = (branchName: string): string => {
    return branchName.replace(/^refs\/(heads|tags)\//, '');
};
```

Update both locations:

- `packages/web/src/app/[domain]/components/pathHeader.tsx` (line 236)
- `packages/web/src/app/[domain]/repos/components/repoBranchesTable.tsx` (line 39)

packages/mcp/src/schemas.ts (1)
32-35: Consider validating the query field.

The `query` field accepts any string, including empty strings. Depending on how queries are processed, you may want to ensure non-empty queries or add appropriate validation. If empty queries should be rejected, apply this diff:

```diff
 export const searchRequestSchema = z.object({
-    query: z.string(), // The zoekt query to execute.
+    query: z.string().min(1), // The zoekt query to execute.
     ...searchOptionsSchema.shape,
 });
```

packages/queryLanguage/tsconfig.json (1)
21-21: Consider broadening the include pattern.

The current configuration only includes `src/index.ts`, which means TypeScript will only compile files transitively imported from the index. Generated parser files in `src/parser/` may not be fully type-checked if they contain exports that aren't used. Consider changing to:

```diff
-    "include": ["src/index.ts"],
+    "include": ["src/**/*.ts"],
```

This ensures all TypeScript files, including generated parser code, are properly type-checked during compilation.
packages/queryLanguage/src/query.grammar (1)
23-30: Negation currently cannot target bare terms (only prefixes/paren groups)

`NegateExpr` only accepts `negate` before `PrefixExpr` or `ParenExpr`, so something like `-foo` won't ever become a `NegateExpr` node; it'll only work if you always express exclusions as `-file:...`, `-repo:...`, or `-(...)`. If the intended UX is that `-foo` excludes results matching a plain term, consider broadening this rule:

```diff
-NegateExpr { !negate negate (PrefixExpr | ParenExpr) }
+NegateExpr { !negate negate expr }
```

This would allow negation of any `expr` (including bare `Term`) while still supporting prefixes and grouped subexpressions.

packages/queryLanguage/test/prefixes.txt (1)
1-335: Prefix test coverage is strong; consider a few more operator combinations

This suite hits a wide range of prefix/value shapes (short forms, wildcards, regexy patterns, invalid values), which is great. If you expect users to mix prefixes heavily with `or` and negation, it could be worth adding a couple of cases like:

- `file:test.js or repo:myproject`
- `-file:test.js lang:typescript`
- `(file:*.ts or file:*.tsx) lang:typescript`

to lock in parse shapes for those combinations.
packages/queryLanguage/test/basic.txt (1)
1-71: Solid basic coverage; a couple of extra lexical edge cases might help

These cases do a nice job exercising plain terms, adjacency, and regex-like patterns. If you want to harden the lexer further, you could optionally add inputs like:

- `order or or1` (ensure only standalone `or` becomes the operator)
- `-foo bar` (whatever behavior you intend around leading dashes)
- `hello_or_world` (to document how the grammar treats borderline operator/word collisions)
packages/queryLanguage/test.ts (1)
1-46: Turn this into an automated test and use the public parser entrypoint

The pretty‑print/reconstruction logic is fine for a quick sanity check, but as a committed file it's more useful if:
- It’s wired into your actual test runner (e.g., moved under
`test/` with an assertion instead of console logging).
- It imports the parser from the package's public surface (e.g.,
`./src/index` or the published module name) rather than the generated `./src/parser`, so it exercises what consumers will use. Doing that would turn this from a one‑off debug script into a stable regression test for the grammar, as sketched below.
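A minimal sketch of what that could look like, assuming vitest as the runner and a `parser` export from the package's index (both are assumptions; adjust to the real exports):

```ts
import { describe, expect, it } from 'vitest';
// Hypothetical public entrypoint; the generated parser stays an internal detail.
import { parser } from '../src/index';

describe('query grammar', () => {
    it('parses a prefix expression', () => {
        const tree = parser.parse('file:test.ts');
        // Assert on a stable serialization instead of console logging.
        expect(tree.toString()).toContain('PrefixExpr');
    });

    it('parses adjacent terms as an AndExpr', () => {
        const tree = parser.parse('hello world');
        expect(tree.toString()).toContain('AndExpr');
    });
});
```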
packages/queryLanguage/src/tokens.ts (1)
27-58: Remove duplicate comment.

Lines 28 and 32 contain identical comments. Remove the duplicate to improve code clarity.
Apply this diff:
```diff
 // Check if followed by a prefix keyword (by checking for keyword followed by colon)
-// Look ahead until we hit a delimiter or colon
 const checkPos = input.pos;
 let foundColon = false;
 // Look ahead until we hit a delimiter or colon
 while (ch >= 0) {
```
Consider clarifying EOF handling.
The lookahead loop condition `ch >= 0` (Line 33) correctly handles EOF (which returns a negative value in Lezer), but this behavior isn't documented. Consider adding a comment to make the EOF handling explicit. Example:

```diff
 // Look ahead until we hit a delimiter or colon
-while (ch >= 0) {
+while (ch >= 0) { // ch < 0 indicates EOF
```

packages/web/src/features/search/zoektSearcher.ts (6)
171-318: Streaming lifecycle mostly robust; consider explicit guards around controller usage

The SSE streaming implementation correctly:

- Tracks in-flight chunks (`pendingChunks`) and defers closing until both `isStreamActive` is false and `pendingChunks === 0`.
- Pauses/resumes the gRPC stream to avoid completing before async processing is done.
- Converts processing errors into typed `error` chunks rather than tearing down the stream abruptly.
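A minimal sketch of the drain-then-close guard this relies on (names taken from the review; the real implementation carries more state):

```ts
// Close the SSE controller only once the gRPC stream has ended AND all
// in-flight chunk processing has drained.
let isStreamActive = true;
let pendingChunks = 0;

const tryCloseController = (controller: ReadableStreamDefaultController) => {
    if (!isStreamActive && pendingChunks === 0) {
        controller.close();
    }
};
```

Two small robustness considerations: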
1. Post-error gRPC stream lifetime

In the per-chunk `catch`, `isStreamActive` is set to `false` and an error chunk is enqueued, but the underlying gRPC stream is neither cancelled nor closed explicitly. You do close the client via `tryCloseController`, so this is likely fine, but explicitly cancelling the stream can free resources sooner:

```diff
 } catch (error) {
     // ...
     isStreamActive = false;
+    grpcStream?.cancel();
 }
```

2. Cancel callback and finalization

The `cancel()` handler sets `isStreamActive = false` and calls `grpcStream.cancel()`/`client.close()`, but doesn't invoke `tryCloseController`. That's acceptable (consumer has cancelled), but if you want symmetric behavior (e.g., always emit a final "done" marker on server-side aborts), consider calling `tryCloseController()` once in-flight chunks drain.

Both are optional refinements; the current implementation should behave correctly for the common happy-path and error-path cases.

329-368: Per-chunk repo lookup and caching are solid; consider minor ergonomics

The repo resolution logic per chunk behaves well:

- Per-request cache (`_reposMapCache`) avoids redundant DB hits across chunks.
- Per-chunk `reposMap` limits lookups to just the repositories actually present in that response.
- Numeric vs string ids are handled via `getRepoIdForFile`.

Two optional improvements:

- You could avoid the extra `reposMap` Map and just reuse the shared cache for lookups, since you already filter by presence before transforming files, but the current separation is clear and not problematic.
- If Prisma throws (e.g., transient DB issues), that error will bubble into `zoektStreamSearch`'s per-chunk `catch`, which is good; combined with the fix suggested for `zoektSearch`, error propagation is consistent across both search modes.

No required changes here.

370-446: Strict "repo not found" behavior can hard-fail searches

In `transformZoektSearchResponse`, if a file's repository can't be found in the DB/cache, you throw:

```ts
if (!repo) {
    throw new Error(`Repository not found for file: ${file.file_name}`);
}
```
Consider a more forgiving behavior, e.g.:
- Skip such files and continue, or
- Emit them with a sentinel repo name/id and surface a warning elsewhere.
Example adjustment:
- if (!repo) { - throw new Error(`Repository not found for file: ${file.file_name}`); - } + if (!repo) { + // Skip files whose repositories are missing in the DB to avoid failing the entire search. + return undefined; + } ... - }).filter(file => file !== undefined); + }).filter((file): file is SearchResultFile => file !== undefined);This keeps the search resilient to transient data inconsistencies.
448-480: Double-check units and semantics for duration/wait andflushReasonThe stats mapping looks coherent, but relies on external Zoekt semantics:
duration,wait,matchTree*fields are populated from the.nanospart only. If Zoekt ever returns non-zeroseconds, these will under-report timings.flushReasonis set viaresponse.stats?.flush_reason?.toString()and then used as a string; this matches thez.string()type, but you might want to standardize on the enum values directly (no.toString()) for easier downstream handling.Both are minor and can be left as-is, but worth verifying against Zoekt’s current
SearchStatsdefinition and the behavior of your gRPC/proto loader.
544-572: Stats accumulation across stream chunks may overcount depending on Zoekt semantics
accumulateStatscurrently adds all numeric fields (includingtotalMatchCount,fileCount, duration fields, etc.) across chunks, and keeps the first non-unknownflushReason.This is correct only if Zoekt’s streamed
SearchStatsare per-chunk deltas. If Zoekt instead sends cumulative stats (“so far”) for each flush, summing will overcount and could:
- Inflate
totalMatchCount,fileCount, etc.- Distort
isSearchExhaustivein the streaming final response, which comparesaccumulatedStats.totalMatchCounttoaccumulatedStats.actualMatchCount.If Zoekt’s stats are cumulative, the accumulator should instead take the last stats object (or max per field) rather than summing.
Please confirm from the Zoekt docs or implementation whether
SearchStatsin streamed responses are deltas or cumulative, and adjustaccumulateStatsaccordingly.
513-541: gRPC client construction: URL parsing and path robustnessOverall the client setup is clear and matches the generated types. Two small robustness points:
Port extraction from
ZOEKT_WEBSERVER_URLIf the URL omits an explicit port,
zoektUrl.portwill be'', leading to an address likehostname:. You might want a fallback:
- const grpcAddress =
${zoektUrl.hostname}:${zoektUrl.port};
- const port = zoektUrl.port || (zoektUrl.protocol === 'https:' ? '443' : '80');
- const grpcAddress =
${zoektUrl.hostname}:${port};2. **Proto path based on `process.cwd()`** Using `process.cwd()` with `'../../vendor/zoekt/grpc/protos'` assumes a specific working-directory layout at runtime. If this ever runs from a different cwd (e.g., Next.js production server root), proto loading will fail. Consider anchoring paths via a config/env var, or document the assumption clearly. Both are defensive improvements; the current code is fine if your deployment environment guarantees cwd and URL shape. </blockquote></details> <details> <summary>packages/web/src/features/search/types.ts (1)</summary><blockquote> `33-56`: **SearchStats shape matches Zoekt stats mapping** The `searchStatsSchema` fields and comments mirror the Zoekt `SearchStats` fields you populate in `transformZoektSearchResponse`. That alignment is good. Optional improvement: if you want stronger validation, you could constrain these to non-negative integers, e.g.: ```ts actualMatchCount: z.number().int().nonnegative(),for the count fields and similar for the byte counters, while keeping simple
z.number()for timing values if they might be fractional in the future.packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx (1)
72-78: Cosmetic dependency array reordering.

The dependency array for
`branchDisplayName` was reordered from `[isBranchFilteringEnabled, branches]` to `[branches, isBranchFilteringEnabled]`. While this doesn't affect React's memoization behavior, it may be for consistency with a linting rule or code style.
8-38: Validate IR shape and regex escaping forrepository/fileNameThe
QueryIRshape here looks reasonable, but bothrepo.regexpand the fileregexp.regexpare being populated directly fromrepositoryandfileName. If these values can contain regex metacharacters (e.g.+,?,[], etc.), this will change the match semantics compared to a literal path/repo match and could even produce invalid patterns, depending on Zoekt’s expectations.If the intent is “exact repo” and “exact file path”:
- Consider normalizing both to anchored, escaped patterns, e.g.
^<escapedRepo>$/^<escapedPath>$, or- Use an IR helper that treats these as literal names rather than raw regexes (if such an abstraction exists in your new IR layer).
This is especially important for
getFileSource, which is effectively a “fetch exact file” path rather than a fuzzy search.Also applies to: 41-47
packages/web/src/app/[domain]/search/components/filterPanel/index.tsx (1)
15-20:onFilterChange/isStreamingintegration looks solidThe new props are plumbed cleanly:
15-20: `onFilterChange`/`isStreaming` integration looks solid

The new props are plumbed cleanly:
onEntryClickedblocks diverging over time, consider extracting the “toggle and sync URL + fireonFilterChange” logic into a small shared helper, but that’s optional.Also applies to: 36-38, 42-44, 137-187
packages/web/src/prisma.ts (1)
3-3: Repo permission filter extraction looks correct; consider saferwherecompositionThe refactor to accept
user?: UserWithAccountsand delegate repo scoping togetRepoPermissionFilterForUserkeeps the semantics clear: repo queries under the feature flag are constrained to repositories that are either permitted to one of the user’s accounts or public, with unauthenticated/no-account users seeing only public repos.One nuance to be aware of: in
repo.$allOperations, the new filter is merged via:argsWithWhere.where = { ...(argsWithWhere.where || {}), ...getRepoPermissionFilterForUser(user), };If callers ever provide their own
ORinargsWithWhere.where, this spread will overwrite it with the permission filter’sOR. Today that may be fine (or pre-existing), but if you expect more complex repo predicates in the future, it might be safer to wrap these in an explicitAND: [originalWhere, permissionFilter]instead.Not a blocker, but worth keeping in mind as you expand repo query usage.
Also applies to: 35-52, 61-85
packages/web/src/app/[domain]/components/searchBar/searchBar.tsx (1)
95-104: Consider syncing regex/case flags from updated defaults likequery
defaultIsRegexEnabled/defaultIsCaseSensitivityEnabledare only used as initial state; if the parent updatesdefaults(e.g. when navigating back/forward and rebuilding search params), the toggles won’t follow, whilequeryis explicitly resynced viauseEffect. If the intent is for the UI to always reflect the URL/props, consider adding similar effects for the flags:useEffect(() => { setIsRegexEnabled(defaultIsRegexEnabled); }, [defaultIsRegexEnabled]); useEffect(() => { setIsCaseSensitivityEnabled(defaultIsCaseSensitivityEnabled); }, [defaultIsCaseSensitivityEnabled]);Or confirm that flags are intentionally “sticky” across navigation and don’t need to follow
defaults.Also applies to: 113-115, 119-133
packages/web/src/app/[domain]/search/useStreamedSearch.ts (2)
9-35: Cache key & entry shape are tightly coupled to currentSearchRequestand omitstatsTwo small things to consider:
createCacheKeymanually lists a subset ofSearchRequestfields. IfSearchRequestlater grows with result-affecting fields (e.g. repo scope, domain, filters), the cache will start cross-contaminating queries unless this function is updated in lockstep. You might want to either:
- Base the key on the entire
SearchRequestobject (e.g.JSON.stringify(params)), or- Add a unit test that fails when
SearchRequestchanges without updatingcreateCacheKey.
`CacheEntry` doesn't store `stats`, and the cached `setState` path also omits `stats`, so repeated identical queries served from cache will never have `stats` populated even if the original streamed run did. If the UI relies on `stats`, consider adding it to `CacheEntry` and wiring it through in both the cache write and read paths.

Also applies to: 91-106, 233-245
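A minimal sketch of the whole-request key (assuming the `SearchRequest` type used by the hook; note that `JSON.stringify` is key-order sensitive, so the request object should be built consistently):

```ts
// Key the cache on the entire request so newly added result-affecting fields
// are automatically part of the key.
const createCacheKey = (params: SearchRequest): string => JSON.stringify(params);
```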
152-231: SSE parsing assumes single-line `data:` events and only breaks the inner loop on `[DONE]`

The SSE loop works for the current "`data: <JSON>\n\n` per event" shape, but it has a few assumptions:

- `message.match(/^data: (.+)$/)` only handles a single `data:` line per event and ignores any additional lines/fields. If the server ever emits multi-line `data:` payloads or other SSE fields, this parser will start dropping information.
- When encountering `[DONE]`, you `break` only the inner `for (const message of messages)` loop; the outer `while` continues reading until the stream naturally ends. If the server keeps the SSE connection open after sending `[DONE]`, this hook would never finalize/cache the search until the connection closes.

If you expect to rely on a `[DONE]` sentinel, consider:

- Tracking a `doneStreaming` flag that breaks the outer loop when `[DONE]` is seen, and
- Optionally making the parsing a bit more tolerant to additional `data:` lines if the backend ever evolves.
symbolNamewith\\b${symbolName}\\b, butsymbolNameis interpolated raw. If it can contain regex metacharacters (e.g.,.,*,+,?,[]), this will change the meaning of the pattern and could either over‑match or fail to match the intended symbol.If you want literal symbol lookup, consider escaping the name before interpolation, e.g.:
+const escapeRegex = (value: string) => + value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); - regexp: `\\b${symbolName}\\b`, + regexp: `\\b${escapeRegex(symbolName)}\\b`,If, instead, you intentionally rely on regex semantics here, it would be good to document that assumption and add tests for symbols containing regex‑special characters.
Also applies to: 74-88
116-140: Simplified response parsing is good; consider `map` instead of `flatMap` on `files`

The new `parseRelatedSymbolsSearchResponse` that directly maps `SearchResponse` → `FindRelatedSymbolsResponse` is clear, and the inner `chunks.flatMap(...matchRanges...)` with a final `.filter(file => file.matches.length > 0)` makes sense.

Minor nit: `searchResult.files.flatMap((file) => { return { ... } })` always returns a single object per file, so `flatMap` behaves like `map` here and is a bit surprising to readers. Switching to `map` would better express intent without behavior change:

```diff
-    files: searchResult.files.flatMap((file) => {
+    files: searchResult.files.map((file) => {
         // ...
-    }).filter((file) => file.matches.length > 0),
+    }).filter((file) => file.matches.length > 0),
```
143-181: Language IR filter expansion looks good; just confirm language values match Zoekt's language names

The `getExpandedLanguageFilter` helper that builds an `or` over TypeScript/JavaScript/JSX/TSX (and otherwise a single `language` node) is a nice improvement over string‑based filters. Please just ensure the `language` strings here exactly match the language identifiers emitted by your indexer/Zoekt configuration so the filters don't silently miss matches.
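For readers unfamiliar with the IR, the expansion described above has roughly this shape (illustrative only; field names follow the generated Zoekt proto types and may differ in detail):

```ts
const getExpandedLanguageFilter = (language: string): QueryIR => {
    const family = ['TypeScript', 'JavaScript', 'TSX', 'JSX'];
    // TS/JS files reference each other freely, so widen the filter to the family.
    if (family.includes(language)) {
        return {
            or: {
                children: family.map((lang) => ({ language: { language: lang } })),
            },
        };
    }
    return { language: { language } };
};
```

packages/web/src/features/search/parser.ts (1)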
126-203: Unknown AST node types defaulting to `const: true` can silently broaden queries

In `transformTreeToIR`'s `default` branch, unknown node types log a warning and return `{ const: true, query: "const" }`, effectively "match all". That's safe in the sense of not throwing, but if the grammar evolves and new nodes appear, this will silently broaden user queries instead of failing fast.

Consider failing closed by throwing (or converting to a parse error as per the previous comment), so unsupported syntax doesn't produce unexpectedly wide result sets.

packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (2)
packages/web/src/app/[domain]/search/components/searchResultsPage.tsx (2)
80-86: Factor out error message construction to keep UI and toast in syncBoth the toast and inline error message compute the same ternary expression for
ServiceErrorExceptionvs genericError. To keep things DRY and ensure they never diverge, you could precomputeerrorMessageonce:- useEffect(() => { - if (error) { - toast({ - description: `❌ Search failed. Reason: ${error instanceof ServiceErrorException ? error.serviceError.message : error.message}`, - }); - } - }, [error, toast]); + const errorMessage = + error instanceof ServiceErrorException + ? error.serviceError.message + : error?.message; + + useEffect(() => { + if (errorMessage) { + toast({ + description: `❌ Search failed. Reason: ${errorMessage}`, + }); + } + }, [errorMessage, toast]); @@ - <p className="text-sm text-center">{error instanceof ServiceErrorException ? error.serviceError.message : error.message}</p> + <p className="text-sm text-center">{errorMessage}</p>This keeps the error handling consistent and slightly simplifies the JSX.
Also applies to: 184-189
105-114: Consider gating `console.debug` timing logs to non‑production environments

The `console.debug` calls for `timeToFirstSearchResultMs` and `timeToSearchCompletionMs` are useful during development, but may add noise in production consoles. If you don't rely on them in prod, consider wrapping them in an environment check or removing them once metrics are fully wired into analytics:

```diff
-    console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
-    console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+    if (process.env.NODE_ENV !== 'production') {
+        console.debug('timeToFirstSearchResultMs:', timeToFirstSearchResultMs);
+        console.debug('timeToSearchCompletionMs:', timeToSearchCompletionMs);
+    }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`yarn.lock` is excluded by `!**/yarn.lock`, `!**/*.lock`
📒 Files selected for processing (107)
.env.development(0 hunks)CHANGELOG.md(1 hunks)Dockerfile(5 hunks)package.json(2 hunks)packages/backend/src/index.ts(0 hunks)packages/db/src/index.ts(1 hunks)packages/mcp/CHANGELOG.md(1 hunks)packages/mcp/src/index.ts(1 hunks)packages/mcp/src/schemas.ts(2 hunks)packages/queryLanguage/.gitignore(1 hunks)packages/queryLanguage/package.json(1 hunks)packages/queryLanguage/src/index.ts(1 hunks)packages/queryLanguage/src/parser.terms.ts(1 hunks)packages/queryLanguage/src/parser.ts(1 hunks)packages/queryLanguage/src/query.grammar(1 hunks)packages/queryLanguage/src/tokens.ts(1 hunks)packages/queryLanguage/test.ts(1 hunks)packages/queryLanguage/test/basic.txt(1 hunks)packages/queryLanguage/test/grammar.test.ts(1 hunks)packages/queryLanguage/test/grouping.txt(1 hunks)packages/queryLanguage/test/negation.txt(1 hunks)packages/queryLanguage/test/operators.txt(1 hunks)packages/queryLanguage/test/precedence.txt(1 hunks)packages/queryLanguage/test/prefixes.txt(1 hunks)packages/queryLanguage/test/quoted.txt(1 hunks)packages/queryLanguage/tsconfig.json(1 hunks)packages/queryLanguage/vitest.config.ts(1 hunks)packages/web/.eslintignore(1 hunks)packages/web/package.json(3 hunks)packages/web/src/actions.ts(1 hunks)packages/web/src/app/[domain]/browse/layout.tsx(1 hunks)packages/web/src/app/[domain]/components/lightweightCodeHighlighter.tsx(1 hunks)packages/web/src/app/[domain]/components/pathHeader.tsx(1 hunks)packages/web/src/app/[domain]/components/searchBar/constants.ts(1 hunks)packages/web/src/app/[domain]/components/searchBar/searchBar.tsx(5 hunks)packages/web/src/app/[domain]/components/searchBar/searchSuggestionsBox.tsx(4 hunks)packages/web/src/app/[domain]/components/searchBar/useRefineModeSuggestions.ts(1 hunks)packages/web/src/app/[domain]/components/searchBar/useSuggestionModeMappings.ts(1 hunks)packages/web/src/app/[domain]/components/searchBar/useSuggestionsData.ts(1 hunks)packages/web/src/app/[domain]/components/searchBar/zoektLanguageExtension.ts(1 hunks)packages/web/src/app/[domain]/search/components/codePreviewPanel/codePreview.tsx(1 hunks)packages/web/src/app/[domain]/search/components/codePreviewPanel/index.tsx(1 hunks)packages/web/src/app/[domain]/search/components/filterPanel/filter.tsx(3 hunks)packages/web/src/app/[domain]/search/components/filterPanel/index.tsx(5 hunks)packages/web/src/app/[domain]/search/components/filterPanel/useFilterMatches.ts(1 hunks)packages/web/src/app/[domain]/search/components/searchResultsPage.tsx(9 hunks)packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatch.tsx(1 hunks)packages/web/src/app/[domain]/search/components/searchResultsPanel/fileMatchContainer.tsx(2 hunks)packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx(7 hunks)packages/web/src/app/[domain]/search/page.tsx(2 hunks)packages/web/src/app/[domain]/search/useStreamedSearch.ts(1 hunks)packages/web/src/app/api/(client)/client.ts(1 hunks)packages/web/src/app/api/(server)/search/route.ts(2 hunks)packages/web/src/app/api/(server)/source/route.ts(1 hunks)packages/web/src/app/api/(server)/stream_search/route.ts(1 hunks)packages/web/src/ee/features/codeNav/components/exploreMenu/referenceList.tsx(1 hunks)packages/web/src/ee/features/codeNav/components/symbolHoverPopup/symbolDefinitionPreview.tsx(1 hunks)packages/web/src/ee/features/codeNav/components/symbolHoverPopup/useHoveredOverSymbolInfo.ts(1 hunks)packages/web/src/features/agents/review-agent/nodes/fetchFileContent.ts(1 hunks)packages/web/src/features/chat/tools.ts(5 
hunks)packages/web/src/features/codeNav/api.ts(4 hunks)packages/web/src/features/codeNav/types.ts(1 hunks)packages/web/src/features/search/README.md(1 hunks)packages/web/src/features/search/fileSourceApi.ts(1 hunks)packages/web/src/features/search/index.ts(1 hunks)packages/web/src/features/search/ir.ts(1 hunks)packages/web/src/features/search/parser.ts(1 hunks)packages/web/src/features/search/schemas.ts(0 hunks)packages/web/src/features/search/searchApi.ts(1 hunks)packages/web/src/features/search/types.ts(1 hunks)packages/web/src/features/search/zoektClient.ts(0 hunks)packages/web/src/features/search/zoektSchema.ts(0 hunks)packages/web/src/features/search/zoektSearcher.ts(1 hunks)packages/web/src/lib/errorCodes.ts(1 hunks)packages/web/src/lib/extensions/searchResultHighlightExtension.ts(1 hunks)packages/web/src/lib/posthogEvents.ts(2 hunks)packages/web/src/lib/types.ts(1 hunks)packages/web/src/lib/utils.ts(1 hunks)packages/web/src/prisma.ts(4 hunks)packages/web/src/proto/google/protobuf/Duration.ts(1 hunks)packages/web/src/proto/google/protobuf/Timestamp.ts(1 hunks)packages/web/src/proto/query.ts(1 hunks)packages/web/src/proto/webserver.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/And.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Boost.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Branch.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/BranchRepos.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/BranchesRepos.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/ChunkMatch.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/FileMatch.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/FileNameSet.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/FlushReason.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/IndexMetadata.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Language.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/LineFragmentMatch.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/LineMatch.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/ListOptions.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/ListRequest.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/ListResponse.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Location.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/MinimalRepoListEntry.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Not.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Or.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Progress.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Q.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/Range.ts(1 hunks)packages/web/src/proto/zoekt/webserver/v1/RawConfig.ts(1 hunks)
⛔ Files not processed due to max files limit (21)
- packages/web/src/proto/zoekt/webserver/v1/Regexp.ts
- packages/web/src/proto/zoekt/webserver/v1/Repo.ts
- packages/web/src/proto/zoekt/webserver/v1/RepoIds.ts
- packages/web/src/proto/zoekt/webserver/v1/RepoListEntry.ts
- packages/web/src/proto/zoekt/webserver/v1/RepoRegexp.ts
- packages/web/src/proto/zoekt/webserver/v1/RepoSet.ts
- packages/web/src/proto/zoekt/webserver/v1/RepoStats.ts
- packages/web/src/proto/zoekt/webserver/v1/Repository.ts
- packages/web/src/proto/zoekt/webserver/v1/RepositoryBranch.ts
- packages/web/src/proto/zoekt/webserver/v1/SearchOptions.ts
- packages/web/src/proto/zoekt/webserver/v1/SearchRequest.ts
- packages/web/src/proto/zoekt/webserver/v1/SearchResponse.ts
- packages/web/src/proto/zoekt/webserver/v1/Stats.ts
- packages/web/src/proto/zoekt/webserver/v1/StreamSearchRequest.ts
- packages/web/src/proto/zoekt/webserver/v1/StreamSearchResponse.ts
- packages/web/src/proto/zoekt/webserver/v1/Substring.ts
- packages/web/src/proto/zoekt/webserver/v1/Symbol.ts
- packages/web/src/proto/zoekt/webserver/v1/SymbolInfo.ts
- packages/web/src/proto/zoekt/webserver/v1/Type.ts
- packages/web/src/proto/zoekt/webserver/v1/WebserverService.ts
- packages/web/src/withAuthV2.ts
💤 Files with no reviewable changes (5)
- packages/backend/src/index.ts
- .env.development
- packages/web/src/features/search/zoektClient.ts
- packages/web/src/features/search/schemas.ts
- packages/web/src/features/search/zoektSchema.ts
🧰 Additional context used
🪛 Biome (2.1.2)
packages/web/src/features/search/ir.ts
[error] 4-4: Do not shadow the global "Symbol" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🪛 LanguageTool
packages/queryLanguage/test/negation.txt
[grammar] ~193-~193: Use a hyphen to join words.
Context: ...PrefixExpr(FileExpr))))) # Negate short form prefix -f:test.js ==> Program(Ne...
(QB_NEW_EN_HYPHEN)
[grammar] ~209-~209: Use a hyphen to join words.
Context: ...r(PrefixExpr(RepoExpr))) # Negate short form content -c:console ==> Program(N...
(QB_NEW_EN_HYPHEN)
packages/queryLanguage/test/basic.txt
[typographical] ~10-~10: This greeting should probably end with a comma.
Context: ... ==> Program(Term) # Multiple terms hello world ==> Program(AndExpr(Term,Term)) # Mu...
(EN_GREETING_WITHOUT_COMMA)
packages/queryLanguage/test/quoted.txt
[grammar] ~465-~465: Use a hyphen to join words.
Context: ...m,PrefixExpr(FileExpr))) # Quoted short form prefix "f:test" ==> Program(Term...
(QB_NEW_EN_HYPHEN)
packages/queryLanguage/test/operators.txt
[grammar] ~97-~97: Use a hyphen to join words.
Context: ...ixExpr(FileExpr),Term)) # OR with short form prefixes f:test.js or r:myrepo ==...
(QB_NEW_EN_HYPHEN)
```ts
export const searchOptionsSchema = z.object({
    matches: z.number(), // The number of matches to return.
    contextLines: z.number().optional(), // The number of context lines to return.
    whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
    isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
    isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
```
🛠️ Refactor suggestion | 🟠 Major
Add validation constraints to numeric fields.
The matches field accepts any number, including negative, zero, or unreasonably large values, which could lead to unexpected behavior or resource issues downstream. Similarly, contextLines lacks constraints when provided.
Apply this diff to add sensible constraints:

```diff
 export const searchOptionsSchema = z.object({
-    matches: z.number(), // The number of matches to return.
-    contextLines: z.number().optional(), // The number of context lines to return.
+    matches: z.number().int().positive(), // The number of matches to return.
+    contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
     whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
     isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
     isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
 });
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
export const searchOptionsSchema = z.object({
    matches: z.number().int().positive(), // The number of matches to return.
    contextLines: z.number().int().nonnegative().optional(), // The number of context lines to return.
    whole: z.boolean().optional(), // Whether to return the whole file as part of the response.
    isRegexEnabled: z.boolean().optional(), // Whether to enable regular expression search.
    isCaseSensitivityEnabled: z.boolean().optional(), // Whether to enable case sensitivity.
});
```
🤖 Prompt for AI Agents
In packages/mcp/src/schemas.ts around lines 24 to 30, the numeric fields lack
validation constraints: `matches` currently allows negative, zero, or huge
numbers and `contextLines` has no bounds when present. Change `matches` to an
integer with a sensible lower and upper bound (e.g., .int().min(1).max(100)) and
change `contextLines` to an optional integer with a non-negative upper bound
(e.g., .optional().int().min(0).max(50)); keep other fields unchanged and ensure
the schema rejects non-integers and out-of-range values.
| "start": "next start", | ||
| "lint": "cross-env SKIP_ENV_VALIDATION=1 eslint .", | ||
| "test": "cross-env SKIP_ENV_VALIDATION=1 vitest", | ||
| "generate:protos": "proto-loader-gen-types --includeComments --longs=Number --enums=String --defaults --oneofs --grpcLib=@grpc/grpc-js --keepCase --includeDirs=../../vendor/zoekt/grpc/protos --outDir=src/proto zoekt/webserver/v1/webserver.proto zoekt/webserver/v1/query.proto", |
🧩 Analysis chain
Verify proto file paths and script correctness.
The generate:protos script references proto files at ../../vendor/zoekt/grpc/protos. Ensure these files exist at:
- vendor/zoekt/grpc/protos/zoekt/webserver/v1/webserver.proto
- vendor/zoekt/grpc/protos/zoekt/webserver/v1/query.proto
The script itself appears well-configured with appropriate flags for TypeScript generation.
🏁 Script executed:

```shell
#!/bin/bash
# Verify proto files exist at the expected paths
echo "Checking for proto files..."
fd -t f "webserver.proto|query.proto" vendor/zoekt/grpc/protos/
```

Length of output: 140
🏁 Script executed:

```shell
#!/bin/bash
# Search for proto files anywhere in the repository
echo "=== Searching for proto files in entire repo ==="
find . -name "webserver.proto" -o -name "query.proto" 2>/dev/null
echo ""
echo "=== Checking if vendor directory exists ==="
ls -la vendor/ 2>&1 | head -20
echo ""
echo "=== Checking if zoekt directory exists anywhere ==="
find . -type d -name "zoekt" 2>/dev/null | head -20
```

Length of output: 358
🏁 Script executed:

```shell
#!/bin/bash
# Check vendor/zoekt structure
echo "=== Contents of vendor/zoekt ==="
find vendor/zoekt -type f 2>/dev/null | head -30
echo ""
echo "=== Contents of vendor/zoekt (tree view) ==="
ls -laR vendor/zoekt/ 2>&1 | head -50
echo ""
echo "=== Check .gitignore for proto file patterns ==="
rg -i "proto|vendor" .gitignore 2>/dev/null
```

Length of output: 250
Proto files referenced in the script do not exist in the repository.
The generate:protos script references source proto files that are missing:
- Expected path: vendor/zoekt/grpc/protos/zoekt/webserver/v1/
- Expected files: webserver.proto, query.proto
- Actual state: The vendor/zoekt/ directory exists but is empty; proto files are not present anywhere in the repository
This will cause the script to fail when executed. Either:
- Add the missing proto files to vendor/zoekt/grpc/protos/zoekt/webserver/v1/
- Update the script to reference the correct proto file location
- Remove the script if proto generation is not needed for this PR
🤖 Prompt for AI Agents
In packages/web/package.json around line 11, the "generate:protos" npm script
points at proto files that are not present in the repo
(vendor/zoekt/grpc/protos/zoekt/webserver/v1/{webserver.proto,query.proto}); fix
by either (A) adding the missing proto files to
vendor/zoekt/grpc/protos/zoekt/webserver/v1/ in this PR, (B) updating the script
to point to the actual proto location in the repo (or an external fetch step) so
the referenced files exist at runtime, or (C) remove the generate:protos script
entirely if proto generation is not required for this change — ensure whichever
option you pick is reflected in package.json and any CI scripts that call it.
```diff
 const {
     scrollOffset: restoreOffset,
     measurementsCache: restoreMeasurementsCache,
-    showAllMatchesStates: restoreShowAllMatchesStates,
+    showAllMatchesMap: restoreShowAllMatchesStates,
 } = history.state as ScrollHistoryState;

-const [showAllMatchesStates, setShowAllMatchesStates] = useState(restoreShowAllMatchesStates || Array(fileMatches.length).fill(false));
+const [showAllMatchesMap, showAllMatchesActions] = useMap<string, boolean>(restoreShowAllMatchesStates || []);
```
Guard against history.state being null on initial load
`const { scrollOffset, measurementsCache, showAllMatchesMap } = history.state as ScrollHistoryState;` will throw at runtime when `history.state` is null/undefined (which is the default before any `replaceState`), since the destructuring happens unguarded.

Recommend defaulting to an empty object before casting so the initial render is safe:

```diff
 const {
     scrollOffset: restoreOffset,
     measurementsCache: restoreMeasurementsCache,
     showAllMatchesMap: restoreShowAllMatchesStates,
-} = history.state as ScrollHistoryState;
+} = (history.state ?? {}) as ScrollHistoryState;
```

🤖 Prompt for AI Agents
In packages/web/src/app/[domain]/search/components/searchResultsPanel/index.tsx
around lines 55 to 62, the code destructures history.state directly which will
throw when history.state is null/undefined on initial load; change the
destructuring to use a safe default object before casting (e.g., use
(history.state ?? {}) as ScrollHistoryState) so
scrollOffset/measurementsCache/showAllMatchesMap default to undefined rather
than causing a runtime error, and keep passing restoreShowAllMatchesStates || []
into useMap to preserve the existing fallback.
```ts
import { Q as QueryIR } from '@/proto/zoekt/webserver/v1/Q';
import { RawConfig } from '@/proto/zoekt/webserver/v1/RawConfig';
import { Regexp } from '@/proto/zoekt/webserver/v1/Regexp';
import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
import { Language } from '@/proto/zoekt/webserver/v1/Language';
import { Repo } from '@/proto/zoekt/webserver/v1/Repo';
import { RepoRegexp } from '@/proto/zoekt/webserver/v1/RepoRegexp';
import { BranchesRepos } from '@/proto/zoekt/webserver/v1/BranchesRepos';
```
Rename imported Symbol to avoid shadowing the global Symbol (lint error)
The import `import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';` shadows the built‑in global `Symbol`, which static analysis correctly flags. While it works at runtime, it's confusing and will keep lint failing.
You can fix this by renaming the import and updating the isSymbolQuery type guard:
```diff
-import { Symbol } from '@/proto/zoekt/webserver/v1/Symbol';
+import { Symbol as ZoektSymbol } from '@/proto/zoekt/webserver/v1/Symbol';
@@
-export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: Symbol } => query.symbol != null;
+export const isSymbolQuery = (query: QueryIR): query is QueryIR & { symbol: ZoektSymbol } => query.symbol != null;
```

Also applies to: 27-28
🧰 Tools
🪛 Biome (2.1.2)
[error] 4-4: Do not shadow the global "Symbol" property.
Consider renaming this variable. It's easy to confuse the origin of variables when they're named after a known global.
(lint/suspicious/noShadowRestrictedNames)
🤖 Prompt for AI Agents
In packages/web/src/features/search/ir.ts around lines 1 to 8 (and also apply
changes at lines 27-28), the imported name "Symbol" shadows the global Symbol
causing lint errors; rename the import (e.g., import { Symbol as ProtoSymbol }
from '@/proto/zoekt/webserver/v1/Symbol') and update all uses and the
isSymbolQuery type guard to refer to the renamed identifier (ProtoSymbol) so the
global Symbol is not shadowed and the type guard compiles/lints cleanly.
```ts
export const parseQuerySyntaxIntoIR = async ({
    query,
    options,
    prisma,
}: {
    query: string,
    options: {
        isCaseSensitivityEnabled?: boolean;
        isRegexEnabled?: boolean;
    },
    prisma: PrismaClient,
}): Promise<QueryIR> => {
    try {
        // First parse the query into a Lezer tree.
        const tree = parser.parse(query);

        // Then transform the tree into the intermediate representation.
        return transformTreeToIR({
            tree,
            input: query,
            isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
            isRegexEnabled: options.isRegexEnabled ?? false,
            onExpandSearchContext: async (contextName: string) => {
                const context = await prisma.searchContext.findUnique({
                    where: {
                        name_orgId: {
                            name: contextName,
                            orgId: SINGLE_TENANT_ORG_ID,
                        }
                    },
                    include: {
                        repos: true,
                    }
                });

                if (!context) {
                    throw new Error(`Search context "${contextName}" not found`);
                }

                return context.repos.map((repo) => repo.name);
            },
        });
    } catch (error) {
        if (error instanceof SyntaxError) {
            throw new ServiceErrorException({
                statusCode: StatusCodes.BAD_REQUEST,
                errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
                message: `Failed to parse query "${query}" with message: ${error.message}`,
            });
        }
        throw error;
    }
```
Normalize all parse/validation failures to a consistent ServiceErrorException
parseQuerySyntaxIntoIR only maps SyntaxError to ServiceErrorException, but transformTreeToIR and transformPrefixExpr throw plain Error for many user-facing validation issues (e.g., invalid visibility:, archived:, fork: values, missing colons, unknown prefixes, missing search contexts). Those will currently bubble out as generic 500s instead of a structured BAD_REQUEST with FAILED_TO_PARSE_QUERY.
Consider centralizing the mapping here so any non-ServiceErrorException from parsing/transformation yields a consistent 400:

```diff
-    } catch (error) {
-        if (error instanceof SyntaxError) {
-            throw new ServiceErrorException({
-                statusCode: StatusCodes.BAD_REQUEST,
-                errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
-                message: `Failed to parse query "${query}" with message: ${error.message}`,
-            });
-        }
-        throw error;
-    }
+    } catch (error) {
+        if (error instanceof ServiceErrorException) {
+            // Already normalized elsewhere; just propagate.
+            throw error;
+        }
+
+        const message = error instanceof Error ? error.message : String(error);
+        throw new ServiceErrorException({
+            statusCode: StatusCodes.BAD_REQUEST,
+            errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
+            message: `Failed to parse query "${query}" with message: ${message}`,
+        });
+    }
```

Optionally, if you still want to distinguish "internal" errors, you could gate this on a dedicated `QueryParseError` subclass and throw that from the various validation branches.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
export const parseQuerySyntaxIntoIR = async ({
    query,
    options,
    prisma,
}: {
    query: string,
    options: {
        isCaseSensitivityEnabled?: boolean;
        isRegexEnabled?: boolean;
    },
    prisma: PrismaClient,
}): Promise<QueryIR> => {
    try {
        // First parse the query into a Lezer tree.
        const tree = parser.parse(query);

        // Then transform the tree into the intermediate representation.
        return transformTreeToIR({
            tree,
            input: query,
            isCaseSensitivityEnabled: options.isCaseSensitivityEnabled ?? false,
            isRegexEnabled: options.isRegexEnabled ?? false,
            onExpandSearchContext: async (contextName: string) => {
                const context = await prisma.searchContext.findUnique({
                    where: {
                        name_orgId: {
                            name: contextName,
                            orgId: SINGLE_TENANT_ORG_ID,
                        }
                    },
                    include: {
                        repos: true,
                    }
                });

                if (!context) {
                    throw new Error(`Search context "${contextName}" not found`);
                }

                return context.repos.map((repo) => repo.name);
            },
        });
    } catch (error) {
        if (error instanceof ServiceErrorException) {
            // Already normalized elsewhere; just propagate.
            throw error;
        }

        const message = error instanceof Error ? error.message : String(error);
        throw new ServiceErrorException({
            statusCode: StatusCodes.BAD_REQUEST,
            errorCode: ErrorCode.FAILED_TO_PARSE_QUERY,
            message: `Failed to parse query "${query}" with message: ${message}`,
        });
    }
}
```
🤖 Prompt for AI Agents
packages/web/src/features/search/parser.ts around lines 55 to 107:
parsing/validation errors other than SyntaxError (e.g., plain Error from
transformTreeToIR/transformPrefixExpr) are allowed to bubble out as generic
500s; wrap any non-ServiceErrorException thrown during parsing/transform into a
ServiceErrorException with status BAD_REQUEST and
ErrorCode.FAILED_TO_PARSE_QUERY, preserving the original error message (and
optionally stack) so user-facing validation errors return a consistent 400, and
rethrow ServiceErrorException unchanged.
```ts
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);

if (!searchResponse.ok) {
    return invalidZoektResponse(searchResponse);
}

const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
    // @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
    // which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
    // metadata when transforming (potentially thousands) of file matches, we aggregate a unique
    // set of repository ids* and map them to their corresponding Repo record.
    //
    // *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
    // A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
    // always undefined. To make this a non-breaking change, we fallback to using the repository's name
    // (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
    // practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
    // 'gitea.com/org/repo', etc.).
    //
    // Note: When a repository is re-indexed (every hour) this ID will be populated.
    // @see: https://github.com/sourcebot-dev/zoekt/pull/6
    const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
    const repos = new Map<string | number, Repo>();

    (await prisma.repo.findMany({
        where: {
            id: {
                in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
            },
            orgId: org.id,
        }
    })).forEach(repo => repos.set(repo.id, repo));

    (await prisma.repo.findMany({
        where: {
            name: {
                in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
            },
            orgId: org.id,
        }
    })).forEach(repo => repos.set(repo.name, repo));

    const files = Result.Files?.map((file) => {
        const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);

        const webUrl = (() => {
            const template: string | undefined = Result.RepoURLs[file.Repository];
            if (!template) {
                return undefined;
            }

            // If there are multiple branches pointing to the same revision of this file, it doesn't
            // matter which branch we use here, so use the first one.
            const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
            return getFileWebUrl(template, branch, file.FileName);
        })();

        const identifier = file.RepositoryID ?? file.Repository;
        const repo = repos.get(identifier);

    return zoektSearch(zoektSearchRequest, prisma);
```
🧩 Analysis chain
Avoid logging full Zoekt requests in production for privacy and performance
`logger.debug(JSON.stringify(zoektSearchRequest, null, 2))` in both `search` and `streamSearch` will emit the full Zoekt request, including user search text and repo names. This can:

- Leak potentially sensitive search terms and repo metadata into logs.
- Produce very large log entries for complex queries.

Consider either:

- Redacting/summarizing the request (e.g., log only top-level flags and counts), or
- Gating full-request logging behind a dedicated debug flag or non-production check.

```diff
- logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+ if (env.NODE_ENV !== 'production') {
+     logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+ }
```

(and similarly for `zoektStreamSearchRequest`), or introduce a more targeted debug switch if available.
Also applies to: 67-69
---
🏁 Script executed:
```shell
#!/bin/bash
# Inspect the searchApi.ts file to verify the logger.debug calls and surrounding context
cat packages/web/src/features/search/searchApi.ts | head -100
```

Length of output: 3235
🏁 Script executed:
```shell
#!/bin/bash
# Search for other debug logging patterns in the codebase to understand conventions
rg -nP 'logger\.(debug|log)\(' packages/web/src --type=ts -C2 -m 20
```

Length of output: 9804
🏁 Script executed:
```shell
#!/bin/bash
# Check for environment variable checks (NODE_ENV) in the codebase
rg -nP 'NODE_ENV|env\.NODE_ENV' packages/web/src --type=ts -C2
```

Length of output: 237
🏁 Script executed:
```shell
#!/bin/bash
# Check for env imports/exports in the file or nearby files
rg -nP 'import.*env|export.*env' packages/web/src/features/search --type=ts
```

Length of output: 226
Wrap debug logging to prevent sensitive search data and large log entries in production
The logger.debug() calls at lines 45 and 67 serialize the full zoektSearchRequest objects, which contain user search terms and repository metadata. In production, this:
- Leaks potentially sensitive search queries and repo names into logs
- Creates unnecessarily large log entries for complex queries
Since env is already imported in the file, wrap both debug statements with an environment check:
```diff
- logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+ if (env.NODE_ENV !== 'production') {
+     logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
+ }
```
Apply the same fix to line 67 (`zoektStreamSearchRequest`).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
if (!searchResponse.ok) {
    return invalidZoektResponse(searchResponse);
}
const transformZoektSearchResponse = async ({ Result }: ZoektSearchResponse) => {
    // @note (2025-05-12): in zoekt, repositories are identified by the `RepositoryID` field
    // which corresponds to the `id` in the Repo table. In order to efficiently fetch repository
    // metadata when transforming (potentially thousands) of file matches, we aggregate a unique
    // set of repository ids* and map them to their corresponding Repo record.
    //
    // *Q: Why is `RepositoryID` optional? And why are we falling back to `Repository`?
    // A: Prior to this change, the repository id was not plumbed into zoekt, so RepositoryID was
    // always undefined. To make this a non-breaking change, we fallback to using the repository's name
    // (`Repository`) as the identifier in these cases. This is not guaranteed to be unique, but in
    // practice it is since the repository name includes the host and path (e.g., 'github.com/org/repo',
    // 'gitea.com/org/repo', etc.).
    //
    // Note: When a repository is re-indexed (every hour) this ID will be populated.
    // @see: https://github.com/sourcebot-dev/zoekt/pull/6
    const repoIdentifiers = new Set(Result.Files?.map((file) => file.RepositoryID ?? file.Repository) ?? []);
    const repos = new Map<string | number, Repo>();
    (await prisma.repo.findMany({
        where: {
            id: {
                in: Array.from(repoIdentifiers).filter((id) => typeof id === "number"),
            },
            orgId: org.id,
        }
    })).forEach(repo => repos.set(repo.id, repo));
    (await prisma.repo.findMany({
        where: {
            name: {
                in: Array.from(repoIdentifiers).filter((id) => typeof id === "string"),
            },
            orgId: org.id,
        }
    })).forEach(repo => repos.set(repo.name, repo));
    const files = Result.Files?.map((file) => {
        const fileNameChunks = file.ChunkMatches.filter((chunk) => chunk.FileName);
        const webUrl = (() => {
            const template: string | undefined = Result.RepoURLs[file.Repository];
            if (!template) {
                return undefined;
            }
            // If there are multiple branches pointing to the same revision of this file, it doesn't
            // matter which branch we use here, so use the first one.
            const branch = file.Branches && file.Branches.length > 0 ? file.Branches[0] : "HEAD";
            return getFileWebUrl(template, branch, file.FileName);
        })();
        const identifier = file.RepositoryID ?? file.Repository;
        const repo = repos.get(identifier);

return zoektSearch(zoektSearchRequest, prisma);
if (env.NODE_ENV !== 'production') {
    logger.debug(`zoektSearchRequest:\n${JSON.stringify(zoektSearchRequest, null, 2)}`);
}
return zoektSearch(zoektSearchRequest, prisma);
```
🤖 Prompt for AI Agents
In packages/web/src/features/search/searchApi.ts around lines 45-47 (and
additionally apply to line 67), the code currently does
logger.debug(JSON.stringify(zoektSearchRequest,...)) which can leak sensitive
search terms and create huge logs; guard these debug calls with the existing env
check (only log when not in production, e.g. if (!env.isProduction /
env.NODE_ENV !== 'production')), and apply the same guard to the logger.debug
for zoektStreamSearchRequest at line 67; if desired, replace full JSON.stringify
with a redacted or minimal representation (search term and repo count) inside
the guarded block.
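As a concrete illustration of the redaction option, here is a minimal standalone sketch of a summary logger. The `summarizeZoektRequest` helper and the exact request shape are assumptions for illustration, not code from the PR, and the field names should be checked against the real `ZoektGrpcSearchRequest` type:

```ts
// Illustrative sketch (not from the PR): emit only coarse metadata about a search
// request instead of serializing the full payload with user search text.
interface ZoektRequestLike {
    opts?: {
        max_match_display_count?: number;
        total_max_match_count?: number;
    };
}

const summarizeZoektRequest = (request: ZoektRequestLike) => ({
    maxMatchDisplayCount: request.opts?.max_match_display_count,
    totalMaxMatchCount: request.opts?.total_max_match_count,
});

// In searchApi.ts this would go through `logger.debug`; console.debug keeps the
// sketch self-contained and runnable.
console.debug(`zoektSearchRequest summary: ${JSON.stringify(summarizeZoektRequest({ opts: { max_match_display_count: 100 } }))}`);
```

Summaries like this stay small regardless of query complexity, so they are safe to emit in any environment.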
```ts
opts: {
    chunk_matches: true,
    // @note: Zoekt has several different ways to limit a given search. The two that
    // we care about are `MaxMatchDisplayCount` and `TotalMaxMatchCount`:
    //
    // - `MaxMatchDisplayCount` truncates the number of matches AFTER performing
    //   a search (specifically, after collating and sorting the results). The number of
    //   results returned by the API will be less than or equal to this value.
    //
    // - `TotalMaxMatchCount` truncates the number of matches DURING a search. The results
    //   returned by the API can be less than, equal to, or greater than this value.
    //   Why greater? Because this value is compared _after_ a given shard has finished
    //   being processed, the number of matches returned by the last shard may have exceeded
    //   this value.
    //
    // Let's define two variables:
    // - `actualMatchCount`: The number of matches that are returned by the API. This is
    //   always less than or equal to `MaxMatchDisplayCount`.
    // - `totalMatchCount`: The number of matches that zoekt found before it either
    //   1) found all matches or 2) hit the `TotalMaxMatchCount` limit. This number is
    //   not bounded and can be less than, equal to, or greater than both `TotalMaxMatchCount`
    //   and `MaxMatchDisplayCount`.
    //
    // Our challenge is to determine whether or not the search returned all possible matches
    // (it was exhaustive) or if it was truncated. By setting the `TotalMaxMatchCount` to
    // `MaxMatchDisplayCount + 1`, we can determine which of these occurred by comparing
    // `totalMatchCount` to `MaxMatchDisplayCount`.
    //
    // if (totalMatchCount ≤ actualMatchCount):
    //     Search is EXHAUSTIVE (found all possible matches)
    //     Proof: totalMatchCount ≤ MaxMatchDisplayCount < TotalMaxMatchCount
    //     Therefore Zoekt stopped naturally, not due to limit
    //
    // if (totalMatchCount > actualMatchCount):
    //     Search is TRUNCATED (more matches exist)
    //     Proof: totalMatchCount > MaxMatchDisplayCount + 1 = TotalMaxMatchCount
    //     Therefore Zoekt hit the limit and stopped searching
    max_match_display_count: options.matches,
    total_max_match_count: options.matches + 1,
    num_context_lines: options.contextLines ?? 0,
```
isSearchExhaustive condition is inverted in non-streaming search
The explanatory comment says exhaustiveness should be determined by comparing `totalMatchCount` to `actualMatchCount`:
- Exhaustive when `totalMatchCount <= actualMatchCount`
- Truncated when `totalMatchCount > actualMatchCount`

In zoektSearch, the check is currently:
```ts
isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
```
This inequality will almost always be true (returned matches can’t exceed total matches), so truncated searches will be incorrectly reported as exhaustive. The streaming final response uses the correct direction:
```ts
isSearchExhaustive: accumulatedStats.totalMatchCount <= accumulatedStats.actualMatchCount,
```
I recommend flipping the condition in zoektSearch to align with the documented logic and the streaming path:
```diff
- resolve({
-     stats,
-     files,
-     repositoryInfo,
-     isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
- } satisfies SearchResponse);
+ resolve({
+     stats,
+     files,
+     repositoryInfo,
+     isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+ } satisfies SearchResponse);
```
Also applies to: 136-137
🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 66-106 (and
also fix lines 136-137), the isSearchExhaustive comparison is inverted: change
the boolean to reflect "exhaustive when totalMatchCount <= actualMatchCount"
(i.e. return totalMatchCount <= actualMatchCount instead of actualMatchCount <=
totalMatchCount) so the non-streaming path matches the documented logic and the
streaming path; update both occurrences accordingly and run tests.
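To make the documented rule concrete, a small standalone sketch with example numbers (illustrative only, not code from the PR):

```ts
// Exhaustive iff zoekt stopped naturally, i.e. totalMatchCount <= actualMatchCount.
const isSearchExhaustive = (totalMatchCount: number, actualMatchCount: number): boolean =>
    totalMatchCount <= actualMatchCount;

// Suppose max_match_display_count = 100, so total_max_match_count = 101:
console.log(isSearchExhaustive(42, 42));   // true:  zoekt found 42 matches total and all 42 were returned
console.log(isSearchExhaustive(150, 100)); // false: zoekt hit the limit; more matches may exist
```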
```ts
export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
    const client = createGrpcClient();
    const metadata = new grpc.Metadata();

    return new Promise((resolve, reject) => {
        client.Search(searchRequest, metadata, async (error, response) => {
            if (error || !response) {
                reject(error || new Error('No response received'));
                return;
            }

            const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
            const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);

            resolve({
                stats,
                files,
                repositoryInfo,
                isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
            } satisfies SearchResponse);
        });
    });
```
Errors in async callback can leave zoektSearch’s Promise hanging
The client.Search callback is marked async and uses await, but thrown/rejected errors inside that callback (e.g. from Prisma in createReposMapForChunk or from transformZoektSearchResponse) won’t be wired to the outer Promise unless you manually reject them. In the current form, such failures can result in a Promise that never resolves or rejects.
Refactor the callback to keep it synchronous and move the async work into an IIFE with explicit try/catch:
```diff
 export const zoektSearch = async (searchRequest: ZoektGrpcSearchRequest, prisma: PrismaClient): Promise<SearchResponse> => {
     const client = createGrpcClient();
     const metadata = new grpc.Metadata();
     return new Promise((resolve, reject) => {
-        client.Search(searchRequest, metadata, async (error, response) => {
-            if (error || !response) {
-                reject(error || new Error('No response received'));
-                return;
-            }
-
-            const reposMapCache = await createReposMapForChunk(response, new Map<string | number, Repo>(), prisma);
-            const { stats, files, repositoryInfo } = await transformZoektSearchResponse(response, reposMapCache);
-
-            resolve({
-                stats,
-                files,
-                repositoryInfo,
-                isSearchExhaustive: stats.actualMatchCount <= stats.totalMatchCount,
-            } satisfies SearchResponse);
-        });
+        client.Search(searchRequest, metadata, (error, response) => {
+            if (error || !response) {
+                reject(error || new Error('No response received'));
+                return;
+            }
+
+            (async () => {
+                try {
+                    const reposMapCache = await createReposMapForChunk(
+                        response,
+                        new Map<string | number, Repo>(),
+                        prisma,
+                    );
+                    const { stats, files, repositoryInfo } = await transformZoektSearchResponse(
+                        response,
+                        reposMapCache,
+                    );
+
+                    resolve({
+                        stats,
+                        files,
+                        repositoryInfo,
+                        isSearchExhaustive: stats.totalMatchCount <= stats.actualMatchCount,
+                    } satisfies SearchResponse);
+                } catch (err) {
+                    reject(err);
+                }
+            })();
+        });
     });
 }
```
This ensures all downstream failures either resolve or reject the Promise instead of silently leaking.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In packages/web/src/features/search/zoektSearcher.ts around lines 118 to 139,
the grpc callback is declared async so any thrown/rejected errors from awaited
calls (createReposMapForChunk, transformZoektSearchResponse) are not propagated
to the outer Promise and can leave it hanging; change the client.Search callback
to a synchronous function and immediately invoke an async IIFE inside it that
wraps the awaited work in try/catch, calling resolve(...) on success and
reject(err) on any thrown/rejected error, and ensure the original
error/no-response branch still rejects and returns so the Promise is always
settled.
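The same pattern generalizes to any Node-style callback API mixed with async post-processing. A minimal standalone sketch (helper names are illustrative, not from the PR):

```ts
// Generic helper: settle a Promise from a callback-style call while doing async
// post-processing, without letting thrown errors escape the callback unhandled.
const callWithAsyncPostProcessing = <TRaw, TOut>(
    call: (cb: (err: Error | null, raw?: TRaw) => void) => void,
    postProcess: (raw: TRaw) => Promise<TOut>,
): Promise<TOut> =>
    new Promise((resolve, reject) => {
        call((err, raw) => {
            if (err || raw === undefined) {
                reject(err ?? new Error('No response received'));
                return;
            }
            // async IIFE so rejections inside postProcess settle the outer Promise
            (async () => {
                try {
                    resolve(await postProcess(raw));
                } catch (e) {
                    reject(e);
                }
            })();
        });
    });
```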
```ts
export const getCodeHostBrowseFileAtBranchUrl = ({
    webUrl,
    codeHostType,
    branchName,
    filePath,
}: {
    webUrl?: string | null,
    codeHostType: CodeHostType,
    branchName: string,
    filePath: string,
}) => {
    if (!webUrl) {
        return undefined;
    }

    switch (codeHostType) {
        case 'github':
            return `${webUrl}/blob/${branchName}/${filePath}`;
        case 'gitlab':
            return `${webUrl}/-/blob/${branchName}/${filePath}`;
        case 'gitea':
            return `${webUrl}/src/branch/${branchName}/${filePath}`;
        case 'azuredevops':
            return `${webUrl}?path=${filePath}&version=${branchName}`;
        case 'bitbucketCloud':
            return `${webUrl}/src/${branchName}/${filePath}`;
        case 'bitbucketServer':
            return `${webUrl}/browse/${filePath}?at=${branchName}`;
        case 'gerrit':
            return `${webUrl}/+/${branchName}/${filePath}`;
        case 'genericGitHost':
            return undefined;
    }
}
```
🧩 Analysis chain
Verify URL patterns for code host file browsing.
The new function follows a consistent pattern with `getCodeHostBrowseAtBranchUrl`. However, please verify:
- Azure DevOps URL format: The pattern `?path={file}&version={branch}` differs from the branch-only pattern. Confirm this matches Azure DevOps's file browsing URL structure.
- File path encoding: Consider whether `filePath` should be URL-encoded, especially for paths containing spaces or special characters (e.g., `encodeURIComponent(filePath)` or proper path segment encoding).

Test with a few sample file paths containing spaces or special characters to ensure the URLs work correctly across different code hosts.
🏁 Script executed:
```shell
#!/bin/bash
# Check if Azure DevOps and other code hosts handle file paths with special characters correctly
# Note: This is a manual verification prompt - actual testing should be done in a browser
echo "Sample file paths to test:"
echo "1. src/components/Button.tsx"
echo "2. src/utils/file name with spaces.ts"
echo "3. src/test/special-chars-#@!.js"
echo ""
echo "Verify these URL patterns work:"
echo "- Azure DevOps: {webUrl}?path={file}&version={branch}"
echo "- GitHub: {webUrl}/blob/{branch}/{file}"
echo "- GitLab: {webUrl}/-/blob/{branch}/{file}"
```

Length of output: 295
I need to verify the concerns raised in the review comment. Let me search for information about the correct URL formats for these code hosting platforms and best practices for URL encoding file paths.
Now let me search for more specific information about how different code hosts handle file paths in URLs, and check if there are URL encoding utilities in the codebase.
Now let me examine the codebase to see if URL encoding is already being handled or if this is indeed a missing concern.
I'll also search for any URL encoding utilities in the codebase:
🏁 Script executed:
```shell
#!/bin/bash
# Search for the function and its usage context
rg -n "getCodeHostBrowseFileAtBranchUrl" packages/web/src --context=5
```

Length of output: 2471
🏁 Script executed:
```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI|URLEncode" packages/web/src --type=ts --type=tsx | head -20
```

Length of output: 27
Let me retry the URL encoding search with corrected syntax:
Let me also check where `fileName` comes from in the zoektSearcher:
🏁 Script executed:
```shell
#!/bin/bash
# Look for encodeURIComponent usage and URL encoding patterns
rg -n "encodeURIComponent|encodeURI" packages/web/src
```

Length of output: 1341
🏁 Script executed:
```shell
#!/bin/bash
# Check context around fileName variable in zoektSearcher.ts
rg -n "fileName" packages/web/src/features/search/zoektSearcher.ts -B5 -A2 | head -40
```

Length of output: 1618
Apply URL encoding to file paths across all code host URL patterns.
The review concerns are valid. File paths from Zoekt search results are raw and unencoded, but are directly interpolated into URLs without encoding. This will break URLs containing spaces or special characters (e.g., `file name with spaces.ts`, `special-#@!.js`).
The codebase already uses `encodeURIComponent()` elsewhere for similar URL component encoding (e.g., `packages/web/src/app/[domain]/browse/hooks/utils.ts:85`). Apply this pattern consistently:
- Wrap all `filePath` and `branchName` parameters with `encodeURIComponent()` in each case statement
- Example: `${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)}` for GitHub
- Apply to all six code host patterns: GitHub, GitLab, Gitea, Azure DevOps, Bitbucket Cloud/Server, and Gerrit
🤖 Prompt for AI Agents
In packages/web/src/lib/utils.ts around lines 379 to 413, filePath (and
branchName where used in path components) is inserted into host URLs unencoded
which breaks URLs containing spaces or special characters; update each case to
wrap branchName and filePath (where they appear in the path or query values)
with encodeURIComponent() — e.g. for GitHub use
`${webUrl}/blob/${encodeURIComponent(branchName)}/${encodeURIComponent(filePath)}`,
apply the same encoding pattern for GitLab, Gitea, Azure DevOps (encode query
values), Bitbucket Cloud/Server, and Gerrit, leaving genericGitHost behavior
unchanged.
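One subtlety when applying this: `encodeURIComponent(filePath)` also escapes the `/` separators, which most hosts expect to remain literal in path positions. A minimal sketch of per-segment encoding (the helper name is illustrative, not from the PR):

```ts
// Encode each path segment but keep '/' separators intact, so
// "src/file name.ts" becomes "src/file%20name.ts" rather than "src%2Ffile%20name.ts".
const encodePathSegments = (path: string): string =>
    path.split('/').map(encodeURIComponent).join('/');

// Query-string positions (e.g., Azure DevOps ?path=...) can use full encoding:
const azureDevOpsUrl = (webUrl: string, filePath: string, branchName: string) =>
    `${webUrl}?path=${encodeURIComponent(filePath)}&version=${encodeURIComponent(branchName)}`;

console.log(encodePathSegments('src/utils/file name with spaces.ts'));
// -> "src/utils/file%20name%20with%20spaces.ts"
```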
```ts
type SubtypeConstructor<Constructor extends new (...args: any) => any, Subtype> = {
    new(...args: ConstructorParameters<Constructor>): Subtype;
};
```
🧩 Analysis chain
Verify usage of SubtypeConstructor helper type.
The SubtypeConstructor type is defined but not used within this file. Please verify that it's actually used elsewhere in the codebase.
🏁 Script executed:
```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase
# Search for imports and usage of SubtypeConstructor
rg -n --type=ts --type=tsx 'SubtypeConstructor' -g '!packages/web/src/proto/query.ts'
```

Length of output: 27
🏁 Script executed:
```shell
#!/bin/bash
# Description: Search for usages of SubtypeConstructor in the codebase
# Using glob patterns to match both .ts and .tsx files
# Search for SubtypeConstructor in all TypeScript files (excluding query.ts)
rg -n 'SubtypeConstructor' --glob '!packages/web/src/proto/query.ts' -g '*.ts' -g '*.tsx'
```

Length of output: 452
🏁 Script executed:
```shell
#!/bin/bash
# Description: Check imports in webserver.ts to understand SubtypeConstructor relationship
# Show imports/top of webserver.ts
head -60 packages/web/src/proto/webserver.ts | grep -n -A5 -B5 'SubtypeConstructor\|import'
# Also check if query.ts exports SubtypeConstructor explicitly
echo "=== Checking query.ts exports ==="
head -30 packages/web/src/proto/query.ts | grep -n -A2 -B2 'export\|SubtypeConstructor'
```

Length of output: 8133
Remove unused SubtypeConstructor type definition.
Verification confirms that SubtypeConstructor at lines 24-26 in packages/web/src/proto/query.ts is unused. While webserver.ts defines and uses an identical helper type, it maintains its own independent definition and does not import from query.ts. Delete this unused type from query.ts.
🤖 Prompt for AI Agents
In packages/web/src/proto/query.ts around lines 24 to 26, remove the unused
TypeScript type alias SubtypeConstructor (the constructor-to-subtype helper)
because it's not referenced anywhere in this file; delete the three-line type
declaration and ensure no imports/exports depend on it so the file compiles
cleanly after removal.
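For context, this helper type is the shape that `@grpc/proto-loader`-generated typings use to give service client constructors a typed `new` signature. A hedged sketch of typical usage, assuming this PR's generated files follow the standard layout (the `SearchClient` interface and the service path below are illustrative, not verified against the PR):

```ts
import type * as grpc from '@grpc/grpc-js';

type SubtypeConstructor<Constructor extends new (...args: any) => any, Subtype> = {
    new(...args: ConstructorParameters<Constructor>): Subtype;
};

// Hypothetical stand-in for a generated client interface with RPC methods.
interface SearchClient extends grpc.Client {
    // ...generated RPC methods would appear here
}

// Generated ProtoGrpcType files use SubtypeConstructor to expose typed client
// constructors that still accept the base grpc.Client constructor arguments.
interface ProtoGrpcType {
    zoekt: {
        webserver: {
            v1: {
                WebserverService: SubtypeConstructor<typeof grpc.Client, SearchClient> & {
                    service: grpc.ServiceDefinition;
                };
            };
        };
    };
}
```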