⚡ Bolt: optimize fuzzy matching and string sanitization by T-ahamed2 · Pull Request #12 · T-ahamed2/gh-aw

T-ahamed2 · 2026-06-28T12:42:20Z

This PR implements several high-impact performance optimizations in pkg/stringutil based on measured bottlenecks.

Optimizations:

Fuzzy Matching (LevenshteinDistance):
- Input Swapping: Ensures the shorter string drives the row allocation size, minimizing memory footprint.
- Stack Allocation: Uses stack-allocated buffers ([65]int) for strings up to 64 characters (common for identifiers/typos), eliminating heap allocations for these cases.

Fuzzy Matching (FindClosestMatches):
- Early Exit: Added an O(1) length check. Since Levenshtein distance is at least the absolute difference in string lengths, we can skip the O(N*M) calculation if abs(len(a) - len(b)) > 3.

String Sanitization (SanitizeName):
- Regex Pre-compilation: Pre-compiled frequently used sanitization patterns into a package-level map, removing the overhead of regexp.MustCompile on every call.
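
The input swapping and stack-allocated buffers described for LevenshteinDistance could look roughly like the sketch below. It is not the code from this PR: the function name levenshtein, the constant stackBufLen, and the byte-wise comparison are assumptions made for illustration, and whether the arrays actually stay on the stack depends on the compiler's escape analysis.

```go
package main

import "fmt"

// stackBufLen is an assumed buffer size: 65 ints cover strings of up to 64 bytes.
const stackBufLen = 65

// levenshtein returns the byte-wise edit distance between a and b.
// Hypothetical sketch; the real function in pkg/stringutil may differ.
func levenshtein(a, b string) int {
	// Input swapping: make b the shorter string so each row has len(b)+1 cells.
	if len(a) < len(b) {
		a, b = b, a
	}
	if b == "" {
		return len(a)
	}

	// Fixed-size arrays that the compiler can usually keep on the stack.
	var prevBuf, currBuf [stackBufLen]int
	var prevRow, currRow []int
	if len(b) < stackBufLen {
		prevRow, currRow = prevBuf[:len(b)+1], currBuf[:len(b)+1]
	} else {
		// Fallback for longer inputs: heap-allocated rows.
		prevRow, currRow = make([]int, len(b)+1), make([]int, len(b)+1)
	}

	for j := range prevRow {
		prevRow[j] = j
	}
	for i := 1; i <= len(a); i++ {
		currRow[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prevRow[j] + 1 // deletion
			if v := currRow[j-1] + 1; v < best { // insertion
				best = v
			}
			if v := prevRow[j-1] + cost; v < best { // substitution
				best = v
			}
			currRow[j] = best
		}
		// The row just computed becomes the previous row for the next pass.
		prevRow, currRow = currRow, prevRow
	}
	return prevRow[len(b)]
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // prints 3
}
```

Swapping is safe because edit distance is symmetric, so only the row length changes, not the result.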

Performance Impact (Benchmarks):

BenchmarkLevenshteinDistance: 298.4 ns/op -> 197.5 ns/op (~34% faster)
BenchmarkFindClosestMatches: 2625 ns/op -> 1884 ns/op (~28% faster)
BenchmarkSanitizeName: 5511 ns/op -> 3319 ns/op (~40% faster)

All changes are verified with the full test suite. Pre-existing failures in pkg/cli were confirmed as unrelated to these changes.

PR created automatically by Jules for task 16899512580012730208 started by @T-ahamed2

- Optimized LevenshteinDistance by implementing input swapping and stack-allocated buffers for small strings, reducing allocations and improving performance by ~34%.
- Added a length-difference early exit in FindClosestMatches to skip expensive calculations when strings are clearly too different, improving performance by ~28%.
- Pre-compiled common sanitization regex patterns in SanitizeName, avoiding redundant runtime compilation and improving performance by ~40%.
- Verified all changes with unit tests and benchmarks.
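
As a rough illustration of the length-difference early exit, the sketch below skips the full distance calculation whenever two strings differ in length by more than the cutoff. The names closeMatches, editDistance, and maxDistance are hypothetical, and the real FindClosestMatches may also sort or limit its results; only the cutoff of 3 comes from the PR description.

```go
package main

import "fmt"

// maxDistance mirrors the cutoff of 3 mentioned in the PR description.
const maxDistance = 3

// editDistance is a plain two-row Levenshtein distance over bytes (illustrative helper).
func editDistance(a, b string) int {
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			best := prev[j] + 1
			if v := curr[j-1] + 1; v < best {
				best = v
			}
			if v := prev[j-1] + cost; v < best {
				best = v
			}
			curr[j] = best
		}
		prev, curr = curr, prev
	}
	return prev[len(b)]
}

// closeMatches returns the candidates within maxDistance edits of target.
// Hypothetical stand-in for FindClosestMatches.
func closeMatches(target string, candidates []string) []string {
	var out []string
	for _, c := range candidates {
		// O(1) early exit: the distance is at least the length difference,
		// so a gap above the cutoff can never produce a match.
		diff := len(target) - len(c)
		if diff < 0 {
			diff = -diff
		}
		if diff > maxDistance {
			continue
		}
		if editDistance(target, c) <= maxDistance {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	candidates := []string{"workflows", "w", "workfl0w", "pipeline", "completely-different"}
	fmt.Println(closeMatches("workflow", candidates)) // prints [workflows workfl0w]
}
```

Here "w" and "completely-different" are rejected by the length check alone, while "pipeline" has the same length as the target and is rejected only after the full calculation.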

google-labs-jules · 2026-06-28T12:42:21Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

coderabbitai · 2026-06-28T12:42:26Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 093318c3-1345-4441-82a2-0c81e2975c9c

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


gemini-code-assist

Code Review

This pull request introduces performance optimizations for string processing utilities, including stack-allocated buffers and input swapping in the Levenshtein distance calculation, an early length-difference check to skip expensive computations, and a pre-compiled regex cache for common sanitization patterns. The review feedback suggests normalizing the allowed characters key before performing the regex cache lookup to ensure robust cache hits, and simplifying the nested min calls in the Levenshtein distance calculation using Go 1.21's multi-argument min function.


gemini-code-assist · 2026-06-28T12:43:40Z

+	var pattern *regexp.Regexp
+	if p, ok := commonSanitizePatterns[allowedChars]; ok {
+		pattern = p
+	} else {
+		pattern = regexp.MustCompile(`[^` + allowedChars + `]+`)
+	}


The pre-compiled regex cache lookup in commonSanitizePatterns is sensitive to the exact order and duplication of characters in allowedChars. If opts.PreserveSpecialChars contains characters in a different order (e.g., _ before .) or contains duplicates, the generated allowedChars string will not match any key in commonSanitizePatterns, causing a silent fallback to regexp.MustCompile on every call.

To ensure robust cache hits and maintain the performance optimization, we should normalize allowedChars to a canonical key before performing the map lookup.

canonicalKey := "a-z0-9-"
if strings.ContainsRune(allowedChars, '.') {
	canonicalKey += "."
}
if strings.ContainsRune(allowedChars, '_') {
	canonicalKey += "_"
}

var pattern *regexp.Regexp
if p, ok := commonSanitizePatterns[canonicalKey]; ok {
	pattern = p
} else {
	pattern = regexp.MustCompile("[^" + allowedChars + "]+")
}
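
A self-contained sketch of the canonical-key idea follows. The map keys, the patternFor helper, and the []rune parameter standing in for opts.PreserveSpecialChars are assumptions, not the PR's actual code. The sketch consults the cache only when the preserved characters are limited to '.' and '_'. A key built from just those two characters would otherwise hit a cached pattern that silently strips any other preserved character.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// commonSanitizePatterns maps assumed canonical keys to pre-compiled patterns.
var commonSanitizePatterns = map[string]*regexp.Regexp{
	"a-z0-9-":   regexp.MustCompile(`[^a-z0-9-]+`),
	"a-z0-9-.":  regexp.MustCompile(`[^a-z0-9-.]+`),
	"a-z0-9-_":  regexp.MustCompile(`[^a-z0-9-_]+`),
	"a-z0-9-._": regexp.MustCompile(`[^a-z0-9-._]+`),
}

// patternFor returns a pattern matching runs of disallowed characters.
// preserve is a hypothetical stand-in for opts.PreserveSpecialChars.
func patternFor(preserve []rune) *regexp.Regexp {
	hasDot, hasUnderscore, other := false, false, false
	var extra strings.Builder
	for _, r := range preserve {
		switch r {
		case '-':
			continue // already part of the base set
		case '.':
			hasDot = true
		case '_':
			hasUnderscore = true
		default:
			other = true
		}
		extra.WriteString(regexp.QuoteMeta(string(r)))
	}

	// Canonical key: fixed order, no duplicates, so order and repetition
	// in preserve no longer affect cache hits.
	if !other {
		key := "a-z0-9-"
		if hasDot {
			key += "."
		}
		if hasUnderscore {
			key += "_"
		}
		if p, ok := commonSanitizePatterns[key]; ok {
			return p
		}
	}
	// Uncommon character sets still compile on demand.
	return regexp.MustCompile(`[^a-z0-9-` + extra.String() + `]+`)
}

func main() {
	a := patternFor([]rune{'_', '.'})
	b := patternFor([]rune{'.', '_', '.'})
	fmt.Println(a == b)                                 // prints true
	fmt.Println(a.ReplaceAllString("my.name!!x_y", "-")) // prints my.name-x_y
}
```

Because both calls return the same cached *regexp.Regexp, comparing the pointers is a quick way to confirm a cache hit in a test.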

gemini-code-assist · 2026-06-28T12:43:40Z

+			insertion := currRow[j-1] + 1
+			substitution := prevRow[j-1] + cost
+
+			currRow[j] = min(deletion, min(insertion, substitution))


Since Go 1.21, the built-in min function supports a variable number of arguments. We can simplify the nested min calls to a single call for better readability.

Suggested change

-			currRow[j] = min(deletion, min(insertion, substitution))
+			currRow[j] = min(deletion, insertion, substitution)


⚡ Bolt: fix CI failures and lint errors

94f13a5

- Removed non-existent reports/ directory check from link-check workflow.
- Fixed lint errors in pkg/stringutil (prefer s != "" over len(s) > 0).

T-ahamed2 wants to merge 2 commits into main from bolt-optimize-stringutil-16899512580012730208
