⚡ Bolt: optimize fuzzy matching and string sanitization #12
- Optimized LevenshteinDistance by implementing input swapping and stack-allocated buffers for small strings, reducing allocations and improving performance by ~34%.
- Added a length-difference early exit in FindClosestMatches to skip expensive calculations when strings are clearly too different, improving performance by ~28%.
- Pre-compiled common sanitization regex patterns in SanitizeName, avoiding redundant runtime compilation and improving performance by ~40%.
- Verified all changes with unit tests and benchmarks.
Code Review
This pull request introduces performance optimizations for string processing utilities, including stack-allocated buffers and input swapping in the Levenshtein distance calculation, an early length-difference check to skip expensive computations, and a pre-compiled regex cache for common sanitization patterns. The review feedback suggests normalizing the allowed characters key before performing the regex cache lookup to ensure robust cache hits, and simplifying the nested min calls in the Levenshtein distance calculation using Go 1.21's multi-argument min function.
Diff context (SanitizeName regex cache lookup):

    var pattern *regexp.Regexp
    if p, ok := commonSanitizePatterns[allowedChars]; ok {
        pattern = p
    } else {
        pattern = regexp.MustCompile(`[^` + allowedChars + `]+`)
    }
The pre-compiled regex cache lookup in commonSanitizePatterns is sensitive to the exact order and duplication of characters in allowedChars. If opts.PreserveSpecialChars contains characters in a different order (e.g., _ before .) or contains duplicates, the generated allowedChars string will not match any key in commonSanitizePatterns, causing a silent fallback to regexp.MustCompile on every call.
To ensure robust cache hits and maintain the performance optimization, we should normalize allowedChars to a canonical key before performing the map lookup.
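One possible way to build such a canonical key is to sort and de-duplicate the extra characters before the lookup. The sketch below is illustrative only: `canonicalKey`, `cache`, and the assumption that the extra characters arrive as a plain string are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// cache is a hypothetical stand-in for commonSanitizePatterns.
var cache = map[string]*regexp.Regexp{
	"a-z0-9-":   regexp.MustCompile(`[^a-z0-9-]+`),
	"a-z0-9-._": regexp.MustCompile(`[^a-z0-9-._]+`),
}

// canonicalKey appends the extra characters to the base class in sorted,
// de-duplicated order, so "_." and ".._" produce the same key.
func canonicalKey(extras string) string {
	runes := []rune(extras)
	sort.Slice(runes, func(i, j int) bool { return runes[i] < runes[j] })

	var b strings.Builder
	b.WriteString("a-z0-9-")
	for i, r := range runes {
		// After sorting, duplicates are adjacent; skip repeats.
		if i > 0 && r == runes[i-1] {
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	for _, extras := range []string{"._", "_.", ".._"} {
		key := canonicalKey(extras)
		_, hit := cache[key]
		fmt.Printf("%q -> %q hit=%v\n", extras, key, hit)
	}
}
```

In this variant any character outside the known set stays in the key, so an unusual set simply misses the cache and can fall through to runtime compilation.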
Suggested change:

    canonicalKey := "a-z0-9-"
    if strings.ContainsRune(allowedChars, '.') {
        canonicalKey += "."
    }
    if strings.ContainsRune(allowedChars, '_') {
        canonicalKey += "_"
    }
    var pattern *regexp.Regexp
    if p, ok := commonSanitizePatterns[canonicalKey]; ok {
        pattern = p
    } else {
        pattern = regexp.MustCompile("[^" + allowedChars + "]+")
    }

Diff context (Levenshtein distance calculation):

    insertion := currRow[j-1] + 1
    substitution := prevRow[j-1] + cost

    currRow[j] = min(deletion, min(insertion, substitution))
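The review summary mentions replacing the nested min calls with the multi-argument built-in `min` available since Go 1.21. A minimal sketch of the two forms side by side, using hypothetical wrapper names (`nestedMin` and `flatMin` are not part of the PR):

```go
package main

import "fmt"

// nestedMin mirrors the nested form shown in the diff.
func nestedMin(deletion, insertion, substitution int) int {
	return min(deletion, min(insertion, substitution))
}

// flatMin passes all three candidates to the built-in min in one call
// (requires Go 1.21 or later).
func flatMin(deletion, insertion, substitution int) int {
	return min(deletion, insertion, substitution)
}

func main() {
	fmt.Println(nestedMin(3, 1, 2), flatMin(3, 1, 2)) // prints "1 1"
}
```

Both forms return the same value; the flat call only removes one level of nesting.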
- Removed non-existent reports/ directory check from link-check workflow.
- Fixed lint errors in pkg/stringutil (prefer s != "" over len(s) > 0).
This PR implements several high-impact performance optimizations in pkg/stringutil based on measured bottlenecks.

Optimizations:
- LevenshteinDistance: input swapping, plus stack-allocated buffers ([65]int) for strings up to 64 characters (common for identifiers/typos), eliminating heap allocations for these cases.
- FindClosestMatches: a length-difference early exit that skips the distance calculation when abs(len(a) - len(b)) > 3.
- SanitizeName: pre-compiled common regex patterns, avoiding regexp.MustCompile on every call.

Performance Impact (Benchmarks):
- BenchmarkLevenshteinDistance: 298.4 ns/op -> 197.5 ns/op (~34% faster)
- BenchmarkFindClosestMatches: 2625 ns/op -> 1884 ns/op (~28% faster)
- BenchmarkSanitizeName: 5511 ns/op -> 3319 ns/op (~40% faster)

All changes are verified with the full test suite. Pre-existing failures in pkg/cli were confirmed as unrelated to these changes.

PR created automatically by Jules for task 16899512580012730208 started by @T-ahamed2
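As a rough illustration of the first two optimizations in the description, the sketch below combines input swapping, fixed-size `[65]int` row buffers, and a length-difference check. The names `levenshtein`, `tooDifferent`, `min3`, and `maxStackLen` are hypothetical, the comparison is byte-wise, and this is not the code from the PR.

```go
package main

import "fmt"

// maxStackLen is the longest (shorter) input served by the fixed-size rows.
const maxStackLen = 64

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the byte-wise edit distance between a and b.
func levenshtein(a, b string) int {
	// Input swapping: keep b as the shorter string so each row is as small as possible.
	if len(a) < len(b) {
		a, b = b, a
	}
	if len(b) == 0 {
		return len(a)
	}

	// Fixed-size arrays can stay on the stack (subject to escape analysis);
	// longer inputs fall back to heap-allocated rows.
	var prevBuf, currBuf [maxStackLen + 1]int
	var prevRow, currRow []int
	if len(b) <= maxStackLen {
		prevRow, currRow = prevBuf[:len(b)+1], currBuf[:len(b)+1]
	} else {
		prevRow, currRow = make([]int, len(b)+1), make([]int, len(b)+1)
	}

	for j := range prevRow {
		prevRow[j] = j
	}
	for i := 1; i <= len(a); i++ {
		currRow[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			currRow[j] = min3(prevRow[j]+1, currRow[j-1]+1, prevRow[j-1]+cost)
		}
		prevRow, currRow = currRow, prevRow
	}
	return prevRow[len(b)]
}

// tooDifferent reports whether the length gap alone already exceeds maxDist.
// The edit distance is never smaller than the length difference, so such
// pairs can be skipped without running levenshtein.
func tooDifferent(a, b string, maxDist int) bool {
	d := len(a) - len(b)
	if d < 0 {
		d = -d
	}
	return d > maxDist
}

func main() {
	fmt.Println(levenshtein("kitten", "sitting")) // prints 3
	fmt.Println(tooDifferent("go", "golang", 3))  // prints true
}
```

The early exit is safe for any threshold because every unit of length difference needs at least one insertion or deletion.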