⚡ Bolt: Optimize SanitizeName by pre-compiling regex variants#10
⚡ Bolt: Optimize SanitizeName by pre-compiling regex variants#10T-ahamed2 wants to merge 1 commit into
Conversation
Pre-compiled the 4 possible regex patterns used in `applySanitizePattern` to avoid expensive recompilation on every call to `SanitizeName`. Performance impact: Reduces `SanitizeName` execution time by ~60% (from ~6.5µs to ~2.6µs per op). Tests: - Added `BenchmarkSanitizeName` to `pkg/stringutil/sanitize_test.go` - Verified with `go test -bench BenchmarkSanitizeName ./pkg/stringutil/...` - Ensured all tests in `pkg/stringutil` pass.
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Code Review
This pull request optimizes string sanitization in pkg/stringutil/sanitize.go by pre-compiling regex variants to avoid expensive recompilation, and adds corresponding benchmarks in pkg/stringutil/sanitize_test.go. The review feedback recommends placing the literal hyphen at the end of the regex character classes to prevent parsing ambiguity, adding a defensive nil check for options in buildSanitizePreservePattern, and updating the pattern matching in applySanitizePattern with a fallback compilation mechanism to ensure future-proofing and correctness.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| var ( | ||
| multipleHyphens = regexp.MustCompile(`-+`) | ||
|
|
||
| // Pre-compiled regex variants for SanitizeName to avoid expensive recompilation. | ||
| // These cover all possible combinations of preserved special characters (currently '.' and '_'). | ||
| reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`) | ||
| reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9-.]+`) | ||
| reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9-_]+`) | ||
| reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9-._]+`) | ||
| ) |
There was a problem hiding this comment.
In regular expression character classes, placing a hyphen - in the middle of other characters (e.g., [^a-z0-9-.]+ or [^a-z0-9-_]+) can be ambiguous or error-prone, as it might be interpreted as a range separator depending on the context or regex engine.
It is a best practice to place the literal hyphen at the very end of the character class (e.g., [^a-z0-9.-]+ or [^a-z0-9_-]+) to ensure it is unambiguously parsed as a literal hyphen.
| var ( | |
| multipleHyphens = regexp.MustCompile(`-+`) | |
| // Pre-compiled regex variants for SanitizeName to avoid expensive recompilation. | |
| // These cover all possible combinations of preserved special characters (currently '.' and '_'). | |
| reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`) | |
| reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9-.]+`) | |
| reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9-_]+`) | |
| reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9-._]+`) | |
| ) | |
| var ( | |
| multipleHyphens = regexp.MustCompile(`-+`) | |
| // Pre-compiled regex variants for SanitizeName to avoid expensive recompilation. | |
| // These cover all possible combinations of preserved special characters (currently '.' and '_'). | |
| // The hyphen is placed at the end of the character class to prevent it from being interpreted as a range. | |
| reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`) | |
| reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9.-]+`) | |
| reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9_-]+`) | |
| reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9._-]+`) | |
| ) |
| func buildSanitizePreservePattern(opts *SanitizeOptions) string { | ||
| var preserveChars strings.Builder | ||
| preserveChars.WriteString("a-z0-9-") // Always preserve alphanumeric and hyphens | ||
| for _, char := range opts.PreserveSpecialChars { | ||
| switch char { | ||
| case '.', '_': | ||
| preserveChars.WriteRune(char) | ||
| } | ||
| hasDot := slices.Contains(opts.PreserveSpecialChars, '.') | ||
| hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_') | ||
|
|
||
| if hasDot && hasUnderscore { | ||
| return "a-z0-9-._" | ||
| } | ||
| return preserveChars.String() | ||
| if hasDot { | ||
| return "a-z0-9-." | ||
| } | ||
| if hasUnderscore { | ||
| return "a-z0-9-_" | ||
| } | ||
| return "a-z0-9-" | ||
| } |
There was a problem hiding this comment.
To align with placing the hyphen at the end of the character class in the pre-compiled regexes, we should update the returned strings here as well.
Additionally, let's add a defensive nil check for opts to prevent potential nil-pointer dereferences if this helper is ever called with a nil options struct in the future.
| func buildSanitizePreservePattern(opts *SanitizeOptions) string { | |
| var preserveChars strings.Builder | |
| preserveChars.WriteString("a-z0-9-") // Always preserve alphanumeric and hyphens | |
| for _, char := range opts.PreserveSpecialChars { | |
| switch char { | |
| case '.', '_': | |
| preserveChars.WriteRune(char) | |
| } | |
| hasDot := slices.Contains(opts.PreserveSpecialChars, '.') | |
| hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_') | |
| if hasDot && hasUnderscore { | |
| return "a-z0-9-._" | |
| } | |
| return preserveChars.String() | |
| if hasDot { | |
| return "a-z0-9-." | |
| } | |
| if hasUnderscore { | |
| return "a-z0-9-_" | |
| } | |
| return "a-z0-9-" | |
| } | |
| func buildSanitizePreservePattern(opts *SanitizeOptions) string { | |
| if opts == nil { | |
| return "a-z0-9-" | |
| } | |
| hasDot := slices.Contains(opts.PreserveSpecialChars, '.') | |
| hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_') | |
| if hasDot && hasUnderscore { | |
| return "a-z0-9._-" | |
| } | |
| if hasDot { | |
| return "a-z0-9.-" | |
| } | |
| if hasUnderscore { | |
| return "a-z0-9_-" | |
| } | |
| return "a-z0-9-" | |
| } |
| func applySanitizePattern(result, allowedChars string, preserveSpecialChars bool) string { | ||
| pattern := regexp.MustCompile(`[^` + allowedChars + `]+`) | ||
| var pattern *regexp.Regexp | ||
| switch allowedChars { | ||
| case "a-z0-9-.": | ||
| pattern = reSanitizeWithDot | ||
| case "a-z0-9-_": | ||
| pattern = reSanitizeWithUnderscore | ||
| case "a-z0-9-._": | ||
| pattern = reSanitizeWithBoth | ||
| default: | ||
| pattern = reSanitizeDefault | ||
| } |
There was a problem hiding this comment.
Let's update the switch cases to match the updated character class strings.
Additionally, the current default case silently falls back to reSanitizeDefault if an unexpected allowedChars string is passed. If a developer adds support for other special characters in the future but forgets to update this switch statement, it would lead to silent correctness bugs (unwanted characters would be stripped instead of preserved).
Adding an explicit case for "a-z0-9-" and falling back to compiling the regex on the fly in the default case ensures correctness and future-proofs the code.
func applySanitizePattern(result, allowedChars string, preserveSpecialChars bool) string {
var pattern *regexp.Regexp
switch allowedChars {
case "a-z0-9-":
pattern = reSanitizeDefault
case "a-z0-9.-":
pattern = reSanitizeWithDot
case "a-z0-9_-":
pattern = reSanitizeWithUnderscore
case "a-z0-9._-":
pattern = reSanitizeWithBoth
default:
// Fallback to compiling on the fly if an unexpected pattern is passed (e.g. future extensions)
pattern = regexp.MustCompile(`[^` + allowedChars + `]+`)
}
This optimization targets the
SanitizeNamefunction inpkg/stringutil/sanitize.go. Previously,applySanitizePatternwould compile a new regular expression on every invocation, which is expensive for a frequently used utility.By identifying that only a small number of regex patterns are actually possible (based on whether '.' or '_' are preserved), I pre-compiled these 4 variants at the package level.
📊 Performance Gain:
The change was verified for correctness using existing unit tests and for performance using newly added benchmarks. Pre-existing unrelated test failures in other packages were noted.
PR created automatically by Jules for task 3564610116317565925 started by @T-ahamed2