Skip to content

⚡ Bolt: Optimize SanitizeName by pre-compiling regex variants#10

Draft
T-ahamed2 wants to merge 1 commit into
mainfrom
bolt-optimize-sanitize-regex-variants-3564610116317565925
Draft

⚡ Bolt: Optimize SanitizeName by pre-compiling regex variants#10
T-ahamed2 wants to merge 1 commit into
mainfrom
bolt-optimize-sanitize-regex-variants-3564610116317565925

Conversation

@T-ahamed2

Copy link
Copy Markdown
Owner

This optimization targets the SanitizeName function in pkg/stringutil/sanitize.go. Previously, applySanitizePattern would compile a new regular expression on every invocation, which is expensive for a frequently used utility.

By identifying that only a small number of regex patterns are actually possible (based on whether '.' or '_' are preserved), I pre-compiled these 4 variants at the package level.

📊 Performance Gain:

  • Baseline: ~6324 ns/op
  • Optimized: ~2435 ns/op
  • Improvement: ~61% faster

The change was verified for correctness using existing unit tests and for performance using newly added benchmarks. Pre-existing unrelated test failures in other packages were noted.


PR created automatically by Jules for task 3564610116317565925 started by @T-ahamed2

Pre-compiled the 4 possible regex patterns used in `applySanitizePattern`
to avoid expensive recompilation on every call to `SanitizeName`.

Performance impact:
Reduces `SanitizeName` execution time by ~60% (from ~6.5µs to ~2.6µs per op).

Tests:
- Added `BenchmarkSanitizeName` to `pkg/stringutil/sanitize_test.go`
- Verified with `go test -bench BenchmarkSanitizeName ./pkg/stringutil/...`
- Ensured all tests in `pkg/stringutil` pass.
@google-labs-jules

Copy link
Copy Markdown

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 734e4afd-9b3f-4ff8-91e3-8db875d3def6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bolt-optimize-sanitize-regex-variants-3564610116317565925

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request optimizes string sanitization in pkg/stringutil/sanitize.go by pre-compiling regex variants to avoid expensive recompilation, and adds corresponding benchmarks in pkg/stringutil/sanitize_test.go. The review feedback recommends placing the literal hyphen at the end of the regex character classes to prevent parsing ambiguity, adding a defensive nil check for options in buildSanitizePreservePattern, and updating the pattern matching in applySanitizePattern with a fallback compilation mechanism to ensure future-proofing and correctness.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +13 to +22
var (
multipleHyphens = regexp.MustCompile(`-+`)

// Pre-compiled regex variants for SanitizeName to avoid expensive recompilation.
// These cover all possible combinations of preserved special characters (currently '.' and '_').
reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`)
reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9-.]+`)
reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9-_]+`)
reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9-._]+`)
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In regular expression character classes, placing a hyphen - in the middle of other characters (e.g., [^a-z0-9-.]+ or [^a-z0-9-_]+) can be ambiguous or error-prone, as it might be interpreted as a range separator depending on the context or regex engine.

It is a best practice to place the literal hyphen at the very end of the character class (e.g., [^a-z0-9.-]+ or [^a-z0-9_-]+) to ensure it is unambiguously parsed as a literal hyphen.

Suggested change
var (
multipleHyphens = regexp.MustCompile(`-+`)
// Pre-compiled regex variants for SanitizeName to avoid expensive recompilation.
// These cover all possible combinations of preserved special characters (currently '.' and '_').
reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`)
reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9-.]+`)
reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9-_]+`)
reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9-._]+`)
)
var (
multipleHyphens = regexp.MustCompile(`-+`)
// Pre-compiled regex variants for SanitizeName to avoid expensive recompilation.
// These cover all possible combinations of preserved special characters (currently '.' and '_').
// The hyphen is placed at the end of the character class to prevent it from being interpreted as a range.
reSanitizeDefault = regexp.MustCompile(`[^a-z0-9-]+`)
reSanitizeWithDot = regexp.MustCompile(`[^a-z0-9.-]+`)
reSanitizeWithUnderscore = regexp.MustCompile(`[^a-z0-9_-]+`)
reSanitizeWithBoth = regexp.MustCompile(`[^a-z0-9._-]+`)
)

Comment on lines 137 to 151
func buildSanitizePreservePattern(opts *SanitizeOptions) string {
var preserveChars strings.Builder
preserveChars.WriteString("a-z0-9-") // Always preserve alphanumeric and hyphens
for _, char := range opts.PreserveSpecialChars {
switch char {
case '.', '_':
preserveChars.WriteRune(char)
}
hasDot := slices.Contains(opts.PreserveSpecialChars, '.')
hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_')

if hasDot && hasUnderscore {
return "a-z0-9-._"
}
return preserveChars.String()
if hasDot {
return "a-z0-9-."
}
if hasUnderscore {
return "a-z0-9-_"
}
return "a-z0-9-"
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To align with placing the hyphen at the end of the character class in the pre-compiled regexes, we should update the returned strings here as well.

Additionally, let's add a defensive nil check for opts to prevent potential nil-pointer dereferences if this helper is ever called with a nil options struct in the future.

Suggested change
func buildSanitizePreservePattern(opts *SanitizeOptions) string {
var preserveChars strings.Builder
preserveChars.WriteString("a-z0-9-") // Always preserve alphanumeric and hyphens
for _, char := range opts.PreserveSpecialChars {
switch char {
case '.', '_':
preserveChars.WriteRune(char)
}
hasDot := slices.Contains(opts.PreserveSpecialChars, '.')
hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_')
if hasDot && hasUnderscore {
return "a-z0-9-._"
}
return preserveChars.String()
if hasDot {
return "a-z0-9-."
}
if hasUnderscore {
return "a-z0-9-_"
}
return "a-z0-9-"
}
func buildSanitizePreservePattern(opts *SanitizeOptions) string {
if opts == nil {
return "a-z0-9-"
}
hasDot := slices.Contains(opts.PreserveSpecialChars, '.')
hasUnderscore := slices.Contains(opts.PreserveSpecialChars, '_')
if hasDot && hasUnderscore {
return "a-z0-9._-"
}
if hasDot {
return "a-z0-9.-"
}
if hasUnderscore {
return "a-z0-9_-"
}
return "a-z0-9-"
}

Comment on lines 157 to +168
func applySanitizePattern(result, allowedChars string, preserveSpecialChars bool) string {
pattern := regexp.MustCompile(`[^` + allowedChars + `]+`)
var pattern *regexp.Regexp
switch allowedChars {
case "a-z0-9-.":
pattern = reSanitizeWithDot
case "a-z0-9-_":
pattern = reSanitizeWithUnderscore
case "a-z0-9-._":
pattern = reSanitizeWithBoth
default:
pattern = reSanitizeDefault
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Let's update the switch cases to match the updated character class strings.

Additionally, the current default case silently falls back to reSanitizeDefault if an unexpected allowedChars string is passed. If a developer adds support for other special characters in the future but forgets to update this switch statement, it would lead to silent correctness bugs (unwanted characters would be stripped instead of preserved).

Adding an explicit case for "a-z0-9-" and falling back to compiling the regex on the fly in the default case ensures correctness and future-proofs the code.

func applySanitizePattern(result, allowedChars string, preserveSpecialChars bool) string {
	var pattern *regexp.Regexp
	switch allowedChars {
	case "a-z0-9-":
		pattern = reSanitizeDefault
	case "a-z0-9.-":
		pattern = reSanitizeWithDot
	case "a-z0-9_-":
		pattern = reSanitizeWithUnderscore
	case "a-z0-9._-":
		pattern = reSanitizeWithBoth
	default:
		// Fallback to compiling on the fly if an unexpected pattern is passed (e.g. future extensions)
		pattern = regexp.MustCompile(`[^` + allowedChars + `]+`)
	}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant