
⚡ Bolt: optimize string sanitization and identifier normalization #17

Draft
T-ahamed2 wants to merge 4 commits into main from bolt-optimize-stringutil-11803481015905286462

@T-ahamed2

Owner

⚡ Bolt: optimize string sanitization and identifier normalization

💡 What:

  • Optimized NormalizeSafeOutputIdentifier and normalizeSanitizeSeparators using strings.Replacer.
  • Pre-compiled common regex character class patterns in applySanitizePattern.
  • Optimized SanitizeForFilename with strings.Builder.Grow and a single-pass character iteration loop.
  • Added a performance learning to .jules/bolt.md.
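The `strings.Replacer` change in the first bullet can be sketched as follows. This is a hypothetical stand-in for `NormalizeSafeOutputIdentifier`: the PR does not show the function body, so mapping both `-` and `.` to `_` is an assumption. The chained version scans the string once per `ReplaceAll` call, while the replacer performs both substitutions in a single pass.

```go
package main

import "strings"

// identifierReplacer is built once and reused; a *strings.Replacer is safe
// for concurrent use by multiple goroutines.
// ASSUMPTION: '-' and '.' both map to '_' (illustrative only).
var identifierReplacer = strings.NewReplacer("-", "_", ".", "_")

// normalizeChained mirrors the older style: each ReplaceAll call scans the
// whole string and may allocate a new one.
func normalizeChained(s string) string {
	s = strings.ReplaceAll(s, "-", "_")
	return strings.ReplaceAll(s, ".", "_")
}

// normalizeReplacer performs the same substitutions in one pass.
func normalizeReplacer(s string) string {
	return identifierReplacer.Replace(s)
}
```

The two forms agree here only because `_` is not itself a replacement target. With a chain such as `a→b` followed by `b→c`, sequential `ReplaceAll` calls cascade (`a` ends up as `c`), whereas a single `Replacer` does not re-scan its own output (`a` ends up as `b`).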

🎯 Why:
String sanitization and identifier normalization are frequently called during workflow compilation and execution. The previous implementation used multiple strings.ReplaceAll calls and re-compiled regexes on every call, leading to unnecessary allocations and CPU cycles.
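The recompilation cost described above can be illustrated with a minimal sketch. It assumes a sanitizer that deletes every run of characters outside `a-z`, `0-9`, `.`, `_` and `-`; the function names are illustrative, not taken from `pkg/stringutil`. Both functions return identical output. The difference is that `sanitizeSlow` parses and compiles the pattern on every call, while `sanitizeFast` reuses one package-level `*regexp.Regexp`, which is safe for concurrent use.

```go
package main

import "regexp"

// reDisallowed is compiled once, when the package is initialized.
// ASSUMPTION: allowed characters are a-z, 0-9, '.', '_' and '-'
// (the hyphen is last in the class so it is a literal, not a range).
var reDisallowed = regexp.MustCompile(`[^a-z0-9._-]+`)

// sanitizeSlow recompiles the expression on every call, the pattern the PR removes.
func sanitizeSlow(s string) string {
	re := regexp.MustCompile(`[^a-z0-9._-]+`)
	return re.ReplaceAllString(s, "")
}

// sanitizeFast reuses the precompiled expression.
func sanitizeFast(s string) string {
	return reDisallowed.ReplaceAllString(s, "")
}
```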

📊 Impact:

  • NormalizeSafeOutputIdentifier: ~24% faster.
  • SanitizeForFilename: ~53% faster.
  • SanitizeName (various paths): ~40-70% faster for separator normalization and pattern application.

🔬 Measurement:
Verified with `go test -bench . ./pkg/stringutil/...`. All existing unit tests pass. This revision also corrects the regex range interpretation and the functional regressions identified in the initial review.


PR created automatically by Jules for task 11803481015905286462 started by @T-ahamed2

This commit implements several performance optimizations in pkg/stringutil:
- Use strings.Replacer for multi-character substitutions in NormalizeSafeOutputIdentifier and normalizeSanitizeSeparators (~24-70% faster).
- Pre-compile common regex patterns in applySanitizePattern (~40% faster).
- Optimize SanitizeForFilename using strings.Builder.Grow and single-pass iteration (~53% faster).

Benchmarks confirm measurable speedups across all affected functions while preserving existing behavior. Correctness verified with unit tests.

Signed-off-by: Bolt ⚡
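The single-pass `SanitizeForFilename` approach from this commit can be sketched as below. The filtering rules (keep ASCII letters, digits, `.`, `-` and `_`; replace every other rune with `_`) are assumptions for illustration, since the real function's rules are not shown in this PR. Because each input rune produces exactly one output byte and every rune occupies at least one byte, `len(s)` is an upper bound on the output size, so a single `Grow` call avoids any reallocation during the loop.

```go
package main

import "strings"

// sanitizeFilename is a HYPOTHETICAL single-pass sanitizer, not the actual
// SanitizeForFilename. Assumed rules: ASCII letters, digits, '.', '-' and
// '_' are kept; every other rune is replaced by '_'.
func sanitizeFilename(s string) string {
	var b strings.Builder
	// One output byte per input rune, and each rune is at least one byte,
	// so len(s) bytes is always enough capacity.
	b.Grow(len(s))
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9',
			r == '.', r == '-', r == '_':
			b.WriteByte(byte(r)) // r is ASCII here, so the conversion is lossless
		default:
			b.WriteByte('_')
		}
	}
	return b.String()
}
```

For example, `sanitizeFilename("my report (v2).md")` yields `my_report__v2_.md`, and a multi-byte rune such as `é` collapses to a single `_`.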
@google-labs-jules


👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026


Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 926116cc-6f4a-4b52-9b26-f93d3030e300

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist Bot left a comment


Code Review

This pull request optimizes string sanitization and identifier normalization by pre-compiling common regular expressions and string replacers, and pre-allocating string builder capacity. Feedback was provided to handle alternative orderings of allowed characters in applySanitizePattern to prevent falling back to on-the-fly regex compilation.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +162 to +163

```go
case "a-z0-9-._":
	pattern = reAlphaNumHyphenDotUnd
```


medium

If `opts.PreserveSpecialChars` lists `_` before `.`, `buildSanitizePreservePattern` will return `"a-z0-9-_."`. That string misses the pre-compiled regex cache, so the call falls through to the default case, which compiles a new regular expression with `regexp.MustCompile` on every call. This can cause a significant performance cliff.

We should add "a-z0-9-_." as an alternative case to reuse the pre-compiled reAlphaNumHyphenDotUnd pattern.

Suggested change

```diff
-case "a-z0-9-._":
-	pattern = reAlphaNumHyphenDotUnd
+case "a-z0-9-._", "a-z0-9-_.":
+	pattern = reAlphaNumHyphenDotUnd
```
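Beyond listing both orderings in the `switch`, another option is to memoize compiled patterns so the `default` path compiles each distinct class only once. The sketch below uses `sync.Map`; `compiledClass` and `patternCache` are hypothetical names, and this is a swapped-in technique rather than what the PR or the review proposes. It assumes the class body comes from trusted code (such as `buildSanitizePreservePattern`), because it is spliced into the expression unescaped.

```go
package main

import (
	"regexp"
	"sync"
)

// patternCache maps a character-class body to its compiled expression.
// HYPOTHETICAL: not part of pkg/stringutil.
var patternCache sync.Map // map[string]*regexp.Regexp

// compiledClass returns a cached regexp matching runs of characters outside
// the class described by allowed (for example "a-z0-9-_.").
// ASSUMPTION: allowed is trusted and already a valid class body.
func compiledClass(allowed string) *regexp.Regexp {
	if re, ok := patternCache.Load(allowed); ok {
		return re.(*regexp.Regexp)
	}
	re := regexp.MustCompile("[^" + allowed + "]+")
	// If two goroutines race here, LoadOrStore keeps the first stored value,
	// so every caller ends up sharing one compiled expression.
	actual, _ := patternCache.LoadOrStore(allowed, re)
	return actual.(*regexp.Regexp)
}
```

An unbounded cache like this suits a small, fixed set of preserve patterns. If the set could grow with user input, a bounded cache, or sorting the allowed characters into a canonical order before the `switch`, would be safer.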

This commit implements several performance optimizations in pkg/stringutil:
- Use strings.Replacer for multi-character substitutions (~24-70% faster).
- Pre-compile common regex patterns in applySanitizePattern (~40% faster).
- Optimize SanitizeForFilename using strings.Builder.Grow and single-pass iteration (~56% faster).

Includes fixes for regex range interpretation and idiomatic string/slice length checks.
Also ensures reports/ directory exists for CI link checking.

Signed-off-by: Bolt ⚡
This commit implements several performance optimizations in pkg/stringutil:
- Use strings.Replacer for multi-character substitutions (~24-70% faster).
- Pre-compile common regex patterns in applySanitizePattern (~40% faster).
- Optimize SanitizeForFilename using strings.Builder.Grow and single-pass iteration (~53% faster).

Includes fixes for regex range interpretation and idiomatic string/slice checks.
Also ensures reports/ directory exists for CI link checking.

Signed-off-by: Bolt ⚡
Identified and implemented multiple performance optimizations in `pkg/stringutil`:

1.  **Identifier Normalization**: Optimized `NormalizeSafeOutputIdentifier` using `strings.Replacer` for single-pass substitution of dashes and periods, reducing allocations.
2.  **String Sanitization**:
    *   Optimized `normalizeSanitizeSeparators` and `applySanitizePattern` by replacing `strings.ReplaceAll` chains with `strings.NewReplacer`.
    *   Pre-compiled common regular expression patterns to avoid runtime compilation overhead.
    *   Fixed a bug in regex character classes where a hyphen placed between two characters could be misinterpreted as a range operator; hyphens are now moved to the end of the class, where they are always literal.
3.  **Filename Sanitization**: Re-implemented `SanitizeForFilename` to use a single-pass loop with a pre-allocated `strings.Builder`, eliminating redundant string allocations and multiple regex passes.
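The hyphen pitfall from item 2 is easy to reproduce with Go's `regexp` package. In the sketch below (patterns chosen for illustration, not taken from the PR), `[.-_]` is parsed as the range U+002E through U+005F, which matches digits and uppercase letters but not a literal `-`, since `-` (U+002D) falls just below the range. Moving the hyphen to the end, as in `[._-]`, makes it literal.

```go
package main

import "regexp"

var (
	// '-' between '.' (U+002E) and '_' (U+005F) forms a range that covers
	// digits, ':', '@', 'A'-'Z' and more, but not '-' itself.
	reAccidentalRange = regexp.MustCompile(`^[.-_]+$`)
	// '-' as the last character before ']' is a literal hyphen.
	reLiteralHyphen = regexp.MustCompile(`^[._-]+$`)
)

func matchesAccidental(s string) bool { return reAccidentalRange.MatchString(s) }

func matchesLiteral(s string) bool { return reLiteralHyphen.MatchString(s) }
```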

**Performance Impact:**
*   `NormalizeSafeOutputIdentifier`: ~24% faster.
*   `SanitizeName` (with common options): ~40-70% faster.
*   `SanitizeForFilename`: ~53% faster.

**CI/Maintenance:**
*   Updated string length checks to use `s != ""` as preferred by the project's custom linter.
*   Fixed a `staticcheck` error regarding redundant nil checks.
*   Added `reports/.gitkeep` to ensure the reports directory exists for CI workflows.
*   Recorded critical learning about Go regex hyphen placement in `.jules/bolt.md`.
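The two idioms from the maintenance list can be shown in isolation. `len` of a nil slice is defined as 0, so a guard like `items != nil && len(items) > 0` is redundant, which is the kind of check staticcheck reports; comparing a string with `""` is equivalent to `len(s) > 0`. The function names are illustrative only.

```go
package main

// hasItems needs no nil check: len of a nil slice is 0.
func hasItems(items []string) bool {
	return len(items) > 0
}

// isSet uses the s != "" form in place of a length comparison.
func isSet(s string) bool {
	return s != ""
}
```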