⚡ Bolt: Performance optimization for validation and fix for telemetry cache bug#324
Conversation
- Pre-compile SECRET_PATTERNS regexes in 02_validate_clean.py - Use faster whitespace removal in fuzzy_hash using str.split() - Implement regex-based fast-path for secret detection - Fix NameError in telemetry.py get_state that broke state caching
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
There was a problem hiding this comment.
Code Review
This pull request introduces several performance optimizations, primarily in the secret detection and fuzzy hashing logic within scripts/02_validate_clean.py by pre-compiling regex patterns and using more efficient string manipulation. It also updates the telemetry state cache lookup in heidi_engine/telemetry.py. A review comment points out that the cache check in telemetry.py is redundant because the same lookup is performed earlier in the function, suggesting its removal to simplify the code.
| # BOLT OPTIMIZATION: Check thread-safe state cache | ||
| cached = _state_cache.get(target_run_id, state_file) | ||
| cached = _state_cache.get(resolved_run_id) | ||
| if cached: | ||
| return cached |
There was a problem hiding this comment.
This cache check is redundant. The state cache is already checked at line 721 using the same resolved_run_id. Since no cache update or state modification occurs between these two points (only a file existence check), this second check will always yield the same result as the first one. Removing it simplifies the code and avoids an unnecessary lock acquisition in the cache.
💡 What:
scripts/02_validate_clean.pyby pre-compiling regex patterns and implementing a regex-based fast-path for secret detection.re.sub(r"\s+", "", text)with"".join(text.split())infuzzy_hashfor faster whitespace removal.NameErrorinheidi_engine/telemetry.pywithin theget_statefunction that was causing the state cache to be bypassed due to an undefinedtarget_run_id.🎯 Why:
detect_secretswas happening redundantly for every field in every sample.re.subfor simple whitespace removal is significantly slower than string splitting and joining.telemetry.pycaused frequent disk I/O and JSON parsing during high-frequency polling (e.g., from the dashboard).📊 Impact:
🔬 Measurement:
pytest tests/test_telemetry_cache.py, which now passes.PR created automatically by Jules for task 1993706443237528340 started by @heidi-dang