⚡ Bolt: Hoist regex patterns and avoid repeated string allocations in ATS Generator#317
⚡ Bolt: Hoist regex patterns and avoid repeated string allocations in ATS Generator#317anchapin wants to merge 1 commit into
Conversation
By hoisting regular expressions out of repeatedly called methods, we avoid the overhead of dynamically compiling and re-allocating them during execution. Also hoisted the static action verbs list and refactored the lowercase text caching to only do what's required without mutating cases too early, preventing logical bugs like missing acronyms. Co-authored-by: anchapin <6326294+anchapin@users.noreply.github.com>
|
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task. |
Reviewer's GuideHoists frequently used regex patterns and static lists in the ATS generator to module-level constants, adjusts string handling to preserve case where needed, and reduces repeated lowercasing and regex compilation in hot paths, with associated test updates and documentation of the performance learnings. Flow diagram for updated text processing and regex usage in ATS generatorflowchart TD
A["resume_data"] --> B["_get_all_text(resume_data)"]
B --> C["all_text (case preserved)"]
%% Acronym path uses original casing
C --> D["_ACRONYM_PATTERN.findall(all_text)"]
D --> E["Acronym heuristics in _check_readability"]
%% Readability checks using lowercased text
C --> F["all_text_lower = all_text.lower()"]
F --> G["sum(1 for verb in _ACTION_VERBS if verb in all_text_lower)"]
F --> H["_QUANTIFIABLE_PATTERN.search(all_text)"]
%% Format parsing using precompiled patterns
C --> I["_TABLE_PATTERN.search(all_text)"]
C --> J["_SPECIAL_CHARS_PATTERN.findall(all_text)"]
%% Contact info using precompiled patterns
K["contact.get('email')"] --> L["_EMAIL_PATTERN.search(email)"]
M["contact.get('phone')"] --> N["_PHONE_PATTERN.search(phone)"]
%% Keyword extraction using precompiled patterns
O["response from _call_openai"] --> P["_JSON_PATTERN.search(response)"]
Q["resume bullets text.lower()"] --> R["_RESUME_KEYWORD_PATTERN.findall(text)"]
S["summary.lower()"] --> T["_SUMMARY_KEYWORD_PATTERN.findall(summary)"]
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The
_JSON_PATTERN = re.compile(r"\[.*\]", flags=re.DOTALL)is a greedy match and can overrun or be unnecessarily expensive on long responses; consider making it non-greedy (e.g.r"\[.*?\]") or anchoring it more tightly around the expected JSON segment. - Now that
_get_all_textreturns case-preserved text, it might be worth quickly scanning other call sites in this module to see if any still implicitly rely on the old lowercased behavior and should explicitly call.lower()like_check_readabilitydoes.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `_JSON_PATTERN = re.compile(r"\[.*\]", flags=re.DOTALL)` is a greedy match and can overrun or be unnecessarily expensive on long responses; consider making it non-greedy (e.g. `r"\[.*?\]"`) or anchoring it more tightly around the expected JSON segment.
- Now that `_get_all_text` returns case-preserved text, it might be worth quickly scanning other call sites in this module to see if any still implicitly rely on the old lowercased behavior and should explicitly call `.lower()` like `_check_readability` does.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
💡 What:
_TABLE_PATTERN,_SPECIAL_CHARS_PATTERN,_EMAIL_PATTERN,_PHONE_PATTERN,_QUANTIFIABLE_PATTERN,_ACRONYM_PATTERN,_JSON_PATTERN,_RESUME_KEYWORD_PATTERN,_SUMMARY_KEYWORD_PATTERN) to module-level constants incli/generators/ats_generator.py._ACTION_VERBSto a module-level constant._get_all_textto preserve string case, resolving previous regressions around acronym searches which require uppercase characters._check_readabilityto cache.lower()text outside loop comprehensions.🎯 Why:
The ATS Generator frequently recalculates complex logic and dynamically rebuilds regular expressions during execution. Specifically,
_check_readabilitywas doingsum(1 for verb in action_verbs if verb in all_text.lower()), recalculating.lower()inside a loop. The regex patterns were being continually constructed via string inputs on every parse cycle.📊 Impact:
🔬 Measurement:
pytest tests/test_ats_generator.py(passes)make formatandmake lintconfirm no breakages.PR created automatically by Jules for task 1347257136309222624 started by @anchapin
Summary by Sourcery
Hoist frequently used regex patterns and static data in the ATS generator to module scope and adjust text handling to improve performance and correctness.
Enhancements:
Documentation:
Tests: