[TST] More benchmark queries for regex #4910
Reviewer Checklist
Please leverage this checklist to ensure your code review is thorough before approving.
Testing, Bugs, Errors, Logs, Documentation
System Compatibility
Quality
Benchmark Expansion for Regex Patterns and Rust Dataset; Arrow/Parquet Ecosystem Updates

This PR makes major changes to the benchmarking/integration suite for regex and full-text search functionality by expanding the coverage and diversity of regular expressions used in tests and benchmarks, primarily focusing on realistic Rust code patterns. It introduces and wires up a new Rust dataset based on the BigCode 'the-stack-dedup' corpus, reworks the relevant dataset loader logic for async streaming from HuggingFace/hf-hub, and adapts the benchmark routines to leverage this dataset. There is also a substantial update to the Arrow and Parquet-related dependencies (from 52.x to 55.1) across the workspace, resulting in a large ...

This summary was automatically generated by @propel-code-bot
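To illustrate the kind of regex benchmark query the summary describes, here is a minimal sketch using only Python's standard library. It is not the PR's benchmark code: the snippet corpus, the pattern names, and the helpers `count_matches` and `run_benchmark` are all hypothetical stand-ins, and the real suite streams Rust sources from the 'the-stack-dedup' dataset rather than using an in-memory list.

```python
import re
import time

# Hypothetical stand-in corpus; the PR itself uses a Rust dataset
# derived from BigCode 'the-stack-dedup'.
RUST_SNIPPETS = [
    'fn main() { println!("hello"); }',
    "pub async fn fetch(url: &str) -> Result<String, Error> { todo!() }",
    "impl Display for Point { fn fmt(&self, f: &mut Formatter) -> fmt::Result { Ok(()) } }",
    "let x: u32 = 42;",
]

# Hypothetical patterns resembling realistic Rust code searches.
PATTERNS = {
    "fn_def": r"\bfn\s+\w+\s*\(",
    "async_fn": r"\basync\s+fn\s+\w+",
    "impl_for": r"\bimpl\s+\w+\s+for\s+\w+",
    "macro_call": r"\b\w+!\s*\(",
}


def count_matches(pattern, docs):
    """Return how many documents contain at least one match."""
    compiled = re.compile(pattern)
    return sum(1 for doc in docs if compiled.search(doc))


def run_benchmark(patterns, docs):
    """Map each pattern name to (matching document count, elapsed seconds)."""
    results = {}
    for name, pattern in patterns.items():
        start = time.perf_counter()
        hits = count_matches(pattern, docs)
        elapsed = time.perf_counter() - start
        results[name] = (hits, elapsed)
    return results


if __name__ == "__main__":
    for name, (hits, elapsed) in run_benchmark(PATTERNS, RUST_SNIPPETS).items():
        print(f"{name}: {hits} matching docs in {elapsed:.6f}s")
```

Counting matching documents rather than individual matches mirrors how a regex filter over a document collection is usually evaluated; timings from a corpus this small are not meaningful and only show the shape of the measurement.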
4d57d3b to 99c96f1
Description of changes
Summarize the changes made by this PR.
Test plan
How are these changes tested?
pytest for python, yarn test for js, cargo test for rust
Documentation Changes
Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?