perf: Speed up method LayoutPostprocessor._process_special_clusters by 653% #1952
Open: mohammedahmed18 wants to merge 5 commits into `docling-project:main` from `mohammedahmed18:codeflash/optimize-LayoutPostprocessor._process_special_clusters-mcu3u6n5`
Here are targeted optimizations based on the profiling output and the code.

### Major bottlenecks & optimization strategies

#### 1. `_process_special_clusters`

- **Main bottleneck:**
  - The nested loop: for each special cluster, loop over all regular clusters and compute `.bbox.intersection_over_self(special.bbox)`.
  - This is `O(N*M)` for N special and M regular clusters and is by far the slowest part.
- **Optimization:**
  - **Pre-index regular clusters by bounding box for fast containment:** build a simple R-tree-like spatial grid (using bins, or just a fast bbox filtering pass) to discard regular clusters that definitely do not overlap before running the expensive geometric calculation.
  - **If a spatial index is unavailable:** pre-filter the regular clusters to those whose bbox intersects the special cluster's bbox (quick min/max bbox checks), which greatly reduces the number of pairwise calculations.

#### 2. `_handle_cross_type_overlaps`

- **Similar bottleneck:** again, every regular cluster is checked for every wrapper.
- The same bbox quick-check applies.

#### 3. Miscellaneous

- **`_deduplicate_cells`/`_sort_cells` optimizations:** minor, but batch sort/unique patterns can help.
- **Avoid recomputation:** do not recompute thresholds or constants inside hot loops.

Below is the optimized code addressing the biggest `O(N*M)` loop, using a fast bbox intersection check for quick rejection before the expensive calculation. This is done purely with local logic inside the function (no external indices needed) and without introducing module-level classes. Comments in the code mark all changes.

**Summary of changes:**

- For both `_process_special_clusters` and `_handle_cross_type_overlaps`, unnecessary `.intersection_over_self` calculations are avoided by pre-filtering clusters with a simple bbox intersection test (`l < rx and r > lx and t < by and b > ty`).
- The pairwise loop is still `O(N*M)`, but it becomes a two-stage filter: the first stage costs only four comparisons per pair, and the expensive ratio is computed only for pairs whose boxes actually overlap, which in typical layouts is a small fraction of all pairs.
- All hot-spot loops now use local variables instead of repeated attribute lookups.
- No changes are made to APIs, outputs, or major logic branches; only faster candidate filtering is introduced.

This should reduce the total runtime of `_process_special_clusters` and `_handle_cross_type_overlaps` by an order of magnitude on large documents.
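A minimal sketch of the two-stage filter described above, assuming a hypothetical `Box` type with `l`, `t`, `r`, `b` edges and a top-left origin (so `t < b`). The function name `contained_regulars` and the `0.8` default threshold are illustrative and not taken from docling; `intersection_over_self` here only mimics the ratio the description names (intersection area divided by the calling box's own area) so the example runs on its own.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in for a bounding box with a top-left origin (t < b)."""

    l: float
    t: float
    r: float
    b: float

    def area(self) -> float:
        return max(0.0, self.r - self.l) * max(0.0, self.b - self.t)

    def intersection_over_self(self, other: "Box") -> float:
        """Fraction of this box's area that `other` covers (the expensive check)."""
        w = min(self.r, other.r) - max(self.l, other.l)
        h = min(self.b, other.b) - max(self.t, other.t)
        own = self.area()
        if w <= 0 or h <= 0 or own <= 0:
            return 0.0
        return (w * h) / own


def contained_regulars(special: Box, regulars: List[Box], threshold: float = 0.8) -> List[int]:
    """Indices of regular boxes whose coverage by `special` exceeds `threshold`.

    Stage 1 rejects boxes whose extents cannot overlap `special`; stage 2 runs
    the exact ratio only on the survivors. For any threshold >= 0 the result
    equals the brute-force loop, because a rejected box has zero overlap area.
    """
    # Read the special box's edges once, outside the hot loop.
    lx, ty, rx, by = special.l, special.t, special.r, special.b
    hits: List[int] = []
    for i, reg in enumerate(regulars):
        l, t, r, b = reg.l, reg.t, reg.r, reg.b
        # Stage 1: cheap interval-overlap test, same shape as in the PR description.
        if not (l < rx and r > lx and t < by and b > ty):
            continue
        # Stage 2: exact coverage ratio, computed only for candidate pairs.
        if reg.intersection_over_self(special) > threshold:
            hits.append(i)
    return hits


special = Box(0, 0, 100, 100)
regulars = [Box(10, 10, 20, 20), Box(200, 200, 210, 210), Box(90, 90, 110, 110), Box(5, 5, 95, 95)]
print(contained_regulars(special, regulars))  # [0, 3]
```

The same check can guard the wrapper-versus-regular loop in `_handle_cross_type_overlaps`; only the pair of boxes being compared changes.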
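The strategy list also mentions a binned spatial grid as an alternative to the plain pre-filter. The sketch below shows one way a uniform grid could produce candidates; it is not what this PR implements (the PR uses the per-pair bbox check), it is simpler than a true R-tree, and `Box`, `build_grid`, `grid_candidates` and the cell size are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, NamedTuple, Tuple


class Box(NamedTuple):
    """Hypothetical bounding box with a top-left origin (t < b)."""

    l: float
    t: float
    r: float
    b: float


GridKey = Tuple[int, int]


def grid_cells(box: Box, cell: float) -> Iterator[GridKey]:
    """Yield every grid cell that the box's extent touches."""
    for gx in range(int(box.l // cell), int(box.r // cell) + 1):
        for gy in range(int(box.t // cell), int(box.b // cell) + 1):
            yield (gx, gy)


def build_grid(boxes: List[Box], cell: float) -> Dict[GridKey, List[int]]:
    """Bin each box index into all the cells its extent touches."""
    grid: Dict[GridKey, List[int]] = defaultdict(list)
    for i, box in enumerate(boxes):
        for key in grid_cells(box, cell):
            grid[key].append(i)
    return grid


def grid_candidates(grid: Dict[GridKey, List[int]], query: Box, cell: float) -> List[int]:
    """Indices of boxes sharing at least one cell with `query`, in original order.

    This is a superset of the boxes that overlap `query`, so the exact
    bbox test still has to run on each candidate.
    """
    found = set()
    for key in grid_cells(query, cell):
        # .get() avoids inserting empty lists into the defaultdict.
        found.update(grid.get(key, ()))
    return sorted(found)


boxes = [Box(0, 0, 10, 10), Box(500, 500, 510, 510), Box(95, 95, 105, 105)]
grid = build_grid(boxes, 100.0)
print(grid_candidates(grid, Box(0, 0, 50, 50), 100.0))  # [0, 2]
```

The cell size is the main tuning knob: with very small cells a large box is copied into many bins, and with very large cells each query returns most of the page.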
DCO Check Passed: Thanks @mohammedahmed18, all your commits are properly signed off.
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.

- Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
- …, mohammed <[email protected]>, hereby add my Signed-off-by to this commit: d982474
  Signed-off-by: mohammed <[email protected]>
- …bot]@users.noreply.github.com>
  I, codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>, hereby add my Signed-off-by to this commit: 3b8deae
  I, mohammed <[email protected]>, hereby add my Signed-off-by to this commit: bd8b1c4
  I, mohammed <[email protected]>, hereby add my Signed-off-by to this commit: 7b84668
  I, mohammed <[email protected]>, hereby add my Signed-off-by to this commit: ad90f33
  Signed-off-by: mohammed <[email protected]>
653% (6.53x) speedup for `LayoutPostprocessor._process_special_clusters` in `docling/utils/layout_postprocessor.py`

⏱️ Runtime: 236 milliseconds → 31.3 milliseconds (best of 43 runs)
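The two runtimes make the headline figure easy to check. The reported 653% appears to measure the time saved relative to the optimized runtime, while the plain ratio of the two runtimes is about 7.5. With the displayed (rounded) values:

```math
\frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{after}}} = \frac{236 - 31.3}{31.3} \approx 6.54,
\qquad
\frac{t_{\text{before}}}{t_{\text{after}}} = \frac{236}{31.3} \approx 7.54
```

The small gap between 6.54 and the reported 6.53 is consistent with the runtimes having been rounded for display.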
Correctness verification report:

- Generated Regression Tests and Runtime
To edit these changes, run `git checkout codeflash/optimize-LayoutPostprocessor._process_special_clusters-mcu3u6n5` and push.