add polylines module: simplify + closest point queries #57
Conversation
- Douglas-Peucker simplification in C++
- AABB tree batch closest point queries (CGAL)
- Python wrapper with tests
@petrasvestartas can you help review the technical aspects of the binding? thanks!
@jf--- without an entry in the changelog, the checks will not complete...
Pull request overview
This PR adds a new polylines module to compas_cgal providing polyline simplification using the Douglas-Peucker algorithm and efficient batch closest point queries using CGAL's AABB tree.
Key Changes:
- Implements Douglas-Peucker simplification in C++ for 2D/3D polylines with configurable distance threshold
- Adds AABB tree-based closest point queries for batch processing of query points against a polyline
- Provides Python wrappers with the functions `simplify_polyline`, `simplify_polylines`, and `closest_points_on_polyline` (see the usage sketch below)
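A minimal usage sketch of the new API, assuming the functions are importable from `compas_cgal.polylines` as the file overview below indicates; argument names follow the tests added in this PR:

```python
import numpy as np
from compas_cgal.polylines import simplify_polyline, closest_points_on_polyline

# Simplify a noisy 2D polyline: the middle point deviates by only 0.01,
# well under the 0.1 threshold, so Douglas-Peucker drops it.
polyline = [[0, 0], [1, 0.01], [2, 0]]
simplified = simplify_polyline(polyline, threshold=0.1)
assert len(simplified) == 2

# Batch closest point queries against a polyline (CGAL AABB tree under the hood).
queries = [[5, 5], [12, 1]]
closest = closest_points_on_polyline(queries, [[0, 0], [10, 0]])
assert np.allclose(closest[0], [5, 0])   # projected onto the segment
assert np.allclose(closest[1], [10, 0])  # clamped to the segment end
```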
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| tests/test_polylines.py | Comprehensive test suite covering simplification and closest point functionality with various edge cases |
| src/polylines.h | C++ header declaring the two main functions with documentation |
| src/polylines.cpp | C++ implementation of Douglas-Peucker algorithm and AABB tree closest point queries with Python bindings |
| src/compas_cgal/polylines.py | Python wrapper module providing user-facing API with type hints and documentation |
| `src/compas_cgal/__init__.py` | Registers the new polylines module for lazy loading |
| CMakeLists.txt | Adds the polylines module to the build system |
| CHANGELOG.md | Documents the new features in the unreleased section |
```cpp
static double perpendicular_distance(
    double px, double py,
    double x1, double y1, double x2, double y2)
{
    double dx = x2 - x1;
    double dy = y2 - y1;
    double len_sq = dx * dx + dy * dy;

    if (len_sq < 1e-12) {
        // Degenerate segment
        return std::sqrt((px - x1) * (px - x1) + (py - y1) * (py - y1));
    }

    // Project point onto line
    double t = ((px - x1) * dx + (py - y1) * dy) / len_sq;
    t = std::max(0.0, std::min(1.0, t));

    double proj_x = x1 + t * dx;
    double proj_y = y1 + t * dy;

    return std::sqrt((px - proj_x) * (px - proj_x) + (py - proj_y) * (py - proj_y));
}
```
Copilot AI (Dec 10, 2025):
The Douglas-Peucker implementation only uses X and Y coordinates (2D projection) for distance calculations, even when working with 3D polylines. This means that simplification will not consider deviations in the Z dimension.
For example, a 3D polyline like [[0,0,0], [1,0,5], [2,0,0]] would be simplified to just start and end points if the threshold is 0.1, even though the middle point has significant Z deviation.
This should be documented clearly, or the algorithm should be extended to handle 3D distances. The function documentation in the header file (line 11) and Python docstrings should specify that simplification is performed in the XY plane only.
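As an illustration of the extension the comment asks about, here is a minimal numpy sketch (not code from this PR; the helper name is hypothetical) of a point-to-segment distance that takes all three coordinates into account:

```python
import numpy as np

def perpendicular_distance_3d(p, a, b, eps=1e-12):
    """Distance from point p to segment a-b, using all coordinates (hypothetical helper)."""
    p, a, b = np.asarray(p, float), np.asarray(a, float), np.asarray(b, float)
    ab = b - a
    len_sq = float(ab @ ab)
    if len_sq < eps:
        # Degenerate segment: fall back to point-to-point distance.
        return float(np.linalg.norm(p - a))
    # Project onto the segment and clamp the parameter to [0, 1].
    t = float(np.clip((p - a) @ ab / len_sq, 0.0, 1.0))
    return float(np.linalg.norm(p - (a + t * ab)))

# The example from the comment: the middle point deviates 5 units in Z,
# so a 3D-aware simplification with threshold 0.1 would keep it.
print(perpendicular_distance_3d([1, 0, 5], [0, 0, 0], [2, 0, 0]))  # 5.0
```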
src/polylines.h (outdated)
```cpp
 * @brief Simplify multiple 2D polylines using Douglas-Peucker algorithm.
 *
 * @param polylines Vector of polylines, each as Nx2 or Nx3 matrix (float64)
 * @param threshold Squared distance threshold for simplification
```
Copilot AI (Dec 10, 2025):
The docstring states "Squared distance threshold" but the implementation uses linear distance. In the Douglas-Peucker algorithm (line 63 in polylines.cpp), max_dist is compared directly with threshold, and perpendicular_distance returns the Euclidean distance (with sqrt), not squared distance. The documentation should say "Distance threshold" instead.
Suggested change:

```diff
- * @param threshold Squared distance threshold for simplification
+ * @param threshold Distance threshold for simplification
```
| """ | ||
| arrays = [np.asarray(p, dtype=np.float64) for p in polylines] |
Copilot AI (Dec 10, 2025):
Missing input validation. The function should validate that:
- `threshold` is non-negative (negative thresholds don't make sense for distance)
- `polylines` is not empty
- Each polyline has valid dimensions (at least 2 columns)
Without validation, users will get cryptic C++ errors instead of helpful Python exceptions. Consider adding validation before calling the C++ function.
| """ | |
| arrays = [np.asarray(p, dtype=np.float64) for p in polylines] | |
| """ | |
| # Input validation | |
| if threshold < 0: | |
| raise ValueError("threshold must be non-negative") | |
| if not polylines or len(polylines) == 0: | |
| raise ValueError("polylines must not be empty") | |
| arrays = [np.asarray(p, dtype=np.float64) for p in polylines] | |
| for i, arr in enumerate(arrays): | |
| if arr.ndim != 2 or arr.shape[1] < 2: | |
| raise ValueError( | |
| f"Each polyline must be a 2D array with at least 2 columns (got shape {arr.shape} for polyline {i})" | |
| ) |
| """ | ||
| queries = np.asarray(query_points, dtype=np.float64) | ||
| poly = np.asarray(polyline, dtype=np.float64) |
Copilot AI (Dec 10, 2025):
Missing input validation. The function should validate that:
- `query_points` and `polyline` are not empty
- Both inputs have valid dimensions (at least 2 columns)
- Dimensions are compatible (both 2D or both 3D, or handle mixed cases explicitly)
Without validation, users will get cryptic C++ errors instead of helpful Python exceptions. Consider adding validation before calling the C++ function.
Suggested change:

```diff
- poly = np.asarray(polyline, dtype=np.float64)
+ poly = np.asarray(polyline, dtype=np.float64)
+ # Input validation
+ if queries.size == 0:
+     raise ValueError("query_points is empty.")
+ if poly.size == 0:
+     raise ValueError("polyline is empty.")
+ if queries.ndim != 2 or poly.ndim != 2:
+     raise ValueError("Both query_points and polyline must be 2D arrays.")
+ if queries.shape[1] < 2 or poly.shape[1] < 2:
+     raise ValueError("Both query_points and polyline must have at least 2 columns (2D or 3D points).")
+ if queries.shape[1] != poly.shape[1]:
+     raise ValueError(
+         f"Dimension mismatch: query_points has {queries.shape[1]} columns, polyline has {poly.shape[1]} columns."
+     )
```
usually not a fan of input validation (ask for forgiveness not for permission), but this is negligible overhead so worth considering.
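For comparison, a hypothetical EAFP-style wrapper (a sketch only: the exact signature of `simplify_polylines` and the exception types raised by the binding layer are assumptions):

```python
import numpy as np
from compas_cgal.polylines import simplify_polylines

def simplify_polylines_forgiving(polylines, threshold):
    """EAFP variant: attempt the call, then translate low-level failures into a clearer error."""
    arrays = [np.asarray(p, dtype=np.float64) for p in polylines]
    try:
        return simplify_polylines(arrays, threshold)
    except (RuntimeError, TypeError, ValueError) as exc:
        # Re-raise with a message that points at the Python-level inputs.
        raise ValueError(f"invalid polylines/threshold input: {exc}") from exc
```

Either style ends in a readable Python error; the explicit checks suggested above just make the failure mode independent of whatever the C++ layer happens to throw.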
i would indeed take the copilot review just as a bunch of suggestions. i leave it up to you what to do with them...
```python
def test_simplify_with_z_coords(self):
    """Simplification should work with 3D points (z preserved)."""
    polyline = [
        [0, 0, 1],
        [1, 0.01, 2],
        [2, 0, 3],
    ]
    result = simplify_polyline(polyline, threshold=0.1)
    assert result.shape[1] == 3  # Preserves 3 columns
    assert len(result) == 2  # Middle point removed
```
Copilot AI (Dec 10, 2025):
Test doesn't verify that Z coordinates are preserved correctly during simplification. The test checks that the result has 3 columns and 2 rows, but doesn't verify what the Z values actually are. Given that simplification only considers XY distance (see polylines.cpp lines 13-34), the Z coordinates should be preserved from the kept points.
Consider adding assertions like:
```python
assert np.allclose(result[0], [0, 0, 1])  # First point preserved with its Z
assert np.allclose(result[1], [2, 0, 3])  # Last point preserved with its Z
```

```cpp
if (n_poly == 0) {
    result.setZero();
    return result;
```
Copilot AI (Dec 10, 2025):
Missing handling for empty query points. When n_queries == 0, the function will return an empty result matrix, which is correct. However, when n_poly == 0 (line 132), the result is filled with zeros which may not be the expected behavior. Consider either:
- Throwing an error for invalid input (empty polyline)
- Documenting this behavior clearly
The test suite should also cover the case of empty query points to ensure it behaves as expected.
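A sketch of such a test, assuming the current behavior (an empty result matrix for an empty query array) is the intended one:

```python
import numpy as np
from compas_cgal.polylines import closest_points_on_polyline

def test_closest_points_empty_queries():
    """An empty query array should produce an empty result (assumed behavior)."""
    polyline = [[0, 0], [10, 0]]
    queries = np.empty((0, 2), dtype=np.float64)
    result = closest_points_on_polyline(queries, polyline)
    assert result.shape[0] == 0
```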
```python
def test_closest_point_on_segment(self):
    """Find closest points on a single segment."""
    polyline = [[0, 0], [10, 0]]
    queries = [[5, 5]]
    result = closest_points_on_polyline(queries, polyline)
    # Closest point on horizontal segment to (5, 5) is (5, 0)
    assert np.allclose(result[0], [5, 0], atol=1e-6)
```
Copilot AI (Dec 10, 2025):
Test assumes 2D output but doesn't validate Z coordinates were removed. When query points are 3D and polyline is 2D (or vice versa), the function uses std::min(q_cols, p_cols) for output dimensions (line 128 in polylines.cpp). The test should verify that:
- The Z coordinate handling is correct when mixing 2D/3D inputs
- The result dimensions match expectations
Consider adding a test case like:
```python
polyline_2d = [[0, 0], [10, 0]]
queries_3d = [[5, 5, 10]]
result = closest_points_on_polyline(queries_3d, polyline_2d)
assert result.shape == (1, 2)  # Output should be 2D
```
src/polylines.cpp (outdated)
```cpp
// Recursive Douglas-Peucker
static void rdp_simplify(
    const compas::RowMatrixXd& polyline,
    int start, int end,
    double threshold,
    std::vector<bool>& keep)
{
    if (end <= start + 1) return;

    double max_dist = 0.0;
    int max_idx = start;

    double x1 = polyline(start, 0);
    double y1 = polyline(start, 1);
    double x2 = polyline(end, 0);
    double y2 = polyline(end, 1);

    for (int i = start + 1; i < end; ++i) {
        double dist = perpendicular_distance(
            polyline(i, 0), polyline(i, 1),
            x1, y1, x2, y2);
        if (dist > max_dist) {
            max_dist = dist;
            max_idx = i;
        }
    }

    if (max_dist > threshold) {
        keep[max_idx] = true;
        rdp_simplify(polyline, start, max_idx, threshold, keep);
        rdp_simplify(polyline, max_idx, end, threshold, keep);
```
Copilot AI (Dec 10, 2025):
Potential stack overflow in the recursive algorithm. The Douglas-Peucker implementation uses recursion with depth proportional to the number of points kept; for very large polylines with many features (worst case: O(n) recursion depth), this could overflow the call stack.
Consider either:
- Documenting maximum recommended polyline size
- Implementing an iterative version using an explicit stack
- Adding a depth limit check
This is particularly important since CGAL is often used for large geometric datasets.
Suggested change (replace the recursive version shown above with an iterative one driven by an explicit stack):

```cpp
// Iterative Douglas-Peucker using explicit stack to avoid stack overflow
static void rdp_simplify(
    const compas::RowMatrixXd& polyline,
    int start, int end,
    double threshold,
    std::vector<bool>& keep)
{
    struct Segment {
        int start;
        int end;
    };
    std::vector<Segment> stack;
    stack.push_back({start, end});
    while (!stack.empty()) {
        Segment seg = stack.back();
        stack.pop_back();
        int s = seg.start;
        int e = seg.end;
        if (e <= s + 1) continue;

        double max_dist = 0.0;
        int max_idx = s;
        double x1 = polyline(s, 0);
        double y1 = polyline(s, 1);
        double x2 = polyline(e, 0);
        double y2 = polyline(e, 1);

        for (int i = s + 1; i < e; ++i) {
            double dist = perpendicular_distance(
                polyline(i, 0), polyline(i, 1),
                x1, y1, x2, y2);
            if (dist > max_dist) {
                max_dist = dist;
                max_idx = i;
            }
        }

        if (max_dist > threshold) {
            keep[max_idx] = true;
            stack.push_back({s, max_idx});
            stack.push_back({max_idx, e});
        }
```
I think this is the most valuable insight; will look into that.
i think also the "by ref" suggestion is valid. but we have to check how this works with the nanobind mechanism (in pybind you have to mark the parameters as "opaque" to avoid copying).
```python
import pytest
```
Copilot AI (Dec 10, 2025):
Import of 'pytest' is not used.
Suggested change:

```diff
- import pytest
```
Co-authored-by: Copilot <[email protected]>
@tomvanmele quite helpful the
jf--- left a comment:
> i think also the "by ref" suggestion is valid. but we have to check how this works with the nanobind mechanism (in pybind you have to mark the parameters as "opaque" to avoid copying).
i missed this important insight.
@tomvanmele #59 addresses inter-module consistency w.r.t. how np arrays are handled.
This PR is missing examples in this folder: You need to make:
The section that users will most likely use is this:
@petrasvestartas @jf--- maybe we can merge as is and add the example later? would like to switch to mkdocs first. should be done later today...
I added documentation and changed the Eigen ref. When the build is finished, it is ready to be merged.