feat(expr-ir): Impl rank, with_row_index_by, over(*partition_by, order_by=...)
#3295
Merged
Conversation
Will work through these in order (rough usage sketch below):
1. `Expr.rank()`
2. `DataFrame.with_row_index(order_by=...)`
3. `Expr.over(*partition_by, order_by=...)`
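A minimal sketch of what those three calls could look like from the user side. Everything here is illustrative: the example data, the polars backend, and the defaults are assumptions; only the method names and the new `order_by` parameters come from this PR.

```python
import polars as pl

import narwhals as nw

data = {"p1": ["a", "a", "b", "b"], "o1": [2, 1, 4, 3], "v1": [10, 20, 30, 40]}
df = nw.from_native(pl.DataFrame(data))

df.select(nw.col("v1").rank())  # 1. Expr.rank()
df.with_row_index(order_by="o1")  # 2. DataFrame.with_row_index(order_by=...)
df.with_columns(nw.col("v1").cum_sum().over("p1", order_by="o1"))  # 3. Expr.over(*partition_by, order_by=...)
```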
Already working, but should be able to optimize by passing `order_by` to `pc.RankOptions(sort_keys=...)` instead
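For reference, a minimal `pyarrow` sketch of ranking with an explicit `RankOptions` (made-up data, single-array input; whether multi-column `sort_keys` can carry the full `order_by` here depends on the `pyarrow` version):

```python
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.chunked_array([[12, 1, None, 2]])
# RankOptions bundles sort order, null placement, and tie handling in one object;
# tiebreaker="min" corresponds roughly to polars' method="min".
opts = pc.RankOptions(sort_keys="descending", null_placement="at_end", tiebreaker="min")
pc.rank(arr, options=opts)  # ranks in descending order, with the null ranked last
```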
Adapted from #3292
Pure stubs issue
- `DataFrame.sort` is a deviation from polars, which makes this a no-op
- `with_row_index` already deviates from `polars`
- This just ensures we always have a column for the lazy case
(Pending existing issue in #3300)
Related, but managed to do the inverse of it 😭 pola-rs/polars#24989
Mostly happy with the options stuff now. The main bit of work is rewrapping native functions/idioms (so I don't need to think about array indices every 5 seconds 😭)
Man there is sooooo much weird stuff in here
- The messy stuff has been because of gaps in the API
- Filling those out some more at each level gives more to work with
100% losing my mind
dangotbanned
commented
Nov 16, 2025
The motivation for the previous version was to avoid the bug where `descending` was ignored (which is now fixed!)
This can serve as a model for any other complex special-casing for windows:
- If ordering is required, add an optional `sort_indices` parameter so they can be handled at the right time for the special case (sketched below)
- Otherwise, just let `over_ordered` handle it
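Not the actual narwhals internals, just a sketch of that pattern with assumed names: the special-cased kernel accepts precomputed `sort_indices`, evaluates in that order, and scatters the result back to the original row positions.

```python
from __future__ import annotations

import pyarrow as pa
import pyarrow.compute as pc


def cum_sum_window(
    values: pa.ChunkedArray, sort_indices: pa.Array | None = None
) -> pa.ChunkedArray:
    """Hypothetical ordered window kernel (illustrative, not narwhals code)."""
    if sort_indices is None:
        # no sort_indices supplied: plain path (ordering, if any, handled elsewhere)
        return pc.cumulative_sum(values)
    # evaluate in the requested order ...
    ordered = pc.cumulative_sum(pc.take(values, sort_indices))
    # ... then restore the original row order (argsort of a permutation is its inverse)
    return pc.take(ordered, pc.sort_indices(sort_indices))
```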
Just need to add the `pyarrow` bit now
Finishes off the partial support added earlier in the PR
dangotbanned
commented
Nov 18, 2025
Comment on lines +347 to +371
```python
@pytest.fixture(scope="module")
def data_order() -> Mapping[str, list[NonNestedLiteral]]:
    return {
        "o1": [0, 1, 2, 3],
        "o2": ["y", "y", "x", "a"],
        "o3": [None, 5, 2, 5],
        "o4": ["L", "M", "A", None],
        "o5": [1, None, None, -1],
        "v1": [12, 1, 5, 2],
        "v2": ["under", "water", "unicorn", "magic"],
        "v3": [5.9, 1.2, 22.9, 999.1],
    }


def order_case(
    columns: ValueColumn | list[ValueColumn],
    aggregation: Agg,
    /,
    order_by: OrderColumn | Sequence[OrderColumn],
    *,
    descending: bool = False,
    nulls_last: bool = False,
    expected: NonNestedLiteral | list[NonNestedLiteral],
) -> ParameterSet:
    """Generate `Expr`s and an expected dataset for ordered aggregations.
```
Member
Author
Follow-up
I'd like to use both `data_order` and `order_case` in more tests.
The idea stems from #3300 (comment).
- Add some columns prefixed with `"p"` for partitions (a possible shape is sketched after this list)
  - They should be arranged to support lots of different groups
  - And also across different types + nulls
- Support things which aren't `first` or `last`
- Support `with_columns` (broadcasting behavior)
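Purely illustrative (not part of the PR): one possible shape for those partition columns, to pair with the 4-row `data_order` fixture above.

```python
# Hypothetical partition columns; names and values are made up.
data_partitions: dict[str, list[object]] = {
    "p1": ["x", "x", "y", "y"],        # two equal-sized groups
    "p2": [1, 1, 1, None],             # unbalanced groups plus a null key
    "p3": [True, False, None, False],  # a different dtype, also with a null
}
```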
Renamed `is_column`, since it is ambiguous alongside all the other `ExprIR` guards #3295 (comment)
Tracking
- `over()` with nulls #3300

Related issues
- `ExprIR` #2572
- `CompliantExpr` #3304, feat(expr-ir): Add `Series.scatter` #3305

Tasks
- `Expr.rank`
- `ArrowExpr.rank(method="average")`
- `Expr.rank().over(order_by=...)`
- `DataFrame.with_row_count(order_by=...)`
- `Expr.over(*partition_by, order_by=...)`
- `over(*partition_by, order_by=...)`
- `over(*partition_by)`
- `over(null_last=...)`
- `nulls_last`
- `null_last`, enforce `polars` default
- `arrow.functions.scatter`
- `ArrowExpr.{sort,sort_by,over_ordered}`
- `ArrowDataFrame.{sort,with_row_index_by}`
- `Expr.is_{first,last}_distinct`
  - `.over(...)`
  - `.over(order_by=...)`
  - `.over(*partition_by)`
  - `.over(*partition_by, order_by=...)`
  - `.over(*partition_by, order_by=..., descending=...)`
  - `.over(*partition_by, order_by=..., descending=..., null_last=...)`
- `Expr` in `.over(*partition_by)` (same rules as `group_by(*keys)`)
- `Expr`
- `Expr.is_in`