feat(expr-ir): Impl rank, with_row_index_by, over(*partition_by, order_by=...)
#3295
Merged
Conversation
Will work through these in order (rough usage sketch below):
1. `Expr.rank()`
2. `DataFrame.with_row_index(order_by=...)`
3. `Expr.over(*partition_by, order_by=...)`
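A minimal sketch of what those three calls could look like from the user side. Everything here is illustrative: the example data, the polars backend, and the defaults are assumptions; only the method names and the new `order_by` parameters come from this PR.

```python
import polars as pl

import narwhals as nw

data = {"p1": ["a", "a", "b", "b"], "o1": [2, 1, 4, 3], "v1": [10, 20, 30, 40]}
df = nw.from_native(pl.DataFrame(data))

df.select(nw.col("v1").rank())  # 1. Expr.rank()
df.with_row_index(order_by="o1")  # 2. DataFrame.with_row_index(order_by=...)
df.with_columns(nw.col("v1").cum_sum().over("p1", order_by="o1"))  # 3. Expr.over(*partition_by, order_by=...)
```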
Already working, but should be able to optimize by passing `order_by` to `pc.RankOptions(sort_keys=...)` instead
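For reference, a minimal `pyarrow` sketch of ranking with an explicit `RankOptions` (made-up data, single-array input; whether multi-column `sort_keys` can carry the full `order_by` here depends on the `pyarrow` version):

```python
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.chunked_array([[12, 1, None, 2]])
# RankOptions bundles sort order, null placement, and tie handling in one object;
# tiebreaker="min" corresponds roughly to polars' method="min".
opts = pc.RankOptions(sort_keys="descending", null_placement="at_end", tiebreaker="min")
pc.rank(arr, options=opts)  # ranks in descending order, with the null ranked last
```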
Adapted from #3292
Pure stubs issue
- `DataFrame.sort` is a deviation from polars, which makes this a no-op
- `with_row_index` already deviates from `polars`
- This just ensures we always have a column for the lazy case
(Pending existing issue in #3300)
Related, but managed to do the inverse of it 😭 pola-rs/polars#24989
Mostly happy with the options stuff now. The main bit of work is rewrapping native functions/idioms (so I don't need to think about array indices every 5 seconds 😭)
Man there is sooooo much weird stuff in here
- The messy stuff has been because of gaps in the API
- Filling those out some more at each level gives more to work with
100% losing my mind
dangotbanned
commented
Nov 16, 2025
The motivation for the previous version was to avoid the bug where `descending` was ignored (which is now fixed!)
This can serve as a model for any other complex special-casing for windows:
- If ordering is required, add an optional `sort_indices` parameter so they can be handled at the right time for the special case (sketched below)
- Otherwise, just let `over_ordered` handle it
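Not the actual narwhals internals, just a sketch of that pattern with assumed names: the special-cased kernel accepts precomputed `sort_indices`, evaluates in that order, and scatters the result back to the original row positions.

```python
from __future__ import annotations

import pyarrow as pa
import pyarrow.compute as pc


def cum_sum_window(
    values: pa.ChunkedArray, sort_indices: pa.Array | None = None
) -> pa.ChunkedArray:
    """Hypothetical ordered window kernel (illustrative, not narwhals code)."""
    if sort_indices is None:
        # no sort_indices supplied: plain path (ordering, if any, handled elsewhere)
        return pc.cumulative_sum(values)
    # evaluate in the requested order ...
    ordered = pc.cumulative_sum(pc.take(values, sort_indices))
    # ... then restore the original row order (argsort of a permutation is its inverse)
    return pc.take(ordered, pc.sort_indices(sort_indices))
```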
Just need to add the `pyarrow` bit now
Finishes off the partial support added earlier in the PR
dangotbanned
commented
Nov 18, 2025
Comment on lines +347 to +371
```python
@pytest.fixture(scope="module")
def data_order() -> Mapping[str, list[NonNestedLiteral]]:
    return {
        "o1": [0, 1, 2, 3],
        "o2": ["y", "y", "x", "a"],
        "o3": [None, 5, 2, 5],
        "o4": ["L", "M", "A", None],
        "o5": [1, None, None, -1],
        "v1": [12, 1, 5, 2],
        "v2": ["under", "water", "unicorn", "magic"],
        "v3": [5.9, 1.2, 22.9, 999.1],
    }


def order_case(
    columns: ValueColumn | list[ValueColumn],
    aggregation: Agg,
    /,
    order_by: OrderColumn | Sequence[OrderColumn],
    *,
    descending: bool = False,
    nulls_last: bool = False,
    expected: NonNestedLiteral | list[NonNestedLiteral],
) -> ParameterSet:
    """Generate `Expr`s and an expected dataset for ordered aggregations.
```
Member
Author
Follow-up
I'd like to use both `data_order` and `order_case` in more tests.
The idea stems from #3300 (comment).
- Add some columns prefixed with `"p"` for partitions (a possible shape is sketched after this list)
  - They should be arranged to support lots of different groups
  - And also across different types + nulls
- Support things which aren't `first` or `last`
- Support `with_columns` (broadcasting behavior)
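Purely illustrative (not part of the PR): one possible shape for those partition columns, to pair with the 4-row `data_order` fixture above.

```python
# Hypothetical partition columns; names and values are made up.
data_partitions: dict[str, list[object]] = {
    "p1": ["x", "x", "y", "y"],        # two equal-sized groups
    "p2": [1, 1, 1, None],             # unbalanced groups plus a null key
    "p3": [True, False, None, False],  # a different dtype, also with a null
}
```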
Renamed `is_column`, since it is ambiguous alongside all the other `ExprIR` guards #3295 (comment)
Tracking
- `over()` with nulls #3300

Related issues
- `ExprIR` #2572
- `CompliantExpr` #3304, feat(expr-ir): Add `Series.scatter` #3305

Tasks
- `Expr.rank`
- `ArrowExpr.rank(method="average")`
- `Expr.rank().over(order_by=...)`
- `DataFrame.with_row_count(order_by=...)`
- `Expr.over(*partition_by, order_by=...)`
- `over(*partition_by, order_by=...)`
- `over(*partition_by)`
- `over(null_last=...)`
- `nulls_last`
- `null_last`, enforce `polars` default
- `arrow.functions.scatter`
- `ArrowExpr.{sort,sort_by,over_ordered}`
- `ArrowDataFrame.{sort,with_row_index_by}`
- `Expr.is_{first,last}_distinct`
  - `.over(...)`
  - `.over(order_by=...)`
  - `.over(*partition_by)`
  - `.over(*partition_by, order_by=...)`
  - `.over(*partition_by, order_by=..., descending=...)`
  - `.over(*partition_by, order_by=..., descending=..., null_last=...)`
- `Expr` in `.over(*partition_by)` (same rules as `group_by(*keys)`)
- `Expr`
- `Expr.is_in`