@dangotbanned (Member) commented Nov 9, 2025

Tracking

Related issues

Tasks

@dangotbanned added the `enhancement` and `internal` labels Nov 9, 2025
- `DataFrame.sort` is a deviation from polars, which makes this a no-op
- `with_row_index` already deviates from `polars`
  - This just ensures we always have a column for the lazy case
Related, but managed to do the inverse of it 😭
pola-rs/polars#24989
Mostly happy with the options stuff now
The main bit of work is rewrapping native functions/idioms (so I don't need to think about array indices every 5 seconds 😭)
Man there is sooooo much weird stuff in here
- The messy stuff has been because of gaps in the API
- Filling those out some more at each level gives more to work with
The motivation for the previous version was to avoid the bug where `descending` was ignored (which is now fixed!)
This can serve as a model for any other complex special casing for windows:

- If ordering is required, add an optional `sort_indices` parameter so the ordering can be applied at the right time for the special case
- Otherwise, just let `over_ordered` handle it
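
To make the pattern above concrete, here is a minimal pure-Python sketch. It is not the code from this PR: the names `over_ordered` and `sort_indices` come from the commit message, but their signatures, the list-based data model, and the `first` helper are assumptions made purely for illustration.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence


def over_ordered(
    values: Sequence[Any],
    sort_indices: Sequence[int],
    fn: Callable[[list[Any]], Any],
) -> Any:
    """Generic path (hypothetical signature): reorder the input, then apply `fn`."""
    ordered = [values[i] for i in sort_indices]
    return fn(ordered)


def first(values: Sequence[Any], sort_indices: Sequence[int] | None = None) -> Any:
    """Special case (hypothetical): accepts the ordering itself.

    Only one element is needed, so the function looks it up through
    `sort_indices` instead of materializing the fully reordered column.
    """
    if not values:
        return None
    if sort_indices is None:
        return values[0]
    return values[sort_indices[0]]


values = [12, 1, 5, 2]
indices = [2, 1, 3, 0]  # some precomputed ordering of the rows

# Both paths agree; the special case just applies the ordering later.
special = first(values, indices)
generic = over_ordered(values, indices, lambda xs: xs[0])
```

The point of the optional parameter in this sketch is that the caller computes the ordering once, and each function decides when (or whether) to apply it.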
Just need to add the `pyarrow` bit now
Finishes off the partial support added earlier in the PR
Comment on lines +347 to +371
@pytest.fixture(scope="module")
def data_order() -> Mapping[str, list[NonNestedLiteral]]:
    return {
        "o1": [0, 1, 2, 3],
        "o2": ["y", "y", "x", "a"],
        "o3": [None, 5, 2, 5],
        "o4": ["L", "M", "A", None],
        "o5": [1, None, None, -1],
        "v1": [12, 1, 5, 2],
        "v2": ["under", "water", "unicorn", "magic"],
        "v3": [5.9, 1.2, 22.9, 999.1],
    }


def order_case(
    columns: ValueColumn | list[ValueColumn],
    aggregation: Agg,
    /,
    order_by: OrderColumn | Sequence[OrderColumn],
    *,
    descending: bool = False,
    nulls_last: bool = False,
    expected: NonNestedLiteral | list[NonNestedLiteral],
) -> ParameterSet:
    """Generate `Expr`s and an expected dataset for ordered aggregations.
Member Author

Follow-up

I'd like to use both `data_order` and `order_case` in more tests.

The idea stems from #3300 (comment).

  • Add some columns prefixed with "p" for partitions
    • They should be arranged to support lots of different groups
    • And also cover different types + nulls
  • Support aggregations other than `first` or `last`
  • Support `with_columns` (broadcasting behavior)
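
As a rough illustration of the kind of expected values such ordered test cases encode, the sketch below is a pure-Python reference model of ordered `first`/`last`, using the `o3` and `v1` columns from the fixture. It is not the body of `order_case` (which is not shown here). The helper names are hypothetical, and the null handling (nulls first unless `nulls_last=True`, independent of `descending`) and stable tie-breaking are assumptions of this model, not a statement about any backend.

```python
from __future__ import annotations

from typing import Any, Sequence


def reference_sort_indices(
    order: Sequence[Any], *, descending: bool = False, nulls_last: bool = False
) -> list[int]:
    """Row indices that sort `order` (hypothetical helper, stable for ties)."""
    non_null = [i for i, v in enumerate(order) if v is not None]
    nulls = [i for i, v in enumerate(order) if v is None]
    non_null.sort(key=lambda i: order[i], reverse=descending)
    return non_null + nulls if nulls_last else nulls + non_null


def ordered_first(values: Sequence[Any], order: Sequence[Any], **options: bool) -> Any:
    idx = reference_sort_indices(order, **options)
    return values[idx[0]] if idx else None


def ordered_last(values: Sequence[Any], order: Sequence[Any], **options: bool) -> Any:
    idx = reference_sort_indices(order, **options)
    return values[idx[-1]] if idx else None


o3 = [None, 5, 2, 5]
v1 = [12, 1, 5, 2]

ordered_first(v1, o3)                   # 12: the null in o3 sorts first
ordered_first(v1, o3, nulls_last=True)  # 5: the smallest non-null o3 is 2
ordered_last(v1, o3, descending=True)   # 5: the smallest non-null o3 ends up last
```

A model like this could be used to cross-check hand-written `expected` values, provided its ordering assumptions match the backend under test.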

Renamed `is_column`, since it is ambiguous alongside all the other `ExprIR` guards

#3295 (comment)
@dangotbanned marked this pull request as ready for review November 18, 2025 19:46
@dangotbanned mentioned this pull request Nov 18, 2025 (79 tasks)
@dangotbanned merged commit 944fffc into oh-nodes Nov 19, 2025 (34 of 35 checks passed)
@dangotbanned deleted the expr-ir/over-partition-by-order-by branch November 19, 2025 17:44