[data] support generator udf for map_groups #58039

my-vegetable-has-exploded · 2025-10-23T08:34:45Z

Description

This PR adds support for returning a generator object from a map_groups UDF. If a UDF produces a large output, it can return an iterator so that batches are yielded one at a time, which reduces memory cost.
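As a rough illustration of the kind of UDF this enables, here is a minimal sketch of a generator UDF that emits a group's rows in small batches rather than as one large batch. The name `chunked_udf`, the `chunk_size` parameter, and the dict-of-lists `Batch` type are assumptions made for illustration; they are not taken from the PR, and a real Ray Data batch would typically be a pandas DataFrame or a dict of NumPy arrays.

```python
from typing import Dict, Iterator, List

# Simplified stand-in for a data batch (assumption for illustration only).
Batch = Dict[str, List[int]]


def chunked_udf(group: Batch, chunk_size: int = 2) -> Iterator[Batch]:
    """Yield the rows of one group in small batches instead of one large batch."""
    num_rows = len(group["value"])
    for start in range(0, num_rows, chunk_size):
        # Slice every column to the same row range to form one output batch.
        yield {col: vals[start:start + chunk_size] for col, vals in group.items()}


# Calling the UDF directly on a single group, outside of Ray.
group = {"key": [1, 1, 1, 1, 1], "value": [10, 20, 30, 40, 50]}
chunks = list(chunked_udf(group))
```

With this change, a UDF of this shape could presumably be passed to map_groups in place of one that returns a single batch.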

Related issues

Close #57935

Additional information

This change centers on the _apply_udf_to_groups helper function within the file ray/data/grouped_data.py.

map_groups internally calls map_batches, providing a wrapper function (wrapped_fn) that in turn calls _apply_udf_to_groups to apply the user's UDF to each group.

The key modification is that instead of directly yielding the UDF's return value, the logic now inspects the result first. If the result is an Iterator, it is consumed with yield from to produce each data batch individually. If it is not an iterator, the single data batch is yielded directly, preserving the original behavior.
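The dispatch described above can be sketched roughly as follows. This is not the actual body of _apply_udf_to_groups; the function name `apply_udf_to_groups`, its signature, and the way groups are passed in are simplified assumptions chosen to show only the iterator-versus-single-batch branch.

```python
from collections.abc import Iterator
from typing import Any, Callable, Iterable


def apply_udf_to_groups(udf: Callable[[Any], Any], groups: Iterable[Any]):
    """Apply `udf` to each group and yield its output batch by batch.

    Hypothetical simplification of the helper described in the PR.
    """
    for group in groups:
        result = udf(group)
        if isinstance(result, Iterator):
            # Generator or iterator result: forward each batch lazily.
            yield from result
        else:
            # Single batch result: yield it as before.
            yield result


def single_batch_udf(group):
    return {"sum": sum(group)}


def generator_udf(group):
    for value in group:
        yield {"value": value}


groups = [[1, 2], [3]]
single_out = list(apply_udf_to_groups(single_batch_udf, groups))
generator_out = list(apply_udf_to_groups(generator_udf, groups))
```

Because the outer function is itself a generator, batches from a generator UDF are passed along one at a time instead of being collected first.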

Signed-off-by: my-vegetable-has-exploded <[email protected]>

gemini-code-assist

Code Review

This pull request introduces a valuable enhancement by adding support for generator UDFs in map_groups. This change can significantly reduce memory consumption for UDFs that produce large outputs. The implementation in _apply_udf_to_groups is clean and correctly handles both single DataBatch and Iterator[DataBatch] return types. The new test case test_map_groups_generator_udf is comprehensive and effectively validates the new functionality. I have one minor suggestion to improve code clarity by updating a type hint to align with this new capability.

python/ray/data/grouped_data.py

cursor · 2025-10-23T08:36:57Z

Bug: UDF Type Checking Fails on Older Python Versions

The isinstance check for UDF results uses typing.Iterator, which is a generic alias not suitable for runtime type checking. This causes a TypeError on Python versions prior to 3.10, preventing the correct detection of iterator or generator objects.

Signed-off-by: my-vegetable-has-exploded <[email protected]>

cursor · 2025-10-23T08:54:40Z

Bug: Runtime Error with Typing Generics

The isinstance(result, Iterator) check uses typing.Iterator, which is a generic type alias. On Python 3.7-3.9, this raises a TypeError because typing generics aren't designed for runtime checks. This prevents UDFs returning generators from being correctly unpacked, causing map_groups to fail or produce incorrect results.
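One way to keep the runtime check unambiguous is to test against the abstract base class collections.abc.Iterator, which is intended for isinstance checks. The sketch below only illustrates that check; the helper name `is_iterator_result` is hypothetical, and whether the follow-up commit uses exactly this form is not shown in this conversation.

```python
import collections.abc


def is_iterator_result(result) -> bool:
    """Return True if a UDF result should be consumed batch by batch."""
    return isinstance(result, collections.abc.Iterator)


def example_generator():
    yield {"a": [1]}


checks = {
    "generator": is_iterator_result(example_generator()),
    "list_iterator": is_iterator_result(iter([{"a": [1]}])),
    # A list or dict is iterable but is not itself an iterator.
    "list": is_iterator_result([{"a": [1]}]),
    "dict_batch": is_iterator_result({"a": [1]}),
}
```

Note that a dict-shaped batch is iterable but not an iterator, so it still takes the single-batch path.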

Signed-off-by: my-vegetable-has-exploded <[email protected]>

feat(data): support generator udf for map_groups.

5f19efa

Signed-off-by: my-vegetable-has-exploded <[email protected]>

my-vegetable-has-exploded requested a review from a team as a code owner October 23, 2025 08:34

gemini-code-assist bot reviewed Oct 23, 2025


fix UDF parameter type.

9c7dc1c

Signed-off-by: my-vegetable-has-exploded <[email protected]>

ray-gardener bot added the data (Ray Data-related issues) and community-contribution (Contributed by the community) labels Oct 23, 2025

fix test fixture

5993d03

Signed-off-by: my-vegetable-has-exploded <[email protected]>

This comment was marked as outdated.


fix iterator type check.

18854f5

Signed-off-by: my-vegetable-has-exploded <[email protected]>
