Fix ValueError in pad function when tensors list is empty #4043

alhridoy · 2025-09-09T05:18:51Z

Description

Fixes #4035 - CI fails with `ValueError: zero-size array to reduction operation maximum which has no identity` when using use_transformers_paged=True in OnlineDPOTrainer.

Root Cause

The bug occurs in the pad function in trl/trainer/utils.py when generate_batch returns no results, which leaves the completion_ids list empty. Passing this empty list to pad makes numpy.max([t.shape for t in tensors], 0) fail, because NumPy cannot compute the maximum of an empty array.
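The failure can be reproduced in isolation. The following is a minimal sketch, assuming NumPy arrays as stand-ins for the tensors; `max_shape` is a hypothetical helper name that wraps the expression quoted above and is not part of TRL.

```python
import numpy as np


def max_shape(arrays):
    """Largest size along each dimension, using the expression quoted above."""
    return np.max([a.shape for a in arrays], 0).tolist()


# Non-empty input: the reduction yields the largest size per dimension.
print(max_shape([np.zeros((3,)), np.zeros((5,))]))

# Empty input: the list of shapes is empty, so there is nothing to reduce over.
try:
    max_shape([])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The second call raises because a maximum has no identity element to return for a zero-size input, which is the message reported in the CI failure.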

The issue affects three trainers:

OnlineDPOTrainer
GRPOTrainer
RLOOTrainer

Solution

Implemented a two-layer fix:

1. Fixed the `pad` function to handle empty lists gracefully

def pad(tensors: list[torch.Tensor], ...):
    # Handle empty tensors list
    if not tensors:
        return torch.empty((0,), dtype=torch.int64)
    
    # Original logic continues...
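For illustration, here is a minimal sketch of such a guard, assuming 1-D tensors and right padding only. `pad_1d` is a hypothetical name; the real pad in trl/trainer/utils.py takes further parameters (elided as `...` above) that are not reproduced here.

```python
import torch


def pad_1d(tensors: list[torch.Tensor], padding_value: int = 0) -> torch.Tensor:
    # Guard from the PR description: an empty list has no shapes to reduce over.
    if not tensors:
        return torch.empty((0,), dtype=torch.int64)

    max_len = max(t.shape[0] for t in tensors)
    # Pre-fill with the padding value, then copy each sequence into its row.
    output = torch.full(
        (len(tensors), max_len),
        padding_value,
        dtype=tensors[0].dtype,
        device=tensors[0].device,
    )
    for row, t in enumerate(tensors):
        output[row, : t.shape[0]] = t
    return output


padded = pad_1d([torch.tensor([1, 2, 3]), torch.tensor([4])])
empty = pad_1d([])
print(padded.tolist(), tuple(empty.shape), empty.dtype)
```

In this sketch the empty result is 1-D with shape (0,), while a non-empty result is 2-D, so a caller that indexes the second dimension would still need its own check.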

2. Added defensive checks in affected trainers

Added checks to handle cases where no completions are generated, returning appropriate empty tensors instead of calling pad with empty lists.

3. Added comprehensive unit tests

Added three new test cases in tests/test_utils.py to ensure the pad function handles empty lists correctly with different parameters.

…e#4035)

- Fix pad function in utils.py to handle empty tensors list gracefully
- Add defensive checks in OnlineDPOTrainer, GRPOTrainer, and RLOOTrainer
- Add comprehensive unit tests for empty list handling
- Resolves CI failures with use_transformers_paged=True

The bug occurred when generate_batch returns no outputs, leading to an empty completion_ids list. When passed to pad(), numpy.max() would fail with: 'ValueError: zero-size array to reduction operation maximum which has no identity'

This fix provides two layers of protection:
1. pad() function now returns empty tensor for empty input lists
2. Trainers handle empty completion_ids gracefully before calling pad()

Fixes: huggingface#4035

kashif · 2025-09-09T06:27:45Z

@alhridoy the reason it's empty is a transformers bug, which is now fixed in main: huggingface/transformers#40692

perhaps instead of these changes we can comment out the tests for now?

albertvillanova

Thanks for your contribution, @alhridoy.

However, I agree with @kashif: this PR seems to address the consequence rather than the root cause of the underlying issue:

Why is completion_ids empty in the first place? The empty list indicates that the paged attention generation failed completely, which suggests a more fundamental problem.
With your fix, users will now get a clearer error message, but the underlying functionality (paged attention generation) remains broken. The training will still fail, just with a different error.
The original error provided a breadcrumb trail to the root cause. This fix might make it harder to diagnose the upstream issue.

@kashif if the PR you mentioned already fixes the upstream issue, I would propose to temporarily (until the patch is released) mark the failing tests so the CI becomes green. I can take care of that.
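As a rough illustration of marking a known-failing test, the sketch below uses the standard library's `unittest.expectedFailure`. This is a stand-in chosen so the example runs without extra dependencies; it is not necessarily how the follow-up PR marks the test. `generate_batch_stub` and the test body are hypothetical, and only the test name is taken from this thread.

```python
import unittest


def generate_batch_stub():
    """Hypothetical stand-in for a generation call that returns no completions."""
    return []


class PagedGenerationTest(unittest.TestCase):
    @unittest.expectedFailure  # remove once the upstream fix is released
    def test_training_with_transformers_paged(self):
        completions = generate_batch_stub()
        # max() over an empty sequence raises ValueError, mirroring the CI failure.
        longest = max(len(c) for c in completions)
        self.assertGreater(longest, 0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PagedGenerationTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.expectedFailures), result.wasSuccessful())
```

The run counts as successful while the test keeps failing, and it is reported as an unexpected success once the upstream fix lands, which is a prompt to remove the marker.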

kashif · 2025-09-09T10:15:55Z

yes @albertvillanova that would be awesome if we can temp. disable the failing tests

albertvillanova reviewed Sep 9, 2025


albertvillanova mentioned this pull request Sep 9, 2025

CI hotfix: xfail test_training_with_transformers_paged #4046

Merged

albertvillanova closed this in #4046 Sep 9, 2025
