Fix: Correct float feature generation in `generate_examples` #7770

Sanjaykumar030 · 2025-09-13T17:37:09Z

This PR fixes a bug in the `generate_examples` function: `datasets.Value` features with a float dtype were generated with `np.random.randint`, so every value was an integer cast to float (for example `9.0`). Such values are not representative of real floating-point data.

Key changes include:

- Added explicit handling for float features using `np.random.rand` to generate continuous values.
- Introduced fail-fast type checks for unsupported dtypes to improve robustness.
- Added validation for sequence features to ensure `seq_shapes` is provided.
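
The three changes above can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `sample_value` and `sample_sequence` are hypothetical helpers that take plain dtype names instead of `datasets` feature objects, and the string placeholder is made up. Only `np.random.rand`, `np.random.randint`, and the `seq_shapes` argument come from the description.

```python
import numpy as np


def sample_value(dtype: str):
    """Hypothetical helper: draw one dummy scalar for a dtype name."""
    if dtype == "string":
        return "placeholder text"  # assumption: any fixed string will do here
    np_dtype = np.dtype(dtype)  # raises TypeError for names NumPy does not know
    if np.issubdtype(np_dtype, np.floating):
        # Continuous value in [0, 1) instead of an integer cast to float.
        return np.random.rand(1).astype(np_dtype).item()
    if np.issubdtype(np_dtype, np.integer):
        return np.random.randint(10, size=1).astype(np_dtype).item()
    # Fail fast rather than silently producing misleading data.
    raise TypeError(f"Unsupported dtype: {dtype!r}")


def sample_sequence(name: str, dtype: str, seq_shapes=None):
    """Hypothetical helper: draw a dummy array for a sequence feature."""
    if not seq_shapes or name not in seq_shapes:
        raise ValueError(f"seq_shapes must provide a shape for {name!r}")
    return np.random.rand(*seq_shapes[name]).astype(dtype)


float_value = sample_value("float64")
int_value = sample_value("int64")
seq_value = sample_sequence("seq_feature", "float32", {"seq_feature": (3,)})
```

Dispatching on `np.issubdtype` keeps the integer path unchanged while routing every float dtype to a continuous sampler; anything else raises immediately.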

Before Fix

Float features were generated incorrectly as integers cast to float:

- Example 0:
- int_feature: 0
- float_feature: 9.0  <-- Incorrect: An integer disguised as a float
- string_feature: The small grey turtle was surprisingly fast...
- seq_feature: [0.3048 0.4291 0.4283]

After Fix

Float features are now correctly generated as continuous numbers in the range [0, 1):

+ Example 0:
+ int_feature: 0
+ float_feature: 0.0183  <-- Correct: A true random float
+ string_feature: The small grey turtle was surprisingly fast...
+ seq_feature: [0.9237 0.7972 0.8526]
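
The difference between the two outputs can be checked directly in NumPy. The snippet below is only an illustrative check, not part of the PR; the sample size of 1000 and the variable names are arbitrary choices.

```python
import numpy as np

# Old behaviour: integers in [0, 10) cast to float.
old_style = np.random.randint(10, size=1000).astype("float64")
# New behaviour: continuous samples in [0, 1).
new_style = np.random.rand(1000)

# Every old-style value is a whole number stored as a float.
old_all_whole = bool(np.all(old_style == np.floor(old_style)))
# New-style values stay inside the half-open interval [0, 1).
new_in_range = bool(np.all((new_style >= 0.0) & (new_style < 1.0)))

print(old_all_whole, new_in_range)
```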

Note: This PR is a follow-up to the previously closed PR #7769, which is referenced here for context.

Sanjaykumar030 · 2025-09-28T12:43:04Z

Hi @lhoestq, just a gentle follow-up on this PR.

fix: correct float feature generation in generate_examples

0767262

Sanjaykumar030 changed the title ~~Fix: Correct float feature generation in generate_examples #7769~~ Fix: Correct float feature generation in generate_examples Sep 13, 2025
