ArjunJagdale
Contributor

This PR fixes a bug in the JSON loader where columns containing only integer-like float values, such as [0.0, 1.0, 2.0], were implicitly coerced to int by pandas or Arrow type inference.

This caused issues downstream in statistics computation (e.g., dataset-viewer) where such columns were incorrectly labeled as "int" instead of "float".

🔍 What was happening:

When the JSON loader falls back to pandas_read_json() (after pa.read_json() fails), pandas/Arrow can coerce float values to integers if all values are integer-like (e.g., 0.0 == 0).
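
The coercion is possible because an integer-like float compares equal to its integer counterpart, so an inference step that looks only at values cannot tell the two apart. A toy illustration in plain Python (the `infer_by_value` helper is hypothetical and is not pandas' or Arrow's actual inference code):

```python
def infer_by_value(values):
    """Toy value-based inference: report "int" when every value is integer-like."""
    if all(float(v).is_integer() for v in values):
        return "int"
    return "float"


print(0.0 == 0)                         # True
print(infer_by_value([0.0, 1.0, 2.0]))  # int
print(infer_by_value([0.0, 1.5]))       # float
```

Under such a rule, the original float type is lost as soon as every value in the column happens to be a whole number.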

✅ What this PR does:

  • Adds a check in the fallback path of _generate_tables().
  • Ensures that columns made entirely of floats are preserved as "float64", even if every value is integer-like (e.g. 0.0, 1.0).
  • Prevents loss of float semantics when the Arrow table is created.
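
A minimal sketch of the idea behind such a check, using only the standard library. It is not the code added to `_generate_tables()`: the `float_columns` helper is hypothetical, and it assumes the input is a JSON array of flat records. It relies on Python's `json` module parsing `0.0` as `float` and `0` as `int`, so the original type is still visible before any value-based inference runs:

```python
import json


def float_columns(raw_json):
    """Return the names of columns whose non-null values are all floats.

    Hypothetical helper for illustration; assumes a JSON array of flat objects.
    """
    records = json.loads(raw_json)

    # Collect the values of each column across all records.
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)

    result = set()
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # A column qualifies only if it has values and every one is a float.
        if present and all(isinstance(v, float) for v in present):
            result.add(name)
    return result


print(float_columns('[{"col": 0.0}, {"col": 1.0}, {"col": 2.0}]'))  # {'col'}
print(float_columns('[{"col": 0}, {"col": 1}]'))                    # set()
```

Columns reported by a check like this could then be cast back to "float64" before the Arrow table is built.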

🧪 Reproducible Example:

[{"col": 0.0}, {"col": 1.0}, {"col": 2.0}]

Previously loaded as:

  • int

Now correctly loaded as:

  • float

Fixes #6937

Successfully merging this pull request may close these issues.

JSON loader implicitly coerces floats to integers