Allow saving multi-dimensional ndarray with dynamic shapes #7738

@ryan-minato

Feature request

I propose adding a dedicated feature to the datasets library for efficient storage and retrieval of multi-dimensional ndarrays with dynamic shapes. Similar to how Image columns handle variable-sized images, this feature would provide a structured way to store array data whose dimensions are not fixed.

A possible implementation could be a new Array or Tensor feature type that stores the data in a structured format, for example:

{
  "shape": (5, 224, 224), 
  "dtype": "uint8", 
  "data": [...]
}

This would allow the datasets library to handle heterogeneous array sizes within a single column without requiring a fixed shape definition in the feature schema.
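To make the proposed record layout concrete, here is a minimal sketch using only NumPy. The helper names `encode_array` and `decode_array` are hypothetical and not part of the datasets library. The sketch also assumes the payload is stored as raw bytes rather than a nested list, and that the dtype string carries the byte order, which matters because FITS data is typically big-endian.

```python
import numpy as np


def encode_array(arr: np.ndarray) -> dict:
    """Pack an array into a shape/dtype/data record (hypothetical helper)."""
    arr = np.ascontiguousarray(arr)  # ensure C-order memory layout
    return {
        "shape": list(arr.shape),
        "dtype": arr.dtype.str,  # e.g. "<u2" or ">f4"; includes byte order
        "data": arr.tobytes(),   # raw bytes in C order
    }


def decode_array(record: dict) -> np.ndarray:
    """Rebuild the array from a record produced by encode_array."""
    flat = np.frombuffer(record["data"], dtype=np.dtype(record["dtype"]))
    return flat.reshape(record["shape"])


# Two arrays with different shapes and dtypes stored in the same "column".
column = [
    encode_array(np.full((5, 8, 8), 1000, dtype=np.uint16)),
    encode_array(np.arange(24, dtype=np.float32).reshape(2, 3, 4)),
]
restored = [decode_array(record) for record in column]
```

Because every record has the same three fields regardless of the array's rank or size, a column of such records keeps a uniform schema while the arrays themselves vary.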

Motivation

I am currently trying to upload data from astronomical telescopes, specifically FITS files, to the Hugging Face Hub. This type of data is very similar to images but often has more than three channels. For example, data from the SDSS project contains five channels (u, g, r, i, z), and the pixel values can exceed 255, making the Pillow-based Image feature unsuitable.

The current datasets library requires a fixed shape to be defined in the feature schema for multi-dimensional arrays, which is a major roadblock: the dimensions of my arrays vary across FITS files, so I cannot save my data. The existing array features (Array2D through Array5D) take both of these as required fields:

shape: tuple
dtype: str

A feature that supports dynamic shapes would be incredibly beneficial for the astronomy community and other fields dealing with similar high-dimensional, variable-sized data (e.g., medical imaging, scientific simulations).

Your contribution

I am willing to create a PR to help implement this feature if the proposal is accepted.

Labels: enhancement (New feature or request)