Determining the resultant metadata of a function

When wrapping other Array libraries (as happens in Dask or XArray), there is a need to determine what the result of an operation may look like in terms of its metadata. This typically happens before any real computation has begun.

For example, take `a.sum(axis=0)`. We would like to determine the data type, shape, etc. of the resulting array without computing it. Currently this is done by carrying around an `a._meta` attribute holding a sample array that has similar characteristics but is much smaller and easier to operate on. The operation is then applied to this `a._meta` object (as in `a._meta.sum(axis=0)`) and the result is inspected to infer what `a.sum(axis=0)` would likely produce. This isn't perfect, and some cases with UDFs can get tricky (like `apply_along_axis`). However, it still works reasonably well for common use cases.
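To make the sample-array approach concrete, here is a minimal sketch using plain NumPy. The helper names `make_meta` and `infer_result_meta` are hypothetical and chosen only for illustration; this is not how Dask or XArray actually implement `_meta`, just the general idea of running the operation on a zero-size stand-in and inspecting the result.

```python
import numpy as np


def make_meta(dtype, ndim):
    # Hypothetical helper: build a zero-size array that keeps the
    # dtype and number of dimensions of the "real" array.
    return np.empty((0,) * ndim, dtype=dtype)


def infer_result_meta(func, meta):
    # Hypothetical helper: apply the operation to the tiny stand-in
    # and report the dtype and ndim of whatever comes back.
    result = np.asarray(func(meta))
    return result.dtype, result.ndim


# Stand-in for a large 2-D float32 array.
meta = make_meta("float32", 2)

# Infer what `a.sum(axis=0)` would look like without touching real data.
sum_dtype, sum_ndim = infer_result_meta(lambda x: x.sum(axis=0), meta)

# An elementwise comparison changes the dtype but keeps the dimensions.
cmp_dtype, cmp_ndim = infer_result_meta(lambda x: x > 0, meta)

print(sum_dtype, sum_ndim)
print(cmp_dtype, cmp_ndim)
```

Note that a zero-size stand-in like this only recovers the dtype and number of dimensions; the concrete shape still has to be tracked separately, which is part of why an API-level answer would be appealing.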

That said, it would be nice to have an API solution that does not rely on doing these sample computations. Admittedly there may not be an easy answer to this use case, but I wanted to raise it for discussion, given that this could be quite helpful when reasoning about applying operations to large arrays.

Note: While this comes up with Arrays, there is similar logic for DataFrames as well.

Determining the resultant metadata of a function #450
