[DISCUSSION] Memory accounting model discussion #16841

@gabotechs

The current model used in DataFusion for measuring memory consumption assumes that the entities that can consume memory (accumulators, joins, etc.) own the data they hold. This does not exactly match how memory is managed in arrow-rs, where the same underlying data buffer may be referenced more than once from different parts of DataFusion.

For example, ArrayAggAccumulator accumulates data by storing ArrayRefs. Even though it accumulates and retains data, the actual memory was allocated before ArrayAggAccumulator came into play, and the accumulator only adds a reference to it.

In that case, one could argue that ArrayAggAccumulator is not really consuming any memory: it performs no new allocations, and the underlying data potentially existed even before the ArrayAggAccumulator was instantiated.
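To make the ownership question concrete, here is a minimal sketch that uses only the Rust standard library. `SharedArray` and `RefAccumulator` are hypothetical stand-ins for an Arrow `ArrayRef` and for an accumulator that retains its inputs by reference; they are not arrow-rs or DataFusion types, and the real accumulator is more involved.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow array: a reference-counted handle
/// to a shared byte buffer. Not an arrow-rs type.
#[derive(Clone)]
struct SharedArray {
    buffer: Arc<Vec<u8>>,
}

/// Hypothetical stand-in for an accumulator that keeps its inputs by reference.
#[derive(Default)]
struct RefAccumulator {
    retained: Vec<SharedArray>,
}

impl RefAccumulator {
    /// Cloning the handle bumps the reference count; no bytes are copied.
    fn update(&mut self, batch: &SharedArray) {
        self.retained.push(batch.clone());
    }

    /// Bytes kept alive by this accumulator, regardless of who allocated them.
    fn retained_bytes(&self) -> usize {
        self.retained.iter().map(|a| a.buffer.len()).sum()
    }
}

/// Returns (reference count after update, retained bytes, same allocation?).
fn demo() -> (usize, usize, bool) {
    // The buffer is allocated "upstream", before the accumulator exists.
    let upstream = SharedArray { buffer: Arc::new(vec![0u8; 1024]) };
    let mut acc = RefAccumulator::default();
    acc.update(&upstream);
    let same = Arc::ptr_eq(&upstream.buffer, &acc.retained[0].buffer);
    (Arc::strong_count(&upstream.buffer), acc.retained_bytes(), same)
}
```

In this model the accumulator allocates nothing for the data itself, yet the 1024 bytes cannot be freed while it holds its reference. That is the gap between "memory consumed" and "memory retained" that this issue is about.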

There have been several attempts in the past to address memory accounting issues in the DataFusion code.

Some involve copying/compacting just the necessary slice of data out of the underlying buffer (ScalarValue::compact) so that it is actually owned by the consumer, but in certain cases that could hurt performance.
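The trade-off behind compaction can be sketched with a standard-library model. `SliceView` below is a hypothetical stand-in for a sliced Arrow array, and its `compact` method only imitates the idea of copying the visible slice into an owned buffer; it is not the `ScalarValue::compact` implementation.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a sliced Arrow array: a window into a shared buffer.
struct SliceView {
    buffer: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

impl SliceView {
    /// Size of the whole backing buffer, including bytes outside the window.
    fn backing_bytes(&self) -> usize {
        self.buffer.len()
    }

    /// Size of only the bytes visible through the window.
    fn slice_bytes(&self) -> usize {
        self.len
    }

    /// Copy the visible bytes into a fresh, exclusively owned buffer.
    /// This costs O(len) time plus a new allocation.
    fn compact(&self) -> SliceView {
        let owned = self.buffer[self.offset..self.offset + self.len].to_vec();
        SliceView { buffer: Arc::new(owned), offset: 0, len: self.len }
    }
}

/// Returns (backing bytes before, slice bytes, backing bytes after, owners after).
fn demo() -> (usize, usize, usize, usize) {
    let big = Arc::new(vec![7u8; 4096]);
    let view = SliceView { buffer: Arc::clone(&big), offset: 100, len: 16 };
    let compacted = view.compact();
    (
        view.backing_bytes(),
        view.slice_bytes(),
        compacted.backing_bytes(),
        Arc::strong_count(&compacted.buffer),
    )
}
```

Before compaction, a 16-byte window keeps a 4096-byte buffer alive. After compaction, the consumer is the sole owner of exactly 16 bytes, so its reported size matches what it really holds, at the price of a copy on every accumulated value.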


The main point of this issue is to start a conversation about what the ideal approach to memory accounting could be:

  • Is either copying/compacting accumulated data and calling get_array_memory_size(), or storing array references and calling get_slice_memory_size(), an acceptable way of measuring memory consumption?
  • Should the memory accounting model in DataFusion be expanded so that it takes into account not just memory consumed, but also memory retained because of references to shared buffers?
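As one possible illustration of the second question, and not a proposal for the actual DataFusion API, the sketch below reports two numbers for a set of buffer references: a naive per-reference sum, and a deduplicated total that counts each distinct allocation once. `MemoryReport` and `report` are hypothetical names, and buffer identity is approximated by the `Arc` pointer.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical report separating per-reference and per-allocation accounting.
#[derive(Debug, PartialEq)]
struct MemoryReport {
    /// Sum of buffer sizes per reference held (double counts shared buffers).
    naive_bytes: usize,
    /// Each distinct buffer counted exactly once.
    retained_bytes: usize,
}

/// Walk all references and deduplicate buffers by allocation address.
fn report(refs: &[Arc<Vec<u8>>]) -> MemoryReport {
    let mut seen: HashSet<*const Vec<u8>> = HashSet::new();
    let mut naive = 0;
    let mut retained = 0;
    for buf in refs {
        naive += buf.len();
        // `insert` returns true only the first time this allocation is seen.
        if seen.insert(Arc::as_ptr(buf)) {
            retained += buf.len();
        }
    }
    MemoryReport { naive_bytes: naive, retained_bytes: retained }
}

/// Two references to a 1000-byte buffer plus one reference to a 500-byte buffer.
fn demo() -> MemoryReport {
    let a = Arc::new(vec![0u8; 1000]);
    let b = Arc::new(vec![0u8; 500]);
    report(&[Arc::clone(&a), Arc::clone(&a), b])
}
```

Here the naive sum is 2500 bytes while only 1500 bytes are actually kept alive. A real model would also have to decide which consumer is charged for a buffer that several of them share, which this sketch deliberately leaves open.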
