Skip to content

Conversation

aestriplex
Copy link

This PR implements a first draft of the data structures required to implement sparse tensors support in ACL.
Here are three points worth discussing:

  1. In the COOTensor and CSRTensor classes, I implemented two different strategies for saving indexes. Specifically, in COO tensors, indexes are saved as std::vector (i.e., not in the same buffer provided by the tensor allocator, which is used by values instead). In CSR tensors, on the other hand, I stored them in the allocator buffer and saved only the corresponding (integer) offsets as class members. The method used in CSR is more efficient when working with the tensor (e.g., in a kernel/operator), so I thought I would extend it to the COO tensor as well. Are there any problems if this approach is too “low level”?
  2. The test class should be converted into a Fixture.
  3. As Pablo said in his comment on my design document in the mailing list, we need to make sure that the validation of current operators fails when using sparse tensors. A good idea, in my opinion, would be to create a “validate” method in ICpuOperator - and other kind of base operators -, so that it can be called within the validate of classes that implement this interface and do not (yet) support sparse tensors (e.g., CpuGemmConv2d). This method, i.e., not performing the direct check !src->info()->is_sparse(), allows these checks to be extended in the future without having to modify all operators.

COOTensor init
CSRTensor init
added SparseTensorAllocator
moved SparseTensor to arm_compute/core
fixed to_dense method for COOTensor
fixed to_sparse and to_dense methods for CSRTensor
changed CSR indices; print function moved to the bae class
implemented get_value to CSR and COO tensors
added SparseIterator

Change-Id: Id5ed095587662b750b89cbabd600a286ab9bf53b
*/
COOTensor(const ITensor *tensor);

std::vector<Coordinates> _indices;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See point 1 in the PR description

*/
CSRTensor(const ITensor *tensor);

size_t _crow_bytes; /**< Row index size in bytes */
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See point 1 in the PR description

Copy link

@morgolock morgolock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @aestriplex

I had a quick look at this and I think it's a good starting point. Please have a look and let me know your thoughts.

Comment on lines +238 to +262
#ifdef ARM_COMPUTE_ASSERTS_ENABLED
void COOTensor::print(std::ostream &os) const
{
const uint8_t *data = static_cast<const uint8_t *>(buffer());

if(_indices.empty())
{
os << "index: [] values: []" << std::endl;
return;
}

for(size_t i = 0; i < _indices.size(); ++i)
{
const Coordinates &coord = _indices[i];
os << "index: [";
for (size_t j = 0; j < coord.num_dimensions(); ++j)
{
os << coord[j];
if (j < coord.num_dimensions() - 1) os << ", ";
}
os << "] values: ";
print_values(os, data, i, dense_volume(sparse_dim()));
}
}
#endif // ARM_COMPUTE_ASSERTS_ENABLED
Copy link

@morgolock morgolock Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have a print method in https://github.com/ARM-software/ComputeLibrary/blob/main/src/core/ITensor.cpp

We should move this in void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move both this print function and CSR's to ITensor. In particular, something like this

void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const
{
    // initialize variables and set precision
       ...
    switch (this->info()->tensor_format())
    {
        case TensorFormat::Dense:
            print_dense(s, io_fmt);
            break;
        case TensorFormat::CSR:
            print_csr(s);
            break;
        case TensorFormat::COO:
            print_coo(s);
            break;
        default:
            ARM_COMPUTE_ERROR("Unsupported tensor format");
            break;
    }

    // Restore output stream flags
    s.copyfmt(stream_status);
}

where print_coo and print_csr are the two functions moved from the two final classes.

The main problem is that those functions use some internals of the two classes, for example _indices, _crow_bytes, _col_bytes. It seems to me that having those functions defined in the base class can be misleading. Wouldn't it be cleaner if we have those two functions in their proper classes? That way, I should add the parameter IOFormatInfo io_fmt to both, so we have exactly the same signature as in ITensor, i.e. they overlap perfectly the base class method.

Comment on lines +243 to +270
void CSRTensor::print(std::ostream &os) const
{
const uint8_t *_row_offsets = _allocator.data();
const uint8_t *_col_indices = _allocator.data() + _crow_bytes;
const uint8_t *values = _allocator.data() + _crow_bytes + _col_bytes;

os << "r_offsets: [";
for(size_t i = 0; i < _crow_bytes / index_size; ++i)
{
os << *reinterpret_cast<const int32_t *>(_row_offsets + (i * index_size));
if (i < _crow_bytes / index_size - 1)
{
os << ", ";
}
}
os << "] cols: [";
for(size_t i = 0; i < _col_bytes / index_size; ++i)
{
os << *reinterpret_cast<const int32_t *>(_col_indices + (i * index_size));
if (i < _col_bytes / index_size - 1)
{
os << ", ";
}
}
os << "] values: ";
print_values(os, values, 0, nnz());
}
#endif // ARM_COMPUTE_ASSERTS_ENABLED

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to existing void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const


namespace arm_compute
{
SparseTensorAllocator::SparseTensorAllocator(IMemoryManageable *owner) : _owner(owner), _associated_memory_group(nullptr), _memory(), _values_bytes(0), _indices_bytes(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the existing TensorAllocator to allocate any buffers.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought a lot about this one. The main reason I created a new class is that the original one (i.e. TensorAllocator) allocates the memory buffer using the TensorShape contained in TensorInfo. In the case of sparse tensors, however, the amount of memory that we need to allocate is smaller than that obtained by doing dim_1 x dim_2 x ... x dim_n, because otherwise there would be no advantage at all in using sparse over dense tensors.
Since I didn't want to touch the allocation mechanism of TensorAllocator — on which the memory allocation of all other tensors is based — I thought it would be wiser to create a special class.
It has the same internal mechanism (i.e. it inherits from ITensorAllocator, and it uses Memory, MemoryGroup and IMemoryManageable), but it exposes this init method

void init(const TensorInfo &input, size_t values_bytes, size_t indices_bytes, size_t alignment = 0);

Change-Id: I3a3fca2b842702672c915e1357602bf0fe47bd47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants