Sparse tensor data structures #1186
base: main
Conversation
Commit summary:
- COOTensor init
- CSRTensor init
- added SparseTensorAllocator
- moved SparseTensor to arm_compute/core
- fixed to_dense method for COOTensor
- fixed to_sparse and to_dense methods for CSRTensor
- changed CSR indices; print function moved to the base class
- implemented get_value for CSR and COO tensors
- added SparseIterator

Change-Id: Id5ed095587662b750b89cbabd600a286ab9bf53b
```cpp
COOTensor(const ITensor *tensor);
```
```cpp
std::vector<Coordinates> _indices;
```
See point 1 in the PR description
```cpp
CSRTensor(const ITensor *tensor);
```
```cpp
size_t _crow_bytes; /**< Row index size in bytes */
```
See point 1 in the PR description
Hi @aestriplex
I had a quick look at this and I think it's a good starting point. Please have a look and let me know your thoughts.
```cpp
#ifdef ARM_COMPUTE_ASSERTS_ENABLED
void COOTensor::print(std::ostream &os) const
{
    const uint8_t *data = static_cast<const uint8_t *>(buffer());

    if(_indices.empty())
    {
        os << "index: [] values: []" << std::endl;
        return;
    }

    for(size_t i = 0; i < _indices.size(); ++i)
    {
        const Coordinates &coord = _indices[i];
        os << "index: [";
        for(size_t j = 0; j < coord.num_dimensions(); ++j)
        {
            os << coord[j];
            if(j < coord.num_dimensions() - 1) os << ", ";
        }
        os << "] values: ";
        print_values(os, data, i, dense_volume(sparse_dim()));
    }
}
#endif // ARM_COMPUTE_ASSERTS_ENABLED
```
We already have a print method in https://github.com/ARM-software/ComputeLibrary/blob/main/src/core/ITensor.cpp
We should move this into void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const
I could move both this print function and CSR's to ITensor. In particular, something like this:

```cpp
void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const
{
    // initialize variables and set precision
    ...
    switch (this->info()->tensor_format())
    {
        case TensorFormat::Dense:
            print_dense(s, io_fmt);
            break;
        case TensorFormat::CSR:
            print_csr(s);
            break;
        case TensorFormat::COO:
            print_coo(s);
            break;
        default:
            ARM_COMPUTE_ERROR("Unsupported tensor format");
            break;
    }
    // Restore output stream flags
    s.copyfmt(stream_status);
}
```

where `print_coo` and `print_csr` are the two functions moved from the two final classes.
The main problem is that those functions use some internals of the two classes, for example `_indices`, `_crow_bytes` and `_col_bytes`. It seems to me that defining those functions in the base class could be misleading. Wouldn't it be cleaner to keep the two functions in their respective classes? That way, I would only need to add the `IOFormatInfo io_fmt` parameter to both, so they have exactly the same signature as in `ITensor`, i.e. they cleanly override the base class method.
```cpp
void CSRTensor::print(std::ostream &os) const
{
    const uint8_t *_row_offsets = _allocator.data();
    const uint8_t *_col_indices = _allocator.data() + _crow_bytes;
    const uint8_t *values       = _allocator.data() + _crow_bytes + _col_bytes;

    os << "r_offsets: [";
    for(size_t i = 0; i < _crow_bytes / index_size; ++i)
    {
        os << *reinterpret_cast<const int32_t *>(_row_offsets + (i * index_size));
        if(i < _crow_bytes / index_size - 1)
        {
            os << ", ";
        }
    }
    os << "] cols: [";
    for(size_t i = 0; i < _col_bytes / index_size; ++i)
    {
        os << *reinterpret_cast<const int32_t *>(_col_indices + (i * index_size));
        if(i < _col_bytes / index_size - 1)
        {
            os << ", ";
        }
    }
    os << "] values: ";
    print_values(os, values, 0, nnz());
}
#endif // ARM_COMPUTE_ASSERTS_ENABLED
```
Move this to existing void ITensor::print(std::ostream &s, IOFormatInfo io_fmt) const
```cpp
namespace arm_compute
{
SparseTensorAllocator::SparseTensorAllocator(IMemoryManageable *owner)
    : _owner(owner), _associated_memory_group(nullptr), _memory(), _values_bytes(0), _indices_bytes(0)
```
I think we should use the existing `TensorAllocator` to allocate any buffers.
I thought a lot about this one. The main reason I created a new class is that the original one (i.e. `TensorAllocator`) allocates the memory buffer using the `TensorShape` contained in `TensorInfo`. In the case of sparse tensors, however, the amount of memory we need to allocate is smaller than that obtained by computing `dim_1 x dim_2 x ... x dim_n`, because otherwise there would be no advantage at all in using sparse over dense tensors.
Since I didn't want to touch the allocation mechanism of `TensorAllocator` (on which the memory allocation of all other tensors is based), I thought it would be wiser to create a dedicated class. It has the same internal mechanism (i.e. it inherits from `ITensorAllocator`, and it uses `Memory`, `MemoryGroup` and `IMemoryManageable`), but it exposes this init method:

```cpp
void init(const TensorInfo &input, size_t values_bytes, size_t indices_bytes, size_t alignment = 0);
```
This PR implements a first draft of the data structures required to support sparse tensors in ACL.
Here are three points worth discussing:

1. In COO tensors, I stored the indices in a `std::vector` (i.e., not in the same buffer provided by the tensor allocator, which is used for the values instead). In CSR tensors, on the other hand, I stored them in the allocator buffer and saved only the corresponding (integer) offsets as class members. The method used in CSR is more efficient when working with the tensor (e.g., in a kernel/operator), so I thought I would extend it to the COO tensor as well. Are there any problems if this approach is too "low level"?
2. I added a method to `ICpuOperator` (and other kinds of base operators) so that it can be called within the validate of classes that implement this interface and do not (yet) support sparse tensors (e.g., CpuGemmConv2d). This method, as opposed to performing the direct check `!src->info()->is_sparse()`, allows these checks to be extended in the future without having to modify all operators.