When a model's tensor arena requirements exceed the available SRAM, the whole tensor arena has to be placed in external RAM. This leaves performance on the table, because SRAM can't be used as scratch memory at all.
I created tensorflow/tflite-micro#2627 to ask TFLM to support this use case, but it doesn't seem to be happening any time soon. However, according to a collaborator, it's already possible to split the tensor arena into persistent/non-persistent arenas.
It seems that, to support this use case, xformer would need to gain the ability to work with a split tensor arena. I can do it myself if I get some guidance.
This would let applications place the non-persistent arena in SRAM and the persistent arena in external RAM, or vice versa, so that models could perform better on the xcore.ai platform.
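
A minimal sketch of the kind of split I have in mind, to make the request concrete. It is plain C++ and deliberately does not use any TFLM or xformer API: `Arena`, `SplitArena`, `MakeSplitArena`, and the buffer sizes are all hypothetical names and values chosen for illustration. It also assumes the toolchain offers some way to pin each buffer to a memory region, which is left out here.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical bump allocator over a caller-provided buffer.
// Not a TFLM or xformer type; illustration only.
struct Arena {
  uint8_t* base;
  size_t size;
  size_t used;

  // Returns a pointer at an aligned offset inside the buffer,
  // or nullptr if the request does not fit.
  void* Allocate(size_t bytes, size_t align = 16) {
    size_t start = (used + align - 1) / align * align;
    if (start > size || bytes > size - start) return nullptr;
    used = start + bytes;
    return base + start;
  }

  // Discards every allocation made so far.
  void Reset() { used = 0; }
};

// Sizes are made up for illustration.
constexpr size_t kPersistentSize = 64 * 1024;
constexpr size_t kScratchSize = 16 * 1024;

// On a real target each buffer would carry a toolchain-specific
// placement attribute (assumption; omitted here).
alignas(16) static uint8_t persistent_buf[kPersistentSize];  // e.g. external RAM
alignas(16) static uint8_t scratch_buf[kScratchSize];        // e.g. SRAM

// Two independent arenas instead of one shared tensor arena.
struct SplitArena {
  Arena persistent;      // allocations that live as long as the model
  Arena non_persistent;  // scratch memory reused between operators
};

SplitArena MakeSplitArena() {
  return SplitArena{{persistent_buf, kPersistentSize, 0},
                    {scratch_buf, kScratchSize, 0}};
}
```

The point of keeping the two arenas independent is that the scratch side can be reset and reused without touching long-lived allocations, so which buffer ends up in SRAM becomes a per-application decision.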