Add TREAD: https://arxiv.org/pdf/2501.04765
Implement this as a change to the DiT class, controlled by a config parameter that defaults to None. When the parameter is None, DiT should behave exactly as it does now; when it is set, use it to let tokens be skipped.
Open question: does token skipping need to affect the attention layers, or can it apply only to the MLPs?
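A rough sketch of how the optional parameter could gate token skipping, to make the proposal concrete. Everything named here is hypothetical and not from the existing DiT code: `TokenRouteConfig`, its fields `selection_ratio`, `start_block` and `end_block`, and the `forward_blocks` helper. Plain Python lists stand in for tensors so the index bookkeeping is easy to follow. The sketch assumes skipped tokens bypass whole blocks and that skipping is only active during training; it does not settle the attention vs. MLP question above.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional

Token = List[float]
Block = Callable[[List[Token]], List[Token]]


@dataclass
class TokenRouteConfig:
    """Hypothetical config; None in its place means 'no token skipping'."""
    selection_ratio: float  # fraction of tokens that skip the routed blocks
    start_block: int        # first block the selected tokens skip (inclusive)
    end_block: int          # last block the selected tokens skip (inclusive)


def forward_blocks(
    tokens: List[Token],
    blocks: List[Block],
    route: Optional[TokenRouteConfig] = None,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Token]:
    # Default path: with no config (or outside training) every token
    # passes through every block, i.e. unchanged behavior.
    if route is None or not training:
        for block in blocks:
            tokens = block(tokens)
        return tokens

    if not 0 <= route.start_block <= route.end_block < len(blocks):
        raise ValueError("route must satisfy 0 <= start_block <= end_block < len(blocks)")
    if not 0.0 <= route.selection_ratio <= 1.0:
        raise ValueError("selection_ratio must be in [0, 1]")

    rng = rng or random.Random()
    n = len(tokens)
    # Randomly choose which token positions skip the routed blocks.
    skip_idx = set(rng.sample(range(n), int(n * route.selection_ratio)))
    keep_idx = [i for i in range(n) if i not in skip_idx]

    saved: List[Token] = []
    for b, block in enumerate(blocks):
        if b == route.start_block:
            saved = list(tokens)                    # sequence as it entered the route
            tokens = [tokens[i] for i in keep_idx]  # only kept tokens continue
        tokens = block(tokens)
        if b == route.end_block:
            merged = list(saved)                    # skipped tokens keep their saved values
            for pos, i in enumerate(keep_idx):
                merged[i] = tokens[pos]             # kept tokens return to their positions
            tokens = merged
    return tokens


# Usage: three toy blocks that each add 1.0 to every value.
add_one: Block = lambda toks: [[v + 1.0 for v in t] for t in toks]
x = [[0.0] for _ in range(4)]
cfg = TokenRouteConfig(selection_ratio=0.5, start_block=1, end_block=1)
baseline = forward_blocks(x, [add_one] * 3)            # route=None: all tokens see 3 blocks
routed = forward_blocks(x, [add_one] * 3, route=cfg)   # half the tokens skip block 1
```

In a real DiT the gather and scatter steps would be tensor indexing over the sequence dimension, and positional information would need to follow the kept tokens. If it turns out only the MLPs should be skipped, the same select and merge pattern could move inside each block around the MLP call instead of around whole blocks.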