Open
Hi @ranery, this work is very promising. I am currently applying DiffCR to Sana, a linear-attention-based model (https://github.com/NVlabs/Sana). So far I have only tested the routing with a fixed compression ratio for all layers, and after training the results are noisy. Can you give me some suggestions?
Here is my code (the same as your pseudocode).
