Thank you for sharing this repository. The implementation in diffcr_pseudo.py has been very helpful.
I noticed the current code provides layer-wise compression, but I couldn’t find an implementation of timestep-wise compression.
Do you plan to release the timestep-wise compression code as well?
If possible, could you also share a brief overview of how you implemented it (e.g., timestep partitioning/assignment, batch sampling strategy, any modifications to the training loop or loss aggregation, and any scheduler changes)?
Even a short description or pseudocode would be greatly appreciated.
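For reference, here is a minimal sketch of what I currently imagine timestep-wise compression might look like. Everything in it is my own guess: the bucket boundaries, the per-bucket ratios, the `width_ratio` argument on the model (a slimmable-network-style interface), and the diffusers-style `add_noise` scheduler call are all assumptions on my part, not names taken from this repository. I mainly want to check whether this is roughly the right shape.

```python
import torch
import torch.nn.functional as F

NUM_TIMESTEPS = 1000

# Hypothetical partition of the diffusion timesteps into contiguous buckets,
# each assigned its own compression ratio (e.g., fraction of channels kept).
# The boundaries and ratios below are placeholders, not values from the paper.
BUCKETS = [
    (0, 250, 0.25),     # low-noise steps: aggressive compression (my guess)
    (250, 600, 0.50),
    (600, 1000, 1.00),  # high-noise steps: full capacity (my guess)
]

def ratio_for_timestep(t: int) -> float:
    """Map a sampled timestep to its bucket's compression ratio."""
    for lo, hi, ratio in BUCKETS:
        if lo <= t < hi:
            return ratio
    return 1.0

def training_step(model, x0, noise_scheduler, optimizer):
    """One training iteration with timestep-dependent compression.

    Assumes `model(x, t, width_ratio=...)` gates/prunes channels on the fly
    and `noise_scheduler.add_noise(x0, noise, t)` follows a diffusers-style
    interface; both are assumptions, not part of diffcr_pseudo.py.
    """
    t = torch.randint(0, NUM_TIMESTEPS, (x0.shape[0],))
    noise = torch.randn_like(x0)
    xt = noise_scheduler.add_noise(x0, noise, t)

    # Group samples by compression ratio so each sub-width is one forward pass,
    # then aggregate the per-group losses weighted by group size.
    loss = xt.new_zeros(())
    for ratio in {ratio_for_timestep(int(ti)) for ti in t}:
        mask = torch.tensor([ratio_for_timestep(int(ti)) == ratio for ti in t])
        pred = model(xt[mask], t[mask], width_ratio=ratio)
        loss = loss + F.mse_loss(pred, noise[mask]) * mask.float().mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

If your actual approach differs (e.g., separate compressed models per timestep range rather than a shared slimmable backbone, or a different loss aggregation), even a one-line correction to this sketch would already answer most of my question.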
Thanks again for your excellent work and for making the code available.
Best regards,