Currently the self-forcing trainer referenced in configs/dit_v4_sf.yml does not work. Running it raises unexplained autograd errors, and none of the model's parameters receive gradients. This appears to be related to a bug that causes illegal memory access errors during cached decoding with flex attention. Further investigation is needed; more details will be logged here as error messages come up.
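To narrow down the missing-gradient symptom, a small check along these lines may help. This is only a sketch, not part of the trainer: `params_without_grad` is a hypothetical helper name, and it assumes the model exposes PyTorch-style `(name, parameter)` pairs whose parameters carry `requires_grad` and `grad` attributes.

```python
def params_without_grad(named_params):
    """Return the names of trainable parameters whose .grad is still None.

    `named_params` is any iterable of (name, parameter) pairs, for example
    the result of `model.named_parameters()` on a PyTorch module (assumed).
    """
    missing = []
    for name, param in named_params:
        # Frozen parameters are never expected to receive gradients.
        if not getattr(param, "requires_grad", True):
            continue
        # After a backward pass, a trainable parameter that took part in
        # the loss should have a populated .grad.
        if getattr(param, "grad", None) is None:
            missing.append(name)
    return missing
```

Calling it right after the backward pass, e.g. `params_without_grad(model.named_parameters())`, should show whether every parameter is affected or only a subset, which might indicate where the graph is being detached.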