Debug self forcing autograd errors

Currently, the self forcing trainer referenced in configs/dit_v4_sf.yml does not work. Running it raises autograd errors, and none of the model's parameters receive any gradients. This appears to be related to a bug that causes illegal memory access errors during cached decoding with flex attention. Further investigation is needed; I will log more details as error messages come up.
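
As a starting point for narrowing down the missing gradients, a small check like the one below could be run right after `loss.backward()`. This is only a sketch: `params_without_grad` is a hypothetical helper, not something that exists in this repo, and it assumes the model is a standard `torch.nn.Module` so that `named_parameters()` yields `(name, parameter)` pairs.

```python
from typing import Any, Iterable, List, Tuple


def params_without_grad(named_params: Iterable[Tuple[str, Any]]) -> List[str]:
    """Return names of trainable parameters whose .grad is still None.

    Expects (name, parameter) pairs such as those yielded by
    torch.nn.Module.named_parameters(); call it after loss.backward().
    """
    return [
        name
        for name, p in named_params
        # Frozen parameters (requires_grad=False) are skipped on purpose:
        # they are not expected to receive gradients.
        if p.requires_grad and p.grad is None
    ]
```

Calling `params_without_grad(model.named_parameters())` after the backward pass lists the parameters that were never reached. If every trainable parameter shows up, the loss is probably detached from the graph somewhere (for example, computed from tensors produced under `torch.no_grad()` or after `.detach()`), though that is a guess until the actual error messages are logged. Enabling `torch.autograd.set_detect_anomaly(True)` and running with `CUDA_LAUNCH_BLOCKING=1` may also help localize the autograd and illegal memory access errors, respectively.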

Debug self forcing autograd errors #55

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Debug self forcing autograd errors #55

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions