I’ve found that setting max_grad_norm has no effect; we are not clipping gradients.
For verification, I ran convergence with max_grad_norm=1e-9 and saw no difference in eval loss. I also checked unscale_and_clip_grads and found that self.clip_grad is set to 0 when I printed it here.
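A quick way to confirm whether clipping is actually applied is to compare the gradient norm before and after the clipping call. This is a minimal plain-PyTorch sketch (not the DeepSpeed unscale_and_clip_grads path); the helper name and the training-loop placement are hypothetical. If clipping were active with max_grad_norm=1e-9, the post-clip norm should come out at or below ~1e-9:

```python
import torch

def check_grad_clipping(model, max_grad_norm):
    # clip_grad_norm_ scales gradients in place and returns the
    # total norm measured *before* clipping.
    pre_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

    # Recompute the norm after clipping to confirm it actually took effect.
    post_norm = torch.sqrt(
        sum((p.grad.detach() ** 2).sum()
            for p in model.parameters() if p.grad is not None)
    )
    print(f"pre-clip norm: {pre_norm:.4g}, post-clip norm: {post_norm:.4g}")
    return pre_norm, post_norm
```

Calling this right after loss.backward() (and before optimizer.step()) with max_grad_norm=1e-9 should show the post-clip norm collapsing to essentially zero; if it matches the pre-clip norm instead, clipping is not being applied.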
Discussed in Training WG (3/28): @itayhubara is verifying whether setting this value correctly affects convergence, and whether it can improve convergence or reduce the coefficient of variance in RCPs.