Is this model meant for full bfloat16, AMP bfloat16 or no bfloat16?
#7 · opened by umarbutler
The paper does not make it clear.
Bump
We trained ModernBERT with amp_bf16 (bfloat16 mixed precision); we'll add that detail to the next arXiv preprint update. I expect ModernBERT will work fine with fp32, amp_bf16, or full bf16, although the latter might need additional finetuning depending on the use case.
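
For anyone comparing the three options, here is a minimal sketch of what each looks like in plain transformers/PyTorch. It assumes the `answerdotai/ModernBERT-base` checkpoint, a CUDA device, and a recent transformers release with ModernBERT support; treat it as an illustration rather than a recommended recipe.

```python
# Sketch of the three precision modes discussed above (assumptions:
# answerdotai/ModernBERT-base checkpoint, CUDA device available).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "answerdotai/ModernBERT-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("Paris is the capital of [MASK].", return_tensors="pt").to("cuda")

# Option 1: full fp32 -- default weights and compute.
model = AutoModelForMaskedLM.from_pretrained(model_id).to("cuda")
with torch.no_grad():
    logits_fp32 = model(**inputs).logits

# Option 2: AMP bf16 -- fp32 weights, bf16 compute via autocast.
# This mirrors the amp_bf16 setting the model was trained with.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    logits_amp_bf16 = model(**inputs).logits

# Option 3: full bf16 -- weights cast to bf16 as well. Per the answer
# above, this may need additional finetuning depending on the use case.
model_bf16 = AutoModelForMaskedLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")
with torch.no_grad():
    logits_bf16 = model_bf16(**inputs).logits
```

The practical difference: option 2 keeps an fp32 master copy of the weights and only runs compute in bf16, while option 3 stores the weights themselves in bf16 (halving memory), which is slightly further from the training setup.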