verification of fp8 precision #1367

kraza8 · 2024-12-12T03:08:51Z


Hi, I am using the code below for Hugging Face Transformers model inference. Is there a way to verify that the model is actually running in FP8 precision, for example by printing debug logs that show FP8 being used?

import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# `tokenizer` and `model` are the Hugging Face tokenizer and model, loaded elsewhere.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.E4M3)

def generate_fp8(prompt):
    with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        print(f"Input tensor dtype: {inputs['input_ids'].dtype}")  # this shows dtype: torch.int64
        outputs = model.generate(**inputs, max_length=50)

    return tokenizer.decode(outputs[0], skip_special_tokens=True)

thanks
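
One check that may help narrow this down: `te.fp8_autocast` only changes the behavior of Transformer Engine modules, so a model built purely from stock `torch.nn` layers would run unchanged inside the context. The input dtype is not informative either, since `input_ids` are integer token IDs regardless of compute precision. Below is a minimal sketch that counts submodules by whether their class is defined in the `transformer_engine` package. The helper name `summarize_module_origins` is hypothetical (not part of any library), and it only assumes the model exposes `named_modules()` as a `torch.nn.Module` does.

```python
from collections import Counter


def summarize_module_origins(model):
    """Count submodules by (origin, class name).

    origin is "transformer_engine" when the class is defined inside the
    transformer_engine package, and "other" for everything else.
    Hypothetical helper for illustration; `model` only needs a
    named_modules() method, as torch.nn.Module provides.
    """
    counts = Counter()
    for _name, module in model.named_modules():
        cls = type(module)
        if cls.__module__.startswith("transformer_engine"):
            origin = "transformer_engine"
        else:
            origin = "other"
        counts[(origin, cls.__name__)] += 1
    return counts


def has_transformer_engine_modules(model):
    """Return True if at least one submodule comes from transformer_engine."""
    counts = summarize_module_origins(model)
    return any(origin == "transformer_engine" for origin, _cls in counts)
```

Calling `has_transformer_engine_modules(model)` on the model from the snippet above would show whether there is anything for `fp8_autocast` to act on. If it returns `False`, the generation is most likely running in the model's original precision. This is a presence check only and does not by itself prove that FP8 kernels were executed.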

Replies: 0 comments
