completions is very slow #3

anyshu · 2025-04-05T15:41:11Z

I tested the sample code from the main page on an A800.
Diffusion generation was very slow. Am I doing something wrong?

The prompt is:

messages = [
    {"role": "user", "content": "Say hello!"}
]

Here is the timing information:

init model cost:5.800448417663574

apply chat template cost:0.014918088912963867

diffusion generate cost:33.67275953292847

Hello! How can I assist you today?

decode cost:0.002490997314453125

full cost:39.49076247215271
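To see where the time goes, the stage timings above can be turned into shares of the total. This is only a sketch: it assumes the reported values are wall-clock seconds, and the dictionary keys are just the labels copied from the log.

```python
# Stage timings copied from the log above (assumed to be seconds).
stage_seconds = {
    "init model": 5.800448417663574,
    "apply chat template": 0.014918088912963867,
    "diffusion generate": 33.67275953292847,
    "decode": 0.002490997314453125,
}

# Sum of the individual stages; the reported "full cost" is slightly larger
# because it also includes whatever ran between the timed sections.
measured_total = sum(stage_seconds.values())

# Fraction of the summed time spent in each stage.
shares = {name: secs / measured_total for name, secs in stage_seconds.items()}

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption, diffusion generation accounts for roughly 85% of the run, so model loading and decoding are not the bottleneck here.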


jiacheng-ye · 2025-04-07T11:19:02Z

Hi there, the speed depends on max_new_tokens and steps. I just ran a test on one H800 GPU, and it took 16 s with max_new_tokens=512 and steps=512. So your speed seems reasonable given the hardware difference.
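As a rough way to compare runs, the total generation time can be divided by the number of steps to get a per-step cost. The sketch below uses only the H800 figures quoted in this reply; `per_step_seconds` is a hypothetical helper, and the steps value used for the A800 run is not stated in the thread, so the last line is an assumption for illustration rather than a measurement.

```python
def per_step_seconds(total_seconds: float, steps: int) -> float:
    """Average wall-clock time per diffusion step (hypothetical helper)."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    return total_seconds / steps


# H800 run from this reply: 16 s with steps=512.
h800_per_step = per_step_seconds(16.0, 512)
print(f"H800: {h800_per_step * 1000:.2f} ms per step")

# A800 run from the report: 33.67 s. The steps value is NOT given in the
# thread; 512 is assumed here purely for illustration.
assumed_a800_steps = 512
a800_per_step = per_step_seconds(33.67275953292847, assumed_a800_steps)
print(f"A800 (assuming steps={assumed_a800_steps}): {a800_per_step * 1000:.2f} ms per step")
```

If the two runs did use the same settings, the gap is about a factor of two per step. Otherwise the first thing to check is which max_new_tokens and steps values the slower run used.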
