
completions is very slow #3

Open
anyshu opened this issue Apr 5, 2025 · 1 comment

Comments

@anyshu

anyshu commented Apr 5, 2025

I tested the sample code provided on the main page on an A800 GPU.
Diffusion generation was very slow. Am I doing something wrong?

The prompt is:

messages = [
    {"role": "user", "content": "Say hello!"}
] 

Here is the timing breakdown:

init model cost:5.800448417663574

apply chat template cost:0.014918088912963867

diffusion generate cost:33.67275953292847

Hello! How can I assist you today?

decode cost:0.002490997314453125

full cost:39.49076247215271
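The logged numbers are internally consistent. The four stages add up to the full cost, and nearly all of it comes from the diffusion generate stage. The minimal Python sketch below checks this using only the values from the log. The dictionary keys are labels copied from the log, not function names from the project.

```python
# Stage timings copied verbatim from the log above (same units as "full cost").
stages = {
    "init model": 5.800448417663574,
    "apply chat template": 0.014918088912963867,
    "diffusion generate": 33.67275953292847,
    "decode": 0.002490997314453125,
}
full_cost = 39.49076247215271

stage_sum = sum(stages.values())
# Any remainder is time spent between the timed stages.
untimed = full_cost - stage_sum

for name, cost in stages.items():
    # Print each stage's cost and its share of the full cost.
    print(f"{name:>20}: {cost:8.3f} ({cost / full_cost:6.1%})")
print(f"{'untimed':>20}: {untimed:8.6f}")
```

Diffusion generate accounts for about 85% of the total and model initialization for about 15%. Initialization is a one-off cost when the loaded model is reused across requests, so the generate stage is the part worth tuning.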
@jiacheng-ye
Contributor

Hi there, the speed depends on max_new_tokens and steps. I just ran a test on a single H800 GPU, and it took 16 s with max_new_tokens=512 and steps=512. So I think your speed is reasonable, given the hardware difference.
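The following is a rough way to apply this reply. Assume that each denoising step costs about one forward pass over the whole sequence. Under that assumption, generation time for a fixed max_new_tokens grows roughly linearly with steps. The sketch below calibrates the per-step cost from the 16 s / 512 steps H800 measurement. Both the linear model and the uniform "tokens revealed per step" schedule are illustrative assumptions. The helper names are hypothetical and do not belong to the project's API.

```python
import math


def tokens_unmasked_per_step(max_new_tokens: int, steps: int) -> int:
    """Hypothetical helper: tokens revealed per step if max_new_tokens
    are spread evenly over `steps` (assumed uniform schedule)."""
    return math.ceil(max_new_tokens / steps)


def estimate_generate_seconds(steps: int,
                              ref_seconds: float = 16.0,
                              ref_steps: int = 512) -> float:
    """Hypothetical helper: linear extrapolation from a reference run at the
    same max_new_tokens and hardware, assuming each step costs
    ref_seconds / ref_steps."""
    return steps * (ref_seconds / ref_steps)


# Compare a few step counts for max_new_tokens=512 on the reference hardware.
for steps in (512, 256, 128):
    print(
        f"steps={steps:4d}: ~{tokens_unmasked_per_step(512, steps)} token(s)/step, "
        f"~{estimate_generate_seconds(steps):.1f} s"
    )
```

Under these assumptions, halving steps roughly halves generation time, because each step then reveals more tokens. Fewer steps usually come at some cost in output quality. The per-step cost itself grows with sequence length, so lowering max_new_tokens also helps. The calibration constant only applies to the hardware it was measured on.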
