Run a one-off benchmark for gpt-oss #63
Signed-off-by: Huy Do <[email protected]>
# Not sure why this is needed on ROCm
pushd gpt_oss
# Low
OPENAI_API_KEY="" python3 -mevals --base-url http://localhost:8000/v1 \
what's this eval library?
That's just gpt_oss.evals, called from inside the gpt_oss directory. I couldn't understand why I need to do that on ROCm, and I don't have ssh access to the runners to check (they are on the AMD side). My ROCm devgpu doesn't have Docker, so I can't try this out locally.
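For context, running `python3 -m evals` from inside a package directory and running `python3 -m <package>.evals` from its parent both execute the same `__main__.py`; only the package name Python assigns differs. The sketch below illustrates this with a throwaway layout (`demo_pkg` is a hypothetical stand-in, not the real gpt_oss tree), and it does not explain why ROCm needs the former:

```shell
# Build a throwaway package: demo_pkg/evals/__main__.py (hypothetical layout).
workdir="$(mktemp -d)"
mkdir -p "$workdir/demo_pkg/evals"
touch "$workdir/demo_pkg/__init__.py" "$workdir/demo_pkg/evals/__init__.py"
# The entry point just reports the package name it was loaded under.
printf 'print(__package__)\n' > "$workdir/demo_pkg/evals/__main__.py"

# Invoked from the parent directory with the full dotted name.
from_parent="$(cd "$workdir" && python3 -m demo_pkg.evals)"

# Invoked from inside the package directory, as the workflow does.
from_inside="$(cd "$workdir/demo_pkg" && python3 -m evals)"

echo "from parent: $from_parent"
echo "from inside: $from_inside"

rm -rf "$workdir"
```

Because `-m` puts the current directory at the front of the module search path, the second form works even when the outer package is not importable in the environment, which may be relevant on the ROCm runners.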
On 8xH100 and 8xB200, here are the current numbers for the two models:
gpt-oss-20b: [image: gpt-oss-20b perf numbers]
gpt-oss-120b: [image: gpt-oss-120b perf numbers]
Overall, the accuracy scores on H100 are on par with https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#accuracy-evaluation-panels, but the aime25 results on B200 look like there are issues there.
Here are the links to download the raw results:
gpqa accuracy: https://github.com/pytorch/pytorch-integration-testing/actions/runs/16823595241
aime25 accuracy: https://github.com/pytorch/pytorch-integration-testing/actions/runs/16839593742
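The raw results can also be fetched from the command line. This is a minimal sketch, assuming the GitHub CLI (`gh`) is installed and authenticated and that the run artifacts have not expired; the `results/` output directory is a hypothetical choice. By default it only prints the commands; set `GH=gh` to actually download:

```shell
repo="pytorch/pytorch-integration-testing"
# Run IDs taken from the two links: gpqa, then aime25.
run_ids="16823595241 16839593742"

# Dry run by default: "echo gh" prints each command instead of running it.
GH="${GH:-echo gh}"

download_run() {
  # $1 = workflow run ID; artifacts land in results/<run ID> (hypothetical path).
  $GH run download "$1" --repo "$repo" --dir "results/$1"
}

for run_id in $run_ids; do
  download_run "$run_id"
done
```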