@huydhn huydhn commented Aug 6, 2025

On 8xH100 and 8xB200, here are the current accuracy numbers for the two models (gpqa and aime25, at low/medium/high reasoning effort):

gpt-oss-20b

+-----------------+-------+-------+--------+
|                 | b200  | h100  | mi325x |
+-----------------+-------+-------+--------+
| gpqa (low)      | 0.568 | 0.580 | n/a    |
| gpqa (medium)   | 0.662 | 0.670 | n/a    |
| gpqa (high)     | 0.729 | 0.735 | n/a    |
+-----------------+-------+-------+--------+
| aime25 (low)    | 0.333 | 0.342 | n/a    |
| aime25 (medium) | 0.717 | 0.737 | n/a    |
| aime25 (high)   | 0.858 | 0.842 | n/a    |
+-----------------+-------+-------+--------+

gpt-oss-20b perf numbers

gpt-oss-120b

+-----------------+-------+-------+--------+
|                 | b200  | h100  | mi325x |
+-----------------+-------+-------+--------+
| gpqa (low)      | 0.653 | 0.654 | n/a    |
| gpqa (medium)   | 0.718 | 0.713 | n/a    |
| gpqa (high)     | 0.785 | 0.786 | n/a    |
+-----------------+-------+-------+--------+
| aime25 (low)    | 0.512 | 0.500 | n/a    |
| aime25 (medium) | 0.754 | 0.758 | n/a    |
| aime25 (high)   | 0.883 | 0.921 | n/a    |
+-----------------+-------+-------+--------+

gpt-oss-120b perf numbers

Overall, the accuracy scores are on par with https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#accuracy-evaluation-panels for H100, but the aime25 results on B200 look like there may be an issue there.

Here are the links to download the raw results:

Signed-off-by: Huy Do <[email protected]>
# Not sure why this is needed on ROCm
pushd gpt_oss
# Low
OPENAI_API_KEY="" python3 -mevals --base-url http://localhost:8000/v1 \


what's this eval library?


That's just gpt_oss.evals being called from inside the gpt_oss directory. I couldn't understand why I need to do that on ROCm, and I don't have SSH access to the runners to check why (they are on the AMD side). My ROCm devgpu doesn't have Docker, so I can't try this out locally.
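For context, a minimal sketch of why the `pushd gpt_oss` matters: `python -m` prepends the current working directory to `sys.path`, so a bare module name like `evals` resolves only when you run from inside the directory that contains it. The `demo_evals` module below is hypothetical, standing in for gpt_oss's `evals`.

```python
import os
import subprocess
import sys
import tempfile

# Throwaway layout mirroring the setup in the PR: a "demo_evals" module
# (a stand-in for gpt_oss's `evals`) that is only importable when the
# working directory is the package directory -- the reason the script
# does `pushd gpt_oss` before `python3 -mevals`.
with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "gpt_oss")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "demo_evals.py"), "w") as f:
        f.write("print('evals ran')\n")

    # From inside gpt_oss/: works, because `python -m` prepends the
    # current working directory to sys.path.
    inside = subprocess.run([sys.executable, "-m", "demo_evals"],
                            cwd=pkg, capture_output=True, text=True)
    # From the parent directory: the bare module name is not found.
    outside = subprocess.run([sys.executable, "-m", "demo_evals"],
                             cwd=root, capture_output=True, text=True)

print(inside.stdout.strip())    # evals ran
print(outside.returncode != 0)  # True
```

Running from the repo root with `python3 -m gpt_oss.evals` would be the usual alternative to the `pushd`, assuming the package is importable from there.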
