Run a one-off benchmark for gpt-oss #63

huydhn · 2025-08-06T08:20:19Z

On 8xH100 and 8xB200, here are the current numbers for the two models:

gpt-oss-20b

+-----------------+-------+-------+--------+
|                 | b200  | h100  | mi325x |
+-----------------+-------+-------+--------+
| gpqa (low)      | 0.568 | 0.580 | n/a    |
| gpqa (medium)   | 0.662 | 0.670 | n/a    |
| gpqa (high)     | 0.729 | 0.735 | n/a    |
+-----------------+-------+-------+--------+
| aime25 (low)    | 0.333 | 0.342 | n/a    |
| aime25 (medium) | 0.717 | 0.737 | n/a    |
| aime25 (high)   | 0.858 | 0.842 | n/a    |
+-----------------+-------+-------+--------+

[Image: gpt-oss-20b perf numbers]

gpt-oss-120b

+-----------------+-------+-------+--------+
|                 | b200  | h100  | mi325x |
+-----------------+-------+-------+--------+
| gpqa (low)      | 0.653 | 0.654 | n/a    |
| gpqa (medium)   | 0.718 | 0.713 | n/a    |
| gpqa (high)     | 0.785 | 0.786 | n/a    |
+-----------------+-------+-------+--------+
| aime25 (low)    | 0.512 | 0.500 | n/a    |
| aime25 (medium) | 0.754 | 0.758 | n/a    |
| aime25 (high)   | 0.883 | 0.921 | n/a    |
+-----------------+-------+-------+--------+

[Image: gpt-oss-120b perf numbers]

Overall, the H100 accuracy scores are on par with the reference numbers at https://docs.vllm.ai/projects/recipes/en/latest/OpenAI/GPT-OSS.html#accuracy-evaluation-panels, but the aime25 results on B200 look off, which suggests there may be an issue there.
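To make the B200 vs. H100 comparison concrete, here is a minimal sketch that computes the per-benchmark difference (B200 minus H100) from the two tables above. The scores are copied verbatim from those tables; the names `SCORES`, `deltas`, and `largest_gap` are illustrative only and are not part of the workflow in this PR.

```python
# Scores copied from the tables above, stored as (b200, h100) pairs.
SCORES = {
    "gpt-oss-20b": {
        "gpqa (low)": (0.568, 0.580),
        "gpqa (medium)": (0.662, 0.670),
        "gpqa (high)": (0.729, 0.735),
        "aime25 (low)": (0.333, 0.342),
        "aime25 (medium)": (0.717, 0.737),
        "aime25 (high)": (0.858, 0.842),
    },
    "gpt-oss-120b": {
        "gpqa (low)": (0.653, 0.654),
        "gpqa (medium)": (0.718, 0.713),
        "gpqa (high)": (0.785, 0.786),
        "aime25 (low)": (0.512, 0.500),
        "aime25 (medium)": (0.754, 0.758),
        "aime25 (high)": (0.883, 0.921),
    },
}


def deltas(scores):
    """Return {model: {benchmark: b200 - h100}}, rounded to 3 decimals."""
    return {
        model: {bench: round(b200 - h100, 3) for bench, (b200, h100) in rows.items()}
        for model, rows in scores.items()
    }


def largest_gap(scores):
    """Return the (model, benchmark, delta) with the largest absolute gap."""
    flat = (
        (model, bench, delta)
        for model, rows in deltas(scores).items()
        for bench, delta in rows.items()
    )
    return max(flat, key=lambda item: abs(item[2]))


if __name__ == "__main__":
    for model, rows in deltas(SCORES).items():
        print(model)
        for bench, delta in rows.items():
            print(f"  {bench:<16} {delta:+.3f}")
    print("largest gap:", largest_gap(SCORES))
```

On these numbers the largest gap is aime25 (high) on gpt-oss-120b, where B200 is 0.038 below H100; every gpqa delta is within 0.012.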

Here are the links to download the raw results:

gpqa accuracy https://github.com/pytorch/pytorch-integration-testing/actions/runs/16823595241
aime25 accuracy https://github.com/pytorch/pytorch-integration-testing/actions/runs/16839593742

Signed-off-by: Huy Do <[email protected]>

yeqcharlotte · 2025-08-08T23:07:36Z

.github/scripts/gpt-oss/run_accuracy_checks.sh

+  # Not sure why this is needed on ROCm
+  pushd gpt_oss
+  # Low
+  OPENAI_API_KEY="" python3 -mevals --base-url http://localhost:8000/v1 \


what's this eval library?

huydhn · 2025-08-08

That's just gpt_oss.evals, invoked from inside the gpt_oss directory. I couldn't understand why I need to do that on ROCm, and I don't have SSH access to the runners to check why (they are on the AMD side). My ROCm devgpu doesn't have Docker, so I can't try this out locally.
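For readers unfamiliar with the `pushd gpt_oss` plus `python3 -m evals` pattern, the sketch below illustrates the general Python behavior behind it: `python -m` puts the current directory at the front of the module search path, so a subpackage can be run by its short name from inside its parent directory. This does not explain the ROCm-specific behavior, which remains unknown here. The package name `demo_pkg` and both helper functions are hypothetical stand-ins, not the real gpt_oss layout.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> Path:
    """Create root/demo_pkg/evals/__main__.py as a stand-in for gpt_oss/evals."""
    evals_dir = root / "demo_pkg" / "evals"
    evals_dir.mkdir(parents=True)
    (root / "demo_pkg" / "__init__.py").write_text("")
    (evals_dir / "__init__.py").write_text("")
    # Print the package name the interpreter resolved for this entry point.
    (evals_dir / "__main__.py").write_text("print(__package__)\n")
    return root / "demo_pkg"


def run_module(module: str, cwd: Path) -> str:
    """Run `python -m <module>` from `cwd` and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, "-m", module],
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        pkg_dir = build_demo_package(root)
        # From the parent directory, the fully qualified name is required.
        print(run_module("demo_pkg.evals", cwd=root))
        # From inside the package directory, the short name resolves.
        print(run_module("evals", cwd=pkg_dir))
```

One possible reading, offered only as an assumption, is that the short-name form is a workaround for an environment where the parent package is not importable by its full name; confirming that would require access to the runner.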

Signed-off-by: Huy Do <[email protected]>

huydhn added 3 commits August 6, 2025 00:37

Add an one-off workflow to benchmark gpt-oss

bee13ba

Signed-off-by: Huy Do <[email protected]>

Fix workflow syntax

56e4451

Signed-off-by: Huy Do <[email protected]>

Add the models

19e9e9b

Signed-off-by: Huy Do <[email protected]>

meta-cla bot added the cla signed label Aug 6, 2025

huydhn had a problem deploying to pytorch-x-vllm August 6, 2025 08:20 — with GitHub Actions Error

huydhn had a problem deploying to pytorch-x-vllm August 6, 2025 08:20 — with GitHub Actions Failure

facebook-github-bot added the module: rocm label Aug 6, 2025

huydhn had a problem deploying to pytorch-x-vllm August 6, 2025 08:20 — with GitHub Actions Error

huydhn requested a deployment to pytorch-x-vllm August 6, 2025 08:20 — with GitHub Actions In progress

huydhn had a problem deploying to pytorch-x-vllm August 6, 2025 08:20 — with GitHub Actions Error

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 19:00 — with GitHub Actions Failure

Make upload benchmark results optional

d22c815

Signed-off-by: Huy Do <[email protected]>

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 20:06 — with GitHub Actions Error

Ugh

0514931

Signed-off-by: Huy Do <[email protected]>

huydhn temporarily deployed to pytorch-x-vllm August 8, 2025 20:11 — with GitHub Actions Inactive

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 20:11 — with GitHub Actions Error

huydhn temporarily deployed to pytorch-x-vllm August 8, 2025 20:11 — with GitHub Actions Inactive

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 20:11 — with GitHub Actions Error

yeqcharlotte reviewed Aug 8, 2025

Darn it, I forgot to upload the results

c2f3dc7

Signed-off-by: Huy Do <[email protected]>

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 23:17 — with GitHub Actions Error

huydhn temporarily deployed to pytorch-x-vllm August 8, 2025 23:17 — with GitHub Actions Inactive

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 23:17 — with GitHub Actions Error

huydhn had a problem deploying to pytorch-x-vllm August 8, 2025 23:17 — with GitHub Actions Failure

Let's just keep perf run then

d641866

Signed-off-by: Huy Do <[email protected]>

huydhn temporarily deployed to pytorch-x-vllm August 9, 2025 00:27 — with GitHub Actions Inactive

huydhn had a problem deploying to pytorch-x-vllm August 9, 2025 00:27 — with GitHub Actions Failure

huydhn temporarily deployed to pytorch-x-vllm August 9, 2025 00:27 — with GitHub Actions Inactive

huydhn had a problem deploying to pytorch-x-vllm August 9, 2025 00:27 — with GitHub Actions Error
