Conversation

@marksverdhei
Contributor

What does this PR do?

This PR replaces the hardcoded num_local_experts and hidden_size values in MXFP4Config for GPT-OSS-type models.

I discovered this while experimenting with non-standard configs of the GPT-OSS architecture, but I'm fairly sure it would break for openai/gpt-oss-120b as well, since its number of experts differs from the hardcoded value.

The quantizer hardcoded 32 experts and a hidden_size of 2880 in its reshape operations, which caused failures when quantizing models with a different number of experts.

Changes:

  • Read num_local_experts and hidden_size from model.config
  • Use dynamic values in reshape operations instead of hardcoded constants
  • Default to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.
The change passed all tests I was able to run locally on 24 GB of VRAM.
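
For illustration, here is a minimal sketch of the idea, not the exact quantizer code: the MoE dimensions come from the model config, with the previously hardcoded GPT-OSS-20B values as fallbacks. The `packed` tensor and its shape are hypothetical stand-ins for the quantizer's real tensors.

```python
# Minimal sketch, assuming a config object like model.config.
from types import SimpleNamespace

import torch

config = SimpleNamespace(num_local_experts=4, hidden_size=1024)  # stand-in for model.config

# Read the MoE dimensions from the config, falling back to the previously
# hardcoded defaults (32 experts, hidden_size 2880) for backward compatibility.
num_local_experts = getattr(config, "num_local_experts", 32)
hidden_size = getattr(config, "hidden_size", 2880)

# The reshape now follows the config instead of hardcoded constants.
packed = torch.zeros(num_local_experts * hidden_size, 16)  # hypothetical packed expert weights
unpacked = packed.reshape(num_local_experts, hidden_size, 16)
print(unpacked.shape)  # torch.Size([4, 1024, 16])
```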

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - no
  • Did you read the contributor guideline, Pull Request section? - yes
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. - I looked and didn't find an issue
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings. - likely not necessary
  • Did you write any new necessary tests? - no, unsure if needed

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

marksverdhei and others added 2 commits October 22, 2025 14:59
The quantizer hardcoded 32 experts and 2880 hidden_size in the reshape
operations. This caused failures when quantizing models with different
numbers of experts (e.g., averaged single-expert models).

Changes:
- Read num_local_experts and hidden_size from model.config
- Use dynamic values in reshape operations instead of hardcoded constants
- Defaults to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@Rocketknight1
Member

cc @MekkCyber for quantization

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@MekkCyber (Contributor) left a comment


Sounds good, thanks!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@marksverdhei
Contributor Author

run-slow: mxfp4

Regarding the tests: "Executing the custom container implementation failed. Please contact your self hosted runner administrator."
To me it looks like it failed because of issues with the CI infrastructure; I wasn't able to see any logs in the GitHub Actions logs.
Otherwise, I'm curious where I'm supposed to find the results of the actual pytest run.

@MekkCyber
Contributor

run-slow: mxfp4

@MekkCyber
Contributor

Hey @marksverdhei, yes, it seems it was a Docker issue. I relaunched the tests, let's see if it works.

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@MekkCyber
Contributor

seems to be working 🥳

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber merged commit bb6028c into huggingface:main Oct 24, 2025
24 checks passed
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 30, 2025

Fix MXFP4 quantizer to support variable num_local_experts and hidden_size (huggingface#41795)