Conversation

@marksverdhei
Contributor

What does this PR do?

This PR replaces the hardcoded num_local_experts and hidden_size values in MXFP4Config for GPT-OSS-type models.

I discovered this while experimenting with non-standard configs of the GPT-OSS architecture, but I'm fairly sure it would break for openai/gpt-oss-120b as well, since its number of experts differs from the hardcoded value.

The quantizer hardcoded 32 experts and a hidden_size of 2880 in its reshape operations, which caused failures when quantizing models with a different number of experts.

Changes:

  • Read num_local_experts and hidden_size from model.config
  • Use dynamic values in reshape operations instead of hardcoded constants
  • Default to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.
The change passed all tests I was able to run locally on 24 GB of VRAM.
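
For illustration, here is a minimal sketch of the idea, not the exact quantizer code: the MoE dimensions come from the model config, with the previously hardcoded GPT-OSS-20B values as fallbacks. The `packed` tensor and its shape are hypothetical stand-ins for the quantizer's real tensors.

```python
# Minimal sketch, assuming a config object like model.config.
from types import SimpleNamespace

import torch

config = SimpleNamespace(num_local_experts=4, hidden_size=1024)  # stand-in for model.config

# Read the MoE dimensions from the config, falling back to the previously
# hardcoded defaults (32 experts, hidden_size 2880) for backward compatibility.
num_local_experts = getattr(config, "num_local_experts", 32)
hidden_size = getattr(config, "hidden_size", 2880)

# The reshape now follows the config instead of hardcoded constants.
packed = torch.zeros(num_local_experts * hidden_size, 16)  # hypothetical packed expert weights
unpacked = packed.reshape(num_local_experts, hidden_size, 16)
print(unpacked.shape)  # torch.Size([4, 1024, 16])
```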

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - no
  • Did you read the contributor guideline, Pull Request section? - yes
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. - I looked and didn't find an issue
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings. - likely not necessary
  • Did you write any new necessary tests? - no, unsure if needed

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

marksverdhei and others added 2 commits October 22, 2025 14:59
The quantizer hardcoded 32 experts and 2880 hidden_size in the reshape
operations. This caused failures when quantizing models with different
numbers of experts (e.g., averaged single-expert models).

Changes:
- Read num_local_experts and hidden_size from model.config
- Use dynamic values in reshape operations instead of hardcoded constants
- Defaults to 32 and 2880 for backward compatibility

This enables quantizing averaged/merged MoE models with fewer experts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@Rocketknight1
Member

cc @MekkCyber for quantization

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@MekkCyber (Contributor) left a comment


Sounds good, thanks!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

@MekkCyber
Contributor

run-slow: mxfp4

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@marksverdhei
Contributor Author

run-slow: mxfp4

Regarding the tests: "Executing the custom container implementation failed. Please contact your self hosted runner administrator."
To me it looks like it failed because of issues with the CI infrastructure; I wasn't able to see any logs in the GitHub Actions logs.
Otherwise, I'm curious where I'm supposed to find the results of the actual pytest run.

@MekkCyber
Contributor

run-slow: mxfp4

@MekkCyber
Contributor

Hey @marksverdhei, yes, it seems it was a Docker issue. I relaunched the tests, let's see if it works.

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: []
quantizations: ['quantization/mxfp4'] ...

@MekkCyber
Contributor

seems to be working 🥳

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber merged commit bb6028c into huggingface:main Oct 24, 2025
24 checks passed
i3hz pushed a commit to i3hz/transformers that referenced this pull request Oct 30, 2025

Fix MXFP4 quantizer to support variable num_local_experts and hidden_size (huggingface#41795)