Enhance benchmark_moe.py compatibility issues across vLLM versions #27541
base: main
Conversation
Code Review
This pull request introduces several compatibility wrappers to benchmark_moe.py to handle API differences across vLLM versions, which is a solid approach. The changes are well-motivated and address specific ImportError and TypeError issues encountered during benchmarking. My main feedback is to improve the robustness of one of the new wrapper functions to avoid masking potential bugs, making it consistent with other wrappers in this PR.
def _get_config_dtype_str_compatible(
    dtype: torch.dtype,
    use_fp8_w8a8: bool = False,
    use_int8_w8a16: bool = False,
    use_int4_w4a16: bool = False,
    ocp_mx_scheme: str | None = None,
) -> str | None:
    """
    Multi-level import fallback for _get_config_dtype_str function.
    Returns a string used to construct the filename for tuning info.
    """
    try:
        from vllm.model_executor.layers.fused_moe.config import (
            _get_config_dtype_str as _original_func,
        )

        return _original_func(
            dtype,
            use_fp8_w8a8=use_fp8_w8a8,
            use_int8_w8a16=use_int8_w8a16,
            use_int4_w4a16=use_int4_w4a16,
            ocp_mx_scheme=ocp_mx_scheme,
        )
    except (ImportError, TypeError):
        # Fallback implementation that mimics the original function's logic
        if use_fp8_w8a8:
            return "fp8_w8a8"
        elif use_int8_w8a16:
            return "int8_w8a16"
        elif use_int4_w4a16:
            return "int4_w4a16"
        elif ocp_mx_scheme is not None:
            # For OCP MX execution simulation
            return None
        elif dtype == torch.float:
            # avoiding cases where kernel fails when float32 MoE
            # use fp16/bfloat16 configs
            return "float32"
        return None
Catching TypeError here is too broad and can mask other potential bugs. For instance, if _original_func is called with an incorrect type for one of its arguments, the TypeError would be silently caught and the code would fall back to potentially incorrect logic. A more robust approach, consistent with fused_experts_compatible in this same PR, is to use inspect.signature to dynamically build the keyword arguments and only catch ImportError. This ensures that unexpected TypeErrors are surfaced for debugging.
def _get_config_dtype_str_compatible(
    dtype: torch.dtype,
    use_fp8_w8a8: bool = False,
    use_int8_w8a16: bool = False,
    use_int4_w4a16: bool = False,
    ocp_mx_scheme: str | None = None,
) -> str | None:
    """
    Multi-level import fallback for _get_config_dtype_str function.
    Returns a string used to construct the filename for tuning info.
    """
    try:
        from vllm.model_executor.layers.fused_moe.config import (
            _get_config_dtype_str as _original_func,
        )
        sig = inspect.signature(_original_func)
        kwargs = {}
        if "use_fp8_w8a8" in sig.parameters:
            kwargs["use_fp8_w8a8"] = use_fp8_w8a8
        if "use_int8_w8a16" in sig.parameters:
            kwargs["use_int8_w8a16"] = use_int8_w8a16
        if "use_int4_w4a16" in sig.parameters:
            kwargs["use_int4_w4a16"] = use_int4_w4a16
        if "ocp_mx_scheme" in sig.parameters:
            kwargs["ocp_mx_scheme"] = ocp_mx_scheme
        return _original_func(dtype, **kwargs)
    except ImportError:
        # Fallback implementation that mimics the original function's logic
        if use_fp8_w8a8:
            return "fp8_w8a8"
        elif use_int8_w8a16:
            return "int8_w8a16"
        elif use_int4_w4a16:
            return "int4_w4a16"
        elif ocp_mx_scheme is not None:
            # For OCP MX execution simulation
            return None
        elif dtype == torch.float:
            # avoiding cases where kernel fails when float32 MoE
            # use fp16/bfloat16 configs
            return "float32"
        return None
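For illustration, a hypothetical call site for this wrapper could look as follows; the argument values are invented and not taken from benchmark_moe.py.

# Hypothetical usage sketch; argument values are examples, not from the benchmark.
dtype_str = _get_config_dtype_str_compatible(
    torch.bfloat16,
    use_fp8_w8a8=True,
)
# In the fallback path above this yields "fp8_w8a8", which then feeds into the
# tuning-config filename lookup.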
I've improved the code.
Force-pushed from 2527b65 to c27760b.
benchmarks/kernels/benchmark_moe.py (outdated)
# Ray will set ROCR_VISIBLE_DEVICES for device visibility
logger.warning(
    "Ray uses ROCR_VISIBLE_DEVICES to control device accessibility."
print(
Please use the logger here.
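For reference, a minimal sketch of the requested change, assuming the file already defines a module-level logger as the surrounding context suggests:

# Minimal sketch of the requested change (assumes an existing module-level logger).
logger.warning(
    "Ray uses ROCR_VISIBLE_DEVICES to control device accessibility."
)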
I've revised it. Please review.
Force-pushed from c27760b to 2b52444.
This PR fixes compatibility issues in benchmarks/kernels/benchmark_moe.py to support multiple vLLM versions:

1. Fix FusedMoEQuantConfig parameter naming
   - Use correct 'block_shape' parameter name instead of 'block_quant_shape'
   - Prevents TypeError: unexpected keyword argument
2. Add compatible wrapper for _get_config_dtype_str
   - Uses inspect.signature for dynamic parameter detection
   - Only catches ImportError (not TypeError) to surface bugs properly
   - Robust fallback implementation
3. Add compatible wrapper for fused_experts
   - Dynamic signature inspection to pass only supported parameters
   - Handles quant_config and allow_deep_gemm parameters gracefully

These fixes resolve runtime errors that occurred when running MoE benchmarks with different vLLM versions, particularly when using block quantization.

Tested with:
- Python syntax: passed
- Ruff linter: all checks passed
- Line length: all lines < 88 chars

Addresses code review feedback from gemini-code-assist bot.

Signed-off-by: Alfred <[email protected]>
Force-pushed from 2b52444 to b3d1e82.
I am not sure whether we really need this compatibility.
This PR introduces compatibility fixes to benchmarks/kernels/benchmark_moe.py to support multiple vLLM versions and prevent runtime import/parameter errors. These errors occurred while I was optimizing Qwen3-Coder-30B-A3B-Instruct-FP8:
ImportError: cannot import name '_get_config_dtype_str'
Added a multi-level import fallback that searches possible module locations and class methods for _get_config_dtype_str and provides a fallback implementation when unavailable.
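As a rough illustration of the multi-location lookup idea (the candidate module paths below are guesses, not necessarily the exact ones used in the PR):

# Illustrative sketch only; the candidate module paths are guesses.
import importlib

_original_func = None
for _module_path in (
    "vllm.model_executor.layers.fused_moe.config",     # newer layout
    "vllm.model_executor.layers.fused_moe.fused_moe",  # guess at an older layout
):
    try:
        _module = importlib.import_module(_module_path)
        _original_func = getattr(_module, "_get_config_dtype_str")
        break
    except (ImportError, AttributeError):
        continue  # try the next candidate location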
TypeError: FusedMoEQuantConfig.make() parameter incompatibility
Implemented make_quant_config_compatible() which tries multiple parameter combinations (including quant_dtype, dtype, with/without block_quant_shape) to create FusedMoEQuantConfig across versions.
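A minimal sketch of that approach; the keyword names (quant_dtype, block_shape, block_quant_shape) come from this PR's description, and the real FusedMoEQuantConfig.make() signatures may differ between vLLM versions.

# Sketch only: tries the keyword spellings mentioned in this PR until one is accepted.
def make_quant_config_compatible(quant_dtype, block_quant_shape):
    from vllm.model_executor.layers.fused_moe.config import FusedMoEQuantConfig

    candidate_kwargs = [
        {"quant_dtype": quant_dtype, "block_shape": block_quant_shape},
        {"quant_dtype": quant_dtype, "block_quant_shape": block_quant_shape},
        {"quant_dtype": quant_dtype},
    ]
    for kwargs in candidate_kwargs:
        try:
            return FusedMoEQuantConfig.make(**kwargs)
        except TypeError:
            continue  # this vLLM version rejects the keyword, try the next combination
    raise TypeError("No compatible FusedMoEQuantConfig.make() signature found")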
TypeError: fused_experts() parameter incompatibility
Implemented fused_experts_compatible() which inspects fused_experts signature and only passes supported parameters (quant_config, allow_deep_gemm, etc.).
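A rough sketch of that signature-inspection pattern, assuming fused_experts is importable from vllm.model_executor.layers.fused_moe; the exact set of forwarded parameters in the actual PR may differ.

# Sketch only: forwards optional keyword arguments to fused_experts only when the
# installed vLLM version's signature declares them.
import inspect

from vllm.model_executor.layers.fused_moe import fused_experts


def fused_experts_compatible(
    hidden_states, w1, w2, topk_weights, topk_ids,
    quant_config=None, allow_deep_gemm=False, **extra
):
    sig = inspect.signature(fused_experts)
    kwargs = dict(extra)
    if "quant_config" in sig.parameters:
        kwargs["quant_config"] = quant_config
    if "allow_deep_gemm" in sig.parameters:
        kwargs["allow_deep_gemm"] = allow_deep_gemm
    return fused_experts(hidden_states, w1, w2, topk_weights, topk_ids, **kwargs)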
Signed-off-by: massif-01 [email protected]