
Update fusion_attention to properly convert bfloat16 values #25404


Open
wants to merge 5 commits into base: main

Conversation

justinchuby
Contributor

No description provided.

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

justinchuby and others added 3 commits July 15, 2025 11:59
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@@ -362,15 +367,15 @@ def create_combined_qkv_bias(
name_prefix: str,
) -> NodeProto | None:
q_bias = self.model.get_initializer(q_add.input[1]) or self.model.get_initializer(q_add.input[0])
-    qb = NumpyHelper.to_array(q_bias)
+    qb = to_array(q_bias)
Contributor

Instead of replacing all of the NumpyHelper references, can we instead update the APIs inside NumpyHelper to use the ONNX IR? Otherwise, there may be downstream effects where some places use ir.from_proto(tensor).numpy() and other places use NumpyHelper.to_array.

class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> ndarray:
        # When weights are in external data format but not presented, we can still test the optimizer with two changes:
        # (1) set fill_zeros = True (2) change load_external_data=False in optimizer.py
        if fill_zeros:
            return ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        return numpy_helper.to_array(tensor)

Contributor Author

Done
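
Following up on the review comment above, a minimal sketch (not the PR's actual diff) of what routing NumpyHelper.to_array through the ONNX IR could look like, using the ir.from_proto(tensor).numpy() call mentioned in that comment; the fill_zeros branch is kept from the quoted snippet:

# Hypothetical sketch, not the PR's change: NumpyHelper.to_array delegating to
# the ONNX IR so bfloat16 (and other dtypes numpy lacks natively) convert consistently.
import numpy as np
import onnx_ir as ir
from onnx import TensorProto, helper


class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> np.ndarray:
        if fill_zeros:
            # Same behavior as the existing helper when external data is absent.
            return np.ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        # ir.from_proto wraps the TensorProto; .numpy() performs the conversion,
        # including bfloat16, without going through numpy_helper.to_array.
        return ir.from_proto(tensor).numpy()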

Signed-off-by: Justin Chu <[email protected]>
@@ -5,9 +5,10 @@
from logging import getLogger

import numpy
import onnx

Check notice: Code scanning / CodeQL

Module is imported with 'import' and 'import from' (Note)

Module 'onnx' is imported with both 'import' and 'import from'.
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.

Copilot Autofix

To fix the issue, we will remove the "from onnx import NodeProto, helper" statement and access NodeProto and helper directly through the onnx module (e.g., onnx.NodeProto and onnx.helper). This approach eliminates the redundancy and ensures all references to onnx are consistent.

Changes will be made to:

  1. Remove the "from onnx import NodeProto, helper" statement.
  2. Update all occurrences of NodeProto and helper to use onnx.NodeProto and onnx.helper.

Suggested changeset 1
onnxruntime/python/tools/transformers/fusion_utils.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/fusion_utils.py b/onnxruntime/python/tools/transformers/fusion_utils.py
--- a/onnxruntime/python/tools/transformers/fusion_utils.py
+++ b/onnxruntime/python/tools/transformers/fusion_utils.py
@@ -10,3 +10,3 @@
 from numpy import array_equal, ndarray
-from onnx import NodeProto, helper
+
 from onnx_model import OnnxModel
@@ -66,5 +66,5 @@
 
-        cast_node = helper.make_node("Cast", inputs=inputs, outputs=[output_name])
+        cast_node = onnx.helper.make_node("Cast", inputs=inputs, outputs=[output_name])
 
-        cast_node.attribute.extend([helper.make_attribute("to", to_type)])
+        cast_node.attribute.extend([onnx.helper.make_attribute("to", to_type)])
         self.model.add_node(cast_node, graph_name=graph_name)
@@ -129,3 +129,3 @@
 
-    def get_squeeze_or_unsqueeze_axes(self, node: NodeProto) -> ndarray | None:
+    def get_squeeze_or_unsqueeze_axes(self, node: onnx.NodeProto) -> ndarray | None:
         assert node.op_type in ["Squeeze", "Unsqueeze"]
EOF
Unable to commit as this autofix suggestion is now outdated
@@ -5,9 +5,10 @@
from logging import getLogger

import numpy
import onnx
import onnx_ir as ir
Contributor

@tianleiwu tianleiwu Jul 17, 2025

Could we import it conditionally (like when the data type is bf16, fp8, fp4, int4x2, uint4x2, etc.) in the NumpyHelper class? That way, users might not need to install it when they only optimize models with float/fp16 data types.

I think the CI pipeline needs to install the package. It should be added to https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/requirements/transformers-test/requirements.txt.

Also, we could add it as an onnxruntime extra dependency, e.g. by adding a "transformers" section here:

extras_require = {
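
Following up on this suggestion, a rough sketch of what a lazy onnx_ir import inside the helper might look like; the helper name, the dtype list, and the extras entry are illustrative assumptions (and assume an onnx version that defines the float8 and 4-bit enums), not the actual change:

# Illustrative sketch of a lazy onnx_ir import, used only for dtypes that
# plain numpy_helper cannot represent natively; names here are assumptions.
from onnx import TensorProto, numpy_helper

_IR_ONLY_TYPES = frozenset({
    TensorProto.BFLOAT16,
    TensorProto.FLOAT8E4M3FN,
    TensorProto.FLOAT8E5M2,
    TensorProto.INT4,
    TensorProto.UINT4,
})


def to_array(tensor: TensorProto):
    if tensor.data_type in _IR_ONLY_TYPES:
        import onnx_ir as ir  # imported only when such a tensor is actually seen

        return ir.from_proto(tensor).numpy()
    return numpy_helper.to_array(tensor)


# A hypothetical "transformers" extra in setup.py could then carry the dependency:
# extras_require = {..., "transformers": ["onnx_ir"]}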
