
Update fusion_attention to properly convert bfloat16 values #25404


Open
wants to merge 5 commits into base: main

Conversation

justinchuby
Contributor

No description provided.

@github-actions github-actions bot left a comment

You can commit the suggested changes from lintrunner.

justinchuby and others added 3 commits July 15, 2025 11:59
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@@ -362,15 +367,15 @@ def create_combined_qkv_bias(
name_prefix: str,
) -> NodeProto | None:
q_bias = self.model.get_initializer(q_add.input[1]) or self.model.get_initializer(q_add.input[0])
-    qb = NumpyHelper.to_array(q_bias)
+    qb = to_array(q_bias)
Contributor

Instead of replacing all of the NumpyHelper references, can we instead update the APIs inside NumpyHelper to use the ONNX IR? Otherwise, there may be downstream effects where some places use ir.from_proto(tensor).numpy() and other places use NumpyHelper.to_array.

class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> ndarray:
        # When weights are in external data format but not presented, we can still test the optimizer with two changes:
        # (1) set fill_zeros = True (2) change load_external_data=False in optimizer.py
        if fill_zeros:
            return ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        return numpy_helper.to_array(tensor)

Contributor Author

Done
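
Following up on the review comment above, a minimal sketch (not the PR's actual diff) of what routing NumpyHelper.to_array through the ONNX IR could look like, using the ir.from_proto(tensor).numpy() call mentioned in that comment; the fill_zeros branch is kept from the quoted snippet:

# Hypothetical sketch, not the PR's change: NumpyHelper.to_array delegating to
# the ONNX IR so bfloat16 (and other dtypes numpy lacks natively) convert consistently.
import numpy as np
import onnx_ir as ir
from onnx import TensorProto, helper


class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> np.ndarray:
        if fill_zeros:
            # Same behavior as the existing helper when external data is absent.
            return np.ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        # ir.from_proto wraps the TensorProto; .numpy() performs the conversion,
        # including bfloat16, without going through numpy_helper.to_array.
        return ir.from_proto(tensor).numpy()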

Signed-off-by: Justin Chu <[email protected]>
@@ -5,9 +5,10 @@
from logging import getLogger

import numpy
import onnx

Check notice: Code scanning / CodeQL

Module is imported with 'import' and 'import from' (Note)

Module 'onnx' is imported with both 'import' and 'import from'.
Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.

Copilot Autofix

To fix the issue, we will remove the "from onnx import NodeProto, helper" statement and access NodeProto and helper directly through the onnx module (e.g., onnx.NodeProto and onnx.helper). This approach eliminates the redundancy and ensures all references to onnx are consistent.

Changes will be made to:

  1. Remove the "from onnx import NodeProto, helper" statement.
  2. Update all occurrences of NodeProto and helper to use onnx.NodeProto and onnx.helper.

Suggested changeset 1
onnxruntime/python/tools/transformers/fusion_utils.py

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/onnxruntime/python/tools/transformers/fusion_utils.py b/onnxruntime/python/tools/transformers/fusion_utils.py
--- a/onnxruntime/python/tools/transformers/fusion_utils.py
+++ b/onnxruntime/python/tools/transformers/fusion_utils.py
@@ -10,3 +10,3 @@
 from numpy import array_equal, ndarray
-from onnx import NodeProto, helper
+
 from onnx_model import OnnxModel
@@ -66,5 +66,5 @@
 
-        cast_node = helper.make_node("Cast", inputs=inputs, outputs=[output_name])
+        cast_node = onnx.helper.make_node("Cast", inputs=inputs, outputs=[output_name])
 
-        cast_node.attribute.extend([helper.make_attribute("to", to_type)])
+        cast_node.attribute.extend([onnx.helper.make_attribute("to", to_type)])
         self.model.add_node(cast_node, graph_name=graph_name)
@@ -129,3 +129,3 @@
 
-    def get_squeeze_or_unsqueeze_axes(self, node: NodeProto) -> ndarray | None:
+    def get_squeeze_or_unsqueeze_axes(self, node: onnx.NodeProto) -> ndarray | None:
         assert node.op_type in ["Squeeze", "Unsqueeze"]
EOF
Unable to commit as this autofix suggestion is now outdated
@@ -5,9 +5,10 @@
from logging import getLogger

import numpy
import onnx
import onnx_ir as ir
Contributor

@tianleiwu tianleiwu Jul 17, 2025

Could we import it conditionally (like when the data type is bf16, fp8, fp4, int4x2, uint4x2, etc.) in the NumpyHelper class? That way, users might not need to install it when they only optimize models with float/fp16 data types.

I think the CI pipeline needs to install the package. It should be added to https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/requirements/transformers-test/requirements.txt.

Also, we could add it as an onnxruntime extra dependency, e.g. by adding a "transformers" section here:

extras_require = {
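
Following up on this suggestion, a rough sketch of what a lazy onnx_ir import inside the helper might look like; the helper name, the dtype list, and the extras entry are illustrative assumptions (and assume an onnx version that defines the float8 and 4-bit enums), not the actual change:

# Illustrative sketch of a lazy onnx_ir import, used only for dtypes that
# plain numpy_helper cannot represent natively; names here are assumptions.
from onnx import TensorProto, numpy_helper

_IR_ONLY_TYPES = frozenset({
    TensorProto.BFLOAT16,
    TensorProto.FLOAT8E4M3FN,
    TensorProto.FLOAT8E5M2,
    TensorProto.INT4,
    TensorProto.UINT4,
})


def to_array(tensor: TensorProto):
    if tensor.data_type in _IR_ONLY_TYPES:
        import onnx_ir as ir  # imported only when such a tensor is actually seen

        return ir.from_proto(tensor).numpy()
    return numpy_helper.to_array(tensor)


# A hypothetical "transformers" extra in setup.py could then carry the dependency:
# extras_require = {..., "transformers": ["onnx_ir"]}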
