Update fusion_attention to properly convert bfloat16 values #25404
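For context, a minimal sketch of the bfloat16 round-trip this change concerns; the tensor name and values are illustrative, and it assumes `onnx_ir` and `ml_dtypes` are installed (the IR represents bf16 arrays via `ml_dtypes`):

```python
import numpy as np
import onnx
import onnx_ir as ir
from ml_dtypes import bfloat16  # plain numpy has no bfloat16 dtype

# Build a tiny bfloat16 initializer (hypothetical name and values).
values = np.array([1.0, 0.5, -2.0], dtype=bfloat16)
tensor = onnx.helper.make_tensor(
    name="qkv_bias",
    data_type=onnx.TensorProto.BFLOAT16,
    dims=[3],
    vals=values.tobytes(),
    raw=True,
)

# The ONNX IR decodes the raw bf16 payload into an ml_dtypes-backed array,
# which numpy alone cannot represent natively.
decoded = ir.from_proto(tensor).numpy()
print(decoded.dtype, decoded)
```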
base: main
Conversation
You can commit the suggested changes from lintrunner.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
```diff
@@ -362,15 +367,15 @@ def create_combined_qkv_bias(
         name_prefix: str,
     ) -> NodeProto | None:
         q_bias = self.model.get_initializer(q_add.input[1]) or self.model.get_initializer(q_add.input[0])
-        qb = NumpyHelper.to_array(q_bias)
+        qb = to_array(q_bias)
```
Instead of replacing all of the `NumpyHelper` references, can we instead update the APIs inside `NumpyHelper` to use the ONNX IR? Otherwise, there may be downstream effects where some places use `ir.from_proto(tensor).numpy()` and other places use `NumpyHelper.to_array`.
onnxruntime/onnxruntime/python/tools/transformers/fusion_utils.py, lines 306 to 317 in 1e5fdd1:

```python
class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> ndarray:
        # When weights are in external data format but not present, we can still test the optimizer with two changes:
        # (1) set fill_zeros = True (2) change load_external_data=False in optimizer.py
        if fill_zeros:
            return ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        return numpy_helper.to_array(tensor)
```
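One way to realize that suggestion, sketched here only as an illustration: keep `NumpyHelper.to_array` as the single entry point and route the conversion through the ONNX IR internally, assuming `ir.from_proto(tensor).numpy()` handles all tensor types as the comment above proposes:

```python
import numpy as np
import onnx_ir as ir
from onnx import TensorProto, helper


class NumpyHelper:
    @staticmethod
    def to_array(tensor: TensorProto, fill_zeros: bool = False) -> np.ndarray:
        # Preserve the fill_zeros testing shortcut from the original helper.
        if fill_zeros:
            return np.ndarray(
                shape=tensor.dims,
                dtype=helper.tensor_dtype_to_np_dtype(tensor.data_type),
            )
        # Route every conversion through the ONNX IR so bfloat16 (and other
        # dtypes plain numpy lacks) decode the same way at every call site.
        return ir.from_proto(tensor).numpy()
```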
Done
Signed-off-by: Justin Chu <[email protected]>
```diff
@@ -5,9 +5,10 @@
 from logging import getLogger

 import numpy
 import onnx
```
Check notice — Code scanning / CodeQL
Module is imported with 'import' and 'import from' (Note): Module 'onnxruntime.test.onnx' is imported with both 'import' and 'import from'.

Copilot Autofix (AI, 6 days ago)
To fix the issue, we will remove the `from onnx import NodeProto, helper` statement and access `NodeProto` and `helper` directly through the `onnx` module (e.g., `onnx.NodeProto` and `onnx.helper`). This approach eliminates the redundancy and ensures all references to `onnx` are consistent.

Changes will be made to:
- Remove the `from onnx import NodeProto, helper` statement.
- Update all occurrences of `NodeProto` and `helper` to use `onnx.NodeProto` and `onnx.helper`.
```diff
@@ -10,3 +10,3 @@
 from numpy import array_equal, ndarray
-from onnx import NodeProto, helper
+
 from onnx_model import OnnxModel
@@ -66,5 +66,5 @@
-        cast_node = helper.make_node("Cast", inputs=inputs, outputs=[output_name])
+        cast_node = onnx.helper.make_node("Cast", inputs=inputs, outputs=[output_name])

-        cast_node.attribute.extend([helper.make_attribute("to", to_type)])
+        cast_node.attribute.extend([onnx.helper.make_attribute("to", to_type)])
         self.model.add_node(cast_node, graph_name=graph_name)
@@ -129,3 +129,3 @@

-    def get_squeeze_or_unsqueeze_axes(self, node: NodeProto) -> ndarray | None:
+    def get_squeeze_or_unsqueeze_axes(self, node: onnx.NodeProto) -> ndarray | None:
         assert node.op_type in ["Squeeze", "Unsqueeze"]
```
```diff
@@ -5,9 +5,10 @@
 from logging import getLogger

 import numpy
 import onnx
+import onnx_ir as ir
```
Could we import it conditionally (e.g., when the data type is bf16, fp8, fp4, int4x2, uint4x2, etc.) in the `NumpyHelper` class? That way, users would not need to install it when they optimize models with float/fp16 data types.
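A sketch of that conditional-import idea; the dtype set and standalone function here are chosen for illustration (the fp4 `TensorProto` constant depends on the installed onnx version, so it is omitted):

```python
import numpy as np
from onnx import TensorProto, numpy_helper

# Data types that numpy_helper cannot represent with native numpy dtypes
# (a recent onnx is needed for the 8-bit float and 4-bit int constants).
_IR_ONLY_TYPES = {
    TensorProto.BFLOAT16,
    TensorProto.FLOAT8E4M3FN,
    TensorProto.FLOAT8E5M2,
    TensorProto.INT4,
    TensorProto.UINT4,
}


def to_array(tensor: TensorProto) -> np.ndarray:
    if tensor.data_type in _IR_ONLY_TYPES:
        # Lazy import: users optimizing float/fp16 models never need onnx_ir.
        import onnx_ir as ir

        return ir.from_proto(tensor).numpy()
    return numpy_helper.to_array(tensor)
```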
I think the CI pipeline needs to install the package; it should be added to https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/requirements/transformers-test/requirements.txt.
Also, we could add it as an onnxruntime extra dependency, e.g., by adding a "transformers" section here:

Line 784 in 2911e70: `extras_require = {`
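For the extras idea, a hypothetical sketch of what that section could look like; the pip distribution name for `onnx_ir` is an assumption here:

```python
# Hypothetical "transformers" extra in setup.py, so that
# `pip install onnxruntime[transformers]` pulls in the IR dependency.
extras_require = {
    "transformers": ["onnx_ir"],
}
```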
No description provided.