
Some simplification patterns can create invalid IR due to type changes #2634

Open

christopherbate opened this issue Nov 21, 2024 · 2 comments

christopherbate (Contributor) commented Nov 21, 2024

What happened?

There are several places in the simplification transforms (e.g. https://github.com/openxla/stablehlo/blob/main/stablehlo/transforms/StablehloAggressiveFolder.cpp#L744) where a rewrite pattern unconditionally replaces a Value with a Value of a potentially different type. Sometimes this is OK, sometimes not.

In this IR, folding the stablehlo.subtract and replacing it directly with a constant of type tensor<1x1xi32> is OK:

func.func @example() -> tensor<?x?xi32> {
    %c = stablehlo.constant dense<1> : tensor<1x1xi32>
    %c1 = stablehlo.constant dense<2> : tensor<1x1xi32>
    %0 = stablehlo.subtract %c, %c1 : (tensor<1x1xi32>, tensor<1x1xi32>) -> tensor<?x?xi32>
    %1 = stablehlo.negate %0 : (tensor<?x?xi32>) -> tensor<?x?xi32>
    return %1 : tensor<?x?xi32>
}

However, if the user of %0 is a return or something like stablehlo.composite, then the result is invalid IR.
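For instance (a minimal sketch, not taken from an actual failing module), if %0 fed the return directly, replacing it with the folded constant would leave the returned value's type disagreeing with the function's declared result type, and the verifier rejects the module:

func.func @example_invalid() -> tensor<?x?xi32> {
    // After the naive fold, the static-typed constant is returned directly.
    %c = stablehlo.constant dense<-1> : tensor<1x1xi32>
    // verifier error: the returned type no longer matches the declared result type tensor<?x?xi32>
    return %c : tensor<1x1xi32>
}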

It seems hit-or-miss whether patterns check for, and are tested against, edge cases like the one above.

Rather than submitting PRs piecemeal each time we encounter a failing pattern, I'm wondering if there's something we can do across the board to address this issue in all the simplification passes.

christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Dec 4, 2024

Added a patch that was missing from the last StableHLO upgrade. This patch addresses some issues mentioned in openxla/stablehlo#2634.

An additional test is added to mlir-tensorrt as a regression test.

christopherbate added a commit to NVIDIA/TensorRT-Incubator that referenced this issue Dec 5, 2024

Added a patch that was missing from the last StableHLO upgrade. This patch addresses some issues mentioned in openxla/stablehlo#2634.

An additional test is added to mlir-tensorrt as a regression test.

Also makes some minor cleanup to CI/CD config in order to fix the caching mechanism, using the PR as a test-case.
GleasonK (Member) commented Feb 6, 2025

Thought I replied to this in the past -- is this example something you encounter in practice? Curious what the source of the IR is, as it isn't often that we see result types less precise than the input types. This IR in particular is something that could be fixed with basic type inference.

Also, I see your patches; feel free to upstream those! They look reasonable.

christopherbate (Contributor, Author) commented
@GleasonK yes, we see it in practice. It's a problem because, although StableHLO has a set of rules that allow updating types in place, those rules don't apply to other ops in use, like func.return, or even to certain StableHLO ops that have restrictions on their operand types, such as dynamic_slice or the composite op.

That implies that if you want to update a type in place, you have to iterate over all users to check whether the in-place update is valid. If there are more than two or so uses, it might be cheaper just to insert a cast-like operation instead, to avoid all the pointer chasing. I've never measured the cost, though.
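As a sketch of that cast-based option (assuming the upstream tensor.cast op is acceptable in the pipeline; this snippet is illustrative, not taken from an actual patch), the fold in the original example could keep every user valid by materializing the static-typed constant and casting it back to the original dynamic type:

    %c2 = stablehlo.constant dense<-1> : tensor<1x1xi32>
    // The cast restores the original, less-refined type so all existing users still verify.
    %cast = tensor.cast %c2 : tensor<1x1xi32> to tensor<?x?xi32>
    %1 = stablehlo.negate %cast : (tensor<?x?xi32>) -> tensor<?x?xi32>
    return %1 : tensor<?x?xi32>

A later canonicalization or type-refinement pass could then fold the cast away wherever propagating the more precise type is actually safe.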

The reason you might not observe the issue is perhaps the front end... we have an ongoing experiment where the front end creates StableHLO with the most general types possible, relying on the MLIR compiler to refine the types. That's where we encounter these sorts of issues/bugs.
