[libclc] Optimize generic CLC fmin/fmax #128506

frasercrmck · 2025-02-24T12:44:23Z

With this commit, the CLC fmin/fmax builtins use clang's __builtin_elementwise_(min|max)imumnum, which lets us generate the LLVM minimumnum/maximumnum intrinsics directly. These intrinsics uniformly select the non-NaN input over the (quiet or signalling) NaN input, which matches what the OpenCL CTS tests.
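To make the selection rule concrete, here is a minimal scalar sketch in plain C of the behaviour described above. The names `ref_minimumnum`, `ref_maximumnum` and `ref_self_check` are hypothetical and are not part of libclc or clang. Treating -0.0 as ordered below +0.0 is an assumption based on my reading of the LLVM intrinsic documentation, not something this PR states, and the sketch does not model the quieting of signalling NaNs.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical scalar reference models; not libclc or clang code. */
static inline float ref_minimumnum(float a, float b) {
  if (isnan(a)) return b; /* also covers both-NaN: the result is then NaN */
  if (isnan(b)) return a;
  /* Assumption: -0.0 is ordered below +0.0. */
  if (a == 0.0f && b == 0.0f) return signbit(a) ? a : b;
  return a < b ? a : b;
}

static inline float ref_maximumnum(float a, float b) {
  if (isnan(a)) return b;
  if (isnan(b)) return a;
  /* Assumption: +0.0 is ordered above -0.0. */
  if (a == 0.0f && b == 0.0f) return signbit(a) ? b : a;
  return a > b ? a : b;
}

/* Small usage example: a NaN operand never wins over a number. */
static inline void ref_self_check(void) {
  assert(ref_minimumnum(NAN, 1.0f) == 1.0f);
  assert(ref_maximumnum(-3.0f, NAN) == -3.0f);
  assert(isnan(ref_minimumnum(NAN, NAN)));
}
```

A NaN is only returned when both operands are NaN, which is the property the CTS checks according to the description above.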

These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets.
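As a rough illustration of the difference between scalarizing and keeping the vector type, the sketch below contrasts a per-lane loop with a single whole-vector call. It is an assumption-laden sketch, not the libclc source: `float4_t`, `lane_min`, `fmin4_scalarized` and `fmin4_vector` are hypothetical names, the vector type uses the GCC/clang `vector_size` extension rather than OpenCL's `float4`, and the builtin is guarded with `__has_builtin` so that compilers lacking `__builtin_elementwise_minimumnum` fall back to the loop.

```c
#include <assert.h>
#include <math.h>

#ifndef __has_builtin
#define __has_builtin(x) 0 /* compilers without __has_builtin take the fallback */
#endif

/* Hypothetical stand-in for OpenCL's float4 (GCC/clang vector extension). */
typedef float float4_t __attribute__((vector_size(16)));

/* Per-lane rule: prefer the non-NaN operand (signed zeros not modelled). */
static inline float lane_min(float a, float b) {
  if (isnan(a)) return b;
  if (isnan(b)) return a;
  return a < b ? a : b;
}

/* Scalarized form: one scalar operation per element. */
static inline float4_t fmin4_scalarized(float4_t x, float4_t y) {
  float4_t r = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int i = 0; i < 4; ++i)
    r[i] = lane_min(x[i], y[i]);
  return r;
}

/* Vector form: a single builtin call on the whole vector where available. */
static inline float4_t fmin4_vector(float4_t x, float4_t y) {
#if __has_builtin(__builtin_elementwise_minimumnum)
  return __builtin_elementwise_minimumnum(x, y);
#else
  return fmin4_scalarized(x, y);
#endif
}
```

Both functions should agree lane by lane on inputs without signed zeros; the point of the vector form is that the backend sees one vector operation instead of four scalar ones.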

Note that there is ongoing discussion [1] regarding how these builtins should handle signalling NaNs in the OpenCL specification, and whether they should be able to return a quiet NaN as per the IEEE behaviour. If the specification and/or CTS is ever updated to allow or mandate returning a qNaN, these builtins could/should be updated to use __builtin_elementwise_(min|max)num instead, which would lower to the LLVM minnum/maxnum intrinsics.

The SPIR-V targets maintain the old implementations, as the LLVM -> SPIR-V translator can't currently handle the LLVM intrinsics. The implementation has been simplified to consistently use clang builtins; previously the half version was explicitly defined.

[1] KhronosGroup/OpenCL-CTS#2285

frasercrmck · 2025-02-24T12:46:58Z

Note @arsenm I didn't touch amdgcn's fmin/fmax or r600's as I wasn't sure if any of that could be updated or at least unified. Would you be able to help there?

I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

arsenm · 2025-02-24T12:49:22Z

These should use the regular builtin fmin / fmax.

> I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

The spec is quite badly written on what's expected of sNaNs here, and the conformance test doesn't test what is written in the spec (hoping to fix that here).

frasercrmck · 2025-02-24T13:05:10Z

> These should use the regular builtin fmin / fmax.

Do you mean the AMD implementations, or the CLC ones too? Note there's no vector support for __builtin_fmin which is why I chose __builtin_elementwise_min. They appear to generate the same code so maybe I'm misunderstanding the difference between the two builtins.

> > I note that the comments around the use of canonicalize mention sNAN, which isn't required by the spec.

> The spec is quite badly written on what's expected of sNaNs here, and the conformance test doesn't test what is written in the spec (hoping to fix that here).

Thanks for the link. I was going by 7.2 but now I see there's also a footnote.

frasercrmck · 2025-03-04T09:08:12Z

I don't suppose the recent clarifications to llvm.minnum and llvm.maxnum change anything here?

frasercrmck · 2025-03-17T14:07:19Z

ping, thanks

This is an alternative to #128506 which doesn't attempt to change the codegen for fmin and fmax on their way to the CLC library. The amdgcn and r600 custom definitions of fmin/fmax are now converted to custom definitions of __clc_fmin and __clc_fmax. For simplicity, the CLC library doesn't provide vector/scalar versions of these builtins; the OpenCL layer wraps those up to the vector/vector versions. The only codegen change is that non-standard vector/scalar overloads of fmin/fmax have been removed. We were previously (accidentally, presumably) providing overloads with mixed element types such as fmin(double2, float), fmax(half4, double), etc. The only vector/scalar overloads in the OpenCL spec are those with scalars of the same element type as the vector in the first argument.
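The layering described above, where the vector/scalar overload is a thin wrapper over the vector/vector one, can be sketched roughly as follows. This is a plain C illustration under stated assumptions, not the libclc source: `float4_t`, `lane_fmin`, `clc_like_fmin4` and `ocl_like_fmin4_scalar` are hypothetical names, and the vector type uses the GCC/clang `vector_size` extension instead of OpenCL's `float4`.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical stand-in for OpenCL's float4 (GCC/clang vector extension). */
typedef float float4_t __attribute__((vector_size(16)));

/* Per-lane rule: prefer the non-NaN operand (signed zeros not modelled). */
static inline float lane_fmin(float a, float b) {
  if (isnan(a)) return b;
  if (isnan(b)) return a;
  return a < b ? a : b;
}

/* "CLC layer" model: only the vector/vector form exists here. */
static inline float4_t clc_like_fmin4(float4_t x, float4_t y) {
  float4_t r = {0.0f, 0.0f, 0.0f, 0.0f};
  for (int i = 0; i < 4; ++i)
    r[i] = lane_fmin(x[i], y[i]);
  return r;
}

/* "OpenCL layer" model: the vector/scalar overload splats the scalar,
 * which has the same element type as the vector, and forwards the call. */
static inline float4_t ocl_like_fmin4_scalar(float4_t x, float y) {
  float4_t ys = {y, y, y, y};
  return clc_like_fmin4(x, ys);
}
```

Because the wrapper takes a `float` for a `float4_t` first argument, a mixed-element-type combination such as a double vector with a float scalar simply has no counterpart in this scheme.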

arsenm · 2025-04-29T13:28:48Z

> I don't suppose the recent clarifications to llvm.minnum and llvm.maxnum change anything here?

It depends on whether the conformance test is fixed to match the fuzzy language of the spec or not. If the decision is that fmin/fmax should match the IEEE behavior, the implementation directly maps to llvm.minnum/llvm.maxnum. If the decision is that the conformance test continues doing what it has been doing, it should directly map to llvm.minimumnum/maximumnum. In either case, we should not have code using canonicalizes.
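The two options differ only in what happens when an operand is a signalling NaN. The following bit-level sketch models that difference on binary32 encodings. It is an illustration under assumptions, not LLVM's implementation: all names are hypothetical, the quiet bit is assumed to be the most significant mantissa bit (the common IEEE-754 encoding), the sNaN-to-qNaN behaviour of the minnum model follows the IEEE behaviour referred to in this thread, and signed zeros are not modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QNAN_BITS 0x7fc00000u /* canonical quiet NaN, binary32 */

static inline int is_nan_bits(uint32_t u) {
  return (u & 0x7fffffffu) > 0x7f800000u;
}

/* Assumption: quiet bit is mantissa bit 22; a NaN without it is signalling. */
static inline int is_snan_bits(uint32_t u) {
  return is_nan_bits(u) && !(u & 0x00400000u);
}

static inline float bits_to_float(uint32_t u) {
  float f;
  memcpy(&f, &u, sizeof f);
  return f;
}

/* Model of the IEEE-style option: any sNaN operand yields a quiet NaN. */
static inline uint32_t model_minnum_bits(uint32_t a, uint32_t b) {
  if (is_snan_bits(a) || is_snan_bits(b)) return QNAN_BITS;
  if (is_nan_bits(a)) return b;
  if (is_nan_bits(b)) return a;
  return bits_to_float(a) <= bits_to_float(b) ? a : b;
}

/* Model of the CTS-style option: any NaN loses to a number. */
static inline uint32_t model_minimumnum_bits(uint32_t a, uint32_t b) {
  int a_nan = is_nan_bits(a), b_nan = is_nan_bits(b);
  if (a_nan && b_nan) return QNAN_BITS;
  if (a_nan) return b;
  if (b_nan) return a;
  return bits_to_float(a) <= bits_to_float(b) ? a : b;
}

/* Usage example: the models disagree only for a signalling NaN operand. */
static inline void model_self_check(void) {
  const uint32_t one = 0x3f800000u, snan = 0x7f800001u;
  assert(model_minnum_bits(snan, one) == QNAN_BITS);
  assert(model_minimumnum_bits(snan, one) == one);
}
```

Working on raw bit patterns avoids relying on how a given platform moves signalling NaNs through floating-point registers.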

wenju-he · 2025-07-23T01:08:43Z

> If the decision is the conformance test continues doing what it has been doing, it should directly map to llvm.minimumnum/maximumnum.

For now, @frasercrmck can we update this PR to use __builtin_elementwise_maximumnum/minimumnum so that OpenCL CTS can pass?

…#149775) Addresses llvm#112164. The minimumnum and maximumnum intrinsics were added in 5bf81e5. The new built-ins can be used for implementing the OpenCL math functions fmax and fmin in llvm#128506.

The CLC fmin/fmax builtins now use clang's __builtin_elementwise_(min|max) which helps us generate llvm.(min|max)num intrinsics directly. These intrinsics select the non-NAN input over the NAN input, which adheres to the OpenCL specification. Note that the OpenCL specification doesn't require support for sNAN, so returning qNAN over sNAN is acceptable. Note also that the intrinsics don't differentiate between -0.0 and +0.0; this does not appear to be required - going by the OpenCL CTS, at least. These intrinsics maintain the vector types, as opposed to scalarizing, which was previously happening. This commit therefore helps to optimize codegen for those targets.

frasercrmck · 2025-07-28T17:00:17Z

> > If the decision is the conformance test continues doing what it has been doing, it should directly map to llvm.minimumnum/maximumnum.

> For now, @frasercrmck can we update this PR to use __builtin_elementwise_maximumnum/minimumnum so that OpenCL CTS can pass?

Good idea, I've done that now.

The amdgcn/r600 versions now use llvm.maximumnum and llvm.minimumnum in the same way as other targets (@arsenm).

One caveat is that I had to make the SPIR-V targets use the old versions of CLC fmin/fmax as the Khronos SPIR-V translator can't (currently?) handle those intrinsics.

libclc/clc/lib/spirv/math/clc_fmin.cl

wenju-he

LGTM. I think llvm-spirv should be fixed, so that we can also use __builtin_elementwise_max/minimumnum for the target.

frasercrmck added the libclc label Feb 24, 2025

frasercrmck requested a review from arsenm February 24, 2025 12:44

frasercrmck force-pushed the libclc-clc-fmin-fmax branch from 43d4d7d to 572780f Compare April 1, 2025 11:12

frasercrmck mentioned this pull request Apr 3, 2025

[libclc] Move fmin & fmax to CLC library #134218

Merged

frasercrmck force-pushed the libclc-clc-fmin-fmax branch from 572780f to 5c367b8 Compare April 29, 2025 10:18

frasercrmck changed the title ~~[libclc] Move fmin/fmax to the CLC library~~ [libclc] Optimize generic CLC fmin/fmax Apr 29, 2025

wenju-he mentioned this pull request Jun 27, 2025

[NFC][libclc] Refactor _CLC_*_VECTORIZE macros to functions in .inc files #145678

Merged

wenju-he mentioned this pull request Jul 21, 2025

[Clang] Add elementwise maximumnum/minimumnum builtin functions #149775

Merged

frasercrmck added 3 commits July 28, 2025 17:39

spirv fmin/fmax

001f427

amdgcn/r600 simplify

8cd4a8e

frasercrmck force-pushed the libclc-clc-fmin-fmax branch from 5c367b8 to 8cd4a8e Compare July 28, 2025 16:57

wenju-he reviewed Jul 29, 2025

libclc/clc/lib/spirv/math/clc_fmin.cl (outdated, resolved)

use builtins

ce099c8

wenju-he approved these changes Jul 29, 2025

frasercrmck merged commit 586cacd into llvm:main Jul 29, 2025
9 checks passed

frasercrmck deleted the libclc-clc-fmin-fmax branch July 29, 2025 12:21
