[SYCL][DOC] Add phase 2 design for "if_device_has" #9127
base: sycl
Conversation
Add the "phase 2" design for `if_device_has` and `if_architecture_is` extensions. Whereas the phase 1 design worked only in some limited cases for AOT compilation, the phase 2 design fully supports both of these extensions in both AOT and JIT modes. The phase 2 design is also implementable in a 1-pass compiler. Note that the design document presents a primary design and also two alternatives. Discussion of the alternatives is encouraged in this PR. Attention @intel/dpcpp-spirv-doc-reviewers: The SPIR-V extension specification in this PR is used only in one of the alternate designs. Although comments on the SPIR-V spec are welcome, it may make sense to decide first whether the alternate design makes sense before doing a detailed review of the SPIR-V spec.
The design looks good to me, left a few comments for things to potentially clarify.
> being a function pointer. This callee is the _Conditional Action_ function
> _FAction_.
>
> * For each call to _FCaller_, the pass adds a new parameter at the beginning of
Should this say _FAction_ here?
No. The pass adds a new parameter to each call to FCaller. That new parameter is the literal function name FAction.
The calls to FAction are in the body of FCaller, and this body is removed (see the next bullet point).
I'm still confused by this bullet. If we pass an extra argument to `FCaller` which is a "literal function name", then why are we updating the `FCaller` definition to have a pointer-to-function argument? Shouldn't arguments match between the call site and the definition? Am I misunderstanding what "literal function name" means?
Maybe "literal function name" isn't the right term. I mean that the call site should be changed to look like this:

```llvm
call void @call_if_on_device_conditionallyXXX(@CallableXXX, ...)
```

where `@CallableXXX` is a function defined in the same LLVM IR module. In order for this to be correct IR, the declaration of `@call_if_on_device_conditionallyXXX` needs to be changed so that its first argument is pointer-to-function. When I originally wrote this PR, I tried these IR transformations by hand, so I'm pretty sure about the parameter type.
> * If the target is SPIR-V:
>
>   * It moves the definition of `@CallableXXX` and its entire call tree to a
What would happen if the call tree of the callable includes functions that are also called from outside the callable's call tree? Would those functions need to be duplicated?
Yes. This is described in the detailed section below titled "Changes to the sycl-post-link tool (non-AOT)".
> * The call to _FCallerB_ has _Conditional Action_ named _FActionB_ and
>   _Condition Expression_ named _ExpB_.
>
> When this happens, aspects used by _FActionA_ have the condition _ExpA_.
When the conditional expressions are combined like this, would it retain separate relationships between the conditions and the aspects?
What do you mean by "separate relationship"? The next section describes the information in the `!sycl_conditionally_used_aspects` metadata. Each set of aspects has an associated condition. Does that answer your question?
> functions as described in [Device Code Dynamic Linking][6]. This algorithm is
> extended to look also for _Add On Images_.
>
> If the _Main Image_ contains the "SYCL/add on images" property set, the runtime
How would this handle nested add-on images? Would these all be contained in the same property set? Does the linking order here matter?
Nested add-on images are handled in the last bullet point below that starts "The selected Add On Image may also contain a "SYCL/add on images" property set". The algorithm in `sycl-post-link` breaks nested `if_device_has` calls into separate add-on images. Each of those add-on images contains its own "SYCL/device requirements" property set that defines the aspects it uses. The link order doesn't matter.
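To make the order-independence claim concrete, here is a minimal sketch of that runtime iteration. This is a toy model, not the actual SYCL runtime code: the `AddOnMap` type and `collectAddOnImages` name are my own invention, standing in for the "SYCL/add on images" property-set lookup.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model: each image name maps to the add-on image names listed in its
// "SYCL/add on images" property set (absent entry = no such property set).
using AddOnMap = std::map<std::string, std::vector<std::string>>;

// Collect every add-on image transitively reachable from the main image.
// Each image is inserted into the result (and scanned) exactly once, so
// the final set does not depend on the order references are discovered in.
std::set<std::string> collectAddOnImages(const AddOnMap &refs,
                                         const std::string &mainImage) {
  std::set<std::string> found;
  std::vector<std::string> worklist{mainImage};
  while (!worklist.empty()) {
    std::string img = worklist.back();
    worklist.pop_back();
    auto it = refs.find(img);
    if (it == refs.end())
      continue; // this image has no "SYCL/add on images" property set
    for (const std::string &addOn : it->second)
      if (found.insert(addOn).second) // nested add-on images get re-scanned
        worklist.push_back(addOn);
  }
  return found;
}
```

Because the worklist visit is a plain reachability computation, selecting add-on images in any order yields the same final set, matching "the link order doesn't matter".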
> resolving the _Conditional Actions_ and run the LLVM IR pipeline separately for
> each copy of the IR.
>
> It would be possible to use this alternate design for some AOT targets but not
I think this design would be preferable for Nvidia and AMD targets, as there are many optimizations, such as inlining, that are crucial for performance and that, as I understand it, would not be run across the conditional-caller / conditional-action boundary. I think the benefit of the additional optimizations would be worth the higher compilation times when targeting multiple AOT targets.
Yes, this seems reasonable to me. This is what I had in mind as a possibility when writing this section.
> This IR pass is changed to perform the following additional aspect
> propagations:
>
> * Aspects used by each _Conditional Action_ function (and by functions it
Do I understand correctly that we don't have any special markup on _Conditional Actions_? I'm asking because currently we don't really do bottom-up propagation; instead we do a top-down call graph analysis starting from kernels. Therefore, with the current algorithm we will mostly ignore _Conditional Actions_. We can rewrite it to be bottom-up, but the question is where do we add the metadata: to each node in the call graph? Only to top-level entities such as kernels? Or to any functions which don't have callers within a module?

This is all solvable; I'm noting it here because it would be good to avoid placing that metadata everywhere, to prevent unnecessary bloat of the LLVM IR modules we produce.
I think a kernel-bundle based approach would allow us to keep the top-down approach, with some extra handling/resets for these nodes.
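For contrast with the top-down analysis discussed above, here is a minimal bottom-up propagation sketch. This is a toy model with invented names (`FunctionNode`, `propagateAspects`), not the actual IR pass; it assumes an acyclic call graph and skips the memoization and cycle handling a real pass would need. A _Conditional Action_ with no callers would simply become another propagation root.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using Aspects = std::set<std::string>;

// Toy call-graph node: aspects used directly in the body, plus ordinary
// (non-conditional) callees.
struct FunctionNode {
  Aspects directAspects;
  std::vector<std::string> callees;
};

// Bottom-up propagation: a function's used aspects are its own plus the
// union of those used by everything it calls (acyclic graph assumed).
Aspects propagateAspects(const std::map<std::string, FunctionNode> &cg,
                         const std::string &name) {
  const FunctionNode &f = cg.at(name);
  Aspects result = f.directAspects;
  for (const std::string &callee : f.callees) {
    Aspects sub = propagateAspects(cg, callee);
    result.insert(sub.begin(), sub.end());
  }
  return result;
}
```

In this model, running `propagateAspects` from each kernel and from each _Conditional Action_ function covers both entry kinds without attaching metadata to every node in the graph.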
> * When determining whether two kernels can be placed in the same device image,
>   the `!used_aspects` must be the same and the
>   `!sycl_conditionally_used_aspects` must be the same (the same set of
>   conditions and the same set of conditionally used aspects).
Why do we need to take `!sycl_conditionally_used_aspects` into account? Since actual uses of those conditionally used aspects will optionally be linked later at runtime, it should be safe to bundle together two kernels which directly use the same aspects.
> * Aspect usage can be propagated through nested _Conditional Caller_ function
>   calls. To illustrate, consider the following example:
>
>   * A _Conditional Caller_ named _FCallerA_ has the _Conditional Action_ named
>     _FActionA_ and the _Condition Expression_ named _ExpA_.
>   * The function _FActionA_ calls a different _Conditional Caller_ named
>     _FCallerB_.
>   * The call to _FCallerB_ has _Conditional Action_ named _FActionB_ and
>     _Condition Expression_ named _ExpB_.
If I understand correctly, both `FActionA` and `FActionB` will in the end each reside in their own device image with individual requirements. Do we even need, then, to propagate conditionally used aspects through nested conditional calls?
> * Each pair of _Add On Images_ (i.e. the one with the real function
>   definitions and the one with the stub definitions) is assigned a unique
>   name. By convention this is just an integer in string form (e.g. "1").
A simple integer may not be enough here, because the application may be linked to a shared library which also contains SYCL kernels and `if_device_has` calls; thus both the app and the library would provide add-on device images with the same integer string "1". We probably need to include a GUID in the name as well.
I was thinking that the "scope" of the _Add On Image_ numbers was limited to the ELF file that contains them. For example, if a _Main Image_ contains a reference to _Add On Image_ "1", we only search for that _Add On Image_ in the same ELF file.

If you think this will be difficult and we need a globally unique name, we could use a GUID. However, I think the driver will need to pass a GUID via a command line option to `sycl-post-link`.
sycl/doc/design/DeviceIf.md
Outdated
> contains one property for each of its _Add On Images_. The name of each
> property is a unique identifier for the _Add On Image_, which by convention is
> just an integer in string form (e.g. "1"). The value of the property has type
> `PI_PROPERTY_TYPE_BYTE_ARRAY` containing a series of `uint32` values `N1`,
Should it be a signed integer? Encoding for logical operators and "keywords" is defined as negative numbers
Good catch. Changed in 01fa4f8.
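For illustration, here is a sketch of how signed condition values could be serialized into a byte-array property. The operator codes below are invented placeholders (the actual encoding is defined elsewhere in the design doc); the point is that casting through `uint32_t` preserves the two's-complement bit pattern of the negative operator codes.

```cpp
#include <cstdint>
#include <vector>

// Invented placeholder codes: negative values stand for operators/keywords,
// non-negative values name aspects or architectures.
constexpr int32_t kOpNot = -1;
constexpr int32_t kOpAnd = -2;

// Serialize signed 32-bit condition values into a little-endian byte array,
// as a PI_PROPERTY_TYPE_BYTE_ARRAY value would carry them.
std::vector<uint8_t> packCondition(const std::vector<int32_t> &values) {
  std::vector<uint8_t> bytes;
  for (int32_t v : values) {
    uint32_t u = static_cast<uint32_t>(v); // keep two's-complement bits
    for (int shift = 0; shift < 32; shift += 8)
      bytes.push_back(static_cast<uint8_t>(u >> shift));
  }
  return bytes;
}
```

A reader of the property must therefore reinterpret each 32-bit word as signed before checking for operator codes; treating the words as plain `uint32` would turn `-1` into `0xFFFFFFFF`.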
> properties to the iteration set, causing their _Add On Images_ to be found
> also.
>
> Once this completes, the runtime computes the union of the
We should also think about the in-memory and on-disk caches that we have. I don't remember what we have there as keys, but as long as the device is in there, we probably don't need any extra changes, since the results of all conditional expressions should be the same if evaluated more than once for the same device.
Yes, that makes sense.
> The disadvantage to this design is that it increases compilation time when
> there are multiple targets. Once the _Conditional Actions_ are resolved,
> the LLVM IR is now specialized for one particular AOT target. If the user has
> asked to compile for multiple targets, we need to split the IR prior to
> resolving the _Conditional Actions_ and run the LLVM IR pipeline separately for
> each copy of the IR.
I think that this is what we currently do anyway: each AOT target (like `spir64_gen` or `spir64_x86_64`) causes a separate invocation of the FE and thus a separate optimization pipeline invocation, so this won't be worse than what we have today.

But at the same time, we currently have only a few AOT targets because they are generic and do not represent a particular device. Once we switch to specialized targets like `spir64_intel_skl`, running the FE separately for each of them would be a disaster, and we had better not lock ourselves into such a situation.

What we could do is a single FE invocation and then several `opt` invocations, one per target, to at least somewhat mitigate this.

Alternatively, we could just leave a single IR for all targets but be more tricky about the IR we emit: instead of dropping the bodies of `call_if_on_device_conditionallyXXX`, we could replace their bodies to get the following:

```llvm
define void @call_if_on_device_conditionallyXXX(%callablethis, %n1, %n2, ...) {
  %bool = __sycl_builtin_if_on_device.mangling(%n1, %n2, ...)
  br %bool, label %do_call, label %exit
do_call:
  call void @CallableXXX(%callablethis)
exit:
  ret void
}
```

This would be an alternative transformation which the new pass does for AOT targets. Yes, it won't allow full optimizations and may still limit some inlining, but it is better than nothing and allows us to have unified IR for all AOT targets in the optimization pipeline. Later, in `sycl-post-link`, we would resolve that `__sycl_builtin_if_on_device.*` into a concrete boolean value based on the target device and run some DCE to clean up the dead branches.
> Once we switch to specialized targets like `spir64_intel_skl`, running the FE separately for each of them would be a disaster, and we had better not lock ourselves into such a situation.

Yes, this is what I was worried about when describing the disadvantages of this design.

> What we could do is a single FE invocation and then several `opt` invocations, one per target, to at least somewhat mitigate this.

Yes, this is what I meant by "split the IR prior to resolving the _Conditional Actions_ and run the LLVM IR pipeline separately for each copy of the IR".

> Alternatively, we could just leave a single IR for all targets but be more tricky about the IR we emit: instead of dropping the bodies of `call_if_on_device_conditionallyXXX`, we could replace their bodies to get the following

I'm not sure this is a good idea. I'm worried that the body of `@CallableXXX` will get inlined and then some instructions from its body will be lifted above the `br %bool`. I saw some cases like this when experimenting with a solution like this. The nice thing about the primary design is that there is no call to `@CallableXXX` in the IR module, so it cannot possibly be inlined, thus avoiding the possibility of "leaking" any of its instructions above the condition check.

My current thinking is that:

- We should use this alternate design for Nvidia targets because these targets run the FE separately for each target anyway.
- We should use the primary design for Intel GPU targets and see if `ocloc` does a good job of inlining the calls to `@CallableXXX` and optimizing the result.

If the `ocloc` optimizations are not good, we could try the alternate design, where we split the IR and run optimization passes separately for each Intel GPU target.
> [7]: <./OptionalDeviceFeatures.md#changes-to-the-dpc-runtime>
>
> ## Alternate design for non-AOT SPIR-V targets
Personally, I'm in favor of the main proposed design for non-AOT SPIR-V targets, because a similar idea is likely to be used by the virtual functions extension (#10540), and therefore we will have more code which is handled in a similar manner and fewer things to support.

Raising the bar for underlying backends is not a huge concern here, because we are talking about an extension and not core functionality. However, if we think that this will be promoted to core SYCL at some point, then it is maybe better to avoid requiring a SPIR-V extension for this.
I disagree (somewhat). I think we should be using the same implementation for specialization constants and for this (and possibly for virtual functions). The features are very similar in what they do/can provide. It makes no sense to take different code paths. I also think that an ideal implementation would be mostly library-only on top of specialization constants if they are powerful enough.
The property set that contains the condition expression must contain signed values because the condition operators are negative numbers.
Thanks for the review and comments. Some responses below. I'll answer the remaining comments tomorrow.
Revise the design for handling optional kernel features for AOT compilations. There is no need to create a separate "aspect-filter" tool because the filtering can be done in the "sycl-post-link" tool instead. This is better aligned with our long-term direction because "sycl-post-link" will need to do some AOT-specific IR transformations in the future as part of the design presented in intel#9127, and those transformations will also make use of the device configuration file. This commit also addresses some issues that were missed before related to embedding several AOT-compiled offload bundles into the host executable.
I don't have a full understanding of all the things in play here, but my gut tells me that we should be making specialization constants more powerful and building this on top of them instead. It might result in essentially the same implementation underneath, just using slightly different terms, but the resulting design (IMO) would be cleaner and leaner.
```cpp
// Condition is a parameter pack of int's that define a simple expression
// language which tells the set of aspects or architectures that the device
// must have in order to enable the call. See the "Condition*" values below.
template<typename T, typename ...Condition>
```
Should it be `int... Conditions` instead?
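A sketch of the reviewer's suggestion, showing how a non-type pack `int... Conditions` carries the condition values directly as template arguments. All names here are illustrative, not the actual extension header; a real implementation would hand the values to the IR-level call rather than materialize them.

```cpp
#include <array>

// Illustrative only: with `int... Conditions` the encoded condition values
// are template arguments and can be materialized as a constexpr array.
template <int... Conditions>
constexpr std::array<int, sizeof...(Conditions)> encodeConditions() {
  return {Conditions...};
}

// Sketch of the conditional-caller shape using the non-type pack. Putting
// the pack first lets callers write call_if_on_device_conditionally<1, -2>(fn)
// while the callable type is deduced.
template <int... Conditions, typename Callable>
void call_if_on_device_conditionally(Callable fn) {
  constexpr auto encoded = encodeConditions<Conditions...>();
  (void)encoded; // in the real design, consumed by the compiler pass
  fn();          // sketch only: the pass decides whether this runs
}
```

With `typename... Condition`, each condition value must instead be wrapped in a type (e.g. `std::integral_constant<int, N>`), which is the extra indirection the reviewer's question is about.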
```llvm
call void @call_if_on_device_conditionallyXXX(@CallableXXX, %callablethis, N1, N2, ...)

declare void @call_if_on_device_conditionallyXXX(%callable, %callablethis, %n1, %n2, ...)
```
I don't have a full understanding of this yet, but another approach would be to have some clang builtin that gets lowered to either this directly or to something similar using operand bundles instead (with the same purpose of "defeating" the optimizer).
> To illustrate, consider an example where the _condition_ is "fp16 == true" and
> the _aspects_ is "fp16". In such a case, the conditional aspect usage is
> uninteresting because any device where "fp16 == true" will definitely support
> the "fp16" aspect.
I'm not confident, but this suggests to me that there is a more generic abstraction that could be elegantly used for both purposes somewhat uniformly.
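The pruning rule in the quoted example can be sketched as follows. This is a deliberate oversimplification: a condition is modeled only by the set of aspects it guarantees, ignoring the real expression language with its operators, and the function name is invented.

```cpp
#include <set>
#include <string>

// Toy model: represent a condition by the set of aspects a device is
// guaranteed to have whenever the condition evaluates to true
// (e.g. the condition "fp16 == true" guarantees the "fp16" aspect).
using GuaranteedAspects = std::set<std::string>;

// A conditionally used aspect is "uninteresting" when the condition
// already guarantees that any device passing the check supports it,
// so it can be dropped from the conditionally-used-aspects record.
bool isImpliedByCondition(const GuaranteedAspects &guaranteed,
                          const std::string &usedAspect) {
  return guaranteed.count(usedAspect) != 0;
}
```

A more generic abstraction, as the comment suggests, would replace the set-membership test with an entailment check over the full condition language.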
```cpp
// language which tells the set of aspects or architectures that the device
// must have in order to enable the call. See the "Condition*" values below.
template<typename T, typename ...Condition>
[[__sycl_detail__::add_ir_attributes_function("sycl-call-if-on-device-conditionally", true)]]
```
```suggestion
#ifdef __SYCL_DEVICE_ONLY__
[[__sycl_detail__::add_ir_attributes_function(
    "sycl-call-if-on-device-conditionally", true)]]
#endif
```
This pull request is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be automatically closed in 30 days. |