[SYCL][DOC] Add spec and design for "if_device" #8917

Open. Wants to merge 6 commits into base: sycl
Changes from 3 commits
4 changes: 2 additions & 2 deletions sycl/ReleaseNotes.md
@@ -163,7 +163,7 @@ plugin. [b1533c5e]
- Deprecated `group::get_global_range()`. [95338719]

### Documentation
-- Updated the [`sycl_ext_oneapi_device_if`](doc/extensions/proposed/sycl_ext_oneapi_device_if.asciidoc)
+- Updated the [`sycl_ext_oneapi_device_if`](doc/extensions/proposed/sycl_ext_oneapi_if_device_has.asciidoc)
extension proposal to allow chaining `if_device_has`, `else_if_device_has` and
`else_device` calls. [7f2b17ed]
- Updated the [`sycl_ext_intel_fpga_device_selector`](doc/extensions/supported/sycl_ext_intel_fpga_device_selector.asciidoc)
@@ -1725,7 +1725,7 @@ Release notes for commit range 6a49170027fb..962909fe9e78
[Uniform](doc/extensions/proposed/sycl_ext_oneapi_uniform.asciidoc) extensions [72e1611]
- Added [Matrix Programming Extension for DPC++ document](doc/extensions/experimental/sycl_ext_oneapi_matrix/sycl_ext_oneapi_matrix.asciidoc) [ace4c733]
- Implemented SYCL 2020 `sycl::span` [9356d53]
-- Added [device-if](doc/extensions/proposed/sycl_ext_oneapi_device_if.asciidoc) extension
+- Added [device-if](doc/extensions/proposed/sycl_ext_oneapi_if_device_has.asciidoc) extension
[4fb95fc]
- Added a [programming guide](doc/MultiTileCardWithLevelZero.md) for
multi-tile and multi-card under Level Zero backend [d581178a]
10 changes: 5 additions & 5 deletions sycl/doc/design/DeviceAspectTraitDesign.md
@@ -10,14 +10,14 @@ corresponding compilation environment can guarantee that any and all the
supported devices support the `aspect`.

The design of these traits is inspired by the implementation of the
-[sycl\_ext\_oneapi\_device\_if][2] and
+[sycl\_ext\_oneapi\_if\_device\_has][2] and
[sycl\_ext\_oneapi\_device\_architecture][3] extensions as described in
-[DeviceIf.md][4]. Additionally, it leverages part of the design for optional
+[IfDeviceHas.md][4]. Additionally, it leverages part of the design for optional
kernel features, as described in [OptionalDeviceFeatures.md][5].

## Changes to the compiler driver

-Using the `-fsycl-targets` options introduced in [DeviceIf.md][4] and the
+Using the `-fsycl-targets` options introduced in [IfDeviceHas.md][4] and the
configuration file introduced in [OptionalDeviceFeatures.md][5], the compiler
driver finds the set of all aspects supported by each specified target. Note
that in this section we refer to aspects as their integral representation as
@@ -124,7 +124,7 @@ This relies on the fact that unspecialized variants of `any_device_has` and
`all_devices_have` are undefined.

[1]: <https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:device-aspects>
-[2]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
+[2]: <../extensions/proposed/sycl_ext_oneapi_if_device_has.asciidoc>
[3]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>
-[4]: <DeviceIf.md>
+[4]: <IfDeviceHas.md>
[5]: <OptionalDeviceFeatures.md>
2 changes: 1 addition & 1 deletion sycl/doc/design/ESIMDDesignNotes.md
@@ -242,7 +242,7 @@ This section lists current major ESIMD gaps/TODOs.
[design](https://github.com/intel/llvm/blob/sycl/sycl/doc/design/OptionalDeviceFeatures.md).
This might require splitting implementations into per-architecture variants.
`if_device_has`
-[feature](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_device_if.asciidoc)
+[feature](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/proposed/sycl_ext_oneapi_if_device_has.asciidoc)
may help avoid duplication of common parts and dispatch to
architecture-dependent code at fine-grained level from within a function.
1. As VC BE moves away from `genx.*` intrinsics replacing them with `__spirv_*`
179 changes: 179 additions & 0 deletions sycl/doc/design/IfDevice.md
@@ -0,0 +1,179 @@
# Implementation design for "if\_device"

This document describes the design for the DPC++ implementation of the
[sycl\_ext\_oneapi\_if\_device][1] extension.

[1]: <../extensions/proposed/sycl_ext_oneapi_if_device.asciidoc>


## Phased implementation

Although the main motivation for the "if\_device" extension is to enable a
1-pass compiler, it can still be implemented in our existing multi-pass
compiler. This is useful because it allows us to gain experience using this
extension even before we implement the 1-pass compiler.

This document, therefore, describes two implementations. The first is a
trivial implementation that works in the current multi-pass compiler. The
other is the design that we will ultimately use in the 1-pass compiler.


## Multi-pass compiler implementation

This implementation requires changes only to the device headers. The
implementation is trivial: it relies on the existing `__SYCL_DEVICE_ONLY__`
macro, which is defined in the device compiler passes but not in the host
compiler pass.

```
namespace sycl::ext::oneapi::experimental {
namespace detail {

// Helper object used to implement "otherwise". The "MakeCall" template
// parameter tells whether the previous call to "if_device" or "if_host" called
// its "fn". When "MakeCall" is true, the previous call to "fn" did not
// happen, so the "otherwise" should call "fn".
template<bool MakeCall>
class if_device_or_host_helper {
 public:
  template<typename T>
  void otherwise(T fn) {
    if constexpr (MakeCall) {
      fn();
    }
  }
};

} // namespace detail

template<typename T>
static auto if_device(T fn) {
#ifdef __SYCL_DEVICE_ONLY__
  fn();
  return detail::if_device_or_host_helper<false>{};
#else
  return detail::if_device_or_host_helper<true>{};
#endif
}

template<typename T>
static auto if_host(T fn) {
#ifdef __SYCL_DEVICE_ONLY__
  return detail::if_device_or_host_helper<true>{};
#else
  fn();
  return detail::if_device_or_host_helper<false>{};
#endif
}

} // namespace sycl::ext::oneapi::experimental
```
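To illustrate how the chained `otherwise` behaves, here is a minimal host-only sketch. It repeats the definitions above inside a hypothetical `demo` namespace so that it compiles without the SYCL headers; the `demo` namespace and the `which_branch` helper are assumptions made for illustration and are not part of the extension. An ordinary host compiler does not define `__SYCL_DEVICE_ONLY__`, so `if_device` skips its callable and the `otherwise` callable runs instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for sycl::ext::oneapi::experimental so that this
// sketch compiles without the SYCL headers.
namespace demo {
namespace detail {

// Calls "fn" from "otherwise" only when the preceding call did not.
template <bool MakeCall>
class if_device_or_host_helper {
 public:
  template <typename T>
  void otherwise(T fn) {
    if constexpr (MakeCall) {
      fn();
    }
  }
};

} // namespace detail

template <typename T>
auto if_device(T fn) {
#ifdef __SYCL_DEVICE_ONLY__
  fn();
  return detail::if_device_or_host_helper<false>{};
#else
  (void)fn; // not called in the host pass
  return detail::if_device_or_host_helper<true>{};
#endif
}

template <typename T>
auto if_host(T fn) {
#ifdef __SYCL_DEVICE_ONLY__
  (void)fn; // not called in the device passes
  return detail::if_device_or_host_helper<true>{};
#else
  fn();
  return detail::if_device_or_host_helper<false>{};
#endif
}

} // namespace demo

// Returns the name of the branch that actually ran.
std::string which_branch() {
  std::string taken;
  demo::if_device([&] { taken = "device"; })
      .otherwise([&] { taken = "host"; });
  assert(!taken.empty()); // one of the two branches must have run
  return taken;
}
```

Because the helper's `MakeCall` parameter is fixed at compile time, the branch that is not taken is discarded by `if constexpr` rather than tested at run time.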


## Single-pass compiler implementation

This implementation requires changes to the device headers, some changes to
the error diagnostics in the front-end (CFE), and a new IR pass.

### Device headers

The device headers translate the API into calls to two functions that are
decorated with attributes named "sycl-call-if-on-device" and
"sycl-call-if-on-host".

```
namespace sycl::ext::oneapi::experimental {
namespace detail {

// Call the callable object "fn" only when this code runs on a device.
//
// IR passes recognize this function from the "sycl-call-if-on-device"
// attribute.
template<typename T>
[[clang::noinline]]
[[__sycl_detail__::add_ir_attributes_function("sycl-call-if-on-device", true)]]
void call_if_on_device(T fn) {
  fn();
}

// Call the callable object "fn" only when this code runs on the host.
//
// IR passes recognize this function from the "sycl-call-if-on-host" attribute.
template<typename T>
[[clang::noinline]]
[[__sycl_detail__::add_ir_attributes_function("sycl-call-if-on-host", true)]]
void call_if_on_host(T fn) {
  fn();
}

class call_if_on_device_helper {
 public:
  template<typename T>
  void otherwise(T fn) {
    call_if_on_device(fn);
  }
};

class call_if_on_host_helper {
 public:
  template<typename T>
  void otherwise(T fn) {
    call_if_on_host(fn);
  }
};

} // namespace detail

template<typename T>
static auto if_device(T fn) {
  detail::call_if_on_device(fn);
  return detail::call_if_on_host_helper{};
}

template<typename T>
static auto if_host(T fn) {
  detail::call_if_on_host(fn);
  return detail::call_if_on_device_helper{};
}

} // namespace sycl::ext::oneapi::experimental
```

Review thread on the line `detail::call_if_on_device(fn);` in `if_device`:

Contributor Author: @rolandschulz I think I do not need to use std::forward here, correct?

Contributor: Without the std::forward, if an r-value-ref gets passed to if_device, then call_if_on_device gets passed an l-value-ref. If the callable only works with an r-value-ref, this breaks. Always use forward for any universal reference (if you don't use it anymore afterwards).

Contributor Author: Thanks. Added in f3e15a7.

Note the use of `[[clang::noinline]]`. It is important that the bodies of
these functions are not inlined until after the IR pass described below.
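The reviewer's point about `std::forward` in the thread on `if_device` can be illustrated with a small standalone sketch. It assumes the functions take a forwarding reference (`T &&fn`), which is not what the code in this diff shows (it takes `T fn` by value); the names `rvalue_only`, `call_inner`, `call_outer`, and `run_demo` are hypothetical and do not come from the headers.

```cpp
#include <cassert>
#include <utility>

// Hypothetical callable that may only be invoked as an rvalue.
struct rvalue_only {
  int *counter;
  void operator()() && { ++*counter; }
};

// Hypothetical inner function: invokes "fn" with its original value category.
template <typename T>
void call_inner(T &&fn) {
  std::forward<T>(fn)();
}

// Hypothetical outer function: forwards "fn" to the inner function.
template <typename T>
void call_outer(T &&fn) {
  call_inner(std::forward<T>(fn));
  // Writing call_inner(fn) instead would pass an lvalue, because a named
  // parameter is always an lvalue. The call inside call_inner would then
  // fail to compile for rvalue_only.
}

// Passes a temporary through both layers and reports how often it ran.
int run_demo() {
  int calls = 0;
  call_outer(rvalue_only{&calls});
  assert(calls == 1);
  return calls;
}
```

Forwarding is only safe when the parameter is not used again afterwards, since the callee may move from it.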

### Changes to the front-end (CFE)

The CFE currently diagnoses some errors that are specific to device code. To do
this, the CFE must first traverse the static call tree to determine which
functions are called from kernels. This pass of the CFE must recognize the
functions marked with the attribute "sycl-call-if-on-host" and skip the bodies
of these functions when building the static call tree of the kernels. As a
result, the CFE will not emit any diagnostics that are specific to device code
for the callable object that is passed to these functions.

In a 1-pass compiler, we expect that the CFE will emit a single stream of
LLVM IR for both host and device. This IR retains any calls to the functions
marked with "sycl-call-if-on-host" or "sycl-call-if-on-device" and retains the
full bodies of those functions. The filtering described above is used only to
determine the functions that are checked for device-specific errors.
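The filtering described in this section can be sketched on a toy call graph. This is only a model of the idea under stated assumptions: `FunctionInfo`, `CallGraph`, `device_checked_functions`, and `demo_call_graph` are hypothetical names, and nothing here corresponds to actual Clang data structures or APIs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model of one function in the static call graph. This is not a Clang
// data structure; every name here is hypothetical.
struct FunctionInfo {
  std::vector<std::string> callees;
  bool call_if_on_host = false; // carries the "sycl-call-if-on-host" attribute
};

using CallGraph = std::map<std::string, FunctionInfo>;

// Returns the functions whose bodies would be checked for device-specific
// errors: everything reachable from "kernel", except that the bodies of
// functions marked "sycl-call-if-on-host" are not entered.
std::set<std::string> device_checked_functions(const CallGraph &graph,
                                               const std::string &kernel) {
  std::set<std::string> checked;
  std::vector<std::string> worklist{kernel};
  while (!worklist.empty()) {
    const std::string name = worklist.back();
    worklist.pop_back();
    const auto it = graph.find(name);
    if (it == graph.end() || it->second.call_if_on_host)
      continue; // unknown function, or host-only body: skip it
    if (!checked.insert(name).second)
      continue; // already visited
    for (const std::string &callee : it->second.callees)
      worklist.push_back(callee);
  }
  return checked;
}

// Usage: a kernel that reaches both helper functions.
void demo_call_graph() {
  CallGraph graph;
  graph["kernel"].callees = {"call_if_on_device", "call_if_on_host"};
  graph["call_if_on_device"].callees = {"device_lambda"};
  graph["call_if_on_host"].callees = {"host_lambda"};
  graph["call_if_on_host"].call_if_on_host = true;
  graph["device_lambda"];
  graph["host_lambda"];

  const std::set<std::string> checked =
      device_checked_functions(graph, "kernel");
  assert(checked.count("device_lambda") == 1); // still diagnosed as device code
  assert(checked.count("host_lambda") == 0);   // exempt from device diagnostics
}
```

In this model the host-only callable is never visited, so it is never subject to device-specific checks, while the call to the marked function itself remains in the kernel.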

### New IR pass

The 1-pass compiler will eventually split the LLVM IR into two parts: one that
contains the device code and one that contains the host code. We expect that
this pass will traverse the static call tree of the kernels to identify device
code. This pass also recognizes the functions marked with
"sycl-call-if-on-host" and "sycl-call-if-on-device". When generating the IR
for the device code, the bodies of functions marked "sycl-call-if-on-host" are
deleted, leaving empty functions. When generating the IR for the host code,
the bodies of functions marked "sycl-call-if-on-device" are deleted.

Alternatively, the IR pass could use metadata from the CFE to identify host vs.
device code, rather than repeating the static call tree traversal here. These
details will be resolved later as part of the 1-pass compiler design.

Up until this point, it was important to prevent inlining of the functions
marked "sycl-call-if-on-host" and "sycl-call-if-on-device". Once the IR is
split, inlining is permitted, so this IR pass also removes the LLVM IR
`noinline` attributes from these functions.
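As a rough illustration of the behavior described for the IR pass, the following toy model deletes the body of the function that does not apply to the target being generated and then clears the `noinline` flag. It is a sketch only: `ToyFunction`, `Target`, `split_for_target`, `example_module`, and `demo_device_split` are hypothetical names, function bodies are modeled as plain strings, and no LLVM API is used.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Target { Host, Device };

// Toy stand-in for an IR function (not an LLVM class).
struct ToyFunction {
  std::string name;
  std::string attribute; // "sycl-call-if-on-host", "sycl-call-if-on-device", or ""
  std::string body;      // body modeled as text; empty means an empty function
  bool noinline = false;
};

// Produces the copy of "module" for one target. Bodies of marked functions
// that do not apply to the target are deleted, and "noinline" is removed from
// all marked functions because inlining is permitted after the split.
std::vector<ToyFunction> split_for_target(std::vector<ToyFunction> module,
                                          Target target) {
  const std::string dropped = target == Target::Device
                                  ? "sycl-call-if-on-host"
                                  : "sycl-call-if-on-device";
  for (ToyFunction &fn : module) {
    const bool marked = fn.attribute == "sycl-call-if-on-host" ||
                        fn.attribute == "sycl-call-if-on-device";
    if (!marked)
      continue;
    if (fn.attribute == dropped)
      fn.body.clear(); // leave an empty function
    fn.noinline = false;
  }
  return module;
}

// A small module containing both marked functions and one unmarked function.
std::vector<ToyFunction> example_module() {
  return {
      {"call_if_on_device", "sycl-call-if-on-device", "fn();", true},
      {"call_if_on_host", "sycl-call-if-on-host", "fn();", true},
      {"helper", "", "return 42;", false},
  };
}

// Usage: split the example module for the device and check the result.
void demo_device_split() {
  const auto device = split_for_target(example_module(), Target::Device);
  assert(device[0].body == "fn();"); // device-only body kept
  assert(device[1].body.empty());    // host-only body deleted
  assert(!device[0].noinline && !device[1].noinline);
}
```

Unmarked functions pass through unchanged in this model; how host and device code are otherwise separated is left to the 1-pass compiler design.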
@@ -1,11 +1,11 @@
-# Implementation design for "device\_if" and "device\_architecture"
+# Implementation design for "if\_device\_has" and "device\_architecture"

This document describes the design for the DPC++ implementation of the
-[sycl\_ext\_oneapi\_device\_if][1] and
+[sycl\_ext\_oneapi\_if\_device\_has][1] and
[sycl\_ext\_oneapi\_device\_architecture][2] extensions.

-[1]: <../extensions/proposed/sycl_ext_oneapi_device_if.asciidoc>
-[2]: <../extensions/proposed/sycl_ext_oneapi_device_architecture.asciidoc>
+[1]: <../extensions/proposed/sycl_ext_oneapi_if_device_has.asciidoc>
+[2]: <../extensions/experimental/sycl_ext_oneapi_device_architecture.asciidoc>


## Phased implementation
@@ -61,7 +61,7 @@ limitations.

This extension provides a way for device code to query the device architecture
on which it is running. This is similar to the
-link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
+link:../proposed/sycl_ext_oneapi_if_device_has.asciidoc[sycl_ext_oneapi_if_device_has]
extension except the comparison is for the device's architecture not the
device's aspects. In some cases, low-level application code can use special
features or do specific optimizations depending on the device architecture, and
@@ -492,7 +492,7 @@ template<architecture ...Archs, typename T>
```

This function operates exactly like `if_device_has` from the
-link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
+link:../proposed/sycl_ext_oneapi_if_device_has.asciidoc[sycl_ext_oneapi_if_device_has]
extension except that the condition gating execution of the callable function
`fn` is determined by the `Archs` parameter pack. This condition is `true` if
the device which executes `if_architecture_is` matches **any** of the
@@ -513,7 +513,7 @@ class /* unspecified */ {
```

The `otherwise` function behaves exactly like the `otherwise` function from the
-link:../proposed/sycl_ext_oneapi_device_if.asciidoc[sycl_ext_oneapi_device_if]
+link:../proposed/sycl_ext_oneapi_if_device_has.asciidoc[sycl_ext_oneapi_if_device_has]
extension. The `else_if_architecture_is` function behaves exactly like
`else_if_device_has` from that extension except that the condition gating
execution of the callable object `fn` is determined by the `Archs` parameter