-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Docs] Add SYCLBIN feature and format design document #16872
base: sycl
Are you sure you want to change the base?
Conversation
This commit adds a design document detailing the SYCLBIN binary format for representing SYCL device kernel binaries to be loaded dynamically at runtime. Additionally, the design document details how this is to be handled by the SYCL runtime, driver and clang tooling. Signed-off-by: Larsen, Steffen <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SYCL design documentation predominantly uses Markdown format. Please, convert the document to Markdown format.
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
Apologies! Old habits die hard. It should be good now. |
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
This design document details the SYCLBIN binary format used for storing SYCL | ||
device binaries to be loaded dynamically by the SYCL runtime. It also details | ||
how the toolchain produces, links and packages these binaries, as well as how | ||
the SYCL runtime library handles files of this format. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a section describing motivation and anticipated use-cases. This will greatly help with review of the remaining document, because we can't say if proposed solution is good enough without a clear idea of why we need it.
sycl/doc/design/SYCLBINDesign.md
Outdated
|
||
#### Abstract module | ||
|
||
Each abstract module represents a set of kernels, the corresponding metadata, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reference to some SYCL spec term would be appreciated here. Do I understand correctly that abstract module is an equivalent of a kernel_bundle
and therefore a single SYCLBIN
file is a collection of kernel bundles of the same state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this description should be expanded. One key part of the definition is that all the IR modules and native device code images inside an abstract module represent the same set of kernels. That's why the metadata describing the kernels is at the abstract module level, effectively shared by all IR modules and native device code images.
The idea is that a single abstract module might have IR modules in SPIR-V and PTX format (or some subset of these formats). The native device code images might include Intel native GPU format, SASS (for CUDA), native x86 code (for the native CPU backend), etc.
In addition, a single SYCLBIN might have several abstract modules. For example, this could occur if the application creates a kernel bundle with two kernels, and those kernels happened to be bundled into separate modules.
Answering @AlexeySachkov: yes, all abstract modules in a SYCLBIN have the same state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to flesh out the description a little more. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it!
| ---------- | ------------------------------------------------------------------ | -------------- | | ||
| `uint32_t` | Magic number. (0x53594249) | | | ||
| `uint32_t` | SYCLBIN version number. | | | ||
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: sycl::bundle_state
type is unspecified and list of its values can be modified by extensions (like ext_oneapi_source
that we have).
Is this field expected to precisely match definition of sycl::bundle_state
in our SYCL RT implementation, or is it simply an equivalent which is close but is not required to match precisely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are defined a couple lines down. I think having it explicitly defined here is better, as to make it less interdependent on the SYCL definitions, which as you note may change. For the currently-defined values, they correspond 1:1 with how our headers currently define these, but if they change we would need a remapping, which should luckily be cheap.
sycl/doc/design/SYCLBINDesign.md
Outdated
| `uint32_t` | Byte size of the list of kernel names. | `K` | | ||
| `K` | List of kernel names. (String list) | | | ||
| `uint32_t` | Byte size of the list of imported symbols. | `I` | | ||
| `I` | List of imported symbols. (String list) | | | ||
| `uint32_t` | Byte size of the list of exported symbols. | `E` | | ||
| `E` | List of exported symbols. (String list) | | | ||
| `uint32_t` | Byte size of property set data. | `P` | | ||
| `P` | Property set data. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said in #16873, I think that we should merge kernel names, imported and exported symbols into properties to have a single general-purpose metadata container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, if SYCLBIN becomes the default format, I would prefer we expand the property set into metadata fields, but for now I think I agree that there's no real point in "stealing" these values from the property sets.
|
||
|
||
*NOTE:* Optional features used is embedded in the property set data. | ||
*TODO:* Consolidate and/or document the property set data in this document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need that here? I would rather say that it is the same as what we have here or there.
If we want to document complete set of properties that we have and their format - I'm all for it, but I think that it should be a separate document, because that part is not unique to SYCLBIN.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point here is that if SYCLBIN becomes the way we represent SYCL binaries in the future, there may not be a reason to have property sets as a separate structure. As such, we could merge it into the SYCLBIN structure, keeping the documentation of these properties (which would then simply be module metadata) in the same location as the rest of this information.
*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA | ||
we may want to embed the SM version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are talking about a specific CUDA property, is it still IR module? I suppose that I don't understand well enough what is PTX and what is its place in a toolchain.
Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are talking about a specific CUDA property, is it still IR module?
Because of the forward-compatibility of SM archs, my understanding is that we want PTX to be considered an IR type.
Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules?
If we were to put the SM architecture information at abstract module level, I don't see how an abstract module would ever have more than one IR module and more than one native device code image. Granted, having the exact same properties is somewhat rare, but I would expect it to be the case if the user was to compile for multiple SM versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this.
module AOT compiled for a specific device, identified by the architecture | ||
string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we need to specify what is the architecture string here.
Is it target triple? Is it value passed to -fsycl-targets
? Is it value from architecture
enum from our device architecture extension?
It is not clear how RT can use this field without such specifiation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is actually a good question. I am not sure what the architecture string would be for cases like SASS binaries. For example, lets say we've compiled to PTX through our compiler, then load that to a kernel-bundle, compile that kernel bundle to native device code and then serialize that to SYCLBIN. The -fsycl-targets
would not be enough to express the architecture here, I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets
values.
I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.
In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the
-fsycl-targets
values.
I will have to do some research here. I know PTX can be associated with SM architectures, but I don't know if the same applies to the native device code produced from PTX. It may be device-specific and as such more strict than the SM version.
I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.
Since the compiler will need to know about these architectures too, I am reluctant to try and match enum values between the runtime and library for this purpose.
sycl/doc/design/SYCLBINDesign.md
Outdated
<table> | ||
<tr> | ||
<th>Option</th> | ||
<th>Description</th> | ||
</tr> | ||
<tr> | ||
<td>`-fsyclbin`</td> | ||
<td> | ||
If this option is set, the output of the invocation is a SYCLBIN file with the | ||
.syclbin file extension. This skips the host-compilation invocation of the typical | ||
`-fsycl` pipeline, instead passing the output of the clang-offloat-packager | ||
invocation to clang-linker-wrapper together with the new `--syclbin` flag. | ||
|
||
Setting this option will override `-fsycl` and `-fsycl-device-only`. | ||
|
||
This option currently requires `--offload-new-driver` to be set. | ||
</td> | ||
</tr> | ||
<tr> | ||
<td>`--offload-ir`</td> | ||
<td>*TODO*</td> | ||
</tr> | ||
<tr> | ||
<td>`--offload-rdc`</td> | ||
<td>This is an alias of `-fgpu-rdc`.</td> | ||
</tr> | ||
</table> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: GitHub has a Markdown extension to render tables without using HTML.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That extension doesn't seem to help with multi-line table cells, sadly.
directly, instead of extracting it from a host binary. This should be done when | ||
a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, .syclbin
files cannot be used if the output is not .syclbin
, right?
I'm not sure if I have a use case for that, just wanted to double-check the intent.
A potential use-case, though, is ability to embed .syclbin
into an application as if that device code was originally compiled as part of the application. I.e. you had your dynamically loadable .syclbin
, but at some point decided to embed it and stop shipping it separately. But that will have some implications on the API, I assume: we need to design then how to use such embedded SYCLBIN.
If this option is set, the output of the invocation is a SYCLBIN file with the | ||
.syclbin file extension. This skips the host-compilation invocation of the typical | ||
`-fsycl` pipeline, instead passing the output of the clang-offloat-packager | ||
invocation to clang-linker-wrapper together with the new `--syclbin` flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it expected that the clang-linker-wrapper
will simply be called with the --syclbin
option and the incoming files created from the clang-offload-packager
(something like clang-linker-wrapper --syclbin file-from-packager.out -o output-file-name.syclbin
)? Typical invocation of clang-linker-wrapper
contains a full link command line as well.
clang-linker-wrapper
will also be considered the 'final step' as well, with clang-linker-wrapper
creating the .syclbin
file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The device code modules should still be linked with the device libraries, so most of the flags should be the same. The ones we can exclude are the ones passed to the host compiler, including specifying the linker. The currently proposed initial implementation of this in the linker wrapper doesn't care if the options are there, but we may as well exclude them.
| ---------- | ------------------------------------------------------------------ | -------------- | | ||
| `uint32_t` | Magic number. (0x53594249) | | | ||
| `uint32_t` | SYCLBIN version number. | | | ||
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uint8_t
will throw off the alignment of binary, making it more difficult to parse. Let's make it a uint32_t
like the others. Or maybe just make every field uint64_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is correct that if you wanted to make alignment-based accessing of this binary, we would have to force-align to 64 bits. However, there is nothing in the current design requiring such alignment. We can do that, but even then we can do the alignment by padding. For layouts like strings we would need this anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is supposed to describe a file format, we should look ahead and specify the alignment/padding now.. otherwise it'll be open to interpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. An option here is to simply say it is byte-packed, but that does mean we cannot do in-memory interpretation of the elements.
Note that this is a design document and not and extension, so an argument could be made that the alignment is an implementation detail. That is not to say I'm against documenting it, but we are not specifying a format for use outside the implementation. Just to make sure we're on the same page about what this is specifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An option here is to simply say it is byte-packed, but that does mean we cannot do in-memory interpretation of the elements.
Agreed 100%. But not necessarily a great option for the guy who has to write an optimized version that can pull an arbitrary module out of a SYCLBIN file that's storing 10,000. ;-) Or the gal juggling a giant in-memory rep that needs to be written quickly to file. We can go a long way for helping both those out by avoiding uint8_t
, or at least putting them at the end of the struct, rather than earlier , or by giving a little thought to padding/alignment .
Just to make sure we're on the same page about what this is specifying.
I'm tots fine with that. Maybe this isn't the time or place, and I know it's not the highest priority at this stage. But, realistically, are we really going to make ANOTHER specification doc for that exact file format?
| ---------- | ------------------------------------------------------------------ | -------------- | | ||
| `uint32_t` | Magic number. (0x53594249) | | | ||
| `uint32_t` | SYCLBIN version number. | | | ||
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you add global metadata, then this "state" could be one of the metadata items, rather than a hard-coded "state" field in the header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not against separating it into a new global metadata structure, but given that it would always be there I'm not sure I follow how that would effectively be any different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It has been moved to a separate section, but the question about how much difference it makes still stands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that all the metadata would be an extensible format, much like our property sets. In fact, I think we could use exactly the same representation as property sets, unifying the two concepts.
The way you have it now, the metadata is just a hard-coded struct. If we ever add new metadata, we will need to bump the file version, and readers will need to maintain code to read all old versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we confident that unknown global metadata will be safe for an implementation to not know? Seems dangerous to even allow that.
If we do, this could also be achieved by adding a size of the metadata, but I'm still not confident about allowing the consumers to be ignorant to new fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed this offline with @gmlueck - Most relevant part of the discussion was that I will investigate the possibility of making the PropertySet structure usable as the structure for all metadata.
sycl/doc/design/SYCLBINDesign.md
Outdated
|
||
#### Abstract module | ||
|
||
Each abstract module represents a set of kernels, the corresponding metadata, 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think this description should be expanded. One key part of the definition is that all the IR modules and native device code images inside an abstract module represent the same set of kernels. That's why the metadata describing the kernels is at the abstract module level, effectively shared by all IR modules and native device code images.
The idea is that a single abstract module might have IR modules in SPIR-V and PTX format (or some subset of these formats). The native device code images might include Intel native GPU format, SASS (for CUDA), native x86 code (for the native CPU backend), etc.
In addition, a single SYCLBIN might have several abstract modules. For example, this could occur if the application creates a kernel bundle with two kernels, and those kernels happened to be bundled into separate modules.
Answering @AlexeySachkov: yes, all abstract modules in a SYCLBIN have the same state.
|
||
![SYCLBIN binary file format illustration](SYCLBIN_file_format_illustration.svg) | ||
|
||
### Header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest a file format that is grouped by the type of each data structure. This is similar to how ELF works. For example:
[file header]
[abstract module header # 0]
...
[abstract module header # N]
[ir module header # 0]
...
[ir module header # N]
[device code header # 0]
...
[device code header # N]
[metadata entry # 0]
...
[metadata entry # N]
[byte table, contains strings, ir module content, device code content]
The headers then reference things they contain by their index and a count. For example:
struct file_header {
uint32_t magic;
uint32_t version;
uint32_t abstract_module_count; // Total number of abstract module headers in SYCLBIN
uint32_t ir_module_count; // Total number of IR module headers in SYCLBIN
uint32_t device_image_count; // Total number of device code image headers in SYCLBIN
uint32_t metadata_count; // Total number of metadata records in SYCLBIN
uint32_t byte_table_size; // Size (bytes) of byte table
uint32_t first_metadata; // Index of first global metadata record
uint32_t metadata_count; // Number of global metadata records
};
struct abstract_module_header {
uint32_t first_metadata; // Index of first metadata record for this abstract module
uint32_t metdata_count; // Number of metadata records for this abstract module
uint32_t first_ir_module; // Index of first IR module header for this abstract module
uint32_t ir_module_count; // Number of IR module headers for this abstract module
uint32_t first_device_code; // Index of first device code header for this abstract module
uint32_t device_code_count; // Number of device code headers for this abstract module
};
(Consider whether some of these should be 64-bits instead of 32, especially byte_table_size
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I am always reluctant to doubt ELF, I don't see how that structure would be beneficial to our use-case. The currently proposed design makes the relation of metadata and modules/code a lot clearer, in my opinion. My reasoning for this is that there is a 1:1 relation between IR module metadata and their collection of raw bytes (same goes for native device code images.) That means, bundling them together means we don't need to duplicate information like number of IR modules and number of IR module metadata entries. Likewise, it means that any parser does not need to make jumps when parsing a specific IR module.
The latter can of course be negative if we wanted to do selective module extraction, i.e. if we wanted to load only some modules in a SYCLBIN, but I don't necessarily see that being the case and the difference would be minimal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that in some time in the future we'll need selective module extraction. Probably because someone has a .syclbin with 10,000 modules and we need to optimize. This is also why I think we need to be forward looking when it comes to alignment and padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that in some time in the future we'll need selective module extraction. Probably because someone has a .syclbin with 10,000 modules and we need to optimize.
Even then, do we think there is much more benefit to making the entries leaps shorter? The number of indirections should be the same, but the structure would less associated. Granted, the indirections are closer together, meaning the cache might play ball a little better when doing selective reading, I still don't see a larger benefit to breaking the layered concept proposed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @cperkinsintel. Using an ELF-like structure would make it more efficient to read just selected IR modules / device images from the SYCLBIN. This is a likely scenario also. Imagine you are creating a kernel_bundle
from a SYCLBIN that contains device code images for Intel devices, CUDA devices, etc. The kernel_bundle
will be associated with some specific set of devices that are all from the same backend (because all of the devices must be contained in the kernel bundle's context). Therefore, we really only need to read the IR modules / device images that are compatible with those devices.
The structure you propose now requires you to walk through the whole contents of the SYCLBIN file to figure out what IR modules / device images it contains. Looking just at the IR Module representation as an example:
Type | Description | Value variable |
---|---|---|
uint8_t |
IR type. | |
uint32_t |
Byte size of the raw IR bytes. | IB |
uint8_t[IB] |
Raw IR bytes. |
To find all the IR modules of a certain type, I need a loop that reads 40 bytes of "IR type" and "byte size", then seek forward B
bytes in the file, read another 40 bytes, etc. This is very inefficient because the data is spread out across many small blocks of 40 bytes. It's much more efficient if you can read all the information you need in one contiguous block of memory.
The efficiency isn't so much at the cache level (though it exists there too). It's at the memory page level and the disk block level. It's better if all the data you need can be retrieved by doing a small number of reads from the file. Alternatively, you may mmap the file into memory. In that case, it's more efficient if all the data you need resides in a small number of memory pages.
An ELF-like structure solves this by locating all the "table of contents" information at the beginning of the file. You can read a small amount of information from the start of the file, and then you know the type and location of all the IR modules / device images. You can then seek forward in the file and read just those modules / images that you care about. Alternatively, if you mmap the file, you can compute a pointer to the modules / images that you care about, and then the O/S will page in only those memory pages that contain those modules / images.
BTW, the mmap scenario is one reason why it's advantageous to naturally align fields in the file format. If a uint32_t
is always 4-byte aligned in the file, you can mmap the file and then read the uint32_t
with an aligned load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my POV, using ELF saves maintenance costs on tools/libraries to manipulate custom format. For instance, #16873 adds new tool to examine the binary file. Using ELF format enables uses of existing tools like readelf and objdump to do the same task.
Potentially, we can re-use the code OpenMP offload team develops for Intel GPU support. FYI: llvm/llvm-project@f7b3559
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that my proposed format above is not ELF. It's "inspired by ELF". I'm not sure how a SYCLBIN format would look if it really used ELF. Would it be an ELF file with no host code (no .text section, etc.)?
Signed-off-by: Larsen, Steffen <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
| ---------- | ------------------------------------------------------------------ | -------------- | | ||
| `uint32_t` | Magic number. (0x53594249) | | | ||
| `uint32_t` | SYCLBIN version number. | | | ||
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that all the metadata would be an extensible format, much like our property sets. In fact, I think we could use exactly the same representation as property sets, unifying the two concepts.
The way you have it now, the metadata is just a hard-coded struct. If we ever add new metadata, we will need to bump the file version, and readers will need to maintain code to read all old versions.
sycl/doc/design/SYCLBINDesign.md
Outdated
| Type | Description | Value variable | | ||
| ---------- | ------------------------------------------ | -------------- | | ||
| `uint64_t` | Byte size of the list of abstract modules. | `B` | | ||
| `B` | List of abstract modules. | | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what B
represents here. In the first row, I assumed B
represents the number of bytes occupied by the list of abstract modules. This is an integer value. If that is the case, I don't understand the second row. How can an integer value be a type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, "Type" was just a size-specification, but I seem to have forgotten to change them to a proper type. I believe it should be something like uint8_t[B]
instead.
*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA | ||
we may want to embed the SM version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this.
module AOT compiled for a specific device, identified by the architecture | ||
string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the -fsycl-targets
values.
I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture.
In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format.
Co-authored-by: Michael Toguchi <[email protected]>
Co-authored-by: Greg Lueck <[email protected]>
Signed-off-by: Larsen, Steffen <[email protected]>
This commit adds a design document detailing the SYCLBIN binary format for representing SYCL device kernel binaries to be loaded dynamically at runtime. Additionally, the design document details how this is to be handled by the SYCL runtime, driver and clang tooling.