-
Notifications
You must be signed in to change notification settings - Fork 752
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Docs] Add SYCLBIN feature and format design document #16872
base: sycl
Are you sure you want to change the base?
Changes from 7 commits
17576dd
60ff95f
b54afa8
12a6cad
0aad200
1277bd9
1d2b5c8
73b9c07
c7c1512
edca48e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,234 @@ | ||||||||||||||
# SYCLBIN - A format for separately compiled SYCL device code | ||||||||||||||
|
||||||||||||||
This design document details the SYCLBIN binary format used for storing SYCL | ||||||||||||||
device binaries to be loaded dynamically by the SYCL runtime. It also details | ||||||||||||||
how the toolchain produces, links and packages these binaries, as well as how | ||||||||||||||
the SYCL runtime library handles files of this format. | ||||||||||||||
|
||||||||||||||
## SYCLBIN binary format | ||||||||||||||
|
||||||||||||||
The files produced by the new compilation path will follow the format described | ||||||||||||||
in this section. The intention of defining a new format for these is to give | ||||||||||||||
the DPC++ implementation an extendable and lightweight wrapper around the | ||||||||||||||
multiple modules and corresponding metadata captured in the SYCLBIN file. | ||||||||||||||
The content of the SYCLBIN may be contained as an entry in the offloading binary | ||||||||||||||
format produced by the clang-offload-packager, as described in | ||||||||||||||
[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst). | ||||||||||||||
|
||||||||||||||
The following illustration gives an overview of how the file format is | ||||||||||||||
structured. | ||||||||||||||
|
||||||||||||||
![SYCLBIN binary file format illustration](SYCLBIN_file_format_illustration.svg) | ||||||||||||||
gmlueck marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
|
||||||||||||||
### Header | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd suggest a file format that is grouped by the type of each data structure. This is similar to how ELF works. For example:
The headers then reference things they contain by their index and a count. For example:
(Consider whether some of these should be 64-bits instead of 32, especially There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Though I am always reluctant to doubt ELF, I don't see how that structure would be beneficial to our use-case. The currently proposed design makes the relation of metadata and modules/code a lot clearer, in my opinion. My reasoning for this is that there is a 1:1 relation between IR module metadata and their collection of raw bytes (same goes for native device code images.) That means, bundling them together means we don't need to duplicate information like number of IR modules and number of IR module metadata entries. Likewise, it means that any parser does not need to make jumps when parsing a specific IR module. The latter can of course be negative if we wanted to do selective module extraction, i.e. if we wanted to load only some modules in a SYCLBIN, but I don't necessarily see that being the case and the difference would be minimal. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My guess is that in some time in the future we'll need selective module extraction. Probably because someone has a .syclbin with 10,000 modules and we need to optimize. This is also why I think we need to be forward looking when it comes to alignment and padding. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Even then, do we think there is much more benefit to making the entries leaps shorter? The number of indirections should be the same, but the structure would less associated. Granted, the indirections are closer together, meaning the cache might play ball a little better when doing selective reading, I still don't see a larger benefit to breaking the layered concept proposed here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @cperkinsintel. Using an ELF-like structure would make it more efficient to read just selected IR modules / device images from the SYCLBIN. This is a likely scenario also. Imagine you are creating a The structure you propose now requires you to walk through the whole contents of the SYCLBIN file to figure out what IR modules / device images it contains. Looking just at the IR Module representation as an example:
To find all the IR modules of a certain type, I need a loop that reads 40 bytes of "IR type" and "byte size", then seek forward The efficiency isn't so much at the cache level (though it exists there too). It's at the memory page level and the disk block level. It's better if all the data you need can be retrieved by doing a small number of reads from the file. Alternatively, you may mmap the file into memory. In that case, it's more efficient if all the data you need resides in a small number of memory pages. An ELF-like structure solves this by locating all the "table of contents" information at the beginning of the file. You can read a small amount of information from the start of the file, and then you know the type and location of all the IR modules / device images. You can then seek forward in the file and read just those modules / images that you care about. Alternatively, if you mmap the file, you can compute a pointer to the modules / images that you care about, and then the O/S will page in only those memory pages that contain those modules / images. BTW, the mmap scenario is one reason why it's advantageous to naturally align fields in the file format. If a There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From my POV, using ELF saves maintenance costs on tools/libraries to manipulate custom format. For instance, #16873 adds new tool to examine the binary file. Using ELF format enables uses of existing tools like readelf and objdump to do the same task. Potentially, we can re-use the code OpenMP offload team develops for Intel GPU support. FYI: llvm/llvm-project@f7b3559 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note that my proposed format above is not ELF. It's "inspired by ELF". I'm not sure how a SYCLBIN format would look if it really used ELF. Would it be an ELF file with no host code (no .text section, etc.)? |
||||||||||||||
|
||||||||||||||
The header segment appears as the first part of the SYCLBIN file. Like many | ||||||||||||||
other file-formats, it defines a magic number to help identify the format, which | ||||||||||||||
is 0x53594249 (or "SYBI".) Immediately following the magic number is the version | ||||||||||||||
number, which is used by SYCLBIN consumers when parsing data in the rest of the | ||||||||||||||
file. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ------------------------------------------------------------------ | -------------- | | ||||||||||||||
| `uint32_t` | Magic number. (0x53594249) | | | ||||||||||||||
| `uint32_t` | SYCLBIN version number. | | | ||||||||||||||
|
||||||||||||||
#### Global metadata | ||||||||||||||
|
||||||||||||||
Immediately after the header is the global metadata segment of the SYCLBIN, | ||||||||||||||
containing information about the contained SYCLBIN file. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ------------------------------------------------------------------ | -------------- | | ||||||||||||||
| `uint8_t` | `sycl::bundle_state` corresponding to the contents of the SYCLBIN. | | | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Note: Is this field expected to precisely match definition of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The values are defined a couple lines down. I think having it explicitly defined here is better, as to make it less interdependent on the SYCL definitions, which as you note may change. For the currently-defined values, they correspond 1:1 with how our headers currently define these, but if they change we would need a remapping, which should luckily be cheap. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is correct that if you wanted to make alignment-based accessing of this binary, we would have to force-align to 64 bits. However, there is nothing in the current design requiring such alignment. We can do that, but even then we can do the alignment by padding. For layouts like strings we would need this anyway. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if this is supposed to describe a file format, we should look ahead and specify the alignment/padding now.. otherwise it'll be open to interpretation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Alright. An option here is to simply say it is byte-packed, but that does mean we cannot do in-memory interpretation of the elements. Note that this is a design document and not and extension, so an argument could be made that the alignment is an implementation detail. That is not to say I'm against documenting it, but we are not specifying a format for use outside the implementation. Just to make sure we're on the same page about what this is specifying. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agreed 100%. But not necessarily a great option for the guy who has to write an optimized version that can pull an arbitrary module out of a SYCLBIN file that's storing 10,000. ;-) Or the gal juggling a giant in-memory rep that needs to be written quickly to file. We can go a long way for helping both those out by avoiding
I'm tots fine with that. Maybe this isn't the time or place, and I know it's not the highest priority at this stage. But, realistically, are we really going to make ANOTHER specification doc for that exact file format? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you add global metadata, then this "state" could be one of the metadata items, rather than a hard-coded "state" field in the header. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not against separating it into a new global metadata structure, but given that it would always be there I'm not sure I follow how that would effectively be any different? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It has been moved to a separate section, but the question about how much difference it makes still stands. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking that all the metadata would be an extensible format, much like our property sets. In fact, I think we could use exactly the same representation as property sets, unifying the two concepts. The way you have it now, the metadata is just a hard-coded struct. If we ever add new metadata, we will need to bump the file version, and readers will need to maintain code to read all old versions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we confident that unknown global metadata will be safe for an implementation to not know? Seems dangerous to even allow that. If we do, this could also be achieved by adding a size of the metadata, but I'm still not confident about allowing the consumers to be ignorant to new fields. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Discussed this offline with @gmlueck - Most relevant part of the discussion was that I will investigate the possibility of making the PropertySet structure usable as the structure for all metadata. |
||||||||||||||
|
||||||||||||||
The `sycl::bundle_state` is an integer with the values as follows: | ||||||||||||||
|
||||||||||||||
| `sycl::bundle_state` | Value | | ||||||||||||||
| -------------------- | ----- | | ||||||||||||||
| `input` | 0 | | ||||||||||||||
| `object` | 1 | | ||||||||||||||
| `executable` | 2 | | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
### Body | ||||||||||||||
|
||||||||||||||
Following the global metadata is the body of the SYCLBIN file. The body consists | ||||||||||||||
of a list of abstract modules. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ------------------------------------------ | -------------- | | ||||||||||||||
| `uint64_t` | Byte size of the list of abstract modules. | `B` | | ||||||||||||||
| `B` | List of abstract modules. | | | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Originally, "Type" was just a size-specification, but I seem to have forgotten to change them to a proper type. I believe it should be something like |
||||||||||||||
|
||||||||||||||
|
||||||||||||||
#### Abstract module | ||||||||||||||
|
||||||||||||||
An abstract module is a collection of device binaries that share properties, | ||||||||||||||
including, but not limited to: kernel names, imported symbols, exported symbols, | ||||||||||||||
aspect requirements, and specialization constants. | ||||||||||||||
|
||||||||||||||
The device binaries contained inside an abstract module must either be an IR | ||||||||||||||
module or a native device code image. IR modules contain device binaries in some | ||||||||||||||
known intermediate representation, such as SPIR-V, while the native device code | ||||||||||||||
images can be an architecture-specific binary format. There is no requirement | ||||||||||||||
that all device binaries in an abstract module is usable on the same device or | ||||||||||||||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
are specific to a single vendor. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ----------------------------------------------- | -------------- | | ||||||||||||||
| `uint64_t` | Byte size of the list of the metadata. | `M` | | ||||||||||||||
| `M` | Module metadata. | | | ||||||||||||||
| `uint64_t` | Byte size of list of IR modules. | `IR` | | ||||||||||||||
| `IR` | List of IR modules. | | | ||||||||||||||
| `uint64_t` | Byte size of list of native device code images. | `ND` | | ||||||||||||||
| `ND` | List of native device code images. | | | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
##### Module metadata | ||||||||||||||
|
||||||||||||||
The module metadata contains the following information about the contents of the | ||||||||||||||
module. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | -------------------------------------------------------------- | -------------- | | ||||||||||||||
| `uint32_t` | Byte size of property set data. | `P` | | ||||||||||||||
| `P` | Property set data. | | | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
*NOTE:* Optional features used is embedded in the property set data. | ||||||||||||||
*TODO:* Consolidate and/or document the property set data in this document. | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really need that here? I would rather say that it is the same as what we have here or there. If we want to document complete set of properties that we have and their format - I'm all for it, but I think that it should be a separate document, because that part is not unique to SYCLBIN. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The point here is that if SYCLBIN becomes the way we represent SYCL binaries in the future, there may not be a reason to have property sets as a separate structure. As such, we could merge it into the SYCLBIN structure, keeping the documentation of these properties (which would then simply be module metadata) in the same location as the rest of this information. |
||||||||||||||
|
||||||||||||||
|
||||||||||||||
##### IR module | ||||||||||||||
|
||||||||||||||
An IR module contains the binary data for the corresponding module compiled to a | ||||||||||||||
given IR representation, identified by the IR type field. | ||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ------------------------------ | -------------- | | ||||||||||||||
| `uint8_t` | IR type. | | | ||||||||||||||
| `uint32_t` | Byte size of the raw IR bytes. | `IB` | | ||||||||||||||
| `IB` | Raw IR bytes. | | | ||||||||||||||
|
||||||||||||||
*TODO:* Do we need a target-specific blob inside this structure? E.g. for CUDA | ||||||||||||||
we may want to embed the SM version. | ||||||||||||||
Comment on lines
+116
to
+117
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we are talking about a specific CUDA property, is it still IR module? I suppose that I don't understand well enough what is PTX and what is its place in a toolchain. Target-specific for me means "native", i.e. as if PTX is incorrectly assumed as IR module. Also, all IR modules are expected to share the same properties within an abstract module, right? If so, then maybe we should propagate that property up to the abstract module level and have PTX modules compiled for different SM versions as separate abstract modules? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Because of the forward-compatibility of SM archs, my understanding is that we want PTX to be considered an IR type.
If we were to put the SM architecture information at abstract module level, I don't see how an abstract module would ever have more than one IR module and more than one native device code image. Granted, having the exact same properties is somewhat rare, but I would expect it to be the case if the user was to compile for multiple SM versions. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that it seems like we would probably want to annotate the IR module with the CUDA virtual architecture in the case when the IR is PTX. I was thinking that we would use the IR-level metadata for stuff like this. |
||||||||||||||
|
||||||||||||||
|
||||||||||||||
##### IR types | ||||||||||||||
|
||||||||||||||
The IR types must be one of the following values: | ||||||||||||||
|
||||||||||||||
| IR type | Value | | ||||||||||||||
| ------- | ----- | | ||||||||||||||
| SPIR-V | 0 | | ||||||||||||||
| PTX | 1 | | ||||||||||||||
| AMDGCN | 2 | | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
##### Native device code image | ||||||||||||||
|
||||||||||||||
An native device code image contains the binary data for the corresponding | ||||||||||||||
module AOT compiled for a specific device, identified by the architecture | ||||||||||||||
string. | ||||||||||||||
Comment on lines
+146
to
+147
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that we need to specify what is the architecture string here. Is it target triple? Is it value passed to It is not clear how RT can use this field without such specifiation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is actually a good question. I am not sure what the architecture string would be for cases like SASS binaries. For example, lets say we've compiled to PTX through our compiler, then load that to a kernel-bundle, compile that kernel bundle to native device code and then serialize that to SYCLBIN. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In your example, the application has compiled the PTX to native code. Wouldn't you know the native architecture when this happens? It seems like the set of possible native CUDA architectures is a fixed set which would each map to one of the I wonder if there is a reason to use a string for the architecture names. Why couldn't this be an enumeration? We use an enumeration for the device architectures in sycl_ext_oneapi_device_architecture. In any case, I agree with @AlexeySachkov. I think the set of possible architectures should be specified in the file format. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I will have to do some research here. I know PTX can be associated with SM architectures, but I don't know if the same applies to the native device code produced from PTX. It may be device-specific and as such more strict than the SM version.
Since the compiler will need to know about these architectures too, I am reluctant to try and match enum values between the runtime and library for this purpose. |
||||||||||||||
|
||||||||||||||
| Type | Description | Value variable | | ||||||||||||||
| ---------- | ------------------------------------------------ | -------------- | | ||||||||||||||
| `uint32_t` | Byte size of the architecture string. | `A` | | ||||||||||||||
| `A` | Architecture string. | | | ||||||||||||||
| `uint32_t` | Byte size of the native device code image bytes. | `NB` | | ||||||||||||||
| NB | Native device code image bytes. | | | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
### SYCLBIN version changelog | ||||||||||||||
|
||||||||||||||
The SYCLBIN format is subject to change, but any such changes must come with an | ||||||||||||||
increment to the version number in the header and a subsection to this section | ||||||||||||||
describing the change. | ||||||||||||||
|
||||||||||||||
#### Version 1 | ||||||||||||||
|
||||||||||||||
* Initial version of the layout. | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
## Clang driver changes | ||||||||||||||
|
||||||||||||||
The clang driver needs to accept the following new flags: | ||||||||||||||
|
||||||||||||||
<table> | ||||||||||||||
<tr> | ||||||||||||||
<th>Option</th> | ||||||||||||||
<th>Description</th> | ||||||||||||||
</tr> | ||||||||||||||
<tr> | ||||||||||||||
<td>`-fsyclbin`</td> | ||||||||||||||
<td> | ||||||||||||||
If this option is set, the output of the invocation is a SYCLBIN file with the | ||||||||||||||
.syclbin file extension. This skips the host-compilation invocation of the typical | ||||||||||||||
`-fsycl` pipeline, instead passing the output of the clang-offload-packager | ||||||||||||||
invocation to clang-linker-wrapper together with the new `--syclbin` flag. | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it expected that the
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The device code modules should still be linked with the device libraries, so most of the flags should be the same. The ones we can exclude are the ones passed to the host compiler, including specifying the linker. The currently proposed initial implementation of this in the linker wrapper doesn't care if the options are there, but we may as well exclude them. |
||||||||||||||
|
||||||||||||||
Setting this option will override `-fsycl` and `-fsycl-device-only`. | ||||||||||||||
|
||||||||||||||
This option currently requires `--offload-new-driver` to be set. | ||||||||||||||
steffenlarsen marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
</td> | ||||||||||||||
</tr> | ||||||||||||||
<tr> | ||||||||||||||
<td>`--offload-ir`</td> | ||||||||||||||
<td>*TODO*</td> | ||||||||||||||
</tr> | ||||||||||||||
<tr> | ||||||||||||||
<td>`--offload-rdc`</td> | ||||||||||||||
<td>This is an alias of `-fgpu-rdc`.</td> | ||||||||||||||
</tr> | ||||||||||||||
</table> | ||||||||||||||
|
||||||||||||||
Additionally, `-fsycl-link` should work with .syclbin files. Semantics of how | ||||||||||||||
SYCLBIN files are linked together is yet to be specified. | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
## clang-linker-wrapper changes | ||||||||||||||
|
||||||||||||||
The clang-linker-wrapper is responsible for doing post-processing and linking of | ||||||||||||||
device binaries, as described in [OffloadDesign.md](OffloadDesign.md). | ||||||||||||||
However, to support SYCLBIN files, the clang-linker-wrapper must be able to | ||||||||||||||
unpack an offload binary (as described in | ||||||||||||||
[ClangOffloadPackager.rst](https://github.com/intel/llvm/blob/sycl/clang/docs/ClangOffloadPackager.rst)) | ||||||||||||||
directly, instead of extracting it from a host binary. This should be done when | ||||||||||||||
a new flag, `--syclbin`, is passed. In this case, the clang-linker-wrapper is | ||||||||||||||
Comment on lines
+213
to
+214
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So, I'm not sure if I have a use case for that, just wanted to double-check the intent. A potential use-case, though, is ability to embed |
||||||||||||||
responsible to package the resulting device binaries and produced metadata into | ||||||||||||||
the format described in [SYCLBIN binary format section](#syclbin-binary-format). | ||||||||||||||
Additionally, in this case the clang-linker-wrapper will skip the wrapping of | ||||||||||||||
the device code and the host code linking stage, as there is no host code to | ||||||||||||||
wrap the device code in and link. | ||||||||||||||
|
||||||||||||||
*TODO:* Describe the details of linking SYCLBIN files. | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
## SYCL runtime library changes | ||||||||||||||
|
||||||||||||||
Using the interfaces from the | ||||||||||||||
[sycl_ext_oneapi_syclbin](../extensions/proposed/sycl_ext_oneapi_syclbin.asciidoc) | ||||||||||||||
extension, the runtime must be able to parse the SYCLBIN format, as described in | ||||||||||||||
the [SYCLBIN binary format section](#syclbin-binary-format). To avoid large | ||||||||||||||
amounts of code duplication, the runtime uses the implementation of SYCLBIN | ||||||||||||||
reading and writing implemented in LLVM. | ||||||||||||||
|
||||||||||||||
When creating a `kernel_bundle` from a SYCLBIN file, the runtime reads the | ||||||||||||||
contents of the SYCLBIN file and creates the corresponding data structure from | ||||||||||||||
it. In order for the SYCL runtime library's existing logic to use the binaries, | ||||||||||||||
the runtime then creates a collection of `sycl_device_binary_struct` objects and | ||||||||||||||
its constituents, pointing to the data in the parsed SYCLBIN object. Passing | ||||||||||||||
these objects to the runtime library's `ProgramManager` allows it to reuse the | ||||||||||||||
logic for compiling, linking and building SYCL binaries. | ||||||||||||||
|
||||||||||||||
In the other direction, users can request the "contents" of a `kernel_bundle`. | ||||||||||||||
When this is done, the runtime library must ensure that a SYCLBIN file is | ||||||||||||||
available for the contents of the `kernel_bundle` and must then write the | ||||||||||||||
SYCLBIN object to the corresponding binary representation in the format | ||||||||||||||
described in the [SYCLBIN binary format section](#syclbin-binary-format). In cases | ||||||||||||||
where the `kernel_bundle` was created with a SYCLBIN file, the SYCLBIN | ||||||||||||||
representation is immediately available and can be serialized directly. In other | ||||||||||||||
cases, the runtime library creates a new SYCLBIN object from the binaries | ||||||||||||||
associated with the `kernel_bundle`, then serializes it and returns the result. | ||||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a section describing motivation and anticipated use-cases. This will greatly help with review of the remaining document, because we can't say if proposed solution is good enough without a clear idea of why we need it.