[BACKEND] Add opt-in TileIR backend integration #703
Are there performance results on Blackwell?
If you can specify the Blackwell GPUs you’re interested in (e.g., B200 or RTX PRO 6000), I can add the benchmark results accordingly.
B200, please.
sunnycase
left a comment
Thanks for the change. One question about the API placement: since tileir does not seem to expose GPU-specific details here, would it be a better fit under tle rather than tle.gpu? That may keep the namespace aligned with the abstraction level, unless there is a planned GPU-specific surface that I am missing.
I put the load-view-token related APIs under tle.gpu.tile based on a suggestion from @Vincent-Xiao. I’m not sure this is the best approach, but it’s relatively easy to change since it is just an alias.
Add upstream TileIR docs, tests, and assets. Record the FlagTree vendor provenance and drop the unused backend name.conf file.
Remove internal release markers from the ported TileIR tutorial benchmark sources while preserving SPDX headers and ordinary section comments.
@Vincent-Xiao Can you please approve CI workflows for my latest commit?
Summary
This PR integrates Triton-to-Tile-IR into FlagTree as an independent `tileir` backend. TileIR has its own compiler and driver and is installed alongside the existing NVIDIA and AMD backends. The common `python/triton` layer only provides backend-neutral routing, compiler, driver, and language-extension hooks. TileIR-specific policy and implementation remain under `third_party/tileir`.

Runtime behavior is unchanged unless `FLAGTREE_USE_TILEIR=1` is set.

Design
TileIR is installed as an independent backend with its own `TileIRBackend` and `TileIRDriver`. On NVIDIA systems, `CudaDriver` remains the active hardware driver and produces the initial `cuda` target. Routing may select a `tileir` target for an individual kernel without replacing or modifying the NVIDIA backend.

The shared Python changes are backend-neutral hooks:
The common interfaces under `python/triton/backends` do not import TileIR implementation code. TileIR-specific routing, compilation, and driver behavior remain under `third_party/tileir`.

With `FLAGTREE_USE_TILEIR=1`, kernels that use the `tle.gpu.tile` view/token subset route to TileIR. The policy is implemented in the TileIR backend router.
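The opt-in routing described above can be illustrated with a minimal sketch. It models only the one rule that is legible in this description (the environment variable is set and the kernel uses the view/token subset). `KernelInfo`, `tileir_enabled`, and `select_target` are hypothetical names for illustration and are not the router's actual API; only the variable name `FLAGTREE_USE_TILEIR` and the target names `cuda` and `tileir` come from the PR text.

```python
import os
from dataclasses import dataclass

# Variable name taken from the PR description.
TILEIR_ENV_VAR = "FLAGTREE_USE_TILEIR"


@dataclass(frozen=True)
class KernelInfo:
    """Hypothetical per-kernel summary; not a FlagTree type."""
    name: str
    uses_tile_view_token_api: bool  # assumed to be known before routing


def tileir_enabled(environ=None) -> bool:
    """Return True only when the opt-in variable is exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get(TILEIR_ENV_VAR) == "1"


def select_target(kernel: KernelInfo, initial_target: str = "cuda", environ=None) -> str:
    """Pick a target for one kernel without touching the initial target.

    The initial target stays in place unless the opt-in is set AND the
    kernel uses the view/token subset.
    """
    if tileir_enabled(environ) and kernel.uses_tile_view_token_api:
        return "tileir"
    return initial_target
```

Because the decision is made per kernel, a process can hold kernels on both targets at once, which is the situation the mixed-routing tutorial exercises.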
Backend language APIs use the generic `tl.ext` registry. `tle.gpu.tile.<name>` lazily forwards to `tl.ext.<name>`, while the TileIR implementation remains in `extend_core.py`, `extend_semantic.py`, and `triton_tileir.cc`. Ordinary TLE imports therefore do not depend on TileIR.

Implementation
This PR:
`tl.ext`;

The TileIR source is based on upstream commit `a3befd959b02410cfbdac08d91d817b0ec0b3e33`. cuda-tile is pinned at commit `2e5ccba66fb3afdba34b26cf358418283027c248`.

The upstream baseline, dependency pins, build requirements, LLVM compatibility handling, and FlagTree-local vendor changes are recorded in the TileIR backend README.
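The lazy forwarding from `tle.gpu.tile.<name>` to `tl.ext.<name>` mentioned in the Design section could look roughly like the sketch below. It uses Python's module-level `__getattr__` (PEP 562) on two stand-in modules. The module names `demo_ext` and `demo_tile` and the placeholder function body are assumptions for illustration; the PR's actual alias code is not shown here and may differ.

```python
import types

# Stand-in for the generic extension registry (`tl.ext` in the PR).
ext = types.ModuleType("demo_ext")

# Stand-in for the alias namespace (`tle.gpu.tile` in the PR).
tile = types.ModuleType("demo_tile")


def _forward(name):
    # PEP 562: called only when normal attribute lookup on `tile` fails,
    # so nothing is resolved from the registry at import time.
    try:
        return getattr(ext, name)
    except AttributeError:
        raise AttributeError(
            f"module 'demo_tile' has no attribute {name!r}"
        ) from None


# Storing the hook in the module's namespace enables lazy lookup.
tile.__getattr__ = _forward


def load_view_tko(*args):
    """Placeholder body; the real operation lives in the backend."""
    return ("load_view_tko", args)


# A backend registers its API after the alias module already exists.
ext.load_view_tko = load_view_tko

# The alias resolves to the registered object on first access.
assert tile.load_view_tko is ext.load_view_tko
```

With this shape, importing the alias module never touches the backend; the lookup happens only when a name is actually used, which is consistent with the statement that ordinary TLE imports do not depend on TileIR.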
Validation
Load View Token Ordering
`01-load-view-token-ordering.py` validates:
`load_view_tko` and `store_view_tko`;

Mixed Kernel Routing
`02-mixed-kernel-routing.py` validates in one process:

Triton TileIR Benchmarks
`03-triton-tileir-benchmarks.py` provides:

The available benchmark families are:
Tutorial usage and reference H100 performance results are documented in the TileIR tutorials.
CI
The dedicated TileIR CI workflow:
`tileiras`;

Existing backend CI continues to use the common Triton frontend without importing TileIR implementation code.
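One way to check a "does not import" guarantee of this kind is to import a package in a fresh interpreter and inspect `sys.modules`. The helper below is a generic sketch with hypothetical function names. It is not taken from the PR's CI workflow, and the package names to pass for FlagTree are not stated in this description.

```python
import subprocess
import sys


def loaded_modules_after_import(module: str) -> set:
    """Import `module` in a fresh interpreter and return all loaded module names."""
    code = f"import sys, {module}; print(*sorted(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the import itself fails
    )
    # Module names contain no whitespace, so a plain split is sufficient.
    return set(result.stdout.split())


def import_pulls_in(module: str, forbidden_prefix: str) -> bool:
    """Return True if importing `module` also loads `forbidden_prefix` or a submodule of it."""
    loaded = loaded_modules_after_import(module)
    return any(
        name == forbidden_prefix or name.startswith(forbidden_prefix + ".")
        for name in loaded
    )
```

Running the import in a subprocess matters: modules already loaded by the test process would otherwise make the result depend on test order.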