[BACKEND] Add opt-in TileIR backend integration by KingsleyLiu-NV · Pull Request #703 · flagos-ai/FlagTree

KingsleyLiu-NV · 2026-06-17T02:41:11Z

Summary

This PR integrates Triton-to-Tile-IR into FlagTree as an independent tileir backend.

TileIR has its own compiler and driver and is installed alongside the existing NVIDIA and AMD backends. The common python/triton layer only provides backend-neutral routing, compiler, driver, and language-extension hooks. TileIR-specific policy and implementation remain under third_party/tileir.

Runtime behavior is unchanged unless FLAGTREE_USE_TILEIR=1 is set.

Design

TileIR is installed as an independent backend with its own TileIRBackend and TileIRDriver. On NVIDIA systems, CudaDriver remains the active hardware driver and produces the initial cuda target. Routing may select a tileir target for an individual kernel without replacing or modifying the NVIDIA backend.

The shared Python changes are backend-neutral hooks:

python/triton/runtime/jit.py
├── python/triton/backends/__init__.py::route_target()
│   └── TileIRBackend.route_target()
│       └── third_party/tileir/backend/router.py
└── python/triton/compiler/compiler.py
    ├── get_backend(final_target)
    ├── backend.make_ir(...)
    └── get_driver(final_target, active_driver)

The common interfaces under python/triton/backends do not import TileIR implementation code. TileIR-specific routing, compilation, and driver behavior remain under third_party/tileir.
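The call flow above can be sketched as a small registry in which the common layer asks each installed backend whether it wants to reroute a kernel, then picks the compiler from the final target. This is a minimal illustration, not the PR's code: `Target`, `BaseBackend`, `NvidiaBackend`, `BACKENDS`, the `kernel_info` dictionary, and all signatures are assumptions made for the example. Only the names `route_target`, `get_backend`, and `TileIRBackend` come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Stand-in for a compilation target; only the backend name matters here."""
    backend: str


class BaseBackend:
    """Hypothetical common interface: by default a backend never reroutes."""
    name = "base"

    def route_target(self, target, kernel_info):
        return None  # None means "no opinion, keep the current target"


class NvidiaBackend(BaseBackend):
    name = "cuda"


class TileIRBackend(BaseBackend):
    name = "tileir"

    def route_target(self, target, kernel_info):
        # Placeholder decision (assumption); the real policy lives in the
        # TileIR backend router, outside the common layer.
        if target.backend == "cuda" and kernel_info.get("use_tileir"):
            return Target("tileir")
        return None


# Hypothetical registry; the common layer only iterates over it and never
# imports backend-specific implementation code.
BACKENDS = {b.name: b for b in (NvidiaBackend(), TileIRBackend())}


def route_target(initial_target, kernel_info):
    """Ask each backend in turn; the first non-None answer is the final target."""
    for backend in BACKENDS.values():
        routed = backend.route_target(initial_target, kernel_info)
        if routed is not None:
            return routed
    return initial_target


def get_backend(final_target):
    """Select the compiler backend from the final (possibly rerouted) target."""
    return BACKENDS[final_target.backend]
```

In this sketch the NVIDIA backend is never modified: rerouting only changes which entry of the registry handles a given kernel.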

With FLAGTREE_USE_TILEIR=1:

CUDA kernels without TLE route to TileIR.
Kernels using only the supported tle.gpu.tile view/token subset route to TileIR.
Other or unknown TLE usage remains on native NVIDIA.
Non-CUDA targets remain unchanged.

The policy is implemented in the TileIR backend router.
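The four routing rules can be written as one pure function, which makes the opt-in behavior easy to check. This is a sketch under stated assumptions: the function name `choose_target`, its arguments, and the contents of `SUPPORTED_TLE` are hypothetical. The description only says that a "supported tle.gpu.tile view/token subset" exists; the two names used here are the ones mentioned in the validation section, and the check for the exact string "1" is an assumption.

```python
import os

# Illustrative only: the real supported subset is defined by the TileIR router.
SUPPORTED_TLE = frozenset({"load_view_tko", "store_view_tko"})


def choose_target(initial_backend, tle_ops, environ=None):
    """Return the backend name a kernel should be compiled for.

    initial_backend: name produced by the active driver, e.g. "cuda".
    tle_ops: names of the TLE operations the kernel uses (may be empty).
    environ: mapping to read the flag from; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    if env.get("FLAGTREE_USE_TILEIR") != "1":
        return initial_backend  # flag not set: runtime behavior is unchanged
    if initial_backend != "cuda":
        return initial_backend  # non-CUDA targets remain unchanged
    if set(tle_ops) <= SUPPORTED_TLE:
        return "tileir"  # no TLE at all, or only the supported subset
    return "cuda"  # other or unknown TLE usage stays on native NVIDIA
```

Treating "no TLE" as the empty set lets the first two rules share one subset test, and any unrecognized operation name falls through to the native path by default.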

Backend language APIs use the generic tl.ext registry. tle.gpu.tile.<name> lazily forwards to tl.ext.<name>, while the TileIR implementation remains in extend_core.py, extend_semantic.py, and triton_tileir.cc. Ordinary TLE imports therefore do not depend on TileIR.
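The lazy forwarding described above can be illustrated with Python's `__getattr__` hook, which runs only when normal attribute lookup fails. The sketch below is an assumption about the general shape, not the PR's implementation: `ext` stands in for the `tl.ext` registry, and `register_extension` and `LazyAlias` are hypothetical names.

```python
import types

# Stand-in for the generic tl.ext registry (assumption for this example).
ext = types.SimpleNamespace()


def register_extension(name, fn):
    """Hypothetical hook a backend would call to publish a language API."""
    setattr(ext, name, fn)


class LazyAlias:
    """Forwards attribute access to a registry at lookup time, not import time."""

    def __init__(self, registry):
        self._registry = registry

    def __getattr__(self, name):
        # Called only when normal lookup fails, so _registry itself resolves
        # through the instance dictionary and never reaches this method.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return getattr(self._registry, name)
        except AttributeError:
            raise AttributeError(
                f"no backend extension named {name!r} is registered"
            ) from None


# The alias can be created before any backend has registered anything.
tile = LazyAlias(ext)
```

Because resolution happens on first access, creating or importing the alias does not require the providing backend to be installed; only using an unregistered name raises an error. For a real module, a module-level `__getattr__` function (PEP 562) gives the same effect.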

Implementation

This PR:

registers TileIR as an independent FlagTree backend;
adds generic per-kernel target routing;
selects the compiler and kernel driver from the final routed target;
adds a generic backend language-extension registry through tl.ext;
adds TileIR-specific routing, frontend, lowering, and driver integration;
adds TLE view/token operations and their C++ builder bindings;
adds compatibility handling for FlagTree's Triton 3.6 and LLVM versions;
adds tutorials, correctness checks, benchmarks, and TileIR CI coverage.

The TileIR source is based on upstream commit a3befd959b02410cfbdac08d91d817b0ec0b3e33.

cuda-tile is pinned at commit 2e5ccba66fb3afdba34b26cf358418283027c248.

The upstream baseline, dependency pins, build requirements, LLVM compatibility handling, and FlagTree-local vendor changes are recorded in the TileIR backend README.

Validation

Load View Token Ordering

01-load-view-token-ordering.py validates:

TLE tensor-view operations;
memory-token creation and chaining;
load_view_tko and store_view_tko;
successful TileIR execution;
expected native NVIDIA rejection when TileIR routing is disabled.

Mixed Kernel Routing

02-mixed-kernel-routing.py validates in one process:

a plain Triton kernel routed to TileIR;
a non-TileIR TLE kernel routed to native NVIDIA;
correct results from both paths;
expected TileIR and native cache artifacts.

Triton TileIR Benchmarks

03-triton-tileir-benchmarks.py provides:

self-contained Triton kernels;
native NVIDIA and TileIR execution;
correctness checks;
CUPTI kernel-time measurements;
seven benchmark families;
a curated CI subset with three representative pairs per case.

The available benchmark families are:

bmm
fmha
linear_bias_act
mla
mla_decoding
matmul
rope

Tutorial usage and reference H100 performance results are documented in the TileIR tutorials.

CI

The dedicated TileIR CI workflow:

builds FlagTree with TileIR and CTK 13.3;
verifies tileiras;
runs the load-view token-ordering tutorial;
runs the mixed-routing tutorial;
runs the curated native NVIDIA and TileIR benchmark subset.

Existing backend CI continues to use the common Triton frontend without importing TileIR implementation code.

CLAassistant · 2026-06-17T02:41:18Z

All committers have signed the CLA.

zhzhcookie · 2026-06-18T10:09:44Z

Are there performance results on Blackwell?

KingsleyLiu-NV · 2026-06-25

If you can specify the Blackwell GPUs you’re interested in (e.g., B200 or RTX PRO 6000), I can add the benchmark results accordingly.

zhzhcookie · 2026-06-25

B200, please.

sunnycase

Thanks for the change. One question about the API placement: since tileir does not seem to expose GPU-specific details here, would it be a better fit under tle rather than tle.gpu? That may keep the namespace aligned with the abstraction level, unless there is a planned GPU-specific surface that I am missing.

KingsleyLiu-NV · 2026-06-25T03:07:26Z

I put the load-view-token related APIs under tle.gpu.tile based on a suggestion from @Vincent-Xiao. I’m not sure this is the best approach, but it’s relatively easy to change since it is just an alias.

KingsleyLiu-NV · 2026-06-29T06:06:35Z

@Vincent-Xiao Can you please approve CI workflows for my latest commit?

KingsleyLiu-NV requested review from Galaxy1458, sunnycase and zhzhcookie as code owners June 17, 2026 02:41

github-actions Bot added the triton_v3.6.x label Jun 17, 2026

KingsleyLiu-NV marked this pull request as draft June 17, 2026 08:09

zhzhcookie reviewed Jun 18, 2026

sunnycase reviewed Jun 23, 2026

Vincent-Xiao marked this pull request as ready for review June 23, 2026 08:47

KingsleyLiu-NV added 9 commits June 28, 2026 20:43

3f0d499 [tileir] Integrate TileIR backend
6ce1c87 [tileir] Add per-kernel TileIR routing
09b9f8f [tileir] Support TLE load-view token ops
38e843a [tileir] Add tutorial examples
c096f19 [tileir] Sync TileIR vendor files
    Add upstream TileIR docs, tests, and assets. Record the FlagTree vendor provenance and drop the unused backend name.conf file.
d5a5c87 [tileir] Clean tutorial benchmark comments
    Remove internal release markers from the ported TileIR tutorial benchmark sources while preserving SPDX headers and ordinary section comments.
ba8dc54 [tileir] Polish tutorials and vendor notes
babe605 [tileir] Sync vendor and fix TensorDescriptor ABI
123f565 [tileir] Remove compile env and polish benchmarks

KingsleyLiu-NV force-pushed the feature/flagtree-tileir-integration branch 2 times, most recently from 47b0477 to 01680a4, June 29, 2026 05:20

cbf1dc5 [tileir] Address backend CI review feedback

KingsleyLiu-NV force-pushed the feature/flagtree-tileir-integration branch from 01680a4 to cbf1dc5, June 29, 2026 07:25

a473bab [tle] Resolve backend language extensions lazily

KingsleyLiu-NV wants to merge 11 commits into flagos-ai:triton_v3.6.x from KingsleyLiu-NV:feature/flagtree-tileir-integration
