
@srkreddy1238 srkreddy1238 commented Nov 28, 2025

Introduces the following features on top of texture annotation:

  • Lowering, codegen, and runtime support for textures.
  • image2d_array_t support: the added depth dimension allows more allocations to use textures instead of falling back to buffers when texture limits are exceeded.
  • A comprehensive set of schedules for Adreno textures.
  • Texture packing of arbitrary types up to 128 bits (FP16-NCHW8c, INT8-NCHW16c, etc.).
  • A clBufferDescriptor debug dump controlled by CMake options.
  • Pipeline definition for the Adreno target.
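The 128-bit packing mentioned above can be illustrated with a small sketch. This is hypothetical helper code, not TVM code: it shows how NCHW FP16 data regrouped as NCHW8c makes each innermost block of 8 half-precision values into one 128-bit texel.

```python
# Hypothetical sketch (not TVM code) of 128-bit texture packing:
# NCHW FP16 data regrouped as NCHW8c, so each innermost block of
# 8 half-precision values forms one 128-bit texel.

def nchw_to_nchw8c(n, c, h, w, lanes=8):
    """Map an NCHW coordinate to (n, c_outer, h, w, c_inner)."""
    return (n, c // lanes, h, w, c % lanes)

def texel_count(N, C, H, W, lanes=8):
    """128-bit texels needed for an N x C x H x W tensor of 16-bit values."""
    c_outer = (C + lanes - 1) // lanes  # round up partial channel blocks
    return N * c_outer * H * W
```

The same arithmetic with lanes=16 covers the INT8-NCHW16c case, since 16 x 8 bits is again one 128-bit texel.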

While covering these features, the following interfaces and passes are enhanced and need review:

  • alloc_tensor: VDevice information is passed across these APIs. For texture allocation, alloc_storage allocates buffer/image objects as requested, and alloc_tensor then creates a view of any scope over that storage. This ensures optimal utilization of the backing memory across different image objects and scopes.
  • Constants Saving: Handled by adding a memory scope section in the executable. This introduces a new header magic to retain backward compatibility.
  • Static Memory Planning: Mostly a port of the Relay static memory planner with a mixed-mode allocator.
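The alloc_storage / alloc_tensor split described above can be sketched as follows. This is a minimal illustration with assumed semantics, not the actual Relax VM API: storage is allocated once, and tensors are views over it, so image- and buffer-scoped tensors can share the same backing memory.

```python
# Minimal sketch (assumed semantics, not the Relax VM API) of the
# alloc_storage / alloc_tensor split: one backing allocation, many views.

class Storage:
    """Backing allocation made once by an alloc_storage-style call."""
    def __init__(self, size_bytes, scope="global"):
        self.size_bytes = size_bytes
        self.scope = scope

class TensorView:
    """A tensor carved out of a Storage by an alloc_tensor-style call."""
    def __init__(self, storage, offset, nbytes, scope=None):
        assert offset + nbytes <= storage.size_bytes, "view exceeds storage"
        self.storage = storage
        self.offset = offset
        self.scope = scope if scope is not None else storage.scope

pool = Storage(1 << 20, scope="global.texture")
a = TensorView(pool, 0, 4096)                     # texture-scoped view
b = TensorView(pool, 4096, 4096, scope="global")  # buffer view, same pool
```

The point of the design is visible in the last two lines: views of different scopes reuse one backing allocation instead of each forcing its own.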

@gemini-code-assist
Contributor

Summary of Changes

Hello @srkreddy1238, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant overhaul to TVM's Adreno backend, primarily by enabling and optimizing texture-based lowering. The changes span across runtime, Relax, and TIR components to ensure that GPU texture memory can be effectively utilized for various operations like convolutions, pooling, and layout transformations. This aims to improve memory efficiency and performance on Adreno devices by providing dedicated schedules and memory management for texture objects, alongside robust mechanisms for propagating memory scope information throughout the compilation pipeline.

Highlights

  • Texture Annotation and Lowering: Introduces comprehensive support for texture annotation, lowering, codegen, and runtime specifically for Adreno GPUs. This enables more efficient memory utilization by leveraging texture memory instead of falling back to buffers when limits are exceeded.
  • image2d_array_t Support: Adds support for image2d_array_t which includes a depth dimension, allowing for more flexible and larger texture allocations, particularly beneficial for NCHW layouts.
  • Adreno Texture Schedules: A comprehensive set of DLight schedules for Adreno textures has been added, including specialized rules for Conv2d, LayoutTransform, Pool2D, and a Fallback mechanism for general operations.
  • Texture Packing: Enables texture packing of arbitrary data types up to 128 bits, supporting formats like FP16-NCHW8c and INT8-NCHW16c, which are crucial for optimizing performance on Adreno GPUs.
  • Memory Scope Propagation: Enhances runtime.Tensor with SetScope and GetScope methods, and updates SaveDLTensor/Load to preserve memory scope information. This ensures that memory allocation decisions, especially for textures, are correctly propagated through the Relax and TIR pipelines.
  • Static Memory Planning Integration: The static memory planner has been updated to account for texture memory scopes and sizes, porting concepts from Relay's static memory planner with a mixed-mode allocator to better manage device-specific memory.
  • New TIR Passes: Introduces InjectTextureAlloc and TextureFlatten TIR passes. InjectTextureAlloc inserts texture allocation intrinsics, while TextureFlatten transforms multi-dimensional buffer accesses into 2D (width, height, depth) texture accesses based on storage scope.
  • OpenCL Codegen and Runtime Updates: Updates the OpenCL codegen to correctly handle image2d_array_t types and texture2d_load/store intrinsics, using int4 for coordinates and managing channel sizes. The OpenCL runtime now supports allocating image2d_array_t with depth and calculates texture memory sizes based on device attributes.
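The (width, height, depth) flattening performed by a TextureFlatten-style pass can be sketched in a few lines. The exact axis-to-coordinate assignment below is an assumption for illustration, not the pass's actual layout; c_inner (the vector lane) selects the RGBA channel within a texel, so it does not appear in the coordinates.

```python
# Hedged sketch of (width, height, depth) flattening for an NCHW4c
# buffer index. The axis assignment here is an assumed, plausible
# layout, not necessarily what the TextureFlatten pass emits.

def flatten_to_image2d(n, co, h, w, shape):
    """Map (n, c_outer, h, w, *) to (x, y) for a plain image2d_t."""
    N, CO, H, W, _ = shape
    return w, (n * CO + co) * H + h  # fuse N, C_outer, H into height

def flatten_to_image2d_array(n, co, h, w, shape):
    """image2d_array_t variant: N and C_outer move to the depth
    dimension, keeping width and height under per-device texture limits."""
    N, CO, H, W, _ = shape
    return w, h, n * CO + co  # (x, y, depth)
```

This also shows why image2d_array_t helps: the fused height axis of the 2D case grows with N * C_outer * H and can exceed the device's image height limit, while the array variant moves that growth into depth.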

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant features for Adreno GPU texture-based lowering, including new runtime and compilation passes, scheduling rules, and analysis capabilities. The changes are extensive and well-structured, particularly the refactoring of scheduling rules and the introduction of more robust analysis for memory scopes and buffer information. However, there are several areas that need attention. I've identified some potential runtime errors due to unsafe assumptions about symbolic shapes and struct info, which should be addressed. Additionally, there are instances of dead code, typos in public headers, and use of bare excepts that should be cleaned up to improve code quality and maintainability.

/*!
 * \brief The memory scope,
 * representing the underlying scope information of the device.
 */
ffi::String scope = "global";
Member


I think it is quite intrusive to introduce a scope to the runtime tensor class here. Is it possible to avoid it in the runtime, and instead ensure the compiler allocates the right scope and calls the right ops implicitly?

Note that, at the runtime level, the scope can also be implicitly attached to the data field.

Contributor Author


This is primarily for the consts/params load/store. The scope information should be set at the time of loading constants (runtime::Tensor) to avoid copies.

I think it can be achieved by altering the Load/Store ConstSection of the Executable. Let me try that...

@srkreddy1238 force-pushed the texture-lower-ffi branch 2 times, most recently from 5f4e0a6 to 9283ef2 on December 3, 2025 at 18:36.