
Commit fe35988

trivedivivek authored and facebook-github-bot committed
Using buffer for weight tensors for quantized mat mul op. (pytorch#15990)
Summary: This change affects the performance and memory usage of the quantized matrix multiplication operation in the ExecuTorch Vulkan backend. By using a buffer instead of a 2D texture for weight tensors, the operation may become more efficient and use less memory, especially for large matrices.

Reviewed By: yipjustin

Differential Revision: D87911255
1 parent b6342c6 commit fe35988

File tree

1 file changed (+1, -1 lines)


backends/vulkan/runtime/graph/ops/impl/Staging.cpp

Lines changed: 1 addition & 1 deletion
```diff
@@ -285,7 +285,7 @@ ValueRef prepack_int4_linear_weight_transposed_interleaved(
   const int64_t N = qmat2_orig_sizes.at(ndim - 2);
   const int64_t N_div2 = N / int64_t(2);

-  utils::StorageType storage_type = utils::kTexture2D;
+  utils::StorageType storage_type = utils::kBuffer;
   uint32_t max_extent = graph.context()->adapter_ptr()->max_texture2d_dim();
   if (N_div2 > max_extent * 4 || K > max_extent) {
     storage_type = utils::kBuffer;
```
