Conversation

jchen10 (Contributor) commented Nov 28, 2025

PrePack Conv kernels with path-aware transpose decisions, store the transposed kernels for reuse, and add ComputeContextBase helpers for node access and GPU buffer unmapping.

jchen10 commented Nov 28, 2025

Perf data on LNL:

| model | variance (%) |
| --- | --- |
| sd-turbo-unet-fp16-demo-layernorm | -23.72 |
| modnet-fp32 | -22.99 |
| sd-turbo-text-encoder-fp16-demo-layernorm | -17.58 |
| efficientnet-lite-f16-demo | -15.28 |
| mobilenetv2-12-f16-demo | -14.18 |
| jina-clip-v1-version | -12.61 |
| gazenet | -12.22 |
| sdunet-v1.5-demo-layernorm | -11.43 |
| modnet-fp16 | -10.06 |
| resnet50-v1-f16-demo | -8.14 |
| florence-2-base-decoder-fp16 | -7.95 |
| movenet-singlepose-thunder-fp32 | -7.61 |
| jina-clip-v1-version-fp16 | -7.54 |
| depth-anything-base-fp32 | -7.45 |
| detr-resnet-50-fp16 | -6.55 |
| detr-resnet-50 | -6.33 |
| jina-clip-v1-text | -6.32 |
| movenet-singlepose-thunder-fp16 | -6.04 |
| mobileclip_s0_vision_fp32 | -5.14 |

jchen10 commented Nov 28, 2025

@fs-eire @qjia7 @guschmue PTAL

guschmue added the ep:WebGPU ort-web webgpu provider label Dec 2, 2025
jchen10 commented Dec 3, 2025

Found the CI error log below. I'm not sure whether it's actually caused by this PR.

```
2025-12-02T20:34:21.9671092Z 2: [  FAILED  ] CudaNhwcTypedTest/0.ConvNhwcBias, where TypeParam = float (186 ms)

2025-12-02T20:34:21.8768402Z 2: 2025-12-02 20:34:21.8759375 [E:onnxruntime:Conv, sequential_executor.cc:572 onnxruntime::ExecuteKernel] Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
2025-12-02T20:34:21.8769974Z 2: E:\_work\onnxruntime\onnxruntime\onnxruntime\test\providers\compare_provider_test_utils.cc(172): error: Value of: _tmp_status.IsOK()
2025-12-02T20:34:21.8770496Z 2:   Actual: false
2025-12-02T20:34:21.8770652Z 2: Expected: true
2025-12-02T20:34:21.8771227Z 2: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'node1' Status Message: CUDA error cudaErrorNotSupported:operation not supported
```

jchen10 commented Dec 3, 2025

Tried the case locally with the CUDA EP; it didn't reproduce with this PR.

guschmue commented Dec 3, 2025

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines
Azure Pipelines successfully started running 4 pipeline(s).
