ResNet50 with cuDNN and cuBLAS is significantly slower than with CUTLASS #5

Closed
umiswing opened this issue May 12, 2023 · 1 comment

umiswing commented May 12, 2023

Hello! I am testing the ResNet50 model on an NVIDIA A100 40GB GPU. When compiled with CUTLASS, ResNet50 runs about 2.9 times faster than when compiled with cuDNN and cuBLAS. A speed-up this large seems strange to me. Is there anything I am overlooking? The details of my testing process are below:

To run resnet50/run.py against the API on TVM's main branch, I made some modifications to the code: https://github.com/umiswing/tvm-cutlass-eval/commit/3b4bd377763d8d8eb3a0817fbed6cde9e6708bf3
To reproduce the test (python run.py): https://github.com/umiswing/tvm-cutlass-eval/blob/master/resnet50/run.py

Performance using CUTLASS:

Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
    1.7029       1.5734       1.8596       1.5677       0.1418

Performance using cuDNN + cuBLAS:

Execution time summary:
  mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
    4.7330       4.6858       5.0248       4.6459       0.0859
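
As a quick sanity check on the speed-up figure, the ratios can be recomputed from the two summaries above. This is only a minimal sketch: the numbers are copied from this post, and the `speedup` helper is a hypothetical name used for illustration, not part of TVM.

```python
# Timings in milliseconds, copied from the two "Execution time summary" tables above.
cutlass = {"mean": 1.7029, "median": 1.5734, "max": 1.8596, "min": 1.5677}
cudnn_cublas = {"mean": 4.7330, "median": 4.6858, "max": 5.0248, "min": 4.6459}


def speedup(baseline, candidate):
    """Return baseline / candidate per statistic (hypothetical helper).

    A value above 1 means the candidate build is faster than the baseline.
    """
    return {name: baseline[name] / candidate[name] for name in baseline}


ratios = speedup(cudnn_cublas, cutlass)
for name, ratio in ratios.items():
    print(f"{name:>6}: {ratio:.2f}x")
```

By mean the ratio comes out near 2.8x and by median near 3.0x, so the quoted 2.9x sits between the two depending on which statistic is used.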

CUDA version

11.7

TVM version

commit 1d145f112115ca20a0cd2e37a726b1d1519cac4b

config.cmake

@@ -46,7 +46,7 @@
 # - ON: enable CUDA with cmake's auto search
 # - OFF: disable CUDA
 # - /path/to/cuda: use specific path to cuda toolkit
-set(USE_CUDA OFF)
+set(USE_CUDA ON)

 # Whether enable ROCM runtime
 #
@@ -142,7 +142,7 @@ set(USE_MICRO_STANDALONE_RUNTIME OFF)
 # - OFF: disable llvm, note this will disable CPU codegen
 #        which is needed for most cases
 # - /path/to/llvm-config: enable specific LLVM when multiple llvm-dev is available.
-set(USE_LLVM OFF)
+set(USE_LLVM /usr/bin/llvm-config-11)

 #---------------------------------------------
 # Contrib libraries
@@ -217,10 +217,10 @@ set(USE_EDGETPU OFF)
 # - ON: enable cuDNN with cmake's auto search in CUDA directory
 # - OFF: disable cuDNN
 # - /path/to/cudnn: use specific path to cuDNN path
-set(USE_CUDNN OFF)
+set(USE_CUDNN ON)

 # Whether use cuBLAS
-set(USE_CUBLAS OFF)
+set(USE_CUBLAS ON)

 # Whether use MIOpen
 set(USE_MIOPEN OFF)
@@ -416,7 +416,7 @@ set(USE_GTEST AUTO)

 # Enable using CUTLASS as a BYOC backend
 # Need to have USE_CUDA=ON
-set(USE_CUTLASS OFF)
+set(USE_CUTLASS ON)