resnet50 cutlass time 3.47 #3

pianogGG · 2022-04-28T08:59:39Z

Hi masahi, I have a question about the reported figure "resnet50 cutlass time = 3.47". Did you use the CUTLASS Tensor Core conv2d for all layers of ResNet-50? Is 3.47 the sum of the per-layer times reported by cutlass_profiler? For layers whose input or output channel count is not a multiple of 8, do you pad the channels to 8 or change AlignmentA/AlignmentB? For example, the first conv has 3 input channels: do you set AlignmentA/AlignmentB to 1?
using cutlass_tensorop_h16816fprop_optimized_256x128_32x3_nhwc_align8_base =
  typename cutlass::conv::kernel::DefaultConv2dFprop<
    cutlass::half_t,                          // ElementA
    cutlass::layout::TensorNHWC,              // LayoutA
    cutlass::half_t,                          // ElementB
    cutlass::layout::TensorNHWC,              // LayoutB
    cutlass::half_t,                          // ElementC
    cutlass::layout::TensorNHWC,              // LayoutC
    cutlass::half_t,                          // ElementAccumulator
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<256, 128, 32>,   // threadblock tile
    cutlass::gemm::GemmShape<64, 64, 32>,     // warp tile
    cutlass::gemm::GemmShape<16, 8, 16>,      // instruction shape
    cutlass::epilogue::thread::LinearCombination<
      cutlass::half_t,
      8,
      cutlass::half_t,
      cutlass::half_t
    >,
    cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, // cutlass::gemm::threadblock::GemmSplitKIdentityThreadblockSwizzle<>,
    3,                                        // stages
    cutlass::arch::OpMultiplyAdd,
    cutlass::conv::IteratorAlgorithm::kOptimized,
    cutlass::conv::StrideSupport::kStrided,
    8,                                        // AlignmentA
    8                                         // AlignmentB
  >::Kernel;
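To make the "pad to 8" option in the question concrete, here is a minimal host-side sketch of zero-padding the channel dimension of an NHWC tensor. The helper name `pad_channels_nhwc` is hypothetical and not part of CUTLASS or TVM. It assumes a densely packed NHWC buffer, and the filter's input-channel dimension would need the same padding for the convolution result to be unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CUTLASS or TVM API): copy a densely packed NHWC
// tensor with `c` channels into a new buffer with `padded_c` channels per
// pixel, filling the extra channels with zeros.
template <typename T>
std::vector<T> pad_channels_nhwc(const std::vector<T>& src,
                                 std::size_t n, std::size_t h, std::size_t w,
                                 std::size_t c, std::size_t padded_c) {
    const std::size_t pixels = n * h * w;
    std::vector<T> dst(pixels * padded_c, T(0));
    for (std::size_t p = 0; p < pixels; ++p) {
        for (std::size_t ch = 0; ch < c; ++ch) {
            // Channels are the innermost (contiguous) dimension in NHWC.
            dst[p * padded_c + ch] = src[p * c + ch];
        }
    }
    return dst;
}
```

For a 3-channel image, calling this with `padded_c = 8` yields a buffer whose channel count is a multiple of 8, at the cost of extra memory traffic and multiply-adds on the zero channels.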

masahi · 2022-04-28T11:09:24Z

That time is the end-to-end (e2e) time, measured using TVM. It covers all layers, including pooling, softmax, etc.

Yes, all conv2d ops are offloaded to cutlass tensorcore, including the first layer. Actually cutlass doesn't have any alignment restrictions, so it can operate on 3-channel directly.
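As an illustration of running on 3 channels without padding, the sketch below picks the widest vector access, in elements, that evenly divides a channel count. The helper `max_alignment` is hypothetical and is not how TVM or CUTLASS necessarily selects kernels. It assumes fp16 data, where 8 elements correspond to a 128-bit access, and that the resulting value would be used for `AlignmentA`/`AlignmentB` in a template like the one in the question.

```cpp
// Hypothetical helper (not a CUTLASS or TVM API): largest power-of-two
// alignment, in elements, that divides `channels`, capped at `max_align`.
// With fp16 data, max_align = 8 corresponds to a 128-bit vector access.
constexpr int max_alignment(int channels, int max_align = 8) {
    int align = max_align;
    while (align > 1 && channels % align != 0) {
        align /= 2;  // try the next narrower access width
    }
    return align;
}

// The first ResNet-50 conv has 3 input channels, so only element-wise
// access divides evenly; a 64-channel layer can use the full width.
static_assert(max_alignment(3) == 1, "3 channels -> alignment 1");
static_assert(max_alignment(64) == 8, "64 channels -> alignment 8");
```

Under these assumptions, a 3-channel layer would map to an align1 kernel variant, while layers with channel counts that are multiples of 8 keep the align8 variant shown in the question.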

masahi closed this as completed Apr 28, 2022
