Hi masahi, I have a question about the result "resnet50 cutlass time=3.47". Did you use the cutlass tensorcore conv2d for all layers of ResNet50? Is 3.47 the sum of the per-layer times reported by cutlass_profiler? When the input and output channels are not multiples of 8, do you pad them to 8 or modify AlignmentA/AlignmentB? For example, the first conv has 3 input channels: do you set AlignmentA/AlignmentB to 1?
using cutlass_tensorop_h16816fprop_optimized_256x128_32x3_nhwc_align8_base =
typename cutlass::conv::kernel::DefaultConv2dFprop<
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::arch::OpClassTensorOp,
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<256, 128, 32>,
cutlass::gemm::GemmShape<64, 64, 32 >,
cutlass::gemm::GemmShape<16, 8, 16>,
cutlass::epilogue::thread::LinearCombination<
cutlass::half_t,
8,
cutlass::half_t,
cutlass::half_t
>,
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, // cutlass::gemm::threadblock::GemmSplitKIdentityThreadblockSwizzle<>,
3,
cutlass::arch::OpMultiplyAdd,
cutlass::conv::IteratorAlgorithm::kOptimized,
cutlass::conv::StrideSupport::kStrided,
8,
8
>::Kernel;
That time is the end-to-end (e2e) time, measured using TVM. It covers all layers, including pooling, softmax, etc.
Yes, all conv2d ops are offloaded to cutlass tensorcore kernels, including the first layer. Actually, cutlass doesn't have any alignment restrictions, so it can operate on 3 channels directly.
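To make the two options from the question concrete (lowering the alignment versus padding the channels), here is a minimal sketch. The helper names `max_alignment` and `pad_to_multiple` are hypothetical and are not part of cutlass or TVM. The sketch assumes NHWC activations and filters, so that both AlignmentA and AlignmentB are bounded by the input channel count, while the epilogue vector width is bounded by the output channel count. It also assumes the candidate alignments are 8, 4, 2 and 1 elements, where 8 `half_t` elements correspond to one 128-bit access.

```cpp
// Hypothetical helper, not a cutlass or TVM API: the largest alignment
// (in elements) among 8, 4, 2, 1 that evenly divides the channel count.
constexpr int max_alignment(int channels) {
    for (int align = 8; align > 1; align /= 2) {
        if (channels % align == 0) {
            return align;
        }
    }
    return 1;  // alignment 1 divides any channel count
}

// Hypothetical helper for the padding alternative: round the channel
// count up to the next multiple (e.g. 8) so an align8 kernel can be used.
constexpr int pad_to_multiple(int channels, int multiple) {
    return (channels + multiple - 1) / multiple * multiple;
}

// First ResNet50 conv: 3 input channels, 64 output channels.
static_assert(max_alignment(3) == 1, "3 input channels allow only alignment 1");
static_assert(max_alignment(64) == 8, "64 output channels allow alignment 8");
static_assert(pad_to_multiple(3, 8) == 8, "padding 3 channels to a multiple of 8");
```

Under these assumptions, the first layer would use alignment 1 for the last two template arguments of `DefaultConv2dFprop` (instead of `8, 8` as in the kernel above), while the later layers, whose channel counts are multiples of 8, could keep the align8 kernel. Padding the input to 8 channels trades extra memory and compute for the wider access.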