resnet50 cutlass time 3.47 #3

Closed
pianogGG opened this issue Apr 28, 2022 · 1 comment

Comments

@pianogGG

Hi masahi, I have a question about the result "resnet50 cutlass time = 3.47". Did you use the CUTLASS tensor core conv2d for all layers of ResNet50? Is 3.47 the sum of the per-layer times measured with cutlass_profiler? When the input and output channels are not multiples of 8, do you pad them to 8 or modify AlignmentA/AlignmentB? For example, the first conv has 3 input channels: do you set AlignmentA/AlignmentB to 1?
using cutlass_tensorop_h16816fprop_optimized_256x128_32x3_nhwc_align8_base =
typename cutlass::conv::kernel::DefaultConv2dFprop<
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::layout::TensorNHWC,
cutlass::half_t,
cutlass::arch::OpClassTensorOp,
cutlass::arch::Sm80,
cutlass::gemm::GemmShape<256, 128, 32>,
cutlass::gemm::GemmShape<64, 64, 32 >,
cutlass::gemm::GemmShape<16, 8, 16>,
cutlass::epilogue::thread::LinearCombination<
cutlass::half_t,
8,
cutlass::half_t,
cutlass::half_t
>,
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<4>, // cutlass::gemm::threadblock::GemmSplitKIdentityThreadblockSwizzle<>,
3,
cutlass::arch::OpMultiplyAdd,
cutlass::conv::IteratorAlgorithm::kOptimized,
cutlass::conv::StrideSupport::kStrided,
8,
8
>::Kernel;

@masahi
Owner

masahi commented Apr 28, 2022

That time is the end-to-end (e2e) time, measured using TVM. It covers all layers, including pooling, softmax, etc.

Yes, all conv2d ops are offloaded to CUTLASS tensor core kernels, including the first layer. Actually, CUTLASS doesn't have any alignment restrictions, so it can operate on the 3-channel input directly.
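
For readers comparing the two options raised in the question (padding channels to a multiple of 8 versus lowering AlignmentA/AlignmentB), here is a minimal sketch of the arithmetic involved. It assumes that the alignment value should evenly divide the channel count, which this thread does not state explicitly. Both `max_alignment` and `pad_channels` are hypothetical helpers written for illustration; they are not part of CUTLASS or TVM.

```cpp
#include <cassert>

// Hypothetical helper: largest alignment (in elements) from {8, 4, 2, 1}
// that evenly divides the given channel count.
// Assumption: the alignment template argument must divide the channel count.
int max_alignment(int channels) {
  const int candidates[] = {8, 4, 2, 1};
  for (int align : candidates) {
    if (channels % align == 0) {
      return align;
    }
  }
  return 1;  // Not reached for positive inputs, since 1 divides everything.
}

// Hypothetical helper: round a channel count up to the next multiple,
// which is the "pad to 8" alternative from the question.
int pad_channels(int channels, int multiple) {
  assert(multiple > 0);
  return (channels + multiple - 1) / multiple * multiple;
}
```

For the first ResNet50 conv with 3 input channels, `max_alignment(3)` gives 1 and `pad_channels(3, 8)` gives 8. For a 64-channel layer, `max_alignment(64)` gives 8, which matches the `align8` kernel quoted in the question.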

@masahi masahi closed this as completed Apr 28, 2022