For the CuTe DSL, the kernels seem fast enough that timeouts wouldn't be likely. I'm wondering if you're including compile times in your timeout calculation.
testing: 2, 3, 4
max diff: 1.1920928955078125e-07, at index: 7
allclose: True
kernel time: 0.598 ms
testing: 3, 3, 5
max diff: 2.384185791015625e-07, at index: 8
allclose: True
kernel time: 0.149 ms
testing: 4, 4, 3
max diff: 8.195638656616211e-08, at index: 6
allclose: True
kernel time: 0.194 ms
testing: 64, 128, 32
max diff: 2.905726432800293e-07, at index: 923
allclose: True
kernel time: 0.331 ms
testing: 512, 256, 128
max diff: 3.8743019104003906e-07, at index: 53468
allclose: True
kernel time: 0.249 ms
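One way to check whether compilation is being counted is to time the first call separately from later calls. This is only a rough sketch: `time_first_and_steady` and `fake_jit_kernel` are hypothetical names, and the stand-in kernel just sleeps on its first call to imitate a JIT compile, so it is not the CuTe DSL API.

```python
import time


def time_first_and_steady(fn, *args, repeats=10):
    """Return (first_call_seconds, best_later_call_seconds) for fn(*args)."""
    # First call: for a JIT-compiled kernel this may include compilation.
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    # Later calls: compilation should already be cached, so this is closer
    # to the kernel time alone. Keep the best of several runs.
    # NOTE: with a real GPU kernel you would also need to synchronize the
    # device before reading the clock (e.g. torch.cuda.synchronize() if the
    # harness uses PyTorch); that is omitted in this CPU-only stand-in.
    steady = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        steady = min(steady, time.perf_counter() - start)
    return first, steady


# Hypothetical stand-in for a JIT-compiled kernel: slow on the first call only.
_compiled = set()


def fake_jit_kernel(n):
    if n not in _compiled:
        time.sleep(0.05)  # pretend compilation cost (assumed value)
        _compiled.add(n)
    return sum(range(n))


if __name__ == "__main__":
    first, steady = time_first_and_steady(fake_jit_kernel, 1000)
    print(f"first call: {first * 1e3:.3f} ms, later calls: {steady * 1e3:.3f} ms")
```

If the first-call number is much larger than the later ones and the timeout covers the first call, that would suggest compile time is included in the timeout calculation.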