If a kernel using cub/thrust/cccl/constexpr stl is slower than hand-written code, that will naturally show up in the submission's benchmark ranking. On the other hand, if cub/thrust/cccl/constexpr stl usage is truly zero-cost and results in more readable code that performs on par with or better than hand-written, unmaintainable equivalents, then it's just better code in all respects.
Stop discouraging cccl/cub/thrust/constexpr stl.