I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:

```julia
using SuiteSparseGraphBLAS
using BenchmarkTools

gbset(:nthreads, 16)

b = ones(10000)
b_gb = GBVector(b)

@btime b' * b        # 1 μs
@btime b_gb' * b_gb  # 15 μs
```

Is this expected, or can it be tuned to be faster?

Version: SuiteSparseGraphBLAS@0.7.0