Robust Benchmarking #2
Accuracy testing would also be useful (most functions here should also be defined in MPFR, so we can use that as a reference). My previous experience has been that the OS X libm is one of the fastest (and also fairly accurate), so it might be a good candidate as a reference (though obviously we can't run that on nanosoldier).
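(For concreteness, a minimal sketch of one way to use MPFR as the reference from Julia, via the MPFR-backed BigFloat type; the names `Libm.erf`, an `erf` method for BigFloat, and the ulp-error measure are illustrative assumptions, not an agreed-upon setup.)

```julia
# Error in ulps of the tested value `y` relative to a high-precision reference `yref`.
ulp_error(y::Float64, yref::BigFloat) = Float64(abs(big(y) - yref)) / eps(y)

# Worst-case ulp error of `f` against a reference `fref` evaluated in BigFloat (MPFR).
max_ulp_error(f, fref, xs) = maximum(ulp_error(f(x), fref(big(x))) for x in xs)

# Example usage (assumed names):
# xs = -6.0:0.001:6.0
# max_ulp_error(Libm.erf, erf, xs)
```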
Reliable performance regression testing will also be necessary as we start trying to optimise things. Based on my previous experience, small changes can often have large (unforeseen) effects.
I've been playing around a bit with performance benchmarking. It seems that the current approach of benchmarking the vectorised ops (e.g. timing a call over a whole input vector at once) isn't ideal. It is possible to benchmark the functions directly, e.g. along the lines of the sketch below,
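(A rough, hedged sketch of what timing a single call directly could look like; `Libm.erf` and BenchmarkTools' `@benchmark` are the pieces discussed in this thread.)

```julia
using BenchmarkTools, Libm  # BenchmarkTools provides @benchmark

@benchmark Libm.erf(0.5)    # times one call with one fixed argument
```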
but the problem here is that (a) we're testing only one value, and (b) we hit the nanosecond-resolution problem (if each function call takes only 9 nanoseconds, it is hard to detect small performance changes). We can partially address (a) by doing something like the following,
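(A guess at the shape of that, not the original snippet: draw a fresh random argument in BenchmarkTools' setup phase, so each sample sees a different value, though all evaluations within a sample still share it.)

```julia
using BenchmarkTools, Libm

# setup runs once per sample, so the argument varies across samples only
@benchmark Libm.erf(x) setup = (x = rand())
```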
however this still only tests one value per sample, which might be misleading for things like branch prediction, and does not address (b). The best I have come up with is using reduction operators with Julia's new generator syntax:
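(A hedged sketch of that idea: evaluate the function over a whole vector of inputs per evaluation and fold the results, so every call contributes to the measured time; `X` is an assumed vector of inputs.)

```julia
using BenchmarkTools, Libm

X = rand(1_000)
@benchmark reduce(+, Libm.erf(x) for x in $X)  # $X interpolates the input vector
```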
An easy choice of reduction here is an ordinary sum. We could be even more clever, and do something like the sketch below,
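(A hedged guess at the "more clever" variant, in the spirit of the mapreduce form given later in this thread: reinterpret each result as an unsigned integer and fold with bitwise OR, which is cheaper than a floating-point sum while still forcing every call to be evaluated.)

```julia
using BenchmarkTools, Libm

X = rand(1_000)
@benchmark reduce(|, reinterpret(Unsigned, Libm.erf(x)) for x in $X)
```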
which reduces the overhead even further; however, we need to be careful that LLVM doesn't in future just optimise the whole lot away as a no-op.
Shouldn't we compare against extended precision implementations for correctness as well?
Yes, but that discussion is probably better in #16. We can keep this thread for performance benchmarking.
The lowest-overhead function call that I've found which isn't optimised away is:

```julia
@benchmark mapreduce(x -> reinterpret(Unsigned, x), |, Libm.erf(x) for x in $X)
```
@jrevels has put in a lot of work to make benchmarking as robust as possible for Base, and it is very tricky to get this right. Jarrett, how tricky would it be to teach nanosoldier about Libm.jl?
I'd have to make Nanosoldier capable of tracking multiple repos at a time if you'd want to use our current hardware. JuliaCI/Nanosoldier.jl#18 is probably a good issue for more discussion of this.
Benchmarking can definitely be a headache-inducing endeavor. Differences between platforms aren't unexpected. The benchmarks should be consistent between runs on a single platform, though - are you experiencing problems in that vein? This document might help if you're benchmarking on Linux.
I'd be interested to hear more, if you're able to provide specifics here. It might be more useful to do your benchmarking interactively instead of running all of them in the benchmark definition script. You also might check out https://github.com/JuliaCI/PkgBenchmark.jl. Since I'm already here: I hit a roadblock in my other research, so I'm taking a break from it today by reviving and finishing up JuliaCI/BenchmarkTools.jl#12, which will enable picosecond timing resolution.
Indeed. Especially considering that parts of this lib's perf depend strongly on whether hardware fma is available, which adds another variable into the mix. Typically I don't see huge differences between multiple benchmarks on the same computer, but occasionally I do between different days. I'll have to look at PkgBenchmark and think about using that instead of the script I have.
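(For reference, a minimal sketch of how such a suite could be organised with BenchmarkTools' `BenchmarkGroup`; the group layout, input sizes, and whether PkgBenchmark picks a file like this up as-is are all assumptions.)

```julia
using BenchmarkTools, Libm

const SUITE = BenchmarkGroup()
SUITE["erf"] = BenchmarkGroup()

X = rand(1_000)
SUITE["erf"]["scalar"] = @benchmarkable Libm.erf(0.5)
SUITE["erf"]["reduce"] = @benchmarkable mapreduce(x -> reinterpret(Unsigned, x), |, Libm.erf(x) for x in $X)

# Run interactively:
# tune!(SUITE)
# results = run(SUITE; verbose = true)
```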
Very glad to hear! This'll make benchmarking so much easier at single test points, so I'm looking forward to its merge.
It'd be great to have a robust benchmarking suite/process where the pure Julia versions could be compared against a C library implementation (or two, or three). I wonder if @jrevels could help us get something set up.
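(A hedged sketch of what such a head-to-head comparison could look like, calling a C libm's erf through ccall and using BenchmarkTools' judge to flag regressions; the library name "libm" is platform-dependent and, like the reduction form, is an assumption rather than an agreed-upon setup.)

```julia
using BenchmarkTools, Libm

# erf from a C math library; the library name may need adjusting per platform
cerf(x::Float64) = ccall((:erf, "libm"), Float64, (Float64,), x)

X = rand(1_000)
t_julia = @benchmark mapreduce(x -> reinterpret(Unsigned, x), |, Libm.erf(x) for x in $X)
t_c     = @benchmark mapreduce(x -> reinterpret(Unsigned, x), |, cerf(x) for x in $X)

# judge compares two estimates and reports improvement / regression / invariant
judge(minimum(t_julia), minimum(t_c))
```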