
Robust Benchmarking #2

Closed

quinnj opened this issue Aug 18, 2016 · 10 comments

Comments

@quinnj

quinnj commented Aug 18, 2016

It'd be great to have a robust benchmarking suite/process where the pure Julia versions could be compared against a C library implementation (or two or three). I wonder if @jrevels could help us get something setup.

@simonbyrne
Member

simonbyrne commented Aug 19, 2016

Accuracy testing would also be useful (most functions here should also be defined in MPFR, so we can use that as a reference).

My previous experience has been that the OS X libm is one of the fastest (and also fairly accurate), so might be a good candidate as a reference (though obviously we can't run that on nanosoldier).
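
For the MPFR comparison, something along these lines could serve as a starting point (a rough sketch: on Julia 0.5 erf is in Base and its BigFloat method goes through MPFR; the ulp measure here is only approximate):

x   = rand()
ref = erf(big(x))                                 # BigFloat erf is computed by MPFR
err = abs(big(erf(x)) - ref) / eps(Float64(ref))  # rough error in ulps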

@simonbyrne
Member

Reliable performance regression testing will also be necessary as we start trying to optimise things. Based on my previous experience, small changes can often have large (unforeseen) effects.

@simonbyrne
Member

I've been playing around a bit with performance benchmarking.

It seems that the current approach of benchmarking the vectorised ops (e.g. @benchmark Libm.erf.($X)) incurs a bit of array overhead, and induces a lot of gc noise.
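
For reference, the vectorised form looks roughly like this (a sketch, assuming Libm is loaded and X is a Vector{Float64}); the non-zero allocation figures are where the GC noise comes from:

using BenchmarkTools

X = rand(1000)
t = @benchmark Libm.erf.($X)   # broadcasting allocates a fresh output array every evaluation
memory(t), allocs(t)           # non-zero => allocation/GC time is mixed into the measurement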

It is possible to benchmark the functions directly, e.g.

@benchmark Libm.erf(1.0)

but the problem then is that we're (a) testing only one value, and (b) we hit the nanosecond resolution problem (if each function call takes only 9 nanoseconds, it is hard to detect small performance changes).

We can partially address (a) by doing something like

@benchmark Libm.erf(x) setup=(x = rand())

however this still only tests 1 value per sample, which might be misleading for things like branch prediction, and does not address (b).

The best I have come up with is using reduction operators with Julia's new generator syntax:

@benchmark foo(Libm.erf(x) for x in $X)

An easy choice here is sum: the cost of a floating point addition is fairly minor, however it can be problematic if we hit weird regions (such as subnormals, and I think NaNs can be slow on some processors).
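
Concretely, the sum variant would be:

@benchmark sum(Libm.erf(x) for x in $X)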

We could be even more clever, and do something like

@benchmark mapreduce(identity, (x,y) -> nothing, Libm.erf(x) for x in X)

which reduces the overhead even further, however we need to be careful that LLVM in future doesn't just optimise the whole lot away as a no-op.

@ViralBShah
Member

Shouldn't we compare against extended precision implementations for correctness as well?

@simonbyrne
Member

Shouldn't we compare against extended precision implementations for correctness as well?

Yes, but that discussion is probably better in #16. We can keep this thread for performance benchmarking.

@simonbyrne
Member

The lowest overhead function that I've found which isn't optimised away is:

@benchmark mapreduce(x -> reinterpret(Unsigned,x), |, Libm.erf(x) for x in $X)
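
Putting that together with a C reference (per the original issue), a self-contained comparison could look something like this (a sketch; the ccall targets openlibm, which the versioninfo outputs below show Julia is built against):

using BenchmarkTools

X = rand(1000)

# pure-Julia implementation
@benchmark mapreduce(x -> reinterpret(Unsigned, x), |, (Libm.erf(x) for x in $X))

# C reference from openlibm (already loaded when Julia reports LIBM: libopenlibm)
cerf(x::Float64) = ccall((:erf, "libopenlibm"), Float64, (Float64,), x)
@benchmark mapreduce(x -> reinterpret(Unsigned, x), |, (cerf(x) for x in $X))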

@musm
Collaborator

musm commented Sep 30, 2016

Benchmarking is giving me a lot of headaches: it seems wildly inconsistent because it depends on many different factors (CPU, etc.) and on the whole benchmarking procedure.

This makes it really hard to tell what's good enough or what needs more work.

one machine:
[benchmark results screenshot]
julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
WORD_SIZE: 64
BLAS: libopenblas (NO_LAPACKE DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: liblapack.so.3
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, broadwell)

another machine:
[benchmark results screenshot]
julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
System: NT (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

@vchuravy

@jrevels has put in a lot of work to make benchmarking as robust as possible for Base, and it is very tricky to get this right. Jarrett, how tricky would it be to teach nanosoldier about Libm.jl?

@jrevels

jrevels commented Sep 30, 2016

how tricky would it be to teach nanosoldier about Libm.jl?

I'd have to make Nanosoldier capable of tracking multiple repos at a time if you'd want to use our current hardware. JuliaCI/Nanosoldier.jl#18 is probably a good issue for more discussion of this.

Benchmarking is giving me a lot of headaches: it seems wildly inconsistent because it depends on many different factors (CPU, etc.)

Benchmarking can definitely be a headache-inducing endeavor. Differences between platforms aren't unexpected. The benchmarks should be consistent between runs on a single platform, though - are you experiencing problems in that vein? This document might help if you're benchmarking on Linux.

the whole benchmarking procedure.

I'd be interested to hear more, if you're able to provide specifics here. It might be more useful to do your benchmarking interactively instead of running all of them in the benchmark definition script. You also might check out https://github.com/JuliaCI/PkgBenchmark.jl.
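
For reference, a minimal PkgBenchmark-style suite might look something like this (a sketch; the exact file name and entry point depend on the PkgBenchmark version, but the suite itself is plain BenchmarkTools):

# benchmark/benchmarks.jl (hypothetical layout)
using BenchmarkTools, Libm

const SUITE = BenchmarkGroup()
const X = rand(1000)

# one entry per Libm function under test
SUITE["erf"] = @benchmarkable mapreduce(x -> reinterpret(Unsigned, x), |,
                                        (Libm.erf(x) for x in $X))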

Since I'm already here: I hit a roadblock in my other research, so I'm taking a break from it today by reviving and finishing up JuliaCI/BenchmarkTools.jl#12, which will enable picosecond timing resolution.

@musm
Collaborator

musm commented Oct 1, 2016

Benchmarking can definitely be a headache-inducing endeavor. Differences between platforms aren't unexpected. The benchmarks should be consistent between runs on a single platform, though - are you experiencing problems in that vein? This document might help if you're benchmarking on Linux.

Indeed. Especially since parts of this library's performance depend strongly on whether hardware FMA is available, which adds another variable to the mix. Typically I don't see huge differences between multiple benchmark runs on the same machine, but occasionally I do between different days.
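
One quick way to check which case a given machine falls into (a sanity check, not a substitute for benchmarking) is to inspect the native code for muladd and look for a fused instruction:

# if the CPU (and Julia build) support FMA, this should show a fused
# multiply-add instruction (e.g. vfmadd... on x86); otherwise a separate
# multiply and add
@code_native muladd(1.0, 2.0, 3.0)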

I'll have to look at PkgBenchmark and think about using that instead of the script I have.

Since I'm already here: I hit a roadblock in my other research, so I'm taking a break from it today by reviving and finishing up JuliaCI/BenchmarkTools.jl#12, which will enable picosecond timing resolution.

Very glad to hear! This'll make benchmarking so much easier at single test points, so I'm looking forward to its merge.

musm closed this as completed Jan 17, 2017