
Conversation

@marshallpierce

The current benchmark (as you've seen in the comments on the blog post) is quite flawed: it's mostly a test of the compiler's ability to remove bounds checks and boxing.

This adds another benchmark that avoids these issues:

  • It does fast but nontrivial work for each array element (computing the hashCode() of a byte[])
  • Doesn't use volatile on fields
  • Uses primitive streams where applicable

This is more representative of real-world workloads, I would claim. It also shows results much more in line with what makes sense, namely that all variants are about the same, except for the parallel stream, which gets a good speedup.
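The measured work can be sketched roughly as follows. This is not the PR's actual benchmark code (class and method names here are illustrative, and the JMH harness is omitted): it just shows the core operation — taking the max of Arrays.hashCode() over a set of byte arrays — in both a plain loop and a primitive-stream (IntStream) form, which avoids boxing entirely.

```java
import java.util.Arrays;
import java.util.Random;

public class MaxHashSketch {
    // Plain for-loop variant: max of Arrays.hashCode over each byte[].
    static int forLoopMax(byte[][] data) {
        int max = Integer.MIN_VALUE;
        for (byte[] row : data) {
            max = Math.max(max, Arrays.hashCode(row));
        }
        return max;
    }

    // Primitive-stream variant: mapToInt keeps everything as int,
    // so no Integer boxing is involved.
    static int streamMax(byte[][] data) {
        return Arrays.stream(data)
                .mapToInt(Arrays::hashCode)
                .max()
                .orElse(Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed: reproducible data
        byte[][] data = new byte[100][1024];
        for (byte[] row : data) {
            random.nextBytes(row);
        }
        // Both variants compute the same result; only the iteration
        // mechanism under benchmark differs.
        System.out.println(forLoopMax(data) == streamMax(data));
    }
}
```

Because Arrays.hashCode() must traverse the whole array, the JIT cannot optimize the per-element work away, which is what makes this a fairer comparison of the iteration styles themselves.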

ByteArrayLoopBenchmark.forEachLambdaMax            avgt   10  10.668 ± 0.161  ms/op
ByteArrayLoopBenchmark.forEachLoopMax              avgt   10  10.513 ± 0.008  ms/op
ByteArrayLoopBenchmark.forMax                      avgt   10  10.578 ± 0.073  ms/op
ByteArrayLoopBenchmark.iteratorMax                 avgt   10  10.507 ± 0.094  ms/op
ByteArrayLoopBenchmark.parallelStreamMax           avgt   10   1.919 ± 0.124  ms/op
ByteArrayLoopBenchmark.streamMax                   avgt   10  10.601 ± 0.153  ms/op
ByteArrayLoopBenchmark.streamReduceMax             avgt   10  10.695 ± 0.295  ms/op
ByteArrayLoopBenchmark.streamReduceMaxWithInitial  avgt   10  10.505 ± 0.149  ms/op

As further color on the volatile issue, your original benchmark produces this:

LoopBenchmarkMain.forEachLambdaMaxInteger          avgt   10   0.513 ± 0.034  ms/op
LoopBenchmarkMain.forEachLoopMaxInteger            avgt   10   0.123 ± 0.006  ms/op
LoopBenchmarkMain.forMaxInteger                    avgt   10   0.221 ± 0.043  ms/op
LoopBenchmarkMain.iteratorMaxInteger               avgt   10   0.104 ± 0.003  ms/op
LoopBenchmarkMain.lambdaMaxInteger                 avgt   10   0.468 ± 0.018  ms/op
LoopBenchmarkMain.parallelStreamMaxInteger         avgt   10   0.208 ± 0.028  ms/op
LoopBenchmarkMain.streamMaxInteger                 avgt   10   0.588 ± 0.020  ms/op

Just by removing the volatile modifier, forMaxInteger becomes the fastest, as expected:

LoopBenchmarkMain.forEachLambdaMaxInteger   avgt   10  0.510 ± 0.018  ms/op
LoopBenchmarkMain.forEachLoopMaxInteger     avgt   10  0.123 ± 0.003  ms/op
LoopBenchmarkMain.forMaxInteger             avgt   10  0.099 ± 0.002  ms/op
LoopBenchmarkMain.iteratorMaxInteger        avgt   10  0.112 ± 0.015  ms/op
LoopBenchmarkMain.lambdaMaxInteger          avgt   10  0.466 ± 0.013  ms/op
LoopBenchmarkMain.parallelStreamMaxInteger  avgt   10  0.207 ± 0.014  ms/op
LoopBenchmarkMain.streamMaxInteger          avgt   10  0.579 ± 0.012  ms/op
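The volatile effect those numbers illustrate can be sketched as follows. The field and class names here are hypothetical, not the PR's actual code: the point is that a plain field load can be hoisted out of a loop and the loop then unrolled or vectorized, while a volatile field must be re-read (with its memory-ordering constraints) on every iteration.

```java
public class VolatileFieldSketch {
    // The only difference between these two fields is the volatile modifier.
    volatile int volatileLimit; // every read is a real load with ordering constraints
    int plainLimit;             // the JIT may load this once and keep it in a register

    VolatileFieldSketch(int limit) {
        this.volatileLimit = limit;
        this.plainLimit = limit;
    }

    // The loop bound is re-read from memory each iteration; the JIT cannot
    // hoist the load, which inhibits unrolling and bounds-check elimination.
    int sumToVolatileLimit(int[] data) {
        int sum = 0;
        for (int i = 0; i < volatileLimit; i++) {
            sum += data[i];
        }
        return sum;
    }

    // The loop bound is loop-invariant as far as the JIT is concerned,
    // so the loop body is free to be optimized aggressively.
    int sumToPlainLimit(int[] data) {
        int sum = 0;
        for (int i = 0; i < plainLimit; i++) {
            sum += data[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        VolatileFieldSketch s = new VolatileFieldSketch(data.length);
        // Same result either way; only the optimizer's freedom differs.
        System.out.println(s.sumToVolatileLimit(data) == s.sumToPlainLimit(data));
    }
}
```

So a benchmark whose state fields are volatile is partly measuring the cost of those constrained loads rather than the iteration style it claims to compare.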

…en iteration approaches: less prone to confusing results due to the compiler's ability (or inability) to remove boxing

Use a seed. Otherwise your test is flaky as hell :)

Author


The original test didn't provide a seed, so I didn't want to change that without a good reason. That said, if you can provide a good motivation, I'm game... What effect do you expect providing a seed to have? Either way, Arrays.hashCode() has to traverse the full contents of every byte[], and the math in Math.max() will be unpredictable no matter what. If your point is simply "it could be predictable, but it's not right now", I agree with that -- I just don't see it actually having much of an effect, especially given that this setup runs once per iteration, not once per trial, so from the CPU's perspective it's going to run for a few billion cycles on the exact same set of data each time.


Compare "max" algorithm for this data sets :)

1 2 3 4 5 6 7 8 9 10
1 1 1 1 1 1 1 1 1 1
10 9 8 7 6 5 4 3 2 1
3 2 5 1 2 6 3 8 7 6 5

Now imagine that each benchmark case gets its own random sequence, and that the sequence for one case could be much simpler than the sequence for another.

Author


Fair enough. I still think that's pretty unlikely to ever occur, but I also don't ever want to worry about it. :) After all, "pretty unlikely" isn't good enough for thread safety!
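The fix being agreed on can be sketched as seeding the generator in the setup so every benchmark case (and every run) sees byte-for-byte identical data. The names here are illustrative, not the PR's actual code; only the fixed-seed idea is the point.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededSetupSketch {
    // Stand-in for a JMH @Setup method: a fixed seed makes the
    // generated data identical on every invocation.
    static byte[][] makeData(long seed) {
        Random random = new Random(seed);
        byte[][] data = new byte[100][1024];
        for (byte[] row : data) {
            random.nextBytes(row);
        }
        return data;
    }

    public static void main(String[] args) {
        // Two independent "setups" with the same seed produce identical
        // arrays, so no benchmark case can get a luckier (e.g. more
        // branch-predictable) random sequence than another.
        byte[][] first = makeData(42L);
        byte[][] second = makeData(42L);
        System.out.println(Arrays.deepEquals(first, second)); // true
    }
}
```

java.util.Random with an explicit seed is fully deterministic, which is exactly the property that removes the flakiness the review is worried about.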

