I made some benchmarks for my presentation about Aeron, but I found that when I use different tools for the same transport, I get slightly different results.
For example, if I use HDR histograms, I get results that align with the numbers the maintainers are getting with their tests:
Also, I tried another cool library for Java benchmarks – JLBH.
But the results confuse me a bit…
First of all, I ended up with 2 different versions of the benchmark:
It seems like JLBH encourages using a separate thread for a listener; at least that way some of the settings (such as throughput) make more sense, and the initial warmup prints some statistics. But I could be terribly wrong, so please correct me if I am.
But more importantly, the results of those benchmarks are completely different and don't align with what I saw with HDR:
There is a big chance that I messed up somewhere, but for now all 3 benchmarks look more or less equivalent to me, just with different toolsets.
Thanks a lot!
If someone would like to try this on their own, you just have to run this script: https://github.com/easy-logic/transport-benchmarks/blob/master/run-aeron.sh
To choose which version to run, change the mainClassName parameter over here:
There are 3 options:
- io.easylogic.benchmarks.AeronPingBenchmarkJlbhSingleThread (the default one)
- io.easylogic.benchmarks.AeronPingBenchmarkJlbhSeparateThread
- io.easylogic.benchmarks.AeronPingBenchmarkHdrHistogram
You are seeing different results because the benchmarks are not measuring the same thing.
The AeronPingBenchmarkHdrHistogram is measuring only the ideal case, i.e. one message is sent and then immediately consumed. There are no queuing effects, as sender and receiver run in lockstep. When a new message is created, it gets a timestamp for that particular send attempt. However, there is no limit on how long the entire benchmark may run, and hence no send rate can be defined. Imagine that one of the sends hits a long GC pause (e.g. 1 second): only that send's result will be bad, while the rest will be unaffected.
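To illustrate, here is a minimal plain-Java sketch (no Aeron, no HdrHistogram — the send/echo step is a hypothetical no-op stand-in) of this lockstep pattern: each iteration stamps its own start time, so one long pause inflates exactly one sample.

```java
import java.util.Arrays;

public class LockstepTiming {
    // Runs n lockstep round trips; each sample is timed independently,
    // so a GC pause during one iteration distorts only that one sample.
    static long[] measure(int n, Runnable sendAndAwaitEcho) {
        long[] samples = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();   // fresh timestamp per send attempt
            sendAndAwaitEcho.run();           // stand-in for offer() + poll-until-echo
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);                 // sorted for easy percentile lookup
        return samples;
    }

    public static void main(String[] args) {
        long[] s = measure(1000, () -> {});   // no-op instead of a real Aeron round trip
        System.out.println("p50=" + s[s.length / 2] + "ns max=" + s[s.length - 1] + "ns");
    }
}
```

Note there is no target rate anywhere in the loop: the next send starts whenever the previous one finishes, which is exactly why no throughput can be defined for this style of benchmark.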
JLBH benchmarks are different, because they add a notion of time. For example, in your results a single run has a duration of 5 seconds:
Run time: 5.0s
Correcting for co-ordinated:true
Target throughput:10000/s = 1 message every 100us
End to End: (50,000) 50/90 99/99.9 99.99 - worst was 14 / 16 20 / 1,660 3,770 - 4,010
OS Jitter (5,853) 50/90 99/99.9 99.99 - worst was 1.8 / 7.0 30 / 181 3,770 - 3,770
This changes the benchmark from "send 50K messages" to "send 50K messages in 5 seconds". From the same example, JLBH determined that the target rate is 10K/sec, and it uses this information to compute the message start time (startTimeNS). In this case a GC pause of 1 second would affect all messages after that event: at least 10K messages won't be sent on time, and all further messages will be delayed by the pause as well. Thus JLBH tries to avoid the coordinated omission problem. It also seems to have some logic to correct for CO, and it was active in your benchmarks (Correcting for co-ordinated:true), which might also skew the results.
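The intended-start-time bookkeeping can be sketched in a few lines of plain Java. This is illustrative only, not JLBH's actual implementation; the variable naming (startTimeNS, the 100µs interval for 10K/sec) just mirrors the output above. The key point: latency is measured against when the message *should* have started, so backlog after a pause shows up in the numbers instead of being omitted.

```java
public class IntendedTimeLatency {
    // targetIntervalNs = 100_000 for 10K msgs/sec (1 message every 100us).
    // actualCompleteNs[i] is when message i's round trip actually finished.
    static long[] latencies(long[] actualCompleteNs, long firstStartNs, long targetIntervalNs) {
        long[] out = new long[actualCompleteNs.length];
        for (int i = 0; i < out.length; i++) {
            long intendedStartNs = firstStartNs + i * targetIntervalNs; // the "startTimeNS" idea
            out[i] = actualCompleteNs[i] - intendedStartNs;             // includes backlog delay
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 messages at a 100us target; message 2 completes ~1ms late (a "pause").
        // Message 3 is serviced promptly afterwards, yet still shows the backlog,
        // because its intended start was 200us into the run.
        long[] done = {100_000, 1_200_000, 1_250_000};
        System.out.println(java.util.Arrays.toString(latencies(done, 0, 100_000)));
        // prints [100000, 1100000, 1050000]
    }
}
```

Contrast this with the lockstep benchmark, where message 3 would have been timed from its own (late) start and looked perfectly fast.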
Finally, the AeronPingBenchmarkJlbhSeparateThread benchmark has even worse results, because now you are seeing the queuing effects. The sender is sending faster than the receiver can consume, so the queues build up, everything runs at max capacity, and latency goes down the drain. Also, your back-pressure handling code is not correct: you cannot use the same IdleStrategy instance for both threads. You need to have two of them.
Have a look at the real-logic/benchmarks project, which contains send/receive-style benchmarks for Aeron, gRPC, and Kafka. It has its own benchmarking harness, LoadTestRig, that takes care of warmup, measurement, histograms, etc., and it is trivial to add benchmarks for other systems.