Description
While testing for speed some PRNG methods (notably, nextInt(int)) using JMH, results for GraalVM were, as expected, significantly better than HotSpot's, in some cases spectacularly. However, there is a class of methods for which GraalVM is 50% slower than HotSpot.
The JHM benchmarks are available at https://github.com/vigna/dsiutils/tree/master/prngperf
OpenJDK Runtime Environment GraalVM CE 22.3.0-dev (build 17.0.5+3-jvmci-22.3-b04)
Linux Fedora 35
CPU Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
In the table below I highlighted the 6 tests in which GraalVM (timings on the left) is 50% slower than HotSpot (timings on the right). In all other cases GraalVM is much faster.
The involved methods contain 64-bit arithmetic (in particular, moduli).
Note that the method above is nextLong(long), but nextInt(int) just delegates to that.
The problematic generators use 4 longs of state. Identical methods for generators with 2 words of state do not have this problem:
Please let me know if there's any additional tests I can do to clarify the matter.
BenchmarkRandom.nextDouble avgt 25 17.096 ± 0.060 ns/op Benchmark Mode Cnt Score Error Units
BenchmarkRandom.nextInt100000 avgt 25 8.727 ± 0.052 ns/op BenchmarkRandom.nextDouble avgt 25 16.993 ± 0.081 ns/op
BenchmarkRandom.nextInt2301 avgt 25 19.316 ± 0.095 ns/op BenchmarkRandom.nextInt100000 avgt 25 8.493 ± 0.032 ns/op
BenchmarkRandom.nextLong avgt 25 17.062 ± 0.080 ns/op BenchmarkRandom.nextInt2301 avgt 25 19.489 ± 0.109 ns/op
BenchmarkSplitMix64.nextDouble avgt 25 1.935 ± 0.008 ns/op BenchmarkRandom.nextLong avgt 25 17.002 ± 0.074 ns/op
BenchmarkSplitMix64.nextInt100000 avgt 25 2.329 ± 0.010 ns/op BenchmarkSplitMix64.nextDouble avgt 25 1.942 ± 0.009 ns/op
BenchmarkSplitMix64.nextInt2301 avgt 25 2.477 ± 0.010 ns/op BenchmarkSplitMix64.nextInt100000 avgt 25 2.185 ± 0.007 ns/op
BenchmarkSplitMix64.nextLong avgt 25 0.979 ± 0.005 ns/op BenchmarkSplitMix64.nextInt2301 avgt 25 2.409 ± 0.012 ns/op
BenchmarkSplittableRandom.nextDouble avgt 25 1.937 ± 0.007 ns/op BenchmarkSplitMix64.nextLong avgt 25 1.348 ± 0.006 ns/op
BenchmarkSplittableRandom.nextInt100000 avgt 25 1.904 ± 0.007 ns/op BenchmarkSplittableRandom.nextDouble avgt 25 1.936 ± 0.008 ns/op
BenchmarkSplittableRandom.nextInt2301 avgt 25 11.656 ± 0.052 ns/op BenchmarkSplittableRandom.nextInt100000 avgt 25 2.138 ± 0.009 ns/op
BenchmarkSplittableRandom.nextLong avgt 25 0.848 ± 0.005 ns/op BenchmarkSplittableRandom.nextInt2301 avgt 25 13.321 ± 0.053 ns/op
BenchmarkThreadLocalRandom.nextDouble avgt 25 2.721 ± 0.016 ns/op BenchmarkSplittableRandom.nextLong avgt 25 1.348 ± 0.006 ns/op
BenchmarkThreadLocalRandom.nextInt100000 avgt 25 2.164 ± 0.008 ns/op BenchmarkThreadLocalRandom.nextDouble avgt 25 3.073 ± 0.010 ns/op
BenchmarkThreadLocalRandom.nextInt2301 avgt 25 12.909 ± 0.144 ns/op BenchmarkThreadLocalRandom.nextInt100000 avgt 25 2.612 ± 0.010 ns/op
BenchmarkThreadLocalRandom.nextLong avgt 25 1.056 ± 0.006 ns/op BenchmarkThreadLocalRandom.nextInt2301 avgt 25 13.446 ± 0.082 ns/op
BenchmarkXoRoShiRo128Plus.nextDouble avgt 25 1.939 ± 0.010 ns/op BenchmarkThreadLocalRandom.nextLong avgt 25 1.458 ± 0.008 ns/op
BenchmarkXoRoShiRo128Plus.nextInt100000 avgt 25 2.128 ± 0.009 ns/op BenchmarkXoRoShiRo128Plus.nextDouble avgt 25 1.940 ± 0.008 ns/op
BenchmarkXoRoShiRo128Plus.nextInt2301 avgt 25 2.290 ± 0.010 ns/op BenchmarkXoRoShiRo128Plus.nextInt100000 avgt 25 2.290 ± 0.010 ns/op
BenchmarkXoRoShiRo128Plus.nextLong avgt 25 0.909 ± 0.003 ns/op BenchmarkXoRoShiRo128Plus.nextInt2301 avgt 25 2.324 ± 0.010 ns/op
BenchmarkXoRoShiRo128PlusPlus.nextDouble avgt 25 1.941 ± 0.009 ns/op BenchmarkXoRoShiRo128Plus.nextLong avgt 25 1.801 ± 0.009 ns/op
BenchmarkXoRoShiRo128PlusPlus.nextInt100000 avgt 25 2.280 ± 0.009 ns/op BenchmarkXoRoShiRo128PlusPlus.nextDouble avgt 25 1.962 ± 0.007 ns/op
BenchmarkXoRoShiRo128PlusPlus.nextInt2301 avgt 25 2.372 ± 0.012 ns/op BenchmarkXoRoShiRo128PlusPlus.nextInt100000 avgt 25 2.357 ± 0.004 ns/op
BenchmarkXoRoShiRo128PlusPlus.nextLong avgt 25 1.056 ± 0.004 ns/op BenchmarkXoRoShiRo128PlusPlus.nextInt2301 avgt 25 2.442 ± 0.008 ns/op
BenchmarkXoRoShiRo128StarStar.nextDouble avgt 25 1.936 ± 0.007 ns/op BenchmarkXoRoShiRo128PlusPlus.nextLong avgt 25 1.884 ± 0.007 ns/op
BenchmarkXoRoShiRo128StarStar.nextInt100000 avgt 25 2.563 ± 0.017 ns/op BenchmarkXoRoShiRo128StarStar.nextDouble avgt 25 2.063 ± 0.009 ns/op
BenchmarkXoRoShiRo128StarStar.nextInt2301 avgt 25 2.601 ± 0.007 ns/op BenchmarkXoRoShiRo128StarStar.nextInt100000 avgt 25 2.557 ± 0.009 ns/op
BenchmarkXoRoShiRo128StarStar.nextLong avgt 25 1.122 ± 0.005 ns/op BenchmarkXoRoShiRo128StarStar.nextInt2301 avgt 25 2.597 ± 0.014 ns/op
BenchmarkXoShiRo256Plus.nextDouble avgt 25 1.939 ± 0.009 ns/op BenchmarkXoRoShiRo128StarStar.nextLong avgt 25 1.891 ± 0.007 ns/op
BenchmarkXoShiRo256Plus.nextInt100000 avgt 25 3.428 ± 0.014 ns/op BenchmarkXoShiRo256Plus.nextDouble avgt 25 1.994 ± 0.009 ns/op*
BenchmarkXoShiRo256Plus.nextInt2301 avgt 25 3.642 ± 0.144 ns/op BenchmarkXoShiRo256Plus.nextInt100000 avgt 25 2.473 ± 0.006 ns/op*
BenchmarkXoShiRo256Plus.nextLong avgt 25 1.608 ± 0.009 ns/op BenchmarkXoShiRo256Plus.nextInt2301 avgt 25 2.597 ± 0.013 ns/op
BenchmarkXoShiRo256PlusPlus.nextDouble avgt 25 1.943 ± 0.010 ns/op BenchmarkXoShiRo256Plus.nextLong avgt 25 1.953 ± 0.008 ns/op
BenchmarkXoShiRo256PlusPlus.nextInt100000 avgt 25 3.503 ± 0.021 ns/op BenchmarkXoShiRo256PlusPlus.nextDouble avgt 25 2.079 ± 0.010 ns/op*
BenchmarkXoShiRo256PlusPlus.nextInt2301 avgt 25 3.874 ± 0.026 ns/op BenchmarkXoShiRo256PlusPlus.nextInt100000 avgt 25 2.580 ± 0.010 ns/op*
BenchmarkXoShiRo256PlusPlus.nextLong avgt 25 1.619 ± 0.010 ns/op BenchmarkXoShiRo256PlusPlus.nextInt2301 avgt 25 2.662 ± 0.008 ns/op
BenchmarkXoShiRo256StarStar.nextDouble avgt 25 1.941 ± 0.013 ns/op BenchmarkXoShiRo256PlusPlus.nextLong avgt 25 1.919 ± 0.010 ns/op
BenchmarkXoShiRo256StarStar.nextInt100000 avgt 25 3.642 ± 0.016 ns/op BenchmarkXoShiRo256StarStar.nextDouble avgt 25 2.287 ± 0.009 ns/op*
BenchmarkXoShiRo256StarStar.nextInt2301 avgt 25 3.759 ± 0.020 ns/op BenchmarkXoShiRo256StarStar.nextInt100000 avgt 25 2.675 ± 0.015 ns/op*
BenchmarkXoShiRo256StarStar.nextLong avgt 25 1.648 ± 0.008 ns/op BenchmarkXoShiRo256StarStar.nextInt2301 avgt 25 2.641 ± 0.010 ns/op
BenchmarkXorShift1024StarPhi.nextDouble avgt 25 1.942 ± 0.008 ns/op BenchmarkXoShiRo256StarStar.nextLong avgt 25 2.042 ± 0.008 ns/op
BenchmarkXorShift1024StarPhi.nextInt100000 avgt 25 2.466 ± 0.006 ns/op BenchmarkXorShift1024StarPhi.nextDouble avgt 25 2.261 ± 0.008 ns/op
BenchmarkXorShift1024StarPhi.nextInt2301 avgt 25 2.610 ± 0.010 ns/op BenchmarkXorShift1024StarPhi.nextInt100000 avgt 25 3.286 ± 0.019 ns/op
BenchmarkXorShift1024StarPhi.nextLong avgt 25 1.337 ± 0.007 ns/op BenchmarkXorShift1024StarPhi.nextInt2301 avgt 25 3.433 ± 0.012 ns/op