Inexplicably slow (WRT HotSpot) PRNG methods

While benchmarking some PRNG methods (notably nextInt(int)) with JMH, I found that GraalVM was, as expected, significantly faster than HotSpot, in some cases spectacularly so. However, there is a class of methods for which GraalVM is roughly 35 to 45% slower than HotSpot.

The JMH benchmarks are available at https://github.com/vigna/dsiutils/tree/master/prngperf

OpenJDK Runtime Environment GraalVM CE 22.3.0-dev (build 17.0.5+3-jvmci-22.3-b04)
Linux Fedora 35
CPU Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz

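In the table below I highlighted (with a trailing asterisk) the 6 tests in which GraalVM (timings on the left) is roughly 35 to 45% slower than HotSpot (timings on the right): nextInt100000 and nextInt2301 for the three XoShiRo256 generators. In the remaining cases GraalVM is roughly on par with or faster than HotSpot, most clearly for nextLong.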

The affected methods use 64-bit arithmetic (in particular, the remainder operator).

https://github.com/vigna/dsiutils/blob/924bbb27bc2494f0c4a032cbc6b5e83ff9868628/src/it/unimi/dsi/util/XoShiRo256PlusPlusRandomGenerator.java#L145

Note that the method above is nextLong(long), but nextInt(int) just delegates to that.
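For readers without the source at hand, the shape of the code path in question can be sketched as follows. This is a minimal sketch, not the dsiutils code: the class name `XoShiRo256PlusPlusSketch` is hypothetical, the state update follows the published xoshiro256++ reference, and the bounded method is assumed to be a power-of-two shortcut followed by a remainder-based rejection loop. The linked file remains authoritative.

```java
// Hypothetical stand-in for a 4-word-state generator; not the dsiutils class.
final class XoShiRo256PlusPlusSketch {
    private long s0, s1, s2, s3; // 4 longs of state

    XoShiRo256PlusPlusSketch(long s0, long s1, long s2, long s3) {
        if ((s0 | s1 | s2 | s3) == 0) throw new IllegalArgumentException("state must not be all zeros");
        this.s0 = s0; this.s1 = s1; this.s2 = s2; this.s3 = s3;
    }

    // xoshiro256++ step (constants from the published reference implementation).
    long nextLong() {
        final long result = Long.rotateLeft(s0 + s3, 23) + s0;
        final long t = s1 << 17;
        s2 ^= s0;
        s3 ^= s1;
        s1 ^= s2;
        s0 ^= s3;
        s2 ^= t;
        s3 = Long.rotateLeft(s3, 45);
        return result;
    }

    // Assumed shape of the bounded method: uniform value in [0, n).
    long nextLong(final long n) {
        if (n <= 0) throw new IllegalArgumentException("illegal bound " + n + " (must be positive)");
        long t = nextLong();
        final long nMinus1 = n - 1;
        // Power of two: keep the high bits, no division needed.
        if ((n & nMinus1) == 0) return (t >>> Long.numberOfLeadingZeros(nMinus1)) & nMinus1;
        // General case: 64-bit remainder with rejection. u + nMinus1 - t wraps to a
        // negative value exactly when u lies in the final, incomplete block of n values.
        for (long u = t >>> 1; u + nMinus1 - (t = u % n) < 0; u = nextLong() >>> 1);
        return t;
    }

    // nextInt(int) just delegates to nextLong(long), as noted above.
    int nextInt(final int n) {
        return (int) nextLong(n);
    }

    public static void main(String[] args) {
        XoShiRo256PlusPlusSketch r = new XoShiRo256PlusPlusSketch(1, 2, 3, 4);
        System.out.println(r.nextInt(100000));
        System.out.println(r.nextInt(2301));
    }
}
```

Neither 100000 nor 2301 is a power of two, so both benchmarked bounds go through the remainder loop rather than the shortcut.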

The problematic generators use 4 longs of state. Identical methods for generators with 2 words of state do not have this problem:

https://github.com/vigna/dsiutils/blob/924bbb27bc2494f0c4a032cbc6b5e83ff9868628/src/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandomGenerator.java#L138
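To make the contrast concrete, here is the same kind of sketch for a 2-word generator. Again this is an assumption-laden illustration rather than the dsiutils code: `XoRoShiRo128PlusPlusSketch` is a hypothetical name, the state update follows the published xoroshiro128++ reference, and the bounded method is assumed to be the same remainder-based rejection loop. The only intended difference from a 4-word generator is the amount of state that nextLong() reads and writes.

```java
// Hypothetical stand-in for a 2-word-state generator; not the dsiutils class.
final class XoRoShiRo128PlusPlusSketch {
    private long s0, s1; // 2 longs of state

    XoRoShiRo128PlusPlusSketch(long s0, long s1) {
        if ((s0 | s1) == 0) throw new IllegalArgumentException("state must not be all zeros");
        this.s0 = s0; this.s1 = s1;
    }

    // xoroshiro128++ step (constants from the published reference implementation).
    long nextLong() {
        final long a = s0;
        long b = s1;
        final long result = Long.rotateLeft(a + b, 17) + a;
        b ^= a;
        s0 = Long.rotateLeft(a, 49) ^ b ^ (b << 21);
        s1 = Long.rotateLeft(b, 28);
        return result;
    }

    // Same assumed bounded method as for the 4-word case: uniform value in [0, n).
    long nextLong(final long n) {
        if (n <= 0) throw new IllegalArgumentException("illegal bound " + n + " (must be positive)");
        long t = nextLong();
        final long nMinus1 = n - 1;
        // Power of two: keep the high bits, no division needed.
        if ((n & nMinus1) == 0) return (t >>> Long.numberOfLeadingZeros(nMinus1)) & nMinus1;
        // General case: 64-bit remainder with rejection of the incomplete final block.
        for (long u = t >>> 1; u + nMinus1 - (t = u % n) < 0; u = nextLong() >>> 1);
        return t;
    }

    int nextInt(final int n) {
        return (int) nextLong(n);
    }

    public static void main(String[] args) {
        XoRoShiRo128PlusPlusSketch r = new XoRoShiRo128PlusPlusSketch(1, 2);
        System.out.println(r.nextInt(100000));
        System.out.println(r.nextInt(2301));
    }
}
```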

Please let me know if there are any additional tests I can run to clarify the matter.

```
Benchmark                                    Mode  Cnt  GraalVM score   HotSpot score   Units
BenchmarkRandom.nextDouble                   avgt   25  17.096 ± 0.060  16.993 ± 0.081  ns/op
BenchmarkRandom.nextInt100000                avgt   25   8.727 ± 0.052   8.493 ± 0.032  ns/op
BenchmarkRandom.nextInt2301                  avgt   25  19.316 ± 0.095  19.489 ± 0.109  ns/op
BenchmarkRandom.nextLong                     avgt   25  17.062 ± 0.080  17.002 ± 0.074  ns/op
BenchmarkSplitMix64.nextDouble               avgt   25   1.935 ± 0.008   1.942 ± 0.009  ns/op
BenchmarkSplitMix64.nextInt100000            avgt   25   2.329 ± 0.010   2.185 ± 0.007  ns/op
BenchmarkSplitMix64.nextInt2301              avgt   25   2.477 ± 0.010   2.409 ± 0.012  ns/op
BenchmarkSplitMix64.nextLong                 avgt   25   0.979 ± 0.005   1.348 ± 0.006  ns/op
BenchmarkSplittableRandom.nextDouble         avgt   25   1.937 ± 0.007   1.936 ± 0.008  ns/op
BenchmarkSplittableRandom.nextInt100000      avgt   25   1.904 ± 0.007   2.138 ± 0.009  ns/op
BenchmarkSplittableRandom.nextInt2301        avgt   25  11.656 ± 0.052  13.321 ± 0.053  ns/op
BenchmarkSplittableRandom.nextLong           avgt   25   0.848 ± 0.005   1.348 ± 0.006  ns/op
BenchmarkThreadLocalRandom.nextDouble        avgt   25   2.721 ± 0.016   3.073 ± 0.010  ns/op
BenchmarkThreadLocalRandom.nextInt100000     avgt   25   2.164 ± 0.008   2.612 ± 0.010  ns/op
BenchmarkThreadLocalRandom.nextInt2301       avgt   25  12.909 ± 0.144  13.446 ± 0.082  ns/op
BenchmarkThreadLocalRandom.nextLong          avgt   25   1.056 ± 0.006   1.458 ± 0.008  ns/op
BenchmarkXoRoShiRo128Plus.nextDouble         avgt   25   1.939 ± 0.010   1.940 ± 0.008  ns/op
BenchmarkXoRoShiRo128Plus.nextInt100000      avgt   25   2.128 ± 0.009   2.290 ± 0.010  ns/op
BenchmarkXoRoShiRo128Plus.nextInt2301        avgt   25   2.290 ± 0.010   2.324 ± 0.010  ns/op
BenchmarkXoRoShiRo128Plus.nextLong           avgt   25   0.909 ± 0.003   1.801 ± 0.009  ns/op
BenchmarkXoRoShiRo128PlusPlus.nextDouble     avgt   25   1.941 ± 0.009   1.962 ± 0.007  ns/op
BenchmarkXoRoShiRo128PlusPlus.nextInt100000  avgt   25   2.280 ± 0.009   2.357 ± 0.004  ns/op
BenchmarkXoRoShiRo128PlusPlus.nextInt2301    avgt   25   2.372 ± 0.012   2.442 ± 0.008  ns/op
BenchmarkXoRoShiRo128PlusPlus.nextLong       avgt   25   1.056 ± 0.004   1.884 ± 0.007  ns/op
BenchmarkXoRoShiRo128StarStar.nextDouble     avgt   25   1.936 ± 0.007   2.063 ± 0.009  ns/op
BenchmarkXoRoShiRo128StarStar.nextInt100000  avgt   25   2.563 ± 0.017   2.557 ± 0.009  ns/op
BenchmarkXoRoShiRo128StarStar.nextInt2301    avgt   25   2.601 ± 0.007   2.597 ± 0.014  ns/op
BenchmarkXoRoShiRo128StarStar.nextLong       avgt   25   1.122 ± 0.005   1.891 ± 0.007  ns/op
BenchmarkXoShiRo256Plus.nextDouble           avgt   25   1.939 ± 0.009   1.994 ± 0.009  ns/op
BenchmarkXoShiRo256Plus.nextInt100000        avgt   25   3.428 ± 0.014   2.473 ± 0.006  ns/op *
BenchmarkXoShiRo256Plus.nextInt2301          avgt   25   3.642 ± 0.144   2.597 ± 0.013  ns/op *
BenchmarkXoShiRo256Plus.nextLong             avgt   25   1.608 ± 0.009   1.953 ± 0.008  ns/op
BenchmarkXoShiRo256PlusPlus.nextDouble       avgt   25   1.943 ± 0.010   2.079 ± 0.010  ns/op
BenchmarkXoShiRo256PlusPlus.nextInt100000    avgt   25   3.503 ± 0.021   2.580 ± 0.010  ns/op *
BenchmarkXoShiRo256PlusPlus.nextInt2301      avgt   25   3.874 ± 0.026   2.662 ± 0.008  ns/op *
BenchmarkXoShiRo256PlusPlus.nextLong         avgt   25   1.619 ± 0.010   1.919 ± 0.010  ns/op
BenchmarkXoShiRo256StarStar.nextDouble       avgt   25   1.941 ± 0.013   2.287 ± 0.009  ns/op
BenchmarkXoShiRo256StarStar.nextInt100000    avgt   25   3.642 ± 0.016   2.675 ± 0.015  ns/op *
BenchmarkXoShiRo256StarStar.nextInt2301      avgt   25   3.759 ± 0.020   2.641 ± 0.010  ns/op *
BenchmarkXoShiRo256StarStar.nextLong         avgt   25   1.648 ± 0.008   2.042 ± 0.008  ns/op
BenchmarkXorShift1024StarPhi.nextDouble      avgt   25   1.942 ± 0.008   2.261 ± 0.008  ns/op
BenchmarkXorShift1024StarPhi.nextInt100000   avgt   25   2.466 ± 0.006   3.286 ± 0.019  ns/op
BenchmarkXorShift1024StarPhi.nextInt2301     avgt   25   2.610 ± 0.010   3.433 ± 0.012  ns/op
BenchmarkXorShift1024StarPhi.nextLong        avgt   25   1.337 ± 0.007             n/a  ns/op
```

