Inexplicably slow (w.r.t. HotSpot) PRNG methods #5165

Open
@vigna

While benchmarking some PRNG methods (notably, nextInt(int)) with JMH, I found that GraalVM's results were, as expected, significantly better than HotSpot's, in some cases spectacularly so. However, there is a class of methods for which GraalVM is roughly 40% slower than HotSpot.

The JMH benchmarks are available at https://github.com/vigna/dsiutils/tree/master/prngperf

OpenJDK Runtime Environment GraalVM CE 22.3.0-dev (build 17.0.5+3-jvmci-22.3-b04)
Linux Fedora 35
CPU Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz

In the table below, an asterisk marks the 6 tests in which GraalVM (timings on the left) is roughly 40% slower than HotSpot (timings on the right). In almost all other cases GraalVM is as fast as HotSpot or faster.
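To put a number on the gap, the relative slowdown can be computed directly from the six marked rows. The sketch below assumes that the left-hand timings are GraalVM's and pairs each with the HotSpot timing reported for the same benchmark name; the class and method names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class Slowdown {
    /** Percentage by which the GraalVM time exceeds the HotSpot time (positive: GraalVM is slower). */
    public static double slowdownPercent(double graalNs, double hotspotNs) {
        return (graalNs / hotspotNs - 1.0) * 100.0;
    }

    /** The six marked rows as {GraalVM ns/op, HotSpot ns/op}, copied from the table. */
    public static Map<String, double[]> markedRows() {
        Map<String, double[]> rows = new LinkedHashMap<>();
        rows.put("XoShiRo256Plus.nextInt100000", new double[] {3.428, 2.473});
        rows.put("XoShiRo256Plus.nextInt2301", new double[] {3.642, 2.597});
        rows.put("XoShiRo256PlusPlus.nextInt100000", new double[] {3.503, 2.580});
        rows.put("XoShiRo256PlusPlus.nextInt2301", new double[] {3.874, 2.662});
        rows.put("XoShiRo256StarStar.nextInt100000", new double[] {3.642, 2.675});
        rows.put("XoShiRo256StarStar.nextInt2301", new double[] {3.759, 2.641});
        return rows;
    }

    public static void main(String[] args) {
        // Print one line per marked benchmark with its relative slowdown.
        for (Map.Entry<String, double[]> e : markedRows().entrySet()) {
            double[] t = e.getValue();
            System.out.printf("%-34s GraalVM is %.1f%% slower%n", e.getKey(), slowdownPercent(t[0], t[1]));
        }
    }
}
```

On these figures the slowdown comes out between roughly 36% and 46%, depending on the generator and the bound.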

The affected methods perform 64-bit arithmetic, in particular modulo (remainder) operations:

https://github.com/vigna/dsiutils/blob/924bbb27bc2494f0c4a032cbc6b5e83ff9868628/src/it/unimi/dsi/util/XoShiRo256PlusPlusRandomGenerator.java#L145

Note that the method above is nextLong(long), but nextInt(int) just delegates to that.
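For readers who do not want to follow the link, here is a minimal sketch of the kind of code involved: a generator with four longs of state, a bounded nextLong(long) that draws a 63-bit sample and reduces it with a 64-bit remainder, and a nextInt(int) that simply delegates. It is not copied from dsiutils. The class name `Xoshiro256ppSketch` is hypothetical, the state update is the published xoshiro256++ one, and the rejection loop is a common pattern that I am assuming is close in shape to the linked method.

```java
final class Xoshiro256ppSketch {
    // Four 64-bit words of state; they must not all be zero.
    private long s0, s1, s2, s3;

    public Xoshiro256ppSketch(long s0, long s1, long s2, long s3) {
        if ((s0 | s1 | s2 | s3) == 0) throw new IllegalArgumentException("state must not be all zero");
        this.s0 = s0;
        this.s1 = s1;
        this.s2 = s2;
        this.s3 = s3;
    }

    /** Returns the next 64 bits (xoshiro256++ state update). */
    public long nextLong() {
        final long result = Long.rotateLeft(s0 + s3, 23) + s0;
        final long t = s1 << 17;
        s2 ^= s0;
        s3 ^= s1;
        s1 ^= s2;
        s0 ^= s3;
        s2 ^= t;
        s3 = Long.rotateLeft(s3, 45);
        return result;
    }

    /** Returns a uniform value in [0, n) using rejection and a 64-bit remainder. */
    public long nextLong(final long n) {
        if (n <= 0) throw new IllegalArgumentException("bound must be positive: " + n);
        final long nMinus1 = n - 1;
        long u, t;
        do {
            u = nextLong() >>> 1; // non-negative 63-bit sample
            t = u % n;            // the 64-bit modulus mentioned in the issue
            // If the block of n values containing u extends past Long.MAX_VALUE,
            // the sum below overflows to a negative number and the sample is rejected.
        } while (u + nMinus1 - t < 0);
        return t;
    }

    /** Delegates to the long version; the result always fits in an int. */
    public int nextInt(final int n) {
        return (int) nextLong(n);
    }

    public static void main(String[] args) {
        Xoshiro256ppSketch g = new Xoshiro256ppSketch(1, 2, 3, 4);
        for (int i = 0; i < 5; i++) System.out.println(g.nextInt(2301));
    }
}
```

The bounds 2301 and 100000 in the usage mirror the benchmark names nextInt2301 and nextInt100000. Because nextInt(int) delegates, every bounded int draw pays for a 64-bit remainder.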

The affected generators keep their state in four longs. The same methods in generators with two longs of state do not show the slowdown:

https://github.com/vigna/dsiutils/blob/924bbb27bc2494f0c4a032cbc6b5e83ff9868628/src/it/unimi/dsi/util/XoRoShiRo128PlusPlusRandomGenerator.java#L138
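To make the contrast concrete, here is a minimal sketch of a two-word generator with the same bounded method. Only the state and its update differ from a four-word generator; the rejection loop with the 64-bit remainder is unchanged. As before, this is an illustration rather than the dsiutils source: the class name `Xoroshiro128ppSketch` is hypothetical, the update is the published xoroshiro128++ one, and the shape of the bounded method is an assumption.

```java
final class Xoroshiro128ppSketch {
    // Two 64-bit words of state; they must not both be zero.
    private long s0, s1;

    public Xoroshiro128ppSketch(long s0, long s1) {
        if ((s0 | s1) == 0) throw new IllegalArgumentException("state must not be all zero");
        this.s0 = s0;
        this.s1 = s1;
    }

    /** Returns the next 64 bits (xoroshiro128++ state update). */
    public long nextLong() {
        final long a = s0;
        long b = s1;
        final long result = Long.rotateLeft(a + b, 17) + a;
        b ^= a;
        s0 = Long.rotateLeft(a, 49) ^ b ^ (b << 21);
        s1 = Long.rotateLeft(b, 28);
        return result;
    }

    /** Returns a uniform value in [0, n): same rejection loop and 64-bit remainder as in the four-word case. */
    public long nextLong(final long n) {
        if (n <= 0) throw new IllegalArgumentException("bound must be positive: " + n);
        final long nMinus1 = n - 1;
        long u, t;
        do {
            u = nextLong() >>> 1; // non-negative 63-bit sample
            t = u % n;            // 64-bit remainder
        } while (u + nMinus1 - t < 0); // negative on overflow: sample fell in the incomplete last block
        return t;
    }

    /** Delegates to the long version. */
    public int nextInt(final int n) {
        return (int) nextLong(n);
    }

    public static void main(String[] args) {
        Xoroshiro128ppSketch g = new Xoroshiro128ppSketch(1, 2);
        for (int i = 0; i < 5; i++) System.out.println(g.nextInt(100000));
    }
}
```

Since the arithmetic in the bounded method is identical, the sketch is consistent with the observation that the difference lies in how the compiler handles the larger state, but it does not by itself explain the slowdown.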

Please let me know if there are any additional tests I can run to clarify the matter.

All benchmarks: Mode avgt, Cnt 25, Score ± Error in ns/op. Rows marked * are the highlighted ones.

Benchmark                                    GraalVM             HotSpot
BenchmarkRandom.nextDouble                   17.096 ± 0.060      16.993 ± 0.081
BenchmarkRandom.nextInt100000                 8.727 ± 0.052       8.493 ± 0.032
BenchmarkRandom.nextInt2301                  19.316 ± 0.095      19.489 ± 0.109
BenchmarkRandom.nextLong                     17.062 ± 0.080      17.002 ± 0.074
BenchmarkSplitMix64.nextDouble                1.935 ± 0.008       1.942 ± 0.009
BenchmarkSplitMix64.nextInt100000             2.329 ± 0.010       2.185 ± 0.007
BenchmarkSplitMix64.nextInt2301               2.477 ± 0.010       2.409 ± 0.012
BenchmarkSplitMix64.nextLong                  0.979 ± 0.005       1.348 ± 0.006
BenchmarkSplittableRandom.nextDouble          1.937 ± 0.007       1.936 ± 0.008
BenchmarkSplittableRandom.nextInt100000       1.904 ± 0.007       2.138 ± 0.009
BenchmarkSplittableRandom.nextInt2301        11.656 ± 0.052      13.321 ± 0.053
BenchmarkSplittableRandom.nextLong            0.848 ± 0.005       1.348 ± 0.006
BenchmarkThreadLocalRandom.nextDouble         2.721 ± 0.016       3.073 ± 0.010
BenchmarkThreadLocalRandom.nextInt100000      2.164 ± 0.008       2.612 ± 0.010
BenchmarkThreadLocalRandom.nextInt2301       12.909 ± 0.144      13.446 ± 0.082
BenchmarkThreadLocalRandom.nextLong           1.056 ± 0.006       1.458 ± 0.008
BenchmarkXoRoShiRo128Plus.nextDouble          1.939 ± 0.010       1.940 ± 0.008
BenchmarkXoRoShiRo128Plus.nextInt100000       2.128 ± 0.009       2.290 ± 0.010
BenchmarkXoRoShiRo128Plus.nextInt2301         2.290 ± 0.010       2.324 ± 0.010
BenchmarkXoRoShiRo128Plus.nextLong            0.909 ± 0.003       1.801 ± 0.009
BenchmarkXoRoShiRo128PlusPlus.nextDouble      1.941 ± 0.009       1.962 ± 0.007
BenchmarkXoRoShiRo128PlusPlus.nextInt100000   2.280 ± 0.009       2.357 ± 0.004
BenchmarkXoRoShiRo128PlusPlus.nextInt2301     2.372 ± 0.012       2.442 ± 0.008
BenchmarkXoRoShiRo128PlusPlus.nextLong        1.056 ± 0.004       1.884 ± 0.007
BenchmarkXoRoShiRo128StarStar.nextDouble      1.936 ± 0.007       2.063 ± 0.009
BenchmarkXoRoShiRo128StarStar.nextInt100000   2.563 ± 0.017       2.557 ± 0.009
BenchmarkXoRoShiRo128StarStar.nextInt2301     2.601 ± 0.007       2.597 ± 0.014
BenchmarkXoRoShiRo128StarStar.nextLong        1.122 ± 0.005       1.891 ± 0.007
BenchmarkXoShiRo256Plus.nextDouble            1.939 ± 0.009       1.994 ± 0.009
BenchmarkXoShiRo256Plus.nextInt100000         3.428 ± 0.014       2.473 ± 0.006  *
BenchmarkXoShiRo256Plus.nextInt2301           3.642 ± 0.144       2.597 ± 0.013  *
BenchmarkXoShiRo256Plus.nextLong              1.608 ± 0.009       1.953 ± 0.008
BenchmarkXoShiRo256PlusPlus.nextDouble        1.943 ± 0.010       2.079 ± 0.010
BenchmarkXoShiRo256PlusPlus.nextInt100000     3.503 ± 0.021       2.580 ± 0.010  *
BenchmarkXoShiRo256PlusPlus.nextInt2301       3.874 ± 0.026       2.662 ± 0.008  *
BenchmarkXoShiRo256PlusPlus.nextLong          1.619 ± 0.010       1.919 ± 0.010
BenchmarkXoShiRo256StarStar.nextDouble        1.941 ± 0.013       2.287 ± 0.009
BenchmarkXoShiRo256StarStar.nextInt100000     3.642 ± 0.016       2.675 ± 0.015  *
BenchmarkXoShiRo256StarStar.nextInt2301       3.759 ± 0.020       2.641 ± 0.010  *
BenchmarkXoShiRo256StarStar.nextLong          1.648 ± 0.008       2.042 ± 0.008
BenchmarkXorShift1024StarPhi.nextDouble       1.942 ± 0.008       2.261 ± 0.008
BenchmarkXorShift1024StarPhi.nextInt100000    2.466 ± 0.006       3.286 ± 0.019
BenchmarkXorShift1024StarPhi.nextInt2301      2.610 ± 0.010       3.433 ± 0.012
BenchmarkXorShift1024StarPhi.nextLong         1.337 ± 0.007       (missing)
