
Measure JVM pauses #1781

@henrikno


Is your feature request related to a problem?

Sometimes we see spans that should take milliseconds take multiple seconds, and it's difficult to know whether the operation was just slow or the JVM was not responding. This is often caused by GC (which we could correlate, but it's not particularly easy), but GC is not the only reason for pauses. It'd be great to track the pause time, which would catch the stop-the-world GC time and other hiccups.

Describe the solution you'd like

I'd like to have a metric that shows when the JVM is not responsive, for how long (was it 100ms, or 10 seconds) and how often.

There are some existing implementations that revolve around the same idea: sleep for a fixed interval and measure how much longer than requested you actually slept (e.g. if you sleep for 10ms but wake up 3 seconds later, you know something's up):
https://github.com/giltene/jHiccup/blob/master/src/main/java/org/jhiccup/HiccupMeter.java
https://github.com/apache/zookeeper/blob/c74658d398cdc1d207aa296cb6e20de00faec03e/zookeeper-server/src/main/java/org/apache/zookeeper/server/util/JvmPauseMonitor.java
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
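
To make the idea concrete, here is a minimal sketch of the sleep-and-measure approach. It is not taken from any of the linked projects; the class name `JvmPauseSketch`, its methods, and the threshold parameter are all hypothetical. Note that the overshoot also includes ordinary OS scheduling delay, so a threshold is assumed in order to filter out noise.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: detects JVM pauses by measuring how far a sleep overshoots. */
final class JvmPauseSketch implements Runnable {
    private final long sleepNanos;      // how long each loop iteration asks to sleep
    private final long thresholdNanos;  // overshoot below this is treated as noise
    private final AtomicLong pauseCount = new AtomicLong();
    private final AtomicLong totalPauseNanos = new AtomicLong();
    private final AtomicLong maxPauseNanos = new AtomicLong();

    public JvmPauseSketch(long sleepMillis, long thresholdMillis) {
        this.sleepNanos = TimeUnit.MILLISECONDS.toNanos(sleepMillis);
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    /** Overshoot = actual elapsed time minus requested sleep, never negative. */
    public static long overshootNanos(long startNanos, long endNanos, long requestedNanos) {
        return Math.max(0L, (endNanos - startNanos) - requestedNanos);
    }

    /** Records one observation; returns true if it was counted as a pause. */
    public boolean record(long overshootNanos) {
        if (overshootNanos < thresholdNanos) {
            return false;
        }
        pauseCount.incrementAndGet();
        totalPauseNanos.addAndGet(overshootNanos);
        maxPauseNanos.accumulateAndGet(overshootNanos, Math::max);
        return true;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                TimeUnit.NANOSECONDS.sleep(sleepNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop
                return;
            }
            record(overshootNanos(start, System.nanoTime(), sleepNanos));
        }
    }

    public long getPauseCount() { return pauseCount.get(); }
    public long getTotalPauseMillis() { return TimeUnit.NANOSECONDS.toMillis(totalPauseNanos.get()); }
    public long getMaxPauseMillis() { return TimeUnit.NANOSECONDS.toMillis(maxPauseNanos.get()); }
}
```

One way to use it would be to run the monitor on a daemon thread (for example `Thread t = new Thread(new JvmPauseSketch(10, 100)); t.setDaemon(true); t.start();`) and export the count, total, and max as the "how often" and "how long" metrics.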

Describe alternatives you've considered

I considered using -XX:+PrintGCApplicationStoppedTime, but I would need to parse the log, and it's quite verbose, while I'd like just a few metrics that are easy to graph.

Additional context
