Description
I first saw this problem in version 2.3.1 via VisualVM: after a few hours, a number of ForkJoinWorker threads would be stuck in the WriteBuffer.poll() function in the RUNNABLE state. I even hit an OOM once CPU utilization became too high and my events started to back up. I recently upgraded to 2.3.3 and thought the problem was fixed until I saw the screenshots below.
When I started the application (10/19, 7:34 am), there was one FJW in WriteBuffer.poll(), which is fine.

This morning (10/20, 9:34 am), a worker in WriteBuffer.poll() is in a constant RUNNABLE state. The number of workers stuck in this state will also grow slowly over time, because I saw that happen with 2.3.1.

The CPU utilization and CPU load metrics in Grafana show an overall increase in work, as if the workers are busy-waiting or spin-looping.
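
For anyone who wants to check the thread-state observation without VisualVM, here is a minimal sketch that counts RUNNABLE threads with a given frame on their stack, using only the JDK's `ThreadMXBean`. The class `StuckThreadCounter` and its method are hypothetical names for illustration, and matching on the fragment "WriteBuffer" assumes the frame's class name contains that text, as the VisualVM view suggests. A single sample cannot tell a stuck thread from a busy one, so it would need to be run repeatedly over time.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Hypothetical diagnostic helper; not part of the cache library.
final class StuckThreadCounter {

    /**
     * Counts live threads that are RUNNABLE and have at least one stack frame
     * whose class name contains classNameFragment and whose method name
     * equals methodName.
     */
    static long countRunnable(String classNameFragment, String methodName) {
        // dumpAllThreads captures each thread's state together with its stack trace.
        ThreadInfo[] threads =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        long count = 0;
        for (ThreadInfo info : threads) {
            if (info == null || info.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().contains(classNameFragment)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "WriteBuffer" and "poll" are taken from the report above (assumed frame names).
        System.out.println("RUNNABLE threads in WriteBuffer.poll: "
                + countRunnable("WriteBuffer", "poll"));
    }
}
```

Logging this count periodically from inside the application would show whether the number of workers in that frame grows over hours, which is the pattern described above.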
There are roughly 1k instances of the cache data structure in the application. This server has the most cache instances, and the issue appears to manifest faster here than on the other servers. All of the servers have a 64 GB heap.
I will have to switch back to Guava considering this is a production application.
Thanks for your effort though!