Cache Workers getting stuck? #127

@ghost

Description

I saw this problem in version 2.3.1, via VisualVM: after a few hours, a bunch of ForkJoinWorkers would be stuck in the WriteBuffer.poll() method in the RUNNABLE state. I even hit an OOM when CPU utilization became too high and my events started to get backed up. I recently upgraded to 2.3.3 and thought the problem was fixed, until the screenshots below.

When I started up the application (10/19, 7:34 AM), there was one FJW in WriteBuffer.poll, which is fine.
[screenshot: thread dump, 2016-10-19, 7:34 AM]

This morning (10/20, 9:34 AM), that worker is still in a constant RUNNABLE state. The number of workers stuck in this state also grows slowly over time; I saw the same behavior with 2.3.1.
[screenshot: thread dump, 2016-10-20, 9:34 AM]

The CPU utilization and CPU load metrics in Grafana show an overall increase in work, as if the workers are busy-waiting/spin-looping.
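One way to confirm the busy-wait hypothesis programmatically (instead of eyeballing VisualVM) is to sample thread states with `ThreadMXBean` and count ForkJoin workers that are RUNNABLE with `WriteBuffer.poll` on the stack. A minimal sketch; the class/method names matched against are assumptions taken from the stack traces above, and the thread-name filter is a heuristic:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class StuckWorkerProbe {

    // Counts threads that look like ForkJoin workers, are RUNNABLE,
    // and have the given method of a *WriteBuffer class on their stack.
    static long countStuck(String methodNeedle) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // false, false: skip lock/monitor info, we only need stack traces
        ThreadInfo[] infos = mx.dumpAllThreads(false, false);
        return Arrays.stream(infos)
            .filter(ti -> ti != null
                && ti.getThreadName().contains("ForkJoinPool")
                && ti.getThreadState() == Thread.State.RUNNABLE
                && Arrays.stream(ti.getStackTrace())
                    .anyMatch(f -> f.getClassName().endsWith("WriteBuffer")
                        && f.getMethodName().equals(methodNeedle)))
            .count();
    }

    public static void main(String[] args) {
        // On a healthy instance this should stay at or near zero;
        // a steadily growing count would match the behavior described above.
        System.out.println(countStuck("poll"));
    }
}
```

Polling this from a scheduled task and exporting it to Grafana would make the "slowly grows over time" claim measurable alongside the CPU metrics.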

There are roughly 1k instances of the cache data structure in the application. This server has the most cache instances, and the issue appears to manifest faster there than on the other servers. All of the servers have a 64GB heap.

I will have to switch back to Guava for now, since this is a production application.

Thanks for your effort though!
