3.13.0: some HTTP API requests fail with 500 errors after a complete cluster restart #11303
Replies: 3 comments 2 replies
-
@daveofthedogs there are certain (completely unrelated to the management plugin) tests that form clusters in parallel from scratch and this does not happen. Our team has been running some of them many times a day against My best guess is that this is #10901 and that as soon as the underlying condition clears (for example, the virtual host is seeded), the HTTP API will work just as it always does. According to the stack trace, a single HTTP request should have failed and that's it. Anyhow, we need an executable way to reproduce with
-
Thanks MK
…On Wed, May 22, 2024 at 5:26 PM Michael Klishin ***@***.***> wrote:
Specifically the exception means that some stats for a certain node (***@***.***) were not available in a map of node stats. That's not at all surprising after a parallel restart of all nodes since after booting those stats are emitted eventually, and so they will be missing at first and endpoints like GET /api/overview will run into exceptions because of that or at best could render an empty list of node stats. After some 10-15 seconds the stats would be in place for subsequent requests to use them.
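
For clients that poll the HTTP API right after a full restart, one way to ride out that window is to retry `GET /api/overview` for a short while before treating a 500 as a real failure. The following is only a minimal sketch: the helper names (`fetch_overview`, `wait_for_overview`), the retry count, and the delay are assumptions made for illustration and are not part of RabbitMQ or its tooling.

```python
import base64
import json
import time
import urllib.error
import urllib.request


def fetch_overview(base_url, user, password, timeout=5.0):
    """Return (status_code, parsed_body_or_None) for GET /api/overview.

    Hypothetical helper. Connection-level errors (node still booting,
    connection refused) are not handled here and will propagate.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/api/overview")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; report the status code.
        return err.code, None


def wait_for_overview(fetch, attempts=10, delay=3.0, sleep=time.sleep):
    """Call fetch() until it returns HTTP 200 or the attempts run out.

    5xx responses are treated as transient (stats not emitted yet);
    any other non-200 status raises immediately.
    """
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        if status < 500:
            raise RuntimeError(f"unexpected HTTP status {status}")
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"/api/overview still failing after {attempts} attempts")
```

A caller might use it as `wait_for_overview(lambda: fetch_overview("http://localhost:15672", user, password))`, where the URL and credentials are placeholders for your own environment. With the assumed defaults (10 attempts, 3 seconds apart) the loop covers roughly the 10-15 second window described above with some margin.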
-
@michaelklishin resurrecting this thread. Karl or one of the other devs told me that the HTTP API was being deprecated. In the past, I had seen cowboy errors and some other errors. I saw that in the last version of 3.13, 3.13.7, you were still making updates to the HTTP API. Has the API become more stable (an in 14)? Right now, I have the console stats turned off in all my environments (about 200+ clusters) in favor of Prometheus. All of the support teams hate this, and I would love to turn console stats back on. Thanks, Dave
-
Describe the bug
After all servers in a three-node cluster were rebooted, we were able to log into the console, but received 500 errors from each host. The engineer that rebooted said he rebooted all three at the same time. Errors received on all three nodes were similar:
Reproduction steps
...
Expected behavior
I would not reboot all nodes at once, but if there were some kind of outage, that _could_ happen. I would expect RMQ to recover without errors.
Additional context
No response