outputs.graphite: Retry sending metrics immediately after reconnect #3680
danielnelson merged 1 commit into influxdata:master
If writing to Graphite failed, the plugin would reconnect but would not retry sending the metrics until the next interval. If the connection broke again before the next interval, the metrics would never reach the Graphite server. Obviously this is a network issue, but Telegraf should handle these kinds of situations better. This patch retries sending the metrics once, immediately after reconnecting.
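The behavior described above can be illustrated with a minimal, self-contained Go sketch. It is not the plugin's actual code: the `sender` interface, `writeWithRetry`, and `flakySender` are hypothetical names invented for this illustration, and the sketch assumes a single connection rather than the plugin's list of servers.

```go
package main

import (
	"errors"
	"fmt"
)

// sender is a hypothetical stand-in for the plugin's connection handling.
type sender interface {
	Connect() error
	Send(batch string) error
}

// writeWithRetry sends the batch. If the send fails, it reconnects and
// retries exactly once instead of waiting for the next flush interval.
func writeWithRetry(s sender, batch string) error {
	err := s.Send(batch)
	if err == nil {
		return nil // first attempt succeeded, so nothing is sent twice
	}
	if cerr := s.Connect(); cerr != nil {
		return fmt.Errorf("reconnect failed: %v (after send error: %v)", cerr, err)
	}
	return s.Send(batch) // single retry; a second failure is returned to the caller
}

// flakySender is a test double that fails a fixed number of sends.
type flakySender struct {
	failures int      // number of upcoming sends that should fail
	connects int      // how many times Connect was called
	sent     []string // batches that were delivered
}

func (f *flakySender) Connect() error {
	f.connects++
	return nil
}

func (f *flakySender) Send(batch string) error {
	if f.failures > 0 {
		f.failures--
		return errors.New("broken pipe")
	}
	f.sent = append(f.sent, batch)
	return nil
}

func main() {
	s := &flakySender{failures: 1}
	err := writeWithRetry(s, "cpu.usage 42 1515000000\n")
	fmt.Println(err, s.connects, len(s.sent)) // prints "<nil> 1 1"
}
```

The retry is only reached when the first send returns an error, so a healthy connection never delivers a batch twice. If the retry also fails, the error goes back to the caller, which would presumably try again at the next interval.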
Can you check if your disconnects are detected by the code in
Yes, it does detect them, but I wasn't sure whether I should check all connections and then reconnect only the broken ones, reconnect all connections, or be happy with at least one working connection. I suspect I would then have to refactor connect() as well, since it currently reconnects all connections, or add some way to mark a connection as broken so the plugin won't try to send on it. I thought this would be safe as well, since there might be situations that are not detected by checkEOF. I'll think about it a bit; if you have any suggestions, let me know. Also, this doesn't double send: the second send is only called when the first one failed.
Okay, I think we will want to refactor this code in the future so that we can reconnect individually to each server. This way the logic can basically be:
This is still an improvement, so I'm going to merge.
(cherry picked from commit f374a29)
* master:
  Update changelog
  Reconnect before sending graphite metrics if disconnected (influxdata#3680)
  Update changelog
  Add support for using globs in devices list of diskio input plugin (influxdata#3687)
  Use go-redis for the redis input (influxdata#3661)