Bug report
After some time under a load test (usually 5-10 minutes), the telegraf agent stops sending metrics onward via the wavefront output. In addition, the log fills up with "took too long to collect" messages from all plugins.
No relevant error message is available in the logs.
This issue seems similar to #3629; however, no aggregation plugin is in use.
I suspect the output may be too slow, even though the buffer does not appear to overflow or drop metrics.
Load test description
Measurements are submitted via http_listener from a load-generator host that replays a snapshot of data at a selectable throughput. At 5000 pps (counted in fields), the error happens within minutes of starting the agent.
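For anyone trying to reproduce this, the load generator behaves roughly like the sketch below. This is not the actual tool used in the test, only a minimal approximation: the function names (`count_fields`, `post_batch`, `replay`), the batch size, and the pacing strategy are all assumptions made for illustration, and the field counter is deliberately naive.

```python
import time
import urllib.request


def count_fields(line):
    # Naive field count for one line-protocol line; assumes no escaped
    # spaces or commas inside tags, field keys or string values.
    field_set = line.split(" ")[1]
    return field_set.count(",") + 1


def post_batch(url, batch):
    # POST newline-separated line protocol to the listener.
    # The URL is supplied by the caller (hypothetical helper).
    body = "\n".join(batch).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def replay(lines, fields_per_sec, duration_s, batch_size=100,
           send=None, clock=time.monotonic, sleep=time.sleep):
    # Replay `lines` in a loop for `duration_s` seconds, pacing batches so
    # the average rate stays at or below `fields_per_sec`.
    # Returns the total number of fields sent.
    start = clock()
    sent = 0
    i = 0
    while clock() - start < duration_s:
        # Take the next batch from the snapshot, wrapping around at the end.
        batch = [lines[(i + k) % len(lines)] for k in range(batch_size)]
        i += batch_size
        if send is not None:
            send(batch)
        sent += sum(count_fields(line) for line in batch)
        # Sleep until the point in time at which `sent` fields are due.
        delay = start + sent / fields_per_sec - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

A run at the failing rate might look like `replay(snapshot_lines, 5000, 600, send=lambda b: post_batch("http://localhost:8186/write", b))`, where `snapshot_lines` is the recorded data and the URL assumes http_listener is on its default `:8186` address with the `/write` endpoint; adjust it to match the attached telegraf.conf.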
Relevant telegraf.conf:
http-proxy.conf.txt
telegraf.conf.txt
System info:
Telegraf version: 1.4.5
OS: CentOS 7.3
Expected behavior:
Telegraf logs a meaningful error describing the reason for the overload and keeps buffering metrics.
Actual behavior:
Telegraf silently ceases operation and then starts logging failures to collect on time.
Additional info:
Stacktrace:
telegraf.stacktrace.txt
Hardware and software metrics screenshots for the failure window:


