Description
Bug report
When a script executed by the inputs.exec plugin (and possibly other plugins) hits its timeout, Telegraf sends SIGKILL directly instead of first sending SIGTERM (or SIGHUP, SIGINT, SIGQUIT) and escalating to SIGKILL only afterwards.
While this kills the script, any cleanup that the script needs to do does not get a chance to run.
This leads to orphaned processes when a script has started child processes.
Relevant telegraf.conf:
[[inputs.exec]]
commands = [
"/etc/telegraf/sqlscripts/oracle_metrics.sh"
]
interval = "1m"
timeout = "5s"
data_format = "influx"
System info:
Oracle Linux Server release 6.3
Linux xxx.tld 2.6.39-400.17.1.el6uek.x86_64 #1 SMP Fri Feb 22 18:16:18 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
Telegraf v1.2.0 (git: release-1.2 b2c1d98)
Steps to reproduce:
- configure Telegraf to run a script with the inputs.exec plugin
- have the script call another process or script that hangs or runs past the timeout
- watch the script die and init inherit the process the script started
Expected behavior:
script is sent SIGTERM (or SIGHUP, SIGINT, SIGQUIT), cleans up after itself, then terminates.
Actual behavior:
script is killed hard without chance to run signal handler.
Additional info:
Have a look at how SIGKILL works and at best practices for terminating processes.
TL;DR: don't indiscriminately use SIGKILL
http://stackoverflow.com/questions/395877/are-child-processes-created-with-fork-automatically-killed-when-the-parent-is
http://stackoverflow.com/questions/690415/in-what-order-should-i-send-signals-to-gracefully-shutdown-processes
ftp://ftp.gnu.org/old-gnu/Manuals/glibc-2.2.3/html_chapter/libc_24.html#SEC472
Proposal:
Send a non-fatal signal first, then after a grace period (of, say, 5 seconds) send SIGKILL.
Current behavior:
script dies, init inherits its child processes.
Desired behavior:
script cleans up after itself and commits suicide.
Use case: [Why is this important (helps with prioritizing requests)]
it fu**s up servers running Telegraf by racking up hundreds of orphaned processes.