Releases: troglobit/watchdogd
watchdogd v4.1
Changes
- Add `watchdogctl list-clients` command to display clients currently
  subscribed to the process supervisor. Outputs to stdout in either table
  format (default) with colored headers, or JSON format with `-j/--json`
- New global `-j, --json` option for machine-readable output, currently
  supported by the `list-clients` and `status` commands
- New API: `wdog_clients()` returns an array of `wdog_client_t` structs for
  programmatic access to subscribed clients. See API documentation at
  https://codedocs.xyz/troglobit/watchdogd/wdog_8h.html
- Enhance `watchdogctl status` command to display formatted output by
  default, with device information, capabilities, and reset history in
  a human-readable table format. Use `-j/--json` for JSON output
Fixes
- Fix #48: generic scripts running more than 1 second would fail with
  false "critical error" reports and cause unwanted system reboots due
  to an uninitialized exit status variable. Found and fixed by Fiona Klute
- Fix #55: `watchdogctl reload` with tempmon crashes watchdogd
- Fix #56: issue causing unwanted reboot when `watchdogctl reload` was
  called while a generic monitor script was running. The script_watcher
  timer was not stopped during reload, leading to an invalid PID (-1)
  being used, which triggered a system reset
- Fix memory leak in generic monitor with optional script path, would
  be triggered on `watchdogctl reload`
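
The class of bug behind #48 can be illustrated with a small, self-contained sketch. This is not the watchdogd source: `script_poll()`, `spawn_child()`, and the `script_state` enum are hypothetical names, and only standard POSIX calls are used. The point is that a non-blocking `waitpid()` returns 0 while the child is still running, and in that case the status must not be interpreted at all.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical result codes for polling a monitor script. */
enum script_state { SCRIPT_RUNNING, SCRIPT_DONE, SCRIPT_ERROR };

/*
 * Non-blocking check of a script's child process.  The exit code is
 * only written once the child has really terminated, so a script that
 * needs more than one poll interval is never judged on garbage.
 */
enum script_state script_poll(pid_t pid, int *exit_code)
{
    int status = 0;                 /* initialized, never read unset */
    pid_t rc = waitpid(pid, &status, WNOHANG);

    if (rc == 0)
        return SCRIPT_RUNNING;      /* still alive, no verdict yet */
    if (rc < 0)
        return SCRIPT_ERROR;        /* e.g. no such child */

    if (WIFEXITED(status))
        *exit_code = WEXITSTATUS(status);
    else
        *exit_code = -1;            /* assumption: -1 marks death by signal */

    return SCRIPT_DONE;
}

/* Demo helper: start a child that sleeps, then exits with the given code. */
pid_t spawn_child(unsigned int seconds, int code)
{
    pid_t pid = fork();

    if (pid == 0) {
        sleep(seconds);
        _exit(code);
    }

    return pid;
}
```

A caller would invoke `script_poll()` from its periodic timer and act on the exit code only when `SCRIPT_DONE` is returned.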
watchdogd v4.0
Breaking changes: the `generic` script monitor has new syntax, the
status files have moved, and the format has changed. Also, the
default value for `safe-exit` in the .conf file has been changed.
Changes
- Support for multiple watchdog devices added, issue #26
- The format of `watchdogctl status` and `/run/watchdogd/status` has
  been changed to JSON and includes more information about the currently
  running daemon and the capabilities of watchdog devices in use
- The `configure --with-$MONITOR=SEC` flag has been changed to not
  take an argument (this was never used). To change the poll interval
  of a system monitor, use the configuration file
- A new file system monitor: `fsmon /var { ... }`, multiple monitors,
  `fsmon /path`, are supported
- A new temperature monitor: `tempmon /path/to/sensor {...}`. It
  supports multiple sensors, both thermal and hwmon type. See the
  documentation for details
- The syntax for the generic monitor script has changed. This is a
  breaking change, everyone must update. New syntax:
  `generic /path/to/monitor-script.sh { ... }`
- The generic scripts monitor now supports running multiple scripts
- Documentation of the libwdog supervisor API by Andreas Helbech Kleist
- API docs at https://codedocs.xyz/troglobit/watchdogd/wdog_8h.html
- State file location changed from `/var/lib/` to `/var/lib/misc/`.
  This is the recommended location in the Linux FHS, and what most
  systems use. Both the default `watchdogd.conf` and the documentation
  have been updated. Unless a file is specified by the user, the daemon
  will automatically relocate to the new location at runtime. If the new
  directory does not exist, the daemon will fall back to use the old
  path, if it exists, issue #36
- The default `watchdogd.conf` now enables reset reason by default.
  This is a strong recommendation since it is then possible to trace
  the reset cause also for system monitors
- Simplified README by splitting it into multiple files, some text even
  moved entirely to man pages instead
- The status files cluttering up `/run` have been moved to their own
  subdirectory, `/run/watchdogd`. This includes the PID file, last boot
  status, and the socket for `watchdogctl`. The latter remains the
  recommended tool to query status and interact with the daemon
- The configure script flags for enabling system monitors have been
  simplified. None of the monitors take an argument (poll seconds),
  since the poll interval is configured in `watchdogd.conf`
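
The state file relocation rule described in the list above can be sketched as a small path selection function. This is an illustration only, not the daemon's code: `pick_state_file()` and `dir_exists()` are hypothetical names, the directories are passed in as parameters so the logic can be tried anywhere, and the behavior when neither directory exists (return the new location) is an assumption the release notes do not spell out.

```c
#include <stdio.h>
#include <sys/stat.h>

#define STATE_NAME "watchdogd.state"

/* Return 1 if path exists and is a directory, otherwise 0. */
int dir_exists(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Select the state file:
 *  1. a file given by the user always wins,
 *  2. otherwise prefer the new directory,
 *  3. fall back to the old directory only if the new one is missing
 *     and the old one exists.
 * Assumption: if neither exists, the new location is returned.
 */
const char *pick_state_file(const char *user_file, const char *new_dir,
                            const char *old_dir, char *buf, size_t len)
{
    const char *dir = new_dir;

    if (user_file)
        return user_file;

    if (!dir_exists(new_dir) && dir_exists(old_dir))
        dir = old_dir;

    snprintf(buf, len, "%s/%s", dir, STATE_NAME);

    return buf;
}
```

With the directories named in the release notes, a call would look like `pick_state_file(NULL, "/var/lib/misc", "/var/lib", buf, sizeof(buf))`.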
Fixes
- Fix #28: `watchdogd` crash in case "Label" or "Reset date" field in
  reset reason is empty. Found and fixed by Christian Theiss
- Fix #30: replace Finit compile-time detection with runtime check, this
  allows synchronized reboot using `watchdogd` with Finit in Buildroot
- Fix #39: generic monitoring script with runtime > 1 second causes
  system to reboot. Found and fixed by Senthil Nathan Thangaraj
- Fix #41: calling custom supervisor script causes `watchdogd` to disable
  monitoring, regardless of script exit code
- Fix #43: `watchdogctl clear`, and the `wdog_reset_reason_clr()` API, do
  not work. Regression introduced in v3.4
- The generic script plugin can now be disabled at runtime. Prior to
  this release, it could not be disabled once enabled
- The label (cause) of the system monitor forcing a reset is now saved in
  the reset reason file. Previously "forced reset" was the only
  message, which without persistent logs did not say much
watchdogd v3.5
Minor compat release; integration with Finit and new libite.
Changes
- Migrate from Travis-CI to GitHub Actions
- Use SIGTERM to signal PID 1, SIGINT stops working in Finit v4.1
- Updated examples and manual page(s) with new 'enabled' setting
- Updated README with exact build example for correct paths
- Add support for new libite namespace, as of libite v2.5.0
watchdogd v3.4
Changes
- Clarify nomenclature: reset cause vs. reset reason
- Change layout and formatting of `watchdogctl` status output
- Change defaults for supervisor, still disabled by default but now also with priority set to zero by default. This allows running the supervisor in cgroups v2 systems without realtime priority.
Fixes
- Fix missing pidfile touch on `SIGHUP`
- Fix problem with plugins being enabled (but incomplete) by default. Now all sections have an `enabled = [true|false]` setting, and all are disabled by default. You need to uncomment and enable them.
watchdogd v3.4-rc1
Changes
- Clarify nomenclature: reset cause vs. reset reason
- Change layout and formatting of watchdogd status output
Fixes
- Fix missing pidfile touch on `SIGHUP`
watchdogd v3.3
Changes
- Increased severity of syslog messages preceding reboot, instead of `LOG_ERROR` all messages that result in a reboot use `LOG_EMERG` because many `syslogd` services default to log emerg to console
- Add handy summary of options to `configure` script
Fixes
- Fix possible garbled `next_ack` for users of `libwdog` due to badly handled timeout in `poll()` when connecting to `watchdogd`
- Fix `configure` script defaults for the following settings:
  - `--enable-compat`, was always enabled
  - `--enable-examples`, were always enabled
  - `--enable-syslog-mark`, was always enabled
- Fix use-after-free bug in new script monitor, introduced in v3.2
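
As an illustration of the `poll()` timeout issue mentioned in the fixes above, the following sketch shows one way to keep a timeout from producing a garbled value. It is not the libwdog implementation: `read_next_ack()` is a hypothetical helper and the reply is assumed, for the sake of the example, to be a single `int`. The key detail is that `poll()` returning 0 means timeout, so nothing may be read and the caller's value must stay untouched.

```c
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for a reply on fd.
 * Returns 1 and stores the value in *next_ack on success,
 * 0 on timeout, and -1 on error.  *next_ack is only written
 * when a complete reply was read.
 */
int read_next_ack(int fd, int timeout_ms, int *next_ack)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int value;
    int rc;

    /* Retry if interrupted by a signal (restarts the full timeout). */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc == 0)
        return 0;                   /* timeout: nothing to read */
    if (rc < 0 || !(pfd.revents & POLLIN))
        return -1;                  /* error or hangup without data */

    if (read(fd, &value, sizeof(value)) != (ssize_t)sizeof(value))
        return -1;                  /* short read, treat as error */

    *next_ack = value;

    return 1;
}
```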
watchdogd v3.2
Changes
- Issue #17: When the process supervisor is enabled `watchdogd` now always runs with elevated RT priority. Previous releases changed to `SCHED_RR` only when the first supervised process connected, and conversely disabled RT prio when the last process disconnected. This change gives a more predictable behavior and also means `watchdogd` can be relied upon until the system has been properly diagnosed
- If the (optional) supervisor script returns OK (0) the timer for the offending process is now disarmed and the system is not rebooted.
- Retry handover from Finit built-in watchdog if first attempt fails
- New generic script monitor, thanks to Tom Deblauwe. Can periodically call a site specific script, with timeout in case the script hangs
Fixes
- Fix #16: Only force reboot on exit if `watchdogd` is enabled
- When disabling and then re-enabling `watchdogd` using the API the daemon was sometimes stopped by Finit. This happened because the daemon re-issued a watchdog handover signal to Finit. The fix is to only do the handover once.
- When re-enabling `watchdogd` the supervisor was not properly elevating the RT priority, instead it remained as a `SCHED_OTHER` process. This fix makes sure to save and re-use the configured RT priority.
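
The last fix, saving and re-using the configured RT priority, follows a common pattern that can be sketched as below. This is a simplified illustration under stated assumptions, not the daemon's code: `struct supervisor` and the three functions are hypothetical, and only the standard `sched_setscheduler()` call is used. The configured priority is stored once and never cleared when the supervisor is disabled, so re-enabling can apply it again.

```c
#include <sched.h>

/* Hypothetical supervisor state. */
struct supervisor {
    int enabled;
    int rt_prio;    /* configured priority, kept across disable/enable */
};

/* Scheduling policy that matches the current state. */
int supervisor_policy(const struct supervisor *s)
{
    return (s->enabled && s->rt_prio > 0) ? SCHED_RR : SCHED_OTHER;
}

/* Apply the policy to the calling process; needs privileges for SCHED_RR. */
int supervisor_apply(const struct supervisor *s)
{
    struct sched_param param = { 0 };
    int policy = supervisor_policy(s);

    if (policy == SCHED_RR)
        param.sched_priority = s->rt_prio;

    return sched_setscheduler(0, policy, &param);
}

/* Toggle the supervisor; rt_prio is deliberately left untouched. */
int supervisor_enable(struct supervisor *s, int enable)
{
    s->enabled = enable;

    return supervisor_apply(s);
}
```

Keeping the policy decision in a separate function from the system call makes the state handling easy to check without elevated privileges.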
watchdogd v3.2-rc1
Changes
- Issue #17: When the process supervisor is enabled `watchdogd` now always runs with elevated RT priority. Previous releases changed to `SCHED_RR` only when the first supervised process connected, and conversely disabled RT prio when the last process disconnected. This change gives a more predictable behavior and also means `watchdogd` can be relied upon until the system has been properly diagnosed
- If the (optional) supervisor script returns OK (0) the timer for the offending process is now disarmed and the system is not rebooted.
- Retry handover from Finit built-in watchdog if first attempt fails
- New generic script monitor, thanks to Tom Deblauwe. Can periodically call a site specific script, with timeout in case the script hangs
Fixes
- Fix #16: Only force reboot on exit if `watchdogd` is enabled
- When disabling and then re-enabling `watchdogd` using the API the daemon was sometimes stopped by Finit. This happened because the daemon re-issued a watchdog handover signal to Finit. The fix is to only do the handover once.
- When re-enabling `watchdogd` the supervisor was not properly elevating the RT priority, instead it remained as a `SCHED_OTHER` process. This fix makes sure to save and re-use the configured RT priority.
watchdogd v3.1
Changes
- Supervised processes can now also cause reset if the ACK sequence is wrong when kicking or unsubscribing
- Issue #7: Add support for callback script to the process supervisor: `script = /path/to/script.sh` in the `supervisor {}` section enables it. When enabled all action is delegated to the script, which is called as: `script.sh supervisor CAUSE PID LABEL`. For more information, see the manual for `watchdogd.conf`
- A new command 'fail' has been added to `watchdogctl`. It can be used with the supervisor script to record the reset cause and do a WDT reset. The reset `CAUSE` can be forwarded by the script to record the correct (or another) reset cause
- Add `-p PID` to `watchdogctl`. Works with reset and fail commands
- Always warn at startup if driver/WDT does not support safe exit, i.e. "magic close"
- Issue #4: Add warning if `.conf` file cannot be found
- Issue #5: Add recorded time of reset to reset cause state file
Fixes
- Omitting critical/reboot level from a checker plugin causes default value of 95% to be set, causing reboot by loadavg plugin. Fixed by defaulting to 'off' for checker/monitor critical/reboot level
- Issue #6: mismatch in label length between supervised processes and that in `wdog_reason_t` => increase from 16 to 48 chars
- Issue #11: problem disabling the process supervisor at runtime, it always caused a reboot
watchdogd v3.0
This release includes major changes to both the build system and the watchdogd command line interface, making it incompatible with previous versions. Therefore the major version number has been bumped.
Application writers can now ask pkg-config for CFLAGS and LIBS to use the process supervisor interface in libwdog.so
Reset cause is now queried and saved in /var/lib/watchdogd.state at boot. Use the new watchdogctl tool to interact with and query status from the daemon.
A configuration file, /etc/watchdogd.conf, has been added, with many more options for the health monitor plugins, the process supervisor, and the reset cause.
Changes
- A configuration file, `/etc/watchdogd.conf`, has been added
- A new tool, `watchdogctl`, to interact with daemon has been added
- New official Watch Dog Detective logo, courtesy of Ron Leishman, licensed for use with the watchdogd project
- New or updated manual pages for daemon, ctrl tool, and the .conf file
- Health monitor plugins now support running external script instead of default reboot action
- Health monitor plugins no longer need critical/reboot level set, only warning is required to enable a monitor
- Completely overhauled `watchdogd` command line options and arguments. Some options in previous releases were not options but optional arguments, while others were useless options for a daemon:
  - Watchdog device node is now an argument not a `-d` option
  - No more `--logfile=FILE` option, redirect `stderr` instead
  - `-n` now prevents the daemon from forking to the background
  - `-f` is now used by the `--config` file option
  - When running in the foreground, output syslog also to `stderr`, unless the `-s`, or `--syslog`, option is given
  - `-l, --loglevel` replaces `--verbose` option
  - Use BusyBox options `-T` and `-t` for WDT timeout and kick, this replaces the previous `-w` and `-k` options
- No more support for attaching an external supervisor process using `SIGUSR1` and `SIGUSR2`
- Conversion to GNU Configure and Build system
- Native support for building Debian packages
- Default install prefix changed, from `/usr/local` to `/`
- Added `pkg-config` support to `libwdog`
- Save reset cause in `/var/lib/watchdogd.state`, by default disabled, enable with the .conf file
- Possible to disable default reset cause backend and plug in your own. See `src/rc.h` for the API required of your own backend
- Updates to `libwdog` API, including a compatibility mode for current customer(s) using `watchdogd` 2.0 with a supervisor patch
- Added `libwdog` example clients
- Added customer specific compat `/var/run/supervisor.status`
- Support for delayed reboot in user API, `wdog_reset_timeout()`
- Fully integrated with Finit, PID 1. Both `reboot(1)` and reset via `watchdogd`, e.g. `watchdogctl reset`, are delegated via Finit to properly shut down the system, sync and unmount all file systems before delegating the actual reset to the WDT.