|
| 1 | +:warning: This will be merged with the original [bustd](https://github.com/vrmiguel/bustd) repository. |
| 2 | + |
| 3 | +# `buztd`: Available memory or bust! |
| 4 | + |
| 5 | +`buztd` is a lightweight process killer daemon for out-of-memory scenarios on Linux! |
| 6 | + |
| 7 | +This particular project is a Zig version of the [original `bustd` project](https://github.com/vrmiguel/bustd). |
| 8 | + |
| 9 | +## Features |
| 10 | + |
| 11 | +### Extremely thin memory usage |
| 12 | + |
| 13 | +The Zig version of `bustd` makes no heap allocations and relies solely on a single 128-byte buffer on the stack for all its allocation needs. |
| 14 | + |
| 15 | +### Small CPU usage |
| 16 | + |
| 17 | +Much like `earlyoom` and `nohang`, `buztd` uses adaptive sleep times during its memory polling. |
| 18 | + |
| 19 | +Unlike these two, however, `buztd` does not read from `/proc/meminfo`, instead opting for the `sysinfo` syscall. |
| 20 | + |
| 21 | +This approach has its up- and downsides. The amount of free RAM that `sysinfo` reads does not account for cached memory, while `MemAvailable` in `/proc/meminfo` does. |
| 22 | + |
| 23 | +However, the `sysinfo` syscall is one order of magnitude faster than parsing `/proc/meminfo`, at least according to [this kernel patch](https://sourceware.org/legacy-ml/libc-alpha/2015-08/msg00512.html) (granted, from 2015). |
| 24 | + |
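To make the trade-off concrete, here is a minimal C sketch of reading the free RAM percentage through `sysinfo(2)`. It is not taken from the `buztd` source (which is written in Zig), and the helper name `free_ram_percent` is hypothetical. Note that `freeram` excludes cached and reclaimable pages, which is why the figure is lower than `MemAvailable`.

```c
#include <sys/sysinfo.h>

/* Hypothetical helper, not part of buztd: returns the percentage of free RAM
 * as reported by sysinfo(2), or -1 on error. */
static int free_ram_percent(void) {
    struct sysinfo info;

    if (sysinfo(&info) != 0 || info.totalram == 0) {
        return -1;
    }
    /* freeram and totalram are both expressed in units of info.mem_unit,
     * so the unit cancels out in the ratio. */
    return (int)((info.freeram * 100ULL) / info.totalram);
}
```

A value below the configured free RAM threshold would then be the cue to start looking at Pressure Stall Information.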
| 25 | +As `buztd` can't solely rely on the free RAM readings of `sysinfo`, we check for memory stress through [Pressure Stall Information](https://www.kernel.org/doc/html/v5.8/accounting/psi.html). |
| 26 | + |
| 27 | +More on that below. |
| 28 | + |
| 29 | +### `buztd` will try to lock all pages mapped into its address space |
| 30 | + |
| 31 | +Much like `earlyoom`, `buztd` uses [`mlockall`](https://www.ibm.com/docs/en/aix/7.2?topic=m-mlockall-munlockall-subroutine) to avoid being sent to swap, which allows the daemon to remain responsive even when the system memory is under heavy load and susceptible to [thrashing](https://en.wikipedia.org/wiki/Thrashing_(computer_science)). |
| 32 | + |
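As an illustration only, locking pages with `mlockall` looks roughly like this in C. The flag combination `MCL_CURRENT | MCL_FUTURE` is an assumption, not necessarily what `buztd` passes, and `lock_memory` is a hypothetical name. The call can fail without `CAP_IPC_LOCK` or a sufficient `RLIMIT_MEMLOCK`, so the sketch reports the error instead of aborting.

```c
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper, not part of buztd: tries to lock all current and
 * future pages of this process into RAM. Returns 0 on success, -1 on error. */
static int lock_memory(void) {
    /* Assumed flags: lock what is mapped now and anything mapped later. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}
```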
| 33 | +### Checks for Pressure Stall Information |
| 34 | + |
| 35 | +The Linux kernel, since version 4.20 (and built with `CONFIG_PSI=y`), presents canonical new pressure metrics for memory, CPU, and IO. |
| 36 | +In the words of [Facebook Incubator](https://facebookmicrosites.github.io/psi/docs/overview): |
| 37 | + |
| 38 | +``` |
| 39 | +PSI stats are like barometers that provide fair warning of impending resource |
| 40 | +shortages, enabling you to take more proactive, granular, and nuanced steps |
| 41 | +when resources start becoming scarce. |
| 42 | +``` |
| 43 | + |
| 44 | +More specifically, `buztd` checks for how long, in microseconds, processes have stalled in the last 10 seconds. By default, `buztd` will kill a process when processes have stalled for 25 microseconds in the last ten seconds. |
| 45 | + |
| 46 | +Example: |
| 47 | +``` |
| 48 | + some avg10=0.00 avg60=0.00 avg300=0.00 total=11220657 |
| 49 | + full avg10=0.00 avg60=0.00 avg300=0.00 total=10947429 |
| 50 | +``` |
| 51 | + |
| 52 | +These ratios are percentages of recent trends over ten, sixty, and three hundred second windows. |
| 53 | + |
| 54 | +The `some` row indicates the percentage of time in that given time frame in which _any_ process has stalled due to memory thrashing. |
| 55 | + |
| 56 | +`buztd` allows you to configure a cutoff for `some avg10`: once this value is surpassed, some process will be killed. |
| 57 | + |
| 58 | +The ideal value for this cutoff varies a lot between systems. |
| 59 | + |
| 60 | +Try messing around with `tools/mem-eater.c` to guesstimate a value that works well for you. |
| 61 | + |
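The cutoff check described in this section can be sketched in C as follows. This is an illustration under stated assumptions, not the `buztd` implementation: the function names are hypothetical, and the input is assumed to be the `some` line read from `/proc/pressure/memory`.

```c
#include <stdio.h>

/* Hypothetical helper: extracts avg10 from a PSI "some" line.
 * Returns 0 on success, -1 if the line does not match. */
static int parse_some_avg10(const char *line, float *avg10) {
    /* The leading space in the format skips any indentation before "some". */
    return sscanf(line, " some avg10=%f", avg10) == 1 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if avg10 is above the cutoff,
 * 0 if it is not, and -1 if the line could not be parsed. */
static int psi_exceeds_cutoff(const char *line, float cutoff) {
    float avg10;

    if (parse_some_avg10(line, &avg10) != 0) {
        return -1;
    }
    return avg10 > cutoff ? 1 : 0;
}
```

With a cutoff of `0.05`, a line reporting `some avg10=0.12` would exceed the cutoff, while the all-zero example above would not.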
| 62 | +## Building |
| 63 | + |
| 64 | +Requirements: |
| 65 | +* [Zig 0.10](https://ziglang.org/) |
| 66 | +* Linux 4.20+ built with `CONFIG_PSI=y` |
| 67 | + |
| 68 | +```shell |
| 69 | +git clone https://github.com/vrmiguel/buztd |
| 70 | +cd buztd |
| 71 | + |
| 72 | +# Choose which compilation mode you'd like: |
| 73 | +zig build -Drelease-fast # Turns on optimization and disables safety checks |
| 74 | +zig build -Drelease-safe # Turns on optimization and keeps safety checks |
| 75 | +zig build -Drelease-small # Turns on size optimizations and disables safety checks |
| 76 | +``` |
| 77 | + |
| 78 | +## Configuration |
| 79 | + |
| 80 | +As of the time of writing, this version of `buztd` offers no command-line argument parsing, but allows easy configuration through the `src/config.zig` file. |
| 81 | + |
| 82 | + |
| 83 | +```zig |
| 84 | +/// Sets whether or not buztd should daemonize |
| 85 | +/// itself. Don't use this if running buztd as a systemd |
| 86 | +/// service or something of the sort. |
| 87 | +pub const should_daemonize: bool = false; |
| 88 | +
|
| 89 | +/// Free RAM percentage figures below this threshold are considered to be near terminal, meaning |
| 90 | +/// that buztd will start to check for Pressure Stall Information whenever the |
| 91 | +/// free RAM figures go below this. |
| 92 | +/// However, this free RAM amount is what the sysinfo syscall gives us, which does not take into consideration |
| 93 | +/// reclaimable or cached pages. The true free RAM amount available to the OS is bigger than what it indicates. |
| 94 | +pub const free_ram_threshold: u8 = 15; |
| 95 | +
|
| 96 | +/// The Linux kernel presents canonical pressure metrics for memory, found in `/proc/pressure/memory`. |
| 97 | +/// Example: |
| 98 | +/// some avg10=0.00 avg60=0.00 avg300=0.00 total=11220657 |
| 99 | +/// full avg10=0.00 avg60=0.00 avg300=0.00 total=10947429 |
| 100 | +/// These ratios are percentages of recent trends over ten, sixty, and |
| 101 | +/// three hundred second windows. The `some` row indicates the percentage of time |
| 102 | +/// in that given time frame in which _any_ process has stalled due to memory thrashing. |
| 103 | +/// |
| 104 | +/// This value configured here is the value of `some avg10` in which, if surpassed, some |
| 105 | +/// process will be killed. |
| 106 | +/// |
| 107 | +/// The ideal value for this cutoff varies a lot between systems. |
| 108 | +/// Try messing around with `tools/mem-eater.c` to guesstimate a value that works well for you. |
| 109 | +pub const cutoff_psi: f32 = 0.05; |
| 110 | +
|
| 111 | +/// Sets processes that buztd must never kill. |
| 112 | +/// The values expected here are the `comm` values of the process you don't want to have terminated. |
| 113 | +/// A comm-value is the filename of the executable truncated to 15 characters (16 bytes including the terminating NUL). |
| 114 | +pub const unkillables = std.ComptimeStringMap(void, .{ |
| 115 | + .{ "firefox", void }, |
| 116 | + .{ "rustc", void }, |
| 117 | + .{ "electron", void }, |
| 118 | +}); |
| 119 | +
|
| 120 | +
|
| 121 | +/// If any error occurs, restarts the monitoring instead of exiting with an unsuccessful status code |
| 122 | +pub const retry: bool = true; |
| 123 | +``` |
| 124 | + |
| 125 | + |