We should track the size of the compiled image (possibly for the kernel and loader separately) so that we can catch changes that significantly bloat the code (or improvements that shrink it!) over time.
Similar to benchmarks, we would save a baseline on every push to `main`, then load that baseline in PRs and compare against it. If a significant regression, i.e. an increase in size, is detected (TODO: determine a good threshold), the PR check would be marked as failed and would ideally pull in a tool like `bloaty` to print the largest symbols in the artifact. Maybe this could even use `diff` to compare the baseline `bloaty` output with the PR `bloaty` output to surface changes.
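
As a rough illustration of the comparison step, here is a minimal Python sketch. Everything in it is an assumption rather than a settled design: the function names, the per-artifact dictionaries (e.g. `kernel` and `loader` mapped to sizes in bytes), and especially the 5% default threshold, which stands in for the value still marked as TODO above.

```python
# Hypothetical placeholder: the real threshold is still to be determined.
DEFAULT_THRESHOLD_PERCENT = 5.0


def size_change_percent(baseline_bytes: int, current_bytes: int) -> float:
    """Return the size change relative to the baseline, in percent."""
    if baseline_bytes <= 0:
        raise ValueError("baseline size must be positive")
    return (current_bytes - baseline_bytes) * 100.0 / baseline_bytes


def check_sizes(baseline: dict, current: dict,
                threshold_percent: float = DEFAULT_THRESHOLD_PERCENT) -> list:
    """Return the names of artifacts that grew by more than the threshold.

    Both arguments map an artifact name to its size in bytes.
    Artifacts missing from the baseline are skipped.
    """
    regressions = []
    for name, current_bytes in current.items():
        if name not in baseline:
            continue  # new artifact: nothing to compare against yet
        change = size_change_percent(baseline[name], current_bytes)
        print(f"{name}: {baseline[name]} -> {current_bytes} bytes ({change:+.2f}%)")
        if change > threshold_percent:
            regressions.append(name)
    return regressions
```

In a PR check, the baseline dictionary would be loaded from whatever the `main` job stored, the current sizes could come from `os.path.getsize` on the built artifacts, and a non-empty result would fail the check and trigger the `bloaty` report.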