Since day 1, stalld has had a limitation: it consumes too much CPU time on very large systems.
The main culprit was parsing the sched/debug file. That parsing is also central to how stalld works: it offloads all the work to user space, without touching the monitored CPUs.
Also since day 1, I have thought about using tracing to collect the wakeups on the monitored CPUs, but I wanted to avoid the overhead of processing the trace data, as it could consume as much CPU time as parsing sched/debug.
So, to strike the best balance, I had to use eBPF.
Instead of tracing, stalld can now use an eBPF program to track the queueing and dequeueing of tasks on the per-CPU runqueues, saving only the minimum required information into a map. This map is processed in user space, so stalld can still do the stall detection from a housekeeping CPU.
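To make the idea more concrete, here is a minimal user-space sketch of that split. It is not stalld's code: a plain array stands in for the eBPF map, `track_enqueue()` and `track_dequeue()` stand in for what the kernel-side program would record, and `find_stalled()` stands in for the user-space pass. All names, sizes, and the "queued longer than a threshold" rule are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS     4   /* hypothetical: number of monitored CPUs */
#define MAX_QUEUED  8   /* hypothetical: tracked tasks per CPU */

/* Minimal per-task record, standing in for one map value. */
struct queued_task {
    int      pid;         /* 0 means the slot is free */
    uint64_t enqueue_ns;  /* when the task entered the runqueue */
};

/* Plain array standing in for the eBPF map, indexed by CPU. */
static struct queued_task cpu_map[NR_CPUS][MAX_QUEUED];

/* Record a task entering a CPU's runqueue. Returns 0, or -1 if full. */
static int track_enqueue(int cpu, int pid, uint64_t now_ns)
{
    assert(cpu >= 0 && cpu < NR_CPUS && pid > 0);
    for (int i = 0; i < MAX_QUEUED; i++) {
        if (cpu_map[cpu][i].pid == 0) {
            cpu_map[cpu][i].pid = pid;
            cpu_map[cpu][i].enqueue_ns = now_ns;
            return 0;
        }
    }
    return -1;
}

/* Forget a task leaving a CPU's runqueue. Returns 0, or -1 if unknown. */
static int track_dequeue(int cpu, int pid)
{
    assert(cpu >= 0 && cpu < NR_CPUS && pid > 0);
    for (int i = 0; i < MAX_QUEUED; i++) {
        if (cpu_map[cpu][i].pid == pid) {
            cpu_map[cpu][i].pid = 0;
            return 0;
        }
    }
    return -1;
}

/*
 * User-space pass: return the pid of the longest-waiting task that has
 * been queued for at least threshold_ns on this CPU, or -1 if none.
 * Simplification: a task counts as stalled purely by time in the map.
 */
static int find_stalled(int cpu, uint64_t now_ns, uint64_t threshold_ns)
{
    int stalled = -1;
    uint64_t longest = 0;

    assert(cpu >= 0 && cpu < NR_CPUS);
    for (int i = 0; i < MAX_QUEUED; i++) {
        const struct queued_task *t = &cpu_map[cpu][i];

        if (t->pid == 0 || now_ns < t->enqueue_ns)
            continue;

        uint64_t waited = now_ns - t->enqueue_ns;
        if (waited >= threshold_ns && waited >= longest) {
            longest = waited;
            stalled = t->pid;
        }
    }
    return stalled;
}
```

The point of the sketch is the division of labor: the recording side only writes a pid and a timestamp, while the scan that decides whether something is stalled happens elsewhere, which is why it can run away from the monitored CPUs.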
I will write a post about the challenges of integrating eBPF into stalld soon, probably after my vacation.
As some distros might not support eBPF well, I will keep stalld 1.17 as a long-term version. It is the last version before adding eBPF to stalld and will receive fixes for a while.