rtla histograms

You can tell rtla timerlat to save histograms of the possible causes of noise, which can then be used to compose bad scenarios, e.g., the worst observed timer IRQ execution time plus the worst observed thread noise… and so on.

rtla timerlat top -t \
    -e osnoise:nmi_noise --trigger="hist:key=cpu,duration/1000:sort=cpu,duration/1000:vals=hitcount" \
    -e osnoise:irq_noise --trigger="hist:key=desc,cpu,duration/1000:sort=cpu,duration/1000:vals=hitcount" \
    -e osnoise:softirq_noise --trigger="hist:key=vector,cpu,duration/1000:sort=cpu,duration/1000:vals=hitcount" \
    -e osnoise:thread_noise --trigger="hist:key=comm,cpu,duration/1000:sort=cpu,duration/1000:vals=hitcount"
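Once the triggers are set, each histogram can be read from the corresponding tracefs events directory (e.g., /sys/kernel/tracing/events/osnoise/irq_noise/hist). A small script can then pick the worst observed bucket per noise source; the sketch below assumes a simplified version of the hist output format, so treat it as a starting point rather than a finished tool:

```python
import re

def worst_duration_us(hist_text):
    """Return the largest 'duration/1000' key (in us) seen in a hist dump."""
    durations = [int(m) for m in re.findall(r"duration/1000:\s*(\d+)", hist_text)]
    return max(durations, default=0)

# Simplified example of a hist dump (the real output has extra annotations).
sample = """\
{ cpu:          1, duration/1000:          2 } hitcount:        100
{ cpu:          1, duration/1000:          7 } hitcount:          3
{ cpu:          2, duration/1000:          5 } hitcount:         12
"""
print(worst_duration_us(sample))  # -> 7
```

Summing the worst buckets of each noise source gives the composed bad scenario mentioned above.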

Two new stalld releases: 1.18 and 1.17.2

Since day 1, stalld has had a limitation: it consumes too much CPU time on very large systems.

The main culprit was parsing the sched/debug file, which is also one of stalld's main characteristics: it offloads all the work to user space without touching the monitored CPUs.

Also, since day 1, I thought about using tracing to collect the wakeups on the monitored CPUs, but I preferred to avoid the overhead of trace processing, as it could consume as much CPU time as parsing sched/debug.

So, to have the best balance, I had to use eBPF.

Instead of tracing, stalld can now use an eBPF program to track the queue/dequeue of tasks in the per-CPU runqueue, saving the minimum required information into a map. This map is processed in user space so that stalld can detect stalls in a housekeeping CPU.
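The user-space side of that detection can be sketched as follows. This is only an illustration of the logic, not stalld's actual code: the map layout, names, and threshold are hypothetical.

```python
def find_stalls(queued_tasks, now, threshold=30):
    """Return (cpu, pid) pairs of tasks queued longer than threshold seconds.

    queued_tasks maps a CPU to {pid: enqueue_timestamp} entries, mimicking
    the minimum information the eBPF program would save into its map on
    queue/dequeue events.
    """
    return [(cpu, pid)
            for cpu, tasks in queued_tasks.items()
            for pid, enqueued_at in tasks.items()
            if now - enqueued_at >= threshold]

# Task 4242 on CPU 1 has been waiting for 35 s; task 11 only for 1 s.
queued = {0: {11: 99.0}, 1: {4242: 65.0}}
print(find_stalls(queued, now=100.0))  # -> [(1, 4242)]
```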

I will write a post about the challenges of integrating eBPF into stalld soon, probably after my vacation.

As some distros might not support eBPF well, I will keep stalld 1.17 as a long-term version. It is the last version before adding eBPF to stalld and will receive fixes for a while.

Linux 6.0 and RV

It is somewhat hard to realize things while you are doing the work. But it just came to my mind that:

Part of my Ph.D. is now an integral part of Linux, classified as a new subsystem by the community.

This is not the first part of my thesis that got into the kernel: timerlat/osnoise and rtla were also part of my Ph.D. thesis, but they were seen as extensions to the tracing infrastructure, while RV stands on its own in the community's eyes.

I got three maintainer entries with my Ph.D.: latency tracers, RTLA, and RV.

As a scientist, my mind is always in the future, so I am already working on things for the next one, two, three … five years ahead, and I am already feeling late.

But this news made me stop and admire it.

Well done, Daniel.

And now Linux has a Real-Time Linux Analysis (RTLA) tool!

RTLA starts as the user-space interface for the timerlat and osnoise tracers, presenting an intuitive way to use these tracers and process their data. It transforms them into a benchmark-like tool, aided by tracing.

The initial rtla commit was merged for Linux 5.17.

RTLA makes it easier for users and developers to collect performance and trace data, helping fine-tune their systems/algorithms.

It is also helpful for kernel developers who are not familiar with debugging PREEMPT_RT but will have to evaluate their changes with this config enabled, as it will soon land in the mainline tree.

For example, running:

# rtla timerlat top -P f:95 -T 150 -t trace_output.txt

The user gets a summary of the latency experienced by a high-priority task (-P f:95), and in the case of a latency higher than 150 us (-T 150), the trace with debug information is saved to trace_output.txt. This trace is a good starting point for a kernel-rt developer to find the root cause of the problem.

But there is more to come, like the real-time Linux scheduling analysis tool (which will be fundamental for safety-critical systems) and other tools to evaluate the properties of the real-time features/schedulers of the penguin. It is just starting!

Using perf probe to measure execution time of user-space code on Linux

Some time ago, I used perf probe to capture execution times of a user-space application while writing the paper An Empirical Study on the Adequacy of MBPTA for Tasks Executed on a Complex Computer Architecture with Linux. And every once in a while, someone asks me how to measure the execution time of a function, or any kind of "block of code", in user space. So, this is a mental note on how to do it.

Here is the sample program that I will instrument; it is named evenodd.c:

#include <stdio.h>

void even(void)
{
     printf("it is even\n");
}

void odd(void)
{
     printf("it is odd\n");
}

int main(void)
{
     int i;

     for (i = 0; ; i++) {
          if (i % 2)
               odd();
          else
               even();
     }
}

Compile evenodd with debug symbols (-g), which perf probe needs to resolve source lines:

gcc -g -o evenodd evenodd.c

The command below shows the source lines of the function void even(void) from the evenodd binary in the current directory:

# perf probe --line even -x evenodd
  <even@/home/bristot/evenodd.c:0>
      0  void even(void)
         {
      2         printf("it is even\n");
      3  }
         
         void odd(void)
         {

To add a perf probe at the beginning of the even() function, run the following command:

# perf probe -x evenodd 'even'

To get the execution time of a given function, we need to add a probe at the last possible line of the function. As shown in the perf probe --line example, it is line 3. The command below adds another probe, named even_end, at line 3 of the function even() in the evenodd binary.

# perf probe -x evenodd 'even_end=even:3'

With this, we have all the instrumentation set, and we can record the trace using perf record.

After starting the evenodd application, run the following command to record the entry and exit probes of the even() execution for 10 seconds:

# perf record -e probe_evenodd:even -e probe_evenodd:even_end -a sleep 10

After the record finishes, it is possible to print the trace using the following command:

# perf script
<output example>
evenodd 101291 [001] 83653.513255: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513259: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513260: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513261: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513262: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513263: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513264: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513265: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513266: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513267: probe_evenodd:even_end: (401134)
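The execution time can already be computed from this output by subtracting the timestamps of each even/even_end pair, e.g., 83653.513259 - 83653.513255 = 4 us for the first pair. As a quick sketch (assuming the whitespace-separated layout shown above), this pairing can be scripted:

```python
def durations_ns(lines):
    """Compute even() -> even_end durations from 'perf script' output lines.

    Assumes the layout shown above: comm, pid, [cpu], timestamp:, event:, (ip).
    """
    start = None
    out = []
    for line in lines:
        fields = line.split()
        ts = float(fields[3].rstrip(':'))       # timestamp in seconds
        event = fields[4].rstrip(':')           # e.g. probe_evenodd:even
        if event == 'probe_evenodd:even':
            start = ts
        elif event == 'probe_evenodd:even_end' and start is not None:
            out.append(round((ts - start) * 1e9))
            start = None
    return out

trace = [
    'evenodd 101291 [001] 83653.513255: probe_evenodd:even: (401126)',
    'evenodd 101291 [001] 83653.513259: probe_evenodd:even_end: (401134)',
]
print(durations_ns(trace))  # -> [4000]
```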

It is also possible to automate the computation of the execution time using a perf python script. After capturing a trace with these two events, the following command generates a skeleton python script named perf-script.py:

# perf script -g python

Then, edit the probe handlers to compute the execution time. Below you can see an example of the part that matters:

# Global variable used to carry the entry timestamp to the exit handler.
# Note: this assumes the even/even_end events strictly alternate in the
# trace, which holds for this single-threaded example.
start = 0

def probe_evenodd__even(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, common_callchain, __probe_ip, perf_sample_dict):
    global start
    # Save the entry timestamp in nanoseconds.
    start = (common_secs * 1000000000) + common_nsecs

def probe_evenodd__even_end(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, common_callchain, __probe_ip, perf_sample_dict):
    # Compute the exit timestamp and print the elapsed time.
    now = (common_secs * 1000000000) + common_nsecs
    duration = now - start
    print("even runtime: " + str(duration) + " ns")

Then, by running the following command, the execution time of the function even is printed:

# perf script -s perf-script.py | head
in trace_begin
even runtime: 3382 ns
even runtime: 1030 ns
even runtime: 984 ns
even runtime: 982 ns
even runtime: 1030 ns
even runtime: 1029 ns
even runtime: 977 ns
even runtime: 982 ns
even runtime: 1058 ns
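The printed runtimes can be summarized with a few extra lines (a convenience sketch, not part of perf; note that the first sample is noticeably larger, likely the first, cold-cache invocation):

```python
# Runtimes taken from the sample output above, in nanoseconds.
runtimes_ns = [3382, 1030, 984, 982, 1030, 1029, 977, 982, 1058]

print("min:", min(runtimes_ns), "ns")                            # -> min: 977 ns
print("max:", max(runtimes_ns), "ns")                            # -> max: 3382 ns
print("avg:", round(sum(runtimes_ns) / len(runtimes_ns)), "ns")  # -> avg: 1273 ns
```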

In academic papers, you can cite the paper where the same technique was used:

K. P. Silva, L. F. Arcaro, D. B. de Oliveira and R. S. de Oliveira, “An Empirical Study on the Adequacy of MBPTA for Tasks Executed on a Complex Computer Architecture with Linux,” 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA), 2018, pp. 321-328, doi: 10.1109/ETFA.2018.8502513.

mental note: commands to be done in a fresh dev VM

Set the serial console as output:

# grubby --args="console=tty0 console=ttyS0" --update-kernel=ALL

Configure grub to use the serial port as its console by editing /etc/default/grub:

GRUB_SERIAL_COMMAND="serial --unit=0 --speed=115200"
GRUB_TERMINAL=serial

Update grub config:

grub2-mkconfig > /etc/grub2.cfg

Install development tools:

# dnf -y group install "development tools"
# dnf install bc openssl openssl-devel ncurses-devel dwarves vim zstd

Set up the .gitconfig

Edit .bashrc:

EDITOR=vim
MAKEJ=$(( `nproc` * 2 ))
alias mk='make -j $MAKEJ'
alias mi='make -j $MAKEJ && make modules_install && make install'
alias mr='make -j $MAKEJ && make modules_install && make install && reboot'

RTSS’21 program committee

It is not news that I am trying to bring Linux closer to real-time theory, and vice versa. The main motivation is my personal desire to stay connected with the theoretical community. Over the last few years, I have submitted papers, gotten some rejections, but also some acceptances. I have presented papers and learned a lot from other researchers' papers as well.

In another step of my academic journey, I am glad to tell you all that I was invited to be part of the Technical Program Committee (TPC) of the Real-Time Systems Symposium (RTSS). The RTSS is the main real-time conference and focuses on the foundations of real-time theory.

Being part of the TPC means I will review some of the submitted papers and discuss their acceptance. This will put me in contact with some of the main researchers in the field, and will also allow me to work together with researchers I already know but never had the opportunity to interact with in a different context.

This is also a good thing for my main community, the real-time Linux one, and I will do my best to represent us there!