Using perf probe to measure execution time of user-space code on Linux

Some time ago, I used perf probe to capture some execution time on a user-space application to write the paper An Empirical Study on the Adequacy of MBPTA for Tasks Executed on a Complex Computer Architecture with Linux. End every once in a while someone asks me how to measure the execution time of a function, or any kind of “block of code” in user-space. So, this is a mental note about how to do this.

Here is the sample program that I will instrument, it is named evenodd.c:

void even(void)
{
     printf("it is even\n");
}

void odd(void)
{
     printf("it is odd\n");
}
int main()
{
     int i;
     for (i = 0; ;i++) {
          if (i % 2)
               odd();
          else
               even();
     }
}

Compile even odd with debug symbols:

gcc -o evenodd evenodd.c

The command below shows the function void even(void), from the evenodd binary in the same directory:

# perf probe --line even -x evenodd
  <even@/home/bristot/evenodd.c:0>
      0  void even(void)
         {
      2         printf("it is even\n");
      3  }
         
         void odd(void)
         {

To add a perf probe at the beginning of the even() function, run the following command:

# perf probe -x evenodd 'even'

To get the execution time of a given function, we need to add a probe to the last possible line of the function. As shown in the perf probe –line example, it is line 3. The command below adds another probe named even_end event at line 3 of the function event() on the evenondd binary.

# perf probe -x evenodd 'even_end=even:3'

With this, we have all the instrumentation set, and we can record the trace using perf record.

After start running the evenodd application, run the following command to record the entry and exit probes for the even() execution for 10 seconds:

# perf record -e probe_evenodd:even -e probe_evenodd:even_end -a sleep 10

After then record finishes, It is then possible to print the trace using the following command:

# perf script
<output example>
evenodd 101291 [001] 83653.513255: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513259: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513260: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513261: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513262: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513263: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513264: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513265: probe_evenodd:even_end: (401134)
evenodd 101291 [001] 83653.513266: probe_evenodd:even: (401126)
evenodd 101291 [001] 83653.513267: probe_evenodd:even_end: (401134)

It is also possible to automatize the computation of the execution time using a perf python script. After capturing a trace of these two events, the following command will generate a skeleton of the python script named perf-script.py:

# perf script -g python

Then edit the probe handlers to compute the execution time. Bellow, you can see one example of the part that matters:

# Global variable to be used to record the start used at the end
start=0
def probe_evenodd__even(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, common_callchain, __probe_ip, perf_sample_dict):
    global start
    start=(common_secs * 1000000000) + common_nsecs

def probe_evenodd__even_end(event_name, context, common_cpu, common_secs, common_nsecs, common_pid, common_comm, common_callchain, __probe_ip, perf_sample_dict):

    now=common_secs * 1000000000 + common_nsecs
    duration=now - start
    print("even runtime: " + str(duration) + " ns")

Then, by running the following command, the execution time of the function even is printed:

# perf script -s perf-script.py | head
in trace_begin
even runtime: 3382 ns
even runtime: 1030 ns
even runtime: 984 ns
even runtime: 982 ns
even runtime: 1030 ns
even runtime: 1029 ns
even runtime: 977 ns
even runtime: 982 ns
even runtime: 1058 ns

On academic papers, you can cite this paper where the same technic was used:

K. P. Silva, L. F. Arcaro, D. B. de Oliveira and R. S. de Oliveira, “An Empirical Study on the Adequacy of MBPTA for Tasks Executed on a Complex Computer Architecture with Linux,” 2018 IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA), 2018, pp. 321-328, doi: 10.1109/ETFA.2018.8502513.