Some time ago, while using perf to check the automaton model, I noticed that perf was losing events. The same was reproducible with ftrace.
Steve pointed to a problem in how the recursion control identifies the execution context.
Currently, recursion control uses the preempt_counter to identify the current context. The NMI/HARD/SOFT IRQ counters are set in the preempt_counter in the irq_enter/exit functions.
In a trace, they are set like this:
    0) ==========> |
    0)               |  do_IRQ() {  /* First C function */
    0)               |    irq_enter() {
    0)               |      /* set the IRQ context. */
    0)   1.081 us    |    }
    0)               |    handle_irq() {
    0)               |      /* IRQ handling code */
    0) + 10.290 us   |    }
    0)               |    irq_exit() {
    0)               |      /* unset the IRQ context. */
    0)   6.657 us    |    }
    0) + 18.995 us   |  }
    0) <========== |
As one can see, functions (and events) that run before the IRQ context is set in the preempt_counter, or after it is unset, are attributed to the wrong context. The recursion control then misinterprets them as recursion, and the events are dropped.
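To make the failure mode concrete, here is a minimal user-space sketch, not kernel code. It assumes a recursion guard that keeps one "tracing in progress" bit per context and derives the context from a counter. Every sim_* name, the two-context enum, and the bit position are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context ids; the kernel uses its own layout. */
enum sim_ctx { SIM_CTX_NORMAL, SIM_CTX_HARDIRQ };

#define SIM_HARDIRQ_MASK (1u << 16)     /* illustrative bit, not the real mask */

static unsigned int sim_preempt_count;  /* stands in for the preempt counter */
static unsigned int sim_recursion_bits; /* one "tracing in progress" bit per context */

static void sim_irq_enter(void) { sim_preempt_count |= SIM_HARDIRQ_MASK; }
static void sim_irq_exit(void)  { sim_preempt_count &= ~SIM_HARDIRQ_MASK; }

/* Identify the context from the counter. */
static enum sim_ctx sim_current_ctx(void)
{
    return (sim_preempt_count & SIM_HARDIRQ_MASK) ? SIM_CTX_HARDIRQ
                                                  : SIM_CTX_NORMAL;
}

/* Try to claim the recursion bit of the current context. */
static bool sim_trace_lock(enum sim_ctx *ctx)
{
    *ctx = sim_current_ctx();
    if (sim_recursion_bits & (1u << *ctx))
        return false;               /* looks like recursion: event is dropped */
    sim_recursion_bits |= 1u << *ctx;
    return true;
}

static void sim_trace_unlock(enum sim_ctx ctx)
{
    assert(sim_recursion_bits & (1u << ctx));
    sim_recursion_bits &= ~(1u << ctx);
}

/* Record one event; returns 1 if it was dropped, 0 otherwise. */
static int sim_trace_event(void)
{
    enum sim_ctx ctx;

    if (!sim_trace_lock(&ctx))
        return 1;
    sim_trace_unlock(ctx);
    return 0;
}

/* An IRQ arrives while task context is in the middle of tracing an event. */
static int sim_irq_during_trace(void)
{
    enum sim_ctx task_ctx;
    int dropped = 0;

    sim_preempt_count = 0;
    sim_recursion_bits = 0;

    sim_trace_lock(&task_ctx);      /* task context starts tracing */

    dropped += sim_trace_event();   /* do_IRQ() entry: still looks like task context */
    sim_irq_enter();
    dropped += sim_trace_event();   /* handle_irq(): IRQ context, recorded */
    sim_irq_exit();
    dropped += sim_trace_event();   /* do_IRQ() exit: looks like task context again */

    sim_trace_unlock(task_ctx);     /* task context finishes tracing */
    return dropped;
}
```

In this sketch, the two events that fire outside the irq_enter()/irq_exit() window find the task-context bit already taken, so they are counted as recursion and dropped, while the event inside the window is recorded.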
To resolve this problem, the IRQ/NMI context needs to be set before the first C function executes, and unset after it returns. By doing so, and using this method to identify the context in the trace recursion protection, no more events are lost.
A possible solution is to use a per-cpu variable set and unset in the entry point of NMI/IRQs, before calling the C handler.
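The idea can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the patch series: a plain global stands in for the per-cpu variable, a C function stands in for the low-level entry code, and all sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context ids; the kernel uses its own layout. */
enum sim_ctx { SIM_CTX_NORMAL, SIM_CTX_HARDIRQ };

static enum sim_ctx sim_cpu_ctx = SIM_CTX_NORMAL; /* stands in for the per-cpu variable */
static unsigned int sim_recursion_bits;           /* one "tracing in progress" bit per context */
static int sim_dropped;                           /* events rejected as recursion */

/* Try to claim the recursion bit of the context named by the variable. */
static bool sim_trace_lock(enum sim_ctx *ctx)
{
    *ctx = sim_cpu_ctx;
    if (sim_recursion_bits & (1u << *ctx))
        return false;
    sim_recursion_bits |= 1u << *ctx;
    return true;
}

static void sim_trace_unlock(enum sim_ctx ctx)
{
    assert(sim_recursion_bits & (1u << ctx));
    sim_recursion_bits &= ~(1u << ctx);
}

/* Record one event, counting it as dropped if the guard rejects it. */
static void sim_trace_event(void)
{
    enum sim_ctx ctx;

    if (!sim_trace_lock(&ctx)) {
        sim_dropped++;
        return;
    }
    sim_trace_unlock(ctx);
}

/* Stands in for do_IRQ(), the first C function of the IRQ path. */
static void sim_do_irq(void)
{
    sim_trace_event();  /* event at do_IRQ() entry */
    sim_trace_event();  /* event inside handle_irq() */
    sim_trace_event();  /* event at do_IRQ() exit */
}

/* Stands in for the entry point: set the context, call C, restore it. */
static void sim_irq_entry(void (*c_handler)(void))
{
    enum sim_ctx saved = sim_cpu_ctx;

    sim_cpu_ctx = SIM_CTX_HARDIRQ;  /* set before the first C function runs */
    c_handler();
    sim_cpu_ctx = saved;            /* unset only after it has returned */
}

/* An IRQ arrives while task context is in the middle of tracing an event. */
static int sim_irq_during_trace(void)
{
    enum sim_ctx task_ctx;

    sim_cpu_ctx = SIM_CTX_NORMAL;
    sim_recursion_bits = 0;
    sim_dropped = 0;

    sim_trace_lock(&task_ctx);      /* task context starts tracing */
    sim_irq_entry(sim_do_irq);      /* the IRQ interrupts it */
    sim_trace_unlock(task_ctx);     /* task context finishes tracing */

    return sim_dropped;
}
```

Because the context is already marked as IRQ when the first C function starts, and stays marked until that function returns, all three events are attributed to the IRQ context and none is mistaken for recursion in this sketch. Saving and restoring the previous value, rather than resetting it, is a design choice made here so that the sketch would also behave sensibly if entries were nested.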
This possible solution is presented in this patch series as a proof of concept, for x86_64. Let’s see what kind of comments we will receive!