• How I tripped over sched_rt_runtime_us

    From Grant Edwards@21:1/5 to All on Wed Aug 26 17:35:04 2015
    How I Tripped Over sched_rt_runtime_us


    Don't Try to Control What You Can't Measure

    I recently spent almost two weeks banging my head on the wall trying
    to figure out why some software I was working on was failing. There
    were a number of SCHED_FIFO tasks triggered by timers. They ran with
    periods of typically 4,8,10 and 20 ms. Each woke up based on a
    periodic timerfd and ran for a few hundred microseconds. Most times
    when I would start the program, everything ran fine -- and would run
    fine indefinitely.

    But, about 1 time out of 20 maybe, I would start up the programs, and
    they just wouldn't work right. Hardware devices would repeatedly
    time-out because they weren't being serviced. There would be gaps in
    message streams, etc.

    After poking about for a few days, I determined that a thread that was
    supposed to be waking up every 4ms (for example), would run fine for
    "a while" and then just "not run" for many tens of ms. Sometimes it
    just didn't wake up, and sometimes it got suspended while it was
    running. When running, it wasn't doing anything that could block.
    When it got suspended while running, the location where it got
    suspended was pretty much random.

    After instrumenting code a bit more, I determined that the most common
    failure state resulted in all my SCHED_FIFO threads being suspended
    for 50ms once every second. _Exactly_ 50ms, _exactly_ once every

    Changing to SCHED_RR didn't change anything. Rearranging the
    priorities didn't change anything. However when I changed one of the
    threads to SCHED_OTHER, it started running without interruption.

    The SCHED_OTHER thread was running _while_ SCHED_FIFO threads were
    being suspended "against their will".

    I had read about the real-time throttling feature in the kernel, but
    it was only supposed to kick in when realtime tasks were using up 95%
    of CPU. I had just measured my CPU usage, and the realtime tasks were
    barely using 5% -- not anywhere close to the 95% limit.

    But still, SCHED_FIFO threads being suspended for 50ms every 1s while SCHED_OTHER threads continue to run. That sure smells like somebody is intentionally blocking all realtime tasks 5% of the time.

    Then it dawned on my that (at least on the ARM9), the mechanism used
    by the kernel to measure per-thread CPU usage is based on the
    assumption that tasks execute _randomly_ and their execution times are
    not correlated with kernel timers.

    Mine _are_ controlled by kernel timers.

    If you're running threads triggered by timers, then the kernel's CPU
    usage numbers are complete garbage[*]. I knew this, and in the past had
    to work out my own CPU usage measurement schemes because of it.

    _If_ a task is a realtime task, _and_ it happens to be in phase with
    the CPU usage sampling, then the realtime usage looks like it's 100%
    and the sched_rt_runtime_us limiter kicks in and locks out all
    realtime threads for 50ms once every 1s. In this case, that breaks
    all sorts of seemingly unrelated stuff.

    The moral to the story: if you're are using an architecture with
    without _real_ per-thread CPU usage measurements and you are
    triggering realtime tasks using timers, you _must_ disable the
    realtime limiter by setting sched_rt_runtime_us to -1 because the
    limiter is being fed CPU measurements that are, in effect, random.

    * If you have a thread that uses 2% of the CPU time, but it's
    synchronized with the CPU usage sampling, then it will usually
    appear to use either 0% or 100% depending on the phase difference
    between the sampling and the task's execution. If the phase
    difference is _just_ right you will see a number that varies wildly
    between 0 and 100.

    Grant Edwards grant.b.edwards Yow! I want the presidency
    at so bad I can already taste
    gmail.com the hors d'oeuvres.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)