• GForth on Raspi OS Using PThreads - stick-to-core

    From Christof Eberspaecher@21:1/5 to All on Sun Jul 24 08:53:38 2022
    Hi,
    Background: the goal is to control a small lathe with a Raspberry Pi using GForth 0.7.9. The axes are moved by stepper motors. The idea is to use a maximum step frequency that always allows stopping within one full step, so that steps are not lost if there is a time lag. Linear motion is handled by a separate pthread.

    I have reserved core 3 using isolcpus=3. I can start GForth with all its tasks on core 3 with "taskset". But this is not what I want, as only the stepper thread shall use this CPU for best performance. So I tried "3 stick-to-core", but it does not seem to work. Has this been tested on a Raspi?
    (I use "def" instead of ":" so that Geany, treating the source as a Python file, shows a function list. "pStat addStat" is used to calculate the average and standard deviation of the lag.)

    ...
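    def usTDelay ( usecs -- ) 1000 * stop-ns ;   \ sleep for usecs microseconds

    0 value stepperTask
    0 value tMax#
    0 value core#

    def startStep \ Test Timing
      stacksize4 NewTask4 dup to stepperTask
      activate decimal
      1000 usTDelay
      0 to core#
      3 stick-to-core to core# \ <<<<<<<<<<<<<<< ???
      100000 usTDelay
      begin
        utime                    \ start time in µs (double)
        \ 100000 stop-ns
        100 usTDelay             \ nominal 100 µs delay
        utime 2swap d-           \ elapsed time in µs (double)
        d>s 100 -                \ lag beyond the nominal 100 µs
        dup tMax# max to tMax#   \ track the maximum lag
        pStat addStat            \ accumulate average / standard deviation
        pause
      pStat @ 9999 > until
      begin 1000 usTDelay pause again
    ;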
    ...
    The value 22 (decimal) is written to core#. As far as I understand, this should be 0. The thread is running, but not with improved performance.

    Thanks for some hints!
    Christof

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to Christof Eberspaecher on Sun Jul 24 16:45:24 2022
    Christof Eberspaecher <chwebersp@gmail.com> writes:
    > I have reserved core 3 using isolcpus=3. I can start GForth with all its
    > tasks on core 3 with "taskset". But this is not what I want to do, as only
    > the stepper-thread shall use this CPU for best performance. So I try to use
    > "3 stick-to-core" but it does not seem to work. Is this tested on a raspi?

    Apparently not tested at all, because it always returns EINVAL before
    setting thread affinity. I have now fixed this:

    <http://git.savannah.gnu.org/cgit/gforth.git/commit/?id=c6e54a12ad1a68e2353284df9c5caf0cbdab749b>

    I have tested the result on a PC, and it now works. I would be very
    surprised if it did not on a Raspi.

    What I used for fixing this bug: <https://stackoverflow.com/questions/1407786/how-to-set-cpu-affinity-of-a-particular-pthread>

    > The value 22 (decimal) is written to core#. As far as I understand, this should be 0.

    22 as return value is EINVAL; it should be 0.
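
For reference, a numeric error code such as the 22 above can be decoded with the standard errno tables. This is a minimal sketch in Python (used here only for illustration; the thread's own code is Forth), assuming a Linux system. The helper name `describe_errno` is made up for this example.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value, e.g. the 22 seen in core#."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # 22 is the value that stick-to-core returned before the fix.
    print(describe_errno(22))
```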

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

  • From Christof Eberspaecher@21:1/5 to Anton Ertl on Mon Jul 25 01:34:48 2022
    Thank you very much, Anton, for the very fast action. It now returns 0.

    Unfortunately the effect is still not good. Latency in µs over 10000 samples (standard deviation, average, maximum):
    2    59   226            GForth started with all its processes on reserved core 3 using taskset, no stick-to-core
    (same numbers)           "3 stick-to-core" issued at the GForth prompt before starting the test

    8    62   up to 26000    GForth started normally, no stick-to-core
    319  71   up to 21000    GForth started normally, "3 stick-to-core" in the test thread
    (same numbers)           "0 stick-to-core" in the test thread
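
The three figures reported above (standard deviation, average, maximum of the lag) can be reproduced in outline with a short Python sketch. It is not the poster's Forth code: the function names are invented for illustration, `time.sleep` stands in for `stop-ns`, and the population standard deviation is an assumption, since the post does not say which variant `addStat` computes.

```python
import statistics
import time


def measure_overshoot_us(delay_us: int = 100, samples: int = 1000) -> list:
    """Sleep delay_us repeatedly; return how much longer each sleep took, in µs."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        time.sleep(delay_us / 1_000_000)
        elapsed_us = (time.perf_counter_ns() - start) / 1000
        overshoots.append(elapsed_us - delay_us)
    return overshoots


def summarize(values) -> tuple:
    """Return (standard deviation, average, maximum), the order used in the post."""
    return statistics.pstdev(values), statistics.fmean(values), max(values)


if __name__ == "__main__":
    sd, avg, worst = summarize(measure_overshoot_us())
    print(f"stddev {sd:.1f} µs, average {avg:.1f} µs, max {worst:.1f} µs")
```

The absolute numbers from such a sketch will differ from the GForth measurements, because the interpreter adds its own overhead; only the method is comparable.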

    So stick-to-core now seems to work for GForth as a whole, including a sub-thread, but not for a sub-thread alone.
    I also experimented with sched_setaffinity(), but the results were the same.
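
To illustrate what pinning only one thread means, here is a minimal Python sketch, assuming Linux. On Linux, `os.sched_setaffinity(0, ...)` acts on the calling thread, so calling it inside a worker moves only that worker. The function name `run_pinned` is hypothetical, and this is not how GForth implements `stick-to-core`.

```python
import os
import threading


def run_pinned(core: int, work) -> dict:
    """Run work() in a new thread pinned to `core`; report the affinities seen."""
    result = {}

    def worker():
        # pid 0 means "the calling thread" for the underlying Linux syscall,
        # so only this worker is moved, not the whole process.
        os.sched_setaffinity(0, {core})
        result["worker_affinity"] = os.sched_getaffinity(0)
        result["value"] = work()

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    # The main thread's affinity mask is left as it was.
    result["main_affinity"] = os.sched_getaffinity(0)
    return result


if __name__ == "__main__":
    core = max(os.sched_getaffinity(0))  # placeholder; the thread above uses core 3
    print(run_pinned(core, lambda: "done"))
```

On the Raspberry Pi discussed here one would pass `core=3`; the sketch picks a core from the current mask only so that it runs on any machine.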




  • From albert@21:1/5 to chwebersp@gmail.com on Mon Jul 25 11:05:29 2022
    I have had some bad experiences with isolating CPUs. I have an 8-core AMD
    and tried to isolate a CPU for MIDI bit-banging (MIDI is an asynchronous
    serial signal at 31.25 kHz). As far as the software is concerned this works,
    with 320 us for 10 bits, e.g. generating a 16 us high / 16 us low signal,
    accurate to within the 50 ns resolution of the logic analyser.
    But you have to do other things as well, such as manipulating the boot-up.
    I run mprime in the background, and mprime ran as if all 8 cores were
    available, interfering with my MIDI: it tried to run an 8th mprime
    on the "isolated" core. Of course I killed mprime,
    but this is a kludge.
    Furthermore, when inspecting a 32 us square wave, it was obvious that
    there are numerous interruptions at varying time intervals.
    MIDI is robust in that it does not matter much if an event is misrepresented
    from time to time, but there was no way I could get a scale out of this.

    I have managed to control mechanical instruments with ms precision
    or better (in Forth), but isolcpus/taskset is no good for what
    we have in mind.
    It would be interesting to see whether actual examples exist where isolcpus
    accomplishes something useful. Then start from there.

    Greetings, Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From Christof Eberspaecher@21:1/5 to none albert on Mon Jul 25 04:26:05 2022
    Thanks, Albert, for sharing your findings!
    At the moment, on a Raspi 4B @ 4*1.8 GHz with isolcpus, the average latency with GForth seems to be about 60 µs, with rare peaks up to 1500 µs. Switching windows in the Chromium browser at the same time produces additional large latencies.

    With a Preempt-RT kernel, 100 µs seems to be a possible limit: https://lemariva.com/blog/2019/09/raspberry-pi-4b-preempt-rt-kernel-419y-performance-test

    My existing solution for the lathe is a combination of Python on the Raspi and a Parallax Propeller 1 running Tachyon Forth for the linear step interpolation. I have used that at down to 500 µs per 1/4 microstep. The speed limit is set by the decreasing torque working against friction. At the moment I still think/hope that this speed can also be reached with GForth without an RT kernel, as long as you don't do too much over the LAN in parallel. There will be some jitter, though.

    Regards, Christof

  • From Christof Eberspaecher@21:1/5 to Christof Eberspaecher on Mon Jul 25 07:30:17 2022
    Ha, EDIT: I had inserted isolcpus=3 into cmdline.txt incorrectly. It must go into the one and only line of that file! I had put it on a second line.
    Now it is possible to run just one pthread on core 3, and the maximum latency is <150 µs. :-)
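
The mistake described above (the parameter sitting on a second line) is easy to check for. This is a minimal Python sketch based on the finding in this post that only the first line of cmdline.txt counts. The function names and the sample parameter strings are made up for illustration; the sysfs file `/sys/devices/system/cpu/isolated` is where a running Linux kernel reports its isolated CPUs.

```python
from pathlib import Path
from typing import Optional


def isolcpus_on_first_line(cmdline_text: str) -> Optional[str]:
    """Return the isolcpus value found on the FIRST line of the text, else None."""
    lines = cmdline_text.splitlines()
    first_line = lines[0] if lines else ""
    for token in first_line.split():
        if token.startswith("isolcpus="):
            return token.split("=", 1)[1]
    return None


def kernel_isolated_cores(path: str = "/sys/devices/system/cpu/isolated") -> Optional[str]:
    """Return what the running kernel reports as isolated, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    good = "console=tty1 rootwait isolcpus=3\n"   # hypothetical file contents
    bad = "console=tty1 rootwait\nisolcpus=3\n"   # parameter on a second line
    print(isolcpus_on_first_line(good), isolcpus_on_first_line(bad))
    print(kernel_isolated_cores())
```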

    Christof

  • From Anton Ertl@21:1/5 to Christof Eberspaecher on Mon Jul 25 15:03:39 2022
    Christof Eberspaecher <chwebersp@gmail.com> writes:
    > Ha, EDIT: I had inserted isolcpus=3 into cmdline.txt incorrectly. It must go
    > into the one and only line of that file! I had put it on a second line.
    > Now it is possible to run just one pthread on core 3, and the maximum latency
    > is <150 µs. :-)

    Congratulations!

    - anton
  • From Christof Eberspaecher@21:1/5 to none albert on Tue Jul 26 03:13:47 2022
    Hi Albert,
    Perhaps some findings about latency with an isolated core are interesting. This is a frequency distribution of the latencies (left column: lag in µs, right column: number of samples). "56" means that a delay which should have been 100 µs actually took 156 µs.
    44 0
    50 0
    56 22184148
    63 2200279
    70 72041
    79 98707
    89 13951
    100 4009
    112 1472
    125 186
    141 69
    158 21
    177 2
    199 1
    223 0
    251 0
    The maximum latency in this measurement was 214 µs.
    Raspi 4B @ 1.8 GHz with one isolated core, which was used exclusively to run the test thread. Using gforth-fast has no significant effect.
    So yes, 32 kbaud MIDI serial, which needs a resolution better than 16 µs, would clearly not be possible.
    If I assume that a resolution of 250 µs is achievable, then a step frequency of 2000 Hz will be possible.
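
The table above can be summarised numerically. This is a minimal Python sketch with the counts copied from the table. It treats each label as the lower edge of its bin, which is an inference (the reported maximum of 214 µs lands in the 199 row), not something the post states. The name `tail_fraction` is made up for this example.

```python
# (bin label in µs, sample count), copied from the table above
HISTOGRAM = [
    (44, 0), (50, 0), (56, 22184148), (63, 2200279), (70, 72041),
    (79, 98707), (89, 13951), (100, 4009), (112, 1472), (125, 186),
    (141, 69), (158, 21), (177, 2), (199, 1), (223, 0), (251, 0),
]


def tail_fraction(histogram, threshold_us):
    """Fraction of samples in bins whose label is >= threshold_us."""
    total = sum(count for _, count in histogram)
    tail = sum(count for label, count in histogram if label >= threshold_us)
    return tail / total


if __name__ == "__main__":
    total = sum(count for _, count in HISTOGRAM)
    print(f"{total} samples, share with lag >= 100 µs: {tail_fraction(HISTOGRAM, 100):.2e}")
```

Under that reading, only a few samples in ten thousand lag by 100 µs or more, which fits the poster's conclusion that a 250 µs resolution looks achievable.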

    The motor winding inductance is relatively high at 7 mH, and my motor driver has only 24 V. For full torque the motor is rated at 1.4 A, R = 3.1 Ohm.
    So 20 V / 1.4 A = 14 Ohm of inductive reactance, and 14/(2*pi*0.007) = about 320 Hz full-step frequency for maximum torque. I do not have the motor data needed to calculate the counter-EMF. So, all in all, 2000 Hz does not seem to be the show stopper.
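
The reactance estimate can be checked with a few lines of Python, using f = X / (2*pi*L) and the values from the post. The 20 V figure is taken as given; it is presumably what remains of the 24 V supply after the resistive drop of 1.4 A * 3.1 Ohm, but that is an assumption. How this electrical frequency maps to a step rate depends on the drive mode and is not modelled here.

```python
import math


def frequency_for_reactance_hz(reactance_ohm: float, inductance_h: float) -> float:
    """Frequency at which an inductor reaches the given reactance: f = X / (2*pi*L)."""
    return reactance_ohm / (2 * math.pi * inductance_h)


if __name__ == "__main__":
    inductance_h = 0.007             # 7 mH winding inductance, from the post
    reactance_ohm = 20.0 / 1.4       # available voltage / rated current
    f = frequency_for_reactance_hz(reactance_ohm, inductance_h)
    print(f"X = {reactance_ohm:.1f} Ohm, f = {f:.0f} Hz")
```

With the reactance rounded to 14 Ohm as in the post, the result is a little lower, but either way it is in the low hundreds of hertz, well below the 2000 Hz that the software timing would allow.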

    It is also interesting that, in this Linux environment, GForth does not seem to be the speed-limiting factor for this application. :-)
    Christof

  • From Christof Eberspaecher@21:1/5 to Christof Eberspaecher on Tue Jul 26 08:25:49 2022
    And one additional finding:
    If a very dumb delay routine is used, one that does not involve the task scheduler:

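    def usIdDelay ( us -- )
      s>d utime d+        \ deadline = now + us (double, µs)
      begin
        2dup utime d-     \ deadline - now
      d0<= until          \ spin until the deadline has passed
      2drop               \ discard the deadline
    ;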

    Then the latency drops to <= 110 µs, and most of the time it is < 10 µs.

    Latency bin [µs]   samples
    1.00   3105238
    1.12   2246084
    1.26   0
    1.41   0
    1.58   0
    1.78   0
    2.00   285790
    2.24   0
    2.51   0
    2.82   10043
    3.16   0
    3.55   0
    3.98   8904
    4.47   1572
    5.01   0
    5.62   998
    6.31   662
    7.08   0
    7.94   493
    8.91   382
    10.0   576
    11.2   273
    12.6   453
    14.1   174
    15.8   321
    17.8   294
    20.0   330
    22.4   255
    25.1   212
    28.2   138
    31.6   187
    35.5   121
    39.8   80
    44.7   57
    50.1   32
    56.2   17
    63.1   10
    70.8   10
    79.4   6
    89.1   4
    100    5
    112    0

    So the pthreads scheduler seems to be responsible for about 55 µs of latency. (Downside: you can no longer kill the thread from outside.)
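
The busy-wait idea behind `usIdDelay` can be sketched in Python as follows. It polls a monotonic clock instead of sleeping, so the scheduler is never asked to wake the thread up. The function names are invented for illustration, and `time.perf_counter_ns` stands in for `utime`.

```python
import time


def busy_wait_us(delay_us: int) -> None:
    """Spin until delay_us microseconds have passed, never yielding to the scheduler."""
    deadline = time.perf_counter_ns() + delay_us * 1000
    while time.perf_counter_ns() < deadline:
        pass


def overshoot_us(delay_us: int) -> float:
    """Return how many µs longer than requested one busy wait took."""
    start = time.perf_counter_ns()
    busy_wait_us(delay_us)
    return (time.perf_counter_ns() - start) / 1000 - delay_us


if __name__ == "__main__":
    worst = max(overshoot_us(100) for _ in range(1000))
    print(f"worst overshoot over 1000 waits: {worst:.1f} µs")
```

The trade-off is the one noted in the post: the spinning thread occupies its core completely, which is only reasonable on a core reserved for it.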
