• benchmark timings with gforth 0.7.3

    From albert@21:1/5 to All on Tue Jun 7 18:50:25 2022
    I have a benchmark with the infamous byte benchmark repeated 10000
    times.

    The timings with mpeforth, swiftforth, lina and optimised-lina and gforth-fast are reasonably reproducible, say within 10 percent, mostly better.
    E.g.
    time 2>&1 nice -20 gforth-fast ./sieve10k.frt
    gives 3.3 seconds on my AMD 64 bit 4 GHz, all the time.

    However
    time 2>&1 nice -20 gforth ./sieve10k.frt
    gives 6.5 seconds and then the second time e.g. 4.2 seconds.

    What makes gforth 0.7.3 behave differently?

    groetjes Albert

    P.S. A typical test output is:
    lina plain
    4.90user 0.00system 0:04.91elapsed 99%CPU (0avgtext+0avgdata 1348maxresident)k 0inputs+0outputs (0major+87minor)pagefaults 0swaps
    gforth plain
    5.97user 0.00system 0:05.97elapsed 99%CPU (0avgtext+0avgdata 3104maxresident)k 0inputs+0outputs (0major+399minor)pagefaults 0swaps
    gforth fast
    3.33user 0.00system 0:03.34elapsed 99%CPU (0avgtext+0avgdata 3068maxresident)k 0inputs+0outputs (0major+342minor)pagefaults 0swaps
    lina optimised
    $Revision: 1.21 $
    0.88user 0.00system 0:00.88elapsed 99%CPU (0avgtext+0avgdata 1348maxresident)k 0inputs+0outputs (0major+84minor)pagefaults 0swaps
    swiftforth
    0.88user 0.00system 0:00.88elapsed 99%CPU (0avgtext+0avgdata 1876maxresident)k 0inputs+0outputs (0major+216minor)pagefaults 0swaps
    mpeforth
    0.69user 0.00system 0:00.69elapsed 100%CPU (0avgtext+0avgdata 1780maxresident)k 0inputs+0outputs (0major+2128minor)pagefaults 0swaps
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to albert@cherry. on Wed Jun 8 12:05:56 2022
    albert@cherry.(none) (albert) writes:
    >I have a benchmark with the infamous byte benchmark repeated 10000
    >times.
    >
    >The timings with mpeforth, swiftforth, lina and optimised-lina and gforth-fast are reasonably reproducible, say within 10 percent, mostly better.
    >E.g.
    >time 2>&1 nice -20 gforth-fast ./sieve10k.frt
    >gives 3.3 seconds on my AMD 64 bit 4 GHz, all the time.
    >
    >However
    >time 2>&1 nice -20 gforth ./sieve10k.frt
    >gives 6.5 seconds and then the second time e.g. 4.2 seconds.
    >
    >What makes gforth 0.7.3 behave differently?

    Nothing particular to gforth-0.7.3 that I can think of.

    One thing that I can think of, but that would affect everything, is
    SMT (aka Hyperthreading), if your CPU has it. If another thread
    runs on the same core, this tends to slow down your thread while it still
    seems to take 100% CPU time (unlike classic OS time-multiplexing of
    CPUs, where you get a longer elapsed time, but roughly the same user
    and system time for running the same job while competing with another
    job for CPU resources).
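
Whether a CPU has SMT siblings at all can be checked before looking any further. The following is a minimal sketch, assuming a Linux system that exposes the topology file `thread_siblings_list` under `/sys/devices/system/cpu/cpuN/topology/`. The function names are made up for illustration and are not part of any Forth system or of the thread.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,8" or "0-3,8" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_siblings(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the hardware threads that share a core with `cpu`.

    Returns None when the topology file cannot be read (for example
    on a non-Linux system).
    """
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        return None
```

If `smt_siblings(0)` returns more than one CPU number, another job scheduled on the sibling thread could slow the benchmark down without reducing its reported CPU share.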

    >gforth plain
    >5.97user 0.00system 0:05.97elapsed 99%CPU (0avgtext+0avgdata 3104maxresident)k 0inputs+0outputs (0major+399minor)pagefaults 0swaps

    user=elapsed means that gforth ran exclusively on the thread.
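
The user=elapsed check can be automated for a whole batch of runs. This is a rough sketch that parses only the summary format shown in the thread (`...user ...system ...elapsed`); the names `cpu_share` and `parse_elapsed` are hypothetical helpers, and other `time` output formats are not handled.

```python
import re

# Matches e.g. "5.97user 0.00system 0:05.97elapsed 99%CPU ..."
TIME_LINE = re.compile(
    r"(?P<user>\d+\.\d+)user\s+"
    r"(?P<system>\d+\.\d+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed"
)


def parse_elapsed(text):
    """Convert "m:ss.ss" or "h:mm:ss" into seconds."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def cpu_share(line):
    """Return (user, system, elapsed, share) for one summary line.

    share is (user + system) / elapsed; a value close to 1.0 means the
    process was not waiting for the CPU during the run.
    """
    match = TIME_LINE.search(line)
    if match is None:
        raise ValueError("not a recognised time summary line")
    user = float(match["user"])
    system = float(match["system"])
    elapsed = parse_elapsed(match["elapsed"])
    share = (user + system) / elapsed if elapsed > 0 else float("nan")
    return user, system, elapsed, share
```

Note that a share near 1.0 only rules out time-slicing with other processes. As described above, it does not rule out a slowdown from an SMT sibling.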

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

  • From Marcel Hendrix@21:1/5 to none albert on Wed Jun 8 06:20:17 2022
    On Tuesday, June 7, 2022 at 6:50:29 PM UTC+2, none albert wrote:
    > I have a benchmark with the infamous byte benchmark repeated 10000
    > times.
    >
    > The timings with mpeforth, swiftforth, lina and optimised-lina and gforth-fast are reasonably reproducible, say within 10 percent, mostly better.
    > E.g.
    > time 2>&1 nice -20 gforth-fast ./sieve10k.frt
    > gives 3.3 seconds on my AMD 64 bit 4 GHz, all the time.
    >
    > However
    > time 2>&1 nice -20 gforth ./sieve10k.frt
    > gives 6.5 seconds and then the second time e.g. 4.2 seconds.
    >
    > What makes gforth 0.7.3 behave differently?
    [..]

    I don't know about Gforth, but I have had problems with power saving
    schemes on Windows. The typical (non-high-performance) scheme
    leads to disks being sent to sleep (sometimes a delay of seconds). Even
    when that is not the case, the performance is about 50% of what is
    possible.
    There is of course also a cache effect if there is not enough memory.

    What happens if you run the test more than twice (say 10 times)?
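
Running the test ten times and comparing the results can be sketched as follows. This is only an illustration: `time_runs` and `spread` are hypothetical names, the child CPU times from `os.times()` are meaningful on Unix-like systems only, and the gforth command line in the usage note is taken from the original post.

```python
import os
import subprocess
import sys
import time

# Harmless stand-in command for a dry run of the harness without gforth.
PYTHON_NOOP = [sys.executable, "-c", "pass"]


def time_runs(cmd, runs=10):
    """Run `cmd` `runs` times; return a list of (elapsed, user, system) seconds."""
    results = []
    for _ in range(runs):
        before = os.times()
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
        after = os.times()
        results.append((
            elapsed,
            after.children_user - before.children_user,
            after.children_system - before.children_system,
        ))
    return results


def spread(values):
    """Relative spread (max - min) / min of a list of timings."""
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest
```

For the case in this thread one would call `time_runs(["gforth", "./sieve10k.frt"])` and look at `spread([r[1] for r in results])`. The two reported timings, 6.5 and 4.2 seconds, correspond to a spread of about 55 percent, well above the 10 percent seen with the other systems.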

    -marcel

  • From Anton Ertl@21:1/5 to Marcel Hendrix on Wed Jun 8 14:27:05 2022
    Marcel Hendrix <mhx@iae.nl> writes:
    >On Tuesday, June 7, 2022 at 6:50:29 PM UTC+2, none albert wrote:
    >> I have a benchmark with the infamous byte benchmark repeated 10000
    >> times.
    >>
    >> The timings with mpeforth, swiftforth, lina and optimised-lina and gforth-fast are reasonably reproducible, say within 10 percent, mostly better.
    >> E.g.
    >> time 2>&1 nice -20 gforth-fast ./sieve10k.frt
    >> gives 3.3 seconds on my AMD 64 bit 4 GHz, all the time.
    >>
    >> However
    >> time 2>&1 nice -20 gforth ./sieve10k.frt
    >> gives 6.5 seconds and then the second time e.g. 4.2 seconds.
    >>
    >> What makes gforth 0.7.3 behave differently?
    >[..]
    >
    >I don't know about Gforth, but I have had problems with power saving
    >schemes on Windows.

    Good point. CPUs these days don't just run at 4 GHz. Instead, the
    clock rate depends on a number of factors, including how CPU-intensive
    the job is (should not be a problem in this case), how many other cores
    are loaded, the total power consumption, and the temperature of the CPU.
    That's one reason why I like to measure cycles rather than seconds for CPU-intensive stuff like this.
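
The relation between seconds and cycles is simple arithmetic, sketched below. The 4 GHz clock for the 4.2 s run is an assumption made purely for illustration; nobody in the thread measured the actual clock rate or cycle count. On Linux, `perf stat -e cycles` followed by the command is one way to obtain a measured count instead.

```python
def cycles(seconds, frequency_hz):
    """Cycles consumed by a CPU-bound job running at a constant clock."""
    return seconds * frequency_hz


def implied_frequency_hz(cycle_count, seconds):
    """Average clock rate that would explain `seconds` for a fixed cycle count."""
    return cycle_count / seconds


# Assumption for illustration only: the 4.2 s run happened at the nominal 4 GHz.
NOMINAL_HZ = 4.0e9
work = cycles(4.2, NOMINAL_HZ)                 # about 1.68e10 cycles
slow_clock = implied_frequency_hz(work, 6.5)   # clock needed to stretch this to 6.5 s

print(f"{work:.3g} cycles; 6.5 s would imply {slow_clock / 1e9:.2f} GHz")
```

Under that assumption, a clock of roughly 2.6 GHz during the first run would be enough to explain the difference, with the same number of cycles spent in both runs.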

    >There is of course also a cache effect if there is not enough memory.

    The Byte sieve that he measured should easily be within the L1 cache.
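
To give a feel for the working set, here is a sketch of one pass of the classic Byte sieve with its usual size of 8190. The file `sieve10k.frt` is not shown in the thread, so whether it uses exactly this size and one byte per flag is an assumption, as is the 32 KiB L1 data cache used for comparison.

```python
# Assumption: a typical L1 data cache size; the actual CPU is not specified.
ASSUMED_L1D_BYTES = 32 * 1024


def byte_sieve(size=8190):
    """One pass of the classic Byte sieve.

    flags[i] stands for the odd number 2*i + 3.
    Returns (number of primes found, bytes used for the flag array).
    """
    flags = bytearray([1]) * (size + 1)
    count = 0
    for i in range(size + 1):
        if flags[i]:
            prime = i + i + 3
            # Strike out the odd multiples of `prime`, starting at 3 * prime.
            for k in range(i + prime, size + 1, prime):
                flags[k] = 0
            count += 1
    return count, len(flags)


primes, flag_bytes = byte_sieve()
print(primes, "primes,", flag_bytes, "flag bytes,",
      "fits in assumed L1:", flag_bytes < ASSUMED_L1D_BYTES)
```

With about 8 KiB of flags the data is far smaller than the assumed cache, so repeating the pass 10000 times should not run into memory effects.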

    - anton