• Quicksort (was: recursion is silly)

    From Anton Ertl@21:1/5 to Anton Ertl on Sat Dec 30 15:54:52 2023
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes: >anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    \ SORTI (slightly slower than SORTM, slightly faster than SORTI2)

    : sorti ( addr u -- )
    cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
    begin
    over cell+ over u>= while
    2drop dup 0= if
    drop exit then
    repeat
    partition
    again ;

    Let's see if we can improve on this with the extended CASE (untested):

    : sorti3 ( addr u -- )
    cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< ?of partition contof
    2drop dup ?of contof
    endcase
    drop ;

    It looks neater and gets rid of the EXIT in the middle.

    But it suffers from me falling into Eaker's trap: ENDCASE already
    performs a DROP, so the DROP at the end is wrong:

    : sorti3 ( addr u -- )
    cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< ?of partition contof
    2drop dup ?of contof
    endcase ;

    Alternatively, one could write

    : sorti3a ( addr u -- )
    cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< ?of partition contof
    2drop dup 0= ?of drop endof
    next-case ;

    Let's see where we get with the cleaner SORTI2 (based on work by
    Albert van der Horst):

    : sorti2 ( addr u -- )
    cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< if
    partition
    else
    2drop
    then
    dup 0= until
    drop ;

    I guess a reason why this is slower than SORTI and SORTI3 is that it
    also checks for the end after the call to PARTITION. Let's eliminate
    that:

    : sorti4 ( addr u -- )
    cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
    begin
    over cell+ over u< while
    partition
    repeat
    2drop
    dup 0= until
    drop ;

    Let's see the results (on the 3800MHz Golden/Raptor Cove core of a
    Core i3-1315U):

    vfx64
    m-a m2 m r i i2 i4 d
    2.30 2.26 2.27 2.30 2.26 2.27 2.26 2.33 4000 * 10000
    2.74 2.72 2.71 2.74 2.73 2.72 2.72 2.78 40000 * 1000
    3.19 3.15 3.15 3.17 3.16 3.16 3.15 3.23 400000 * 100
    3.64 3.59 3.60 3.62 3.62 3.61 3.61 3.68 4000000 * 10
    4.10 4.05 4.08 4.09 4.07 4.05 4.08 4.14 40000000 * 1

    gforth-fast
    m-a m2 m r i i2 i3 i4 d
    3.27 3.17 3.17 3.28 3.20 3.23 3.17 3.19 3.25 4000 * 10000
    3.93 3.82 3.83 3.93 3.86 3.90 3.83 3.84 3.94 40000 * 1000
    4.57 4.45 4.47 4.58 4.49 4.53 4.47 4.48 4.59 400000 * 100
    5.21 5.10 5.10 5.21 5.12 5.16 5.11 5.13 5.22 4000000 * 10
    5.88 5.79 5.78 5.89 5.81 5.87 5.76 5.80 5.89 40000000 * 1

    The numbers are times in seconds; 400000 * 100 means that the
    benchmark has sorted 100 random integer arrays of 400000 elements
    each. The column headers have the following meaning:

    m: recursive, but convert the tail-recursion into a loop
    m-a: m, but arrange to process the smaller partition first
    m2: m, but arrange partitions until they are smaller than 256KB
    r: recursive on both partitions
    i: sorti above
    i2: sorti2 above
    i3: sorti3 above (not measured on VFX)
    i4: sorti4 above
    d: like sorti2, but uses DEPTH for terminating the loop

    We see that the arrangement overhead of m-a costs too much if it is
    done at all levels. Doing it only for large partitions (m2) seems to
    help a little for large arrays (but I expected more).

    The other measurements don't arrange the partitions and are therefore
    more comparable to each other. d is clearly slower than the others.
    r is clearly slower on gforth-fast and slightly slower on VFX64. m,
    i, i2 and i4 have around the same performance on VFX64. On
    gforth-fast, m and i3 seem to have the same performance, and i, i2,
    and i4 have slight differences, with the order being m/i2, i4, i, i2
    (from fastest to slowest). These differences are so small that we can
    choose among them by preference.

    For comparison, here are the measurements from <2013Aug15.191549@mips.complang.tuwien.ac.at> on a 3GHz Core 2 Duo
    E8400:

    vfxlin
    m-a m r i i2 d
    0.70 0.68 0.69 0.68 0.69 0.80 1000 * 10000
    0.84 0.83 0.84 0.84 0.85 0.95 10000 * 1000
    1.00 0.98 1.00 1.00 1.01 1.12 100000 * 100
    1.15 1.14 1.14 1.15 1.16 1.27 1000000 * 10
    1.27 1.28 1.28 1.29 1.30 1.43 10000000 * 1

    gforth-fast 64-bit
    m-a m r i i2 d
    2.22 2.10 2.16 2.12 2.13 2.53 1000 * 10000
    2.73 2.61 2.66 2.62 2.64 3.03 10000 * 1000
    3.22 3.11 3.16 3.12 3.14 3.53 100000 * 100
    3.72 3.61 3.66 3.62 3.63 4.03 1000000 * 10
    4.26 4.14 4.21 4.14 4.18 4.57 10000000 * 1

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2023: https://euro.theforth.net/2023

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerry Jackson@21:1/5 to Anton Ertl on Thu Jan 4 15:08:29 2024
    On 30/12/2023 15:54, Anton Ertl wrote:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
    \ SORTI (slightly slower than SORTM, slightly faster than SORTI2)

    : sorti ( addr u -- )
    cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
    begin
    over cell+ over u>= while
    2drop dup 0= if
    drop exit then
    repeat
    partition
    again ;

    Let's see if we can improve on this with the extended CASE (untested):

    : sorti3 ( addr u -- )
    cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< ?of partition contof
    2drop dup ?of contof
    endcase
    drop ;

    It looks neater and gets rid of the EXIT in the middle.

    But it suffers from me falling into Eaker's trap: ENDCASE already
    performs a DROP, so the DROP at the end is wrong:

    : sorti3 ( addr u -- )
    cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
    over cell+ over u< ?of partition contof
    2drop dup ?of contof
    endcase ;


    Compare this with using the FOR-EACH construct discussed recently. THE
    FOR-EACH definitions are:

    : cs-drop-dest ( CS: dest -- ) postpone true postpone until ;

    \ A workaround to get the MAP with locals working in GForth
    [undefined] assume-live [if] : assume-live ; immediate [then]

    : for-each
    postpone ahead postpone assume-live postpone begin postpone 0<
    postpone if postpone begin 3 cs-roll postpone then
    ; immediate

    synonym do[ while [defined] [SwiftForth] [if] immediate [then]

    : ]next
    postpone repeat postpone then cs-drop-dest
    ; immediate

    \ |> is the pipeline operator
    : |> ( 0 -- ) \ or ( n -- n )
    3 cs-pick postpone ?dup postpone 0= postpone until
    ; immediate

    And SORTI3 using FOR-EACH could be:

    : sorti3 ( ad u -- )
    cells over + 0 -rot
    for-each true
    do[ over cell+ over u< dup >r if partition then r> |>
    2drop dup 0= abs |>
    ]next drop
    ;

    Untested on a real sort but tested with dummy data and dummy code for
    the pipeline operations.

    SORTi3 is written like that for comparison with your definition. I would probably write it as:

    : sort-partition \ There is probably a better name
    over cell+ over u< dup >r if partition then r>
    ;

    : ?done ( ad -- ad 0|1 ) dup 0= abs ;

    : sorti3 ( ad u -- )
    cells over + 0 -rot
    for-each true
    do[ sort-partition |> 2drop ?done |> ]next drop
    ;

    The FOR-EACH TRUE is a bit awkward (i.e. no iterator needed) and
    supports RUVIM's comment that the the iterator should just be first in
    the pipeline. An alternative might be defining FOR[ that omits an
    iterator and DO[. i.e. ... FOR[ ... |> ... etc ]NEXT

    --
    Gerry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)