Forum: >>> Magnum BBS <<<

Quicksort (was: recursion is silly)

From Anton Ertl@21:1/5 to Anton Ertl on Sat Dec 30 15:54:52 2023

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes: >anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

\ SORTI (slightly slower than SORTM, slightly faster than SORTI2)

: sorti ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
begin
over cell+ over u>= while
2drop dup 0= if
drop exit then
repeat
partition
again ;

Let's see if we can improve on this with the extended CASE (untested):

: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase
drop ;

It looks neater and gets rid of the EXIT in the middle.

But it suffers from me falling into Eaker's trap: ENDCASE already
performs a DROP, so the DROP at the end is wrong:

: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase ;

Alternatively, one could write

: sorti3a ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup 0= ?of drop endof
next-case ;

Let's see where we get with the cleaner SORTI2 (based on work by
Albert van der Horst):

: sorti2 ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
over cell+ over u< if
partition
else
2drop
then
dup 0= until
drop ;

I guess a reason why this is slower than SORTI and SORTI3 is that it
also checks for the end after the call to PARTITION. Let's eliminate
that:

: sorti4 ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
begin
over cell+ over u< while
partition
repeat
2drop
dup 0= until
drop ;

Let's see the results (on the 3800MHz Golden/Raptor Cove core of a
Core i3-1315U):

vfx64
m-a m2 m r i i2 i4 d
2.30 2.26 2.27 2.30 2.26 2.27 2.26 2.33 4000 * 10000
2.74 2.72 2.71 2.74 2.73 2.72 2.72 2.78 40000 * 1000
3.19 3.15 3.15 3.17 3.16 3.16 3.15 3.23 400000 * 100
3.64 3.59 3.60 3.62 3.62 3.61 3.61 3.68 4000000 * 10
4.10 4.05 4.08 4.09 4.07 4.05 4.08 4.14 40000000 * 1

gforth-fast
m-a m2 m r i i2 i3 i4 d
3.27 3.17 3.17 3.28 3.20 3.23 3.17 3.19 3.25 4000 * 10000
3.93 3.82 3.83 3.93 3.86 3.90 3.83 3.84 3.94 40000 * 1000
4.57 4.45 4.47 4.58 4.49 4.53 4.47 4.48 4.59 400000 * 100
5.21 5.10 5.10 5.21 5.12 5.16 5.11 5.13 5.22 4000000 * 10
5.88 5.79 5.78 5.89 5.81 5.87 5.76 5.80 5.89 40000000 * 1

The numbers are times in seconds; 400000 * 100 means that the
benchmark has sorted 100 random integer arrays of 400000 elements
each. The column headers have the following meaning:

m: recursive, but convert the tail-recursion into a loop
m-a: m, but arrange to process the smaller partition first
m2: m, but arrange partitions until they are smaller than 256KB
r: recursive on both partitions
i: sorti above
i2: sorti2 above
i3: sorti3 above (not measured on VFX)
i4: sorti4 above
d: like sorti2, but uses DEPTH for terminating the loop

We see that the arrangement overhead of m-a costs too much if it is
done at all levels. Doing it only for large partitions (m2) seems to
help a little for large arrays (but I expected more).

The other measurements don't arrange the partitions and are therefore
more comparable to each other. d is clearly slower than the others.
r is clearly slower on gforth-fast and slightly slower on VFX64. m,
i, i2 and i4 have around the same performance on VFX64. On
gforth-fast, m and i3 seem to have the same performance, and i, i2,
and i4 have slight differences, with the order being m/i2, i4, i, i2
(from fastest to slowest). These differences are so small that we can
choose among them by preference.

For comparison, here are the measurements from <2013Aug15.191549@mips.complang.tuwien.ac.at> on a 3GHz Core 2 Duo
E8400:

vfxlin
m-a m r i i2 d
0.70 0.68 0.69 0.68 0.69 0.80 1000 * 10000
0.84 0.83 0.84 0.84 0.85 0.95 10000 * 1000
1.00 0.98 1.00 1.00 1.01 1.12 100000 * 100
1.15 1.14 1.14 1.15 1.16 1.27 1000000 * 10
1.27 1.28 1.28 1.29 1.30 1.43 10000000 * 1

gforth-fast 64-bit
m-a m r i i2 d
2.22 2.10 2.16 2.12 2.13 2.53 1000 * 10000
2.73 2.61 2.66 2.62 2.64 3.03 10000 * 1000
3.22 3.11 3.16 3.12 3.14 3.53 100000 * 100
3.72 3.61 3.66 3.62 3.63 4.03 1000000 * 10
4.26 4.14 4.21 4.14 4.18 4.57 10000000 * 1

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023: https://euro.theforth.net/2023

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gerry Jackson@21:1/5 to Anton Ertl on Thu Jan 4 15:08:29 2024

On 30/12/2023 15:54, Anton Ertl wrote:

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:

\ SORTI (slightly slower than SORTM, slightly faster than SORTI2)

: sorti ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
begin
over cell+ over u>= while
2drop dup 0= if
drop exit then
repeat
partition
again ;

Let's see if we can improve on this with the extended CASE (untested):

: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase
drop ;

It looks neater and gets rid of the EXIT in the middle.

But it suffers from me falling into Eaker's trap: ENDCASE already
performs a DROP, so the DROP at the end is wrong:

: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase ;

Compare this with using the FOR-EACH construct discussed recently. THE
FOR-EACH definitions are:

: cs-drop-dest ( CS: dest -- ) postpone true postpone until ;

\ A workaround to get the MAP with locals working in GForth
[undefined] assume-live [if] : assume-live ; immediate [then]

: for-each
postpone ahead postpone assume-live postpone begin postpone 0<
postpone if postpone begin 3 cs-roll postpone then
; immediate

synonym do[ while [defined] [SwiftForth] [if] immediate [then]

: ]next
postpone repeat postpone then cs-drop-dest
; immediate

\ |> is the pipeline operator
: |> ( 0 -- ) \ or ( n -- n )
3 cs-pick postpone ?dup postpone 0= postpone until
; immediate

And SORTI3 using FOR-EACH could be:

: sorti3 ( ad u -- )
cells over + 0 -rot
for-each true
do[ over cell+ over u< dup >r if partition then r> |>
2drop dup 0= abs |>
]next drop
;

Untested on a real sort but tested with dummy data and dummy code for
the pipeline operations.

SORTi3 is written like that for comparison with your definition. I would probably write it as:

: sort-partition \ There is probably a better name
over cell+ over u< dup >r if partition then r>
;

: ?done ( ad -- ad 0|1 ) dup 0= abs ;

: sorti3 ( ad u -- )
cells over + 0 -rot
for-each true
do[ sort-partition |> 2drop ?done |> ]next drop
;

The FOR-EACH TRUE is a bit awkward (i.e. no iterator needed) and
supports RUVIM's comment that the the iterator should just be first in
the pipeline. An alternative might be defining FOR[ that omits an
iterator and DO[. i.e. ... FOR[ ... |> ... etc ]NEXT

--
Gerry

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online
Recent Visitors
- Bob Worm
  Tue May 7 21:37:57 2024
  from Wales, Uk via Telnet
- Keyop
  Tue May 7 20:20:13 2024
  from Huddersfield, West Yorkshire via SSH
- Bob Worm
  Tue May 7 15:23:21 2024
  from Wales, Uk via Telnet
- Bob Worm
  Tue May 7 14:12:05 2024
  from Wales, Uk via Telnet

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	300
Nodes:	16 (2 / 14)
Uptime:	52:06:06
Calls:	6,712
Calls today:	5
Files:	12,243
Messages:	5,355,173
Posted today:	1

Quicksort (was: recursion is silly)

Who's Online

Recent Visitors

System Info