anton@mips.complang.tuwien.ac.at (Anton Ertl) writes: >
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
\ SORTI (slightly slower than SORTM, slightly faster than SORTI2)
: sorti ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
begin
over cell+ over u>= while
2drop dup 0= if
drop exit then
repeat
partition
again ;
Let's see if we can improve on this with the extended CASE (untested):
: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase
drop ;
It looks neater and gets rid of the EXIT in the middle.
But it suffers from me falling into Eaker's trap: ENDCASE already
performs a DROP, so the DROP at the end is wrong:
: sorti3 ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup ?of contof
endcase ;
Alternatively, one could write
: sorti3a ( addr u -- )
cells over + 0 -rot case ( 0 al1 ah1 .. aln ahn )
over cell+ over u< ?of partition contof
2drop dup 0= ?of drop endof
next-case ;
Let's see where we get with the cleaner SORTI2 (based on work by
Albert van der Horst):
: sorti2 ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
over cell+ over u< if
partition
else
2drop
then
dup 0= until
drop ;
I guess a reason why this is slower than SORTI and SORTI3 is that it
also checks for the end after the call to PARTITION. Let's eliminate
that:
: sorti4 ( addr u -- )
cells over + 0 -rot begin ( 0 al1 ah1 .. aln ahn )
begin
over cell+ over u< while
partition
repeat
2drop
dup 0= until
drop ;
Let's see the results (on the 3800MHz Golden/Raptor Cove core of a
Core i3-1315U):
vfx64
m-a m2 m r i i2 i4 d
2.30 2.26 2.27 2.30 2.26 2.27 2.26 2.33 4000 * 10000
2.74 2.72 2.71 2.74 2.73 2.72 2.72 2.78 40000 * 1000
3.19 3.15 3.15 3.17 3.16 3.16 3.15 3.23 400000 * 100
3.64 3.59 3.60 3.62 3.62 3.61 3.61 3.68 4000000 * 10
4.10 4.05 4.08 4.09 4.07 4.05 4.08 4.14 40000000 * 1
gforth-fast
m-a m2 m r i i2 i3 i4 d
3.27 3.17 3.17 3.28 3.20 3.23 3.17 3.19 3.25 4000 * 10000
3.93 3.82 3.83 3.93 3.86 3.90 3.83 3.84 3.94 40000 * 1000
4.57 4.45 4.47 4.58 4.49 4.53 4.47 4.48 4.59 400000 * 100
5.21 5.10 5.10 5.21 5.12 5.16 5.11 5.13 5.22 4000000 * 10
5.88 5.79 5.78 5.89 5.81 5.87 5.76 5.80 5.89 40000000 * 1
The numbers are times in seconds; 400000 * 100 means that the
benchmark has sorted 100 random integer arrays of 400000 elements
each. The column headers have the following meaning:
m: recursive, but convert the tail-recursion into a loop
m-a: m, but arrange to process the smaller partition first
m2: m, but arrange partitions until they are smaller than 256KB
r: recursive on both partitions
i: sorti above
i2: sorti2 above
i3: sorti3 above (not measured on VFX)
i4: sorti4 above
d: like sorti2, but uses DEPTH for terminating the loop
We see that the arrangement overhead of m-a costs too much if it is
done at all levels. Doing it only for large partitions (m2) seems to
help a little for large arrays (but I expected more).
The other measurements don't arrange the partitions and are therefore
more comparable to each other. d is clearly slower than the others.
r is clearly slower on gforth-fast and slightly slower on VFX64. m,
i, i2 and i4 have around the same performance on VFX64. On
gforth-fast, m and i3 seem to have the same performance, and i, i2,
and i4 have slight differences, with the order being m/i2, i4, i, i2
(from fastest to slowest). These differences are so small that we can
choose among them by preference.
For comparison, here are the measurements from <
2013Aug15.191549@mips.complang.tuwien.ac.at> on a 3GHz Core 2 Duo
E8400:
vfxlin
m-a m r i i2 d
0.70 0.68 0.69 0.68 0.69 0.80 1000 * 10000
0.84 0.83 0.84 0.84 0.85 0.95 10000 * 1000
1.00 0.98 1.00 1.00 1.01 1.12 100000 * 100
1.15 1.14 1.14 1.15 1.16 1.27 1000000 * 10
1.27 1.28 1.28 1.29 1.30 1.43 10000000 * 1
gforth-fast 64-bit
m-a m r i i2 d
2.22 2.10 2.16 2.12 2.13 2.53 1000 * 10000
2.73 2.61 2.66 2.62 2.64 3.03 10000 * 1000
3.22 3.11 3.16 3.12 3.14 3.53 100000 * 100
3.72 3.61 3.66 3.62 3.63 4.03 1000000 * 10
4.26 4.14 4.21 4.14 4.18 4.57 10000000 * 1
- anton
--
M. Anton Ertl
http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs:
http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard:
https://forth-standard.org/
EuroForth 2023:
https://euro.theforth.net/2023
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)