Question to Stephen Pelc: After all, your wonderful optimization fairy tale can be continued. Why not use a single instruction:
MOV RDX, [8*R14+00977830]
instead of two of yours:
MOV RDX, R14
MOV RDX, [8*RDX+00977830]
or even load the target register RBX right away:
MOV RBX, [8*R14+00977830]
instead of your 3 instructions?
MOV RDX, R14
MOV RDX, [8*RDX+00977830]
MOV RBX, RDX
Question to all of you: Can you rewrite this algorithm so that it becomes Forth's pride and not Forth's insult?
Maybe you might want to use variables or locals, or add human-readable comments? You don't have to stay in CORE, but even if you want to stay there like Waldek, then VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear of using variables comes from among Forthers.
Hello Waldek, Stephen and others!
Not so long ago my compatriot, professor Waldek Hebisch, "has coded" the Forth version of the Heap Sort Algorithm.
http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs
Heap Sort is a beautiful algorithm whose study will give a lot to each of you! You will learn what "binary trees" are, when they are "complete", how to represent a binary tree as an ordinary array, why this array should be indexed from 1, and what relates this binary index to moving through the tree. You will learn what "heap condition" a complete binary tree must satisfy in order to be a "heap". Finally, you will learn the surprising, non-obvious thing: why you can build a heap from arbitrary data in O(n) worst-case time, which is faster than you thought!
The Heap Sort Algorithm sorts in O(n*log n) worst-case time, which is better than QuickSort, whose worst case is O(n^2). Admittedly Merge Sort has this time too, but Heap Sort sorts in place and Merge Sort needs a second, auxiliary array.
: sift
( rra ra ir ii )
DUP 2 * 1+ ( rra ra ir ii jj )
BEGIN
>R OVER R@ SWAP R> OVER OVER ( rra ra ir ii jj ir jj ir jj )
> WHILE ( rra ra ir ii jj ir jj )
1+ > IF ( rra ra ir ii jj )
>R >R OVER R> SWAP R> SWAP OVER ( rra ra ir ii jj ra jj )
CELLS + DUP @ SWAP CELL+ @ < IF ( rra ra ir ii jj )
1 +
THEN
THEN ( rra ra ir ii jj )
>R >R >R OVER OVER R> ROT ROT R> ROT ROT R@
ROT ROT R> ( rra ra ir ii jj rra ra jj )
CELLS OVER + @ ROT ( rra ra ir ii jj ra rr_jj rra )
OVER < IF ( rra ra ir ii jj ra rr_jj )
>R >R OVER CELLS R> + R> SWAP !
SWAP DROP DUP DUP + 1+
ELSE
DROP DROP DROP ( rra ra ir ii )
OVER 1 +
THEN ( rra ra ir ii jj )
REPEAT
DROP DROP DROP SWAP DROP ( rra ra ii )
CELLS + !
;
: heapsort ( ra n )
0 OVER 1 - 2 / DO
OVER OVER OVER ( ra n ra n ra )
I CELLS + @ ROT ROT I ( ra n rra ra n I )
sift
-1 +LOOP
1 - 1 SWAP DO ( ra )
DUP DUP I CELLS + ( ra ra ra+I )
DUP @ ( ra ra ra+I rra )
SWAP ROT ( ra rra ra+I ra )
@ SWAP ! ( ra rra )
I 1 = IF
OVER !
ELSE
OVER I 0 ( rra ra I 0 )
sift
THEN
-1 +LOOP
;
As everybody can see, the above code is ugly. Alas, this code is even an example of why not to use Forth.
But when we use VFX Forth, the above code becomes beautiful.
It's because of Stephen who designed the VFX code generator.
Be sure to disassemble words sift and heapsort to see this beauty!
Hello Waldek, Stephen and others!
Not so long ago my compatriot professor Waldek Hebisch "has coded" the Forth version of the Heap Sort Algorithm.
http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs
As everybody can see, the above code is ugly. Alas, this code is even an example of why not to use Forth. Admit it: the code discourages you.
You probably want to see how it works. Let's create an array of 10 numbers and sort it:
create example 13 , 7 , 9 , 12 , 1 , 8 , 2 , 99 , 14 , 3 ,
: print 10 0 do example i cells + @ . loop ;
example 10 heapsort
Note 1: The word heapsort incorrectly leaves an address on the stack. There is no DROP before ; in heapsort.
There is even more junk on the stack after all of rama_taxis is done.
Question to Waldek: But basically your heapsort works, so it could be used all over the world. The question of copyright arises. Do you give your heapsort code to the community for free and agree to any use of your heapsort, even without mentioning that it is you who "has coded" it?
Question to all of you: Can you rewrite this algorithm so that it becomes Forth's pride and not Forth's insult?
Maybe you might want to use variables or locals or arrange human comments?
You don't have to stay in CORE, but even if you want to stay there like Waldek, then VARIABLE and ( ) are rather in Core, right Waldek? I don't know where this fear of using variables comes from among Forthers.
But when we use VFX Forth, the above code becomes beautiful.
It's because of Stephen who designed the VFX code generator.
Be sure to disassemble words sift and heapsort to see this beauty!
dasm sift
( 00977880 488BD3 ) MOV RDX, RBX
( 00977883 48D1E3 ) SHL RBX, # 1
( 00977886 48FFC3 ) INC RBX
( 00977889 488D6DF8 ) LEA RBP, [RBP+-08]
( 0097788D 48895500 ) MOV [RBP], RDX
( 00977891 90 ) NOP
( 00977892 90 ) NOP
( 00977893 90 ) NOP
( 00977894 90 ) NOP
( 00977895 90 ) NOP
( 00977896 90 ) NOP
( 00977897 90 ) NOP
( 00977898 53 ) PUSH RBX
( 00977899 488B1C24 ) MOV RBX, [RSP]
( 0097789D 5A ) POP RDX
( 0097789E 483B5508 ) CMP RDX, [RBP+08]
( 009778A2 488D6DF0 ) LEA RBP, [RBP+-10]
( 009778A6 488B4D18 ) MOV RCX, [RBP+18]
( 009778AA 48894D00 ) MOV [RBP], RCX
( 009778AE 48895D08 ) MOV [RBP+08], RBX
( 009778B2 488BDA ) MOV RBX, RDX
( 009778B5 0F8DC1000000 ) JNL/GE 0097797C
( 009778BB 48FFC3 ) INC RBX
( 009778BE 483B5D00 ) CMP RBX, [RBP]
( 009778C2 488B5D08 ) MOV RBX, [RBP+08]
( 009778C6 488D6D10 ) LEA RBP, [RBP+10]
( 009778CA 0F8D2A000000 ) JNL/GE 009778FA
( 009778D0 53 ) PUSH RBX
( 009778D1 48FF7500 ) PUSH QWORD [RBP]
( 009778D5 5B ) POP RBX
( 009778D6 5A ) POP RDX
( 009778D7 488BCA ) MOV RCX, RDX
( 009778DA 48C1E103 ) SHL RCX, # 03
( 009778DE 48034D10 ) ADD RCX, [RBP+10]
( 009778E2 488B01 ) MOV RAX, 0 [RCX]
( 009778E5 483B4108 ) CMP RAX, [RCX+08]
( 009778E9 48895D00 ) MOV [RBP], RBX
( 009778ED 488BDA ) MOV RBX, RDX
( 009778F0 0F8D04000000 ) JNL/GE 009778FA
( 009778F6 4883C301 ) ADD RBX, # 01
( 009778FA 53 ) PUSH RBX
( 009778FB 48FF7500 ) PUSH QWORD [RBP]
( 009778FF 48FF7508 ) PUSH QWORD [RBP+08]
( 00977903 5B ) POP RBX
( 00977904 5A ) POP RDX
( 00977905 488B0C24 ) MOV RCX, [RSP]
( 00977909 58 ) POP RAX
( 0097790A 48C1E003 ) SHL RAX, # 03
( 0097790E 48034510 ) ADD RAX, [RBP+10]
( 00977912 4C8B00 ) MOV R8, 0 [RAX]
( 00977915 4C3B4518 ) CMP R8, [RBP+18]
( 00977919 488D6DF0 ) LEA RBP, [RBP+-10]
( 0097791D 488B4520 ) MOV RAX, [RBP+20]
( 00977921 48894500 ) MOV [RBP], RAX
( 00977925 48894D08 ) MOV [RBP+08], RCX
( 00977929 48895510 ) MOV [RBP+10], RDX
( 0097792D 48895D18 ) MOV [RBP+18], RBX
( 00977931 498BD8 ) MOV RBX, R8
( 00977934 0F8E31000000 ) JLE/NG 0097796B
( 0097793A 53 ) PUSH RBX
( 0097793B 48FF7500 ) PUSH QWORD [RBP]
( 0097793F 488B5D10 ) MOV RBX, [RBP+10]
( 00977943 48C1E303 ) SHL RBX, # 03
( 00977947 5A ) POP RDX
( 00977948 4803DA ) ADD RBX, RDX
( 0097794B 5A ) POP RDX
( 0097794C 488913 ) MOV 0 [RBX], RDX
( 0097794F 488B5D08 ) MOV RBX, [RBP+08]
( 00977953 48035D08 ) ADD RBX, [RBP+08]
( 00977957 48FFC3 ) INC RBX
( 0097795A 488B5508 ) MOV RDX, [RBP+08]
( 0097795E 48895510 ) MOV [RBP+10], RDX
( 00977962 488D6D10 ) LEA RBP, [RBP+10]
( 00977966 E90C000000 ) JMP 00977977
( 0097796B 488B5D18 ) MOV RBX, [RBP+18]
( 0097796F 4883C301 ) ADD RBX, # 01
( 00977973 488D6D10 ) LEA RBP, [RBP+10]
( 00977977 E91CFFFFFF ) JMP 00977898
( 0097797C 488B5D10 ) MOV RBX, [RBP+10]
( 00977980 48C1E303 ) SHL RBX, # 03
( 00977984 48035D20 ) ADD RBX, [RBP+20]
( 00977988 488B5528 ) MOV RDX, [RBP+28]
( 0097798C 488913 ) MOV 0 [RBX], RDX
( 0097798F 488B5D30 ) MOV RBX, [RBP+30]
( 00977993 488D6D38 ) LEA RBP, [RBP+38]
( 00977997 C3 ) RET/NEXT
( 280 bytes, 86 instructions )
Thank you Waldek on behalf of the community and my own for your heapsort code! :-)
I have been working on the code since the morning. I hope I am not the only one, and that if not all of you, then at least Zombie (Anton's new nickname) is also looking for holes and ways to rewrite the code ;)
So far I have found a non-critical mistake in the implementation of the algorithm. Take it easy: sorting still works well, thanks to a favorable coincidence.
Note 2: The algorithm starts with building the heap. Running from the end through the tree nodes, it tries to restore the heap condition to each node.
Leaf nodes always satisfy the heap condition. So we don't have to fix them. We can omit leaves from the heap-building process. It is why we start the loop with the index of the last parent node.
Remember that in practice, unfortunately, we index the array from 0, not from 1 as the theory would like.
In our situation, the index of the last parent node is n/2-1. You can check it by drawing on a sheet of paper. Unfortunately, at the beginning of heapsort Waldek calculates it like this:
OVER 1 - 2 /
This means that Waldek calculates (n-1)/2. For example, if we wanted to sort 11 items, n = 11. The correct formula gives n/2-1 = 4 (we use integer arithmetic); Waldek's formula gives (n-1)/2 = 5.
Check on the paper to see who is right. The last parent node has index 4, starting with 0. Node 5 is a leaf.
The code should read 2/ 1- after OVER at the beginning of the heapsort word (by the way, both 2/ and 1- are in Core):
OVER 2/ 1-
Despite this mistake, Waldek's heapsort code works, but it sometimes does unnecessary work.
Meanwhile, while waiting for more results...
I found Marcel Hendrix's post in another thread from 16 years ago. It shows what Marcel was happy about back then and, with fond memories, lets you see how our community has made progress ;)
The discussion was about the "forth Insertion Sort". The rest of my post is a quote from Marcel:
Hello Waldek, Stephen and others!
Not so long ago my compatriot professor Waldek Hebisch "has coded" the
Forth version of the Heap Sort Algorithm.
http://www.math.uni.wroc.pl/~hebisch/prog/taxi_hs.fs
Heap Sort is a beautiful algorithm which learning will give a lot to each
of you! You will learn what "binary trees" are, when they are "complete",
how to represent a binary tree as an ordinary array, why this array should
be indexed from 1, and what relates this binary index to moving through
the tree. You will learn what "heap condition" a complete binary tree must satisfy in order to be a "heap". Finally, you will learn the surprising non-obvious thing, why you are creating a heap of arbitrary data in O(n) pessimistic time, which is faster than you thought!
The Heap Sort Algorithm sorts in pessimistic O(n*log n) time which is
better than QuickSort which does it in O(n^2). Admittedly Merge Sort has
this time too, but Heap Sort sorts in place and Merge Sort needs extra
second memory.
albert@cherry.(none) (albert) writes:
its worst case O(log(n)) memory for keeping track.
Note that quicksort needs worst case O(n) for this, which can be
diminished heuristically -not guaranteed- to O(log(n)).
Doesn't quicksort guarantee O(log(n)) if you always sort the smaller partition first after pivoting?
To quote my numerical math professor:
In a pathological case the smaller partition can be O(1) size.
That makes for O(n) steps.
Hello Waldek, I have been still analyzing your heapsort code.
I must praise you for how you swap the values in the nodes of the subtree in the sift word.
I really like how you delay saving the subtree's root value until you reach the target location :-)
It looks like you are probably proud of your sift word and paid less attention to the heapsort word ;)
Your sift word uses four parameters, which you named as follows:
rra is the value of the ii-th array element,
ra is the address of the array,
ir is the total number of elements in the array,
ii is the index of the subtree's root node, which is where we want to restore the heap condition.
You also use the notation jj for the index of the left child of node ii.
Note 3: Your word sift controls index ranges very well and also works well for a single-element array, when invoked with the parameters ir = 1 and ii = 0, right?
In the sorting loop in the heapsort word near the end of the code you put the following condition IF:
I 1 = IF
OVER !
ELSE
OVER I 0 ( rra ra I 0 )
sift
THEN
This IF condition is tested n-1 times in the loop. You do these costly tests just to distinguish the case of I = 1.
But this is not needed because the case I = 1 is handled as well as the rest of the cases by sift.
Even if you wanted to speed up this one case, it is definitely not worth doing at the price of n-1 checks.
But perhaps the real reason for highlighting this case was some concern about the execution of this edge case.
Maybe when you started writing the sift word, you didn't know how it would end, and the sift evolved - you applied optimizations.
Please correct me if I am wrong, but all this IF is not needed and all you do after ELSE is enough.
This is quite a valuable technique. It is an alternative to calling c-routines
from Forth.
I checked Mouse's (Waldek's new nickname) information about how good his code looks when compiled as C. On my Fedora Linux 36 (which is known to be very convenient for programmers, because it has the latest software versions installed out of the box) I have installed:
gcc (GCC) 12.1.1 20220507 (Red Hat 12.1.1-1)
I got similarly good results as Mouse. Although gcc achieves excellent optimization results, they are still far from perfect. gcc didn't really show off when we replaced jj = ir with break in sift, which should have helped gcc but destroyed the optimization of the entire code ;(
I have a gift for all of you: Mouse's Ugly Heapsort written in x64 assembly language that works in all native 64-bit Forths.
It is supposed to be the best, that is, the fastest heapsort ever created. We define "the best" in such a way that if, after the presentation of the code and your criticism (which the whole world counts on), no one can speed up or improve this code, then the code is practically the best.
Since this code will be used by future generations, long after we are gone, try to improve it to leave something for the
world after our vacation this year. Of course, all this code is intended for free use by anyone without even mentioning
the authors.
First, I will describe the programming technique used.
Since the code is supposed to work on every native 64-bit Forth, we cannot use the built-in assembler because this one is
different in every Forth.
We will use an external assembler which will generate a binary code that can be inserted into any 64-bit native Forth.
In this case, I used NASM that you can install on Fedora with the command: sudo dnf install nasm
Source files in NASM are very easy to write, just remember to put one directive at the beginning of the file: BITS 64
The default NASM format is binary (plain code, no extra data), so we don't need to specify the format when assembling. We
can add the generation of a so-called listing file, which can help us make choices about the use of certain instructions.
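For orientation, a minimal flat-binary NASM source might look like this (a hypothetical file of my own, not the actual heapsort.asm):

```nasm
; minimal.asm - a hypothetical minimal flat-binary example.
; Assemble with:  nasm -l minimal.lst minimal.asm
BITS 64                 ; the one mandatory directive: emit 64-bit code
        mov   rax, rbx  ; any instructions; no sections, no format header
        ret             ; output is plain machine code, ready to be
                        ; copied into a code word of a native Forth
```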
We will be dealing with three files written in assembly language: heapsort.asm, prolog.asm, epilog.asm. We assemble these
files with the following commands:
nasm -l heapsort.lst heapsort.asm
nasm -l prolog.lst prolog.asm
nasm -l epilog.lst epilog.asm
On Tuesday, August 23, 2022 at 9:32:56 PM UTC+2, none albert wrote:
[..]
This is quite a valuable technique. It is an alternative to calling c-routines
from Forth.
Hardly. I am not going to write, e.g., the Faddeev-Leverrier algorithm
in assembler.
-marcel