This is consistent with my observations. There is typically
a speed difference of around 10 per cent between code variants
with stack juggling and with locals. The difference is irrelevant
for the vast majority of tasks.
The gain in readability, on the other hand, is often enormous.
But that's another old, worn-out discussion.
With locals in CPU registers, I expect an even smaller speed
difference. However, I have not yet implemented and tested this.
On 10/04/2024 6:54 am, mhx wrote:[..]
minforth wrote:
VFX doesn't put much effort into optimizing locals code. Should they have? Or would that have encouraged users to write code that lacks thought and leave it to the compiler to make it efficient? Which is the C approach.
On 10/04/2024 3:25 pm, mhx wrote:
dxf wrote:
Do you have a disassembly for Fib1? I ask because NT/Forth, which purports
to do exactly that, produces notably different results - despite there being little 'stack juggling' to resolve.
With locals in CPU registers, I expect an even smaller speed
difference. However, I have not yet implemented and tested this.
r 1->0 third 1->2 >l >l 1->1 dup 1->1mov -$08[r14],r8 mov r15,$10[r13] >l mov $00[r13],r8
2->1 add r14,$08 mov rax,rbp mov rbx,[r14]mov -$08[r13],r15 mov rax,[rbx] lea rbp,-$08[rbp] add r14,$08
I wouldn't know what those authors think, believe, or try to enforce.
There has been no effort spent in optimizing iForth's locals. The shown result is simply a byproduct of the architecture of the compiler.
It is not my task to guide the users, they can do whatever they ask.
I believe my actions are consistent with that.
Locals are completely OK if you want to get stuff done. I use them for
the boring parts.
On 10/04/2024 5:34 pm, minforth wrote:
mhx wrote:
Locals are completely OK if you want to get stuff done. I use them for
the boring parts.
I think the attitude of "programming to make the compiler happy i.e.
by stack juggling" is a waste of human resources. I can't remember a
single case where a performance bottleneck was fixed by switching
from a local to a non-local code formulation. The difference in speed,
if any, is simply too small.
To use forth's stack until such time as it becomes too much for one
suggests a half-heartedness. It begs the question why use forth at all
if one is merely going to toy with it.
: 3dup.2 ( a b c -- a b c a b c ) 2 pick 2 pick 2 pick ;...
: 3dup.4 ( a b c -- a b c a b c ) dup 2over rot ;
For comparison, here's what gforth-fast currently gives you:
3dup.2                    3dup.4
third 1->2                dup 1->1
  mov r15,$10[r13]          mov $00[r13],r8
third 2->3                  sub r13,$08
  mov r9,$08[r13]         2over 1->3
third 3->1                  mov r15,$18[r13]
  mov $00[r13],r8           mov r9,$10[r13]
  sub r13,$18             rot 3->1
  mov $10[r13],r15          mov $00[r13],r15
  mov $08[r13],r9           sub r13,$10
;s 1->1                     mov $08[r13],r9
  mov rbx,[r14]           ;s 1->1
  add r14,$08               mov rbx,[r14]
  mov rax,[rbx]             add r14,$08
  jmp eax                   mov rax,[rbx]
                            jmp eax
Interestingly, 3DUP.2 is essentially the same as the lxf/ntf variant:
load two items from the memory part of the stack, store three items
to the memory part of the stack, and one stack-pointer update. The
differences are:
* 64-bit vs. 32-bit cells
* different way of returning to the caller (;S code vs. ret)
* sub vs. lea for the stack-pointer update
* different order of loads and stores
* different register allocation
But this is just a happy coincidence, and the other versions make it
clear that gforth-fast does not analyse the data flow even in
straight-line code.
However, it's more difficult when control flow is involved, and most native-code compilers register-allocate only in straight-line code.[..]
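For readers following the comparison, here is a minimal sketch that puts a stack-only formulation next to a locals formulation of the same word. 3dup.2 is repeated from the thread; 3dup.loc is a hypothetical name introduced only for this illustration, written with the Forth-2012 {: ... :} locals syntax, and it is not claimed to be the locals variant that the posters actually measured. What code a given compiler generates for either word is not shown here and would have to be checked with that system's disassembler.

```forth
\ Stack-only formulation, as quoted in the thread.
: 3dup.2 ( a b c -- a b c a b c )  2 pick 2 pick 2 pick ;

\ Hypothetical locals formulation (assumes Forth-2012 {: ... :} support).
\ The rightmost local, c, receives the top of the stack.
: 3dup.loc ( a b c -- a b c a b c )  {: a b c :}  a b c a b c ;
```

Both words leave the same six items on the stack; the only difference is how the source expresses the data flow.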
The local version loses track of constants.
On 10/04/2024 8:38 pm, minforth wrote:
dxf wrote:
On 10/04/2024 5:34 pm, minforth wrote:
mhx wrote:
Locals are completely OK if you want to get stuff done. I use them for the boring parts.
I think the attitude of "programming to make the compiler happy i.e.
by stack juggling" is a waste of human resources. I can't remember a
single case where a performance bottleneck was fixed by switching
from a local to a non-local code formulation. The difference in speed, if any, is simply too small.
To use forth's stack until such time as it becomes too much for one
suggests a half-heartedness. It begs the question why use forth at all if one is merely going to toy with it.
Yes, of course it's entirely up to you if you don't fully utilise the
available possibilities of your tools.
In calling locals 'a tool' or 'used for the boring stuff' I'd have the problem of explaining to colleagues how it is I can use a language in a
way that its creator has dismissed as antithetical.
But I also suspect that your Forth
system has no locals at all.
DX-Forth offers locals as a loadable option. They're implemented
efficiently so as to be credible. I'm not aware of any user that has
availed themselves of it. Indeed I find they come to forth looking
for something that's honestly unique and challenging. To discover
Moore still resolute after all these years provides their role model.
mhx wrote:
The local version loses track of constants.
Interesting aspect. Within the test words all 3dup variants had been
inlined, I assume.
However inlining semantically identical words with locals is
"not commutative", to grossly abuse the algebraic expression. Hmm...
minforth@gmx.net (minforth) writes:
mhx wrote:
The local version loses track of constants.
Interesting aspect. Within the test words all 3dup variants had been inlined, I assume.
However inlining semantically identical words with locals is
"not commutative", to grossly abuse the algebraic expression. Hmm...
I think mathematicians have a word for what you mean, but it's not "commutative".
Anyway, it's not specific to locals. Everything that loses
information will affect everything that comes afterwards. E.g., in
Gforth we have a literal stack that only represents literals on the
data stack. So anything that moves values from the data stack (e.g.,
to the return stack) will lose the information about the constant.
- anton
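The point about information loss can be illustrated with two formulations of the same computation. This is a sketch only: the word names add5-direct and add5-via-r are hypothetical, and whether a particular system folds the literal in either word is an assumption to be verified with its own disassembler (for example via SEE), not something shown here.

```forth
\ The literal 5 stays on the data stack until it is consumed by +.
: add5-direct ( n -- n+5 )  5 + ;

\ The same literal takes a detour through the return stack. Per the
\ post above, a compiler that tracks literals only on the data stack
\ no longer knows that the value coming back from R> is the constant 5.
: add5-via-r ( n -- n+5 )  5 >r r> + ;
```

Both words return the same result; only the amount of information available to the compiler at the + differs.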
To use forth's stack until such time as it becomes too much for one
suggests a half-heartedness. It begs the question why use forth at all
if one is merely going to toy with it.
Moore and Fox discussed that. Making a non-optimal approach more
efficient isn't the answer.