Forum: >>> Magnum BBS <<<

Pushing and Popping FP numbers to/from Return Stack

From Krishna Myneni@21:1/5 to All on Sun May 21 18:05:28 2023

On systems with a separate floating point stack (Forth 200x standard
systems), one disadvantage of the separate fp stack is the occasional
need to move floating point numbers off the fp stack temporarily.
Temporary fp variables are okay for simple solutions, but are restrictive
compared to having a stack onto which the fp numbers are pushed and popped.

On Forth systems which provide FP@ and FP! for getting and setting the
floating point stack pointer, and RP@ and RP! for the return stack
pointer, the words F>R and FR> may be defined in source as shown below.

Assumptions:

1. The width of a floating point stack item, in bytes, is given by 1 FLOATS
2. The width above does not need to be a multiple of 1 CELLS
3. The floating point stack is contiguous.

The following code works in kForth-64 and in Gforth.

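1 floats constant fpsize    \ bytes per fp stack item
\ fpsize rounded up to a whole number of cells, in bytes
fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells

\ From Wil Baden's toolkit
\ Each :inline definition must sit on one line: PARSE stops at the end
\ of the line when loading from a text file or the terminal.
: :inline ( "name <char> ccc<char>" -- )
  : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
  POSTPONE ; IMMEDIATE ;

\ Copy the fp stack top into space reserved on the return stack
:inline f>r ( F: r -- ) ( r: -- i*x ) fp@ rp@ fpcells - 2dup fpsize move rp! fpsize + fp! ;
\ Copy the float back to the fp stack and release the return stack space
:inline fr> ( F: -- r ) ( r: i*x -- ) rp@ fp@ fpsize - 2dup fpsize move fp! fpcells + rp! ;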

\ Test code

: test1 f>r fr> ;
3.214e test1 \ F: 3.214e

: test2 fdup f>r 1e f- f>r fr> fr> ;
1.23e test2 \ F: 2.3e-1 1.23e0
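
As a side note on assumption 2, the rounding that FPCELLS performs can be tried in isolation. This is only a sketch: CELL-ROUNDUP is a hypothetical helper name, not part of the code above or of any standard word set.

```forth
\ Round a byte count up to the next multiple of the cell size.
\ Same arithmetic as the FPCELLS definition, but as a runtime word.
: cell-roundup ( u -- u' )
  1 cells /mod swap if 1+ then cells ;

1 floats cell-roundup .      \ bytes reserved on the return stack per float
1 cells 1+ cell-roundup .    \ one byte more than a cell needs two cells
```

On a system with 8-byte cells and 10-byte floats this would reserve 16 bytes per float; with 8-byte floats it reserves 8.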

--
Krishna Myneni

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From dxforth@21:1/5 to Krishna Myneni on Mon May 22 19:57:30 2023

On 22/05/2023 9:05 am, Krishna Myneni wrote:

> On systems with a separate floating point stack (Forth 200x standard
> systems), one disadvantage of the separate fp stack is the occasional
> need to push and pop floating point numbers off the fp stack. Temporary
> fp variables are okay for simple solutions, but are restrictive compared
> to having a stack onto which the fp numbers are pushed and popped.

I'd be interested in the case for having F>R FR> should anyone care to make it (?)
I very rarely use fp but there was an instance in implementing fp on my system where F>R FR> would have come in handy. In the end I used a hidden variable but
it did make me wonder. Implementation in my case would be a trivial code definition.

From minforth@21:1/5 to Krishna Myneni on Mon May 22 20:48:50 2023

Krishna Myneni wrote on Monday, 22 May 2023 at 01:05:33 UTC+2:

> [..]
> The following code works in kForth-64 and in Gforth.
> [..]

FP-alignment was obviously no issue here.

From Krishna Myneni@21:1/5 to dxforth on Tue May 23 21:38:17 2023

On 5/22/23 04:57, dxforth wrote:

> I'd be interested in the case for having F>R FR> should anyone care to make it (?)
> I very rarely use fp but there was an instance in implementing fp on my system
> where F>R FR> would have come in handy. In the end I used a hidden variable but
> it did make me wonder. Implementation in my case would be a trivial code definition.

I think F>R and FR> will be useful as primitives. I plan to implement
them as such in kForth-64 (and in kForth-32 ver 3.x). The purpose of the
source code versions, for systems with return stack and fp stack pointer
words, is to exercise them a bit before baking them into the system as efficient words.

--
Krishna

From Krishna Myneni@21:1/5 to minforth on Tue May 23 21:49:04 2023

On 5/22/23 22:48, minforth wrote:

> Krishna Myneni wrote on Monday, 22 May 2023 at 01:05:33 UTC+2:
>> [..]
>> The following code works in kForth-64 and in Gforth.
>> [..]
>
> FP-alignment was obviously no issue here.

Maybe more generally we need to use CMOVE rather than MOVE to avoid the
issue of alignment. On Intel it will work as implemented above.

--
Krishna

From dxforth@21:1/5 to Krishna Myneni on Wed May 24 16:44:11 2023

On 24/05/2023 12:49 pm, Krishna Myneni wrote:

> On 5/22/23 22:48, minforth wrote:
>> ...
>> FP-alignment was obviously no issue here.
>
> Maybe more generally we need to use CMOVE rather than MOVE to avoid the
> issue of alignment. On Intel it will work as implemented above.

As a matter of interest are there any fp implementations that require better than
cell aligned addresses?

With regard to F>R FR> implementation, using F! and F@ should prove simpler and more portable. (I can't believe I said that :)

I misremembered the circumstances under which F>R FR> would have been beneficial
to me, so for now the routines will go into a library. The DTC definitions were:

https://pastebin.com/sSuUBKT2
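
The F! / F@ variant suggested above might look like the following. It is a sketch under assumptions, not a tested library routine: it assumes the return stack grows towards lower addresses, that RP@ returns the address of the top return stack item, and that the address is acceptable to F! and F@ (the float alignment question raised in this thread). FPSIZE, FPCELLS and :inline are taken unchanged from the original post.

```forth
1 floats constant fpsize
fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells

\ From Wil Baden's toolkit (as in the original post)
: :inline ( "name <char> ccc<char>" -- )
  : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
  POSTPONE ; IMMEDIATE ;

\ Reserve space on the return stack, then store the float there with F!
:inline f>r ( F: r -- ) ( R: -- i*x ) rp@ fpcells - dup rp! f! ;
\ Fetch the float with F@, then release the return stack space
:inline fr> ( F: -- r ) ( R: i*x -- ) rp@ f@ rp@ fpcells + rp! ;

: test2 fdup f>r 1e f- f>r fr> fr> ;
1.23e test2 f. f.   \ prints the top item first: 1.23, then 0.23
```

Because the words are text macros, they only make sense inside a colon definition, and F>R and FR> must be balanced before the definition exits.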

From Anton Ertl@21:1/5 to dxforth on Wed May 24 07:48:19 2023

dxforth <dxforth@gmail.com> writes:

> As a matter of interest are there any fp implementations that require better than
> cell aligned addresses?

32-bit Gforth on hardware that requires 8-byte alignment of DP FP
numbers, e.g., MIPS, SPARC or some ARM implementations. That's why
Gforth does not have F>R FR>.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

From Anton Ertl@21:1/5 to Krishna Myneni on Wed May 24 08:10:26 2023

Krishna Myneni <krishna.myneni@ccreweb.org> writes:

> Maybe more generally we need to use CMOVE rather than MOVE to avoid the
> issue of alignment. On Intel it will work as implemented above.

No. MOVE has no alignment requirements, CMOVE requires
character-aligned addresses (although on pretty much all systems that
is only a theoretical concern, because 1 char = 1 on them).

- anton
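
The practical difference between the two words shows up when source and destination overlap. The following is a small sketch in standard Forth (Core plus the String word set); BUF, RESET-BUF and SHOW-BUF are hypothetical names used only for this illustration.

```forth
create buf 8 chars allot

: reset-buf ( -- )  s" abcdefgh" buf swap cmove ;
: show-buf  ( -- )  buf 8 type cr ;

\ Copy 7 characters from BUF to BUF+1: source and destination overlap.
\ CMOVE works upwards one character at a time, so the first one propagates.
reset-buf  buf buf char+ 7 cmove       show-buf   \ aaaaaaaa
\ MOVE leaves the destination holding what the source held before the move.
reset-buf  buf buf char+ 7 chars move  show-buf   \ aabcdefg
```

For the F>R and FR> definitions in this thread the two regions never overlap, so either word copies the same bytes.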

From albert@21:1/5 to dxforth@gmail.com on Wed May 24 10:57:57 2023

In article <u4kbnp$2sufu$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:

> As a matter of interest are there any fp implementations that require
> better than cell aligned addresses?

You mean 64 bit floats on a 32 bit system? Sure.
ciforth doesn't bother to spoil the precision of 80 bit Intel stack
floats. Storing those in memory costs 10 bytes.

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

From Anton Ertl@21:1/5 to Krishna Myneni on Wed May 24 08:35:07 2023

Krishna Myneni <krishna.myneni@ccreweb.org> writes:

> [..]
> Assumptions:
>
> 1. The width of the floating point stack, in bytes, is given by 1 FLOATS
> 2. The width above does not need to be a multiple of 1 CELLS
> 3. The floating point stack is contiguous.
>
> The following code works in kForth-64 and in Gforth.
>
> 1 floats constant fpsize
> fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells
>
> \ From Wil Baden's toolkit
> : :inline ( "name <char> ccc<char>" -- )
> : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
> POSTPONE ; IMMEDIATE ;
>
> :inline f>r ( F: r -- ) ( r: -- i*x ) fp@ rp@ fpcells - 2dup fpsize move rp! fpsize + fp! ;
> :inline fr> ( F: -- r ) ( r: i*x -- ) rp@ fp@ fpsize - 2dup fpsize move fp! fpcells + rp! ;

This suffers from the usual pitfalls of EVALUATE-based macros <http://www.complang.tuwien.ac.at/forth/why-evaluate-is-bad>.

This assumes that the stacks grow towards lower addresses. That's the
case for most Forth systems.

It also assumes that the top-of-FP-stack and top of return stack are in
memory rather than in a register (maybe that's subsumed by assumption
3); that assumption does not hold for gforth-fast, but FP@ and FP! try
to support code that makes that assumption: FP@ stores the FTOS to
memory, and FP! stores the FTOS register into memory before changing
FP and loads it from memory afterwards; but you need to be careful
when to perform FP@, FP! and other words that involve FTOS. The usage
in the code above is fine.

It also assumes that RP is cell-aligned after subtracting or adding 1
FLOATS, or that RP does not require cell alignment; this assumption is
true for all Forth systems I am aware of: Either 1 FLOATS is a
multiple of the cell size (e.g., all Gforth ports), or the system does
not require alignment (e.g., systems on IA-32/AMD64 that use 10-byte
floats).

- anton

From albert@21:1/5 to Anton Ertl on Wed May 24 11:02:41 2023

In article <2023May24.094819@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

> dxforth <dxforth@gmail.com> writes:
>> As a matter of interest are there any fp implementations that require
>> better than cell aligned addresses?
>
> 32-bit Gforth on hardware that requires 8-byte alignment of DP FP
> numbers, e.g., MIPS, SPARC or some ARM implementations. That's why
> Gforth does not have F>R FR>.

I guess a pinch of ingenuity solves this. For example, the 64-bit
MS-Windows requirement of 16-byte alignment for parameters
is easy to fulfil (once you start thinking about it).

Groetjes Albert

From albert@21:1/5 to Anton Ertl on Wed May 24 11:05:46 2023

In article <2023May24.101026@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

> Krishna Myneni <krishna.myneni@ccreweb.org> writes:
>> Maybe more generally we need to use CMOVE rather than MOVE to avoid the
>> issue of alignment. On Intel it will work as implemented above.
>
> No. MOVE has no alignment requirements, CMOVE requires
> character-aligned addresses (although on pretty much all systems that
> is only a theoretical concern, because 1 char = 1 on them).

One of these requires memory propagation, and the other is intelligent,
i.e. moves up or down with overlapping regions. That is the most
important distinction. It is hard to remember which is which though.

Groetjes Albert

From dxforth@21:1/5 to Anton Ertl on Wed May 24 18:23:03 2023

On 24/05/2023 5:48 pm, Anton Ertl wrote:

> ...
> That's why
> Gforth does not have F>R FR>.

So not cell-aligned but what excludes those functions?

From dxforth@21:1/5 to albert on Wed May 24 19:42:05 2023

On 24/05/2023 7:05 pm, albert wrote:

> In article <2023May24.101026@mips.complang.tuwien.ac.at>,
> Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
>> No. MOVE has no alignment requirements, CMOVE requires
>> character-aligned addresses (although on pretty much all systems that
>> is only a theoretical concern, because 1 char = 1 on them).
>
> One of these requires memory propagation, and the other is intelligent,
> i.e. moves up or down with overlapping regions. That is the most
> important distinction. It is hard to remember which is which though.

Historically CMOVE did characters and MOVE did cells. ANS and 200x messed
with the latter, each in its own way.

From Anton Ertl@21:1/5 to dxforth on Wed May 24 16:23:25 2023

dxforth <dxforth@gmail.com> writes:

> On 24/05/2023 5:48 pm, Anton Ertl wrote:
>> ...
>> That's why
>> Gforth does not have F>R FR>.
>
> So not cell-aligned but what excludes those functions?

F>R would be something like (on MIPS):

addiu $rp, $rp, -8
sd $ftos, 0($rp)
ld $ftos, 0($fp)
addiu $fp, $fp, 8
NEXT

But rp is not necessarily 8-byte-aligned, and if not, this produces a
SIGBUS. An implementation that does not have this problem would
require significantly more instructions, and is slower than one might
expect. By contrast, the locals stack pointer is always aligned for
cells and floats, so you don't run into this problem with FP locals.

- anton

From Bernd Linsel@21:1/5 to Anton Ertl on Wed May 24 19:23:50 2023

On 24.05.2023 18:23, Anton Ertl wrote:

> [..]
> But rp is not necessarily 8-byte-aligned, and if not, this produces a
> SIGBUS. An implementation that does not have this problem would
> require significantly more instructions, and is slower than one might
> expect. By contrast, the locals stack pointer is always aligned for
> cells and floats, so you don't run into this problem with FP locals.

remedy ($tmp is placeholder for a free scratch register):
Consume three words on the return stack and 8-align ftos within these.

FtoR:   addiu $rp, $rp, -12
        addiu $tmp, $rp, 7
        ins $tmp, $zero, 0, 3 # up-align tmp to multiple of 8
        sd $ftos, 0($tmp)
        ld $ftos, 0($fp)
        addiu $fp, $fp, 8
        NEXT

FRfrom: addiu $fp, $fp, -8
        sd $ftos, 8($fp)
        addiu $tmp, $rp, 7
        ins $tmp, $zero, 0, 3 # up-align tmp to multiple of 8
        ld $ftos, 0($tmp)
        addiu $rp, $rp, 12
        NEXT

Kind regards
--
Bernd Linsel

From Anton Ertl@21:1/5 to Bernd Linsel on Wed May 24 17:36:23 2023

Bernd Linsel <bl1-thispartdoesnotbelonghere@gmx.com> writes:

> [..]
> remedy ($tmp is placeholder for a free scratch register):
> Consume three words on the return stack and 8-align ftos within these.
>
> FtoR:   addiu $rp, $rp, -12
>         addiu $tmp, $rp, 7
>         ins $tmp, $zero, 0, 3 # up-align tmp to multiple of 8
>         sd $ftos, 0($tmp)
>         ld $ftos, 0($fp)
>         addiu $fp, $fp, 8
>         NEXT
> [..]

Clever! Now for MIPS I without the "ins" instruction. Maybe
something like:

F>R:

ori $tmp, $rp, 7
addiu $rp, $rp, -12
sd $ftos, -15($tmp)
ld $ftos, 0($fp)
addiu $fp, $fp, 8
NEXT

Not sure if I got the details right, but with some tweaks the
general idea should be workable. And likewise for FR>.

The end result would be that F>R consumes 12 bytes on the return
stack and costs an extra instruction on every target. Better than what I
had in mind, but still worse than F>L.

- anton
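
The address arithmetic of the two MIPS variants can be compared without a MIPS machine. The sketch below models only the slot address computation in Forth, using plain integers in place of real return stack addresses; SLOT-INS and SLOT-ORI are hypothetical names, and RP is assumed to be 4-byte aligned as on a 32-bit system.

```forth
\ Address of the 8-byte-aligned slot inside the 12 bytes reserved below RP.
\ addiu/addiu/ins variant: reserve first, then round up to a multiple of 8.
: slot-ins ( rp -- addr )  12 -  7 +  -8 and ;
\ ori variant: set the low three bits, then step down by 15.
: slot-ori ( rp -- addr )  7 or  15 - ;

\ Try both residues of RP modulo 8.
1000 slot-ins .  1000 slot-ori .   \ 992 992
1004 slot-ins .  1004 slot-ori .   \ 992 992
```

In both cases the 8 bytes starting at the computed address lie inside the 12 reserved bytes, from rp-12 up to rp-1.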

From Bernd Linsel@21:1/5 to Anton Ertl on Wed May 24 22:28:59 2023

On 24.05.2023 19:36, Anton Ertl wrote:

> [..]
> Clever! Now for MIPS I without the "ins" instruction. Maybe
> something like:
>
> F>R:
>
> ori $tmp, $rp, 7
> addiu $rp, $rp, -12
> sd $ftos, -15($tmp)
> ld $ftos, 0($fp)
> addiu $fp, $fp, 8
> NEXT
> [..]

For MIPS I, just replace

    ins $tmp, $zero, 0, 3

by

    .set push; .set noat
    addiu $at, $zero, -8    # mask clearing the low 3 bits
    and $tmp, $tmp, $at
    .set pop

It's been a long time since I've had to take care of MIPS ISA levels
below 4k.

Just took the opportunity to skim above code for use-after-load without
load delay slots and happily discovered that I did it intuitively right.
Maybe because of my habit of still taking care for load delay pipeline
bubbles when hand-coding MIPS32.

Kind regards,
--
Bernd Linsel

From Bernd Linsel@21:1/5 to All on Wed May 24 22:36:53 2023

On 24.05.2023 22:28, Bernd Linsel wrote:

> For MIPS I, just replace
>
>     ins $tmp, $zero, 0, 3
> by
> [..]

oops, overlooked that

    ori $tmp, $rp, 7
    sd $ftos, -15($tmp)

has exactly the same effect... sorry!

--
Bernd Linsel

From dxforth@21:1/5 to Anton Ertl on Thu May 25 12:55:04 2023

On 25/05/2023 2:23 am, Anton Ertl wrote:

> [..]
> But rp is not necessarily 8-byte-aligned, and if not, this produces a
> SIGBUS. An implementation that does not have this problem would
> require significantly more instructions, and is slower than one might
> expect.

Agreed. But then F>R and FR> are about convenience - not performance.

> By contrast, the locals stack pointer is always aligned for
> cells and floats, so you don't run into this problem with FP locals.

Extending locals beyond cells takes effort and resources. If one has
FP locals, granted there's not much point providing F>R FR> .

From Marcel Hendrix@21:1/5 to dxforth on Wed May 24 21:51:38 2023

On Thursday, May 25, 2023 at 4:55:08 AM UTC+2, dxforth wrote:
> [..]
> Extending locals beyond cells takes effort and resources. If one has
> FP locals, granted there's not much point providing F>R FR> .

Effort? In iForth the locals stack can hold extended floats (10 bytes),
which take 16 bytes of aligned memory (iForth specific). That means
that most of the aligned SSE/AVX type variables/registers can also
be held there. Extending the locals stack to be 256 / 512 bits wide is
just changing a constant in the metacompiler. Maybe I'll do that when
we want our own AVX assembler.

-marcel

From dxforth@21:1/5 to Marcel Hendrix on Thu May 25 15:56:34 2023

On 25/05/2023 2:51 pm, Marcel Hendrix wrote:

> Effort? In iForth the locals stack can hold extended floats (10 bytes),
> which take 16 bytes of aligned memory (iForth specific). That means
> that most of the aligned SSE/AVX type variables/registers can also
> be held there. Extending the locals stack to be 256 / 512 bits wide is
> just changing a constant in the metacompiler. Maybe I'll do that when
> we want our own AVX assembler.

AFAIK SwiftForth doesn't handle locals beyond integer. That they haven't extended it to doubles and floats is what - ignorance?

From Anton Ertl@21:1/5 to dxforth on Thu May 25 06:30:02 2023

dxforth <dxforth@gmail.com> writes:

> AFAIK SwiftForth doesn't handle locals beyond integer. That they haven't
> extended it to doubles and floats is what - ignorance?

Forth, Inc. does not seem to have much interest in locals or floats.
E.g., SwiftForth 3.x does not have the float words loaded by default.
This changes in SwiftForth 4.x, but it shows the level of interest.
So it's no surprise that the combination of these two features is not
seen as interesting enough to be implemented.

- anton

From Anton Ertl@21:1/5 to Marcel Hendrix on Thu May 25 06:18:58 2023

Marcel Hendrix <mhx@iae.nl> writes:

> Effort? In iForth the locals stack can hold extended floats (10 bytes),
> which take 16 bytes of aligned memory (iForth specific). That means
> that most of the aligned SSE/AVX type variables/registers can also
> be held there. Extending the locals stack to be 256 / 512 bits wide is
> just changing a constant in the metacompiler. Maybe I'll do that when
> we want our own AVX assembler.

AVX and AVX-512 can access unaligned addresses, so more alignment is
not necessary. And this combined with SSE's 16-byte alignment
requirement is why AMD64 calling conventions require the stack pointer
to be 16-byte aligned but not aligned to 32-byte or 64-byte
boundaries.

The question is also if you want to have locals (and, I guess, a
stack) for the SIMD units (16 bytes with SSE and AVX-128, 32 bytes
with AVX (-256), 64 bytes with AVX-512), or if you deal with
arbitrary-length vectors and leave the SIMD units as implementation
detail. Or both?

- anton

From Anton Ertl@21:1/5 to dxforth on Thu May 25 06:17:45 2023

dxforth <dxforth@gmail.com> writes:

> Extending locals beyond cells takes effort and resources. If one has
> FP locals, granted there's not much point providing F>R FR> .

Gforth has FP locals.

- anton
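
For comparison, the TEST2 word from the first post could be written with a Gforth FP local instead of F>R and FR>. This is a sketch that relies on Gforth's own locals syntax, where F: declares a local taken from the FP stack; it is not standard Forth, and TEST2-LOCALS is a hypothetical name.

```forth
\ Gforth-specific: x is popped from the FP stack when the word is entered.
: test2-locals { F: x -- } ( F: x -- x-1 x )
  x 1e f-  x ;

1.23e test2-locals f. f.   \ prints the top item first: 1.23, then 0.23
```

The local plays the role of the value parked on the return stack, and the alignment of its storage is left to the system.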