• Pushing and Popping FP numbers to/from Return Stack

    From Krishna Myneni@21:1/5 to All on Sun May 21 18:05:28 2023
    On systems with a separate floating point stack (Forth 200x standard
    systems), one disadvantage of the separate fp stack is the occasional
    need to move floating point numbers off the fp stack temporarily and
    restore them later. Temporary fp variables are okay for simple
    solutions, but are restrictive compared to having a stack onto which
    the fp numbers are pushed and popped.

    On Forth systems which provide FP@ and FP! for getting and setting the
    floating point stack pointer, and RP@ and RP! for the return stack
    pointer, the words F>R and FR> may be defined in source as shown below.

    Assumptions:

    1. The width of a floating point stack item, in bytes, is given by 1 FLOATS
    2. The width above does not need to be a multiple of 1 CELLS
    3. The floating point stack is contiguous.

    The following code works in kForth-64 and in Gforth.

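    1 floats constant fpsize    \ bytes per fp stack item
    \ fpsize rounded up to a whole number of cells, in bytes
    fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells

    \ From Wil Baden's toolkit
    : :inline ( "name <char> ccc<char>" -- )
    : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
    POSTPONE ; IMMEDIATE ;

    \ When loading from a text file or the terminal, PARSE stops at the
    \ end of the line, so each :inline definition must fit on one line.
    :inline f>r ( F: r -- ) ( r: -- i*x ) fp@ rp@ fpcells - 2dup fpsize move rp! fpsize + fp! ;
    :inline fr> ( F: -- r ) ( r: i*x -- ) rp@ fp@ fpsize - 2dup fpsize move fp! fpcells + rp! ;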

    \ Test code

    : test1 f>r fr> ;
    3.214e test1 \ F: 3.214e


    : test2 fdup f>r 1e f- f>r fr> fr> ;
    1.23e test2 \ F: 2.3e-1 1.23e0
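
    The rounding used for fpcells above can be checked in isolation. The
    following is only a minimal sketch: ROUND-UP is a hypothetical helper
    name, not part of the posted code, and the sizes are example values
    rather than those of any particular system.

    ```forth
    \ ROUND-UP is a hypothetical helper (not part of the posted code).
    \ It rounds n up to the next multiple of u, the same way FPCELLS
    \ rounds 1 FLOATS up to a whole number of cells.
    : round-up ( n u -- n' )  tuck /mod swap if 1+ then * ;

    8 4 round-up .    \ 8-byte float, 4-byte cell:   prints 8
    8 8 round-up .    \ 8-byte float, 8-byte cell:   prints 8
    10 4 round-up .   \ 10-byte float, 4-byte cell:  prints 12
    10 8 round-up .   \ 10-byte float, 8-byte cell:  prints 16
    ```

    So a 10-byte float takes 12 or 16 bytes of return stack, which is
    why F>R and FR> adjust RP by fpcells but FP by fpsize.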

    --
    Krishna Myneni

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to Krishna Myneni on Mon May 22 19:57:30 2023
    On 22/05/2023 9:05 am, Krishna Myneni wrote:
    > On systems with a separate floating point stack (Forth 200x standard
    > systems), one disadvantage of the separate fp stack is the occasional
    > need to push and pop floating point numbers off the fp stack. Temporary
    > fp variables are okay for simple solutions, but are restrictive compared
    > to having a stack onto which the fp numbers are pushed and popped.


    I'd be interested in the case for having F>R FR> should anyone care to make it (?)
    I very rarely use fp but there was an instance in implementing fp on my system where F>R FR> would have come in handy. In the end I used a hidden variable but
    it did make me wonder. Implementation in my case would be a trivial code definition.

  • From minforth@21:1/5 to Krishna Myneni on Mon May 22 20:48:50 2023
    Krishna Myneni wrote on Monday, 22 May 2023 at 01:05:33 UTC+2:

    FP-alignment was obviously no issue here.

  • From Krishna Myneni@21:1/5 to dxforth on Tue May 23 21:38:17 2023
    On 5/22/23 04:57, dxforth wrote:
    > On 22/05/2023 9:05 am, Krishna Myneni wrote:
    >> On systems with a separate floating point stack (Forth 200x standard
    >> systems), one disadvantage of the separate fp stack is the occasional
    >> need to push and pop floating point numbers off the fp stack. Temporary
    >> fp variables are okay for simple solutions, but are restrictive compared
    >> to having a stack onto which the fp numbers are pushed and popped.
    >
    > I'd be interested in the case for having F>R FR> should anyone care to
    > make it (?)
    > I very rarely use fp but there was an instance in implementing fp on my
    > system where F>R FR> would have come in handy. In the end I used a hidden
    > variable but it did make me wonder. Implementation in my case would be a
    > trivial code definition.



    I think F>R and FR> will be useful as primitives. I plan to implement
    them as such in kForth-64 (and in kForth-32 ver 3.x). The purpose of the
    source code versions, for systems with return stack and fp stack pointer
    words, is to exercise them a bit before baking them into the system as efficient words.

    --
    Krishna

  • From Krishna Myneni@21:1/5 to minforth on Tue May 23 21:49:04 2023
    On 5/22/23 22:48, minforth wrote:

    > FP-alignment was obviously no issue here.

    Maybe more generally we need to use CMOVE rather than MOVE to avoid the
    issue of alignment. On Intel it will work as implemented above.

    --
    Krishna

  • From dxforth@21:1/5 to Krishna Myneni on Wed May 24 16:44:11 2023
    On 24/05/2023 12:49 pm, Krishna Myneni wrote:
    > On 5/22/23 22:48, minforth wrote:
    > ...
    >> FP-alignment was obviously no issue here.
    >
    > Maybe more generally we need to use CMOVE rather than MOVE to avoid
    > the issue of alignment. On Intel it will work as implemented above.

    As a matter of interest are there any fp implementations that require better than
    cell aligned addresses?

    With regard to F>R FR> implementation, using F! and F@ should prove simpler and more portable. (I can't believe I said that :)
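
    A minimal sketch of what an F! / F@ based version might look like.
    This is an illustration only, not the linked DTC definitions. It
    reuses fpsize, fpcells and :inline from the original post, and it
    assumes that RP@ and RP! are available, that the return stack grows
    towards lower addresses, and that the adjusted return stack pointer
    is suitably aligned for F! and F@ on the target.

    ```forth
    1 floats constant fpsize
    fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells

    : :inline ( "name <char> ccc<char>" -- )
      : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
      POSTPONE ; IMMEDIATE ;

    \ f>r ( F: r -- ) ( R: -- i*x )  make room on the return stack, store r there
    :inline f>r  rp@ fpcells - dup rp! f! ;
    \ fr> ( F: -- r ) ( R: i*x -- )  fetch r from the return stack, release the room
    :inline fr>  rp@ f@ rp@ fpcells + rp! ;

    \ Same round trip as test2 in the original post.
    : test2  fdup f>r 1e f- f>r fr> fr> ;
    ```

    Compared with the MOVE version, this leaves the copying to F! and
    F@ and no longer needs FP@ and FP!, at the price of requiring a
    float-aligned return stack address.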

    I misremembered the circumstances under which F>R FR> would have been beneficial
    to me, so for now the routines will go into a library. The DTC definitions were:

    https://pastebin.com/sSuUBKT2

  • From Anton Ertl@21:1/5 to dxforth on Wed May 24 07:48:19 2023
    dxforth <dxforth@gmail.com> writes:
    > As a matter of interest are there any fp implementations that require
    > better than cell aligned addresses?

    32-bit Gforth on hardware that requires 8-byte alignment of DP FP
    numbers, e.g., MIPS, SPARC or some ARM implementations. That's why
    Gforth does not have F>R FR>.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

  • From Anton Ertl@21:1/5 to Krishna Myneni on Wed May 24 08:10:26 2023
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > Maybe more generally we need to use CMOVE rather than MOVE to avoid the
    > issue of alignment. On Intel it will work as implemented above.

    No. MOVE has no alignment requirements, CMOVE requires
    character-aligned addresses (although on pretty much all systems that
    is only a theoretical concern, because 1 CHARS = 1 on them).

    - anton
  • From albert@21:1/5 to dxforth@gmail.com on Wed May 24 10:57:57 2023
    In article <u4kbnp$2sufu$1@dont-email.me>, dxforth <dxforth@gmail.com> wrote:
    > On 24/05/2023 12:49 pm, Krishna Myneni wrote:
    >> On 5/22/23 22:48, minforth wrote:
    >> ...
    >>> FP-alignment was obviously no issue here.
    >>
    >> Maybe more generally we need to use CMOVE rather than MOVE to avoid
    >> the issue of alignment. On Intel it will work as implemented above.
    >
    > As a matter of interest are there any fp implementations that require
    > better than cell aligned addresses?

    You mean 64 bit floats on a 32 bit system? Sure.
    ciforth doesn't bother to spoil the precision of 80 bit Intel stack
    floats. Storing those in memory costs 10 bytes.


    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

  • From Anton Ertl@21:1/5 to Krishna Myneni on Wed May 24 08:35:07 2023
    Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    > [..]
    > 1 floats constant fpsize
    > fpsize 1 cells /mod swap [IF] 1+ [THEN] cells constant fpcells
    >
    > \ From Wil Baden's toolkit
    > : :inline ( "name <char> ccc<char>" -- )
    > : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
    > POSTPONE ; IMMEDIATE ;
    >
    > :inline f>r ( F: r -- ) ( r: -- i*x ) fp@ rp@ fpcells - 2dup fpsize move rp! fpsize + fp! ;
    > :inline fr> ( F: -- r ) ( r: i*x -- ) rp@ fp@ fpsize - 2dup fpsize move fp! fpcells + rp! ;

    This suffers from the usual pitfalls of EVALUATE-based macros <http://www.complang.tuwien.ac.at/forth/why-evaluate-is-bad>.
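
    One pitfall of this kind, as a minimal sketch (an illustration, not
    taken from the linked page; SCALE, SCALED and TEST are hypothetical
    names): the macro text is looked up when the macro is used, not when
    it is defined, so a later redefinition silently changes its meaning.

    ```forth
    : :inline ( "name <char> ccc<char>" -- )
      : [CHAR] ; PARSE POSTPONE SLITERAL POSTPONE EVALUATE
      POSTPONE ; IMMEDIATE ;

    : scale ( n -- n' )  2 * ;
    :inline scaled  scale ;      \ stores the text "scale", not its meaning

    : scale ( n -- n' )  3 * ;   \ later, unrelated redefinition
    : test  5 scaled ;           \ the text "scale" is evaluated here
    test .                       \ prints 15, not 10
    ```

    An ordinary colon definition of SCALED would have bound to the
    first SCALE and printed 10.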

    This assumes that the stacks grow towards lower addresses. That's the
    case for most Forth systems.

    It also assumes that the top-of-FP-stack and top of return stack are in
    memory rather than in a register (maybe that's subsumed by assumption
    3); that assumption does not hold for gforth-fast, but FP@ and FP! try
    to support code that makes that assumption: FP@ stores the FTOS to
    memory, and FP! stores the FTOS register into memory before changing
    FP and loads it from memory afterwards; but you need to be careful
    when to perform FP@, FP! and other words that involve FTOS. The usage
    in the code above is fine.

    It also assumes that RP is cell-aligned after subtracting or adding 1
    FLOATS, or that RP does not require cell alignment; this assumption is
    true for all Forth systems I am aware of: Either 1 FLOATS is a
    multiple of the cell size (e.g., all Gforth ports), or the system does
    not require alignment (e.g., systems on IA-32/AMD64 that use 10-byte
    floats).

    - anton
  • From albert@21:1/5 to Anton Ertl on Wed May 24 11:02:41 2023
    In article <2023May24.094819@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    > dxforth <dxforth@gmail.com> writes:
    >> As a matter of interest are there any fp implementations that require
    >> better than cell aligned addresses?
    >
    > 32-bit Gforth on hardware that requires 8-byte alignment of DP FP
    > numbers, e.g., MIPS, SPARC or some ARM implementations. That's why
    > Gforth does not have F>R FR>.

    I guess a pinch of ingenuity solves this. For example the requirement
    that 64-bit MS-Windows requires 16-byte alignment for parameters
    is easy to fulfil (once you start thinking about it).



    Groetjes Albert
  • From albert@21:1/5 to Anton Ertl on Wed May 24 11:05:46 2023
    In article <2023May24.101026@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    > Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >> Maybe more generally we need to use CMOVE rather than MOVE to avoid the
    >> issue of alignment. On Intel it will work as implemented above.
    >
    > No. MOVE has no alignment requirements, CMOVE requires
    > character-aligned addresses (although on pretty much all systems that
    > is only a theoretical concern, because 1 char = 1 on them).

    One of these does memory propagation, and the other is intelligent,
    i.e. it copies upward or downward as needed when the regions overlap.
    That is the most important distinction. It is hard to remember which
    is which, though.
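
    A small sketch that shows which is which on a standard system: CMOVE
    copies character by character from low to high addresses and so
    propagates, while MOVE behaves as if the source were copied first.
    BUF and INIT-BUF are hypothetical names used only for this
    illustration.

    ```forth
    create buf 8 chars allot
    : init-buf ( -- )  s" abcdefgh" buf swap move ;

    \ Overlapping copy, one character upward.
    init-buf  buf buf char+ 7 move   buf 8 type   \ prints aabcdefg
    init-buf  buf buf char+ 7 cmove  buf 8 type   \ prints aaaaaaaa
    ```

    In F>R and FR> the source and destination lie on different stacks
    and do not overlap, so either word would copy the same bytes there.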


    Groetjes Albert
  • From dxforth@21:1/5 to Anton Ertl on Wed May 24 18:23:03 2023
    On 24/05/2023 5:48 pm, Anton Ertl wrote:
    > ...
    > That's why
    > Gforth does not have F>R FR>.

    So not cell-aligned but what excludes those functions?

  • From dxforth@21:1/5 to albert on Wed May 24 19:42:05 2023
    On 24/05/2023 7:05 pm, albert wrote:
    > In article <2023May24.101026@mips.complang.tuwien.ac.at>,
    > Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    >> Krishna Myneni <krishna.myneni@ccreweb.org> writes:
    >>> Maybe more generally we need to use CMOVE rather than MOVE to avoid the
    >>> issue of alignment. On Intel it will work as implemented above.
    >>
    >> No. MOVE has no alignment requirements, CMOVE requires
    >> character-aligned addresses (although on pretty much all systems that
    >> is only a theoretical concern, because 1 char = 1 on them).
    >
    > One of these does memory propagation, and the other is intelligent,
    > i.e. moves up or down with overlapping regions. That is the most
    > important distinction. It is hard to remember which is which though.

    Historically CMOVE did characters and MOVE did cells. ANS and 200x messed
    with the latter, each in its own way.

  • From Anton Ertl@21:1/5 to dxforth on Wed May 24 16:23:25 2023
    dxforth <dxforth@gmail.com> writes:
    > On 24/05/2023 5:48 pm, Anton Ertl wrote:
    >> ...
    >> That's why
    >> Gforth does not have F>R FR>.
    >
    > So not cell-aligned but what excludes those functions?

    F>R would be something like (on MIPS):

    addiu $rp, $rp, -8
    sd $ftos, 0($rp)
    ld $ftos, 0($fp)
    addiu $fp, $fp, 8
    NEXT

    But rp is not necessarily 8-byte-aligned, and if not, this produces a
    SIGBUS. An implementation that does not have this problem would
    require significantly more instructions, and is slower than one might
    expect. By contrast, the locals stack pointer is always aligned for
    cells and floats, so you don't run into this problem with FP locals.

    - anton
  • From Bernd Linsel@21:1/5 to Anton Ertl on Wed May 24 19:23:50 2023
    On 24.05.2023 18:23, Anton Ertl wrote:
    > [..]
    > But rp is not necessarily 8-byte-aligned, and if not, this produces a
    > SIGBUS. An implementation that does not have this problem would
    > require significantly more instructions, and is slower than one might
    > expect. By contrast, the locals stack pointer is always aligned for
    > cells and floats, so you don't run into this problem with FP locals.

    remedy ($tmp is placeholder for a free scratch register):
    Consume three words on the return stack and 8-align ftos within these.

    FtoR:   addiu $rp, $rp, -12
            addiu $tmp, $rp, 7
            ins $tmp, $zero, 0, 3   # up-align tmp to multiple of 8
            sd $ftos, 0($tmp)
            ld $ftos, 0($fp)
            addiu $fp, $fp, 8
            NEXT

    FRfrom: addiu $fp, $fp, -8
            sd $ftos, 8($fp)
            addiu $tmp, $rp, 7
            ins $tmp, $zero, 0, 3   # up-align tmp to multiple of 8
            ld $ftos, 0($tmp)
            addiu $rp, $rp, 12
            NEXT
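
    The address arithmetic in FtoR can be tried out in Forth itself. A
    minimal sketch, assuming two's-complement cells; F>R-SLOT is a
    hypothetical name and the addresses are made-up example values.

    ```forth
    \ F>R-SLOT is a hypothetical name. It mirrors the FtoR arithmetic:
    \ reserve 12 bytes below rp, then round the new rp up to a multiple of 8.
    : f>r-slot ( rp -- slot )  12 - 7 + -8 and ;

    100 f>r-slot .   \ rp only 4-aligned: prints 88, slot 88..95 inside 88..99
    104 f>r-slot .   \ rp 8-aligned:      prints 96, slot 96..103 inside 92..103
    ```

    In both cases the 8-byte slot is 8-aligned and fits within the 12
    reserved bytes, which is why three 4-byte words are enough.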


    Kind regards
    --
    Bernd Linsel

  • From Anton Ertl@21:1/5 to Bernd Linsel on Wed May 24 17:36:23 2023
    Bernd Linsel <bl1-thispartdoesnotbelonghere@gmx.com> writes:
    > [..]
    > remedy ($tmp is placeholder for a free scratch register):
    > Consume three words on the return stack and 8-align ftos within these.
    >
    > FtoR:   addiu $rp, $rp, -12
    >         addiu $tmp, $rp, 7
    >         ins $tmp, $zero, 0, 3   # up-align tmp to multiple of 8
    >         sd $ftos, 0($tmp)
    >         ld $ftos, 0($fp)
    >         addiu $fp, $fp, 8
    >         NEXT
    > [..]

    Clever! Now for MIPS I without the "ins" instruction. Maybe
    something like:

    F>R:
        ori $tmp, $rp, 7
        addiu $rp, $rp, -12
        sd $ftos, -15($tmp)
        ld $ftos, 0($fp)
        addiu $fp, $fp, 8
        NEXT

    Not sure if I got the details right, but with some tweaks the
    general idea should be workable. And likewise for FR>.

    The end result would be that F>R consumes 12 bytes on the return
    stack and an extra instruction on every target. Better than what I
    had in mind, but still worse than F>L.

    - anton
  • From Bernd Linsel@21:1/5 to Anton Ertl on Wed May 24 22:28:59 2023
    On 24.05.2023 19:36, Anton Ertl wrote:
    > [..]
    > Clever! Now for MIPS I without the "ins" instruction. Maybe
    > something like:
    >
    > F>R:
    >     ori $tmp, $rp, 7
    >     addiu $rp, $rp, -12
    >     sd $ftos, -15($tmp)
    >     ld $ftos, 0($fp)
    >     addiu $fp, $fp, 8
    >     NEXT
    > [..]

    For MIPS I, just replace

        ins $tmp, $zero, 0, 3
    by

        .set push; .set noat
        addiu $at, $zero, -8    # mask that clears the low 3 bits
        and $tmp, $tmp, $at
        .set pop

    It's been a long time since I've had to take care of MIPS ISA levels
    below 4k.

    Just took the opportunity to skim above code for use-after-load without
    load delay slots and happily discovered that I did it intuitively right.
    Maybe because of my habit of still taking care for load delay pipeline
    bubbles when hand-coding MIPS32.


    Kind regards,
    --
    Bernd Linsel

  • From Bernd Linsel@21:1/5 to All on Wed May 24 22:36:53 2023
    On 24.05.2023 22:28, Bernd Linsel wrote:
    > [..]
    > For MIPS I, just replace
    >
    >     ins $tmp, $zero, 0, 3
    > by
    > [..]

    oops, overlooked that

        ori $tmp, $rp, 7
        sd $ftos, -15($tmp)

    has exactly the same effect... sorry!

    > [..]

    --
    Bernd Linsel

  • From dxforth@21:1/5 to Anton Ertl on Thu May 25 12:55:04 2023
    On 25/05/2023 2:23 am, Anton Ertl wrote:
    > [..]
    > But rp is not necessarily 8-byte-aligned, and if not, this produces a
    > SIGBUS. An implementation that does not have this problem would
    > require significantly more instructions, and is slower than one might
    > expect.

    Agreed. But then F>R and FR> are about convenience - not performance.

    > By contrast, the locals stack pointer is always aligned for
    > cells and floats, so you don't run into this problem with FP locals.

    Extending locals beyond cells takes effort and resources. If one has
    FP locals, granted there's not much point providing F>R FR> .

  • From Marcel Hendrix@21:1/5 to dxforth on Wed May 24 21:51:38 2023
    On Thursday, May 25, 2023 at 4:55:08 AM UTC+2, dxforth wrote:
    > [..]
    > Extending locals beyond cells takes effort and resources. If one has
    > FP locals, granted there's not much point providing F>R FR> .

    Effort? In iForth the locals stack can hold extended floats (10 bytes),
    which take 16 bytes of aligned memory (iForth specific). That means
    that most of the aligned SSE/AVX type variables/registers can also
    be held there. Extending the locals stack to be 256 / 512 bits wide is
    just changing a constant in the metacompiler. Maybe I'll do that when
    we want our own AVX assembler.

    -marcel

  • From dxforth@21:1/5 to Marcel Hendrix on Thu May 25 15:56:34 2023
    On 25/05/2023 2:51 pm, Marcel Hendrix wrote:
    > On Thursday, May 25, 2023 at 4:55:08 AM UTC+2, dxforth wrote:
    > [..]
    >> Extending locals beyond cells takes effort and resources. If one has
    >> FP locals, granted there's not much point providing F>R FR> .
    >
    > Effort? In iForth the locals stack can hold extended floats (10 bytes),
    > which take 16 bytes of aligned memory (iForth specific). That means
    > that most of the aligned SSE/AVX type variables/registers can also
    > be held there. Extending the locals stack to be 256 / 512 bits wide is
    > just changing a constant in the metacompiler. Maybe I'll do that when
    > we want our own AVX assembler.

    AFAIK SwiftForth doesn't handle locals beyond integer. That they haven't extended it to doubles and floats is what - ignorance?

  • From Anton Ertl@21:1/5 to dxforth on Thu May 25 06:30:02 2023
    dxforth <dxforth@gmail.com> writes:
    > AFAIK SwiftForth doesn't handle locals beyond integer. That they haven't
    > extended it to doubles and floats is what - ignorance?

    Forth, Inc. does not seem to have much interest in locals or floats.
    E.g., SwiftForth 3.x does not have the float words loaded by default.
    This changes in SwiftForth 4.x, but it shows the level of interest.
    So it's no surprise that the combination of these two features is not
    seen as interesting enough to be implemented.

    - anton
  • From Anton Ertl@21:1/5 to Marcel Hendrix on Thu May 25 06:18:58 2023
    Marcel Hendrix <mhx@iae.nl> writes:
    > Effort? In iForth the locals stack can hold extended floats (10 bytes),
    > which take 16 bytes of aligned memory (iForth specific). That means
    > that most of the aligned SSE/AVX type variables/registers can also
    > be held there. Extending the locals stack to be 256 / 512 bits wide is
    > just changing a constant in the metacompiler. Maybe I'll do that when
    > we want our own AVX assembler.

    AVX and AVX-512 can access unaligned addresses, so more alignment is
    not necessary. And this combined with SSE's 16-byte alignment
    requirement is why AMD64 calling conventions require the stack pointer
    to be 16-byte aligned but not aligned to 32-byte or 64-byte
    boundaries.

    The question is also if you want to have locals (and, I guess, a
    stack) for the SIMD units (16 bytes with SSE and AVX-128, 32 bytes
    with AVX (-256), 64 bytes with AVX-512), or if you deal with
    arbitrary-length vectors and leave the SIMD units as implementation
    detail. Or both?

    - anton
  • From Anton Ertl@21:1/5 to dxforth on Thu May 25 06:17:45 2023
    dxforth <dxforth@gmail.com> writes:
    > Extending locals beyond cells takes effort and resources. If one has
    > FP locals, granted there's not much point providing F>R FR> .

    Gforth has FP locals.

    - anton