• Forth for an Addressable Stack CPU

    From Rick C@21:1/5 to All on Wed Jul 20 13:14:34 2022
    The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine for
    managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

    I may resurrect work on the design. Even if it is not supported with a Forth tool, it can be programmed in assembly which is not so much different from Forth, other than the addressability, which can be ignored for non-optimized code.

    --

    Rick C.

    - Get 1,000 miles of free Supercharging
    - Tesla referral code - https://ts.la/richard11209

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Clive Arthur@21:1/5 to Rick C on Wed Jul 20 22:50:07 2022
    On 20/07/2022 21:14, Rick C wrote:
    The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine
    for managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

    I may resurrect work on the design. Even if it is not supported with a Forth tool, it can be programmed in assembly which is not so much different from Forth, other than the addressability, which can be ignored for non-optimized code.


    The Patriot Scientific PTSC1000 had single cycle instructions for
    fetching and storing return stack items as well as dropping them. - IIRC
    the top 15. This meant that you could use >r to push locals and then
    r1@, r5@, r3! etc (my names) to manipulate and use them then say r8drop
    to tidy up.

    I found it very useful, but sadly the PTSC1000 was discontinued. It was
    based on Chuck's ShBoom processor, so maybe he saw the utility too.

    --
    Cheers
    Clive
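A minimal sketch of the return-stack locals idea described above, modelled in Python rather than on the real hardware. The class and method names are hypothetical, and counting items from 1 at the top of the stack is an assumption; the post does not say how the PTSC1000 numbered them.

```python
class ReturnStack:
    """Toy model of a return stack with indexed access (all names hypothetical)."""

    def __init__(self):
        self.items = []

    def _check(self, n):
        # Valid indices run from 1 (top of stack) to the current depth.
        if not 1 <= n <= len(self.items):
            raise IndexError(f"return stack item {n} is out of range")

    def to_r(self, value):
        """>r : push a value onto the return stack."""
        self.items.append(value)

    def r_fetch(self, n):
        """rN@ : read the Nth item, counting from 1 at the top."""
        self._check(n)
        return self.items[-n]

    def r_store(self, n, value):
        """rN! : overwrite the Nth item, counting from 1 at the top."""
        self._check(n)
        self.items[-n] = value

    def r_drop(self, n):
        """rNdrop : discard the top N items in one step."""
        self._check(n)
        del self.items[len(self.items) - n:]


def demo():
    rs = ReturnStack()
    for local in (10, 20, 30):              # three locals pushed with >r
        rs.to_r(local)
    total = rs.r_fetch(1) + rs.r_fetch(3)   # r1@ r3@ +
    rs.r_store(2, total)                    # r2!
    middle = rs.r_fetch(2)                  # r2@
    rs.r_drop(3)                            # r3drop tidies up
    return middle, len(rs.items)
```

The point of the model is that no SWAP or ROT is needed to reach a local: every access names its slot directly, and one drop releases the whole frame.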

  • From Paul Rubin@21:1/5 to Rick C on Wed Jul 20 18:13:42 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    The last stack processor design I worked on had instructions for
    addressing operands on the stack rather than just the top data items.
    It didn't reach very deep, but the difference was noticeable in code
    density.

    Congratulations, you have re-invented locals ;)

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take
    a fair amount of work to design something like a compiler for a more
    typical register based CPU.

    Idk about Forth compilers but it is a usual thing for C compilers,
    particularly for machines with not many registers, so they have to put
    locals in the stack. Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack
    slots. That got rid of the overhead of saving and restoring registers
    on subroutine call and return. I guess it had other costs since the
    scheme hasn't been popular. I've never tried to program it at low
    level.

  • From Wayne morellini@21:1/5 to Paul Rubin on Thu Jul 21 03:46:44 2022
    On Thursday, July 21, 2022 at 11:13:44 AM UTC+10, Paul Rubin wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    The last stack processor design I worked on had instructions for
    addressing operands on the stack rather than just the top data items.
    It didn't reach very deep, but the difference was noticeable in code density.
    Congratulations, you have re-invented locals ;)
    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take
    a fair amount of work to design something like a compiler for a more typical register based CPU.
    Idk about Forth compilers but it is a usual thing for C compilers, particularly for machines with not many registers, so they have to put

    locals in the stack. Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack
    slots. That got rid of the overhead of saving and restoring registers
    on subroutine call and return. I guess it had other costs since the
    scheme hasn't been popular. I've never tried to program it at low
    level.

    Well, as it is presumably a custom design, with no changes in processor,
    he could just make Forth words to do the operation, and segment
    them into a list of words which have to be rewritten if the processor
    changes. The original design would just use the assembly for that word.
    I don't see what the problem is. But I'm presuming it's a Forth processor, and not some other stack derivative. Still, I would like to inspect how
    good this ISA is.

  • From Anton Ertl@21:1/5 to Rick C on Thu Jul 21 14:02:01 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    The last stack processor design I worked on had instructions for addressing
    operands on the stack rather than just the top data items. It didn't reach
    very deep, but the difference was noticeable in code density. A sample
    program (interrupt routine for managing the data for a Numerically Controlled
    Oscillator - NCO) was reduced in size by a third, by eliminating stack
    manipulations.

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

    The last sentence is confusing. How is it related to the rest?

    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.

    If you are more ambitious, I think you can do something with similar
    complexity of an analytic compiler for a register machine. I have no
    idea how much more that would buy.

    BTW, such an architecture is widely available: The 387. Looking at
    what iForth, lxf, and VFX 4.72 produce for

    : foo fover f+ ;

    they don't make use of this optimization opportunity, but all generate
    stuff like:

    ( 080C6C70 D9C1 ) FLD ST(1)
    ( 080C6C72 DEC1 ) FADDP ST(1), ST
    ( 080C6C74 C3 ) NEXT,
    ( 5 bytes, 3 instructions )

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html
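A rough Python sketch of the simple approach suggested above: a peephole pass that fuses a copying stack word followed by a binary operator into one stack-addressed instruction. The fused mnemonics (`ADD S1` and so on) and the tables are made up for illustration and do not describe any real instruction set or Forth compiler.

```python
# Hypothetical tables: which stack slot each copying word reads from, and
# which fused mnemonic each binary operator maps to.
COPY_WORDS = {"DUP": 0, "OVER": 1, "THIRD": 2}
BINARY_OPS = {"+": "ADD", "-": "SUB", "AND": "AND"}


def peephole(words):
    """Fuse <copy word> <binary op> pairs into one stack-addressed instruction."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in COPY_WORDS and nxt in BINARY_OPS:
            # e.g. OVER + becomes "ADD S1": add stack slot 1 to the top item.
            out.append(f"{BINARY_OPS[nxt]} S{COPY_WORDS[word]}")
            i += 2
        else:
            # Anything else (SWAP, ROT, TUCK, literals) is passed through as-is.
            out.append(word)
            i += 1
    return out
```

For example, `peephole(["OVER", "+", "SWAP", "THIRD", "-"])` yields `["ADD S1", "SWAP", "SUB S2"]`. The SWAP survives untouched, which is why code written with DUP, OVER and THIRD benefits from such a pass and code written with SWAP, ROT and TUCK does not.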

  • From Rick C@21:1/5 to Wayne morellini on Thu Jul 21 06:33:33 2022
    On Thursday, July 21, 2022 at 6:46:45 AM UTC-4, Wayne morellini wrote:
    On Thursday, July 21, 2022 at 11:13:44 AM UTC+10, Paul Rubin wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items.
    It didn't reach very deep, but the difference was noticeable in code density.
    Congratulations, you have re-invented locals ;)
    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take
    a fair amount of work to design something like a compiler for a more typical register based CPU.
    Idk about Forth compilers but it is a usual thing for C compilers, particularly for machines with not many registers, so they have to put

    locals in the stack. Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack slots. That got rid of the overhead of saving and restoring registers
    on subroutine call and return. I guess it had other costs since the
    scheme hasn't been popular. I've never tried to program it at low
    level.
    Well, as it is presumably a custom design, with no changes in processor,
    he could just make Forth words to do the operation, and segment
    them into a list of words which have to be rewritten if the processor
    changes. The original design would just use the assembly for that word.
    I don't see what the problem is. But I'm presuming it's a Forth processor, and not some other stack derivative. Still, I would like to inspect how
    good this ISA is.

    What is a "Forth processor" as distinct from a stack processor?

    --

    Rick C.

    + Get 1,000 miles of free Supercharging
    + Tesla referral code - https://ts.la/richard11209

  • From Anton Ertl@21:1/5 to Paul Rubin on Thu Jul 21 14:14:03 2022
    Paul Rubin <no.email@nospam.invalid> writes:
    Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack
    slots.

    No. Registers on SPARC (and rotating register files on AMD29K, and
    IA-64) are not addressable as memory. If you use more register
    windows than the hardware has registers, the contents of the most
    remote windows are stored to memory in an interrupt routine; if you
    then pop windows back until you reach one that is no longer in the
    register file, another interrupt loads the contents from memory into
    the register window.

    Similarly on the AMD29K, and AFAIK in practice on IA-64
    implementations; IA-64 implementations were intended to do this
    spilling and refilling in the background in hardware, but all I heard
    is that it did not work, and that the interrupt approach continued to
    be used.

    The Burroughs B5500 had a stack where an entry also has a memory
    address. Similarly, the AT&T CRISP/Hobbit also used an addressable
    stack. For performance in modern times (already in the CRISP days,
    less so in the B5500 days) this is a bad idea, because it makes
    pipelining harder and you have to check against a memory alias all the
    time, and deal with it when it occurs. There is a reason why Berkeley
    RISC, its child SPARC, the AMD29K and its descendant IA-64 did not go
    there.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html
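A toy Python model of the window spill and refill behaviour described above. It is a simplification under stated assumptions: each window is a single opaque frame, the refill happens as soon as the last resident window is popped, and the class and field names are invented. It is not a description of any real SPARC, AMD29K or IA-64 trap handler.

```python
class WindowedRegisterFile:
    """Toy model: a fixed number of on-chip windows, backed by memory via traps."""

    def __init__(self, hardware_windows):
        self.hardware_windows = hardware_windows
        self.resident = []   # windows currently held in registers, oldest first
        self.spilled = []    # windows saved to memory by the overflow trap
        self.traps = 0       # how many times a trap routine had to run

    def call(self, frame):
        """Enter a procedure: allocate a window, spilling the oldest if full."""
        if len(self.resident) == self.hardware_windows:
            self.spilled.append(self.resident.pop(0))   # overflow trap
            self.traps += 1
        self.resident.append(frame)

    def ret(self):
        """Leave a procedure: drop the current window, refilling if none remain."""
        frame = self.resident.pop()
        if not self.resident and self.spilled:
            self.resident.append(self.spilled.pop())    # underflow trap
            self.traps += 1
        return frame
```

With two hardware windows, three nested calls cause one spill and unwinding them causes one refill. The registers themselves never have a memory address in this model; only the trap routine moves their contents to and from memory.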

  • From Wayne morellini@21:1/5 to Anton Ertl on Thu Jul 21 07:54:34 2022
    On Friday, July 22, 2022 at 12:13:55 AM UTC+10, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    The last stack processor design I worked on had instructions for addressing
    operands on the stack rather than just the top data items. It didn't reach
    very deep, but the difference was noticeable in code density. A sample
    program (interrupt routine for managing the data for a Numerically Controlled
    Oscillator - NCO) was reduced in size by a third, by eliminating stack
    manipulations.

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

    The last sentence is confusing. How is it related to the rest?

    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.

    If you are more ambitious, I think you can do something with similar complexity of an analytic compiler for a register machine. I have no
    idea how much more that would buy.

    BTW, such an architecture is widely available: The 387. Looking at
    what iForth, lxf, and VFX 4.72 produce for

    : foo fover f+ ;

    they don't make use of this optimization opportunity, but all generate
    stuff like:

    ( 080C6C70 D9C1 ) FLD ST(1)
    ( 080C6C72 DEC1 ) FADDP ST(1), ST
    ( 080C6C74 C3 ) NEXT,
    ( 5 bytes, 3 instructions )

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

    I see, it's to run existing libraries (of which?).

  • From Paul Rubin@21:1/5 to Rick C on Thu Jul 21 09:43:07 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    What is a "Forth processor" as distinct from a stack processor?

    Forth processors traditionally have two stacks, I think. I don't know
    if non-Forth "stack processors" have that.

  • From Wayne morellini@21:1/5 to Paul Rubin on Thu Jul 21 19:24:21 2022
    On Friday, July 22, 2022 at 2:43:23 AM UTC+10, Paul Rubin wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    What is a "Forth processor" as distinct from a stack processor?
    Forth processors traditionally have two stacks, I think. I don't know
    if non-Forth "stack processors" have that.
    I guessed we are not talking about a C machine here. There is more
    than forth, and if I added a few instructions to the 6502 instruction
    set, to handle data on stacks, I could then program it with interpreted
    Forth and use those new machine instructions for data processing,
    without it being a Forth processor. So, still a bit obscure. The
    definition of "what is a Forth processor" has been avoided, so I guess it
    isn't one; it's something that uses stacks and so can run Forth. We'll see the clarity.

  • From Rick C@21:1/5 to Anton Ertl on Thu Jul 21 22:05:01 2022
    On Thursday, July 21, 2022 at 10:13:55 AM UTC-4, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    The last stack processor design I worked on had instructions for addressing
    operands on the stack rather than just the top data items. It didn't reach
    very deep, but the difference was noticeable in code density. A sample
    program (interrupt routine for managing the data for a Numerically Controlled
    Oscillator - NCO) was reduced in size by a third, by eliminating stack
    manipulations.

    But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

    The last sentence is confusing. How is it related to the rest?

    Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features
    found in compilers for register based machines. My understanding is this would be significantly more work.


    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.

    I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth. I had the code in front of me for a
    standard stack architecture, and I found myself continually working backwards to optimize the sequencing of data and calculations. But then, I know little about writing optimizing compilers.


    If you are more ambitious, I think you can do something with similar complexity of an analytic compiler for a register machine. I have no
    idea how much more that would buy.

    BTW, such an architecture is widely available: The 387. Looking at
    what iForth, lxf, and VFX 4.72 produce for

    : foo fover f+ ;

    they don't make use of this optimization opportunity, but all generate
    stuff like:

    ( 080C6C70 D9C1 ) FLD ST(1)
    ( 080C6C72 DEC1 ) FADDP ST(1), ST
    ( 080C6C74 C3 ) NEXT,
    ( 5 bytes, 3 instructions )

    Ok

    --

    Rick C.

    -- Get 1,000 miles of free Supercharging
    -- Tesla referral code - https://ts.la/richard11209

  • From Rick C@21:1/5 to Paul Rubin on Thu Jul 21 22:07:09 2022
    On Thursday, July 21, 2022 at 12:43:23 PM UTC-4, Paul Rubin wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    What is a "Forth processor" as distinct from a stack processor?
    Forth processors traditionally have two stacks, I think. I don't know
    if non-Forth "stack processors" have that.

    I suppose it depends on how you define, "non-Forth" stack processors.

    --

    Rick C.

    -+ Get 1,000 miles of free Supercharging
    -+ Tesla referral code - https://ts.la/richard11209

  • From Paul Rubin@21:1/5 to Rick C on Fri Jul 22 00:33:29 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    I suppose it depends on how you define, "non-Forth" stack processors.

    The Burroughs B5500 etc. predated Forth for a while. Their main system programming language was Algol-60. Koopman talks about them in his book
    "Stack Machines" but it's been a while since I read that book, so idr
    what it said.

  • From none) (albert@21:1/5 to Anton Ertl on Fri Jul 22 11:58:52 2022
    In article <2022Jul21.161403@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack
    slots.

    No. Registers on SPARC (and rotating register files on AMD29K, and
    IA-64) are not addressable as memory. If you use more register
    windows than the hardware has registers, the contents of the most
    remote windows are stored to memory in an interrupt routine; if you
    then pop windows back until you reach one that is no longer in the
    register file, another interrupt loads the contents from memory into
    the register window.

    Similarly on the AMD29K, and AFAIK in practice on IA-64
    implementations; IA-64 implementations were intended to do this
    spilling and refilling in the background in hardware, but all I heard
    is that it did not work, and that the interrupt approach continued to
    be used.

    This is interesting. Were they not able to accomplish in a
    c-compiler? What about a Forth compiler?

    - anton

    Groetjes Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From Anton Ertl@21:1/5 to Rick C on Fri Jul 22 10:52:42 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER
    THIRD PICK instead of SWAP ROT TUCK.

    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.

    I don't think it is that simple. I found that, for all practical purposes,
    every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.

    Yes, you have an even smaller set of operations for arranging data.
    Nobody said it was simple. Already with traditional Forth many
    programmers don't want to go to the necessary lengths, and most
    switched to other languages over time, and of the rest many use
    locals, more or less frequently, much to the outrage of Forth purists.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

  • From Anton Ertl@21:1/5 to albert@cherry. on Fri Jul 22 10:49:49 2022
    albert@cherry.(none) (albert) writes:
    In article <2022Jul21.161403@mips.complang.tuwien.ac.at>,
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Paul Rubin <no.email@nospam.invalid> writes:
    Some cpus like the SPARC had "register windows"
    which basically meant the register file was aliased to the top N stack
    slots.

    No. Registers on SPARC (and rotating register files on AMD29K, and
    IA-64) are not addressable as memory. If you use more register
    windows than the hardware has registers, the contents of the most
    remote windows are stored to memory in an interrupt routine; if you
    then pop windows back until you reach one that is no longer in the
    register file, another interrupt loads the contents from memory into
    the register window.

    Similarly on the AMD29K, and AFAIK in practice on IA-64
    implementations; IA-64 implementations were intended to do this
    spilling and refilling in the background in hardware, but all I heard
    is that it did not work, and that the interrupt approach continued to
    be used.

    This is interesting. Were they not able to accomplish in a
    c-compiler? What about a Forth compiler?

    Accomplish what?

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

  • From Rick C@21:1/5 to Anton Ertl on Fri Jul 22 10:27:07 2022
    On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER
    THIRD PICK instead of SWAP ROT TUCK.

    So you are suggesting a developer can program this CPU in Forth by using particular idioms?


    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.

    I don't think it is that simple. I found that, for all practical purposes,
    every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.
    Yes, you have an even smaller set of operations for arranging data.
    Nobody said it was simple.

    That's my point. It's not simple. Ideally, the programmer would not need to be aware of the details of the CPU involved. That's what optimizing compilers do. The only hand work is in the very rare cases of needing to further optimize, but now it
    becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and
    compilers evolve, resulting in very unpredictable results for any particular combination.

    I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.


    Already with traditional Forth many
    programmers don't want to go to the necessary lengths, and most
    switched to other languages over time, and of the rest many use
    locals, more or less frequently, much to the outrage of Forth purists.

    Mostly, programming in Forth doesn't take much work as optimization is seldom required.

    One question I have about locals... in C and other languages, they are on the stack, so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right? Is there a locals
    stack or is it dedicated storage, like an otherwise declared variable?

    --

    Rick C.

    +- Get 1,000 miles of free Supercharging
    +- Tesla referral code - https://ts.la/richard11209

  • From Anton Ertl@21:1/5 to Rick C on Fri Jul 22 20:50:33 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER
    THIRD PICK instead of SWAP ROT TUCK.

    So you are suggesting a developer can program this CPU in Forth by using particular idioms?

    Yes, but my point was that the programmer can make the compiler's job
    easy by programming such an architecture with Forth and preferring DUP
    OVER THIRD PICK, while with a register architecture the programmer has
    little chance to help the compiler.

    Ideally, the programmer would not need
    to be aware of the details of the CPU involved.

    That's not an ideal of every Forther, nor for other programming
    languages.


    That's what optimizing compilers do.

    Sometimes, unreliably.

    The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

    I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.


    One question I have about locals... in C and other languages, they are on the stack,

    More typically in registers.

    so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

    Yes.

    Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

    Gforth uses a locals stack. Most others use the return stack. Some
    people propose using ordinary variables with beheading, but that does
    not work reentrantly.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html
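A small Python illustration of the reentrancy point above, assuming the simplest possible versions of both strategies. Neither function is taken from Gforth or any other Forth system; the names are invented for the comparison.

```python
# Strategy A: one fixed cell per local, like an ordinary (hidden) variable.
_static_n = 0


def factorial_static(n):
    global _static_n
    _static_n = n                            # the "local" is one shared cell
    if _static_n <= 1:
        return 1
    sub = factorial_static(_static_n - 1)    # the nested call overwrites the cell
    return _static_n * sub                   # so this reads a clobbered value


# Strategy B: a locals stack; each activation pushes and pops its own slot.
locals_stack = []


def factorial_stacked(n):
    locals_stack.append(n)                   # allocate the local on entry
    if locals_stack[-1] <= 1:
        result = 1
    else:
        result = locals_stack[-1] * factorial_stacked(locals_stack[-1] - 1)
    locals_stack.pop()                       # release it on exit
    return result
```

`factorial_stacked(4)` gives 24, while `factorial_static(4)` gives 1, because every outer activation finds its local overwritten by the innermost call. A return-stack implementation behaves like Strategy B, with the return stack playing the role of `locals_stack`.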

  • From Rick C@21:1/5 to Anton Ertl on Fri Jul 22 16:51:52 2022
    On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER
    THIRD PICK instead of SWAP ROT TUCK.

    So you are suggesting a developer can program this CPU in Forth by using particular idioms?

    Yes, but my point was that the programmer can make the compiler's job
    easy by programming such an architecture with Forth and preferring DUP
    OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.

    I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive. Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been
    some years since I worked on this, but it is available if I dig up the file.


    Ideally, the programmer would not need
    to be aware of the details of the CPU involved.
    That's not an ideal of every Forther, nor for other programming
    languages.

    I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise? Are you thinking of various quirky languages?


    That's what optimizing compilers do.

    Sometimes, unreliably.

    You are saying programs have bugs? Yes, I've heard that!


    The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

    I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.


    One question I have about locals... in C and other languages, they are on the stack,

    More typically in registers.

    so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

    Yes.

    Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

    Gforth uses a locals stack. Most others use the return stack. Some
    people propose using ordinary variables with beheading, but that does
    not work reentrantly.

    Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the
    data stack. Any stack without addressability would be hard to use as local storage.

    --

    Rick C.

  • From Wayne morellini@21:1/5 to All on Fri Jul 22 19:11:14 2022
    I forgot, maybe it would be useful for the standards community
    to develop optimising sections for compilers to use on various
    architectures, and as starting points on some other theoretical
    types? Then language developers can just start with those
    routines in their compilers.

  • From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Fri Jul 22 19:03:41 2022
    On Saturday, July 23, 2022 at 3:27:09 AM UTC+10, gnuarm.del...@gmail.com wrote:
    On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER THIRD PICK instead of SWAP ROT TUCK.
    So you are suggesting a developer can program this CPU in Forth by using particular idioms?
    I think that a Forth compiler for such an architecture could be as
    simple as optimizing sequences like OVER + into a single instruction,
    and programmers could then design their code to prefer DUP, OVER,
    THIRD and the like over SWAP, ROT, TUCK, and the like.
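
    [Editorial illustration, not part of the original posts.] The quoted idea of fusing sequences like OVER + into one instruction can be sketched as a tiny peephole pass. This is only a rough sketch in Python: the fused instruction names such as ADD_S1 ("add stack slot 1 to the top") are hypothetical and are not taken from Rick's actual instruction set.

```python
# Words that copy a stack slot to the top, mapped to the slot they read.
FETCH_SLOT = {"DUP": 0, "OVER": 1, "THIRD": 2}
# Binary operators, mapped to a hypothetical base mnemonic.
BINARY_OP = {"+": "ADD", "-": "SUB", "AND": "AND", "OR": "OR", "XOR": "XOR"}

def peephole(words):
    """Fuse '<fetch> <op>' pairs into one hypothetical stack-addressed op."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if word in FETCH_SLOT and following in BINARY_OP:
            # e.g. OVER + becomes ADD_S1: top = top + slot 1
            out.append(f"{BINARY_OP[following]}_S{FETCH_SLOT[word]}")
            i += 2
        else:
            out.append(word)  # SWAP, ROT, TUCK etc. pass through unchanged
            i += 1
    return out

print(peephole(["OVER", "+", "SWAP", "DUP", "XOR"]))
```

    Note that SWAP, ROT and TUCK get no benefit from such a pass, which is why the suggestion above is for programmers to prefer the copying words.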


    So, this is a design meant for others to develop on rather
    than just yourself as the developer.

    One of the issues, I notice, is that you often give far too little
    information for others to address things deeply and quickly, unlike
    some others.


    I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.
    Yes, you have an even smaller set of operations for arranging data.
    Nobody said it was simple.
    That's my point. It's not simple. Ideally, the programmer would not need to be aware of the details of the CPU involved. That's what optimizing compilers do. The only hand work is in the very rare cases of needing to further optimize, but now it
    becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers
    evolve, resulting in very unpredictable results for any particular combination.

    I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.
    Already with traditional Forth many
    programmers don't want to go to the necessary lengths, and most
    switched to other languages over time, and of the rest many use
    locals, more or less frequently, much to the outrage of Forth purists.
    Mostly, programming in Forth doesn't take much work as optimization is seldom required.

    One question I have about locals... in C and other languages, they are on the stack, so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right? Is there a locals stack
    or is it dedicated storage, like an otherwise declared variable?


    If you want a code-optimising compiler, I would suggest
    talking to Stephen or Elisabeth about a version of their
    compilers that optimises for your design instead. Maybe
    Stephen could help suggest optimised architecture
    features to accommodate this too. You usually do
    military contracts, so factoring in a commercial
    optimising compiler would be worth looking at; as long as
    the military approves an improved-performance-based
    amendment to a contract, they can afford to do this.

  • From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Fri Jul 22 18:40:19 2022
    On Saturday, July 23, 2022 at 9:51:53 AM UTC+10, gnuarm.del...@gmail.com wrote:
    On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
    Yes, unless you leave that job to the programmer. At least with this
    kind of architecture the programmer can express this by using DUP OVER
    THIRD PICK instead of SWAP ROT TUCK.

    So you are suggesting a developer can program this CPU in Forth by using particular idioms?

    Yes, but my point was that the programmer can make the compiler's job
    easy by programming such an architecture with Forth and preferring DUP OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.
    I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive. Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been
    some years since I worked on this, but it is available if I dig up the file.
    Ideally, the programmer would not need to be aware of the details of the CPU involved.
    That's not an ideal of every Forther, nor for other programming
    languages.
    I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise. Are you thinking of various quirky languages?
    That's what optimizing compilers do.

    Sometimes, unreliably.
    You are saying programs have bugs? Yes, I've heard that!
    The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

    I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.


    One question I have about locals... in C and other languages, they are on the stack,

    More typically in registers.

    so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

    Yes.

    Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

    Gforth uses a locals stack. Most others use the return stack. Some
    people propose using ordinary variables with beheading, but that does
    not work reentrantly.
    Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the
    data stack. Any stack without addressability would be hard to use as local storage.

    --

    Rick C.

    In the end, what I decided many years back when designing
    ISAs is that eventually the stack is going to get too big
    (depending on what you want to do) and you would need to
    cache it (the stack used in place of a cache). I think
    Silicon Composers or somebody had a simplified
    scheme like this. But suppose the silicon understood there
    was a range for normal use and a range to keep in cache
    memory. So, here, the stack is split to become a third
    locals stack as well. A register file, or a more
    conventional C-like stack, you might say. But let's split that
    sideways, and simplify the design. Let's say there is a
    pointer register that comes with a definition or task
    context etc., that is passed on the stack and loaded into the
    register; then instructions can reference smaller offset fields
    to access those values, offset by the value in the
    associative memory register. I forget the range of solutions
    I came up with to select an optimal implementation here,
    as the information is too obscure. However, my
    designs were designed to include a range of advanced
    operating system functions, so it was useful to pass
    advanced information in as a range of values, ordered for
    maximum code efficiency, where the programmer
    presents the values in the API definition order, which
    optimises much of the processing (as much of the
    time is spent in APIs in such a system). I spent 20-30
    years on this stuff. It's a bit obscure, but really about
    picking which architecture best suits the situation. Most
    of my stuff is to suit virtual machines, so it violated Chuck's
    self adjusting code support (but I actually did come up
    with a self adjusting solution eventually, I think), so I could
    afford to look at extra stuff without that. I forget exactly
    why that was important to mention.

  • From Anton Ertl@21:1/5 to Rick C on Sat Jul 23 12:35:48 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:
    Yes, but my point was that the programmer can make the compiler's job
    easy by programming such an architecture with Forth and preferring DUP
    OVER THIRD PICK, while with a register architecture the programmer has
    little chance to help the compiler.

    I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive.

    Compared to what?

    Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been some years since I worked on this, but it is available if I dig up the file.

    That would certainly help advance the discussion.

    Ideally, the programmer would not need
    to be aware of the details of the CPU involved.
    That's not an ideal of every Forther, nor for other programming
    languages.

    I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise. Are you thinking of various quirky languages?

    It certainly is a goal of portable programming languages. But they
    don't completely achieve this goal, especially when it comes to
    performance, and that's why programmers often have to be aware of the
    CPU. And in the case of Forth there's the teachings of Chuck Moore,
    which include shifting work from the compiler to the programmer. So
    if you follow these teachings, you will prefer a simple compiler, and
    accept the cost of having to write the code for it (if you want
    performance).

    That's what optimizing compilers do.

    Sometimes, unreliably.

    You are saying programs have bugs? Yes, I've heard that!

    Not correctness bugs in this case. It's just that if you rely on
    optimizing compilers, you are occasionally going to suffer an
    unpleasant surprise (at least if you look at the result and know enough to
    have an expectation). And these unpleasant surprises are usually not
    bugs.

    Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

    Gforth uses a locals stack. Most others use the return stack. Some
    people propose using ordinary variables with beheading, but that does
    not work reentrantly.

    Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the data stack.

    3 bits should be enough for most uses, even if you use the data stack.

    If you want to use the data stack, Tevet's work [tevet89] may be of
    interest to you.

    @string(jfar="Journal of Forth Application and Research")
    @Article{tevet89,
    author = "Adin Tevet",
    title = "Symbolic Stack Addressing",
    journal = jfar,
    year = "1989",
    volume = "5",
    number = "3",
    pages = "365--379",
    url = "http://soton.mpeforth.com/flag/jfar/vol5/no3/article2.pdf",
    annote = "A local variable mechanism that uses the data stack
    for storage. The variables are accessed by {\tt PICK}
    and {\tt POST} (its opposite), which means that the
    compiler must keep track of the stack depth. Includes
    source code for 8086 F83."
    }
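
    [Editorial illustration, not part of the original post.] The annotation above says the compiler must keep track of the stack depth so that named values can be reached with PICK. A minimal sketch of that bookkeeping follows, in Python. It is not Tevet's implementation: the function name, the small table of stack effects, and the 3-bit offset limit (taken from the figure discussed in this thread) are all assumptions made for illustration.

```python
# Net data-stack depth change of a few words (illustrative subset only).
EFFECT = {"+": -1, "-": -1, "*": -1, "DUP": 1, "DROP": -1}
MAX_OFFSET = 7  # assumed 3-bit offset field, as discussed in the thread

def compile_stack_locals(names, body):
    """Translate named references into 'n PICK' by tracking stack depth.

    names: locals already on the data stack at entry, last name on top.
    body:  tokens; ints are literals, known names are locals, the rest
           must be words listed in EFFECT.
    """
    # Depth of each local at entry: the top item is at depth 0.
    home = {name: depth for depth, name in enumerate(reversed(names))}
    extra = 0  # items currently sitting above the locals
    out = []
    for tok in body:
        if isinstance(tok, int):
            out.append(str(tok))
            extra += 1
        elif tok in home:
            offset = home[tok] + extra
            if offset > MAX_OFFSET:
                raise ValueError(f"{tok} is out of reach at offset {offset}")
            out.append(f"{offset} PICK")
            extra += 1
        else:
            out.append(tok)
            extra += EFFECT[tok]
    return out

# ( a b -- a b (a+b)*a )
print(compile_stack_locals(["a", "b"], ["a", "b", "+", "a", "*"]))
```

    The same name compiles to different offsets at different points (a is 1 PICK first and 2 PICK later), which is exactly the depth tracking the compiler has to do.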

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

  • From minforth@arcor.de@21:1/5 to gnuarm.del...@gmail.com on Sat Jul 23 14:42:35 2022
    gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:
    Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the
    data stack. Any stack without addressability would be hard to use as local storage.

    There is also another solution: move locals from the data stack
    to the other end of the data stack segment, so that you have
    a (pseudo) locals stack in the data stack memory segment but
    not within the data stack itself, nor in the return stack.

    One stack grows downwards while the other stack grows upwards,
    so that their "top-of-stacks" face each other. When the (pseudo)
    locals stack grows, the data stack shrinks. 3 bits are sufficient to
    address (offset to) 8 locals.

    Advantage: no return stack cluttering or complicated locals frame
    addressing schemes.
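
    [Editorial illustration, not part of the original post.] The layout described above, two stacks growing toward each other inside one memory segment, can be sketched as follows in Python. The class and method names are hypothetical, and which stack grows upward is an arbitrary choice here since the post does not say.

```python
class SharedSegment:
    """Data stack and pseudo locals stack sharing one memory segment."""

    def __init__(self, size):
        self.mem = [0] * size
        self.dsp = 0     # data stack: grows upward, points at next free cell
        self.lsp = size  # locals stack: grows downward, points at newest local

    def push_data(self, value):
        if self.dsp >= self.lsp:
            raise OverflowError("data stack ran into locals stack")
        self.mem[self.dsp] = value
        self.dsp += 1

    def pop_data(self):
        if self.dsp == 0:
            raise IndexError("data stack underflow")
        self.dsp -= 1
        return self.mem[self.dsp]

    def to_local(self):
        """Move the top data item to the locals end of the segment."""
        value = self.pop_data()  # the pop guarantees one free cell
        self.lsp -= 1
        self.mem[self.lsp] = value

    def local(self, offset):
        """Read a local through an assumed 3-bit offset field (0..7)."""
        if not 0 <= offset < 8:
            raise ValueError("offset does not fit in 3 bits")
        if self.lsp + offset >= len(self.mem):
            raise IndexError("no such local")
        return self.mem[self.lsp + offset]

    def drop_locals(self, count):
        if count > len(self.mem) - self.lsp:
            raise IndexError("locals stack underflow")
        self.lsp += count

seg = SharedSegment(16)
seg.push_data(10)
seg.push_data(20)
seg.to_local()  # 20 becomes a local
seg.to_local()  # 10 becomes the newest local, at offset 0
print(seg.local(0), seg.local(1))
```

    Offsets are relative to the locals pointer, so they stay fixed while the data stack at the other end grows and shrinks.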

  • From Paul Rubin@21:1/5 to Rick C on Sat Jul 23 18:35:29 2022
    Rick C <gnuarm.deletethisbit@gmail.com> writes:
    I limited the offset field to two or three bits, so probably not
    enough space to reliably implement locals on the data stack. Any
    stack without addressability would be hard to use as local storage.

    The main obstacle to using the data stack for locals is that it's hard
    for the compiler to know the exact state of the data stack all the time.
    You could have a conditional DROP and the compiler wouldn't know the
    stack picture afterwards.
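
    [Editorial illustration, not part of the original post.] The conditional DROP problem can be made concrete with a small depth analysis, sketched here in Python. The representation of IF as a tuple and the table of stack effects are assumptions made for this sketch only.

```python
# Net data-stack depth change of a few words (illustrative subset only).
EFFECT = {"DUP": 1, "DROP": -1, "OVER": 1, "SWAP": 0, "+": -1}

def depth_after(code, depth=0):
    """Return the stack depth after `code`, or None if it is only known
    at run time. A conditional is written ("IF", then_part, else_part)."""
    for item in code:
        if isinstance(item, tuple):
            _, then_part, else_part = item
            depth -= 1  # IF consumes the flag
            then_depth = depth_after(then_part, depth)
            else_depth = depth_after(else_part, depth)
            if then_depth is None or then_depth != else_depth:
                return None  # the branches disagree: stack picture is lost
            depth = then_depth
        else:
            depth += EFFECT[item]
    return depth

print(depth_after(["DUP", ("IF", ["DROP"], ["DROP"])]))  # balanced branches
print(depth_after(["DUP", ("IF", ["DROP"], [])]))        # conditional DROP
```

    Once the analysis returns None, a compiler using fixed offsets from the top of the data stack can no longer tell where the locals are.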

    C compilers traditionally reserve a register to use as a frame pointer,
    so locals would be addressed as offsets from the frame pointer. You can
    get by without it, maybe resulting in slightly tighter code, but it
    complicates the compiler and makes debugging a lot harder, since you can
    no longer easily figure out the call stack by examining memory after a
    crash.

  • From Wayne morellini@21:1/5 to Paul Rubin on Sun Jul 24 00:23:56 2022
    On Sunday, July 24, 2022 at 11:35:33 AM UTC+10, Paul Rubin wrote:
    Rick C <gnuarm.del...@gmail.com> writes:
    I limited the offset field to two or three bits, so probably not
    enough space to reliably implement locals on the data stack. Any
    stack without addressability would be hard to use as local storage.
    The main obstacle to using the data stack for locals is that it's hard
    for the compiler to know the exact state of the data stack all the time.
    You could have a conditional DROP and the compiler wouldn't know the
    stack picture afterwards.

    C compilers traditionally reserve a register to use as a frame pointer,
    so locals would be addressed as offsets from the frame pointer. You can
    get by without it, maybe resulting in slightly tighter code, but it complicates the compiler and makes debugging a lot harder, since you can
    no longer easily figure out the call stack by examining memory after a
    crash.

    Exactly. You can use a register to store the current context's locals area (it has
    to be passed with the return, or have a return stack for it). The area can be a
    buffer area that can act as a stack with the register as an index register.

    There are a lot of ways it can be designed; it depends on whether it's a general
    purpose processor or task specific, and whether one programmer does it or future
    programmers will do it. Your application space includes knowing the work space.
    The person who understands how to program it correctly, being the only one,
    can optimise it. If others are going to change and maintain it, not of that level
    (presumably), then an optimising compiler makes sense.

    But, optimisation. I don't know how they optimise it. But, let's say, for each
    sequence, you list the best substitute sequence. This means you have to
    reduce things to an intermediate code to get rid of the local state dependencies that
    need to be known for the compiler to pick the best substitute. If the compiler is
    really good, it can see that code in a call is not efficient, affecting optimisation
    around the call, and rearrange that and suggest optimisation to change it too. But
    a really good optimising compiler should optimise left right, up down and the data,
    and the logic which transcends time (3D and 4D). Basically, the compiler needs
    to do advanced factoring. So here, Rick, you go for 2D or 3D to simplify compiler
    construction.

  • From none) (albert@21:1/5 to minf...@arcor.de on Sun Jul 24 14:06:57 2022
    In article <3b1e09c5-21f3-4e80-b14b-1f6e605adc98n@googlegroups.com>, minf...@arcor.de <minforth@arcor.de> wrote:
    gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:
    Of course, the return stack makes perfect sense. I should have seen
    that. If I recall, because I wanted to keep the instruction size down, I
    limited the offset field to two or three bits, so probably not enough
    space to reliably implement locals on the data stack. Any stack without
    addressability would be hard to use as local storage.

    There is also another solution: move locals from the data stack
    to the other end of the data stack segment, so that you have
    a (pseudo) locals stack in the data stack memory segment but
    not within the data stack itself, nor in the return stack.

    One stack grows downwards while the other stack grows upwards,
    so that their "top-of-stacks" face each other. When the (pseudo)
    locals stack grows, the data stack shrinks. 3 bits are sufficient to
    address (offset to) 8 locals.

    Advantage: no return stack cluttering or complicated locals frame
    addressing schemes.

    Brilliant. That is called a separate locals stack.
    Brilliant, but of course not original.

    Groetjes
    Albert
    --
    "in our communism country Viet Nam, people are forced to be
    alive and in the western country like US, people are free to
    die from Covid 19 lol" duc ha
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

  • From Wayne morellini@21:1/5 to none albert on Sun Jul 24 08:38:12 2022
    On Sunday, July 24, 2022 at 10:07:00 PM UTC+10, none albert wrote:
    In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
    minf...@arcor.de <minf...@arcor.de> wrote:
    gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:
    Of course, the return stack makes perfect sense. I should have seen
    that. If I recall, because I wanted to keep the instruction size down, I
    limited the offset field to two or three bits, so probably not enough
    space to reliably implement locals on the data stack. Any stack without
    addressability would be hard to use as local storage.

    There is also another solution: move locals from the data stack
    to the other end of the data stack segment, so that you have
    a (pseudo) locals stack in the data stack memory segment but
    not within the data stack itself, nor in the return stack.

    One stack grows downwards while the other stack grows upwards,
    so that their "top-of-stacks" face each other. When the (pseudo)
    locals stack grows, the data stack shrinks. 3 bits are sufficient to
    address (offset to) 8 locals.

    Advantage: no return stack cluttering or complicated locals frame
    addressing schemes.
    Brilliant. That is called a separate locals stack.
    Brilliant, but of course not original.
    Groetjes
    Albert

    Spoken as one who comes up with these things without reading them?

    It's not worth criticising somebody for coming up with something decent without reading about it first. If he had independent good creativity, good
    on him. I had to learn a lot of skills on top of the ability to make better results myself. On the decline, I can't define the complete nature of that, but a
    shadow of understanding is still there.

  • From Marcel Hendrix@21:1/5 to none albert on Sun Jul 24 14:05:01 2022
    On Sunday, July 24, 2022 at 2:07:00 PM UTC+2, none albert wrote:
    In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
    minf...@arcor.de <minf...@arcor.de> wrote:
    gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:
    Of course, the return stack makes perfect sense. I should have seen
    that. If I recall, because I wanted to keep the instruction size down, I
    limited the offset field to two or three bits, so probably not enough
    space to reliably implement locals on the data stack. Any stack without
    addressability would be hard to use as local storage.

    There is also another solution: move locals from the data stack
    to the other end of the data stack segment, so that you have
    a (pseudo) locals stack in the data stack memory segment but
    not within the data stack itself, nor in the return stack.

    One stack grows downwards while the other stack grows upwards,
    so that their "top-of-stacks" face each other. When the (pseudo)
    locals stack grows, the data stack shrinks. 3 bits are sufficient to
    address (offset to) 8 locals.

    Advantage: no return stack cluttering or complicated locals frame
    addressing schemes.
    Brilliant. That is called a separate locals stack.
    Brilliant, but of course not original.

    We have that in 32-bit iForth and IIRC, you were the one
    proposing it (sorry I forget the details).

    It worked ok, but a separate stack with its own SP is
    much more elegant (i.e., simpler and easier to
    understand). Modern CPUs have enough registers.

    -marcel
