Forum: >>> Magnum BBS <<<

Forth for an Addressable Stack CPU

From Rick C@21:1/5 to All on Wed Jul 20 13:14:34 2022

The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine for
managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.
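
A minimal sketch of the code-density effect described above, written as a toy stack machine in Python. The `add_pick` instruction is a hypothetical stack-addressing operation invented for illustration; it is not taken from the actual design, and the one-third figure above is not reproduced here.

```python
# Toy stack machine. "add_pick" is a hypothetical stack-addressing
# instruction invented for this sketch, not part of the original design.
def run(program, stack):
    s = list(stack)  # last element is the top of stack
    for op, *args in program:
        if op == "over":
            s.append(s[-2])            # copy second item to the top
        elif op == "swap":
            s[-1], s[-2] = s[-2], s[-1]
        elif op == "add":
            top = s.pop()
            s[-1] += top               # consume two items, leave the sum
        elif op == "add_pick":
            depth = args[0]
            s[-1] += s[-1 - depth]     # add a deeper item to the top in place
        else:
            raise ValueError(f"unknown op: {op}")
    return s

# ( step phase -- step phase+step )
conventional = [("over",), ("add",)]   # copy the operand, then add
addressed = [("add_pick", 1)]          # address the operand directly

start = [3, 100]
result_conventional = run(conventional, start)
result_addressed = run(addressed, start)
```

Both programs leave the same stack; the addressed version simply has no separate stack-manipulation step.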

But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

I may resurrect work on the design. Even if it is not supported with a Forth tool, it can be programmed in assembly which is not so much different from Forth, other than the addressability, which can be ignored for non-optimized code.

--

Rick C.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Clive Arthur@21:1/5 to Rick C on Wed Jul 20 22:50:07 2022

On 20/07/2022 21:14, Rick C wrote:

The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine for managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.

But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

I may resurrect work on the design. Even if it is not supported with a Forth tool, it can be programmed in assembly which is not so much different from Forth, other than the addressability, which can be ignored for non-optimized code.

The Patriot Scientific PTSC1000 had single-cycle instructions for
fetching and storing return stack items, as well as dropping them (IIRC
the top 15). This meant that you could use >r to push locals, then
r1@, r5@, r3! etc. (my names) to use and manipulate them, then say r8drop
to tidy up.
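
A rough Python model of that usage pattern, as a sketch only. The method names mirror the informal names above, and the 1-based indexing (index 1 is the top item) is an assumption; the real instruction encoding and index base are not given here.

```python
class ReturnStack:
    """Return stack with indexed fetch/store/drop (assumed 1-based from the top)."""

    def __init__(self):
        self.items = []  # last element is the top of the return stack

    def to_r(self, value):
        # >r : push a value
        self.items.append(value)

    def _check(self, n):
        if not 1 <= n <= len(self.items):
            raise IndexError(f"return stack index out of range: {n}")

    def r_fetch(self, n):
        # rN@ : read the Nth item from the top without popping
        self._check(n)
        return self.items[-n]

    def r_store(self, n, value):
        # rN! : overwrite the Nth item from the top
        self._check(n)
        self.items[-n] = value

    def r_drop(self, n):
        # rNdrop : discard the top N items in one step
        self._check(n)
        del self.items[-n:]


rs = ReturnStack()
rs.to_r("return-address")      # whatever the caller left there
for local in (10, 20, 30):     # three locals pushed with >r
    rs.to_r(local)
first = rs.r_fetch(3)          # the first local pushed
rs.r_store(2, 25)              # update the middle local
middle = rs.r_fetch(2)
rs.r_drop(3)                   # tidy up all three locals
```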

I found it very useful, but sadly the PTSC1000 was discontinued. It was
based on Chuck's ShBoom processor, so maybe he saw the utility too.

--
Cheers
Clive

From Paul Rubin@21:1/5 to Rick C on Wed Jul 20 18:13:42 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

The last stack processor design I worked on had instructions for
addressing operands on the stack rather than just the top data items.
It didn't reach very deep, but the difference was noticeable in code
density.

Congratulations, you have re-invented locals ;)

But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take
a fair amount of work to design something like a compiler for a more
typical register based CPU.

Idk about Forth compilers but it is a usual thing for C compilers,
particularly for machines with not many registers, so they have to put
locals in the stack. Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack
slots. That got rid of the overhead of saving and restoring registers
on subroutine call and return. I guess it had other costs since the
scheme hasn't been popular. I've never tried to program it at low
level.

From Wayne morellini@21:1/5 to Paul Rubin on Thu Jul 21 03:46:44 2022

Well, as it is presumably a custom design, with no changes in processor,
he could just make Forth words to do the operation and, within the project,
segment them into a list of words which have to be rewritten if the processor
changes. The original design would just use the assembly for that word.
I don't see what the problem is. But I'm presuming it's a Forth processor, and not some other stack derivative. Still, I would like to inspect how
good this ISA is.

From Anton Ertl@21:1/5 to Rick C on Thu Jul 21 14:02:01 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine for managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.

But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

The last sentence is confusing. How is it related to the rest?

I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.
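
A minimal sketch of that kind of peephole pass, in Python. The fused instruction name `ADD_PICK` and its semantics (add the item at the given depth to the top of stack, in place) are assumptions made for illustration, not the instruction set under discussion.

```python
# Depth of the item each copying word duplicates (0 = top of stack).
COPY_DEPTH = {"DUP": 0, "OVER": 1, "THIRD": 2}

def peephole(words):
    """Fuse '<copy word> +' pairs into one hypothetical ADD_PICK instruction."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        following = words[i + 1] if i + 1 < len(words) else None
        if word in COPY_DEPTH and following == "+":
            out.append(("ADD_PICK", COPY_DEPTH[word]))
            i += 2
        else:
            out.append((word,))
            i += 1
    return out

def execute(code, stack):
    """Tiny interpreter, used only to check that fusing preserves behaviour."""
    s = list(stack)  # last element is the top of stack
    for op, *arg in code:
        if op in COPY_DEPTH:
            s.append(s[-1 - COPY_DEPTH[op]])
        elif op == "+":
            top = s.pop()
            s[-1] += top
        elif op == "SWAP":
            s[-1], s[-2] = s[-2], s[-1]
        elif op == "ADD_PICK":
            s[-1] += s[-1 - arg[0]]
        else:
            raise ValueError(f"unknown op: {op}")
    return s

source = ["OVER", "+", "SWAP", "THIRD", "+"]
unoptimized = [(w,) for w in source]
optimized = peephole(source)
```

Words such as SWAP pass through unchanged, which is why code written with DUP, OVER and THIRD benefits most from a pass this simple.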

If you are more ambitious, I think you can do something with complexity
similar to that of an analytic compiler for a register machine. I have no
idea how much more that would buy.

BTW, such an architecture is widely available: The 387. Looking at
what iForth, lxf, and VFX 4.72 produce for

: foo fover f+ ;

they don't make use of this optimization opportunity, but all generate
stuff like:

( 080C6C70 D9C1 ) FLD ST(1)
( 080C6C72 DEC1 ) FADDP ST(1), ST
( 080C6C74 C3 ) NEXT,
( 5 bytes, 3 instructions )
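
For comparison, the missed opportunity is presumably a single FADD ST, ST(1) in place of the FLD/FADDP pair. The Python sketch below models only the x87 register-stack contents (no exceptions, tags, or precision control) to show that both sequences leave the same values; the function names are made up for this illustration.

```python
# x87 register stack modelled as a list: st[0] is ST(0), st[1] is ST(1), ...
def fld(st, i):
    st.insert(0, st[i])      # FLD ST(i): push a copy of ST(i)

def faddp(st, i):
    st[i] += st[0]           # FADDP ST(i), ST: ST(i) += ST(0), then pop
    st.pop(0)

def fadd(st, i):
    st[0] += st[i]           # FADD ST, ST(i): ST(0) += ST(i), no push or pop

def fover_fplus_two_step(st):
    st = list(st)
    fld(st, 1)
    faddp(st, 1)
    return st

def fover_fplus_fused(st):
    st = list(st)
    fadd(st, 1)
    return st

# FP stack ( a b ) with b on top: ST(0) = 2.0, ST(1) = 1.5
before = [2.0, 1.5]
two_step = fover_fplus_two_step(before)
fused = fover_fplus_fused(before)
```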

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: http://www.euroforth.org/ef22/cfp.html

From Rick C@21:1/5 to Wayne morellini on Thu Jul 21 06:33:33 2022

On Thursday, July 21, 2022 at 6:46:45 AM UTC-4, Wayne morellini wrote:

Well, as it is presumably a custom design, with no changes in processor,
he could just make Forth words to do the operation and, within the project,
segment them into a list of words which have to be rewritten if the processor
changes. The original design would just use the assembly for that word.
I don't see what the problem is. But I'm presuming it's a Forth processor, and not some other stack derivative. Still, I would like to inspect how
good this ISA is.

What is a "Forth processor" as distinct from a stack processor?

--

Rick C.

From Anton Ertl@21:1/5 to Paul Rubin on Thu Jul 21 14:14:03 2022

Paul Rubin <no.email@nospam.invalid> writes:

Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack
slots.

No. Registers on SPARC (and rotating register files on AMD29K, and
IA-64) are not addressable as memory. If you use more register
windows than the hardware has registers, the contents of the most
remote windows are stored to memory in an interrupt routine; if you
then pop windows back until you reach one that is no longer in the
register file, another interrupt loads the contents from memory into
the register window.
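
A simplified Python model of that spill/fill behaviour, as a sketch. It ignores the overlap between adjacent windows and any reserved window, and the class and method names are invented here. The point it illustrates is that resident windows live only in the register file, and a trap handler moves whole windows to and from memory.

```python
class WindowFile:
    """Register windows with trap-driven spill (overflow) and fill (underflow)."""

    def __init__(self, hw_windows):
        self.hw_windows = hw_windows
        self.resident = []        # windows held in registers, oldest first
        self.spilled = []         # windows the trap handler saved to memory
        self.overflow_traps = 0
        self.underflow_traps = 0

    def call(self, frame):
        # Allocate a new window; if the file is full, trap and spill the oldest.
        if len(self.resident) == self.hw_windows:
            self.spilled.append(self.resident.pop(0))
            self.overflow_traps += 1
        self.resident.append(frame)

    def ret(self):
        # Release the current window; if the caller's window is no longer
        # resident, trap and reload it from memory.
        frame = self.resident.pop()
        if not self.resident and self.spilled:
            self.resident.append(self.spilled.pop())
            self.underflow_traps += 1
        return frame


wf = WindowFile(hw_windows=2)
for name in ("a", "b", "c"):      # three nested calls, only two windows
    wf.call(name)
state_after_calls = (list(wf.resident), list(wf.spilled), wf.overflow_traps)
returned = [wf.ret(), wf.ret(), wf.ret()]
```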

Similarly on the AMD29K, and AFAIK in practice on IA-64
implementations; IA-64 implementations were intended to do this
spilling and refilling in the background in hardware, but all I heard
is that it did not work, and that the interrupt approach continued to
be used.

The Burroughs B5500 had a stack where an entry also has a memory
address. Similarly, the AT&T CRISP/Hobbit also used an addressable
stack. For performance in modern times (already in the CRISP days,
less so in the B5500 days) this is a bad idea, because it makes
pipelining harder and you have to check against a memory alias all the
time, and deal with it when it occurs. There is a reason why Berkeley
RISC, its child SPARC, the AMD29K and its descendant IA-64 did not go
there.

- anton

From Wayne morellini@21:1/5 to Anton Ertl on Thu Jul 21 07:54:34 2022

I see, it's to run existing libraries (of which?).

From Paul Rubin@21:1/5 to Rick C on Thu Jul 21 09:43:07 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

What is a "Forth processor" as distinct from a stack processor?

Forth processors traditionally have two stacks, I think. I don't know
if non-Forth "stack processors" have that.

From Wayne morellini@21:1/5 to Paul Rubin on Thu Jul 21 19:24:21 2022

On Friday, July 22, 2022 at 2:43:23 AM UTC+10, Paul Rubin wrote:

Rick C <gnuarm.del...@gmail.com> writes:

What is a "Forth processor" as distinct from a stack processor?

Forth processors traditionally have two stacks, I think. I don't know
if non-Forth "stack processors" have that.

I guessed we are not talking about a C machine here. There is more
than Forth, and if I added a few instructions to the 6502 instruction
set to handle data on stacks, I could then program it with interpreted
Forth and use those new machine instructions for data processing,
without it being a Forth processor. So it is still a bit obscure. The
question "what is a Forth processor?" has been avoided, so I guess it
isn't one; it's something that uses stacks and can run Forth. We'll see whether that gets clarified.

From Rick C@21:1/5 to Anton Ertl on Thu Jul 21 22:05:01 2022

On Thursday, July 21, 2022 at 10:13:55 AM UTC-4, Anton Ertl wrote:

Rick C <gnuarm.del...@gmail.com> writes:

The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routine for managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.

But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.

The last sentence is confusing. How is it related to the rest?

Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features
found in compilers for register based machines. My understanding is this would be significantly more work.

I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.

I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth. I had the code in front of me for a
standard stack architecture, and I found myself continually working backwards to optimize the sequencing of data and calculations. But then, I know little about writing optimizing compilers.

If you are more ambitious, I think you can do something with similar complexity of an analytic compiler for a register machine. I have no
idea how much more that would buy.

BTW, such an architecture is widely available: The 387. Looking at
what iForth, lxf, and VFX 4.72 produce for

: foo fover f+ ;

they don't make use of this optimization opportunity, but all generate
stuff like:

( 080C6C70 D9C1 ) FLD ST(1)
( 080C6C72 DEC1 ) FADDP ST(1), ST
( 080C6C74 C3 ) NEXT,
( 5 bytes, 3 instructions )

Ok

--

Rick C.

From Rick C@21:1/5 to Paul Rubin on Thu Jul 21 22:07:09 2022

On Thursday, July 21, 2022 at 12:43:23 PM UTC-4, Paul Rubin wrote:

Rick C <gnuarm.del...@gmail.com> writes:

What is a "Forth processor" as distinct from a stack processor?

Forth processors traditionally have two stacks, I think. I don't know
if non-Forth "stack processors" have that.

I suppose it depends on how you define "non-Forth" stack processors.

--

Rick C.

From Paul Rubin@21:1/5 to Rick C on Fri Jul 22 00:33:29 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

I suppose it depends on how you define "non-Forth" stack processors.

The Burroughs B5500 etc. predated Forth by a while. Their main system programming language was Algol-60. Koopman talks about them in his book
"Stack Machines" but it's been a while since I read that book, so idr
what it said.

From albert@21:1/5 to Anton Ertl on Fri Jul 22 11:58:52 2022

In article <2022Jul21.161403@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:

Paul Rubin <no.email@nospam.invalid> writes:

Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack
slots.

No. Registers on SPARC (and rotating register files on AMD29K, and
IA-64) are not addressable as memory. If you use more register
windows than the hardware has registers, the contents of the most
remote windows are stored to memory in an interrupt routine; if you
then pop windows back until you reach one that is no longer in the
register file, another interrupt loads the contents from memory into
the register window.

Similarly on the AMD29K, and AFAIK in practice on IA-64
implementations; IA-64 implementations were intended to do this
spilling and refilling in the background in hardware, but all I heard
is that it did not work, and that the interrupt approach continued to
be used.

This is interesting. Were they not able to accomplish in a
c-compiler? What about a Forth compiler?

Groetjes Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

From Anton Ertl@21:1/5 to Rick C on Fri Jul 22 10:52:42 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.

I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.

I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.

Yes, you have an even smaller set of operations for arranging data.
Nobody said it was simple. Already with traditional Forth many
programmers don't want to go to the necessary lengths, and most
switched to other languages over time, and of the rest many use
locals, more or less frequently, much to the outrage of Forth purists.

- anton

From Anton Ertl@21:1/5 to albert@cherry. on Fri Jul 22 10:49:49 2022

albert@cherry.(none) (albert) writes:

This is interesting. Were they not able to accomplish in a
c-compiler? What about a Forth compiler?

Accomplish what?

- anton

From Rick C@21:1/5 to Anton Ertl on Fri Jul 22 10:27:07 2022

On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:

Rick C <gnuarm.del...@gmail.com> writes:

Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.

So you are suggesting a developer can program this CPU in Forth by using particular idioms?

I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.

I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.

Yes, you have an even smaller set of operations for arranging data.
Nobody said it was simple.

That's my point. It's not simple. Ideally, the programmer would not need to be aware of the details of the CPU involved. That's what optimizing compilers do. The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this, finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.

Already with traditional Forth many
programmers don't want to go to the necessary lengths, and most
switched to other languages over time, and of the rest many use
locals, more or less frequently, much to the outrage of Forth purists.

Mostly, programming in Forth doesn't take much work as optimization is seldom required.

One question I have about locals... in C and other languages, they are on the stack, so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right? Is there a locals
stack or is it dedicated storage, like an otherwise declared variable?

--

Rick C.

From Anton Ertl@21:1/5 to Rick C on Fri Jul 22 20:50:33 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.

So you are suggesting a developer can program this CPU in Forth by using particular idioms?

Yes, but my point was that the programmer can make the compiler's job
easy by programming such an architecture with Forth and preferring DUP
OVER THIRD PICK, while with a register architecture the programmer has
little chance to help the compiler.

Ideally, the programmer would not need
to be aware of the details of the CPU involved.

That's not an ideal of every Forther, nor for other programming
languages.

That's what optimizing compilers do.

Sometimes, unreliably.

The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.

One question I have about locals... in C and other languages, they are on the stack,

More typically in registers.

so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

Yes.

Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

Gforth uses a locals stack. Most others use the return stack. Some
people propose using ordinary variables with beheading, but that does
not work reentrantly.
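
A small Python illustration of the reentrancy point, as a sketch only. It models an "ordinary variable" as one shared storage cell and a locals stack as a list that grows per call; both function names are made up, and this is not how any particular Forth system implements locals.

```python
static_cell = [0]     # one shared storage cell, like an ordinary VARIABLE
locals_stack = []     # a separate stack holding one local per active call

def fact_static(n):
    """Factorial with its 'local' in static storage: breaks under recursion."""
    static_cell[0] = n
    if static_cell[0] <= 1:
        return 1
    sub = fact_static(static_cell[0] - 1)   # inner calls overwrite the cell
    return static_cell[0] * sub             # the cell no longer holds our n

def fact_locals(n):
    """Factorial with its local on a locals stack: each call has its own slot."""
    locals_stack.append(n)
    if locals_stack[-1] <= 1:
        result = 1
    else:
        sub = fact_locals(locals_stack[-1] - 1)
        result = locals_stack[-1] * sub     # our slot is the top again
    locals_stack.pop()
    return result

broken = fact_static(4)
correct = fact_locals(4)
```

The same reasoning applies whether the per-call slots live on a dedicated locals stack or on the return stack.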

- anton

From Rick C@21:1/5 to Anton Ertl on Fri Jul 22 16:51:52 2022

On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:

Rick C <gnuarm.del...@gmail.com> writes:

On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.

So you are suggesting a developer can program this CPU in Forth by using particular idioms?

Yes, but my point was that the programmer can make the compiler's job
easy by programming such an architecture with Forth and preferring DUP
OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.

I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counterproductive. Maybe this would be a better discussion with some source code, but I'm feeling a bit lazy about this at the moment. It has been some years since I worked on this, but it is available if I dig up the file.

Ideally, the programmer would not need
to be aware of the details of the CPU involved.

That's not an ideal of every Forther, nor for other programming
languages.

I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise? Are you thinking of various quirky languages?

That's what optimizing compilers do.

Sometimes, unreliably.

You are saying programs have bugs? Yes, I've heard that!

The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.

One question I have about locals... in C and other languages, they are on the stack,

More typically in registers.

so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

Yes.

Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

Gforth uses a locals stack. Most others use the return stack. Some
people propose using ordinary variables with beheading, but that does
not work reentrantly.

Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the
data stack. Any stack without addressability would be hard to use as local storage.

--

Rick C.

From Wayne morellini@21:1/5 to All on Fri Jul 22 19:11:14 2022

I forgot, maybe it would be useful for the standards community
to develop optimising sections for compilers to use on various
architectures, and as starting points for some other theoretical
types? Then language developers can just start with those
routines in their compilers.

From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Fri Jul 22 19:03:41 2022

On Saturday, July 23, 2022 at 3:27:09 AM UTC+10, gnuarm.del...@gmail.com wrote:

On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:

Rick C <gnuarm.del...@gmail.com> writes:

Forth is an easy thing to write. You either use the stack architecture of the stack CPU or you emulate a virtual stack machine. It seems to me that to optimize a stack processor which has stack addressing, it would require some of the same features found in compilers for register based machines. My understanding is this would be significantly more work.

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER THIRD PICK instead of SWAP ROT TUCK.

So you are suggesting a developer can program this CPU in Forth by using particular idioms?

I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.
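
A minimal sketch of the peephole idea described above, in Python rather than Forth so it can be run anywhere. The fused opcode names (ADD, SUB, ...) and the "sN" operand notation are hypothetical; the thread never gives the actual instruction set. The assumption is that a fused instruction combines the top of stack with the cell at offset N without popping that cell.

```python
# Hypothetical mapping from copy words to stack offsets (0 = top of stack).
COPY_OFFSET = {"DUP": 0, "OVER": 1, "THIRD": 2}
# Hypothetical opcode names for the stack-addressed forms of binary operators.
BINARY_OPS = {"+": "ADD", "-": "SUB", "AND": "AND", "OR": "OR", "XOR": "XOR"}


def peephole(words):
    """Fuse '<copy word> <binary op>' pairs into one stack-addressed instruction."""
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word in COPY_OFFSET and nxt in BINARY_OPS:
            # e.g. OVER + becomes "ADD s1": TOS = TOS + cell at offset 1
            out.append(f"{BINARY_OPS[nxt]} s{COPY_OFFSET[word]}")
            i += 2
        else:
            # Anything else (including SWAP, ROT, TUCK) is passed through as-is.
            out.append(word)
            i += 1
    return out
```

With this pass, code written with DUP, OVER and THIRD shrinks, while SWAP, ROT and TUCK stay as separate instructions, which is why the programmer's choice of idiom matters.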

So, this is a design meant for others to develop on rather
than just yourself as the developer.

One of the issues, I notice, is you often give far too little
information, to deeply and quickly address things, unlike
some others.

I don't think it is that simple. I found that, for all practical purposes, every stack manipulation could be optimized away. But it requires careful ordering of operands, even more so than with traditional Forth.

Yes, you have an even smaller set of operations for arranging data.
Nobody said it was simple.

That's my point. It's not simple. Ideally, the programmer would not need to be aware of the details of the CPU involved. That's what optimizing compilers do. The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.

Already with traditional Forth many
programmers don't want to go to the necessary lengths, and most
switched to other languages over time, and of the rest many use
locals, more or less frequently, much to the outrage of Forth purists.

Mostly, programming in Forth doesn't take much work as optimization is seldom required.

One question I have about locals... in C and other languages, they are on the stack, so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right? Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

If you want a code-optimising compiler, I would suggest
talking to Stephen or Elisabeth about a version of their
compilers that optimises for your design instead. Maybe
Stephen could help suggest optimised architecture
features to accommodate this too. You usually do
military contracts, so factoring in a commercial
optimising compiler would be worth looking at. As long as
the military approves an improved-performance
amendment to a contract, they can afford to do this.


From Wayne morellini@21:1/5 to gnuarm.del...@gmail.com on Fri Jul 22 18:40:19 2022

On Saturday, July 23, 2022 at 9:51:53 AM UTC+10, gnuarm.del...@gmail.com wrote:

On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:

Rick C <gnuarm.del...@gmail.com> writes:

On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:

Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.

So you are suggesting a developer can program this CPU in Forth by using particular idioms?

Yes, but my point was that the programmer can make the compiler's job
easy by programming such an architecture with Forth and preferring DUP OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.

I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive. Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been some years since I worked on this, but it is available if I dig up the file.

Ideally, the programmer would not need to be aware of the details of the CPU involved.

That's not an ideal of every Forther, nor for other programming
languages.

I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise. Are you thinking of various quirky languages?

That's what optimizing compilers do.

Sometimes, unreliably.

You are saying programs have bugs? Yes, I've heard that!

The only hand work is in the very rare cases of needing to further optimize, but now it becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously optimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilers evolve, resulting in very unpredictable results for any particular combination.

I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.

One question I have about locals... in C and other languages, they are on the stack,

More typically in registers.

so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right?

Yes.

Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

Gforth uses a locals stack. Most others use the return stack. Some
people propose using ordinary variables with beheading, but that does
not work reentrantly.

Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the data stack. Any stack without addressability would be hard to use as local storage.

--

Rick C.


In the end, what I decided many years back designing
ISAs is that eventually the stack is going to get too big
(depending on what you want to do) and you would need to
cache it (the stack used in place of a cache). I think
Silicon Composers or somebody had a simplified
scheme like this. But what if the silicon understood there
was a range for normal use and a range to keep in cache
memory? So, here, the stack is split to become a third
locals stack as well. A register file, or a more
conventional C-like stack, you might say. But let's split that
sideways, and simplify the design. Let's say there is a
pointer register that comes with a definition or task
context etc., that is passed on the stack and loaded into the
register; then instructions can use smaller offset fields
to access those values, offset by the value in the
associative memory register. I forget the range of solutions
I came up with to select an optimal implementation here,
as the information is too obscure. However, my
designs were designed to include a range of advanced
operating system functions, so it was useful to pass
advanced information in as a range of values, ordered for
maximum code efficiency, where the programmer
presents the values in the API definition order, which
optimises much of the processing (as much of the
time is spent in APIs in such a system). Spent 20-30
years on this stuff. It's a bit obscure, but really about
picking which architecture best suits the situation. Most
of my stuff is to suit virtual machines, so violated Chuck's
self adjusting code support (but I actually did come up
with a self adjusting solution eventually, I think) so I could
afford to look at extra stuff without that. I forget exactly
why that was important to mention.


From Anton Ertl@21:1/5 to Rick C on Sat Jul 23 12:35:48 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:

Yes, but my point was that the programmer can make the compiler's job
easy by programming such an architecture with Forth and preferring DUP
OVER THIRD PICK, while with a register architecture the programmer has
little chance to help the compiler.

I suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive.

Compared to what?

Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been some years since I worked on this, but it is available if I dig up the file.

That would certainly help advance the discussion.

Ideally, the programmer would not need to be aware of the details of the CPU involved.

That's not an ideal of every Forther, nor for other programming
languages.

I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise. Are you thinking of various quirky languages?

It certainly is a goal of portable programming languages. But they
don't completely achieve this goal, especially when it comes to
performance, and that's why programmers often have to be aware of the
CPU. And in the case of Forth there's the teachings of Chuck Moore,
which include shifting work from the compiler to the programmer. So
if you follow these teachings, you will prefer a simple compiler, and
accept the cost of having to write the code for it (if you want
performance).

That's what optimizing compilers do.

Sometimes, unreliably.

You are saying programs have bugs? Yes, I've heard that!

Not correctness bugs in this case. It's just that if you rely on
optimizing compilers, you are occasionally going to suffer an
unpleasant surprise (at least if you look at the result and know enough to
have an expectation). And these unpleasant surprises are usually not
bugs.

Is there a locals stack or is it dedicated storage, like an otherwise declared variable?

Gforth uses a locals stack. Most others use the return stack. Some
people propose using ordinary variables with beheading, but that does
not work reentrantly.

Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the data stack.

3 bits should be enough for most uses, even if you use the data stack.

If you want to use the data stack, Tevet's work [tevet89] may be of
interest to you.

@string(jfar="Journal of Forth Application and Research")
@Article{tevet89,
author = "Adin Tevet",
title = "Symbolic Stack Addressing",
journal = jfar,
year = "1989",
volume = "5",
number = "3",
pages = "365--379",
url = "http://soton.mpeforth.com/flag/jfar/vol5/no3/article2.pdf",
annote = "A local variable mechanism that uses the data stack
for storage. The variables are accessed by {\tt PICK}
and {\tt POST} (its opposite), which means that the
compiler must keep track of the stack depth. Includes
source code for 8086 F83."
}
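
To make the annotation above concrete: a minimal Python sketch of how a compiler might turn references to data-stack locals into PICK with a tracked depth. This is not Tevet's code (his source targets 8086 F83); the function name, the NET table and the input format are assumptions made for illustration, and POST (storing back) is left out.

```python
# Net stack-depth change of a few ordinary words (illustrative subset only).
NET = {"+": -1, "*": -1, "DUP": 1, "DROP": -1, "SWAP": 0}


def compile_locals(names, body):
    """Translate local references into '<n> PICK' sequences.

    names: locals as they sit on the data stack at entry, deepest first.
    body:  list of words; a word found in `names` means "fetch that local".
    """
    # Slot 0 is the local that was on top of the stack at entry.
    slot = {name: len(names) - 1 - i for i, name in enumerate(names)}
    extra = 0  # cells currently sitting above the locals
    out = []
    for word in body:
        if word in slot:
            # The local has sunk by `extra` cells since entry.
            out += [str(slot[word] + extra), "PICK"]
            extra += 1
        else:
            out.append(word)
            extra += NET[word]
    return out
```

For locals a and b (b on top), the body `a b +` compiles to `1 PICK 1 PICK +`: the second index is also 1 because the first fetch pushed a cell above the locals. Every index has to fit the instruction's offset field, which is where the 3-bit limit comes in.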

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: http://www.euroforth.org/ef22/cfp.html


From minforth@arcor.de@21:1/5 to gnuarm.del...@gmail.com on Sat Jul 23 14:42:35 2022

gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:

Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the data stack. Any stack without addressability would be hard to use as local storage.

There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.

One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to
address (offset to) 8 locals.
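
A rough Python model of the layout described above, as a sketch only. The class and method names are hypothetical, and the choice of which stack grows upward is an assumption (the post does not say). The 3-bit limit is modelled by rejecting local offsets outside 0..7.

```python
class SharedSegment:
    """One memory segment holding a data stack and a (pseudo) locals stack."""

    def __init__(self, size=32):
        self.mem = [0] * size
        self.dsp = 0      # data stack: index of next free cell, grows upward
        self.lsp = size   # locals stack: index of its top cell, grows downward

    def push(self, value):
        if self.dsp >= self.lsp:
            raise OverflowError("data stack ran into the locals stack")
        self.mem[self.dsp] = value
        self.dsp += 1

    def pop(self):
        if self.dsp == 0:
            raise IndexError("data stack underflow")
        self.dsp -= 1
        return self.mem[self.dsp]

    def to_local(self):
        """Move the top of the data stack onto the locals stack."""
        value = self.pop()          # popping first guarantees a free cell
        self.lsp -= 1
        self.mem[self.lsp] = value

    def local(self, offset):
        """Read a local through a 3-bit offset from the locals stack pointer."""
        if not 0 <= offset < 8:
            raise ValueError("offset does not fit in 3 bits")
        if self.lsp + offset >= len(self.mem):
            raise IndexError("no local at that offset")
        return self.mem[self.lsp + offset]

    def drop_locals(self, count):
        """Release `count` locals, e.g. on exit from a definition."""
        self.lsp = min(self.lsp + count, len(self.mem))
```

Because both stacks share one segment, neither has a fixed size limit of its own; they only collide when the segment as a whole is full.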

Advantage: no return stack cluttering or complicated locals frame
addressing schemes.


From Paul Rubin@21:1/5 to Rick C on Sat Jul 23 18:35:29 2022

Rick C <gnuarm.deletethisbit@gmail.com> writes:

I limited the offset field to two or three bits, so probably not
enough space to reliably implement locals on the data stack. Any
stack without addressability would be hard to use as local storage.

The main obstacle to using the data stack for locals is that it's hard
for the compiler to know the exact state of the data stack all the time.
You could have a conditional DROP and the compiler wouldn't know the
stack picture afterwards.
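
The conditional DROP problem can be illustrated with a small Python sketch of static stack-depth tracking. The representation is an assumption made for illustration: a program is a list of words, and a conditional is a tuple ("IF", then_part, else_part). The tracker returns None as soon as the two branches disagree about depth.

```python
# Net stack-depth change of a few words (illustrative subset only).
NET = {"DUP": 1, "OVER": 1, "DROP": -1, "SWAP": 0, "+": -1}


def net_depth(code):
    """Return the net stack-depth change of `code`, or None if it is not
    statically known because two branches leave different depths."""
    depth = 0
    for item in code:
        if isinstance(item, tuple):          # ("IF", then_part, else_part)
            _, then_part, else_part = item
            then_depth = net_depth(then_part)
            else_depth = net_depth(else_part)
            if then_depth is None or else_depth is None or then_depth != else_depth:
                return None                  # stack picture unknown from here on
            depth += then_depth - 1          # IF itself consumes the flag
        else:
            depth += NET[item]
    return depth
```

`DUP IF DROP THEN` is the troublesome case: one path drops a cell and the other does not, so any local addressed by a fixed offset from the top of stack would be at a different offset on each path.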

C compilers traditionally reserve a register to use as a frame pointer,
so locals would be addressed as offsets from the frame pointer. You can
get by without it, maybe resulting in slightly tighter code, but it
complicates the compiler and makes debugging a lot harder, since you can
no longer easily figure out the call stack by examining memory after a
crash.


From Wayne morellini@21:1/5 to Paul Rubin on Sun Jul 24 00:23:56 2022

On Sunday, July 24, 2022 at 11:35:33 AM UTC+10, Paul Rubin wrote:

Rick C <gnuarm.del...@gmail.com> writes:

I limited the offset field to two or three bits, so probably not
enough space to reliably implement locals on the data stack. Any
stack without addressability would be hard to use as local storage.

The main obstacle to using the data stack for locals is that it's hard
for the compiler to know the exact state of the data stack all the time.
You could have a conditional DROP and the compiler wouldn't know the
stack picture afterwards.

C compilers traditionally reserve a register to use as a frame pointer,
so locals would be addressed as offsets from the frame pointer. You can
get by without it, maybe resulting in slightly tighter code, but it complicates the compiler and makes debugging a lot harder, since you can
no longer easily figure out the call stack by examining memory after a
crash.

Exactly. You can use a register to store the current context's locals area (it has
to be passed with the return, or have a return stack for it). The area can be a
buffer area that can act as a stack with the register as an index register.

There are a lot of ways it can be designed; it depends on whether it's a general
purpose processor or task specific, and whether one programmer does it or future
programmers will do it. Your application space includes knowing the work space.
The person who understands how to program it correctly, being the only one,
can optimise it. If others who are (presumably) not of that level are going to
change and maintain it, then an optimising compiler makes sense.

But, optimisation. I don't know how they optimise it. But, let's say, for each
sequence, you list the best substitute sequence. This means you have to
reduce things to an intermediate code to get rid of the state and local dependencies that
need to be known for the compiler to pick the best substitute. If the compiler is
really good, it can see that code in a call is not efficient, affecting optimisation
around the call, and rearrange that and suggest optimisation to change it too. But
a really good optimising compiler should optimise left right, up down and the data,
and the logic which transcends time (3D and 4D). Basically, the compiler needs
to do advanced factoring. So here, Rick, you go for 2D or 3D to simplify compiler
construction.


From none) (albert@21:1/5 to minf...@arcor.de on Sun Jul 24 14:06:57 2022

In article <3b1e09c5-21f3-4e80-b14b-1f6e605adc98n@googlegroups.com>, minf...@arcor.de <minforth@arcor.de> wrote:

gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:

Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I
limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without
addressability would be hard to use as local storage.

There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.

One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to
address (offset to) 8 locals.

Advantage: no return stack cluttering or complicated locals frame
addressing schemes.

Brilliant. That is called a separate locals stack.
Brilliant, but of course not original.

Groetjes
Albert
--
"in our communism country Viet Nam, people are forced to be
alive and in the western country like US, people are free to
die from Covid 19 lol" duc ha
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst


From Wayne morellini@21:1/5 to none albert on Sun Jul 24 08:38:12 2022

On Sunday, July 24, 2022 at 10:07:00 PM UTC+10, none albert wrote:

In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
minf...@arcor.de <minf...@arcor.de> wrote:

gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:

Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I
limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without
addressability would be hard to use as local storage.

There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.

One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to
address (offset to) 8 locals.

Advantage: no return stack cluttering or complicated locals frame
addressing schemes.

Brilliant. That is called a separate locals stack.
Brilliant, but of course not original.
Groetjes
Albert

Spoken as one who comes up with these things without reading them?

It's not worth criticising somebody for coming up with something decent without reading about it first. If he had independent good creativity, good
on him. I had to learn a lot of skills on top of the ability to make better results myself. On the decline, I can't define the complete nature of that, but a
shadow of understanding is still there.


From Marcel Hendrix@21:1/5 to none albert on Sun Jul 24 14:05:01 2022

On Sunday, July 24, 2022 at 2:07:00 PM UTC+2, none albert wrote:

In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
minf...@arcor.de <minf...@arcor.de> wrote:

gnuarm.del...@gmail.com wrote on Saturday, 23 July 2022 at 01:51:53 UTC+2:

Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I
limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without
addressability would be hard to use as local storage.

There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.

One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to
address (offset to) 8 locals.

Advantage: no return stack cluttering or complicated locals frame
addressing schemes.

Brilliant. That is called a separate locals stack.
Brilliant, but of course not original.

We have that in 32-bit iForth and IIRC, you were the one
proposing it (sorry I forget the details).

It worked ok, but a separate stack with its own SP is
much more elegant (i.e., simpler and easier to
understand). Modern CPUs have enough registers.

-marcel
