The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items. It didn't reach very deep, but the difference was noticeable in code density. A sample program (interrupt routinefor managing the data for a Numerically Controlled Oscillator - NCO) was reduced in size by a third, by eliminating stack manipulations.
But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take a fair amount of work to design something like a compiler for a more typical register based CPU.
I may resurrect work on the design. Even if it is not supported with a Forth tool, it can be programmed in assembly which is not so much different from Forth, other than the addressability, which can be ignored for non-optimized code.
The last stack processor design I worked on had instructions for
addressing operands on the stack rather than just the top data items.
It didn't reach very deep, but the difference was noticeable in code
density.
But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would take
a fair amount of work to design something like a compiler for a more
typical register based CPU.
Rick C <gnuarm.del...@gmail.com> writes:
The last stack processor design I worked on had instructions forCongratulations, you have re-invented locals ;)
addressing operands on the stack rather than just the top data items.
It didn't reach very deep, but the difference was noticeable in code density.
But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would takeIdk about Forth compilers but it is a usual thing for C compilers, particularly for machines with not many registers, so they have to put
a fair amount of work to design something like a compiler for a more typical register based CPU.
locals in the stack. Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack
slots. That got rid of the overhead of saving and restoring registers
on subroutine call and return. I guess it had other costs since the
scheme hasn't been popular. I've never tried to program it at low
level.
The last stack processor design I worked on had instructions for addressing=
operands on the stack rather than just the top data items. It didn't reac=
h very deep, but the difference was noticeable in code density. A sample p= >rogram (interrupt routine for managing the data for a Numerically Controlle= >d Oscillator - NCO) was reduced in size by a third, by eliminating stack m= >anipulations. =20
But I never saw a way to write a Forth compiler that would be able to optim= >ize the code for such an architecture. I suppose that would take a fair am= >ount of work to design something like a compiler for a more typical registe= >r based CPU.
On Thursday, July 21, 2022 at 11:13:44 AM UTC+10, Paul Rubin wrote:
Rick C <gnuarm.del...@gmail.com> writes:
The last stack processor design I worked on had instructions for addressing operands on the stack rather than just the top data items.Congratulations, you have re-invented locals ;)
It didn't reach very deep, but the difference was noticeable in code density.
But I never saw a way to write a Forth compiler that would be able to optimize the code for such an architecture. I suppose that would takeIdk about Forth compilers but it is a usual thing for C compilers, particularly for machines with not many registers, so they have to put
a fair amount of work to design something like a compiler for a more typical register based CPU.
locals in the stack. Some cpus like the SPARC had "register windows"Well, as it is presumably a custom design, with no changes in processor,
which basically meant the register file was aliased to the top N stack slots. That got rid of the overhead of saving and restoring registers
on subroutine call and return. I guess it had other costs since the
scheme hasn't been popular. I've never tried to program it at low
level.
he could just make forth words to do the operation, and project segment
them in a lost of words which have to be re-written if the processor
changes. The original design would just use the assembly for that word.
I don't see what the problem is. But, I'm presuming it's a forth processor, and not another some stack derivative. Still, would like to inspect how
good this ISA is.
Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack
slots.
Rick C <gnuarm.del...@gmail.com> writes:
The last stack processor design I worked on had instructions for addressing=
operands on the stack rather than just the top data items. It didn't reac=
h very deep, but the difference was noticeable in code density. A sample p= >rogram (interrupt routine for managing the data for a Numerically Controlle= >d Oscillator - NCO) was reduced in size by a third, by eliminating stack m= >anipulations. =20
But I never saw a way to write a Forth compiler that would be able to optim= >ize the code for such an architecture. I suppose that would take a fair am= >ount of work to design something like a compiler for a more typical registe= >r based CPU.
The last sentence is confusing. How is it related to the rest?
I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.
If you are more ambitious, I think you can do something with similar complexity of an analytic compiler for a register machine. I have no
idea how much more that would buy.
BTW, such an architecture is widely available: The 387. Looking at
what iForth, lxf, and VFX 4.72 produce for
: foo fover f+ ;
they don't make use of this optimization opportunity, but all generate
stuff like:
( 080C6C70 D9C1 ) FLD ST(1)
( 080C6C72 DEC1 ) FADDP ST(1), ST
( 080C6C74 C3 ) NEXT,
( 5 bytes, 3 instructions )
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: http://www.euroforth.org/ef22/cfp.html
What is a "Forth processor" as distinct from a stack processor?
Rick C <gnuarm.del...@gmail.com> writes:I guessed we are not talking about a C machine here. There is more
What is a "Forth processor" as distinct from a stack processor?Forth processors traditionally have two stacks, I think. I don't know
if non-Forth "stack processors" have that.
Rick C <gnuarm.del...@gmail.com> writes:
The last stack processor design I worked on had instructions for addressing=
operands on the stack rather than just the top data items. It didn't reac=
h very deep, but the difference was noticeable in code density. A sample p= >rogram (interrupt routine for managing the data for a Numerically Controlle=
d Oscillator - NCO) was reduced in size by a third, by eliminating stack m= >anipulations. =20
But I never saw a way to write a Forth compiler that would be able to optim=
ize the code for such an architecture. I suppose that would take a fair am= >ount of work to design something like a compiler for a more typical registe=
r based CPU.
The last sentence is confusing. How is it related to the rest?
I think that a Forth compiler for such an architecture could be as
simple as optimizing sequences like OVER + into a single instruction,
and programmers could then design their code to prefer DUP, OVER,
THIRD and the like over SWAP, ROT, TUCK, and the like.
If you are more ambitious, I think you can do something with similar complexity of an analytic compiler for a register machine. I have no
idea how much more that would buy.
BTW, such an architecture is widely available: The 387. Looking at
what iForth, lxf, and VFX 4.72 produce for
: foo fover f+ ;
they don't make use of this optimization opportunity, but all generate
stuff like:
( 080C6C70 D9C1 ) FLD ST(1)
( 080C6C72 DEC1 ) FADDP ST(1), ST
( 080C6C74 C3 ) NEXT,
( 5 bytes, 3 instructions )
Rick C <gnuarm.del...@gmail.com> writes:
What is a "Forth processor" as distinct from a stack processor?Forth processors traditionally have two stacks, I think. I don't know
if non-Forth "stack processors" have that.
I suppose it depends on how you define, "non-Forth" stack processors.
Paul Rubin <no.email@nospam.invalid> writes:
Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack >>slots.
No. Registers on SPARC (and rotating register files on AMD29K, and
IA-64) are not addressable as memory. If you use more register
windows than the hardware has registers, the contents of the most
remote windows are stored to memory in an interrupt routine; if you
then pop windows back until you reach one that is no longer in the
register file, another interrupt loads the contents from memory into
the register window.
Similarly on the AMD29K, and AFAIK in practice on IA-64
implementations; IA-64 implementations were intended to do this
spilling and refilling in the background in hardware, but all I heard
is that it did not work, and that the interrupt approach continued to
be used.
- anton
Forth is an easy thing to write. You either use the stack architecture of = >the stack CPU or you emulate a virtual stack machine. It seems to me that = >to optimize a stack processor which has stack addressing, it would require = >some of the same features found in compilers for register based machines. = >My understanding is this would be significantly more work.=20
I think that a Forth compiler for such an architecture could be as=20
simple as optimizing sequences like OVER + into a single instruction,=20
and programmers could then design their code to prefer DUP, OVER,=20
THIRD and the like over SWAP, ROT, TUCK, and the like.=20
I don't think it is that simple. I found that, for all practical purposes,=
every stack manipulation could be optimized away. But it requires careful= ordering of operands, even more so than with traditional Forth.
In article <2022Jul21.161403@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Paul Rubin <no.email@nospam.invalid> writes:
Some cpus like the SPARC had "register windows"
which basically meant the register file was aliased to the top N stack >>>slots.
No. Registers on SPARC (and rotating register files on AMD29K, and
IA-64) are not addressable as memory. If you use more register
windows than the hardware has registers, the contents of the most
remote windows are stored to memory in an interrupt routine; if you
then pop windows back until you reach one that is no longer in the
register file, another interrupt loads the contents from memory into
the register window.
Similarly on the AMD29K, and AFAIK in practice on IA-64
implementations; IA-64 implementations were intended to do this
spilling and refilling in the background in hardware, but all I heard
is that it did not work, and that the interrupt approach continued to
be used.
This is interesting. Were they not able to accomplish in a
c-compiler? What about a Forth compiler?
Rick C <gnuarm.del...@gmail.com> writes:
Forth is an easy thing to write. You either use the stack architecture of = >the stack CPU or you emulate a virtual stack machine. It seems to me that = >to optimize a stack processor which has stack addressing, it would require =
some of the same features found in compilers for register based machines. = >My understanding is this would be significantly more work.=20
Yes, unless you leave that job to the programmer. At least with this
kind of architecture the programmer can express this by using DUP OVER
THIRD PICK instead of SWAP ROT TUCK.
I think that a Forth compiler for such an architecture could be as=20
simple as optimizing sequences like OVER + into a single instruction,=20 >> and programmers could then design their code to prefer DUP, OVER,=20
THIRD and the like over SWAP, ROT, TUCK, and the like.=20
I don't think it is that simple. I found that, for all practical purposes,=Yes, you have an even smaller set of operations for arranging data.
every stack manipulation could be optimized away. But it requires careful= ordering of operands, even more so than with traditional Forth.
Nobody said it was simple.
Already with traditional Forth many
programmers don't want to go to the necessary lengths, and most
switched to other languages over time, and of the rest many use
locals, more or less frequently, much to the outrage of Forth purists.
On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
Yes, unless you leave that job to the programmer. At least with this=20
kind of architecture the programmer can express this by using DUP OVER=20
THIRD PICK instead of SWAP ROT TUCK.=20
So you are suggesting a developer can program this CPU in Forth by using pa= >rticular idioms? =20
Ideally, the programmer would not need =
to be aware of the details of the CPU involved.
That's what optimizing com=
pilers do.
her optimize, but now it becomes a game of trial and error to find particul= >ar code sequences that produce particular assembly that is not obviously op= >imized for a particular CPU variant. I think it was in this group where pe= >ople explored this finding that both CPUs and compilers evolve, resulting i= >n very unpredictable results for any particular combination.=20
I'd be happy with a Forth compiler that would easily produce approximately = >optimal code for a given instruction set, but that is beyond my experience = >and knowledge.=20
One question I have about locals... in C and other languages, they are on t= >he stack,
so are only valid when that portion of code is "in scope". My un= >derstanding is Forth does not allocate locals on the data stack. Is that = >right?
Is there a locals stack or is it dedicated storage, like an otherwi=
se declared variable?=20
Rick C <gnuarm.del...@gmail.com> writes:
On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
Yes, unless you leave that job to the programmer. At least with this=20 >> kind of architecture the programmer can express this by using DUP OVER=20 >> THIRD PICK instead of SWAP ROT TUCK.=20
So you are suggesting a developer can program this CPU in Forth by using pa=
rticular idioms? =20
Yes, but my point was that the programmer can make the compiler's job
easy by programming such an architecture with Forth and preferring DUP
OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.
Ideally, the programmer would not need =That's not an ideal of every Forther, nor for other programming
to be aware of the details of the CPU involved.
languages.
That's what optimizing com=
pilers do.
Sometimes, unreliably.
The only hand work is in the very rare cases of needing to furt=
her optimize, but now it becomes a game of trial and error to find particul=
ar code sequences that produce particular assembly that is not obviously op=
imized for a particular CPU variant. I think it was in this group where pe= >ople explored this finding that both CPUs and compilers evolve, resulting i=
n very unpredictable results for any particular combination.=20
I'd be happy with a Forth compiler that would easily produce approximately =
optimal code for a given instruction set, but that is beyond my experience =
and knowledge.=20
One question I have about locals... in C and other languages, they are on t=
he stack,
More typically in registers.
so are only valid when that portion of code is "in scope". My un= >derstanding is Forth does not allocate locals on the data stack. Is that = >right?
Yes.
Is there a locals stack or is it dedicated storage, like an otherwi=
se declared variable?=20
Gforth uses a locals stack. Most others use the return stack. Some
people propose using ordinary variables with beheading, but that does
not work reentrantly.
On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
Rick C <gnuarm.del...@gmail.com> writes:
Forth is an easy thing to write. You either use the stack architecture of =
the stack CPU or you emulate a virtual stack machine. It seems to me that =
to optimize a stack processor which has stack addressing, it would require =
some of the same features found in compilers for register based machines. =
My understanding is this would be significantly more work.=20
Yes, unless you leave that job to the programmer. At least with thisSo you are suggesting a developer can program this CPU in Forth by using particular idioms?
kind of architecture the programmer can express this by using DUP OVER THIRD PICK instead of SWAP ROT TUCK.
I think that a Forth compiler for such an architecture could be as=20 >> simple as optimizing sequences like OVER + into a single instruction,=20
and programmers could then design their code to prefer DUP, OVER,=20
THIRD and the like over SWAP, ROT, TUCK, and the like.=20
becomes a game of trial and error to find particular code sequences that produce particular assembly that is not obviously opimized for a particular CPU variant. I think it was in this group where people explored this finding that both CPUs and compilersThat's my point. It's not simple. Ideally, the programmer would not need to be aware of the details of the CPU involved. That's what optimizing compilers do. The only hand work is in the very rare cases of needing to further optimize, but now itI don't think it is that simple. I found that, for all practical purposes,=Yes, you have an even smaller set of operations for arranging data.
every stack manipulation could be optimized away. But it requires careful=
ordering of operands, even more so than with traditional Forth.
Nobody said it was simple.
I'd be happy with a Forth compiler that would easily produce approximately optimal code for a given instruction set, but that is beyond my experience and knowledge.or is it dedicated storage, like an otherwise declared variable?
Already with traditional Forth manyMostly, programming in Forth doesn't take much work as optimization is seldom required.
programmers don't want to go to the necessary lengths, and most
switched to other languages over time, and of the rest many use
locals, more or less frequently, much to the outrage of Forth purists.
One question I have about locals... in C and other languages, they are on the stack, so are only valid when that portion of code is "in scope". My understanding is Forth does not allocate locals on the data stack. Is that right? Is there a locals stack
On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:some years since I worked on this, but it is available if I dig up the file.
Rick C <gnuarm.del...@gmail.com> writes:
On Friday, July 22, 2022 at 7:13:53 AM UTC-4, Anton Ertl wrote:
Yes, unless you leave that job to the programmer. At least with this=20 >> kind of architecture the programmer can express this by using DUP OVER=20
THIRD PICK instead of SWAP ROT TUCK.=20
So you are suggesting a developer can program this CPU in Forth by using pa=
rticular idioms? =20
Yes, but my point was that the programmer can make the compiler's jobI suppose using idioms has the advantage of retaining portability. Otherwise they seem verbose and counter productive. Maybe this would be a better discussion to have some source code, but I'm feeling a bit lazy about this at the moment. It has been
easy by programming such an architecture with Forth and preferring DUP OVER THIRD PICK, while with a register architecture the programmer has little chance to help the compiler.
I don't really care what various "Forthers" may or may not think. I'm pretty sure it is a major goal of all the most commonly used languages. Why would you say otherwise. Are you thinking of various quirky languages?Ideally, the programmer would not need =That's not an ideal of every Forther, nor for other programming
to be aware of the details of the CPU involved.
languages.
That's what optimizing com=
pilers do.
Sometimes, unreliably.You are saying programs have bugs? Yes, I've heard that!
The only hand work is in the very rare cases of needing to furt=
her optimize, but now it becomes a game of trial and error to find particul=
ar code sequences that produce particular assembly that is not obviously op=
imized for a particular CPU variant. I think it was in this group where pe=
ople explored this finding that both CPUs and compilers evolve, resulting i=
n very unpredictable results for any particular combination.=20
I'd be happy with a Forth compiler that would easily produce approximately =
optimal code for a given instruction set, but that is beyond my experience =
and knowledge.=20
data stack. Any stack without addressability would be hard to use as local storage.One question I have about locals... in C and other languages, they are on t=
he stack,
More typically in registers.
so are only valid when that portion of code is "in scope". My un= >derstanding is Forth does not allocate locals on the data stack. Is that =
right?
Yes.
Is there a locals stack or is it dedicated storage, like an otherwi=
se declared variable?=20
Gforth uses a locals stack. Most others use the return stack. SomeOf course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on the
people propose using ordinary variables with beheading, but that does
not work reentrantly.
--
Rick C.
++ Get 1,000 miles of free Supercharging
++ Tesla referral code - https://ts.la/richard11209
On Friday, July 22, 2022 at 5:00:23 PM UTC-4, Anton Ertl wrote:
Yes, but my point was that the programmer can make the compiler's job=20
easy by programming such an architecture with Forth and preferring DUP=20
OVER THIRD PICK, while with a register architecture the programmer has=20
little chance to help the compiler.=20
I suppose using idioms has the advantage of retaining portability. Otherwi= >se they seem verbose and counter productive.
Maybe this would be a better =
discussion to have some source code, but I'm feeling a bit lazy about this = >at the moment. It has been some years since I worked on this, but it is av= >ailable if I dig up the file. =20
Ideally, the programmer would not need =3DThat's not an ideal of every Forther, nor for other programming=20
to be aware of the details of the CPU involved.
languages.=20
I don't really care what various "Forthers" may or may not think. I'm pret= >ty sure it is a major goal of all the most commonly used languages. Why wo= >uld you say otherwise. Are you thinking of various quirky languages? =20
That's what optimizing com=3D=20=20
pilers do.=20
Sometimes, unreliably.=20
You are saying programs have bugs? Yes, I've heard that!=20
Is there a locals stack or is it dedicated storage, like an otherwi=3D= >=20=20
se declared variable?=3D20=20
Gforth uses a locals stack. Most others use the return stack. Some=20
people propose using ordinary variables with beheading, but that does=20
not work reentrantly.
Of course, the return stack makes perfect sense. I should have seen that. =
If I recall, because I wanted to keep the instruction size down, I limited= the offset field to two or three bits, so probably not enough space to rel=
iably implement locals on the data stack.
Of course, the return stack makes perfect sense. I should have seen that. If I recall, because I wanted to keep the instruction size down, I limited the offset field to two or three bits, so probably not enough space to reliably implement locals on thedata stack. Any stack without addressability would be hard to use as local storage.
I limited the offset field to two or three bits, so probably not
enough space to reliably implement locals on the data stack. Any
stack without addressability would be hard to use as local storage.
Rick C <gnuarm.del...@gmail.com> writes:
I limited the offset field to two or three bits, so probably notThe main obstacle to using the data stack for locals is that it's hard
enough space to reliably implement locals on the data stack. Any
stack without addressability would be hard to use as local storage.
for the compiler to know the exact state of the data stack all the time.
You could have a conditional DROP and the compiler wouldn't know the
stack picture afterwards.
C compilers traditionally reserve a register to use as a frame pointer,
so locals would be addressed as offsets from the frame pointer. You can
get by without it, maybe resulting in slightly tighter code, but it complicates the compiler and makes debugging a lot harder, since you can
no longer easily figure out the call stack by examining memory after a
crash.
gnuarm.del...@gmail.com schrieb am Samstag, 23. Juli 2022 um 01:51:53 UTC+2: >> Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I >limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without >addressability would be hard to use as local storage.
There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.
One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to
address (offset to) 8 locals.
Advantage: no return stack cluttering or complicated locals frame
addressing schemes.
In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
minf...@arcor.de <minf...@arcor.de> wrote:
gnuarm.del...@gmail.com schrieb am Samstag, 23. Juli 2022 um 01:51:53 UTC+2: >> Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I >limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without >addressability would be hard to use as local storage.
There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.
One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to >address (offset to) 8 locals.
Advantage: no return stack cluttering or complicated locals frame >addressing schemes.Brilliant. That is called a separate locals stack.
Brilliant, but of course not original.
Groetjes
Albert
In article <3b1e09c5-21f3-4e80...@googlegroups.com>,
minf...@arcor.de <minf...@arcor.de> wrote:
gnuarm.del...@gmail.com schrieb am Samstag, 23. Juli 2022 um 01:51:53 UTC+2: >> Of course, the return stack makes perfect sense. I should have seen
that. If I recall, because I wanted to keep the instruction size down, I >limited the offset field to two or three bits, so probably not enough
space to reliably implement locals on the data stack. Any stack without >addressability would be hard to use as local storage.
There is also another solution: move locals from the data stack
to the other end of the data stack segment, so that you have
a (pseudo) locals stack in the data stack memory segment but
not within the data stack itself, nor in the return stack.
One stack grows downwards while the other stack grows upwards,
so that their "top-of-stacks" face each other. When the (pseudo)
locals stack grows, the data stack shrinks. 3 bits are sufficient to >address (offset to) 8 locals.
Advantage: no return stack cluttering or complicated locals frame >addressing schemes.Brilliant. That is called a separate locals stack.
Brilliant, but of course not original.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 300 |
Nodes: | 16 (2 / 14) |
Uptime: | 37:05:13 |
Calls: | 6,707 |
Files: | 12,239 |
Messages: | 5,353,496 |