Forum: >>> Magnum BBS <<<

Single-pass or multi-pass Forth compiler

From minforth@arcor.de@21:1/5 to All on Mon Jan 16 03:50:26 2023

Historically Forth features a single-pass compiler that is triggered e.g.
when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
- elimination of comments
- resolving of labels for jump targets
- intermediate language e.g. program as database of instruction lists
- peephole optimization, superinstructions
- tail call elimination

Is there any Forth out there that compiles high-level definitions in this way? Anybody willing to share some other experience in this field?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Jan Coombs@21:1/5 to minf...@arcor.de on Mon Jan 16 14:26:59 2023

On Mon, 16 Jan 2023 03:50:26 -0800 (PST)
"minf...@arcor.de" <minforth@arcor.de> wrote:

- intermediate language e.g. program as database of instruction lists

Maybe this one in 'Holon Forth' and 'Fifth':

Planet Holonforth Wolf Wejgaard EuroForth 2016 http://www.euroforth.org/ef16/papers/wejgaard.pdf

1989-09-00 Paper presented at the EuroFORML conference 1989
Not Screens nor Files but Words Wolf Wejgaard https://holonforth.com/ef89.html

Further papers at:
https://www.holonforth.com/papers.html

Ref to Fifth was in:
A Fresh Look at the Forth Dictionary
Wolf Wejgaard
Presented at the EuroFORTH conference 1995
https://www.holonforth.com/ef95.html

Jan Coombs
--

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to minf...@arcor.de on Mon Jan 16 16:27:17 2023

"minf...@arcor.de" <minforth@arcor.de> writes:

Historically Forth features a single-pass compiler that is triggered e.g. >when the interpreter hits a : colon. Some control flows require code >back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for >simplification and optimization. e.g.
- elimination of comments
- resolving of labels for jump targets
- intermediate language e.g. program as database of instruction lists
- peephole optimization, superinstructions
- tail call elimination

Is there any Forth out there that compiles high-level definitions in this way? >Anybody willing to share some other experience in this field?

Passes in the original sense have gone out of fashion once RAM sizes
were big enough. Compilers often have intermediate representations
for purposes such as analysis and optimization, and for modularization purposes.

iForth and VFX have an intermediate representation that they use for
inlining. AFAIK they don't use it for elimination of comments, jump
target resolution, peephole optimization, superinstructions, or
tail-call elimination.

Gforth uses the threaded-code representation (and its relocatable
variant) either directly, or as a step to a mixture of native code and threaded-code. It also buffers the threaded code for a straight-line
piece of code before generating native code, to allow better code
generation.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Hans Bezemer@21:1/5 to minf...@arcor.de on Mon Jan 16 08:43:06 2023

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:

OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
- elimination of comments

I can't think of a Forth compiler that DOESN'T. It simply ignores the stuff:

: \ blk @ IF >in @ c/l /f 1+ c/l * >in ! EXIT THEN source >in ! drop ; immediate

- resolving of labels for jump targets

Again - I can't think of a Forth compiler that doesn't. References are placed on
some stack and used as target addresses when jumping back or as source addresses to be patched later when jumping forward.

- peephole optimization, superinstructions

4tH has a peephole optimizer used for strength reduction, constant folding and dead code elimination. It only looks back a single instruction, but can in some circumstances work recursive.

- tail call elimination

4tH's optimizer does that one as well.

Is there any Forth out there that compiles high-level definitions in this way?
Anybody willing to share some other experience in this field?

It works fine and transparent - unless you insist to meddle with the return stack
in non-portable ways.

https://sourceforge.net/p/forth-4th/wiki/Optimization/

Hans Bezemer

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Marcel Hendrix@21:1/5 to minf...@arcor.de on Mon Jan 16 09:39:28 2023

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:

Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.

[..]

Is there any Forth out there that compiles high-level definitions in this way?

Using an intermediate representation is far more powerful, as one can
do recursive 'context-sensitive' inlining (delay compiling as long as possible).

In iForth's iSPICE package I'll eventually use run-time code generation (e.g. to eliminate pointer-chasing). This delays instruction selection to the limit. However, it only works if the target application is sufficiently unsophisticated,
or has predictable execution behavior (like SPICE). I'm sure that there would be many snags would I try it for Forth itself.

-marcel

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Marcel Hendrix on Mon Jan 16 11:47:13 2023

Marcel Hendrix schrieb am Montag, 16. Januar 2023 um 18:39:29 UTC+1:

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:

Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.

[..]

Is there any Forth out there that compiles high-level definitions in this way?

Using an intermediate representation is far more powerful, as one can
do recursive 'context-sensitive' inlining (delay compiling as long as possible).

In iForth's iSPICE package I'll eventually use run-time code generation (e.g. to eliminate pointer-chasing). This delays instruction selection to the limit.
However, it only works if the target application is sufficiently unsophisticated,
or has predictable execution behavior (like SPICE). I'm sure that there would be many snags would I try it for Forth itself.

Interesting. I never considered lazy evaluation for Forth.

Without climbing up to Haskell's heights, there are simple "lazy evaluators in Python
(copied from Stackoverflow):
######
In a nutshell, lazy evaluation means that the object is evaluated when it is needed,
not when it is created.
In Python 2, range will return a list - this means that if you give it a large number,
it will calculate the range and return at the time of creation:

i = range(100)
type(i)

<type 'list'>
In Python 3, however you get a special range object:

i = range(100)
type(i)

<class 'range'>
Only when you consume it, will it actually be evaluated - in other words, it will only return
the numbers in the range when you actually need them.
######

As knee reflex ISTM to have many similarities with returning macros. It would be
quite easy in Forth to return xt's of words containing EVALUATEs or [[ ]] gforthisms
for delayed evaluation.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Jan Coombs on Mon Jan 16 11:29:33 2023

Jan Coombs schrieb am Montag, 16. Januar 2023 um 16:05:04 UTC+1:

On Mon, 16 Jan 2023 03:50:26 -0800 (PST)
"minf...@arcor.de" <minf...@arcor.de> wrote:

- intermediate language e.g. program as database of instruction lists

Maybe this one in 'Holon Forth' and 'Fifth':

Planet Holonforth Wolf Wejgaard EuroForth 2016 http://www.euroforth.org/ef16/papers/wejgaard.pdf

1989-09-00 Paper presented at the EuroFORML conference 1989
Not Screens nor Files but Words Wolf Wejgaard https://holonforth.com/ef89.html

Further papers at:
https://www.holonforth.com/papers.html

Ref to Fifth was in:
A Fresh Look at the Forth Dictionary
Wolf Wejgaard
Presented at the EuroFORTH conference 1995 https://www.holonforth.com/ef95.html

I seem to remember Holon running on DOS. Last time I looked I got the impression
that it was a nice mixture of Tcl and Forth, with most fun parts included from Tcl libraries.
Good work, not criticizing, but running an interpreter through another interpreter
seemed rather heavy to me.

Nevertheless the "modules/groups/dictionary/words in a database" is an appealing
concept.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gerry Jackson@21:1/5 to minf...@arcor.de on Mon Jan 16 20:29:23 2023

On 16/01/2023 11:50, minf...@arcor.de wrote:

Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
- elimination of comments
- resolving of labels for jump targets
- intermediate language e.g. program as database of instruction lists
- peephole optimization, superinstructions
- tail call elimination

Is there any Forth out there that compiles high-level definitions in this way?
Anybody willing to share some other experience in this field?

My sytem, that has never been released (and won't be), which was
developed from about 2005, compiles Forth code to an intermediate form,
carries out some optimisations, then generates the executable code. I
did it this way (I think) to:
- widen the range of optimisations beyond simple peephole optimisation
- making it more readily portable to different processors and OS's (one
front end, n back ends).
- generating different types of executable code including machine code.

and probably other reasons that I've forgotten.

I did peephole optimisations and super-instructions but then found other
things took priority e.g. using the system, so I've never got round to
getting on with other aims.

It did make things such as quotations easier as it just involves
suspending the outer definition, compiling the quotation, then resuming
the suspended definition. Nested as much as desired.

It enabled me to experiment with other capabilities such as nested colon definitions, declaration of variables inside a definition, different
entry points, not that I've used them for real.

--
Gerry

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Lars Brinkhoff@21:1/5 to All on Tue Jan 17 06:39:48 2023

Leeloo says: multipass.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From none) (albert@21:1/5 to mhx@iae.nl on Tue Jan 17 08:36:02 2023

In article <c211daee-ca84-4ec0-993d-c0982b238d9an@googlegroups.com>,
Marcel Hendrix <mhx@iae.nl> wrote:

On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:

Historically Forth features a single-pass compiler that is triggered e.g.
when the interpreter hits a : colon. Some control flows require code
back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for
simplification and optimization. e.g.

[..]

Is there any Forth out there that compiles high-level definitions in this way?

Using an intermediate representation is far more powerful, as one can
do recursive 'context-sensitive' inlining (delay compiling as long as possible).

In iForth's iSPICE package I'll eventually use run-time code generation (e.g. >to eliminate pointer-chasing). This delays instruction selection to the limit.

What is "pointer chasing"?

-marcel

--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Gerry Jackson on Tue Jan 17 00:11:31 2023

Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:

My sytem, that has never been released (and won't be), which was
developed from about 2005, compiles Forth code to an intermediate form, carries out some optimisations, then generates the executable code. I
did it this way (I think) to:
- widen the range of optimisations beyond simple peephole optimisation
- making it more readily portable to different processors and OS's (one
front end, n back ends).
- generating different types of executable code including machine code.

and probably other reasons that I've forgotten.

I did peephole optimisations and super-instructions but then found other things took priority e.g. using the system, so I've never got round to getting on with other aims.

It did make things such as quotations easier as it just involves
suspending the outer definition, compiling the quotation, then resuming
the suspended definition. Nested as much as desired.

So it compiles the innermost definitions/quotations first before compiling outer definitions?

The "classic" way is to jump over quotations, nesting included.
From the reference test:
T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

Did you experiment with enclosures?

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From none) (albert@21:1/5 to minf...@arcor.de on Tue Jan 17 08:50:42 2023

In article <74bc3d99-4179-47d7-bcd2-42114b39feb9n@googlegroups.com>, minf...@arcor.de <minforth@arcor.de> wrote:

Historically Forth features a single-pass compiler that is triggered e.g. >when the interpreter hits a : colon. Some control flows require code >back-patching to resolve jump targets. This can be awkward.

OTOH by going over the code in several passes offers easy paths for >simplification and optimization. e.g.
- elimination of comments
- resolving of labels for jump targets
- intermediate language e.g. program as database of instruction lists
- peephole optimization, superinstructions
- tail call elimination

My optimiser goes through the steps (experimental)

1. find stack effect for all primitive words (through a sophisticated
assembler analysis) ( + stack effects properties)
2. find stack effects for other words, ( + stack effects properties)
3. optimise words from the bottom up,high level, a generalisation of
constant folding
4. inline the machine code. A number of blocks that jumps to each
other results
5. optimise the blocks, individually and as a whole.

It results (in some examples) to a speed up comparably to gforth
versus mpeforth.

Highly experimental.

Is there any Forth out there that compiles high-level definitions in this way? >Anybody willing to share some other experience in this field?

Go to my site below. forthlectures.html View lecture 5.
(This has been announced several times.)

Tail call elimination?
If you're programming for speed, you probably avoid tail calls.
(Not that it is hard to do.)

Groetjes Albert
--
Don't praise the day before the evening. One swallow doesn't make spring.
You must not say "hey" before you have crossed the bridge. Don't sell the
hide of the bear until you shot it. Better one bird in the hand than ten in
the air. First gain is a cat spinning. - the Wise from Antrim -

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Anton Ertl on Tue Jan 17 00:23:41 2023

Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:

iForth and VFX have an intermediate representation that they use for inlining. AFAIK they don't use it for elimination of comments, jump
target resolution, peephole optimization, superinstructions, or
tail-call elimination.

Gforth uses the threaded-code representation (and its relocatable
variant) either directly, or as a step to a mixture of native code and threaded-code. It also buffers the threaded code for a straight-line
piece of code before generating native code, to allow better code
generation.

The VFX docs speak of code inlining vs source inlining, the latter now
improved through tokenizing. This looks similar to gforth's inlining method.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Anton Ertl@21:1/5 to minf...@arcor.de on Tue Jan 17 09:08:14 2023

"minf...@arcor.de" <minforth@arcor.de> writes:

Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:

iForth and VFX have an intermediate representation that they use for
inlining. AFAIK they don't use it for elimination of comments, jump
target resolution, peephole optimization, superinstructions, or
tail-call elimination.

Gforth uses the threaded-code representation (and its relocatable
variant) either directly, or as a step to a mixture of native code and
threaded-code. It also buffers the threaded code for a straight-line
piece of code before generating native code, to allow better code
generation.

The VFX docs speak of code inlining vs source inlining, the latter now >improved through tokenizing.

The original source inliner of VFX had the same correctness problem as EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
BASE etc. when the code is tokenized and therefore does not have this
problem.

This looks similar to gforth's inlining method.

Gforth currently does not inline colon definitions. The native-code
approach has been called "selective inlining" (of primitives) by
Piumarta, and is probably similar to what VFX documents as "code
inlining", but I normally don't call it inlining.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2022: https://euro.theforth.net

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Anton Ertl on Tue Jan 17 02:01:41 2023

Anton Ertl schrieb am Dienstag, 17. Januar 2023 um 10:14:42 UTC+1:

"minf...@arcor.de" <minf...@arcor.de> writes:

Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:

iForth and VFX have an intermediate representation that they use for
inlining. AFAIK they don't use it for elimination of comments, jump
target resolution, peephole optimization, superinstructions, or
tail-call elimination.

Gforth uses the threaded-code representation (and its relocatable
variant) either directly, or as a step to a mixture of native code and
threaded-code. It also buffers the threaded code for a straight-line
piece of code before generating native code, to allow better code
generation.

The VFX docs speak of code inlining vs source inlining, the latter now >improved through tokenizing.

The original source inliner of VFX had the same correctness problem as EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
BASE etc. when the code is tokenized and therefore does not have this problem.

Macro correctness is an interesting topic as it can depend on changed
machine states. Off topic, but correct selection of what has to be included
in saved closure enviroments falls into the same problem category.

I wonder if the Holon approach (keeping everything in a realtime source database) did not suffer from the same dilemma.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Gerry Jackson@21:1/5 to minf...@arcor.de on Tue Jan 17 10:47:26 2023

On 17/01/2023 08:11, minf...@arcor.de wrote:

Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:

My sytem, that has never been released (and won't be), which was
developed from about 2005, compiles Forth code to an intermediate form,
carries out some optimisations, then generates the executable code. I
did it this way (I think) to:
- widen the range of optimisations beyond simple peephole optimisation
- making it more readily portable to different processors and OS's (one
front end, n back ends).
- generating different types of executable code including machine code.

and probably other reasons that I've forgotten.

I did peephole optimisations and super-instructions but then found other
things took priority e.g. using the system, so I've never got round to
getting on with other aims.

It did make things such as quotations easier as it just involves
suspending the outer definition, compiling the quotation, then resuming
the suspended definition. Nested as much as desired.

So it compiles the innermost definitions/quotations first before compiling outer definitions?

The "classic" way is to jump over quotations, nesting included.

Yes a crude, but effective, solution imposed by the traditional single
pass compilation.

From the reference test:
T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

Did you experiment with enclosures?

Not with the built in quotations. But before quotations were
standardised I did develop an ANS Forth implementation of closures in
2011 that I can let you have if you are interested.

--
Gerry

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From dxforth@21:1/5 to minf...@arcor.de on Tue Jan 17 21:23:54 2023

On 16/01/2023 10:50 pm, minf...@arcor.de wrote:

Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

m/c assemblers went from two-pass to one-pass. Trust forth to want to go
in the opposite direction :)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Gerry Jackson on Tue Jan 17 04:18:57 2023

Gerry Jackson schrieb am Dienstag, 17. Januar 2023 um 11:47:27 UTC+1:

On 17/01/2023 08:11, minf...@arcor.de wrote:

Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:

My sytem, that has never been released (and won't be), which was
developed from about 2005, compiles Forth code to an intermediate form,
carries out some optimisations, then generates the executable code. I
did it this way (I think) to:
- widen the range of optimisations beyond simple peephole optimisation
- making it more readily portable to different processors and OS's (one
front end, n back ends).
- generating different types of executable code including machine code.

and probably other reasons that I've forgotten.

I did peephole optimisations and super-instructions but then found other >> things took priority e.g. using the system, so I've never got round to
getting on with other aims.

It did make things such as quotations easier as it just involves
suspending the outer definition, compiling the quotation, then resuming
the suspended definition. Nested as much as desired.

So it compiles the innermost definitions/quotations first before compiling outer definitions?

The "classic" way is to jump over quotations, nesting included.

Yes a crude, but effective, solution imposed by the traditional single
pass compilation.

From the reference test:
T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

Did you experiment with enclosures?

Not with the built in quotations. But before quotations were
standardised I did develop an ANS Forth implementation of closures in
2011 that I can let you have if you are interested.

Thank you for your kind offer! I just sent you a private mail.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Matthias Koch@21:1/5 to All on Tue Jan 17 16:06:05 2023

I found multiple passes necessary in Mecrisp-Across to re-join different control flow branches that use a different local register allocation without relying on canonical fallback. If the number of stack elements held in registers at the joint differ,
one needs to use the smaller number of stack elements, which may require additional passes to achieve.

Using the larger number of elements causes non-existing elements to be popped from the stack, at least temporarily. Of course, in valid Forth code, these elements will be pushed back, but may cause reads/writes beyond the beginning of the stack area in
memory. Simply adding a guard space at the beginning of the stack would be a solution to keep it single-pass, but in combination with nested interrupts, this may be riscy. I solved this with multiple passes in the compiler, and generate elegant code
along the way.

Note this compiler is capable of register allocation across ?dup and similiar constructs.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Matthias Koch on Wed Jan 18 01:54:03 2023

Matthias Koch schrieb am Dienstag, 17. Januar 2023 um 16:06:07 UTC+1:

I found multiple passes necessary in Mecrisp-Across to re-join different control flow branches that use a different local register allocation without relying on canonical fallback. If the number of stack elements held in registers at the joint differ,

one needs to use the smaller number of stack elements, which may require additional passes to achieve.

Using the larger number of elements causes non-existing elements to be popped from the stack, at least temporarily. Of course, in valid Forth code, these elements will be pushed back, but may cause reads/writes beyond the beginning of the stack area in

memory. Simply adding a guard space at the beginning of the stack would be a solution to keep it single-pass, but in combination with nested interrupts, this may be riscy. I solved this with multiple passes in the compiler, and generate elegant code
along the way.

Note this compiler is capable of register allocation across ?dup and similiar constructs.

Looks like some 'spill and iterate' algorithm. Did you consider spilling costs? I imagine that moves between registers and memory take MUCH more time
than some little register reshuffling at control flow joints.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Matthias Koch@21:1/5 to All on Wed Jan 18 11:16:18 2023

Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent
reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the
stack at all for one of the two branches.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Matthias Koch@21:1/5 to All on Wed Jan 18 13:10:52 2023

These slides are a very nice intro to register allocation difficulties - but register spilling therein happens when "we have more elements than registers", whereas in Mecrisp-Across I have enough registers for most of the reasonable code. For example,
2OVER gets 6 deep, so for an embedded MSP430 Forth, it is OK to start spilling at 10 elements deep because of lack of registers. The variable liveness analysis is a different kind of beast than a stack. I do not try to perform liveness analysis at all.

What I do is: When stack elements are handled first time, these are popped from the stack(s) and put in registers, and the compiler tries to keep these in registers unless forced to get back to canonical state by non-optimiseable structures.

This is done in Mecrisp-Stellaris RA (data stack only) and Mecrisp-Quintus with the loadable Acrobatics compiler extension (both data and return stack) which is written in Forth itself, my recommendation if you want a starting point. These do canonical
fallback when encountering control structures.

Mecrisp-Across is one step ahead and keeps the current set of registers when control flow forks execution, and tries to re-merge the two resulting register sets at the joint, if possible by reshuffling one of the branches to get back to a common register
set. When not, it spills the additional elements from the larger branch (which may require an additional pass in my compiler), and reshuffles the remaining ones. This is not done because of lack of registers, but because of the nature of the stacks and
the limited scope of the compiler. At the end of a definition, or before an unknown immediate word is executed, a full canonical fallback is done.

A curious mind one out there wrote a nice blog post analysing the output of the Mecrisp-Across compiler:

https://joldosh.blogspot.com/2019/12/optimized-msp430-assembly-in-mecrisp.html

Certainly there is no caching, as I am targeting tiny classic MSP430 chips like the MSP430F2012 with Mecrisp-Across, therefore register spills are the same cost under all circumstances.

Caching effects apply when running Mecrisp-Stellaris or Mecrisp-Quintus hosted on ARM/RISC-V/MIPS Linux (and thanks to Robert Clausecker FreeBSD), but the family of Mecrisp Forths is designed for deep embedded and the hosted ports are not the main focus
of my efforts.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to minf...@arcor.de on Wed Jan 18 03:46:59 2023

minf...@arcor.de schrieb am Mittwoch, 18. Januar 2023 um 12:17:46 UTC+1:

Matthias Koch schrieb am Mittwoch, 18. Januar 2023 um 11:16:21 UTC+1:

Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and

subsequent reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist
on the stack at all for one of the two branches.

I am certainly much less experienced than you are in this field. I just happen to know
that Chaitin's algortihm includes spilling until new liveliness analysis comes up with
an acceptably colored graph. But AFAIK optimal spilling is still a thing for heuristics
or brute force iteration. Until today there is no known 'formula' for it.

Hence my question out of curiosity whether you have gained new insights. It had not
been clear whether you do liveliness analysis at all.

Add to this that spilling costs are not constant, given the caching effects in many CPUs.
I found only one lecture note that at least mentions this (but does not deal with it later on):
https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

Of course it remains also a design decision of where for a small Forth compiler for
embedded systems reasonable usability ends and where overkill starts.

P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
it also treats with data flow analysis: http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From minforth@arcor.de@21:1/5 to Matthias Koch on Wed Jan 18 03:17:44 2023

Matthias Koch schrieb am Mittwoch, 18. Januar 2023 um 11:16:21 UTC+1:

Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent

reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the
stack at all for one of the two branches.

I am certainly much less experienced than you are in this field. I just happen to know
that Chaitin's algortihm includes spilling until new liveliness analysis comes up with
an acceptably colored graph. But AFAIK optimal spilling is still a thing for heuristics
or brute force iteration. Until today there is no known 'formula' for it.

Hence my question out of curiosity whether you have gained new insights. It had not
been clear whether you do liveliness analysis at all.

Add to this that spilling costs are not constant, given the caching effects in many CPUs.
I found only one lecture note that at least mentions this (but does not deal with it later on):
https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

Of course it remains also a design decision of where for a small Forth compiler for
embedded systems reasonable usability ends and where overkill starts.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Matthias Koch@21:1/5 to All on Wed Jan 18 13:18:05 2023

P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
it also treats with data flow analysis: http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

Thanks, looks nice!
By the way, I do not even use SSA intermediate form.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	300
Nodes:	16 (2 / 14)
Uptime:	36:39:17
Calls:	6,707
Files:	12,239
Messages:	5,353,438

Single-pass or multi-pass Forth compiler

Who's Online

System Info