• Single-pass or multi-pass Forth compiler

    From minforth@arcor.de@21:1/5 to All on Mon Jan 16 03:50:26 2023
    Historically Forth features a single-pass compiler that is triggered e.g.
    when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
    - elimination of comments
    - resolving of labels for jump targets
    - intermediate language e.g. program as database of instruction lists
    - peephole optimization, superinstructions
    - tail call elimination

    Is there any Forth out there that compiles high-level definitions in this way? Anybody willing to share some other experience in this field?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jan Coombs@21:1/5 to minf...@arcor.de on Mon Jan 16 14:26:59 2023
    On Mon, 16 Jan 2023 03:50:26 -0800 (PST)
    "minf...@arcor.de" <minforth@arcor.de> wrote:

    - intermediate language e.g. program as database of instruction lists

    Maybe this one in 'Holon Forth' and 'Fifth':

    Planet Holonforth Wolf Wejgaard EuroForth 2016 http://www.euroforth.org/ef16/papers/wejgaard.pdf

    1989-09-00 Paper presented at the EuroFORML conference 1989
    Not Screens nor Files but Words Wolf Wejgaard https://holonforth.com/ef89.html

    Further papers at:
    https://www.holonforth.com/papers.html

    Ref to Fifth was in:
    A Fresh Look at the Forth Dictionary
    Wolf Wejgaard
    Presented at the EuroFORTH conference 1995
    https://www.holonforth.com/ef95.html

    Jan Coombs
    --

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minf...@arcor.de on Mon Jan 16 16:27:17 2023
    "minf...@arcor.de" <minforth@arcor.de> writes:
    Historically Forth features a single-pass compiler that is triggered e.g. >when the interpreter hits a : colon. Some control flows require code >back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for >simplification and optimization. e.g.
    - elimination of comments
    - resolving of labels for jump targets
    - intermediate language e.g. program as database of instruction lists
    - peephole optimization, superinstructions
    - tail call elimination

    Is there any Forth out there that compiles high-level definitions in this way? >Anybody willing to share some other experience in this field?

    Passes in the original sense have gone out of fashion once RAM sizes
    were big enough. Compilers often have intermediate representations
    for purposes such as analysis and optimization, and for modularization purposes.

    iForth and VFX have an intermediate representation that they use for
    inlining. AFAIK they don't use it for elimination of comments, jump
    target resolution, peephole optimization, superinstructions, or
    tail-call elimination.

    Gforth uses the threaded-code representation (and its relocatable
    variant) either directly, or as a step to a mixture of native code and threaded-code. It also buffers the threaded code for a straight-line
    piece of code before generating native code, to allow better code
    generation.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans Bezemer@21:1/5 to minf...@arcor.de on Mon Jan 16 08:43:06 2023
    On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
    OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
    - elimination of comments
    I can't think of a Forth compiler that DOESN'T. It simply ignores the stuff:

    : \ blk @ IF >in @ c/l /f 1+ c/l * >in ! EXIT THEN source >in ! drop ; immediate

    - resolving of labels for jump targets
    Again - I can't think of a Forth compiler that doesn't. References are placed on
    some stack and used as target addresses when jumping back or as source addresses to be patched later when jumping forward.

    - peephole optimization, superinstructions
    4tH has a peephole optimizer used for strength reduction, constant folding and dead code elimination. It only looks back a single instruction, but can in some circumstances work recursive.

    - tail call elimination
    4tH's optimizer does that one as well.

    Is there any Forth out there that compiles high-level definitions in this way?
    Anybody willing to share some other experience in this field?
    It works fine and transparent - unless you insist to meddle with the return stack
    in non-portable ways.

    https://sourceforge.net/p/forth-4th/wiki/Optimization/

    Hans Bezemer

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marcel Hendrix@21:1/5 to minf...@arcor.de on Mon Jan 16 09:39:28 2023
    On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
    Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
    [..]
    Is there any Forth out there that compiles high-level definitions in this way?

    Using an intermediate representation is far more powerful, as one can
    do recursive 'context-sensitive' inlining (delay compiling as long as possible).

    In iForth's iSPICE package I'll eventually use run-time code generation (e.g. to eliminate pointer-chasing). This delays instruction selection to the limit. However, it only works if the target application is sufficiently unsophisticated,
    or has predictable execution behavior (like SPICE). I'm sure that there would be many snags would I try it for Forth itself.

    -marcel

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Marcel Hendrix on Mon Jan 16 11:47:13 2023
    Marcel Hendrix schrieb am Montag, 16. Januar 2023 um 18:39:29 UTC+1:
    On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
    Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
    [..]
    Is there any Forth out there that compiles high-level definitions in this way?
    Using an intermediate representation is far more powerful, as one can
    do recursive 'context-sensitive' inlining (delay compiling as long as possible).

    In iForth's iSPICE package I'll eventually use run-time code generation (e.g. to eliminate pointer-chasing). This delays instruction selection to the limit.
    However, it only works if the target application is sufficiently unsophisticated,
    or has predictable execution behavior (like SPICE). I'm sure that there would be many snags would I try it for Forth itself.


    Interesting. I never considered lazy evaluation for Forth.

    Without climbing up to Haskell's heights, there are simple "lazy evaluators in Python
    (copied from Stackoverflow):
    ######
    In a nutshell, lazy evaluation means that the object is evaluated when it is needed,
    not when it is created.
    In Python 2, range will return a list - this means that if you give it a large number,
    it will calculate the range and return at the time of creation:
    i = range(100)
    type(i)
    <type 'list'>
    In Python 3, however you get a special range object:
    i = range(100)
    type(i)
    <class 'range'>
    Only when you consume it, will it actually be evaluated - in other words, it will only return
    the numbers in the range when you actually need them.
    ######

    As knee reflex ISTM to have many similarities with returning macros. It would be
    quite easy in Forth to return xt's of words containing EVALUATEs or [[ ]] gforthisms
    for delayed evaluation.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Jan Coombs on Mon Jan 16 11:29:33 2023
    Jan Coombs schrieb am Montag, 16. Januar 2023 um 16:05:04 UTC+1:
    On Mon, 16 Jan 2023 03:50:26 -0800 (PST)
    "minf...@arcor.de" <minf...@arcor.de> wrote:

    - intermediate language e.g. program as database of instruction lists
    Maybe this one in 'Holon Forth' and 'Fifth':

    Planet Holonforth Wolf Wejgaard EuroForth 2016 http://www.euroforth.org/ef16/papers/wejgaard.pdf

    1989-09-00 Paper presented at the EuroFORML conference 1989
    Not Screens nor Files but Words Wolf Wejgaard https://holonforth.com/ef89.html

    Further papers at:
    https://www.holonforth.com/papers.html

    Ref to Fifth was in:
    A Fresh Look at the Forth Dictionary
    Wolf Wejgaard
    Presented at the EuroFORTH conference 1995 https://www.holonforth.com/ef95.html

    I seem to remember Holon running on DOS. Last time I looked I got the impression
    that it was a nice mixture of Tcl and Forth, with most fun parts included from Tcl libraries.
    Good work, not criticizing, but running an interpreter through another interpreter
    seemed rather heavy to me.

    Nevertheless the "modules/groups/dictionary/words in a database" is an appealing
    concept.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerry Jackson@21:1/5 to minf...@arcor.de on Mon Jan 16 20:29:23 2023
    On 16/01/2023 11:50, minf...@arcor.de wrote:
    Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for simplification and optimization. e.g.
    - elimination of comments
    - resolving of labels for jump targets
    - intermediate language e.g. program as database of instruction lists
    - peephole optimization, superinstructions
    - tail call elimination

    Is there any Forth out there that compiles high-level definitions in this way?
    Anybody willing to share some other experience in this field?

    My sytem, that has never been released (and won't be), which was
    developed from about 2005, compiles Forth code to an intermediate form,
    carries out some optimisations, then generates the executable code. I
    did it this way (I think) to:
    - widen the range of optimisations beyond simple peephole optimisation
    - making it more readily portable to different processors and OS's (one
    front end, n back ends).
    - generating different types of executable code including machine code.

    and probably other reasons that I've forgotten.

    I did peephole optimisations and super-instructions but then found other
    things took priority e.g. using the system, so I've never got round to
    getting on with other aims.

    It did make things such as quotations easier as it just involves
    suspending the outer definition, compiling the quotation, then resuming
    the suspended definition. Nested as much as desired.

    It enabled me to experiment with other capabilities such as nested colon definitions, declaration of variables inside a definition, different
    entry points, not that I've used them for real.

    --
    Gerry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Lars Brinkhoff@21:1/5 to All on Tue Jan 17 06:39:48 2023
    Leeloo says: multipass.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to mhx@iae.nl on Tue Jan 17 08:36:02 2023
    In article <c211daee-ca84-4ec0-993d-c0982b238d9an@googlegroups.com>,
    Marcel Hendrix <mhx@iae.nl> wrote:
    On Monday, January 16, 2023 at 12:50:28 PM UTC+1, minf...@arcor.de wrote:
    Historically Forth features a single-pass compiler that is triggered e.g.
    when the interpreter hits a : colon. Some control flows require code
    back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for
    simplification and optimization. e.g.
    [..]
    Is there any Forth out there that compiles high-level definitions in this way?

    Using an intermediate representation is far more powerful, as one can
    do recursive 'context-sensitive' inlining (delay compiling as long as possible).

    In iForth's iSPICE package I'll eventually use run-time code generation (e.g. >to eliminate pointer-chasing). This delays instruction selection to the limit.

    What is "pointer chasing"?


    -marcel
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Gerry Jackson on Tue Jan 17 00:11:31 2023
    Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:
    My sytem, that has never been released (and won't be), which was
    developed from about 2005, compiles Forth code to an intermediate form, carries out some optimisations, then generates the executable code. I
    did it this way (I think) to:
    - widen the range of optimisations beyond simple peephole optimisation
    - making it more readily portable to different processors and OS's (one
    front end, n back ends).
    - generating different types of executable code including machine code.

    and probably other reasons that I've forgotten.

    I did peephole optimisations and super-instructions but then found other things took priority e.g. using the system, so I've never got round to getting on with other aims.

    It did make things such as quotations easier as it just involves
    suspending the outer definition, compiling the quotation, then resuming
    the suspended definition. Nested as much as desired.

    So it compiles the innermost definitions/quotations first before compiling outer definitions?

    The "classic" way is to jump over quotations, nesting included.
    From the reference test:
    T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
    T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

    Did you experiment with enclosures?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From none) (albert@21:1/5 to minf...@arcor.de on Tue Jan 17 08:50:42 2023
    In article <74bc3d99-4179-47d7-bcd2-42114b39feb9n@googlegroups.com>, minf...@arcor.de <minforth@arcor.de> wrote:
    Historically Forth features a single-pass compiler that is triggered e.g. >when the interpreter hits a : colon. Some control flows require code >back-patching to resolve jump targets. This can be awkward.

    OTOH by going over the code in several passes offers easy paths for >simplification and optimization. e.g.
    - elimination of comments
    - resolving of labels for jump targets
    - intermediate language e.g. program as database of instruction lists
    - peephole optimization, superinstructions
    - tail call elimination

    My optimiser goes through the steps (experimental)

    1. find stack effect for all primitive words (through a sophisticated
    assembler analysis) ( + stack effects properties)
    2. find stack effects for other words, ( + stack effects properties)
    3. optimise words from the bottom up,high level, a generalisation of
    constant folding
    4. inline the machine code. A number of blocks that jumps to each
    other results
    5. optimise the blocks, individually and as a whole.

    It results (in some examples) to a speed up comparably to gforth
    versus mpeforth.

    Highly experimental.


    Is there any Forth out there that compiles high-level definitions in this way? >Anybody willing to share some other experience in this field?

    Go to my site below. forthlectures.html View lecture 5.
    (This has been announced several times.)

    Tail call elimination?
    If you're programming for speed, you probably avoid tail calls.
    (Not that it is hard to do.)

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat spinning. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Anton Ertl on Tue Jan 17 00:23:41 2023
    Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:
    iForth and VFX have an intermediate representation that they use for inlining. AFAIK they don't use it for elimination of comments, jump
    target resolution, peephole optimization, superinstructions, or
    tail-call elimination.

    Gforth uses the threaded-code representation (and its relocatable
    variant) either directly, or as a step to a mixture of native code and threaded-code. It also buffers the threaded code for a straight-line
    piece of code before generating native code, to allow better code
    generation.

    The VFX docs speak of code inlining vs source inlining, the latter now
    improved through tokenizing. This looks similar to gforth's inlining method.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Anton Ertl@21:1/5 to minf...@arcor.de on Tue Jan 17 09:08:14 2023
    "minf...@arcor.de" <minforth@arcor.de> writes:
    Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:
    iForth and VFX have an intermediate representation that they use for
    inlining. AFAIK they don't use it for elimination of comments, jump
    target resolution, peephole optimization, superinstructions, or
    tail-call elimination.

    Gforth uses the threaded-code representation (and its relocatable
    variant) either directly, or as a step to a mixture of native code and
    threaded-code. It also buffers the threaded code for a straight-line
    piece of code before generating native code, to allow better code
    generation.

    The VFX docs speak of code inlining vs source inlining, the latter now >improved through tokenizing.

    The original source inliner of VFX had the same correctness problem as EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
    BASE etc. when the code is tokenized and therefore does not have this
    problem.

    This looks similar to gforth's inlining method.

    Gforth currently does not inline colon definitions. The native-code
    approach has been called "selective inlining" (of primitives) by
    Piumarta, and is probably similar to what VFX documents as "code
    inlining", but I normally don't call it inlining.

    - anton
    --
    M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
    comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
    New standard: https://forth-standard.org/
    EuroForth 2022: https://euro.theforth.net

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Anton Ertl on Tue Jan 17 02:01:41 2023
    Anton Ertl schrieb am Dienstag, 17. Januar 2023 um 10:14:42 UTC+1:
    "minf...@arcor.de" <minf...@arcor.de> writes:
    Anton Ertl schrieb am Montag, 16. Januar 2023 um 17:41:20 UTC+1:
    iForth and VFX have an intermediate representation that they use for
    inlining. AFAIK they don't use it for elimination of comments, jump
    target resolution, peephole optimization, superinstructions, or
    tail-call elimination.

    Gforth uses the threaded-code representation (and its relocatable
    variant) either directly, or as a step to a mixture of native code and
    threaded-code. It also buffers the threaded code for a straight-line
    piece of code before generating native code, to allow better code
    generation.

    The VFX docs speak of code inlining vs source inlining, the latter now >improved through tokenizing.
    The original source inliner of VFX had the same correctness problem as EVALUATE-based macros. AFAIK The "tokenizing" resolves the names,
    BASE etc. when the code is tokenized and therefore does not have this problem.

    Macro correctness is an interesting topic as it can depend on changed
    machine states. Off topic, but correct selection of what has to be included
    in saved closure enviroments falls into the same problem category.

    I wonder if the Holon approach (keeping everything in a realtime source database) did not suffer from the same dilemma.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gerry Jackson@21:1/5 to minf...@arcor.de on Tue Jan 17 10:47:26 2023
    On 17/01/2023 08:11, minf...@arcor.de wrote:
    Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:
    My sytem, that has never been released (and won't be), which was
    developed from about 2005, compiles Forth code to an intermediate form,
    carries out some optimisations, then generates the executable code. I
    did it this way (I think) to:
    - widen the range of optimisations beyond simple peephole optimisation
    - making it more readily portable to different processors and OS's (one
    front end, n back ends).
    - generating different types of executable code including machine code.

    and probably other reasons that I've forgotten.

    I did peephole optimisations and super-instructions but then found other
    things took priority e.g. using the system, so I've never got round to
    getting on with other aims.

    It did make things such as quotations easier as it just involves
    suspending the outer definition, compiling the quotation, then resuming
    the suspended definition. Nested as much as desired.

    So it compiles the innermost definitions/quotations first before compiling outer definitions?

    The "classic" way is to jump over quotations, nesting included.

    Yes a crude, but effective, solution imposed by the traditional single
    pass compilation.

    From the reference test:
    T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
    T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

    Did you experiment with enclosures?

    Not with the built in quotations. But before quotations were
    standardised I did develop an ANS Forth implementation of closures in
    2011 that I can let you have if you are interested.

    --
    Gerry

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From dxforth@21:1/5 to minf...@arcor.de on Tue Jan 17 21:23:54 2023
    On 16/01/2023 10:50 pm, minf...@arcor.de wrote:
    Historically Forth features a single-pass compiler that is triggered e.g. when the interpreter hits a : colon. Some control flows require code back-patching to resolve jump targets. This can be awkward.

    m/c assemblers went from two-pass to one-pass. Trust forth to want to go
    in the opposite direction :)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Gerry Jackson on Tue Jan 17 04:18:57 2023
    Gerry Jackson schrieb am Dienstag, 17. Januar 2023 um 11:47:27 UTC+1:
    On 17/01/2023 08:11, minf...@arcor.de wrote:
    Gerry Jackson schrieb am Montag, 16. Januar 2023 um 21:29:24 UTC+1:
    My sytem, that has never been released (and won't be), which was
    developed from about 2005, compiles Forth code to an intermediate form,
    carries out some optimisations, then generates the executable code. I
    did it this way (I think) to:
    - widen the range of optimisations beyond simple peephole optimisation
    - making it more readily portable to different processors and OS's (one
    front end, n back ends).
    - generating different types of executable code including machine code.

    and probably other reasons that I've forgotten.

    I did peephole optimisations and super-instructions but then found other >> things took priority e.g. using the system, so I've never got round to
    getting on with other aims.

    It did make things such as quotations easier as it just involves
    suspending the outer definition, compiling the quotation, then resuming
    the suspended definition. Nested as much as desired.

    So it compiles the innermost definitions/quotations first before compiling outer definitions?

    The "classic" way is to jump over quotations, nesting included.
    Yes a crude, but effective, solution imposed by the traditional single
    pass compilation.
    From the reference test:
    T{ : q1 [: 1 ;] ; q1 EXECUTE -> 1 }T
    T{ : q2 [: [: 2 ;] ;] ; q2 EXECUTE EXECUTE -> 2 }T

    Did you experiment with enclosures?
    Not with the built in quotations. But before quotations were
    standardised I did develop an ANS Forth implementation of closures in
    2011 that I can let you have if you are interested.

    Thank you for your kind offer! I just sent you a private mail.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Matthias Koch@21:1/5 to All on Tue Jan 17 16:06:05 2023
    I found multiple passes necessary in Mecrisp-Across to re-join different control flow branches that use a different local register allocation without relying on canonical fallback. If the number of stack elements held in registers at the joint differ,
    one needs to use the smaller number of stack elements, which may require additional passes to achieve.

    Using the larger number of elements causes non-existing elements to be popped from the stack, at least temporarily. Of course, in valid Forth code, these elements will be pushed back, but may cause reads/writes beyond the beginning of the stack area in
    memory. Simply adding a guard space at the beginning of the stack would be a solution to keep it single-pass, but in combination with nested interrupts, this may be riscy. I solved this with multiple passes in the compiler, and generate elegant code
    along the way.

    Note this compiler is capable of register allocation across ?dup and similiar constructs.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Matthias Koch on Wed Jan 18 01:54:03 2023
    Matthias Koch schrieb am Dienstag, 17. Januar 2023 um 16:06:07 UTC+1:
    I found multiple passes necessary in Mecrisp-Across to re-join different control flow branches that use a different local register allocation without relying on canonical fallback. If the number of stack elements held in registers at the joint differ,
    one needs to use the smaller number of stack elements, which may require additional passes to achieve.

    Using the larger number of elements causes non-existing elements to be popped from the stack, at least temporarily. Of course, in valid Forth code, these elements will be pushed back, but may cause reads/writes beyond the beginning of the stack area in
    memory. Simply adding a guard space at the beginning of the stack would be a solution to keep it single-pass, but in combination with nested interrupts, this may be riscy. I solved this with multiple passes in the compiler, and generate elegant code
    along the way.

    Note this compiler is capable of register allocation across ?dup and similiar constructs.

    Looks like some 'spill and iterate' algorithm. Did you consider spilling costs? I imagine that moves between registers and memory take MUCH more time
    than some little register reshuffling at control flow joints.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Matthias Koch@21:1/5 to All on Wed Jan 18 11:16:18 2023
    Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent
    reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the
    stack at all for one of the two branches.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Matthias Koch@21:1/5 to All on Wed Jan 18 13:10:52 2023
    These slides are a very nice intro to register allocation difficulties - but register spilling therein happens when "we have more elements than registers", whereas in Mecrisp-Across I have enough registers for most of the reasonable code. For example,
    2OVER gets 6 deep, so for an embedded MSP430 Forth, it is OK to start spilling at 10 elements deep because of lack of registers. The variable liveness analysis is a different kind of beast than a stack. I do not try to perform liveness analysis at all.

    What I do is: When stack elements are handled first time, these are popped from the stack(s) and put in registers, and the compiler tries to keep these in registers unless forced to get back to canonical state by non-optimiseable structures.

    This is done in Mecrisp-Stellaris RA (data stack only) and Mecrisp-Quintus with the loadable Acrobatics compiler extension (both data and return stack) which is written in Forth itself, my recommendation if you want a starting point. These do canonical
    fallback when encountering control structures.

    Mecrisp-Across is one step ahead and keeps the current set of registers when control flow forks execution, and tries to re-merge the two resulting register sets at the joint, if possible by reshuffling one of the branches to get back to a common register
    set. When not, it spills the additional elements from the larger branch (which may require an additional pass in my compiler), and reshuffles the remaining ones. This is not done because of lack of registers, but because of the nature of the stacks and
    the limited scope of the compiler. At the end of a definition, or before an unknown immediate word is executed, a full canonical fallback is done.

    A curious mind one out there wrote a nice blog post analysing the output of the Mecrisp-Across compiler:

    https://joldosh.blogspot.com/2019/12/optimized-msp430-assembly-in-mecrisp.html

    Certainly there is no caching, as I am targeting tiny classic MSP430 chips like the MSP430F2012 with Mecrisp-Across, therefore register spills are the same cost under all circumstances.

    Caching effects apply when running Mecrisp-Stellaris or Mecrisp-Quintus hosted on ARM/RISC-V/MIPS Linux (and thanks to Robert Clausecker FreeBSD), but the family of Mecrisp Forths is designed for deep embedded and the hosted ports are not the main focus
    of my efforts.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to minf...@arcor.de on Wed Jan 18 03:46:59 2023
    minf...@arcor.de schrieb am Mittwoch, 18. Januar 2023 um 12:17:46 UTC+1:
    Matthias Koch schrieb am Mittwoch, 18. Januar 2023 um 11:16:21 UTC+1:
    Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and
    subsequent reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist
    on the stack at all for one of the two branches.
    I am certainly much less experienced than you are in this field. I just happen to know
    that Chaitin's algortihm includes spilling until new liveliness analysis comes up with
    an acceptably colored graph. But AFAIK optimal spilling is still a thing for heuristics
    or brute force iteration. Until today there is no known 'formula' for it.

    Hence my question out of curiosity whether you have gained new insights. It had not
    been clear whether you do liveliness analysis at all.

    Add to this that spilling costs are not constant, given the caching effects in many CPUs.
    I found only one lecture note that at least mentions this (but does not deal with it later on):
    https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

    Of course it remains also a design decision of where for a small Forth compiler for
    embedded systems reasonable usability ends and where overkill starts.

    P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
    it also treats with data flow analysis: http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to Matthias Koch on Wed Jan 18 03:17:44 2023
    Matthias Koch schrieb am Mittwoch, 18. Januar 2023 um 11:16:21 UTC+1:
    Can you point me to a text describing the solution or existing implementation? I am reshuffling registers as much as possible without spilling them to the stack, but I do not know on how to handle the situation without partial spilling (and subsequent
    reshuffling of the common numer of elements in registers) when I have a larger total amount of stack elements in registers than in the other branch, as different branches may have different depth stack effects, and these elements do not exist on the
    stack at all for one of the two branches.

    I am certainly much less experienced than you are in this field. I just happen to know
    that Chaitin's algortihm includes spilling until new liveliness analysis comes up with
    an acceptably colored graph. But AFAIK optimal spilling is still a thing for heuristics
    or brute force iteration. Until today there is no known 'formula' for it.

    Hence my question out of curiosity whether you have gained new insights. It had not
    been clear whether you do liveliness analysis at all.

    Add to this that spilling costs are not constant, given the caching effects in many CPUs.
    I found only one lecture note that at least mentions this (but does not deal with it later on):
    https://web.stanford.edu/class/archive/cs/cs143/cs143.1128/lectures/17/Slides17.pdf

    Of course it remains also a design decision of where for a small Forth compiler for
    embedded systems reasonable usability ends and where overkill starts.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Matthias Koch@21:1/5 to All on Wed Jan 18 13:18:05 2023
    P.S. since you won't do graph coloring, perhaps this paper may be of more interest because
    it also treats with data flow analysis: http://www.christianwimmer.at/Publications/Wimmer10a/Wimmer10a.pdf

    Thanks, looks nice!
    By the way, I do not even use SSA intermediate form.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)