• Good explanation of Recursive Ascent Parsing wanted

    From Aaron Gray@21:1/5 to All on Wed Sep 28 20:26:49 2022
    I am after a good explanation of Recursive Ascent Parsing, as I wish
    to implement a parser generator that generates recursive ascent
    C/C++ code from an LR(1) grammar.

    Regards,

    Aaron
    [The Wikipedia article isn't bad and has several promising references,
    but I wonder if it's worth the effort. RA parsing does what yacc or
    bison does, but by turning each state into a function. It's supposed
    to be faster than a table driven parser, but LALR parsers are already
    so fast, how much faster will it be? Maybe it'll even be slower since
    there is a lot more code so it's less likely to fit in a CPU cache.
    -John]

  • From Kaz Kylheku@21:1/5 to Aaron Gray on Thu Sep 29 17:49:06 2022
    On 2022-09-29, Aaron Gray <aaronngray@gmail.com> wrote:
    > I am after a good explanation of Recursive Ascent Parsing, as I wish
    > to implement a parser generator that generates recursive ascent
    > C/C++ code from an LR(1) grammar.
    >
    > Regards,
    >
    > Aaron
    > [The Wikipedia article isn't bad and has several promising references,
    > but I wonder if it's worth the effort. RA parsing does what yacc or
    > bison does, but by turning each state into a function. It's supposed
    > to be faster than a table driven parser, but LALR parsers are already
    > so fast, how much faster will it be? Maybe it'll even be slower since
    > there is a lot more code so it's less likely to fit in a CPU cache.
    > -John]

    John, I suspect the idea is that this is suited for languages that
    are compiled well and have good tail call support, so that
    the recursion is stackless.

    A table-driven parser is an interpreter for a table. Tables can
    be compiled to a goto graph, and a goto graph can be represented
    by tail recursion which compiles to goto. This can be faster than the
    original table-interpreting loop.
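
    To make that concrete, here is a minimal sketch (mine, purely
    illustrative; a toy automaton for the regular language a*b stands in
    for a real LR table) of the same recognizer written both ways. With
    gcc or clang at -O2, the sibling calls in the second version compile
    to direct jumps, so the table dispatch disappears:

    #include <stdio.h>

    /* Table-driven version: interpret a transition table in a loop. */
    static int run_table(const char *s)
    {
        /* rows: states 0..1; columns: 'a' -> 0, other -> 1; -1 rejects */
        static const int next[2][2] = { {0, 1}, {-1, -1} };
        int state = 0;
        for (; *s; s++)
            if ((state = next[state][*s == 'a' ? 0 : 1]) < 0)
                return 0;
        return state == 1;                   /* state 1 accepts */
    }

    /* "Compiled" version: one function per state; the tail calls
       become gotos, so no stack is consumed however long the input. */
    static int state1(const char *s)         /* accepting state */
    {
        return *s == '\0';
    }

    static int state0(const char *s)
    {
        if (*s == 'a') return state0(s + 1); /* tail call == goto state0 */
        if (*s == 'b') return state1(s + 1); /* tail call == goto state1 */
        return 0;
    }

    int main(void)
    {
        printf("%d %d\n", run_table("aaab"), state0("aaab"));  /* 1 1 */
        printf("%d %d\n", run_table("aba"),  state0("aba"));   /* 0 0 */
        return 0;
    }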

    Those improvements will succumb to Amdahl's law, though; you would
    best be able to demonstrate the improvement in a program which does
    nothing but parse a canned token stream: no lexing is going on and
    the reductions do nothing (basically the syntax is being validated
    and that's all).

    It would be useful in applications like, say, validating syntax
    arriving as real-time input.

    I'd ping the GNU Bison mailing list to see if they are interested
    in the technique. It's doable in C; C compilers have good tail
    call support. And in any case, the technique doesn't strictly
    require functions: a giant yyparse() can be generated which just
    contains a self-contained goto graph.
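
    Sketched for the same toy recognizer as above, that no-functions form
    would look something like this (again just an illustration, not
    generated output):

    #include <stdio.h>

    /* Self-contained goto graph in one routine: the a*b automaton. */
    static int run_gotos(const char *s)
    {
    s0: if (*s == 'a') { s++; goto s0; }
        if (*s == 'b') { s++; goto s1; }
        return 0;
    s1: return *s == '\0';                   /* accept only at end */
    }

    int main(void)
    {
        printf("%d %d\n", run_gotos("aaab"), run_gotos("ba"));  /* 1 0 */
        return 0;
    }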

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    [You're right but my eyeballs hurt when I think about looking at or
    trying to debug the giant goto graph. -John]

  • From Aaron Gray@21:1/5 to Kaz Kylheku on Thu Sep 29 11:55:52 2022
    On Thursday, 29 September 2022 at 19:04:47 UTC+1, Kaz Kylheku wrote:
    > On 2022-09-29, Aaron Gray <> wrote:
    >> I am after a good explanation of Recursive Ascent Parsing, as I wish
    >> to implement a parser generator that generates recursive ascent
    >> C/C++ code from an LR(1) grammar.
    >>
    >> Regards,
    >>
    >> Aaron
    >> [The Wikipedia article isn't bad and has several promising references,
    >> but I wonder if it's worth the effort. RA parsing does what yacc or
    >> bison does, but by turning each state into a function. It's supposed
    >> to be faster than a table driven parser, but LALR parsers are already
    >> so fast, how much faster will it be? Maybe it'll even be slower since
    >> there is a lot more code so it's less likely to fit in a CPU cache.
    >> -John]

    Hi John, I am working on a modern (initially C++) source-to-source
    compiler compiler that supports all parsing algorithms. I already
    have LEX- and YACC-like tools, LG and PG, which are library based.

    I did not find the Wikipedia code very enlightening. What I did find
    was an obvious explanation: follow the item closure spaces!

    What I am actually doing is marrying recursive descent with
    backtracking to recursive ascent for the LR expression bits, rather
    than using an operator grammar for them (well, that as well, but yes).

    > John, I suspect the idea is that this is suited for languages that
    > are compiled well and have good tail call support, so that
    > the recursion is stackless.

    Kaz, oh, this sounds potentially very efficient. Thank you, I will
    have a serious play and see if I can get this to work!

    > A table-driven parser is an interpreter for a table. Tables can
    > be compiled to a goto graph, and a goto graph can be represented
    > by tail recursion which compiles to goto. This can be faster than
    > the original table-interpreting loop.
    >
    > Those improvements will succumb to Amdahl's law, though; you would
    > best be able to demonstrate the improvement in a program which does
    > nothing but parse a canned token stream: no lexing is going on and
    > the reductions do nothing (basically the syntax is being validated
    > and that's all).
    >
    > It would be useful in applications like, say, validating syntax
    > arriving as real-time input.
    >
    > I'd ping the GNU Bison mailing list to see if they are interested
    > in the technique. It's doable in C; C compilers have good tail
    > call support. And in any case, the technique doesn't strictly
    > require functions: a giant yyparse() can be generated which just
    > contains a self-contained goto graph.

    Many thanks!

    Aaron

  • From Kaz Kylheku@21:1/5 to Aaron Gray on Thu Sep 29 22:32:10 2022
    On 2022-09-29, Aaron Gray <aaronngray@gmail.com> wrote:
    > Kaz, oh, this sounds potentially very efficient. Thank you, I will
    > have a serious play and see if I can get this to work!

    If you look at how Yacc programs typically work, they use goto anyway.

    The reason I say this is that the parser actions all get compiled into
    one big yyparse() function, right into its body, and end up as the
    targets of case labels in a switch() statement, or possibly of gotos.

    It's just that the gotos are computed, with the help of the parser
    tables.

    So basically what you're doing is eliminating the computed goto; instead
    of changing some state variable and returning to the top of a loop to
    switch on it, you just jump to the destination directly.

    I think there are still situations where a computed goto is
    inevitable. The LALR parser is a push-down automaton: the LR(0) items
    form the basic state transitions for recognizing the regular language
    of the sentential forms, and this is augmented by the stack to handle
    the recursion in the grammar.

    When reductions occur and the stack is popped, it is not statically
    known which state the machine will end up in, so there cannot be
    a hard-coded goto or tail call. This is because the same phrase
    structure, e.g. "expression", can occur in multiple contexts.

    So here, the goto-based or tail-recursion-based implementation still
    requires a computed goto. Thus in C a switch statement would be
    used, or GNU C's computed gotos (labels as values); or else, if tail
    calls are used, then perhaps function pointers could be pushed onto
    the stack.

    I'm not looking at this at the required level of detail, but my
    intuition for the problem space affords me a modicum of assurance. :)
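
    To illustrate the point about reductions (a hand-built sketch, not
    generator output; the grammar and state numbering are invented for
    the example), here is the toy grammar S -> ( S ) | x as a goto graph
    with an explicit state stack. Every shift is a hard-coded goto; the
    one computed dispatch left is the switch on the uncovered state
    after a reduction:

    #include <stdio.h>

    static int parse(const char *s)
    {
        int stack[64];                /* state stack of the PDA part  */
        int sp = 0;
        int tok, n;

        stack[0] = 0;
        tok = *s; if (tok) s++;

        /* state 0 (start) */
        if (tok == '(') { stack[++sp] = 2; tok = *s; if (tok) s++; goto state2; }
        if (tok == 'x') { stack[++sp] = 3; tok = *s; if (tok) s++; goto state3; }
        return 0;
    state1:                           /* S' -> S .   : accept at end  */
        return tok == 0;
    state2:                           /* S -> ( . S )                 */
        if (tok == '(') { stack[++sp] = 2; tok = *s; if (tok) s++; goto state2; }
        if (tok == 'x') { stack[++sp] = 3; tok = *s; if (tok) s++; goto state3; }
        return 0;
    state3:                           /* S -> x .       reduce, n = 1 */
        n = 1; goto reduce_S;
    state4:                           /* S -> ( S . )                 */
        if (tok == ')') { stack[++sp] = 5; tok = *s; if (tok) s++; goto state5; }
        return 0;
    state5:                           /* S -> ( S ) .   reduce, n = 3 */
        n = 3; goto reduce_S;
    reduce_S:
        sp -= n;                      /* pop |rhs| states...          */
        switch (stack[sp]) {          /* ...then the computed part;   */
        case 0: stack[++sp] = 1; goto state1;   /* goto(0, S) = 1     */
        case 2: stack[++sp] = 4; goto state4;   /* goto(2, S) = 4     */
        default: return 0;            /* this dispatch cannot be a    */
        }                             /* hard-coded jump              */
    }

    int main(void)
    {
        printf("%d %d %d\n", parse("((x))"), parse("x"), parse("(x"));
        return 0;                     /* prints: 1 1 0 */
    }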

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Anton Ertl@21:1/5 to All on Fri Sep 30 08:05:43 2022
    > [Maybe it'll even be slower since
    > there is a lot more code so it's less likely to fit in a CPU cache.
    > -John]

    My experience is: Don't worry about I-cache misses until you have
    them. I have seen cases where a lot of code was produced, but it had
    enough temporal and spatial locality that I-cache misses were no
    problem, as well as a case where we applied a technique that reduced
    the code size (with the intent of reducing I-cache misses), but as a
    result we saw a slowdown from increased I-cache misses, because the
    resulting code had worse spatial locality.

    Looking at the numbers in 'Thomas J. Pennello (1986). "Very fast LR
    parsing"', one need not worry much about the I-cache: He describes an
    experiment with a grammar with 126 productions, resulting in 192
    states, 305 terminal transitions and 226 nonterminal transitions in
    the LALR(1) parser; the resulting parse tables for the table-driven
    parser occupy 2022 bytes, while the recursive-ascent parser takes 5602
    bytes (including 397 bytes in common with the table-driven parser).
    I-cache sizes these days are typically 32KB, so this parser would be
    completely within the I-cache (even if we assume a factor of two from
    going from 80286 code to AMD64 code), so one does not even need to
    think about locality. Pennello also reports that a 158-production
    grammar for standard Pascal with a few minor extensions yielded a
    5326-line assembly program (which MASM 3.0 failed to assemble); this
    should also easily fit into 32KB.

    Concerning recursive ascent, my dim memory of the article I read about
    it a long time ago says that on reduction a routine often does not
    return to its caller, but to some ancestor. Since the
    introduction of the return stack predictor in microarchitectures in
    the mid-1990s, implementing that directly causes at least one
    misprediction (~20 cycles), and possibly there are follow-up
    mispredictions from the predictor stack going out of sync with the
    real returns. Looking at the Wikipedia page, it says that on
    reduction a counter is returned, so you take the returns one at a time
    and count down until you arrive at the desired level. This costs some
    cycles, too.
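
    A sketch of that counting scheme (hand-written for the toy grammar
    S -> ( S ) | x; the state split is mine, purely illustrative): a
    reduction by a rule with n right-hand-side symbols returns n - 1,
    each state function on the way down decrements the count, and the
    state that sees 0 performs its goto on S by calling the next state
    function:

    #include <stdio.h>
    #include <stdlib.h>

    static const char *inp;
    static int tok;
    static int accepted;

    static void advance(void) { tok = *inp; if (tok) inp++; }
    static void die(void)     { puts("syntax error"); exit(1); }

    static int state2(void);
    static int state3(void);
    static int state4(void);

    static int state1(void)    /* S' -> S .  : accept at end of input */
    {
        if (tok != 0) die();
        accepted = 1;
        return 1;              /* unwind out of the caller's loop     */
    }

    static int state3(void) { return 0; }  /* S -> x .     length 1  */
    static int state5(void) { return 2; }  /* S -> ( S ) . length 3  */

    static int state4(void)    /* S -> ( S . ) */
    {
        if (tok != ')') die();
        advance();
        return state5() - 1;   /* no goto on S out of this state      */
    }

    static int shift(void)     /* shared shift actions of states 0, 2 */
    {
        if (tok == '(') { advance(); return state2(); }
        if (tok == 'x') { advance(); return state3(); }
        die();
        return 0;
    }

    static int state2(void)    /* S -> ( . S ) */
    {
        int r = shift();
        while (r == 0)         /* a reduction landed here: goto(2,S)=4 */
            r = state4();
        return r - 1;
    }

    static int state0(void)    /* start state */
    {
        int r = shift();
        while (r == 0)         /* goto(0, S) = 1                      */
            r = state1();
        return r - 1;
    }

    int main(void)
    {
        inp = "((x))";
        advance();
        state0();
        puts(accepted ? "accepted" : "rejected");   /* accepted */
        return 0;
    }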

    Overall, the performance of recursive ascent relative to table-driven
    parsing is less advantageous than it may have been in the 1980s
    (before branch prediction and I-caches). Pennello reports a speedup
    by a factor of 5.5 for the recursive-ascent parser (which uses
    assembly language) over "an interpretive LR parser written in Pascal
    and compiled by a good optimizing Pascal compiler" on an 80286. It
    would be interesting to see a performance comparison between today's
    Bison skeletons compiled with a present-day optimizing C compiler and
    a recursive-ascent parser on present-day hardware.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From Johann 'Myrkraverk' Oskarsson@21:1/5 to Aaron Gray on Fri Sep 30 10:45:05 2022
    On 9/29/2022 3:26 AM, Aaron Gray wrote:
    > I am after a good explanation of Recursive Ascent Parsing, as I
    > wish to implement a parser generator that generates recursive
    > ascent C/C++ code from an LR(1) grammar.

    The best explanation I've read for all of this is Holub's Compiler
    Design in C, at

    https://holub.com/compiler/

    though I don't recall if he covers RA specifically. In any case, since
    other posters talked about Lex, Yacc, etc., I want to point out this
    book because 1) it's the one I personally learned best from, and 2) it
    isn't recommended much [anymore]. It's where I learned more about
    implementing parser generators than any other more modern resource.

    If he does talk about RA at all, it's probably as an exercise or in an
    off-hand manner, for which the book may not be worth it.

    Good luck,
    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk
    [I have the book, and he didn't. Keep in mind that the book had a
    stupendous number of errors, so be sure to read the 52 pages of
    errata at the back. -John]

  • From antispam@math.uni.wroc.pl@21:1/5 to Anton Ertl on Thu Oct 6 15:30:53 2022
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    >> [Maybe it'll even be slower since
    >> there is a lot more code so it's less likely to fit in a CPU cache.
    >> -John]

    > My experience is: Don't worry about I-cache misses until you have
    > them. I have seen cases where a lot of code was produced, but it had
    > enough temporal and spatial locality that I-cache misses were no
    > problem, as well as a case where we applied a technique that reduced
    > the code size (with the intent of reducing I-cache misses), but as a
    > result we saw a slowdown from increased I-cache misses, because the
    > resulting code had worse spatial locality.
    >
    > Looking at the numbers in 'Thomas J. Pennello (1986). "Very fast LR
    > parsing"', one need not worry much about the I-cache: He describes an
    > experiment with a grammar with 126 productions, resulting in 192
    > states, 305 terminal transitions and 226 nonterminal transitions in
    > the LALR(1) parser; the resulting parse tables for the table-driven
    > parser occupy 2022 bytes, while the recursive-ascent parser takes 5602
    > bytes (including 397 bytes in common with the table-driven parser).
    > I-cache sizes these days are typically 32KB, so this parser would be
    > completely within the I-cache (even if we assume a factor of two from
    > going from 80286 code to AMD64 code), so one does not even need to
    > think about locality. Pennello also reports that a 158-production
    > grammar for standard Pascal with a few minor extensions yielded a
    > 5326-line assembly program (which MASM 3.0 failed to assemble); this
    > should also easily fit into 32KB.

    The Karlsruhe team (Grosch and collaborators) had somewhat different
    conclusions. When they added the extra features needed for a
    production compiler, they noted slowdowns on bigger grammars when
    translating to code. OTOH, they reported quite good results from
    table-driven parsers. The reports were available online, but I do not
    have the links handy (it is quite possible that the old links are
    broken now).

    Of course, modern machines tend to have larger caches than the
    old ones. But modern machines are also throughput oriented
    and may suffer more from latency.

    --
    Waldek Hebisch

  • From Kaz Kylheku@21:1/5 to antispam@math.uni.wroc.pl on Fri Oct 7 18:57:01 2022
    On 2022-10-06, antispam@math.uni.wroc.pl <antispam@math.uni.wroc.pl> wrote:
    > Of course, modern machines tend to have larger caches than the
    > old ones. But modern machines are also throughput oriented
    > and may suffer more from latency.

    But that's the latency of a single instruction you're talking about,
    right? E.g., a deeper pipeline can worsen the time from instruction
    fetch to completion, if the clock isn't jacked up and propagation
    delays reduced to compensate.

    When do you care about single-instruction latency, other than when
    concerned about pipeline stalls in some scenarios?

    Throughput translates to low latency at the level of blocks of
    instructions or procedures: a procedure call returns its result
    faster due to throughput, which is lower latency.

    The latency you perceive on modern machines as an interactive user is
    due to cra^H^H^Hrichly functional software stacks.
