• Good explanation of Recursive Ascent Parsing wanted

    From Aaron Gray@21:1/5 to All on Wed Sep 28 20:26:49 2022
    I am after a good explanation of Recursive Ascent Parsing, as I wish
    to implement a parser generator that generates recursive ascent
    C/C++ code from an LR(1) grammar.

    Regards,

    Aaron
    [The Wikipedia article isn't bad and has several promising references,
    but I wonder if it's worth the effort. RA parsing does what yacc or
    bison does, but by turning each state into a function. It's supposed
    to be faster than a table driven parser, but LALR parsers are already
    so fast, how much faster will it be? Maybe it'll even be slower since
    there is a lot more code so it's less likely to fit in a CPU cache.
    -John]

  • From Kaz Kylheku@21:1/5 to Aaron Gray on Thu Sep 29 17:49:06 2022
    On 2022-09-29, Aaron Gray <aaronngray@gmail.com> wrote:
    > I am after a good explanation of Recursive Ascent Parsing, as I wish
    > to implement a parser generator that generates recursive ascent
    > C/C++ code from an LR(1) grammar.
    >
    > Regards,
    >
    > Aaron
    > [The Wikipedia article isn't bad and has several promising references,
    > but I wonder if it's worth the effort. RA parsing does what yacc or
    > bison does, but by turning each state into a function. It's supposed
    > to be faster than a table driven parser, but LALR parsers are already
    > so fast, how much faster will it be? Maybe it'll even be slower since
    > there is a lot more code so it's less likely to fit in a CPU cache.
    > -John]

    John, I suspect the idea is that this is suited for languages that
    are compiled well and have good tail call support, so that
    the recursion is stackless.

    A table-driven parser is an interpreter for a table. Tables can
    be compiled to a goto graph, and a goto graph can be represented
    by tail recursion which compiles to goto. This can be faster than the
    original table-interpreting loop.
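
    To make that concrete, here is a minimal sketch (mine, purely
    illustrative; a toy automaton for the regular language a*b stands in
    for a real LR table) of the same recognizer written both ways. With
    gcc or clang at -O2, the sibling calls in the second version compile
    to direct jumps, so the table dispatch disappears:

    #include <stdio.h>

    /* Table-driven version: interpret a transition table in a loop. */
    static int run_table(const char *s)
    {
        /* rows: states 0..1; columns: 'a' -> 0, other -> 1; -1 rejects */
        static const int next[2][2] = { {0, 1}, {-1, -1} };
        int state = 0;
        for (; *s; s++)
            if ((state = next[state][*s == 'a' ? 0 : 1]) < 0)
                return 0;
        return state == 1;                   /* state 1 accepts */
    }

    /* "Compiled" version: one function per state; the tail calls
       become gotos, so no stack is consumed however long the input. */
    static int state1(const char *s)         /* accepting state */
    {
        return *s == '\0';
    }

    static int state0(const char *s)
    {
        if (*s == 'a') return state0(s + 1); /* tail call == goto state0 */
        if (*s == 'b') return state1(s + 1); /* tail call == goto state1 */
        return 0;
    }

    int main(void)
    {
        printf("%d %d\n", run_table("aaab"), state0("aaab"));  /* 1 1 */
        printf("%d %d\n", run_table("aba"),  state0("aba"));   /* 0 0 */
        return 0;
    }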

    Those improvements will succumb to Amdahl's law, though; you would
    best be able to demonstrate the improvement in a program which does
    nothing but parse a canned token stream: no lexing is going on and
    the reductions do nothing (basically the syntax is being validated
    and that's all).

    It would be useful in applications like, say, validating syntax
    arriving as real-time input.

    I'd ping the GNU Bison mailing list to see if they are interested
    in the technique. It's doable in C; C compilers have good tail
    call support. And in any case, the technique doesn't strictly
    require functions: a giant yyparse() can be generated which just
    contains a self-contained goto graph.
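
    Sketched for the same toy recognizer as above, that no-functions form
    would look something like this (again just an illustration, not
    generated output):

    #include <stdio.h>

    /* Self-contained goto graph in one routine: the a*b automaton. */
    static int run_gotos(const char *s)
    {
    s0: if (*s == 'a') { s++; goto s0; }
        if (*s == 'b') { s++; goto s1; }
        return 0;
    s1: return *s == '\0';                   /* accept only at end */
    }

    int main(void)
    {
        printf("%d %d\n", run_gotos("aaab"), run_gotos("ba"));  /* 1 0 */
        return 0;
    }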

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    [You're right but my eyeballs hurt when I think about looking at or
    trying to debug the giant goto graph. -John]

  • From Aaron Gray@21:1/5 to Kaz Kylheku on Thu Sep 29 11:55:52 2022
    On Thursday, 29 September 2022 at 19:04:47 UTC+1, Kaz Kylheku wrote:
    > On 2022-09-29, Aaron Gray <> wrote:
    >> I am after a good explanation of Recursive Ascent Parsing, as I wish
    >> to implement a parser generator that generates recursive ascent
    >> C/C++ code from an LR(1) grammar.
    >>
    >> Regards,
    >>
    >> Aaron
    >> [The Wikipedia article isn't bad and has several promising references,
    >> but I wonder if it's worth the effort. RA parsing does what yacc or
    >> bison does, but by turning each state into a function. It's supposed
    >> to be faster than a table driven parser, but LALR parsers are already
    >> so fast, how much faster will it be? Maybe it'll even be slower since
    >> there is a lot more code so it's less likely to fit in a CPU cache.
    >> -John]

    Hi John, I am working on a modern (initially C++) source-to-source
    compiler compiler that supports all parsing algorithms. I already
    have LEX- and YACC-like tools, LG and PG, which are library based.

    I did not find the Wikipedia code very enlightening. What I did find
    was an obvious explanation: follow the item closure spaces!

    What I am actually doing is marrying recursive descent with
    backtracking to recursive ascent for the LR expression bits, rather
    than using an operator grammar for them (well, that as well, but yes).

    > John, I suspect the idea is that this is suited for languages that
    > are compiled well and have good tail call support, so that
    > the recursion is stackless.

    Kaz, oh, this sounds potentially very efficient. Thank you, I will
    have a serious play and see if I can get this to work!

    > A table-driven parser is an interpreter for a table. Tables can
    > be compiled to a goto graph, and a goto graph can be represented
    > by tail recursion which compiles to goto. This can be faster than
    > the original table-interpreting loop.
    >
    > Those improvements will succumb to Amdahl's law, though; you would
    > best be able to demonstrate the improvement in a program which does
    > nothing but parse a canned token stream: no lexing is going on and
    > the reductions do nothing (basically the syntax is being validated
    > and that's all).
    >
    > It would be useful in applications like, say, validating syntax
    > arriving as real-time input.
    >
    > I'd ping the GNU Bison mailing list to see if they are interested
    > in the technique. It's doable in C; C compilers have good tail
    > call support. And in any case, the technique doesn't strictly
    > require functions: a giant yyparse() can be generated which just
    > contains a self-contained goto graph.

    Many thanks!

    Aaron

  • From Kaz Kylheku@21:1/5 to Aaron Gray on Thu Sep 29 22:32:10 2022
    On 2022-09-29, Aaron Gray <aaronngray@gmail.com> wrote:
    > Kaz, oh, this sounds potentially very efficient. Thank you, I will
    > have a serious play and see if I can get this to work!

    If you look at how Yacc programs typically work, they use goto anyway.

    The reason I say this is that the parser actions all get compiled into
    one big yyparse() function, right into its body, and end up as the
    targets of case labels in a switch() statement, or possibly of gotos.

    It's just that the gotos are computed, with the help of the parser
    tables.

    So basically what you're doing is eliminating the computed goto; instead
    of changing some state variable and returning to the top of a loop to
    switch on it, you just jump to the destination directly.

    I think there are still situations where a computed goto is
    inevitable. The LALR parser is a push-down automaton: the LR(0) items
    form the basic state transitions for recognizing the regular language
    of the sentential forms, and this is augmented by the stack to handle
    the recursion in the grammar.

    When reductions occur and the stack is popped, it is not statically
    known which state the machine will end up in, so there cannot be
    a hard-coded goto or tail call. This is because the same phrase
    structure, e.g. "expression", can occur in multiple contexts.

    So here, the goto-based or tail-recursion-based implementation still
    requires a computed goto. Thus in C a switch statement would be
    used, or GNU C's computed gotos (labels as values); or else, if tail
    calls are used, then perhaps function pointers could be pushed onto
    the stack.

    I'm not looking at this at the required level of detail, but my
    intuition for the problem space affords me a modicum of assurance. :)
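
    To illustrate the point about reductions (a hand-built sketch, not
    generator output; the grammar and state numbering are invented for
    the example), here is the toy grammar S -> ( S ) | x as a goto graph
    with an explicit state stack. Every shift is a hard-coded goto; the
    one computed dispatch left is the switch on the uncovered state
    after a reduction:

    #include <stdio.h>

    static int parse(const char *s)
    {
        int stack[64];                /* state stack of the PDA part  */
        int sp = 0;
        int tok, n;

        stack[0] = 0;
        tok = *s; if (tok) s++;

        /* state 0 (start) */
        if (tok == '(') { stack[++sp] = 2; tok = *s; if (tok) s++; goto state2; }
        if (tok == 'x') { stack[++sp] = 3; tok = *s; if (tok) s++; goto state3; }
        return 0;
    state1:                           /* S' -> S .   : accept at end  */
        return tok == 0;
    state2:                           /* S -> ( . S )                 */
        if (tok == '(') { stack[++sp] = 2; tok = *s; if (tok) s++; goto state2; }
        if (tok == 'x') { stack[++sp] = 3; tok = *s; if (tok) s++; goto state3; }
        return 0;
    state3:                           /* S -> x .       reduce, n = 1 */
        n = 1; goto reduce_S;
    state4:                           /* S -> ( S . )                 */
        if (tok == ')') { stack[++sp] = 5; tok = *s; if (tok) s++; goto state5; }
        return 0;
    state5:                           /* S -> ( S ) .   reduce, n = 3 */
        n = 3; goto reduce_S;
    reduce_S:
        sp -= n;                      /* pop |rhs| states...          */
        switch (stack[sp]) {          /* ...then the computed part;   */
        case 0: stack[++sp] = 1; goto state1;   /* goto(0, S) = 1     */
        case 2: stack[++sp] = 4; goto state4;   /* goto(2, S) = 4     */
        default: return 0;            /* this dispatch cannot be a    */
        }                             /* hard-coded jump              */
    }

    int main(void)
    {
        printf("%d %d %d\n", parse("((x))"), parse("x"), parse("(x"));
        return 0;                     /* prints: 1 1 0 */
    }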

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Anton Ertl@21:1/5 to All on Fri Sep 30 08:05:43 2022
    > [Maybe it'll even be slower since
    > there is a lot more code so it's less likely to fit in a CPU cache.
    > -John]

    My experience is: Don't worry about I-cache misses until you have
    them. I have seen cases where a lot of code was produced, but it had
    enough temporal and spatial locality that I-cache misses were no
    problem, as well as a case where we applied a technique that reduced
    the code size (with the intent of reducing I-cache misses), but as a
    result we saw a slowdown from increased I-cache misses, because the
    resulting code had worse spatial locality.

    Looking at the numbers in 'Thomas J. Pennello (1986). "Very fast LR
    parsing"', one need not worry much about the I-cache: He describes an
    experiment with a grammar with 126 productions, resulting in 192
    states, 305 terminal transitions and 226 nonterminal transitions in
    the LALR(1) parser; the resulting parse tables for the table-driven
    parser occupy 2022 bytes, while the recursive-ascent parser takes 5602
    bytes (including 397 bytes in common with the table-driven parser).
    I-cache sizes these days are typically 32KB, so this parser would be
    completely within the I-cache (even if we assume a factor of two from
    going from 80286 code to AMD64 code), so one does not even need to
    think about locality. Pennello also reports that a 158-production
    grammar for standard Pascal with a few minor extensions yielded a
    5326-line assembly program (which MASM 3.0 failed to assemble); this
    should also easily fit into 32KB.

    Concerning recursive ascent, my dim memory of the article I read about
    it a long time ago says that on reduction a routine often does not
    return to its caller, but to some ancestor. Since the
    introduction of the return stack predictor in microarchitectures in
    the mid-1990s, implementing that directly causes at least one
    misprediction (~20 cycles), and possibly there are follow-up
    mispredictions from the predictor stack going out of sync with the
    real returns. Looking at the Wikipedia page, it says that on
    reduction a counter is returned, so you take the returns one at a time
    and count down until you arrive at the desired level. This costs some
    cycles, too.
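
    A sketch of that counting scheme (hand-written for the toy grammar
    S -> ( S ) | x; the state split is mine, purely illustrative): a
    reduction by a rule with n right-hand-side symbols returns n - 1,
    each state function on the way down decrements the count, and the
    state that sees 0 performs its goto on S by calling the next state
    function:

    #include <stdio.h>
    #include <stdlib.h>

    static const char *inp;
    static int tok;
    static int accepted;

    static void advance(void) { tok = *inp; if (tok) inp++; }
    static void die(void)     { puts("syntax error"); exit(1); }

    static int state2(void);
    static int state3(void);
    static int state4(void);

    static int state1(void)    /* S' -> S .  : accept at end of input */
    {
        if (tok != 0) die();
        accepted = 1;
        return 1;              /* unwind out of the caller's loop     */
    }

    static int state3(void) { return 0; }  /* S -> x .     length 1  */
    static int state5(void) { return 2; }  /* S -> ( S ) . length 3  */

    static int state4(void)    /* S -> ( S . ) */
    {
        if (tok != ')') die();
        advance();
        return state5() - 1;   /* no goto on S out of this state      */
    }

    static int shift(void)     /* shared shift actions of states 0, 2 */
    {
        if (tok == '(') { advance(); return state2(); }
        if (tok == 'x') { advance(); return state3(); }
        die();
        return 0;
    }

    static int state2(void)    /* S -> ( . S ) */
    {
        int r = shift();
        while (r == 0)         /* a reduction landed here: goto(2,S)=4 */
            r = state4();
        return r - 1;
    }

    static int state0(void)    /* start state */
    {
        int r = shift();
        while (r == 0)         /* goto(0, S) = 1                      */
            r = state1();
        return r - 1;
    }

    int main(void)
    {
        inp = "((x))";
        advance();
        state0();
        puts(accepted ? "accepted" : "rejected");   /* accepted */
        return 0;
    }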

    Overall, the performance of recursive ascent relative to table-driven
    parsing is less advantageous than it may have been in the 1980s
    (before branch prediction and I-caches). Pennello reports a speedup
    by a factor of 5.5 for the recursive-ascent parser (which uses
    assembly language) over "an interpretive LR parser written in Pascal
    and compiled by a good optimizing Pascal compiler" on an 80286. It
    would be interesting to see a performance comparison between today's
    Bison skeletons compiled with a present-day optimizing C compiler and
    a recursive-ascent parser on present-day hardware.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From Johann 'Myrkraverk' Oskarsson@21:1/5 to Aaron Gray on Fri Sep 30 10:45:05 2022
    On 9/29/2022 3:26 AM, Aaron Gray wrote:
    > I am after a good explanation of Recursive Ascent Parsing, as I
    > wish to implement a parser generator that generates recursive
    > ascent C/C++ code from an LR(1) grammar.

    The best explanation I've read for all of this is Holub's Compiler
    Design in C, at

    https://holub.com/compiler/

    though I don't recall if he covers RA specifically. In any case, since
    other posters talked about Lex, Yacc, etc., I want to point out this
    book because 1) it's the one I personally learned best from, and 2) it
    isn't recommended much [anymore]. It's where I learned more about
    implementing parser generators than any other more modern resource.

    If he does talk about RA at all, it's probably as an exercise or in an
    off-hand manner, for which the book may not be worth it.

    Good luck,
    --
    Johann | email: invalid -> com | www.myrkraverk.com/blog/
    I'm not from the Internet, I just work there. | twitter: @myrkraverk
    [I have the book, and he didn't. Keep in mind that the book had a
    stupendous number of errors, so be sure to read the 52 pages of
    errata at the back. -John]

  • From antispam@math.uni.wroc.pl@21:1/5 to Anton Ertl on Thu Oct 6 15:30:53 2022
    Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    >> [Maybe it'll even be slower since
    >> there is a lot more code so it's less likely to fit in a CPU cache.
    >> -John]

    > My experience is: Don't worry about I-cache misses until you have
    > them. I have seen cases where a lot of code was produced, but it had
    > enough temporal and spatial locality that I-cache misses were no
    > problem, as well as a case where we applied a technique that reduced
    > the code size (with the intent of reducing I-cache misses), but as a
    > result we saw a slowdown from increased I-cache misses, because the
    > resulting code had worse spatial locality.
    >
    > Looking at the numbers in 'Thomas J. Pennello (1986). "Very fast LR
    > parsing"', one need not worry much about the I-cache: He describes an
    > experiment with a grammar with 126 productions, resulting in 192
    > states, 305 terminal transitions and 226 nonterminal transitions in
    > the LALR(1) parser; the resulting parse tables for the table-driven
    > parser occupy 2022 bytes, while the recursive-ascent parser takes 5602
    > bytes (including 397 bytes in common with the table-driven parser).
    > I-cache sizes these days are typically 32KB, so this parser would be
    > completely within the I-cache (even if we assume a factor of two from
    > going from 80286 code to AMD64 code), so one does not even need to
    > think about locality. Pennello also reports that a 158-production
    > grammar for standard Pascal with a few minor extensions yielded a
    > 5326-line assembly program (which MASM 3.0 failed to assemble); this
    > should also easily fit into 32KB.

    The Karlsruhe team (Grosch and collaborators) had somewhat different
    conclusions. When they added the extra features needed for a
    production compiler, they noted slowdowns on bigger grammars when
    translating to code. OTOH, they reported quite good results from
    table-driven parsers. The reports were available online, but I do not
    have the links handy (it is quite possible that the old links are
    broken now).

    Of course, modern machines tend to have larger caches than the
    old ones. But modern machines are also throughput oriented
    and may suffer more from latency.

    --
    Waldek Hebisch

  • From Kaz Kylheku@21:1/5 to antispam@math.uni.wroc.pl on Fri Oct 7 18:57:01 2022
    On 2022-10-06, antispam@math.uni.wroc.pl <antispam@math.uni.wroc.pl> wrote:
    > Of course, modern machines tend to have larger caches than the
    > old ones. But modern machines are also throughput oriented
    > and may suffer more from latency.

    But that's the latency of a single instruction you're talking about,
    right? E.g., a deeper pipeline can worsen the time from instruction
    fetch to completion, if the clock isn't jacked up and propagation
    delays reduced to compensate.

    When do you care about single-instruction latency, other than when
    concerned about pipeline stalls in some scenarios?

    Throughput translates to low latency at the level of blocks of
    instructions or procedures: a procedure call returns its result
    faster due to throughput, which is lower latency.

    The latency you perceive on modern machines as an interactive user is
    due to cra^H^H^Hrichly functional software stacks.
