I can't think of any other branch of computer science where there is such a rich foundation upon which to develop software.
Hello Compiler Experts!
I am learning the algorithms and theory underlying parsing. Wow! I am discovering that parsing has such a rich theoretical foundation: grammars, LL,
LR, first() function, following() function, parsing tables, etc.
I can't think of any other branch of computer science where there is such a rich foundation upon which to develop software.
I am learning the algorithms and theory underlying parsing. Wow! I am >discovering that parsing has such a rich theoretical foundation: grammars, LL, >LR, first() function, following() function, parsing tables, etc....
In all the software that I have written, I am lucky if I can use one or two >algorithms that have a theoretical foundation. Mostly, the software is one-off >code.
Are compiler developers light-years ahead of other software development?
Yes, LL and LR parser generators have been around for decades. My
compiler class in the late 1970s had me implement both of them.
However, shortly after that parser technology was considered basically
a settled problem, while in actuality many parser generators are hard
to use and people don't understand it when they output state machines
and conflict messages freak compiler writers out, etc.
As a result, despite all the technology we could be bringing to bare
on compilers, as far as I can tell, more than half the compilers in
actual use are done via hand-written recursive descent with little
hacks here and there to get them to work.
This points to one reason why some people don't use these generators.
You need to know quite a bit about the implementation technique behind
it to understand what (for LR-based parser generators) a shift-reduce
or reduce-reduce conflict means and how to fix that problem.
Are compiler developers light-years ahead of other software development?
In other words, a major compiler for probably
the programming language with the most
complicated syntax ever, eschews pretty much
all that we have learned and accumulated about
parsing between around 1968 and now.
None of the buzzwords you mentioned related to parsing are needed in
building a compiler: grammars, LL, LR, first() function, following() >function, parsing tables, etc.
All that goes away if you write recursive descent parser. The grammar is
then in the documentation and code comments only, and somewhat expressed
in its code structure. There is no LR, first or follow, no tables.
The GNU C++ compiler, undeniably a production compiler and a >major/known/widely-used one, has a big recursive-descent parser
maintained by hand: over a megabyte of code.
In other words, a major compiler for probably the programming language
with the most complicated syntax ever, eschews pretty much all that we
have learned and accumulated about parsing between around 1968 and now.
On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:
You need to know quite a bit about the implementation technique behind
it to understand what (for LR-based parser generators) a shift-reduce
or reduce-reduce conflict means and how to fix that problem.
Interesting reason, as I know exactly how I learned about that.
(And not from compiler class, where I might have.)
In compiling a program, not actually changing it, the instructions warn
that it will get a shift-reduce conflict warning, and to ignore it.
There is a whole separate thread on ambiguous languages, which is
where these come from. In a language with an if-then-optional else
construct, and which can be nested, there is an ambiguity in which
if an else belongs to. In most such languages it goes to the nearest >(deepest nesting) if, and this is the default from the warning.
reduce-reduce results from an actual mistake in the language,
and does need to be fixed.
Kaz Kylheku wrote this about the C++ compiler:
In other words, a major compiler for probably
the programming language with the most
complicated syntax ever, eschews pretty much
all that we have learned and accumulated about
parsing between around 1968 and now.
Yikes!
They ignored the rich theory and vast set of algorithms, in favor of
their own proprietary code? Why would the C++ compiler developers do
such a thing?
/Roger
[My guess is that they were too busy chopping down trees to sharpen their axes. -John]
Kaz Kylheku <480-992-1380@kylheku.com> writes:
None of the buzzwords you mentioned related to parsing are needed in >>building a compiler: grammars, LL, LR, first() function, following() >>function, parsing tables, etc.
All that goes away if you write recursive descent parser. The grammar is >>then in the documentation and code comments only, and somewhat expressed
in its code structure. There is no LR, first or follow, no tables.
As our moderator notes, recursive descent is an implementation
technique for LL(1) grammars. Moreover, you need first-sets to
implement them. If your grammar is
A -> B|C
your recursive-descent parser becomes:
if next_symbol in first(B) then parse_B else parse_C fi
If you want to learn about conflicts in your grammar early on (rather
than finding it out later by having some input be parsed other than intended), you also need follow-sets: if your grammar is
A -> B|epsilon
there is a conflict if "intersection(first(b),follow(A))" (where "U" signifies the union) is not empty: if one of the symbols in the
intersection comes up at the start of an A, do you parse B or not?
Of course, if you are freshly designing a programming language, you
may be fine with declaring the actual behaviour of the
redursive-descent parser to be the intended behaviour. But if you
have to implement a given grammar, things become more complex.
They ignored the rich theory and vast set of algorithms, in favor of their own
proprietary code? Why would the C++ compiler developers do such a thing?
/Roger
[My guess is that they were too busy chopping down trees to sharpen their axes. -John]
In my experience bison/yacc parsers are really good for knowing the exact language that you are parsing.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 286 |
Nodes: | 16 (2 / 14) |
Uptime: | 84:05:46 |
Calls: | 6,495 |
Calls today: | 6 |
Files: | 12,097 |
Messages: | 5,276,894 |