• Are compiler developers light-years ahead of other software development

    From Roger L Costello@21:1/5 to All on Sun Jan 16 14:36:30 2022
    Hello Compiler Experts!

    I am learning the algorithms and theory underlying parsing. Wow! I am discovering that parsing has such a rich theoretical foundation: grammars, LL, LR, first() function, following() function, parsing tables, etc.

    In all the software that I have written, I am lucky if I can use one or two algorithms that have a theoretical foundation. Mostly, the software is one-off code.

    The mathematician Alfred North Whitehead has this wonderful phrase [1]:

    ... the slow accumulation of clear and relevant ideas.

    The parsing community has done that - the community has slowly accumulated clear and relevant ideas.

    I can't think of any other branch of computer science where there is such a rich foundation upon which to develop software.

    Are compiler developers light-years ahead of other software development?

    /Roger

    [1] An Introduction to Mathematics by Alfred North Whitehead, p. 19.

  • From Philipp Klaus Krause@21:1/5 to Roger L Costello on Sun Jan 16 22:13:15 2022
    On 16.01.22 15:36, Roger L Costello wrote:
    I can't think of any other branch of computer science where there is such a rich foundation upon which to develop software.

    There are others. But a compiler is something that most software
    developers use, and thus are aware of.

  • From gah4@21:1/5 to Roger L Costello on Mon Jan 17 07:14:48 2022
    On Sunday, January 16, 2022 at 9:26:38 AM UTC-8, Roger L Costello wrote:
    Hello Compiler Experts!

    I am learning the algorithms and theory underlying parsing. Wow! I am discovering that parsing has such a rich theoretical foundation: grammars, LL,
    LR, first() function, following() function, parsing tables, etc.

    (snip)

    I can't think of any other branch of computer science where there is such a rich foundation upon which to develop software.

    Thinking about D. E. Knuth's "The Art of Computer Programming", there is a
    whole book (volume 3), "Sorting and Searching". A large fraction of important
    algorithms rely on the foundation of search algorithms. (And if you sort, you
    can use binary search to find things.)
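
    (For concreteness, a minimal binary-search sketch in C; the array and the
    numbers are just made-up example data.)

        #include <stdio.h>

        /* Binary search over a sorted array: O(log n) lookups once the data
           has been sorted.  Returns the index of key, or -1 if not found. */
        static int find_sorted(const int *a, int n, int key)
        {
            int lo = 0, hi = n - 1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                if (a[mid] == key)
                    return mid;
                else if (a[mid] < key)
                    lo = mid + 1;
                else
                    hi = mid - 1;
            }
            return -1;
        }

        int main(void)
        {
            int sorted[] = { 2, 3, 5, 7, 11, 13, 17 };
            printf("%d\n", find_sorted(sorted, 7, 11));   /* prints 4 */
            return 0;
        }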

    Now, some of the book is based on the now lost art of sorting data on
    magnetic tape, where one has only sequential access to most of the data.

    Others will tell you that Quicksort is all you need to know, and forget
    the rest of the book.

    In any case, I vote for sorting and searching algorithms as the rich theoretical
    foundation of much of CS. Hash tables are an important part
    of many compilers. Dynamic programming algorithms are used in
    many optimization problems, and are fundamental to many algorithms.

    The dynamic programming algorithm used by the Unix diff command,
    to find an optimal set of differences between two files, originated
    from biologists comparing protein sequences.
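
    (As a rough sketch of the dynamic-programming idea: a plain longest-common-
    subsequence computation in C. The actual algorithms inside diff are more
    refined variants, such as Hunt-McIlroy or Myers, but the build-larger-answers-
    from-smaller-ones structure is the same.)

        #include <stdio.h>
        #include <string.h>

        /* Length of the longest common subsequence of a and b, the classic
           O(m*n) dynamic program: dp[i][j] is the answer for the first i
           characters of a and the first j characters of b.  For this sketch,
           both strings are assumed to be shorter than 128 characters. */
        static int lcs_length(const char *a, const char *b)
        {
            int m = (int)strlen(a), n = (int)strlen(b);
            static int dp[128][128];

            for (int i = 1; i <= m; i++)
                for (int j = 1; j <= n; j++)
                    dp[i][j] = (a[i-1] == b[j-1])
                             ? dp[i-1][j-1] + 1
                             : (dp[i-1][j] > dp[i][j-1] ? dp[i-1][j] : dp[i][j-1]);
            return dp[m][n];
        }

        int main(void)
        {
            printf("%d\n", lcs_length("ABCBDAB", "BDCABA"));   /* prints 4 */
            return 0;
        }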

  • From Anton Ertl@21:1/5 to Roger L Costello on Wed Jan 19 21:33:57 2022
    Roger L Costello <costello@mitre.org> writes:
    I am learning the algorithms and theory underlying parsing. Wow! I am
    discovering that parsing has such a rich theoretical foundation: grammars,
    LL, LR, first() function, following() function, parsing tables, etc.

    In all the software that I have written, I am lucky if I can use one or
    two algorithms that have a theoretical foundation. Mostly, the software
    is one-off code.
    ...
    Are compiler developers light-years ahead of other software development?

    There are multiple facets to this question:

    1) Compilers are decades ahead of much other software development: If
    you look at early CS curricula (I have looked at one from IIRC 1965),
    you see compilers and a few other topics, but many of the topics that
    now fill the curricula did not even exist then. E.g., BNF was
    invented before Algol 60 was published in 1960, while the relational
    model for databases is from 1970, the concept of NP-completeness is
    from 1971, and RISC architectures (in the form of the IBM 801) were
    invented in the mid-1970s. And if you look at currently hot topics
    like deep learning, cloud computing and IoT, their practical relevance
    is much younger (although I guess you can point to 1960s work for each
    of them that points in that direction).

    2) As I have written some time ago:

    |Compilers are ideal software projects: You have relatively stable and
    |well-defined, partly even formal specifications for the input and the
    |output, and lots of interesting subproblems in between.
    |
    |In particular, the scanning part of compilers has been formalized
    |using regular languages and is expressed in regular expressions, which
    |have become not only popular for scanning programming languages, but
    |for many other tasks as well. The parsing part has been formalized
    |with context-free grammars in (extended) BNF and that has led to the
    |development of parser generators that are quite popular in language
    |implementations.

    There is also similar formalization in instruction selection (although
    my impression is that generators are not as widely used there as in
    the front end).

    However, in between, things are more like ordinary programming, even
    though people have tried to propagate the formalization success in
    these areas, too.

    And I think that points to why we don't see that much of that kind of
    formalization in most other areas: context-free grammars are something
    that most programming language designers want to use, so developing
    methods for parsing such languages and putting them in parser
    generators has a significant payoff. A formal semantics, by contrast,
    becomes so inflexible that most programming language designers prefer
    to stick to more or less informal semantics (in some cases with some
    parts formalized), and then of course you implement the semantics with
    a less formal approach, too. And for most other software, it's similar.

    3) The love of formal stuff can lead you astray. E.g., the first
    Fortran compiler did not report syntax errors.

    Of course, mistakes also happen in less formalized software; my point
    is that if you focus your attention on the nice formal stuff, you can
    easily overlook that your software has other aspects, too, that you
    need to cover.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [I am reasonably sure the original Fortran compiler reported syntax
    errors. A 1957 paper at bitsavers says it did: http://bitsavers.org/pdf/ibm/704/FORTRAN_paper_1957.pdf
    -John]

  • From Christopher F Clark@21:1/5 to All on Thu Jan 20 21:11:51 2022
    Sadly, I would like to offer an alternate perspective on how far ahead
    of other areas of software development compilers really are.

    Yes, LL and LR parser generators have been around for decades. My
    compiler class in the late 1970s had me implement both of them.
    However, shortly after that, parser technology was considered basically
    a settled problem, while in actuality many parser generators are hard
    to use, people don't understand the state machines they output, and
    conflict messages freak compiler writers out, etc.

    As a result, despite all the technology we could be bringing to bear
    on compilers, as far as I can tell, more than half the compilers in
    actual use are done via hand-written recursive descent with little
    hacks here and there to get them to work. The prevalence of "ad hack"
    compilers should be embarrassing to anyone who wants to talk about
    software "engineering".

    There is good theory we could put to use building better compilers and
    more reliable languages. However, most of it is languishing and going
    to waste.

    At least that's what this compiler developer sees when he looks at the landscape. But maybe I am biased in that perspective....

    ******************************************************************************
    Chris Clark                   email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.      Web Site: http://world.std.com/~compres
    23 Bailey Rd                  voice: (508) 435-5016
    Berlin, MA 01503 USA          twitter: @intel_chris
    ------------------------------------------------------------------------------
    [There's a definite "too busy chopping down trees to sharpen our axes" vibe
    here. -John]

  • From Anton Ertl@21:1/5 to Christopher F Clark on Fri Jan 21 17:38:02 2022
    Christopher F Clark <christopher.f.clark@compiler-resources.com> writes:
    Yes, LL and LR parser generators have been around for decades. My
    compiler class in the late 1970s had me implement both of them.
    However, shortly after that, parser technology was considered basically
    a settled problem, while in actuality many parser generators are hard
    to use, people don't understand the state machines they output, and
    conflict messages freak compiler writers out, etc.

    This points to one reason why some people don't use these generators.
    You need to know quite a bit about the implementation technique behind
    it to understand what (for LR-based parser generators) a shift-reduce
    or reduce-reduce conflict means and how to fix that problem.

    As a result, despite all the technology we could be bringing to bear
    on compilers, as far as I can tell, more than half the compilers in
    actual use are done via hand-written recursive descent with little
    hacks here and there to get them to work.

    The "little hacks" point to another reason: How to deal with cases
    where LL or LR technology, or (probably more often) context-free
    grammars don't quite fit, e.g., when you want to do a compatible
    extension of an existing language, like C++ (which was intended to be compatible with C). My impression is that Terence Parr attacked this
    problem head-on by formalizing these hacks (to some extent) with
    syntactic and semantic predicates in PCCTS/ANTLR.

    There is also the problem of interfacing between the parser and the
    semantic analysis and back end of the compiler. The yacc interface is
    not that great, but for some reason improvements such as ox have not
    taken over the world (maybe because ox is based on attribute grammars,
    which are quite a leap for imperative programmers).

    Finally, a reason is that some people don't want to depend on software
    written by others. E.g., Wirth apparently considered it simpler to
    write a recursive-descent parser by hand than to write a parser
    generator and use that for implementing a parser. And he even wrote a
    compiler text book that teaches this approach.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/

  • From gah4@21:1/5 to Anton Ertl on Fri Jan 21 17:40:06 2022
    On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:

    (snip)

    This points to one reason why some people don't use these generators.
    You need to know quite a bit about the implementation technique behind
    it to understand what (for LR-based parser generators) a shift-reduce
    or reduce-reduce conflict means and how to fix that problem.

    Interesting reason, as I know exactly how I learned about that.
    (And not from compiler class, where I might have.)

    While compiling a program (not actually changing it), the instructions
    warn that you will get a shift-reduce conflict warning and tell you to
    ignore it.

    There is a whole separate thread on ambiguous languages, which is
    where these come from. In a language with an if-then-optional-else
    construct, which can be nested, there is an ambiguity about which
    if an else belongs to. In most such languages it goes to the nearest
    (most deeply nested) if, and this is the default resolution the warning
    refers to.

    reduce-reduce results from an actual mistake in the language,
    and does need to be fixed.

    I believe the program was Kermit, but I am not so sure now about the
    details. I believe that it has an implementation language that
    translates to C, using a lex/yacc parser. The Makefile does
    it all for you, but you have to know to ignore the message.

  • From Kaz Kylheku@21:1/5 to Roger L Costello on Sat Jan 22 03:01:09 2022
    On 2022-01-16, Roger L Costello <costello@mitre.org> wrote:
    Are compiler developers light-years ahead of other software development?

    I'd say it's pretty impressive how, say, the Google Assistant can speak completely naturally (and in multiple languages). I mean, details like intonation across entire sentences and such; nothing "robot-like".

    Fantastic advances have been made in computer graphics in the last 30
    years.

    Not all compilers use fancy algorithms, or at least not everywhere. Fancy
    algorithms are specialized solutions, optimized in particular ways, to
    problems that also have pedestrian solutions. Sometimes the
    pedestrian solutions are actually faster on your input cases.

    None of the buzzwords you mentioned related to parsing are needed in
    building a compiler: grammars, LL, LR, first() function, following()
    function, parsing tables, etc.

    All that goes away if you write recursive descent parser. The grammar is
    then in the documentation and code comments only, and somewhat expressed
    in its code structure. There is no LR, first or follow, no tables.

    The GNU C++ compiler, undeniably a production compiler and a major/known/widely-used one, has a big recursive-descent parser
    maintained by hand: over a megabyte of code.

    In other words, a major compiler for probably the programming language
    with the most complicated syntax ever, eschews pretty much all that we
    have learned and accumulated about parsing between around 1968 and now.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    [In principle a r-d parser recognizes an LL(1) grammar, in practice people often cheat a little. -John]

  • From Roger L Costello@21:1/5 to Kaz Kylheku on Sat Jan 22 12:50:07 2022
    Kaz Kylheku wrote this about the C++ compiler:

    In other words, a major compiler for probably
    the programming language with the most
    complicated syntax ever, eschews pretty much
    all that we have learned and accumulated about
    parsing between around 1968 and now.

    Yikes!

    They ignored the rich theory and vast set of algorithms, in favor of their own proprietary code? Why would the C++ compiler developers do such a thing?

    /Roger
    [My guess is that they were too busy chopping down trees to sharpen their axes. -John]

  • From Anton Ertl@21:1/5 to Kaz Kylheku on Sat Jan 22 10:43:39 2022
    Kaz Kylheku <480-992-1380@kylheku.com> writes:
    None of the buzzwords you mentioned related to parsing are needed in
    building a compiler: grammars, LL, LR, first() function, following()
    function, parsing tables, etc.

    All that goes away if you write recursive descent parser. The grammar is
    then in the documentation and code comments only, and somewhat expressed
    in its code structure. There is no LR, first or follow, no tables.

    As our moderator notes, recursive descent is an implementation
    technique for LL(1) grammars. Moreover, you need first-sets to
    implement them. If your grammar is

    A -> B|C

    your recursive-descent parser becomes:

    if next_symbol in first(B) then parse_B else parse_C fi

    If you want to learn about conflicts in your grammar early on (rather
    than finding it out later by having some input be parsed other than
    intended), you also need follow-sets: if your grammar is

    A -> B|epsilon

    there is a conflict if the intersection of first(B) and follow(A) is
    not empty: if one of the symbols in the intersection comes up at the
    start of an A, do you parse B or not?
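
    (To make this concrete: a toy recursive-descent parser in C whose branches
    are exactly these first/follow checks. The grammar and all names are invented
    for the illustration, not taken from the post.)

        #include <stdio.h>

        /* A toy grammar:
         *
         *     list -> item list | epsilon
         *     item -> NUMBER | NAME
         *
         * first(item)  = { NUMBER, NAME }
         * follow(list) = { END }
         *
         * The parser below branches on exactly those sets; the grammar is
         * LL(1) only because first(item) and follow(list) do not overlap. */
        typedef enum { NUMBER, NAME, END } token;

        static const token input[] = { NUMBER, NAME, NUMBER, END };
        static int pos = 0;

        static token next_symbol(void) { return input[pos]; }

        static void parse_item(void)
        {
            /* item -> NUMBER | NAME: the first sets of the two alternatives
               are the single tokens themselves. */
            if (next_symbol() == NUMBER || next_symbol() == NAME)
                pos++;                          /* consume the token */
            else
                printf("syntax error\n");
        }

        static void parse_list(void)
        {
            /* list -> item list | epsilon: take the non-empty alternative if
               next_symbol is in first(item); otherwise take epsilon, trusting
               that next_symbol is in follow(list). */
            if (next_symbol() == NUMBER || next_symbol() == NAME) {
                parse_item();
                parse_list();
            }
            /* epsilon: consume nothing */
        }

        int main(void)
        {
            parse_list();
            printf(next_symbol() == END ? "accepted\n" : "rejected\n");
            return 0;
        }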

    Of course, if you are freshly designing a programming language, you
    may be fine with declaring the actual behaviour of the
    recursive-descent parser to be the intended behaviour. But if you
    have to implement a given grammar, things become more complex.

    You can also design the syntax to make recursive-descent parsing easy,
    like Wirth did. E.g., almost all statements start with a keyword, and
    there is AFAIK only one statement that may be started by several
    symbols (none of them conflicting with one of the other statement
    keywords), so you can easily implement statements like this:

    if next_symbol=IF_SYM then ifstatement()
    else if next_symbol=WHILE_SYM then whilestatement()
    else if ...
    ...
    else assignmentstatement()

    All statements are followed by ";" or END, making it easy to know that
    there are no conflicts with following statements.

    These properties also make it easier for humans to understand
    grammars, so they are a good idea in syntax design. If you go further
    in that direction, you get to Lisp (which does not need a BNF-based
    parser) or Forth (where the parser just looks for the next white
    space). Of course, if you look at textbooks for these languages, you
    find that you have patterns at a different ("static semantics") level,
    but at least it's always clear in these languages from the syntax which
    if an else-branch belongs to.

    The GNU C++ compiler, undeniably a production compiler and a
    major/known/widely-used one, has a big recursive-descent parser
    maintained by hand: over a megabyte of code.

    In other words, a major compiler for probably the programming language
    with the most complicated syntax ever, eschews pretty much all that we
    have learned and accumulated about parsing between around 1968 and now.

    C++ is the language that inspired Terence Parr to add syntactic and
    semantic predicates to PCCTS/ANTLR (not sure what has been added since
    I heard him talk about it in the early 1990s). So basically parser
    generators like yacc were not sufficient for the problems posed by the
    C++ syntax; when G++ was started in 1987, no free parser generator had
    such features, so the decision to go with a hand-written parser is understandable, and of course since then there were two good reasons
    to stay there: 1) Transition costs. 2) Backwards compatibility.

    Looking further back, AFAIK Cfront also used a hand-written parser;
    this (together with the requirement of C compatibility) resulted in a
    syntax design that was not constrained by the parser generators of the
    time, but of course also a syntax that was beyond their capabilities.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [Nope, cfront used a yacc parser with a whole lot of %left and %right declarations to make shift/reduce warnings go away. Legend says that
    much odd C++ syntax is from poorly understood consequences of those declarations. See it at http://www.softwarepreservation.org/projects/c_plus_plus/index.html#cfront -John]

  • From Anton Ertl@21:1/5 to gah4@u.washington.edu on Sat Jan 22 09:34:58 2022
    gah4 <gah4@u.washington.edu> writes:
    On Friday, January 21, 2022 at 2:30:42 PM UTC-8, Anton Ertl wrote:
    You need to know quite a bit about the implementation technique behind
    it to understand what (for LR-based parser generators) a shift-reduce
    or reduce-reduce conflict means and how to fix that problem.

    Interesting reason, as I know exactly how I learned about that.
    (And not from compiler class, where I might have.)

    While compiling a program (not actually changing it), the instructions
    warn that you will get a shift-reduce conflict warning and tell you to
    ignore it.

    That means that the program's authors have (hopefully) determined that
    the default resolution (shift) for the shift/reduce conflicts in that
    program was what they intended. It does not tell you that shift
    resolution is the intended behaviour for all cases (otherwise there
    would be no point in reporting the conflict).

    There is a whole separate thread on ambiguous languages, which is
    where these come from. In a language with an if-then-optional-else
    construct, which can be nested, there is an ambiguity about which
    if an else belongs to. In most such languages it goes to the nearest
    (most deeply nested) if, and this is the default resolution the warning
    refers to.

    That's not the only occurrence of shift/reduce conflicts, but it's a
    relatively prominent one, because many compiler writers leave it to
    the parser generator to resolve it. However, when developing a
    grammar, I have seen cases (even for relatively small languages) with
    many conflicts (both shift/reduce and reduce/reduce). I always change
    the grammar to eliminate all the conflicts. Often the syntax is
    changed by these changes, which is outside the scope of pure compiler
    writers, but inside the scope of language designers.

    Concerning the Algol 60 case, the syntax is unambiguous (from <https://www.masswerk.at/algol60/report.htm>):

    |<if clause> ::= if <Boolean expression> then
    |<unconditional statement> ::= <basic statement> |
    | <compound statement> | <block>
    |<if statement> ::= <if clause> <unconditional statement>
    |<conditional statement> ::= <if statement> |
    | <if statement> else <statement> |
    | <if clause> <for statement> |
    | <label>: <conditional statement>

    Statements such as

    if B1 then if B2 then S1 else S2

    and even

    if B1 then if B2 then S1

    are syntax errors. You cannot put an if-statement or a conditional
    statement in a then-branch without wrapping it in a begin...end.

    Languages like Pascal and C (and its descendants) changed this to
    allow either, making the grammar ambiguous, and using prose to resolve
    the ambiguity. E.g., for Pascal:

    |IfStatement = "if" BooleanExpression "then" Statement [ "else" Statement ].
    |
    |Note: The syntactic ambiguity arising from the construct
    |
    |if e1 then if e2 then s1 else s2
    |
    |is resolved by interpreting the construct as equivalent to
    |
    |if e1 then begin if e2 then s1 else s2 end

    I consider either grammar to be flawed; Algol fixed the flaw in Algol
    68, Wirth fixed it in Modula, but the C-syntax descendants have
    unfortunately never fixed it.
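
    (For concreteness, the dangling else in C itself; e1, e2, s1, s2 are just
    stand-ins for arbitrary conditions and statements.)

        /* The dangling else in C-syntax languages: despite the indentation,
           the else binds to the nearest if, i.e. to "if (e2)". */
        static int e1, e2;
        static void s1(void) {}
        static void s2(void) {}

        void dangling(void)
        {
            if (e1)
                if (e2)
                    s1();
            else            /* the indentation lies: this belongs to "if (e2)" */
                s2();
        }

        /* What the compiler actually parses, written out explicitly: */
        void equivalent(void)
        {
            if (e1) {
                if (e2)
                    s1();
                else
                    s2();
            }
        }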

    <https://en.wikipedia.org/wiki/Dangling_else#Avoiding_the_conflict_in_LR_parsers>
    describes how to express the intent in the grammar rather than in
    prose, but looking at the example grammars makes it clear why compiler
    writers prefer to rely on the parser-generator resolution of the
    shift/reduce conflict rather than expressing the intent in the
    grammar.

    Yacc has precedence and associativity declarations to provide
    additional information to the parser generator on how to resolve
    ambiguity in the BNF (and for suppressing the warnings about conflicts
    that are resolved by these declarations). Something like this might
    be a good solution for dealing with the dangling else if you cannot
    avoid it by fixing the syntax.

    reduce-reduce results from an actual mistake in the language,
    and does need to be fixed.

    There can be cases where the default behaviour is the intended
    behaviour, too, but IIRC the default depends on the order of rules, so
    could be destroyed by a seemingly innocuous change to the parser
    generator input. And there are (probably many) cases where either
    reduction is intended, depending on the input, so the generator default
    is wrong for some inputs no matter how the rules are ordered.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [The original sin of dangling else may have been COBOL. A 1964 IBM
    manual for 7080 COBOL says ELSE matches the closest IF. PL/I inherited
    it shortly afterwards. -John]

  • From Kaz Kylheku@21:1/5 to Roger L Costello on Sat Jan 22 21:22:31 2022
    On 2022-01-22, Roger L Costello <costello@mitre.org> wrote:
    Kaz Kylheku wrote this about the C++ compiler:

    In other words, a major compiler for probably
    the programming language with the most
    complicated syntax ever, eschews pretty much
    all that we have learned and accumulated about
    parsing between around 1968 and now.

    Yikes!

    They ignored the rich theory and vast set of algorithms, in favor of
    their own proprietary code? Why would the C++ compiler developers do
    such a thing?

    Wild guess:

    Suppose that there is a foo_statement which contains a bar_designator,
    and something is wrong in there. They can fire up gdb, put a simple
    breakpoint on parse_bar_designator, feed in the test case, and get a call stack
    in which parse_bar_designator is called by parse_foo_statement.
    They can examine all the locals, and arguments up the stack.

    Typically, nothing like this is easily possible with the
    theoretically-based parser generation tools.

    And in C++, you're likely going to be debugging parsing quite a bit.

    The main parser generation tools used by GNU projects are Flex and
    Bison. These tools are moving targets; especially Bison. The common
    practice is to ship the generated parser.

    (In a fundamental compiler project, you have to for other reasons, like
    the users not having that tool installed, because maybe they need your
    compiler to build it: you want as few dependencies as possible.)

    Now GCC is hacked on quite a bit and has lots of contributors. It would
    be annoying to tell people "Oh, just use the generated, shipped
    parsers if you're not touching the grammar; if you need to regenerate,
    please use the exact version Bison X.Y.Z.".

    If a parser is hand-written, that whole sort of problem goes away.


    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Kaz Kylheku@21:1/5 to Anton Ertl on Sat Jan 22 21:38:38 2022
    On 2022-01-22, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    Kaz Kylheku <480-992-1380@kylheku.com> writes:
    None of the buzzwords you mentioned related to parsing are needed in
    building a compiler: grammars, LL, LR, first() function, following()
    function, parsing tables, etc.

    All that goes away if you write recursive descent parser. The grammar is
    then in the documentation and code comments only, and somewhat expressed
    in its code structure. There is no LR, first or follow, no tables.

    As our moderator notes, recursive descent is an implementation
    technique for LL(1) grammars. Moreover, you need first-sets to
    implement them. If your grammar is

    A -> B|C

    your recursive-descent parser becomes:

    I touched on this point with "All that goes away if you write recursive
    descent parser. The grammar is then in the documentation and code
    comments only, and somewhat expressed in its code structure."

    So it's not that we don't have a grammar or don't know that it's LL;
    perhaps we do. But the tooling doesn't.

    if next_symbol in first(B) then parse_B else parse_C fi

    In a given parser, there might be no conscious connection to any such theoretical entities, even though the theory readily identifies those
    elements in the resulting work.

    (Kind of like a musician who plays by ear is using certain musical
    devices that he might not be able to name, but music scholars can.)

    If you want to learn about conflicts in your grammar early on (rather
    than finding it out later by having some input be parsed other than
    intended), you also need follow-sets: if your grammar is

    A -> B|epsilon

    there is a conflict if the intersection of first(B) and follow(A) is
    not empty: if one of the symbols in the intersection comes up at the
    start of an A, do you parse B or not?

    You can easily parse both ways and pick one. There are readily available backtracking techniques, even in blub languages.

    Years ago I wrote a C expression parser which did this quite well using exceptions based on setjmp and longjmp.

    (Because it parsed expression shapes outside of any declaration context,
    so it wouldn't know, say, whether (A)(B) is the expression (B) being
    cast to type (A), or function (A) being called with argument (B). The
    goal of the parser was to determine whether the expression could have
    side effects, with a tri-valued result: no, possibly, yes.)

    "Parse expression grammars" formalize this more.

    That sort of approach has to be carefully wielded, because you can
    get poor performance due to explosion of the search space, with several
    levels of speculative parsing.
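
    (A minimal sketch of the "parse both ways and pick one" idea, with a made-up
    token interface and toy alternatives; Kaz's actual parser used setjmp/longjmp
    to unwind on failure, whereas this sketch just returns false and rewinds a
    cursor.)

        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        /* Hypothetical parser state: a cursor into the input string.  The
           names and structure here are illustrative only. */
        typedef struct { const char *input; size_t pos; } parser;

        /* Toy "alternative": succeed only if the input at the cursor starts
           with the given keyword, consuming it on success. */
        static bool parse_keyword(parser *p, const char *kw)
        {
            size_t n = strlen(kw);
            if (strncmp(p->input + p->pos, kw, n) == 0) {
                p->pos += n;
                return true;
            }
            return false;
        }

        /* Speculative parsing: try one alternative; if it fails, rewind to
           the saved position and try the other.  Stands in for the
           cast-vs-call ambiguity described above. */
        static bool parse_cast_or_call(parser *p)
        {
            size_t saved = p->pos;
            if (parse_keyword(p, "cast"))       /* "parse it as a cast" */
                return true;
            p->pos = saved;                     /* backtrack            */
            return parse_keyword(p, "call");    /* "parse it as a call" */
        }

        int main(void)
        {
            parser p = { "call(arg)", 0 };
            printf("%s\n", parse_cast_or_call(&p) ? "parsed" : "failed");
            return 0;
        }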

    Of course, if you are freshly designing a programming language, you
    may be fine with declaring the actual behaviour of the
    recursive-descent parser to be the intended behaviour. But if you
    have to implement a given grammar, things become more complex.

    If you have to implement a given grammar, with any tool or approach,
    you need to have, or develop as you go along, a comprehensive regression
    test suite, hitting as many of the corner cases as possible.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

  • From Ian Lance Taylor@21:1/5 to costello@mitre.org on Sat Jan 22 15:40:02 2022
    On Sat, Jan 22, 2022 at 3:26 PM Roger L Costello <costello@mitre.org> wrote:


    They ignored the rich theory and vast set of algorithms, in favor of their own
    proprietary code? Why would the C++ compiler developers do such a thing?

    /Roger
    [My guess is that they were too busy chopping down trees to sharpen their axes. -John]


    The change in GCC away from using bison to a recursive descent parser was a considered decision on the part of the GCC C++ maintainers. I believe that
    the project started at
    https://gcc.gnu.org/legacy-ml/gcc/2000-10/msg00573.html and it was
    committed to the tree in December, 2002.

    In my experience bison/yacc parsers are really good for knowing the exact language that you are parsing. But it is difficult to make them generate
    good error messages and it is difficult to debug them. A language like C++
    can only be parsed in conjunction with a symbol table, because the parsing depends on the semantic meaning of tokens; that too is difficult to do
    using yacc. So while I agree that yacc parsers are theoretically superior,
    I believe that experience shows that they have some problems in practice.
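
    (A standard illustration of the point about the symbol table, not Ian's
    example: in C and C++ the same token sequence parses differently depending
    on whether a name is known to be a type.)

        /* Whether "T * p" is a declaration or an expression depends on what
           the symbol table says about T at that point in the parse. */
        typedef int T;

        void as_declaration(void)
        {
            T * p = 0;          /* T names a type: declares p, a pointer to int */
            (void)p;
        }

        void as_expression(void)
        {
            int T = 6, p = 7;   /* these shadow the typedef above */
            int r = T * p;      /* T is now a variable: this is a multiplication */
            (void)r;
        }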

    In fact, I would be interested in knowing whether there is any generally
    used C++ compiler that uses a yacc parser. For example, clang also uses a recursive descent parser.

    Ian

  • From Kaz Kylheku@21:1/5 to Ian Lance Taylor on Sun Jan 23 06:17:36 2022
    On 2022-01-22, Ian Lance Taylor <ianlancetaylor@gmail.com> wrote:
    In my experience bison/yacc parsers are really good for knowing the exact language that you are parsing.

    In my experience, you can easily know what language you're processing
    with Yacc if you avoid its /ad hoc/ features for suppressing conflict
    messages. Namely:

    - %left, %right and %nonassoc declarations for tokens.

    - %prec in rules.

    Otherwise, you're likely going to be relying on your regression test
    cases to inform you what language you're parsing.

    The exception is that %left, %right are readily understood if they are
    only used for the operator tokens in the productions for a binary
    operator expression grammar (that likely being their motivating use case).

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
