• RE: Why no shift-shift conflicts.

    From Christopher F Clark@21:1/5 to All on Sun Jan 30 03:14:05 2022
    Our esteemed moderator was right in saying that this is part of how
    the LR family of algorithms works. In a state machine, you can build
    a state that represents two (or more) rules (productions). So,
    shift-shift conflicts don't occur in the LR machine. You only care
    about conflicts when you need to reduce. More on that a bit later in
    this reply.

    In LL(k) algorithms, you don't get to do that (past the k-th token of
    lookahead at the beginning of the rule). So your LL(k) parser
    generator tells you that you need to left-factor your two rules, and
    it usually does so by declaring your grammar "ambiguous", which it
    might not be; it is just not LL(k). It has actually detected a
    shift-shift conflict, just not labelled it as such, because the LL
    people don't think of advancing the token index through the rule as
    shifting (although in an LR parser that's what it is called, because
    of how the stack is handled in LR parsing).
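    To make the left-factoring demand concrete, here is a minimal sketch
    (all function names hypothetical) of what an LL(1) recursive-descent
    parser looks like after factoring: two alternatives that shared the
    prefix '1' become one function that consumes the prefix once, then a
    second function that decides on the next token.

```python
# Unfactored grammar (not LL(1)): stmt -> '1' 'x' | '1' 'y'
# Factored grammar:               stmt -> '1' tail ; tail -> 'x' | 'y'
# Hypothetical sketch of the factored version as recursive descent.

def parse_stmt(tokens):
    assert tokens[0] == "1"        # consume the shared prefix once
    return parse_tail(tokens[1:])

def parse_tail(tokens):
    if tokens[0] == "x":           # one token now suffices to decide
        return "X"
    if tokens[0] == "y":
        return "Y"
    raise SyntaxError("expected 'x' or 'y'")

print(parse_stmt(["1", "x"]))  # X
```

    An LR machine never needs this rewrite: its states represent both
    alternatives at once until the distinguishing token arrives.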

    Next, I want to talk about this example:

    X: '1' { a = a + 1; } '2' 'x';
    Y: '1' { a = a - 1; } '2' 'y';
    Z: '1' '2' 'z' { a = 0 };

    [The mid-rule action is a cheat, a shorthand for this:

    X: '1' x1 '2' 'x' ;
    x1: { a = a + 1; } ;

    Y: '1' y1 '2' 'y' ;
    y1: { a = a - 1; } ;

    That's why those actions create conflicts where there were none before.

    -John]

    The cheat, as our moderator calls it, is *somewhat* an artifact of
    the traditional yacc/bison implementation. You can build a state
    machine model of the rules where you haven't introduced those
    artificial non-terminals (x1 and y1) and thus don't have to reduce
    them. (We do this in "Yacc++", the one by Compiler Resources, as
    opposed to the C++ port of yacc that is often referred to by the same
    name, although not usually with the initial-cap spelling. I only
    mention the distinction lest someone use the latter and say, wait, it
    doesn't do that.) If so, you can make the conflict "go away", at
    least in terms of what the parser needs to do.

    However, that in itself is somewhat misleading, in that while you can
    make it work at the grammar level, you will have changed when the
    action code runs. You will postpone running it until after the lexer
    has processed, and returned to the parser, both the "2" token and the
    "x" or "y" token following it. If one is depending upon that action
    to influence how the lexer works (e.g. changing the lexer state or
    something similar), the action will occur too late.
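    Here is a small sketch (all names hypothetical) of that timing
    hazard: the mid-rule action flips a lexer mode, and a token lexed
    before the action runs comes out under the old mode.

```python
# Hypothetical sketch: a parser action toggles a lexer mode. If the
# action is delayed past the tokens it was meant to influence, those
# tokens were already produced under the old mode.

class ModalLexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.letters_are_keywords = False  # flipped by a parser action

    def next_token(self):
        ch = self.text[self.pos]
        self.pos += 1
        if ch.isalpha() and self.letters_are_keywords:
            return ("KEYWORD", ch)
        return ("CHAR", ch)

def parse_with_action_in_place(lexer):
    # action runs right after '1', before '2' and 'x' are lexed
    toks = [lexer.next_token()]          # '1'
    lexer.letters_are_keywords = True    # the mid-rule action
    toks.append(lexer.next_token())      # '2'
    toks.append(lexer.next_token())      # 'x' lexed in the new mode
    return toks

def parse_with_action_delayed(lexer):
    # action postponed until after '2' and 'x' were already lexed
    toks = [lexer.next_token(),          # '1'
            lexer.next_token(),          # '2'
            lexer.next_token()]          # 'x' lexed in the old mode
    lexer.letters_are_keywords = True    # the delayed action: too late
    return toks

print(parse_with_action_in_place(ModalLexer("12x"))[-1])  # ('KEYWORD', 'x')
print(parse_with_action_delayed(ModalLexer("12x"))[-1])   # ('CHAR', 'x')
```

    The classic real-world instance of this is the C "lexer hack", where
    a parser action records typedef names that the lexer must then
    classify differently.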

    Thus, although Yacc++ can move the action code later in the
    processing of the parser's state machine, it still wants to report a
    warning/error to tell the user that the action had to be delayed,
    because the two actions were in conflict when it shifted the "1"
    token. Now, we don't call this a shift-shift conflict, but rather an
    "action code conflict", because it is the two actions that are in
    conflict, not the shifting. And, worth noting, if the two actions are
    the same (textually), we don't even delay them, because both
    transitions want to do the same thing and there is no "action code
    conflict".

    Now, getting back to LL(k) parsers: they mask this problem. Before
    they start running a rule, they get to look ahead (i.e. read from the
    lexer) up to k tokens, so they will have read the "1", "2", and "x"
    or "y" tokens (k=3) before running the rule at all. Thus, they have
    the same issue in interacting with the lexer, but haven't told the
    grammar writer about it.
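    A minimal sketch (names hypothetical) of an LL(3) parser for the
    X/Y/Z grammar above makes the masking visible: k = 3 tokens are
    buffered from the lexer before any rule body, and hence any action,
    runs.

```python
# Hypothetical LL(3) sketch for: X: '1' {a=a+1} '2' 'x';
#                                Y: '1' {a=a-1} '2' 'y';
#                                Z: '1' '2' 'z' {a=0};
# The lexer has already been consulted 3 tokens deep by the time the
# "mid-rule" action executes -- the same late timing, just unreported.

def choose_rule(lookahead):
    # the third token alone distinguishes X, Y, and Z
    return {"x": "X", "y": "Y", "z": "Z"}[lookahead[2]]

def parse(tokens, a=0):
    rule = choose_rule(tokens[:3])    # 3 tokens read before any action
    if rule == "X":
        a = a + 1                     # X's action, running late
    elif rule == "Y":
        a = a - 1                     # Y's action, running late
    else:
        a = 0                         # Z's end-of-rule action
    return rule, a

print(parse(["1", "2", "x"]))  # ('X', 1)
```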

    Now, about conflicts at reduce time. At reduce time, an LR(k) parser
    gets to read up to k tokens of lookahead before deciding whether to
    reduce or not. If the generator is LALR(k), there are some additional
    restrictions on that, but they are not important to what I am going
    to say. So, in this case, an LR parser generator can look k tokens
    past the end of the rule (or past the action code, with the
    above-mentioned caveat) before declaring a conflict.

    And, particularly worth noting are the GLR variants of the LR
    algorithm. In that case, the parser can simply generate a parse
    forest, effectively postponing the decision (potentially until the
    entire file has been processed) and only then noting, if one still
    has a forest and not a tree, that the grammar is ambiguous. Now, this
    has the positive effect of eliminating spurious conflict messages,
    because the parser can handle the grammar by generating a parse
    forest if needed. But it also has the disadvantage of removing the
    check, at parser-generation time, that the grammar has been detected
    as being potentially ambiguous. Now, for some grammars, this can be
    resolved and the grammar shown to be ambiguous or not. However, in
    general, such detection is undecidable, equivalent to solving the
    halting problem. (The proof of that is beyond what I can explain
    here.)
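    To illustrate what "still has a forest and not a tree" means, here is
    a toy sketch (not a real GLR implementation, which would share
    subtrees compactly; this just enumerates) of all parses of an
    ambiguous expression grammar:

```python
# Toy sketch: enumerate every parse of: E -> E '+' E | NUM
# A GLR parser would represent these alternatives as a shared forest;
# here we simply list them to show the ambiguity surviving to the end.

def parses(tokens):
    if len(tokens) == 1:
        return [tokens[0]]            # a lone NUM is one parse
    out = []
    for i in range(1, len(tokens), 2):   # try each '+' as the top split
        if tokens[i] == "+":
            for left in parses(tokens[:i]):
                for right in parses(tokens[i + 1:]):
                    out.append((left, "+", right))
    return out

forest = parses(["1", "+", "2", "+", "3"])
print(len(forest))  # 2 -- two trees remain, so the input is ambiguous
```

    Only after the whole input is consumed does one discover that two
    trees survive, which is exactly the late discovery described above.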

    Finally, I want to mention something about syntactic (and semantic)
    predicates, as they were a part of this discussion earlier. They are
    somewhat an alternate solution to this problem. In fact, with the
    right model, they can be used to turn (even an LL) parser generator
    into a "universal Turing machine", because they allow one to write
    backtracking parsers (with unlimited backtracking); they allow one to
    effectively give oneself unlimited look-ahead before deciding which
    rule to apply, and at the same time fine-grained control over the
    order in which the rules are examined. This is perhaps Terence Parr's
    greatest invention. In fact, it is so important that a whole new
    branch of parsing technologies, "Parsing Expression Grammars" (PEGs),
    has evolved from it. Now, without the proper checks, it still has the
    problem of allowing ambiguous grammars, but with the twist that it
    guarantees an unambiguous parse tree (not a parse forest).
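    The PEG guarantee of one tree comes from ordered choice with
    backtracking: alternatives are tried in order and the first match
    wins. A minimal combinator sketch (hypothetical names, not any
    particular PEG library):

```python
# Hypothetical PEG-style combinators: each parser takes (text, pos) and
# returns (value, new_pos) on success or None on failure.

def lit(s):
    def p(text, pos):
        if text.startswith(s, pos):
            return (s, pos + len(s))
        return None
    return p

def seq(*ps):
    def p(text, pos):
        out = []
        for q in ps:
            r = q(text, pos)
            if r is None:
                return None          # backtrack: the whole sequence fails
            v, pos = r
            out.append(v)
        return (out, pos)
    return p

def choice(*ps):
    def p(text, pos):
        for q in ps:                 # ordered: first match wins
            r = q(text, pos)
            if r is not None:
                return r
        return None
    return p

# Both alternatives could match a prefix of "12x"; ordered choice
# deterministically commits to the first, so there is never a forest.
rule = choice(seq(lit("1"), lit("2"), lit("x")),
              seq(lit("1"), lit("2")))
print(rule("12x", 0))   # (['1', '2', 'x'], 3)
```

    The flip side, as noted above, is that the second alternative can be
    silently shadowed on some inputs without any ambiguity warning.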

    Now, at some point, all of these advances in parsing technology will
    be merged (it is on my to-do list, but I keep having other things to
    finish first). If done right, one potentially has a parser generator
    that will warn you if your grammar is ambiguous and yet allow you to
    process it in an unambiguous manner, or get out a parse forest of all
    the possible interpretations (with fine-grained control to get
    exactly the level of ambiguity you want handled in each manner). And,
    if really done right, it will give you something that looks like
    hand-written recursive descent code in most cases, and will allow you
    to edit that code and get the grammar tweaks back with the
    appropriate sanity checks. We have the theory to do so, but it isn't
    a one-month project, so it remains just a dream. Of course, I won't
    be surprised if there are still people hacking on hand-written
    recursive descent parsers even if the dream comes to fruition.

    --
    ******************************************************************************
    Chris Clark                email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.   Web Site: http://world.std.com/~compres
    23 Bailey Rd               voice: (508) 435-5016
    Berlin, MA 01503 USA       twitter: @intel_chris
    ------------------------------------------------------------------------------
