• Context sensitive tokens

    From Christopher F Clark@21:1/5 to All on Sun Mar 1 20:14:08 2020
    The discussion on tokens that are substrings of other tokens got me
    thinking about a feature that might help make such tokens easier to
    specify. I am now looking for a name (keyword) to use to describe
    these tokens.

    In particular, consider the case of ">>" vs. ">" in C++ templates. In
    expression contexts, you want >> to return the "right shift operator"
    token, but in template contexts you want it to return each ">" as an
    "end of template angle bracket" token. You can do this with lexer
    states. But the more such tokens you have, the more lexer states you
    get, and combinatorial explosion sets in. Not desirable, especially if
    you are creating the lexer states by hand.

    An alternate solution (that seems nice and simple to me) is to attach
    flags to the problematic tokens, the ones you want returned only in
    some states and not others. The lexer then queries the parser to
    determine which tokens are allowed and returns only one from the
    allowable set.

    So normally, in Yacc++, one would write:

    token greater_than : ">";
    token right_shift : ">>";

    But since we want the right shift token to be context sensitive, we
    would instead write:

    token greater_than : ">";
    context sensitive token right_shift : ">>";

    Now, before returning a right_shift token, the lexer queries the
    parser as to whether that token is legal in the current parser state.
    The answer would come from an array of bits, which the parser toggles
    to indicate which of these tokens are currently legal. (The parser
    knows, for each state, what tokens are expected, so the bit mask is
    not hard to generate. And the only reason to do this for just some
    tokens, rather than all of them, is to make syntax error discovery
    easier, by not turning all unexpected tokens into lexical syntax
    errors.) If the token is not legal, the lexer would return a
    different sequence of tokens (e.g. just a greater_than token, since
    that was the longest match prior to this disallowed match). The
    actual implementation is a little more subtle than that, but that
    captures the idea.
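
    To make the idea concrete, here is a minimal sketch in Python (not
    Yacc++ code, and not the actual implementation). All names in it
    (PATTERNS, CONTEXT_SENSITIVE, next_token, lex) are hypothetical, and a
    plain set of token names stands in for the array of bits. It also
    assumes the allowed set stays fixed for the whole input, whereas a real
    parser would update it as its state changes.

```python
# Hypothetical token table: lexeme -> token name.
PATTERNS = {">>": "right_shift", ">": "greater_than"}

# Tokens declared "context sensitive": only returned when the parser allows them.
CONTEXT_SENSITIVE = {"right_shift"}


def next_token(text, pos, allowed):
    """Return (token_name, lexeme) for the longest permitted match at pos.

    `allowed` is the set of context sensitive tokens the parser says are
    legal right now (a stand-in for the bit array described above).
    """
    # Try candidate lexemes from longest to shortest (longest match first).
    for lexeme in sorted(PATTERNS, key=len, reverse=True):
        if not text.startswith(lexeme, pos):
            continue
        name = PATTERNS[lexeme]
        # Skip a context sensitive token the parser does not expect,
        # falling back to the next-longest match.
        if name in CONTEXT_SENSITIVE and name not in allowed:
            continue
        return name, lexeme
    raise ValueError(f"no token matches at position {pos}")


def lex(text, allowed):
    """Tokenize text, skipping whitespace, under a fixed allowed set."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        name, lexeme = next_token(text, pos, allowed)
        tokens.append(name)
        pos += len(lexeme)
    return tokens


# Expression context: the parser expects right_shift.
print(lex(">>", {"right_shift"}))
# Template context: right_shift is not expected, so each ">" is returned.
print(lex(">>", set()))
```

    With right_shift in the allowed set, ">>" comes back as one token;
    with it absent, the same input comes back as two greater_than tokens,
    without any extra lexer states.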

    The main question I have is what keyword(s) I should use to indicate
    the tokens in question.

    context sensitive
    contextual
    optional
    expected
    expectable?!??
    expectation sensitive
    suppressible

    something else?

    --
    ******************************************************************************
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris
    ------------------------------------------------------------------------------
