• Supporting multiple input syntaxes

    From luser droog@21:1/5 to All on Wed Aug 12 15:20:35 2020
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    My feeling is that separating them will be less headache, but maybe
    there's some advantage to changing out smaller pieces of the grammar
    in that it might be easier to make sure that they produce the same
    structure compatible with the backend.

    Any guidance in this area?

    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    [Really, it's up to you. My inclination would be to make them
    separate but use some sort of macro setup so you can insert
    common pieces into each of the grammars. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Hans-Peter Diettrich@21:1/5 to All on Thu Aug 13 10:27:45 2020
    Am 13.08.2020 um 00:20 schrieb luser droog:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    IMO this depends largely on how the parser output is used (diagnostics,
    backend...). C90 is much stricter than K&R and requires more checks. Do
    you need extensive error diagnostics, or do you assume that all source
    code is free of errors?


    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    You seem to implement an LL(1) parser? My C98 parser is LL(2), i.e. an
    LL(1) parser with one or two locations where more lookahead is required.
    Also, identifiers are classified as typenames or other identifiers
    before their use.

    For real-world testing (recommended!) a preprocessor is required and a
    copy of the standard libraries of existing compiler(s).

    Your test_syntax() source is missing "=" in the variable declarations
    (initializers). What about pointer syntax/semantics? If you add these
    (and other) syntax differences conditionally (version-specific) to your
    code, which way would look better to you? Which way will be safer to
    maintain?


    Nice code BTW :-)

    DoDi

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kaz Kylheku@21:1/5 to luser droog on Thu Aug 13 00:43:47 2020
    On 2020-08-12, luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> wrote:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    My feeling is that separating them will be less headache, but maybe
    there's some advantage to changing out smaller pieces of the grammar
    in that it might be easier to make sure that they produce the same
    structure compatible with the backend.

    Any guidance in this area?

    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    I'd say that since you're not using a parser generator, but using code
    statements to construct the grammar objects at initialization time, you
    have the flexibility to merge the implementations: you can check the
    value of some dialect-selecting variable while constructing the parser,
    and check that same variable elsewhere to do whatever else needs to be
    done conditionally.
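    For example, here is a minimal sketch of that dialect-conditional
    construction. The Parser type and the combinators seq(), optional(),
    kw(), declarator(), and initializer() are hypothetical stand-ins for
    illustration, not pcomb's actual API:

    typedef struct Parser Parser;
    Parser *seq(Parser *a, Parser *b);   /* sequence of two parsers */
    Parser *optional(Parser *p);         /* zero or one occurrence */
    Parser *kw(const char *word);        /* match a literal token */
    Parser *declarator(void);
    Parser *initializer(void);

    enum dialect { C75, KR1, C90 };

    Parser *init_declarator(enum dialect d)
    {
        if (d == C75)                 /* 1975 dialect: no "=" initializer */
            return declarator();
        return seq(declarator(),      /* K&R1/C90: optional "= initializer" */
                   optional(seq(kw("="), initializer())));
    }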

    The trick is to find a way to embed the *semantics* of the older dialects
    into the new one, so that everything after the parsing can be shared.

    Similar remarks would apply to recursive descent.

    If you were using something clunky like a Yacc, there are still ways
    to combine everything into a single grammar. The input stream can be
    primed with one of several "secret internal token" objects that has no
    lexeme. (Primed, meaning that the first call to the lexer yields this
    secret token instead of processing actual input.) Each token indicates
    a dialect to parse. The top-level grammar production can then pick
    one of several subordinate production rules corresponding to the entry
    points for the respective dialects. Those can then share common rules
    as much as possible.

    translation_unit : C75_TOKEN c75_translation_unit  /* orig flavor */
                     | C79_TOKEN c79_translation_unit  /* "K&R" */
                     | C90_TOKEN c90_translation_unit  /* ANSI/ISO */
                     ;
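
    A minimal sketch of the priming itself, using hypothetical names
    (prime_lexer, real_yylex) around a yacc-style yylex():

    int real_yylex(void);            /* the ordinary scanner */

    static int primed_token = 0;     /* 0 means "no pending token" */

    void prime_lexer(int dialect_token)   /* call before yyparse() */
    {
        primed_token = dialect_token;     /* e.g. C90_TOKEN */
    }

    int yylex(void)
    {
        if (primed_token) {
            int t = primed_token;
            primed_token = 0;        /* deliver it exactly once */
            return t;
        }
        return real_yylex();         /* then scan actual input */
    }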

    --
    TXR Programming Language: http://nongnu.org/txr
    Music DIY Mailing List: http://www.kylheku.com/diy
    ADA MP-1 Mailing List: http://www.kylheku.com/mp1

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From minforth@arcor.de@21:1/5 to All on Thu Aug 13 14:15:09 2020
    Am Donnerstag, 13. August 2020 00:32:56 UTC+2 schrieb luser droog:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection? ...

    Why not settle for one master dialect and use awk to translate between dialects?

    [Probably because there is a great deal of C code written to comply with
    the various versions of the standard, users want error messages that match
    the code they wrote rather than some intermediate code, and it would be
    quite an awk program that could reconcile all the differences among C
    flavors. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to minf...@arcor.de on Thu Aug 13 21:37:55 2020
    On Thursday, August 13, 2020 at 5:24:36 PM UTC-5, minf...@arcor.de wrote:
    Am Donnerstag, 13. August 2020 00:32:56 UTC+2 schrieb luser droog:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection? ...

    Why not settle for one master dialect and use awk to translate between dialects?

    [Probably because there is a great deal of C code written to comply with
    the various versions of the standard, users want error messages that match
    the code they wrote rather than some intermediate code, and it would be
    quite an awk program that could reconcile all the differences among C
    flavors. -John]

    One of the possible goals for this project is exactly such a translator
    that can downgrade or upgrade code from one standard version to another.

    Another possible application is a source code formatter. Currently the
    CST produced by the parser keeps all the original whitespace attached
    to each token.
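
    A minimal sketch of such a whitespace-preserving token, with
    illustrative field names rather than pcomb's actual layout:

    /* Each CST token keeps the text that preceded it, so a formatter can
       reproduce (or restyle) the original layout exactly. */
    typedef struct {
        const char *leading_ws;   /* whitespace/comments before the lexeme */
        const char *lexeme;       /* the token text itself */
        int         kind;         /* token class */
    } cst_token;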

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to Hans-Peter Diettrich on Thu Aug 13 21:36:13 2020
    On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote:
    Am 13.08.2020 um 00:20 schrieb luser droog:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    IMO this depends largely on how the parser output is used (diagnostics,
    backend...). C90 is much stricter than K&R and requires more checks. Do
    you need extensive error diagnostics, or do you assume that all source
    code is free of errors?


    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    You seem to implement an LL(1) parser? My C98 parser is LL(2), i.e. an
    LL(1) parser with one or two locations where more lookahead is required.
    Also, identifiers are classified as typenames or other identifiers
    before their use.


    Yes, it's basically LL(1) with backtracking. There's one part of the
    grammar I'm using that's left-recursive and I still need to work that
    out.
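
    (The standard workaround is to rewrite the left-recursive rule as a
    repetition; a sketch with hypothetical combinator names, not pcomb's
    actual API:)

    typedef struct Parser Parser;
    Parser *seq(Parser *a, Parser *b);
    Parser *many(Parser *p);          /* zero or more repetitions */
    Parser *kw(const char *word);
    Parser *term(void);

    /* expr -> expr '+' term | term   left-recursive: recurses forever
       expr -> term ('+' term)*       equivalent iterative form        */
    Parser *expr(void)
    {
        return seq(term(), many(seq(kw("+"), term())));
    }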

    For real-world testing (recommended!) a preprocessor is required and a
    copy of the standard libraries of existing compiler(s).

    Your test_syntax() source is missing "=" in the variable declarations
    (initializers). What about pointer syntax/semantics? If you add these
    (and other) syntax differences conditionally (version-specific) to your
    code, which way would look better to you? Which way will be safer to
    maintain?


    That's actually correct for the 1975 dialect: no '=' to initialize
    variables. I think it's pretty ugly without it, but it could be
    removed anyway for the AST.


    Nice code BTW :-)


    Thanks! I think I need to sidetrack a bit and work up some primitives
    for pattern matching and decomposition to make the backend easier.
    I'll report back if/when it can do more tricks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Christopher F Clark@21:1/5 to All on Fri Aug 14 12:44:31 2020
    We did something similar with Yacc++. We used inheritance of grammars
    (a feature of our product) to do so. In fact, the point of the
    exercise was to demonstrate that feature. I would presume something
    similar would work in a hand-written recursive descent parser.

    --
    ******************************************************************************
    Chris Clark                  email: christopher.f.clark@compiler-resources.com
    Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
    23 Bailey Rd                 voice: (508) 435-5016
    Berlin, MA 01503 USA         twitter: @intel_chris
    ------------------------------------------------------------------------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Lovemore@21:1/5 to All on Sat Aug 15 06:42:29 2020
    It may be useful to consider what you would like to happen if you
    encounter a syntax that is ambiguous, works differently, or belongs to a
    dialect other than the one you are parsing: produce a warning or error,
    handle it quietly, fall over, or don't care.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to David Lovemore on Sat Aug 15 15:20:06 2020
    On Saturday, August 15, 2020 at 9:46:47 AM UTC-5, David Lovemore wrote:
    It may be useful to consider what you would like to happen if you
    encounter a syntax that is ambiguous, works differently, or belongs to a
    dialect other than the one you are parsing: produce a warning or error,
    handle it quietly, fall over, or don't care.

    Very good points. The parser is backtracking, returning a list of
    results. That could conceivably be useful for dealing with ambiguity by
    looking at more than just the first result.

    Warnings and error messages are going to be trickier, I fear. The parser
    is built around the idea of Monadic combinators, so my research suggests
    that I'll need Monad Transformers to add the state needed for good
    messages. There are a bunch of lectures I've found about using these
    in Haskell, but not much about how to build them from scratch.

    As it is, any error in parsing will just produce no results.

    I started a prototype where the input list of characters was instead
    a list of (character, line-number, line-position). But it got really
    confusing at the time so I stopped. And the few times I've tried to
    look at it, I can't figure out what I was thinking.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Lovemore@21:1/5 to All on Sun Aug 16 02:21:23 2020
    My friend, reporting the furthest position examined by the parser I have
    found useful in error cases as a simple stop gap when using a combinator
    approach.

    Thinking about it, you kind of want to see the furthest failed position
    and the stack of rules above it. That requires meta-information when the
    code is written in the most natural way. For this reason and others I
    believe it is good to represent your grammar in data structures, which is
    a step further in the direction of a compiler-compiler tool (or compiler
    interpreter tool).
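
    (A minimal sketch of that stop-gap, with illustrative names: every
    primitive parser records the deepest offset it examined, and the driver
    reports it after a failed parse.)

    static long furthest = -1;        /* high-water mark of input examined */

    void note_position(long offset)   /* call from each primitive parser */
    {
        if (offset > furthest)
            furthest = offset;
    }

    long furthest_failure(void)       /* query after an overall failure */
    {
        return furthest;
    }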

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to davidl...@gmail.com on Thu Aug 20 14:45:28 2020
    On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
    My friend, reporting the furthest position examined by the parser I have
    found useful in error cases as a simple stop gap when using a combinator
    approach.

    Thinking about it, you kind of want to see the furthest failed position
    and the stack of rules above it. That requires meta-information when the
    code is written in the most natural way. For this reason and others I
    believe it is good to represent your grammar in data structures, which is
    a step further in the direction of a compiler-compiler tool (or compiler
    interpreter tool).

    Thanks. I've done some further investigating. I built my parsers
    following two papers: Hutton and Meijer, Monadic Parser Combinators
    https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
    and Hutton, Higher-Order Functions for Parsing
    https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf

    The first adds error reporting using Monad Transformers. I'm thinking
    about how to move in this direction, but first I'd need to reformulate
    the code to make the Monad more explicit. It should be something like
    an interface or a mixin, like a base class with all virtual member
    functions. That could be done by modelling my objects more like OO
    objects and putting 'bind' and 'result' in a vtable in the Parser
    object.
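
    A minimal sketch of that vtable idea, with illustrative types rather
    than pcomb's actual layout:

    typedef struct Parser Parser;

    /* The monad operations live in one shared table; each Parser points
       at the table for its flavor and carries its own closure state. */
    typedef struct Monad {
        Parser *(*result)(void *value);                    /* unit/return */
        Parser *(*bind)(Parser *p, Parser *(*f)(void *));  /* >>= */
    } Monad;

    struct Parser {
        const Monad *vt;     /* shared per monad flavor */
        void        *state;  /* closure data for this parser */
    };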

    But the second paper does it differently, and maybe in a way I can adopt
    more easily. It redefines the parsers to no longer produce a list of
    results, so there's no longer support for ambiguity. Then it defines
    them to return a Maybe,

    maybe * ::= Fail [char] | Error [char] | OK *

    where the OK branch has the parse tree, and Fail or Error both contain
    an error message. It describes how a Fail can be transformed into an
    Error. But it isn't entirely clear where the messages get injected.
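
    That maybe type maps naturally onto a C tagged union; a sketch with
    illustrative names:

    /* Fail and Error both carry a message; OK carries the parse result. */
    enum tag { FAIL, ERROR, OK };

    typedef struct {
        enum tag tag;
        union {
            const char *message;   /* FAIL or ERROR */
            void       *tree;      /* OK: the parse tree */
        } u;
    } maybe;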

    Still need to do some thinking on it, but I think I can rewrite the parsers
    to follow this model, and then decorate my grammar with possible errors
    at each node.

    Thanks for the encouragement. My classes start on Monday so I'm hoping
    to accomplish something on this before then.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to luser droog on Sun Aug 23 14:26:27 2020
    On Wednesday, August 12, 2020 at 3:32:56 PM UTC-7, luser droog wrote:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    (snip)

    I remember using a later C compiler that accepted
    the older form of assignment operators, such as =+, =-, etc.
    Presumably for those with older code.

    I then had to put an extra space when assigning negative values:

    i= -4;

    Without the space, "i=-4;" would have been read as the old-style
    "i =- 4;", i.e. subtract 4 from i. I don't remember now how I found
    out about that one.

    That is the only old C syntax I remember.
    (And much later than 1975.)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Sun Aug 23 19:35:00 2020
    On Sunday, August 23, 2020 at 1:39:30 PM UTC-5, luser droog wrote:
    On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
    My friend, reporting the furthest position examined by the parser I have
    found useful in error cases as a simple stop gap when using a combinator
    approach.

    Thinking about it, you kind of want to see the furthest failed position
    and the stack of rules above it. That requires meta-information when the
    code is written in the most natural way. For this reason and others I
    believe it is good to represent your grammar in data structures, which is
    a step further in the direction of a compiler-compiler tool (or compiler
    interpreter tool).

    Thanks. I've done some further investigating. I built my parsers
    following two papers: Hutton and Meijer, Monadic Parser Combinators
    https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
    and Hutton, Higher-Order Functions for Parsing
    https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf

    The first adds error reporting using Monad Transformers. [...]
    But the second paper does it differently, and maybe in a way I can adopt
    more easily. It redefines the parsers to no longer produce a list of
    results, so there's no longer support for ambiguity. Then it defines
    them to return a Maybe,

    maybe * ::= Fail [char] | Error [char] | OK *

    where the OK branch has the parse tree, and Fail or Error both contain
    an error message. It describes how a Fail can be transformed into an
    Error. But it isn't entirely clear where the messages get injected.

    Still need to do some thinking on it, but I think I can rewrite the parsers to follow this model, and then decorate my grammar with possible errors
    at each node.

    I've made some progress. I wrote a new prototype following Hutton and
    modified it to add position information to the character stream. Then I
    rewrote the parsers to produce the maybe structure and to collect
    rudimentary error messages.

    For these, a positive and negative case in PostScript,

    0 0 (abcd\ne) input-pos
    (abc) str exec
    pc
    0 0 (abcd\ne) input-pos
    (abd) str nofail exec
    pq

    I get this output:

    $ gsnd -q -dNOSAFER pc11.ps
    stack:
    [/OK [[(a) [(b) [(c) []]]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]
    stack:
    [/Error [[(after) (a)] [[(after) (b)] [[{(d) eq} (not satisfied)] [[(c) [0 2]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]]]]

    So, this indicates that I *can* modify my C parsers to produce error
    messages. The remaining input list has the position information for
    where the error occurred, [(c) [0 2]].

    Following the prototype, I modified the input functions to add positions
    for each character and modified the base parser item() to detect and
    remove the position stuff before it passes into the rest of the machinery.
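
    A sketch of what that looks like on the C side, with illustrative names
    (the actual pcomb representation differs):

    #include <stdio.h>   /* EOF */

    /* Input elements pair each character with its position. */
    typedef struct {
        int ch;
        int line, col;
    } positioned_char;

    /* item() records the position for diagnostics and strips it off
       before the character reaches the rest of the machinery. */
    int item(positioned_char **input, int *line, int *col)
    {
        positioned_char *p = *input;
        if (p->ch == EOF)
            return EOF;               /* fail: end of input */
        *line = p->line;
        *col  = p->col;
        ++*input;
        return p->ch;
    }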

    The next hurdle was making the extra position information work with
    the Unicode filters ucs4_from_utf8() and utf8_from_ucs4(). And that
    all appears to be working now.

    But that's probably the end of the story for now. Got to gear up for
    Operating Systems and Advanced Web Dev with Java.

    Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
    to pass a language id token between tokenizer and parser.


    Ps. the prototype is written in PostScript extended with function syntax.
    https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
    https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript

    --
    l droog
    [Why Postscript? I realize it's Turing complete, but it seems odd to run
    one's parser on a printer. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Sun Aug 23 20:39:30 2020
    On Sunday, August 23, 2020 at 10:15:06 PM UTC-5, luser droog wrote:

    Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
    to pass a language id token between tokenizer and parser.


    Ps. the prototype is written in PostScript extended with function syntax.
    https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
    https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript

    --
    l droog
    [Why Postscript? I realize it's Turing complete, but it seems odd to run
    one's parser on a printer. -John]

    I discovered PostScript around '97 or '98. I was taking Computer Graphics
    and it was in an Appendix to the textbook (Salman). At the same time
    I was editor of the Honors College student magazine so it really piqued
    my interest as a graphics and typography language.

    But the language itself I just really enjoy. It's my "Lego blocks"
    language. The RPN syntax removes all ambiguity about precedence and
    sequencing. It has the same code=data properties as Lisp. Application
    code can read from the program stream. It has strings, arrays and
    dictionaries. It has first class procedures which can be constructed
    on the fly. I've found it a nice playpen for syntax extension.

    I was also on a decades-long crusade to never use MS Word after that
    /first/ time they screwed everyone by changing the interface. And
    PostScript has slowly become my tool for that as my programming skill
    grew.
    https://github.com/luser-dr00g/ibis.ps

    On another front, I wanted to have parsers in PostScript so I could
    evaluate infix math expressions. And I wanted regular expression
    matching in PS, thinking it would help to implement algorithmic
    hyphenation of text.
    [Take a look at Forth. Many of the same advantages, runs a lot more places. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to Hans-Peter Diettrich on Sun Aug 23 22:16:33 2020
    On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote:
    Am 13.08.2020 um 00:20 schrieb luser droog:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    IMO this depends largely on how the parser output is used (diagnostics,
    backend...). C90 is much stricter than K&R and requires more checks. Do
    you need extensive error diagnostics, or do you assume that all source
    code is free of errors?


    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    You seem to implement an LL(1) parser? My C98 parser is LL(2), i.e. an
    LL(1) parser with one or two locations where more lookahead is required.

    In which places do you need more lookahead? Btw, some of my reading
    describes my parsers as LL(infinity) because of the backtracking.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to luser droog on Mon Aug 24 17:01:59 2020
    luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> schrieb:

    [PostScript]

    But the language itself I just really enjoy. It's my "Lego blocks"
    language. The RPN syntax removes all ambiguity about precedence and sequencing.

    I recently had the doubtful pleasure of evaluating the formula

    x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
    +(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))

    in Postscript. (Yes, really. Don't ask.)

    It was the first time in more than a decade that I wrote a
    flex/bison grammar (mostly copied from the bison manual). It was
    faster and less error-prone than trying to do it directly.

    The grammar actually generated fairly unidiomatic PostScript
    because I made it give names to all the variables, so

    a-b

    became

    a b sub

    I'm sure a real PostScript aficionado would have done it all
    on the stack :-)
    [Turning infix into RPN is a pretty basic intro compiler course exercise.
    Conceptually, you make the parse tree and then do a postorder tree walk.
    Or if you're using yacc or bison, in the expression grammar you just
    print the operator or token in the code for each rule, because they are
    recognized in the right order, e.g.:

    expr: VARIABLE        { printf(" %s ", $1); }
        | expr '+' expr   { printf(" add "); }
        | expr '-' expr   { printf(" sub "); }
        | '(' expr ')'    { /* print nothing */ }
        ;

    -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Mon Aug 24 12:14:42 2020
    On Monday, August 24, 2020 at 10:42:33 AM UTC-5, luser droog wrote:
    [Why Postscript? I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John]

    I discovered PostScript around '97 or '98. I was taking Computer Graphics
    and it was in an Appendix to the textbook (Salman). At the same time
    I was editor of the Honors College student magazine so it really piqued
    my interest as a graphics and typography language. ...

    [Take a look at Forth. Many of the same advantages, runs a lot more places. -John]

    Good suggestion. I have looked at Forth quite a bit. I lurked in
    comp.lang.forth for a number of years. I've got a half-written
    interpreter that stalled because my vm doesn't have any I/O.

    https://groups.google.com/d/topic/comp.lang.forth/Y1XlX8wD3RQ/discussion
    https://retrocomputing.stackexchange.com/questions/6610/how-to-do-i-o-with-emulation-code

    I went down a wild rabbit hole after discovering the document "X86 is an
    octal machine" and tried to recode the assembler macros using more octal.
    But I kind of stalled on that whole area, since the first thing I would
    want in Forth is tagged objects like PostScript has. There's Oforth and
    8th, which both supply that sort of thing, but then I'd probably miss
    the PS graphics functions... :)

    I've also played with APL and tried writing a few interpreters for it.
    But the common thread among all these interpreters was coding them all
    in C. So I turned my attention to compiling and analyzing C code.
    A friend of mine wanted a really customizable C formatter, so I thought
    I might be able to make a tool that accommodates lots of different
    backends for doing something with the parse tree or syntax tree.
    I want to be able to write C99 code and transpile it automatically to
    something that will work with the MS compiler, without having to
    maintain any MS business in the "master" source.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to Thomas Koenig on Fri Aug 28 10:56:15 2020
    On Monday, August 24, 2020 at 2:12:13 PM UTC-5, Thomas Koenig wrote:
    luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email.dmarc.email> schrieb:

    [PostScript]

    But the language itself I just really enjoy. It's my "Lego blocks" language. The RPN syntax removes all ambiguity about precedence and sequencing.

    I recently had the doubtful pleasure of evaluating the formula

    x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
    +(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))

    in Postscript. (Yes, really. Don't ask.)


    In case you need it, I've got a PostScript debugger that can single
    step into loops and procedures.

    https://github.com/luser-dr00g/debug.ps

    $ gsnd db5.ps
    GPL Ghostscript 9.52 (2020-03-19)
    Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    { 0 0 1 5 { add } for } stepon traceon debug
    %|-
    0 %|- 0
    0 %|- 0 0
    1 %|- 0 0 1
    5 %|- 0 0 1 5
    {add} %|- 0 0 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?s
    %|- 0 0
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 0
    1 %|- 0 1
    1 %|- 0 1 1
    5 %|- 0 1 1 5
    {add} %|- 0 1 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    %|- 0 1
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 1
    2 %|- 1 2
    1 %|- 1 2 1
    5 %|- 1 2 1 5
    {add} %|- 1 2 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    %|- 1 2
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 3
    3 %|- 3 3
    1 %|- 3 3 1
    5 %|- 3 3 1 5
    {add} %|- 3 3 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    %|- 3 3
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 6
    4 %|- 6 4
    1 %|- 6 4 1
    5 %|- 6 4 1 5
    {add} %|- 6 4 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    %|- 6 4
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 10
    5 %|- 10 5
    1 %|- 10 5 1
    5 %|- 10 5 1 5
    {add} %|- 10 5 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    %|- 10 5
    add nametype object add step: (continue|next|bypass|step|prompt|quit)?
    %|- 15
    6 %|- 15 6
    1 %|- 15 6 1
    5 %|- 15 6 1 5
    {add} %|- 15 6 1 5 {add}
    for nametype object for step: (continue|next|bypass|step|prompt|quit)?
    GS<1>==
    15
    quit

    [This is drifting rather far from compilers now. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to luser droog on Thu Feb 11 23:27:43 2021
    luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> wrote:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    My feeling is that separating them will be less headache, but maybe
    there's some advantage to changing out smaller pieces of the grammar
    in that it might be easier to make sure that they produce the same
    structure compatible with the backend.

    Any guidance in this area?

    https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

    [Really, it's up to you. My inclination would be to make them
    separate but use some sort of macro setup so you can insert
    common pieces into each of the grammars. -John]

    Gnu Pascal supports several Pascal dialects, using a unified parser
    for all of them. Some ideas used:
    - flags in the scanner decide if dialect-specific tokens are
      recognized
    - superset parsing: several constructs are generalized so that a
      single construct represents things that otherwise would lead to
      conflicts. A later semantic stage looks at the dialect flags and
      prunes things not allowed in the given dialect. An example of a
      superset construction is the rule 'call_or_cast', which handles
      several syntactically similar constructs that are usually given by
      separate syntax rules; the semantic rules use types, besides the
      dialect flags, to decide the meaning (see the sketch after this
      list).
    - even after using the two tricks above, the grammar still has LALR
      conflicts; they are resolved using the GLR option of Bison. All
      conflicts are resolvable using lookahead, and AFAICS some are only
      resolvable with lookahead. Parser lookahead means that the
      traditional trick of passing semantic info back to the scanner does
      not work (parser actions are delayed, so the scanner may be forced
      to produce a token before semantic info is available). Still, it
      seems that GLR leads to a cleaner parser.
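
    A sketch of what such semantic pruning might look like; the names are
    illustrative, not GNU Pascal's actual code:

    typedef struct Node Node;
    int  is_type_name(const Node *n);      /* hypothetical type lookup */
    void dialect_error(const Node *n, const char *msg);

    enum { DIALECT_ALLOWS_CASTS = 1 };     /* one flag bit per feature */
    enum kind { K_CALL, K_CAST };

    /* After parsing the superset rule call_or_cast, decide what the
       construct means from type info plus the active dialect flags. */
    enum kind classify_call_or_cast(const Node *callee, unsigned flags)
    {
        if (is_type_name(callee)) {
            if (!(flags & DIALECT_ALLOWS_CASTS))
                dialect_error(callee, "cast not allowed in this dialect");
            return K_CAST;
        }
        return K_CALL;
    }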

    My impression is that the variation among Pascal dialects is larger
    than among C dialects, so the case for a unified parser in C is IMHO
    much stronger. OTOH Gnu Pascal is a full compiler with semantic
    actions invoked from grammar rules. The semantic code embedded in the
    parser changed much more than the grammar rules did, so maintaining
    separate parsers probably would have been a nightmare.

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Elijah Stone@21:1/5 to antispam@math.uni.wroc.pl on Wed Feb 17 01:44:14 2021
    On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:

    My impression is that the variation among Pascal dialects is larger than
    among C dialects, so the case for a unified parser in C is IMHO ...

    Pascal is more fragmented, but it's also much easier to parse than C. I
    think it's a wash.

    (I also think the whole idea is horrifying and ought not to be pursued;
    but.)

    -E

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From antispam@math.uni.wroc.pl@21:1/5 to Elijah Stone on Tue Feb 23 23:28:16 2021
    Elijah Stone <elronnd@elronnd.net> wrote:
    On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:

    My impression is that the variation among Pascal dialects is larger than
    among C dialects, so the case for a unified parser in C is IMHO ...

    Pascal is more fragmented, but it's also much easier to parse than C. I think it's a wash.

    I did a C parser; it was not hard at all. In C (like in standard
    Pascal) there are conflicts, but those conflicts can be resolved
    easily using semantic info. Alternatively, for C one can use two-token
    lookahead. The Turbo Pascal folks introduced an "interesting"
    difficulty with caret constants. Frank Heckenbach worked out
    how to handle them, and his analysis indicates that correct
    handling of Turbo Pascal needs IIRC 6 tokens of lookahead.

    Note that for both Pascal and C, with 1 token of lookahead,
    semantic info is available when needed to disambiguate
    parsing. Once you have more than 1 token of lookahead,
    semantic info is sometimes too late, and in effect the parser
    must work purely syntactically.

    (I also think the whole idea is horrifying and ought not to be pursued;
    but.)

    What do you mean by "whole idea"? Do you think that creating a
    compiler that can correctly handle multiple dialects (Pascal
    or other language) is wrong?

    --
    Waldek Hebisch

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From ltcmelo@gmail.com@21:1/5 to All on Sun Mar 14 21:08:33 2021
    antispam@math.uni.wroc.pl wrote:
    I did a C parser; it was not hard at all. In C (like in standard
    Pascal) there are conflicts, but those conflicts can be resolved
    easily using semantic info. Alternatively, for C one can use two-token
    lookahead.

    I'm not sure whether I captured the full context of your statement, but, if I did, I don't think it's 100% correct:

    - Regarding a lookahead of 2:
    This isn't enough to disambiguate, e.g., between a cast-expression in
    6.5.4, `( type-name ) cast-expression`, and a compound literal in 6.5.2,
    `( type-name ) { initializer-list }` (see the sketch after this list).

    - Regarding the use of semantic info:
    Yes, with semantic info you can disambiguate things like `x * y;`, so
    I'd say that, from a pragmatic/practical standpoint, this affirmation
    is right. However, from a more theoretical perspective, a parser
    (thinking of it as a program that "simply" validates a sentence
    against a grammar) isn't, arguably, expected to rely on anything other
    than syntax. Whether the theoretical aspect is relevant depends on the
    application of the parser, I guess. For instance, for the
    implementation of a static analysis tool, not depending, as much as
    possible, on semantic information to guide parsing is an advantage.
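
    (A sketch of why a fixed two-token lookahead falls short: the type-name
    can span arbitrarily many tokens, so the decisive '{' appears only
    after the closing parenthesis.)

    /* Both initializers begin "( unsigned long ...", so a parser with two
       tokens of lookahead past the '(' cannot yet tell them apart. C99. */
    int main(void)
    {
        unsigned long long w = (unsigned long long)42;      /* cast, 6.5.4 */
        unsigned long long v = (unsigned long long){ 42 };  /* compound literal, 6.5.2 */
        return (int)(w + v);
    }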

    This is a table (only for expressions) that I recently put up when
    rewriting the C parser of my project:
    https://docs.google.com/spreadsheets/d/1oGjtFaqLzSoBEp2aGNgHrbEHxSi4Ijv57mXMPymZEcQ/edit?usp=sharing

    --
    Leandro T. C. Melo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Rock Brentwood@21:1/5 to luser droog on Sun Mar 14 17:36:01 2021
    On Wednesday, August 12, 2020 at 5:32:56 PM UTC-5, luser droog wrote:
    I've got my project successfully parsing the circa-1975 C syntax
    from that old manual. I'd like to add parsers for K&R1 and c90
    syntaxes.

    How separate should these be? Should they be completely
    separate grammars, or more piecewise selection?

    I'm in a similar situation with a utility for which I want to grandfather
    in the old syntax, but write with a new and better syntax. My
    recommendation is this: stick to C99, since that's already in POSIX.
    Write a separate utility to convert legacy syntax to C99 (and to call
    out any irregularities/inconsistencies in the program being converted).
    That's, like, "lint" on steroids.

    The other syntaxes would be used in the other utilities only, one per
    utility. It can also be hybridized with "indent", and a driver routine
    can control the conversion, so that all the conversion utilities can
    be combined into one. So, on input, the source syntax is selected, and
    on output the format is driven in much the same way that it is with
    indent. It's an excellent exercise in Text-To-AST-To-Text programming.

    Each program, upon upward conversion to C99, would replace the
    original once it passes the consistency checks provided by the
    utility; so there isn't a question of cueing error messages to the
    format of the older program, because the older program would be
    replaced. Doing all of this is an example of "refactoring" used to pay
    off "code debt". And there's a lot of code debt out there that needs
    to be paid up.

    Technical Debt (Wikipedia): https://en.wikipedia.org/wiki/Technical_debt
    Code Refactoring (Wikipedia): https://en.wikipedia.org/wiki/Code_refactoring

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)