• help wanted - compiler for a new language, written in Modula-2

    From magicmouse94937@gmail.com@21:1/5 to All on Mon Apr 10 23:06:06 2017
    I am working on a new computer language, and have written a first-cut recursive descent transpiler (converting from the new language into AS3 and JavaScript), but need someone to redo the compiler using top-down operator precedence (as presented in the lectures by Douglas Crockford), i.e. the Pratt method, which is a fusion of left-to-right precedence parsing and top-down recursive descent. It would be a lot faster than the backtracking that is currently in my compiler. If you know what this all means, and program in Modula-2, then please send me a note.
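
    For readers unfamiliar with the technique: the core of a Pratt parser is a loop driven by operator binding powers. A minimal sketch in Python (not the poster's Modula-2; the token set and binding powers are illustrative assumptions):

```python
# Minimal Pratt ("top down operator precedence") parser sketch.
import re

BINDING_POWER = {"+": 10, "-": 10, "*": 20, "/": 20}

def tokenize(src):
    # integers, operators and parentheses only
    return re.findall(r"\d+|[+\-*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expression(self, rbp=0):
        # "nud": the token that begins an expression
        tok = self.advance()
        if tok == "(":
            left = self.parse_expression(0)
            assert self.advance() == ")", "expected ')'"
        else:
            left = int(tok)
        # "led": consume only operators that bind tighter than the
        # caller's context; this loop is what replaces backtracking
        while self.peek() in BINDING_POWER and BINDING_POWER[self.peek()] > rbp:
            op = self.advance()
            left = (op, left, self.parse_expression(BINDING_POWER[op]))
        return left
```

    The `rbp` argument encodes precedence: parsing `1+2*3` yields `('+', 1, ('*', 2, 3))`, and a chain like `1-2-3` comes out left-associative as `('-', ('-', 1, 2), 3)`, all in a single forward pass.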

    Currently using the ADW Modula-2 compiler on Windows, but have a very nice cross-platform low-level library that has a version for the P1 compiler.

    Will pay commensurate with skill and quality of output.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From jan272@gmail.com@21:1/5 to All on Wed Apr 12 02:31:45 2017
    Perhaps combine it with the MC project?

  • From carlglassberg@gmail.com@21:1/5 to magicmo...@gmail.com on Fri Apr 14 03:22:53 2017
    Is the grammar for your new language LL(1)?

    Sincerely
    Carl Glassberg
    ----

    On Monday, April 10, 2017 at 11:06:07 PM UTC-7, magicmo...@gmail.com wrote:
    I am working on a new computer language, and have written a first cut recursive descent transpiler (converting from the new language into AS3 and JavaScript) [...]

  • From magicmouse94937@gmail.com@21:1/5 to All on Fri Apr 14 23:48:50 2017
    I have an EBNF grammar for the language; it is an indented language like Python. It isn't that hard to parse, with about the same size grammar as Swift. After watching the Crockford video on YouTube about the simplicity and speed of the Pratt methodology, it is clear that the compiler will be much faster, with less backtracking, if it uses that approach. A real sticking point in my first cut compiler, which was purely recursive descent, is that error recovery is very hard to do in recursive descent, and a huge amount of extra code exists to exit out of deeply nested function calls. This was always a weak spot in the Wirth code examples which I learned from. Production compilers are very different from academic versions, as nice error messages are a must for the next 100 million programmers to be.

    The language includes a full 2D graphics and event model and a graph database, as well as shell scripting capabilities. It is designed to replace the full development stack for mobile and web development, which means it is aiming directly at the largest single software development market: interactive graphics applications for mobile + web. Basically, you write your code in a single language, with no make scripts, no package managers, no external database, no frameworks at all, and under 30 API functions for a typical program. It will be competing against Eve, Elm, Red, and a few other "next-gen" languages.
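
    For an indented language, the usual lexing trick is to turn changes in leading whitespace into explicit INDENT/DEDENT tokens, after which the parser can treat blocks like bracketed constructs. A hypothetical Python sketch of that technique (not taken from the poster's compiler; assumes space-only indentation):

```python
# Convert leading-whitespace changes into INDENT/DEDENT tokens,
# the standard "off-side rule" approach used by Python itself.

def indent_tokens(lines):
    stack = [0]            # indentation widths currently open
    out = []
    for line in lines:
        if not line.strip():
            continue       # blank lines carry no indentation information
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            out.append("INDENT")
        while width < stack[-1]:
            stack.pop()
            out.append("DEDENT")
        out.append(("LINE", line.strip()))
    while len(stack) > 1:  # close any blocks still open at end of file
        stack.pop()
        out.append("DEDENT")
    return out
```

    With block boundaries made explicit like this, the grammar itself stays context-free and the parser proper never has to look at whitespace.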

  • From Marco van de Voort@21:1/5 to magicmouse94937@gmail.com on Sat Apr 15 12:15:25 2017
    On 2017-04-15, magicmouse94937@gmail.com <magicmouse94937@gmail.com> wrote:
    [...] A real sticking point in my first cut compiler which was purely recursive descent is that error recovery is very hard to do in recursive descent, and a huge amount of extra code exists to exit out of deeply nested function calls. This was always a weak spot in the Wirth code examples which i learned from.

    Strangely, I always considered the Wirth languages to be the ones with good error-handling support, and that was often attributed to their typically being parsed by recursive descent, as they are almost LL(1) languages.

    It might simply be a bad fit for your particular language though.

  • From Chris Burrows@21:1/5 to magicmo...@gmail.com on Sat Apr 15 18:23:12 2017
    On Saturday, April 15, 2017 at 4:18:51 PM UTC+9:30, magicmo...@gmail.com wrote:
    A real
    sticking point in my first cut compiler which was purely recursive descent is that error recovery is very hard to do in recursive descent, and a huge amount of extra code exists to exit out of deeply nested function calls. This was always a weak spot in the Wirth code examples which i learned
    from.

    How old were these examples that you were using? The strategy used by Wirth in his latest Project Oberon compiler does not have that problem. He describes the strategy as follows:

    "Also, the language Oberon is designed with the property that most large constructs begin with a unique symbol, such as IF, WHILE, CASE, RECORD, etc. These symbols facilitate the recovery of the parsing process in the erroneous text. More problematic are
    open constructs which neither begin nor end with key symbols, such as types, factors, and expressions. Relying on heuristics, the source text is skipped up to the first occurrence of a symbol which may begin a construct that follows the one being parsed.
    The employed scheme may not be the best possible, but it yields quite acceptable results and keeps the amount of program devoted to the handling of erroneous texts within justifiable bounds."

    For the full source code, including extensive documentation, of his RISC5 Oberon compiler and the operating system that was built entirely using the language, see 'Project Oberon 2013' at:

    https://people.inf.ethz.ch/wirth/

    If your compiler is having difficulty parsing your language then you should consider that as a warning sign that humans may have similar difficulties when trying to fully comprehend your language. I'd advise you to rethink the language design for areas
    that are causing you specific problems.

    Regards,
    Chris Burrows
    CFB Software
    http://www.astrobe.com/RISC5

  • From trijezdci@21:1/5 to magicmo...@gmail.com on Sat Jul 29 06:21:56 2017
    On Tuesday, 11 April 2017 15:06:07 UTC+9, magicmo...@gmail.com wrote:
    I am working on a new computer language, and have written a first cut recursive descent transpiler [...] It would be a lot faster than the backtracking that is currently in my compiler.


    Sorry for the late response. You want to make sure that your language grammar is LL(1); then the issue you describe is avoided in the first place. No backtracking is needed when parsing LL(1) grammars.

    For an example of an expression grammar that is LL(1) see

    https://github.com/m2sf/m2bsk/wiki/Syntax-Diagrams#expression

    The grammar imposes left-associativity in line with the rules of mathematics. A parser following that grammar will not need to backtrack nor reorder terms.
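
    The shape of such a parser can be sketched in a few lines of Python (illustrative only, not the Modula-2 behind the linked grammar): an iterative loop consumes each operator/term pair and folds it onto the left, so left-associativity falls out with no backtracking and no reordering:

```python
# LL(1)-style expression parsing sketch, assuming the rule
#   expression := term ( ('+' | '-') term )*
# over a pre-tokenised list; terms are plain integers for brevity.

def parse_expression(toks, i=0):
    """Returns (tree, next_index); one token of lookahead, no backtracking."""
    left, i = int(toks[i]), i + 1
    while i < len(toks) and toks[i] in ("+", "-"):
        op = toks[i]
        right, i = int(toks[i + 1]), i + 2
        left = (op, left, right)   # fold left: a - b - c => ((a - b) - c)
    return left, i
```

    Each iteration makes the tree built so far the left child of the new node, which is exactly how the linked grammar's repetition encodes left-associativity.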

    However, if you were to add exponentiation to this expression grammar, like

    simpleTerm :=
      NOT factor | factor ( '**' factor )*
      ;

    then things would get a little more interesting, because the above grammar imposes left-associativity, which is what you want for just about all math operations, but not for exponentiation, which is right-associative.

    In this case it then depends on whether or not you are building an abstract syntax tree (AST) as multi-pass compilers do, or using syntax directed translation as single pass compilers do.

    If you are building an AST, the associativity of an expression subtree depends entirely on how you insert new nodes into the existing tree. You are in control of that, so there is no issue: no backtracking, no shunting, nor any other post-processing is needed.
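
    As an illustration of that point, here is a hypothetical Python sketch (names invented, not from the linked parser) where right-associativity for '**' falls out of one simple choice: recurse for the right operand instead of looping, so new nodes attach on the right:

```python
# Right-associative '**' in recursive descent: the right operand is
# parsed by a recursive call, so 2 ** 3 ** 2 groups as 2 ** (3 ** 2).

def parse_power(toks, i=0):
    """Returns (tree, next_index) for a chain of '**' over integers."""
    base, i = int(toks[i]), i + 1
    if i < len(toks) and toks[i] == "**":
        right, i = parse_power(toks, i + 1)  # recurse: binds to the right
        return ("**", base, right), i
    return base, i
```

    Compare this with the left-folding loop used for '+' and '-': the only structural difference is where the freshly built node is inserted, which is exactly the control the AST-building parser has.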

    If you are doing syntax-directed translation, then you'd need to shunt the factors, for example by pushing them onto a stack and then popping them for evaluation once you have reached the end of the expression. The overhead associated with that is negligible: you'd pay more for creating AST nodes in a multi-pass compiler than you pay for pushing and popping values to and from a pre-allocated stack in a single-pass compiler. I am not a fan of syntax-directed translation, but this particular scenario is not an issue.
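
    A hypothetical Python sketch of the shunting idea (a real single-pass compiler would push onto a pre-allocated stack during parsing; here the input is pre-tokenised for brevity): push the factors while scanning left to right, then pop and fold right to left at the end of the expression.

```python
# Stack-shunting for right-associative '**' in a single pass:
# collect factors left to right, evaluate right to left.

def eval_power_chain(toks):
    stack = [int(toks[0])]
    i = 1
    while i < len(toks) and toks[i] == "**":
        stack.append(int(toks[i + 1]))
        i += 2
    result = stack.pop()
    while stack:
        result = stack.pop() ** result  # fold right: 2**3**2 == 2**(3**2)
    return result
```

    The only cost over the left-associative case is the stack traffic, which, as noted above, is cheaper than allocating AST nodes.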


    For an example of how to insert AST nodes for the above expression grammar in the parser (written in Modula-2), see

    https://github.com/m2sf/m2bsk/blob/master/src/imp/Parser.mod#L4000

    hth
    benjamin

  • From trijezdci@21:1/5 to magicmo...@gmail.com on Sat Jul 29 07:34:45 2017
    On Saturday, 15 April 2017 15:48:51 UTC+9, magicmo...@gmail.com wrote:
    i have an EBNF grammar for the language;

    Is it LL(1) though?

    it is clear that the compiler will be much faster with less backtracking if it uses that approach.

    An RD parser for an LL(1) grammar should not have to backtrack.

    A real sticking point in my first cut compiler which was purely recursive descent is that error recovery is very hard to do in recursive descent,

    I always found it much harder to do error recovery with table driven parsers.

    and a huge amount of extra code exists to exit out of deeply nested function calls.

    Then you are doing something wrong.

    This was always a weak spot in the Wirth code examples which i learned from.

    Wirth never taught error recovery. His examples were meant to illustrate the general process of recursive descent, not its error recovery. You may want to read other authors who actually cover error recovery. For example Dick Grune.

    The key to good error recovery in an RD parser for an LL(1) grammar is to

    (1) implement FIRST() and FOLLOW() sets as functions the parser can call
    (2) don't nest match token/set tests, but code the tests sequentially
    (3) if a test fails, skip to a token that is expected by the next test

    You can see this approach all over my parser. For example, take a look at the function that implements array type declaration

    https://github.com/m2sf/m2bsk/blob/master/src/imp/Parser.mod#L945

    arrayType :=
      ARRAY valueCount OF typeIdent
      ;

    One might be inclined to code this in a nested fashion like so

    (* ARRAY *)
    lookahead := Lexer.consumeSym(lexer);

    (* valueCount *)
    IF matchSet(FIRST(Expression)) THEN
      lookahead := expression(valueCount);

      (* OF *)
      IF matchToken(Token.Of) THEN
        lookahead := Lexer.consumeSym(lexer);

        (* typeIdent *)
        IF matchSet(FIRST(Qualident)) THEN
          lookahead := qualident(baseType)
        END (* IF *)
      END (* IF *)
    END (* IF *)
    Of course this works, but this makes error recovery very difficult because you need to back out of your nested IFs and then figure out how the parser could continue. Generally speaking, when you are at the end of the nested block, you have progressed too
    far in the code to continue and you are forced to skip even further in the hope to find a sensible resync point.

    The approach I outlined above in the three bullet points doesn't have this problem. You'd code the tests in sequence like so

    (* ARRAY *)
    lookahead := Lexer.consumeSym(lexer);

    (* valueCount *)
    IF matchSet(FIRST(Expression)) THEN
      lookahead := expression(valueCount)
    ELSE (* resync *)
      lookahead := skipToMatchTokenOrSet(Token.Of, FIRST(TypeIdent))
    END; (* IF *)

    (* OF *)
    IF matchToken(Token.Of) THEN
      lookahead := Lexer.consumeSym(lexer)
    ELSE (* resync *)
      lookahead := skipToMatchSetOrSet(FIRST(Qualident), FOLLOW(ArrayType))
    END; (* IF *)

    (* typeIdent *)
    IF matchSet(FIRST(Qualident)) THEN
      lookahead := qualident(baseType)
    ELSE (* resync *)
      lookahead := skipToMatchSet(FOLLOW(ArrayType))
    END (* IF *)

    With this approach, when an error occurs, the parser will skip to the token that will be expected by the following test and continue. It is then just a matter of fine-tuning the resync points, that is to say, which tokens to skip to.

    For this you will need to write a number of resync functions for different cases. In my parser I started out with just two, skipToMatchToken and skipToMatchSet, but soon found that I needed to add more, like skipToMatchTokenOrToken,
    skipToMatchTokenOrSet and skipToMatchTokenOrTokenOrSet.
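
    In a language with variadic arguments, that whole family of functions can collapse into one helper. A hypothetical Python sketch (skip_to, ListLexer and the token sets are invented names, not from the m2bsk parser):

```python
# One variadic resync helper instead of skipToMatchToken,
# skipToMatchTokenOrSet, etc.: consume tokens until the lookahead
# is in any of the given stop sets, or the input is exhausted.

class ListLexer:
    """Minimal lexer stand-in backed by a token list."""
    def __init__(self, toks):
        self.toks, self.pos = toks, 0
    def lookahead(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None
    def consume(self):
        self.pos += 1

def skip_to(lexer, *stop_sets):
    """Skip forward to the first token in any stop set; return it (or None)."""
    stop = set().union(*stop_sets)
    while lexer.lookahead() is not None and lexer.lookahead() not in stop:
        lexer.consume()
    return lexer.lookahead()
```

    In Modula-2, which has no variadic procedures, passing the stop tokens as a single set-valued parameter would achieve the same collapse; the named variants above are the price of keeping the call sites explicit.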

    The benefit of this approach is that your error recovery is then easily tunable by simply changing the resync tokens. With nested tests, you can't adjust that easily. If you want to change your resync points in a nested IF structure, you can't just
    change the resync tokens, you have to change the code as well.

    If you read authors who handle error recovery, like Grune, you will learn that some approaches involve inserting tokens into the stream instead of skipping tokens when an error is detected; in other words, error recovery assumes that some token is missing from the source code.

    You will find that the structure of sequential tests I describe above can easily accommodate such a strategy as well. The flatter your structure, the more flexible it is.

    hth
    benjamin

  • From Pascal J. Bourguignon@21:1/5 to trijezdci on Sat Jul 29 22:09:44 2017
    trijezdci <trijezdci@gmail.com> writes:

    This was always a weak spot in the Wirth code examples which i learned from.

    Wirth never taught error recovery. His examples were meant to
    illustrate the general process of recursive descent, not its error
    recovery. You may want to read other authors who actually cover error recovery. For example Dick Grune.

    On the other hand, when you have a fast (unsophisticated, therefore correct and fast) compiler, you can easily compile up to the first error, correct it in the IDE, and then compile again up to the next error.

    Error recovery was important when you used batch compilers in batch environments, but not when you're compiling interactively in an interactive IDE.

    --
    __Pascal J. Bourguignon
    http://www.informatimago.com
