• Got the regular expression parser working again hurray!

    From luser droog@21:1/5 to All on Thu Nov 4 19:04:43 2021
    For the umpteenth time I've rewritten my parser combinators.
    With any luck, they finally have all the necessary bells and
    whistles added to be useful for other stuff.

    The code is in these 3 files. Trimmed of testing gibberish,
    it's about 200 lines of code. https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/pc11are.ps
    https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/pc11a.ps
    https://github.com/luser-dr00g/pcomb/blob/5efeca34f410e8855eff9d38ba21f80983b45afd/ps/struct2.ps

    A parser is a function that takes an <input-stream> type and
    yields a <result-structure> type. The <input-stream> is a lazy
    list of [char [row col]] structures. The <result-structure> is a
    two element array, the first element of which will be /OK /Fail or
    /Error. For an /OK result, the second element will be
    [result remainder]. For a /Fail or /Error result, the second element
    will be [message remainder].

    So here's the regular expression parser with this new setup.
    And some simple testing code and output, which should be
    comprehensible using the above description of the input/output.
    In effect, this is a syntax-directed compiler from regular expressions
    to the syntax for constructing parsers using these same combinators.

    errordict/typecheck{ps pe quit}put
    %errordict/stackunderflow{pe quit}put
    %errordict/stackunderflow{pq}put
    %errordict/undefined{pq}put
    %(../../debug.ps/db5.ps) run
    (pc11a.ps)run {
    fix { flatten clean }
    clean { { dup zero eq { pop } if } map }
    ? { {maybe} compose }
    + { {some} compose }
    * { {many} compose }
    } pairs-begin

    /Dot (.) char {pop {item} one} using def
    /Meta (*+?) anyof def
    /Character (*+?.|()) noneof {first {literal} curry one} using def /Expression {-777 exec} def
    /Atom //Dot
    (\() char //Expression executeonly xthen (\)) char thenx alt
    //Character alt def
    /Factor //Atom /A
    //Meta {/A load first exch first load exec one } using
    maybe {dup first zero eq {pop /A load} if } using
    into def
    /Term //Factor //Factor many then
    { fix { {then} compose compose } reduce one } using def //Expression 0 //Term (|) char //Term xthen many then
    { fix { {plus} compose compose } reduce one } using put

    /regex { 0 0 3 2 roll string-input //Expression exec report } def

    {
    0 0 (ab) string-input //Dot maybe exec pc
    0 0 (ab) string-input //Meta exec pc
    0 0 (*) string-input //Meta exec pc
    0 0 (ab) string-input //Character maybe exec pc
    0 0 (ab) string-input //Atom maybe exec pc
    0 0 (.) string-input //Atom maybe exec pc
    %0 0 (a*) string-input //Atom //Meta then ==
    0 0 (a*) string-input //Atom //Meta then exec pc
    0 0 (ab) string-input Factor pc
    0 0 (a*) string-input Factor pc
    0 0 (ab) string-input Term pc
    0 0 (ab|c) string-input Expression pc
    (ab) regex
    } exec
    {
    } pop
    quit


    $ gsnd -dNOSAFER pc11are.ps
    GPL Ghostscript 9.52 (2020-03-19)
    Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    stack:
    [/OK [[[] []] [[(a) [0 0]] {0 1 (b) string-input}]]]
    :stack
    stack:
    [/Fail [[{(*+?) within} (not satisfied)] [[(a) [0 0]] {0 1 (b) string-input}]]] :stack
    stack:
    [/OK [[(*) []] {0 1 () string-input}]]
    :stack
    stack:
    [/OK [[{(a) literal} []] {0 1 (b) string-input}]]
    :stack
    stack:
    [/OK [[{(a) literal} []] {0 1 (b) string-input}]]
    :stack
    stack:
    [/OK [[{item} []] {0 1 () string-input}]]
    :stack
    stack:
    [/OK [[{(a) literal} [(*) []]] {0 2 () string-input}]]
    :stack
    stack:
    [/OK [[{(a) literal} []] [[(b) [0 1]] {0 2 () string-input}]]]
    :stack
    stack:
    [/OK [[{(a) literal many} []] {0 2 () string-input}]]
    :stack
    stack:
    [/OK [[{(a) literal (b) literal then} []] []]]
    :stack
    stack:
    [/OK [[{(a) literal (b) literal then (c) literal plus} []] []]]
    :stack
    OK
    [{(a) literal (b) literal then}]
    remainder:[]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jeffrey H. Coffield@21:1/5 to luser droog on Fri Nov 5 08:15:51 2021
    On 11/04/2021 07:04 PM, luser droog wrote:
    For the umpteenth time I've rewritten my parser combinators.
    With any luck, they finally have all the necessary bells and
    whistles added to be useful for other stuff.


    I spent a significant amount of time writing several versions of a
    generalized parser and never got all the "bells and whistles" that I
    wanted. Then I came across Antlr4 which is in Java and while I admire
    your intent to parse Postscript in Postscript, Antlr4 has tons of "bells
    and whistles" and can be applied to practically any language, although I
    don't see that anyone has posted a PostScript grammar yet. There about
    250 different languages currently available at :

    https://github.com/antlr/grammars-v4

    There is a grammar for something called RPN which sounds like it could
    be a starting point.

    https://github.com/antlr/grammars-v4/tree/master/rpn

    Antlr4 takes some time to learn and I'm not sure how much Java you would
    need to know (I program about 80% in Java now) but if it sounded like
    something you are interested in, I would be willing to help. There is a
    great book "The Definitive Antlr 4 Reference".

    I am not associated directly with Antlr4 but have submitted some patches.

    Jeff Coffield
    www.digitalsynergyinc.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to Jeffrey H. Coffield on Fri Nov 5 17:21:11 2021
    On Friday, November 5, 2021 at 10:15:55 AM UTC-5, Jeffrey H. Coffield wrote:
    On 11/04/2021 07:04 PM, luser droog wrote:
    For the umpteenth time I've rewritten my parser combinators.
    With any luck, they finally have all the necessary bells and
    whistles added to be useful for other stuff.
    I spent a significant amount of time writing several versions of a generalized parser and never got all the "bells and whistles" that I
    wanted. Then I came across Antlr4 which is in Java and while I admire
    your intent to parse Postscript in Postscript, Antlr4 has tons of "bells
    and whistles" and can be applied to practically any language, although I don't see that anyone has posted a PostScript grammar yet. There about
    250 different languages currently available at :

    https://github.com/antlr/grammars-v4

    There is a grammar for something called RPN which sounds like it could
    be a starting point.

    https://github.com/antlr/grammars-v4/tree/master/rpn

    Antlr4 takes some time to learn and I'm not sure how much Java you would
    need to know (I program about 80% in Java now) but if it sounded like something you are interested in, I would be willing to help. There is a
    great book "The Definitive Antlr 4 Reference".

    I am not associated directly with Antlr4 but have submitted some patches.

    Jeff Coffield
    www.digitalsynergyinc.com

    Thanks, I'll check those out. I've had Antlr suggested to me many times before but I've shied away thus far, probably from squeamish feelings about Java.
    (If only Sun NeWS had been successful, we might never have needed Java.)

    But I had to use Java for several school projects last year while finally completing my degree. It's not so scary anymore. I'm pretty happy with
    the composability of my functions the way they work now, but there are
    probably ways to optimize the execution or use a different model behind
    the scenes and maintain a similar interface. I'm also going to need a
    nice syntax for describing grammars so I can write a parser for /that/
    and have it translate to the combinators. Then you can have any color
    you want as long as it's black.

    Which bells and whistles were you still missing? My latest version adds
    error messages which were sorely lacking in previous ones. But I've also
    gotten a handle on how to do lazy execution. And bizarrely enough, I've
    been able to prototype in PostScript and then re-code in C, despite the dissimilarity between those (the C one kinda pretends to be Lisp, tho).

    For PostScript Level 1, the grammar is actually super simple.

    <Object> :: <Single>
    || { <Object> * }

    <Single> :: <Integer> | <Real> | <Name> | <String>

    Some fiddly stuff further down the tree. But it's just the one production
    for procedures that needs a CFG. The rest of the language is strictly
    Regular. Level 2 adds a few more kinds of Single, like binary name
    tokens and binary object encodings.

    So probably no one has written a grammar for PS for popular parsing
    libraries because it doesn't get you anywhere. You get just [Object
    Object Object ...]. It would get you like 2% closer to being able to
    write a translator or interpreter for PostScript. Whereas with almost
    any other structured language the grammar gets you roughly 50% there.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)