• PS Level 1 grammar

    From luser droog@21:1/5 to All on Sun Nov 7 16:40:42 2021
    Here's a rough draft of the grammar for a PS tokenizer using my new functions.
    This is almost exactly the same code as the previous version, pc9token.ps,
    with just a few function names changed and no handlers yet to transform
    the data. My previous code didn't do the recursion for procedures, but
    this one does, or should, assuming it works.

    I'm building recursive parsers by starting with a "forwarding" proc
    /myparser {-777 exec} def
    which can be composed with other parsers and filled
    in later by doing
    //myparser 0 //composed-parser put
    This is the simplest way I've found so far after struggling with more complicated ways.
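
    A minimal sketch of that forwarding trick on its own, outside the parser library. The names `countdown` and `countdown-body` are made up for illustration and are not from pc11a.ps, and it assumes an interpreter (such as Ghostscript) that leaves procedure bodies writable.

    ```postscript
    % Forwarding proc: slot 0 holds a placeholder (-777) until the real body exists.
    /countdown { -777 exec } def

    % The real body refers back to the forwarder, so it can recurse
    % even though the forwarder has not been filled in yet.
    /countdown-body {
        dup =                                 % print the current number
        dup 0 gt { 1 sub //countdown exec }   % recurse through the forwarder
                 { pop } ifelse
    } def

    % Fill in the forwarder: replace the -777 with the real procedure.
    //countdown 0 //countdown-body put

    3 countdown    % prints 3, 2, 1, 0 on separate lines
    ```

    Because `//countdown` is resolved when the body is scanned, the body holds the forwarder object itself, so the later `put` is visible everywhere the forwarder was already composed in.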

    It's missing some stuff like e notation, hex strings, ASCII85.
    pc11atoken.ps:

    (pc11a.ps)run

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /number //digit some def
    /opt-number //digit many def
    /rad-digits //digit //alpha plus some def
    /rad-integer //digit //digit maybe then (#) char then //rad-digits then def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def
    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char alt
    (r) char alt
    (t) char alt
    (b) char alt
    (f) char alt
    //octal //octal maybe then //octal maybe then alt
    then def
    /substring (\() char //ps-char many then (\)) char then def
    //ps-char 0 //escape
    //substring alt
    (()) noneof alt put
    /ps-string (\() char //ps-char many then (\)) char then def

    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object xthen def
    //object 0 //rad-integer
    //real alt
    //integer alt
    //name alt
    (/) char //name then alt
    (/) char (/) char then //name then alt
    //ps-string alt
    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter alt put

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Mon Nov 8 09:49:29 2021
    On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:
    Here's a rough draft of the grammar for a PS tokenizer using my new functions.
    [snip]

    pc11atoken.ps:

    (pc11a.ps)run

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /number //digit some def
    /opt-number //digit many def
    /rad-digits //digit //alpha plus some def
    /rad-integer //digit //digit maybe then (#) char then //rad-digits then def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def
    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char alt
    (r) char alt
    (t) char alt
    (b) char alt
    (f) char alt
    //octal //octal maybe then //octal maybe then alt
    then def
    /substring (\() char //ps-char many then (\)) char then def
    //ps-char 0 //escape
    //substring alt
    (()) noneof alt put
    /ps-string (\() char //ps-char many then (\)) char then def


    /hex-char //digit (a)(f) range (A)(F) range alt alt def
    /non-hex-char //hex-char none def
    /hex-string (<) char //non-hex-char many //hex-char xthen many then (>) char then def


    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object xthen def
    //object 0 //rad-integer
    //real alt
    //integer alt
    //name alt
    (/) char //name then alt
    (/) char (/) char then //name then alt
    //ps-string alt

    //hex-string alt


    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter alt put

    Adding hex strings needed a new combinator `none` that I'd been able to avoid until now. In earlier versions it had been a factor of `noneof` which matches the inverse of a set of characters.

    pc9.ps:
    noneof { anyof none }
    none {p} { { dup /p exec [] ne { zero }{ item } ifelse exec } ll } @func

    But I found a simpler way to write `anyof` and `noneof` since this version builds everything on top of `pred satisfy`. So they can both use a factor `within` that checks a character against a string.

    pc11.ps:
    anyof { {within} curry satisfy }
    noneof { {within not} curry satisfy }
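
    A rough standalone sketch of what a `within` factor could look like, assuming characters are passed as 1-character strings and the set is a string. This is a guess at the behaviour built on the standard `search` operator, not the definition from pc11.ps.

    ```postscript
    % (c) (set) within -> bool
    % Assumed behaviour: true if the 1-character string occurs in the set string.
    /within {
        exch search                 % set (c) search
        { pop pop pop true }        % found: drop post, match, pre
        { pop false }               % not found: drop the original string
        ifelse
    } def

    (7) (01234567) within ==        % prints true
    (8) (01234567) within ==        % prints false

    % The predicates handed to `satisfy` would then be shaped like these:
    /is-octal  { (01234567) within } def       % anyof-style
    /not-octal { (01234567) within not } def   % noneof-style
    (8) not-octal ==                % prints true
    ```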

    But to do the inverse of a parser built out of 3 ranges, I really need the
    more general `none` now.

    So this function takes a parser as a named parameter then constructs
    a new procedure with this parameter substituted inside (like a primitive 'lambda') and yields this procedure as its result.

    none{ p }{
    { dup /p exec +is-ok { pop [ /p ( succeeded) ] fail }{ pop item } ifelse exec } ll
    } @func

    It also just includes the parameter parser as part of the error message,
    which could result in a very unhelpful message. But I think it's the best
    that can be done here with the information available. It might be nicer
    if `none` had access to a higher-level description of its parameter.
    But I'm not sure how to orchestrate that right now.

  • From luser droog@21:1/5 to luser droog on Tue Nov 9 08:56:45 2021
    On Monday, November 8, 2021 at 11:49:30 AM UTC-6, luser droog wrote:
    On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:
    Here's a rough draft of the grammar for a PS tokenizer using my new functions.

    A little more fleshed out, formatted, and slightly tested. I've been having
    to futz around with the innards of several parsers like `then` and `many`
    to get `xthen` and `thenx` to work reliably. I had been using `append` to
    combine the results of two sequential parsers, and `append` works like
    a Lisp list append; i.e. it scans to the end of the cdr chain and then
    replaces the last null with the new element. That all works for the most part.

    It fails when you try to do fancy stuff like `xthen` and `thenx`. These
    are sequencing combinators like `then` which runs one parser and
    then the other on the remainder from the first. But `xthen` has the
    extra trick of discarding the result of the first parser, and `thenx`
    discards the result of the second parser.

    These are great for discarding stuff during the parse. Like when
    processing escape codes, some simple ones like (\\) (\() (\)) are
    completely handled by simply discarding the first slash. And all
    the escape handling is simplified by just doing that wholesale
    in all cases.

    But if you're appending results into a long list, then you've lost
    the <first> vs. <second> structure! The obvious solution for that
    was to replace the calls to `append` with calls to `cons` which
    just groups the two parts into a 2-element array: easy to grab
    the two pieces out later.
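
    A small sketch of why the pair structure helps, assuming Lisp-style lists are nested 2-element arrays ending in `[]`. The definitions of `cons`, `first` and `second` here are simplified stand-ins written for this example; the real ones in the library may differ.

    ```postscript
    /cons   { 2 array astore } def   % a b -> [a b]
    /first  { 0 get } def            % [a b] -> a
    /second { 1 get } def            % [a b] -> b

    % Hypothetical results of two sequential parsers, each a one-item list.
    /res1 [ (\\) [] ] def            % e.g. the backslash of an escape
    /res2 [ (n)  [] ] def            % e.g. the character after it

    res1 res2 cons                   % pair: [ res1 res2 ]
    dup first  ==                    % the part `thenx` would keep
        second ==                    % the part `xthen` would keep
    ```

    With a flat appended list there is no way to tell where the first result ends, which is why the pair form is easier to take apart afterwards.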

    Doing that caused a bug that took a while to track down. It caused
    a problem in the handlers, all the procedures composed into the
    parsers with `using`. In all of them I was calling a function called
    `flatten` that only knew how to deal with 1-D Lisp lists. So it went
    wild with a weird non-list cons structure.

    So now it all works by using a more powerful function called
    `unwrap` which can tease apart whatever weird cons tangle
    is thrown at it. But you can't see any of this here; it's all inside
    the `fix` function from pc11a.ps.

    With this fix, I only just now got hex strings to appear to work,
    discarding any non-hex characters found along the way. I still need
    to interpret the hex characters and do some handling for procedures.
    And e notation.

    Then a further challenge if I actually want to emulate the
    `token` operator. I'll need to reliably recreate the remainder substring.
    This string may or may not be reliably tucked into the lazy
    remainder list still in string form. So some fiddly business may
    be needed to reconstruct this string.
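
    For reference, a short sketch of the target behaviour and of one way to rebuild the remainder once its length is known. The helper name `tail` is hypothetical; only `token` and `getinterval` are standard operators.

    ```postscript
    % The built-in operator returns: remainder token true
    (name[delim) token            % stack: ([delim) name true
    pop pop pop                   % discard the results for this demo

    % Hypothetical helper: the last n characters of a string.
    /tail {                       % string n -> substring
        1 index length            % string n len
        1 index sub               % string n start
        exch getinterval          % substring
    } def

    (name[delim) 6 tail ==        % prints ([delim)
    ```

    So if the parser can report how many characters are still unread, the remainder can be cut from the original string rather than reassembled from the lazy list.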

    %errordict/typecheck{pq}put
    (pc11a.ps)run <<
    /interpret-octal { 0 exch { first 48 sub exch 8 mul add } forall }
    /to-char { 1 string dup 0 4 3 roll put }
    >> begin

    /delimiters ( \t\n()/%[]<>{}) def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def

    /rad-digit //digit //alpha alt def
    /rad-integer //digit //digit maybe then (#) char then //rad-digit some then def
    /number //digit some def
    /opt-number //digit many def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then
    (.) char //number then alt then def

    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char { pop (\n) one } using alt
    (r) char { pop (\r) one } using alt
    (t) char { pop (\t) one } using alt
    (b) char { pop (\b) one } using alt
    (f) char { pop (\f) one } using alt
    //octal //octal maybe then //octal maybe then
    { fix interpret-octal to-char one } using alt
    xthen def
    /ps-string (\() char //ps-char executeonly many then (\)) char then def
    //ps-char 0 //escape
    //ps-string alt
    (()) noneof alt put

    /hex-char //digit (a)(f) range (A)(F) range alt alt def
    /non-hex-char //hex-char (>) char alt none def
    /hex-string (<) char
    //non-hex-char many //hex-char xthen many then //non-hex-char many thenx
    (>) char then def

    /spaces ( \t\n) anyof many def
    /object {-777 exec} def
    /ps-token //spaces //object executeonly xthen def

    //object 0 //rad-integer { fix to-string cvi } using
    //real { fix to-string cvr } using alt
    //integer { fix to-string cvi } using alt
    //name { fix to-string cvn cvx } using alt
    (/) char //name then { fix to-string rest cvn cvlit } using alt
    (/) char (/) char then //name then { fix to-string rest rest cvn load } using alt
    //ps-string { fix to-string 1 1 index length 2 sub getinterval } using alt
    //hex-string { fix 1 1 index length 2 sub getinterval } using alt
    ({) char //ps-token many then //spaces (}) char xthen then alt
    //delimiter { fix to-string cvn cvx } using alt
    put

    /mytoken {
    dup length 0 gt {
    0 0 3 2 roll string-input //ps-token exec
    }{ pop false } ifelse
    } def

    {
    0 0 (47) string-input //integer exec pc
    0 0 (47) string-input //number exec pc
    0 0 (8#117) string-input
    //digit //digit maybe then (#) char then //rad-digit some then exec pc
    %quit
    0 0 (8#117) string-input //rad-integer exec pc
    0 0 (1.17) string-input //real exec pc
    } pop

    (8#117) mytoken pc
    (47) mytoken pc
    (string) mytoken pc
    ([stuff) mytoken pc
    (/litname) mytoken pc
    (42.42) mytoken pc
    ((a\\117 \\\\string\\n)) mytoken ps second first print clear
    /thing 12 def
    (//thing) mytoken pc
    (<abc defg>) mytoken pc

    quit

    $ gsnd -dNOSAFER pc11atoken.ps
    GPL Ghostscript 9.52 (2020-03-19)
    Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    stack:
    [/OK [79 []]]
    :stack
    stack:
    [/OK [47 {0 2 () string-input}]]
    :stack
    stack:
    [/OK [string []]]
    :stack
    stack:
    [/OK [[ {0 1 (stuff) string-input}]]
    :stack
    stack:
    [/OK [/litname []]]
    :stack
    stack:
    [/OK [42.42 []]]
    :stack
    stack:
    [/OK [(aO \\string\n) {0 18 () string-input}]]
    :stack
    aO \string
    stack:
    [/OK [12 []]]
    :stack
    stack:
    [/OK [[(a) (b) (c) (d) (e) (f)] {0 10 () string-input}]]
    :stack
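
    The octal escape in the string test can be checked by hand with a standalone sketch. Here `octal-value` is an adaptation that walks a plain string (so `forall` yields character codes directly), whereas `interpret-octal` in the listing walks parser result items with `first`; `to-char` is copied from the listing.

    ```postscript
    % Accumulate base-8 digits: value = value*8 + (code - 48)
    /octal-value { 0 exch { 48 sub exch 8 mul add } forall } def
    /to-char { 1 string dup 0 4 3 roll put } def

    (117) octal-value dup =     % prints 79   (1*64 + 1*8 + 7)
    to-char =                   % prints O
    ```

    That matches both the `8#117` result of 79 and the `O` that replaces `\117` in the string output.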

  • From luser droog@21:1/5 to luser droog on Thu Nov 11 15:02:00 2021
    On Tuesday, November 9, 2021 at 10:56:46 AM UTC-6, luser droog wrote:
    On Monday, November 8, 2021 at 11:49:30 AM UTC-6, luser droog wrote:
    On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:
    Here's a rough draft of the grammar for a PS tokenizer using my new functions.
    A little more fleshed out, formatted, and slightly tested.
    [snip]
    Still need to interpret
    the hex characters and do some handling for procedures.
    And e notation.

    Then a further challenge if I actually want to emulate the
    `token` operator. I'll need to reliably recreate the remainder substring.
    This string may or may not be reliably tucked into the lazy
    remainder list still in string form. So some fiddly business may
    be needed to reconstruct this string.


    It is done. All that stuff.

    https://github.com/luser-dr00g/pcomb/blob/f2d20f01a4a4a0fb28e184143f66d0d0f0584bdb/ps/struct2.ps
    https://github.com/luser-dr00g/pcomb/blob/f2d20f01a4a4a0fb28e184143f66d0d0f0584bdb/ps/pc11a.ps
    https://github.com/luser-dr00g/pcomb/blob/f2d20f01a4a4a0fb28e184143f66d0d0f0584bdb/ps/pc11atoken.ps

    $ cat pc11atoken.ps
    %errordict/typecheck{pq}put
    (pc11a.ps)run <<
    /middle { 1 1 index length 2 sub getinterval }
    /interpret-octal { 0 exch { first 48 sub exch 8 mul add } forall }
    /interpret-hex {
    { dup (9) le { first 48 sub }{ first 55 sub dup 15 gt { 32 sub } if } ifelse } map
    dup length 2 mod 1 eq { [ 0 ] compose } if
    [ exch 2 { aload pop exch 16 mul add to-char } fortuple ]
    to-string }
    /interpret-ascii85 {
    { dup (z) eq { pop (!)(!)(!)(!)(!) }{ dup ( \t\n) within { pop } if } ifelse } map
    [ 1 index length 5 mod { (u) } repeat ] compose
    [ exch 5 {
    0 exch { first 33 sub exch 85 mul add } forall
    4 { dup 256 mod exch 256 idiv } repeat pop 4 aa reverse
    { to-char } forall
    } fortuple ]
    to-string }
    >> begin

    /delimiters ( \t\n()/%[]<>{}) def
    /initials ([]) anyof def
    /delimiter delimiters anyof def
    /octal (0)(7) range def
    /digit (0)(9) range def
    /alpha (a)(z) range (A)(Z) range alt def
    /regular delimiters noneof def
    /spaces ( \t\n) anyof many def

    /rad-digit //digit //alpha alt def
    /rad-integer //digit //digit maybe then (#) char then //rad-digit some then def
    /number //digit some def
    /opt-number //digit many def
    /eE (eE) anyof (+-) anyof maybe then //number then def
    /integer (+-) anyof maybe //number then def
    /real (+-) anyof maybe
    //number (.) char then //opt-number then //eE maybe then
    (.) char //number then //eE maybe then alt
    //number //eE then alt
    then def

    /name //regular some def

    /ps-char {-777 exec} def
    /escape (\\) char
    (\\) char
    (\() char alt
    (\)) char alt
    (n) char { pop (\n) one } using alt
    (r) char { pop (\r) one } using alt
    (t) char { pop (\t) one } using alt
    (b) char { pop (\b) one } using alt
    (f) char { pop (\f) one } using alt
    //octal //octal maybe then //octal maybe then
    { fix interpret-octal to-char one } using alt
    xthen def
    /ps-string (\() char //ps-char executeonly many then (\)) char then def
    //ps-char 0 //escape
    //ps-string alt
    (()) noneof alt put

    /hex-char //digit (a)(f) range (A)(F) range alt alt def
    /hex-string (<) char
    //spaces //hex-char xthen many then //spaces thenx
    (>) char then def

    /ascii85-char ( )(z) range (\t\n) anyof alt def
    /ascii85-string (<~) str
    //spaces //ascii85-char xthen many xthen //spaces thenx
    (~>) str thenx def

    /object {-777 exec} def
    /ps-token //spaces //object xthen def

    //object 0 //rad-integer { fix to-string cvi } using
    //real { fix to-string cvr } using alt
    //integer { fix to-string cvi } using alt
    (/) char (/) char then //name then { fix to-string rest rest cvn load } using alt
    (/) char //name maybe then { fix to-string rest cvn cvlit } using alt
    //name { fix to-string cvn cvx } using alt
    //ps-string { fix to-string middle } using alt
    //hex-string { fix middle interpret-hex } using alt
    //ascii85-string { fix interpret-ascii85 } using alt
    ({) char
    //ps-token many executeonly xthen
    //spaces %{(s)= ps}using
    (}) char %{(b)= ps}using
    xthen
    thenx { first cvx } using alt
    //initials (<<) str alt (>>) str alt { fix to-string cvn cvx } using alt
    put

    /remainder-length {
    dup zero eq { pop 0 }{
    dup type /arraytype ne { what? }{
    dup xcheck {
    2 get length
    }{
    second remainder-length 1 add
    } ifelse
    } ifelse } ifelse
    } def

    /mytoken {
    dup length 0 gt {
    dup 0 0 3 2 roll string-input //ps-token exec +is-ok { % s result=ok
    second aload pop % s res rem
    %dup zero eq { 3 -1 roll pop pop () exch true }{
    %dup type /arraytype eq 1 index xcheck and { 3 -1 roll pop 2 get exch true }{
    3 -1 roll exch remainder-length 1 index length 1 index sub exch getinterval
    exch true
    %} ifelse } ifelse
    }{ % s result=not-ok
    pop pop false
    } ifelse
    }{ pop false } ifelse
    } def

    /test-mytoken {
    /s exch def
    s token %1 index type =
    s mytoken %1 index type =
    } def

    {
    0 0 (47) string-input //integer exec pc
    0 0 (47) string-input //number exec pc
    0 0 (8#117) string-input
    //digit //digit maybe then (#) char then //rad-digit some then exec pc
    %quit
    0 0 (8#117) string-input //rad-integer exec pc
    0 0 (1.17) string-input //real exec pc
    } pop

    (8#117) test-mytoken pc
    (47) test-mytoken pc
    (string) test-mytoken pc
    ([stuff) test-mytoken pc
    (/litname) test-mytoken pc
    (42.42) test-mytoken pc
    ((a\\117 \\\\string\\n)) test-mytoken pc
    /thing 12 def
    (//thing) test-mytoken pc
    (<abc def >) test-mytoken pc
    (name[delim) test-mytoken pc
    ({a proc}) test-mytoken pc
    (/(str)) test-mytoken pc
    (2e5) test-mytoken pc
    (<~ 9jq o^~>) test-mytoken pc

    quit

    $ gsnd -dNOSAFER pc11atoken.ps
    GPL Ghostscript 9.52 (2020-03-19)
    Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    stack:
    true
    79
    ()
    true
    79
    ()
    :stack
    stack:
    true
    47
    ()
    true
    47
    ()
    :stack
    stack:
    true
    string
    ()
    true
    string
    ()
    :stack
    stack:
    true
    [
    (stuff)
    true
    [
    (stuff)
    :stack
    stack:
    true
    /litname
    ()
    true
    /litname
    ()
    :stack
    stack:
    true
    42.42
    ()
    true
    42.42
    ()
    :stack
    stack:
    true
    (aO \\string\n)
    ()
    true
    (aO \\string\n)
    ()
    :stack
    stack:
    true
    12
    ()
    true
    12
    ()
    :stack
    stack:
    true
    (\253\315\357)
    ()
    true
    (\253\315\357)
    ()
    :stack
    stack:
    true
    name
    ([delim)
    true
    name
    ([delim)
    :stack
    stack:
    true
    {a proc}
    ()
    true
    {a proc}
    ()
    :stack
    stack:
    true
    /
    (\(str\))
    true
    /
    (\(str\))
    :stack
    stack:
    true
    200000.0
    ()
    true
    200000.0
    ()
    :stack
    stack:
    true
    (Man )
    ()
    true
    (Man )
    ()
    :stack
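
    The ASCII85 result can be cross-checked with a standalone sketch that decodes one full 5-character group. The name `decode85` is made up for this example and it works on a plain string rather than on parser results, so it is not the `interpret-ascii85` from the listing. It assumes the 32-bit group value fits in the interpreter's integer type, which holds for this input.

    ```postscript
    % (5 chars) decode85 -> (4 bytes)
    /decode85 {
        0 exch { 33 sub exch 85 mul add } forall   % base-85 value of the group
        4 string exch                              % str value
        3 -1 0 {                                   % str value i
            2 index exch 2 index 255 and put       % str[i] = low byte of value
            -8 bitshift                            % drop that byte
        } for
        pop                                        % leave just the string
    } def

    (9jqo^) decode85 ==      % prints (Man )
    ```

    The group value here is 1298230816, i.e. the bytes 77 97 110 32, which agrees with the `(Man )` shown for `<~ 9jq o^~>` once the embedded spaces are skipped.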
