• Further struggles with parsers

    From luser droog@21:1/5 to All on Sun Sep 22 01:13:09 2019
    I turned back to my parser code and tried to add
    string handling to the simple lexer for PostScript
    tokens. And it works for simple strings, but not
    when I try to do something fancy like converting
    backslant escapes.

    The problem is that the outer handler code relies
    upon the matched string to be the correct length
    to advance in the input. If I convert \n to a
    single newline character, then the outer handler
    doesn't consume the final closing paren from the
    input.

    So, I wondered a lot and searched for some magical
    way to get the length information by composing
    the parser monads with a state monad. And I got
    some interesting links to read, but nothing really
    gelled. So I turned back to a paper by Graham
    Hutton, Higher Order Functions for Parsing.

    This paper describes preprocessing the input stream
    to add (row, col) decorations to each character.
    The 'satify(pred)' parser filters out the extra
    decoration, and thus any parsers built out of
    'satisfy' are not any more complicated to deal with
    extra noise.

    So, I've made some headway in rewriting everything
    to follow this new idea. My string-input function
    now has to produce a lazy list of [char [row col]]
    structures. Then there's a new function 'tok(p tag)'
    which decorates the result from the parser p with
    a structure like [[/tag (matched)] [len row col]].
    (The paper just has 'row col', but length was the
    thing I really needed.)

    And I'll spare you the code until it's more complete,
    but it will be more usefully commented than the previous
    incarnation (for my own sake). But here's a small
    example and output.

    $ tail -3 pc10.ps
    (abc) string-input (ab) str exec ps clear / =
    (abc) string-input (ab) str /AB tok exec ps first first ps
    quit

    $ gsnd -q -DNOSAFER pc10.ps
    stack:
    [[[97 98] [[99 [2 2 0]] {[() [3 3 0]] string-next}]]]

    stack:
    [[[[/AB (ab)] [2 0 0]] [[99 [2 2 0]] {[() [3 3 0]] string-next}]]]
    stack:
    [[/AB (ab)] [2 0 0]]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Thu Sep 26 04:07:34 2019
    On Sunday, September 22, 2019 at 3:13:10 AM UTC-5, luser droog wrote:

    So, I've made some headway in rewriting everything
    to follow this new idea. My string-input function
    now has to produce a lazy list of [char [row col]]
    structures. Then there's a new function 'tok(p tag)'
    which decorates the result from the parser p with
    a structure like [[/tag (matched)] [len row col]].
    (The paper just has 'row col', but length was the
    thing I really needed.)

    And I'll spare you the code until it's more complete,
    but it will be more usefully commented than the previous
    incarnation (for my own sake). But here's a small
    example and output.

    $ tail -3 pc10.ps
    (abc) string-input (ab) str exec ps clear / =
    (abc) string-input (ab) str /AB tok exec ps first first ps
    quit

    $ gsnd -q -DNOSAFER pc10.ps
    stack:
    [[[97 98] [[99 [2 2 0]] {[() [3 3 0]] string-next}]]]

    stack:
    [[[[/AB (ab)] [2 0 0]] [[99 [2 2 0]] {[() [3 3 0]] string-next}]]]
    stack:
    [[/AB (ab)] [2 0 0]]


    Further wrestling ensued when I realized that I had subtly slipped into
    using arrays when I had intended to use lists. So now it uses lists.
    And after much debugging, it finally produces the same results above.

    The next big issue is that none of the combinators are trying to be lazy.
    The input string becomes a lazy list, but sequences and especially
    alternates are prime unexploited candidates for returning an unevaluated continuation.

    I have some guidance in the working C version. Sadly, the important
    parts were hacked together until something appeared to work. So it's
    not a completely sound, thoroughly explained mathematical model so
    to speak.

    But the important feature is now present and working, viz. that
    character positioning information is supplied by the *-input
    continuation, and suppressed by satisfy() and any parents of satisfy
    except for tok() which can gather the position info via a weird monadic
    reach around.

    So, next up: laziness! <ironic exciting exhultation!>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to luser droog on Sat Sep 28 23:49:33 2019
    On Thursday, September 26, 2019 at 6:07:35 AM UTC-5, luser droog wrote:

    So, next up: laziness! <ironic exciting exhultation!>

    I just found some time to look at this again. I set about to compare
    my new PostScript version against the C version which shares the
    same parent. I found just now that 'chain' or 'bind' -- the function
    that links up parsers -- doesn't actually need to worry about laziness,
    it calls a few other functions, and each of them has to worry about
    laziness.

    So I checked out the subfunctions 'fmap' and 'fold' which were recently written, and they already had a clause checking for a suspension given
    as data. I commented out the code in the clause, about to write something
    to delay evaluation and return a new suspension. And then I actually
    read the code I had just commented out, and it already did all that.

    Consequently I don't really know the state of the code after devoting
    scattered minutes dispersed over several weeks on it.

    But, for posterity here it is. There's a comment block of the "API" such as it is.

    $ cat pc10.ps
    errordict/rangecheck{ps countexecstack array execstack == quit}put (struct2.ps)run { % comments below
    nl (\n) @first
    eof { % :a
    /z [ -1 a second ] def
    z one
    }
    string-input (s/string) { [ [s [0 0 0]] /string-next cvx ] cvx } @func string-next (a/array) { a first length 0 eq { eof }{
    /z [ a first first a second ] def
    /r [ a first rest a second new-position ] def
    [ z [ r /string-next cvx ] cvx ] } ifelse } @func
    new-position { % :z
    spill 3 2 roll 1 add 3 1 roll exch
    z first nl eq { pop 1 add 0 }{ 1 add } ifelse exch 3 aa }
    filename-input { (r) file file-input }
    file-input (f/file) { [ [f [0 0 0]] /file-next cvx ] cvx } @func
    file-next (a/array) { a first read not { eof }{
    /z [ 3 2 roll a second ] def
    /r [ a first a second new-position ] def
    [ z [ r /file-next cvx ] cvx ] } ifelse } @func

    nop {}
    zero { { pop [] } }
    result {v} { { /v exch cons one } ll } @func
    item { { @ dup first first -1 ne { xs-x result }{ zero } ifelse exec } } chain {m f} { { /m exec { xs-x /f exec } fmap join } ll } @func
    join { {cat} [] fold nop }
    append { 1 index second [] eq { 1 exch put }{ exch second exch append } ifelse }
    cat { 1 index 3 1 roll append }

    satisfy {pred} { item { first dup /pred exec { one result }{ pop zero } ifelse exec } ll
    chain } @func
    lit {x} { { /x eq } ll satisfy } @func
    range {a b} { { dup /a ge exch /b le and } ll satisfy } @func
    char { first lit }
    str { {lit} map {seq} reduce }

    seq { { /_ load exch cat result exec } chain { <<exch/_ exch>>begin }{ end } wrap chain }
    xthen { seq { second result exec } chain }
    thenx { seq { first result exec } chain }

    into {p v q} { /p load { <<exch/v exch>>begin /q exec end } ll chain } @func tok {t} { { to-string
    /t exch cons
    1 index first second first inp first second spill 3 2 roll pop 3 aa
    cons exch cons one } ll chain
    { @ <</inp 2 index>>begin }{ end } wrap } @func

    alt {p q} { { dup /p exec exch /q exec compose } ll } @func
    anyof { {lit} any }
    any { map {alt} reduce }
    noneof { anyof none }
    none {p} { { dup /p exec [] ne { zero }{ item } ifelse exec } ll } @func

    maybe { [] result alt }
    many { {{-777 exec}exec} ll dup first 3 1 roll seq maybe % x* = xx*
    2 copy 0 exch put exch pop executeonly }
    some { dup many seq }
    using { { result exec } compose chain }
    trim {p} { { /p exec dup length 0 gt { 0 1 getinterval } if } ll } @func

    to-string { dup type /integertype eq { [] cons } if array-from-list string-from-array }
    wrap { 3 2 roll exch compose compose }
    ll { {load-if-literal-name} deep-map }
    deep-map { 1 index type /arraytype ne { exec }{
    1 index xcheck 3 1 roll [ 3 1 roll /deep-map cvx 2 aa cvx forall ] exch {cvx} if} ifelse }
    list-from-array { dup length 0 gt { [ exch dup 0 get exch rest list-from-array ] } if }
    array-from-list { dup length 0 ne { [ exch all-list-elements ] } if } all-list-elements { dup first exch next dup length 0 eq {pop}{ all-list-elements } ifelse}
    string-from-array { dup length string
    0 1 2 index length 1 sub {3 copy exch pop get 3 copy put pop pop} for exch pop }

    fold {
    %(fold)= ps
    2 index xcheck {
    /@ cvx 3 1 roll /fold cvx 5 aa cvx
    }{
    2 index length 0 eq { 3 1 roll pop pop }{
    2 index second [] eq { pop pop first }{
    3 2 roll % f z l
    spill % f z l_0 l_1
    3 index exch 5 3 roll % l0 f l1 f z
    fold % l0 f l1'
    exch exec
    } ifelse
    } ifelse
    } ifelse
    }
    fmap {
    %(fmap)= ps
    1 index xcheck {
    /@ cvx exch /fmap cvx 4 aa cvx
    }{
    1 index length 0 eq { pop }{
    1 index first 1 index exec 3 1 roll exch second exch fmap cons
    } ifelse
    } ifelse
    }
    take {x n}{ n 0 eq { [] }{ [ /x load x-xs n 1 sub take ] } ifelse } @func drop {n}{ n 0 gt { next n 1 sub drop } if } @func
    x-xs { @ dup first exch next } % car and cdr or cdr and car
    xs-x { @ dup next exch first }
    next { dup second xcheck { dup 1 {@} update } if second } % force and update cdr
    update { 3 copy pop get exch exec put } % a i p a[i]=p(a[i])
    /@ { { dup xcheck { exec }{ exit } ifelse } loop } % force

    cons { 2 aa } one { [] cons } second { 1 get }
    spill { {} forall } aa { array astore }
    ps { (stack:)= pstack } pc { ps clear } pq { ps quit }
    } pairs-begin


    {
    (abc) {} map string-from-array pc
    (abc\ndef) {}map list-from-array ps clear / =
    (abc\ndef) string-input ps @ array-from-list ps clear / =
    (abc\ndef) string-input @ ps array-from-list {first} map ps clear / =
    (abc\nde) string-input ps @ dup 6 drop ps pop array-from-list ps
    {first} map ps list-from-array ps clear / =

    } pop
    {
    (abc\nde) string-input item exec ps first ps first ps clear / =
    () string-input ps item exec ps clear / =
    } pop

    {
    (abc) string-input {== ps true} satisfy exec ps first ps first ps clear / = (abc) string-input {== ps false} satisfy exec ps clear / =
    (abc) string-input (a) char exec ps clear / =
    (abc) string-input (ab) str exec ps first ps first ps clear / =
    (abc) string-input (ab) str /AB tok exec ps first first ps clear / =
    (bbb) string-input (abc) anyof exec ps first ps spill ps clear / =
    } exec %pop
    {currentfile file-input (ab) str /AB tok exec ps}exec
    abc
    quit


    % (string) string-input {...}
    %produce lazy list of [char [row col]] records

    % (filename) filename-input {...}
    % -file- file-input {...}

    % *-input parser [[result remainder]*]

    % - zero parser
    % parser that fails

    % value result parser
    % parser that returns [[value remainder]]

    % - item parser
    % parser that matches non-empty input

    % m f chain parser
    % map results from m through f and flatten

    % [[ [...]* ]*] join [[...]*]

    % pred satisfy parser
    % strip [row col] and validate char with pred

    % x lit parser
    % match literal x in input

    % (c) char parser
    % match char from single-char string

    % (string) str parser
    % match string in input

    % p q seq parser
    % call p and q in sequence, returning sequence of results

    % p v q into parser
    % call p and q in sequence, providing to q the result of p defined as v

    % p tag tok parser
    % call p, returning [[tag (match)] [len row col]]

    % x n take/drop x'
    % take or drop n elements from front of list x, forcing evaluation

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)