• Discard extraneous structure returned by parser combinators?

    From luserdroog@21:1/5 to All on Sat Nov 13 20:54:08 2021
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's
    my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this:

    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the important array structure. `many` and `maybe` seem to be the chief
    culprits, but then their results are propagated back by `alt`s and
    `then`s all the way back to the top.

    Do I need to make some kind of out-of-band signal for these "zeros"
    that I can filter out later? The obvious problem here is that the array
    type is being used for too many things. But there's a paucity of
    types in PostScript, sigh. For the JSON application, I have nametype
    objects available that don't have a JSON corollary.

    Do I need to rewrite all the combinators to filter out noise values at
    every turn?.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to luserdroog on Sun Nov 14 16:01:14 2021
    luserdroog <mijoryx@yahoo.com> writes:

    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's
    my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this:

    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the important array structure.

    So you want

    [[3 [[4 [[5 []]]]]]]

    ?

    `many` and `maybe` seem to be the chief
    culprits, but then their results are propagated back by `alt`s and
    `then`s all the way back to the top.

    Do I need to make some kind of out-of-band signal for these "zeros"
    that I can filter out later? The obvious problem here is that the array
    type is being used for too many things. But there's a paucity of
    types in PostScript, sigh. For the JSON application, I have nametype
    objects available that don't have a JSON corollary.

    Do I need to rewrite all the combinators to filter out noise values at
    every turn?.

    It's odd to call something that you are returning (presuambly) as
    noise. Are you using lists as a sort of Maybe monad with [] as Nothing?

    I think you'd have to show the code to get anything more concrete as a
    reply.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Paul Rubin@21:1/5 to luserdroog on Sun Nov 14 15:11:34 2021
    luserdroog <mijoryx@yahoo.com> writes:
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    I'd expect 'many' to return a list, an empty list in the case of zero
    matches. What is the extra noise? Your PostScript example is
    confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
    something like [3, [4, [5]]], using square brackets to denote lists.

    I didn't know parser combinators were even a thing in PostScript: or are
    you trying to implement them? You could look at the Parsec paper to see
    how they traditionally worked:

    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to Ben Bacarisse on Sun Nov 14 19:27:58 2021
    On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's
    my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this:

    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the important array structure.
    So you want

    [[3 [[4 [[5 []]]]]]]

    ?

    I guess that's the big problem here. I'm not sure what I want. I keep having
    to add extra code to clean up and delete the extra stuff. Ultimately the
    result should be

    [ 3 [ 4 [ 5 ] ] ]

    The parser for arrays looks for the left bracket, then ...

    /Jarray //begin-array
    //value executeonly xthen
    //value-separator //value executeonly xthen many then %{ps flatten ps} using
    maybe
    //end-array thenx
    { %filter-zeros first %ps
    } using def

    The `executeonly` are in there to prevent infinite recursion if the expanded code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

    Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

    (a) char (b) char then

    would -- if matched by the input -- return

    [ (a) [ (b) [] ] ]

    which I could then easily massage into

    [ (a) (b) ]

    But that led me into problems when I wanted to use the combinators
    `xthen` and `thenx` which discard one of the two pieces. If the results
    are just appended together in a list, then I've lost the information to
    peel them back apart. So I changed `then` to just (cons) the pieces
    together, and now `xthen` and `thenx` have an easy job.

    And extra noise values pop up if I use combinators like `maybe`
    and `many` which might succeed with zero matches. So, ...

    `many` and `maybe` seem to be the chief
    culprits, but then their results are propagated back by `alt`s and
    `then`s all the way back to the top.

    Do I need to make some kind of out-of-band signal for these "zeros"
    that I can filter out later? The obvious problem here is that the array type is being used for too many things. But there's a paucity of
    types in PostScript, sigh. For the JSON application, I have nametype objects available that don't have a JSON corollary.

    Do I need to rewrite all the combinators to filter out noise values at every turn?.
    It's odd to call something that you are returning (presuambly) as
    noise. Are you using lists as a sort of Maybe monad with [] as Nothing?


    Yes, I think that's what I'm doing, clumsily.

    I think you'd have to show the code to get anything more concrete as a
    reply.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to Paul Rubin on Sun Nov 14 19:29:54 2021
    On Sunday, November 14, 2021 at 5:11:37 PM UTC-6, Paul Rubin wrote:
    luserdroog <mij...@yahoo.com> writes:
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.
    I'd expect 'many' to return a list, an empty list in the case of zero matches. What is the extra noise? Your PostScript example is
    confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
    something like [3, [4, [5]]], using square brackets to denote lists.

    I didn't know parser combinators were even a thing in PostScript: or are
    you trying to implement them? You could look at the Parsec paper to see
    how they traditionally worked:

    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf

    It's something I've been trying to do in PostScript for a while now.
    A lot of the saga is detailed in comp.lang.postscript.
    Code is at github.com/luser-dr00g/pcomb/ps

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ben Bacarisse@21:1/5 to luserdroog on Mon Nov 15 10:45:10 2021
    luserdroog <mijoryx@yahoo.com> writes:

    On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's
    my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this: >> >
    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the
    important array structure.
    So you want

    [[3 [[4 [[5 []]]]]]]

    ?

    I guess that's the big problem here. I'm not sure what I want. I keep having to add extra code to clean up and delete the extra stuff. Ultimately the result should be

    [ 3 [ 4 [ 5 ] ] ]

    The parser for arrays looks for the left bracket, then ...

    /Jarray //begin-array
    //value executeonly xthen
    //value-separator //value executeonly xthen many then %{ps flatten ps} using
    maybe
    //end-array thenx
    { %filter-zeros first %ps
    } using def

    The `executeonly` are in there to prevent infinite recursion if the expanded code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

    Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

    (a) char (b) char then

    would -- if matched by the input -- return

    [ (a) [ (b) [] ] ]

    which I could then easily massage into

    [ (a) (b) ]

    I think you need to pin down what you want. You made a remark about
    using two-element arrays as cons cells. In that case, parsing (a) and
    (b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.

    --
    Ben.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to Ben Bacarisse on Mon Nov 15 14:13:30 2021
    On Monday, November 15, 2021 at 4:45:12 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's
    my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this:

    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the >> > important array structure.
    So you want

    [[3 [[4 [[5 []]]]]]]

    ?

    I guess that's the big problem here. I'm not sure what I want. I keep having
    to add extra code to clean up and delete the extra stuff. Ultimately the result should be

    [ 3 [ 4 [ 5 ] ] ]

    The parser for arrays looks for the left bracket, then ...

    /Jarray //begin-array
    //value executeonly xthen
    //value-separator //value executeonly xthen many then %{ps flatten ps} using
    maybe
    //end-array thenx
    { %filter-zeros first %ps
    } using def

    The `executeonly` are in there to prevent infinite recursion if the expanded
    code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

    Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

    (a) char (b) char then

    would -- if matched by the input -- return

    [ (a) [ (b) [] ] ]

    which I could then easily massage into

    [ (a) (b) ]
    I think you need to pin down what you want. You made a remark about
    using two-element arrays as cons cells. In that case, parsing (a) and
    (b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.


    Yes, I think I may have presented an X/Y problem or started the story
    from the middle. In all of my test cases for this version (regexs,
    Postscript scanner, JSON parser) I had to write a `fix` function
    to convert these lists into arrays that I can work with more simply.

    But with each test case, I've had to re-write the `fix`. And it keeps
    running into more problems. The final one that "broke the camel's back"
    and sent me searching for the deeper design flaw was this:

    ( [ {"a":4,"b":5}, 6, {"c":"7"}] ) dup JSON-parse report

    which yields this:

    OK
    [[<< /a 4 /b 5 >> [6 << /c (7) >>]] []]
    remainder:[]

    I can remove the trailing [] easily. But the extra array in the middle
    grouping the 6 and the dictionary, I don't know where to patch in
    to grammar to remove that one. So that seems to show that I'm
    doing something wrong deeper down in the organization of the
    program.

    So, it appears I need to back up and rebuild the pieces more slowly,
    making sure that the regex example doesn't need any kind of `fix`
    before moving on to the more complicated ones.

    On a similar front, I can rewrite `xthen` and `thenx` to do their
    jobs without composing off of `then`. That way I don't need to
    worry about the result of `then` needing to be taken apart again into
    2 pieces.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to luserdroog on Sat Nov 20 16:49:39 2021
    On Monday, November 15, 2021 at 4:13:31 PM UTC-6, luserdroog wrote:
    On Monday, November 15, 2021 at 4:45:12 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:
    luserdroog <mij...@yahoo.com> writes:

    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    In my situation, I'm simulating everything in PostScript because it's >> > my favorite language. I'm simulating Lisp cons cells as 2-element
    arrays. So for this JSON string,

    ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

    if I make no special effort, I get a resulting value that looks like this:

    OK
    [[3 [[4 [[5 [[] []]] []]] []]] []]
    remainder:[]

    All those little empty arrays need to just go away, but not any of the >> > important array structure.
    So you want

    [[3 [[4 [[5 []]]]]]]

    ?

    I guess that's the big problem here. I'm not sure what I want. I keep having
    to add extra code to clean up and delete the extra stuff. Ultimately the result should be

    [ 3 [ 4 [ 5 ] ] ]

    The parser for arrays looks for the left bracket, then ...

    /Jarray //begin-array
    //value executeonly xthen
    //value-separator //value executeonly xthen many then %{ps flatten ps} using
    maybe
    //end-array thenx
    { %filter-zeros first %ps
    } using def

    The `executeonly` are in there to prevent infinite recursion if the expanded
    code ever gets printed (like in a stack dump while debugging). The /Jarray
    parser is one of the components of the //value parser.

    Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

    (a) char (b) char then

    would -- if matched by the input -- return

    [ (a) [ (b) [] ] ]

    which I could then easily massage into

    [ (a) (b) ]
    I think you need to pin down what you want. You made a remark about
    using two-element arrays as cons cells. In that case, parsing (a) and
    (b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.

    Yes, I think I may have presented an X/Y problem or started the story
    from the middle. In all of my test cases for this version (regexs,
    Postscript scanner, JSON parser) I had to write a `fix` function
    to convert these lists into arrays that I can work with more simply.
    [snip]

    Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when
    I move my arm like this; ... so don't move your arm like that" problem.

    I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the /usual/ PostScript types: integer, real, string, boolean, array, dictionary.

    But I also need some way to arbitrarily combine or concatenate two
    objects regardless of type. My `then` (aka `seq`) combinator needs to
    do this. So I made a hack-y function that does the combining. If it has
    two arrays, it composes the contents into a longer array. If it has one
    array and some other object it extends the array by one and stuffs
    the object in the front or back as appropriate. If it has two non-array
    objects it makes a new 2-element array to contain them.

    So, instead of building `xthen` and `thenx` off of `then` and needing
    to cons, car, and cdr the stuff, I can write all 3 of these as a more
    general parameterized function.

    sequence{ p q u }{
    { /p exec +is-ok {
    next x-xs force /q exec +is-ok {
    next x-xs 3 1 roll /u exec exch consok
    }{
    x-xs 3 2 roll ( after ) exch cons exch cons cons
    } ifelse
    } if } ll } @func
    then { {append} sequence }
    xthen { {exch pop} sequence }
    thenx { {pop} sequence }

    append { 1 index zero eq { exch pop }{
    dup zero eq { pop }{
    1 index type /arraytype eq {
    dup type /arraytype eq { compose }{ one compose } ifelse
    }{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }

    (`@func` is my own non-standard extension to PostScript that takes
    a procedure body and list of parameters and wraps the procedure with
    code that defines the arguments in a local dictionary. `ll` is my hack-y PostScript way of making lambdas with hard-patched parameters, it's
    short for `load all literals.`)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to luserdroog on Sat Nov 20 18:17:23 2021
    On Saturday, November 13, 2021 at 10:54:09 PM UTC-6, luserdroog wrote:
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    [snip]
    Sigh. I already had this same problem before. It came up when I googled it. [sad trombone]

    https://stackoverflow.com/q/55346600/733077

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to luserdroog on Sat Nov 20 19:55:26 2021
    On Sunday, November 14, 2021 at 9:29:55 PM UTC-6, luserdroog wrote:
    On Sunday, November 14, 2021 at 5:11:37 PM UTC-6, Paul Rubin wrote:
    luserdroog <mij...@yahoo.com> writes:
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.
    I'd expect 'many' to return a list, an empty list in the case of zero matches. What is the extra noise? Your PostScript example is
    confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
    something like [3, [4, [5]]], using square brackets to denote lists.

    I didn't know parser combinators were even a thing in PostScript: or are you trying to implement them? You could look at the Parsec paper to see
    how they traditionally worked:

    https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf
    It's something I've been trying to do in PostScript for a while now.
    A lot of the saga is detailed in comp.lang.postscript.
    Code is at github.com/luser-dr00g/pcomb/ps

    Ooops. Bad link. Here it at:

    https://github.com/luser-dr00g/pcomb/tree/master/ps

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luserdroog@21:1/5 to luserdroog on Fri Dec 3 21:21:48 2021
    On Saturday, November 20, 2021 at 8:17:24 PM UTC-6, luserdroog wrote:
    On Saturday, November 13, 2021 at 10:54:09 PM UTC-6, luserdroog wrote:
    Is there a systematic way to discard the extra noise that can occur
    when using parser combinators? For example, the `many` combinator
    which matches zero or more instances of its argument parser.
    In the case of zero matches, it still needs to return a value.

    [snip]
    Sigh. I already had this same problem before. It came up when I googled it. [sad trombone]

    https://stackoverflow.com/q/55346600/733077

    Coda: The rewrite is proceeding well. I got the basic parsers typed up fresh and simplified from the previous round. And I got the regular expression
    parser (test case) written with less gyrations than before. And I got the PostScript
    scanner (test case) written but there I did end up needing a `fix` function.

    Precisely because of my earlier decision that the result could be any type
    or an array of any number of any type, when processing this result ... I do kinda need to know whether the thing is an array or not and handle them differently.

    So, it's the same but different, you know? I need to call `fix` but it makes sense why I have to do it, and importantly *where* it will need to be done
    to get the thing to work right.

    I suppose the moral is, in this situation I can follow the Lisp stuff a
    little less literally. And possibly more broadly, as I've lamented in both comp.lang.c and comp.lang.postscript.

    Every few rewrites I try to clean up the lazy evaluation and be more general and strategic with it. But every time I just end up sprinkling in calls to `force`
    until it works right.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)