Forum: >>> Magnum BBS <<<

Discard extraneous structure returned by parser combinators?

From luserdroog@21:1/5 to All on Sat Nov 13 20:54:08 2021

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's
my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this:

OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the important array structure. `many` and `maybe` seem to be the chief
culprits, but then their results are propagated back by `alt`s and
`then`s all the way back to the top.

Do I need to make some kind of out-of-band signal for these "zeros"
that I can filter out later? The obvious problem here is that the array
type is being used for too many things. But there's a paucity of
types in PostScript, sigh. For the JSON application, I have nametype
objects available that don't have a JSON corollary.

Do I need to rewrite all the combinators to filter out noise values at
every turn?.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ben Bacarisse@21:1/5 to luserdroog on Sun Nov 14 16:01:14 2021

luserdroog <mijoryx@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's
my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this:

OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the important array structure.

So you want

[[3 [[4 [[5 []]]]]]]

?

`many` and `maybe` seem to be the chief
culprits, but then their results are propagated back by `alt`s and
`then`s all the way back to the top.

Do I need to make some kind of out-of-band signal for these "zeros"
that I can filter out later? The obvious problem here is that the array
type is being used for too many things. But there's a paucity of
types in PostScript, sigh. For the JSON application, I have nametype
objects available that don't have a JSON corollary.

Do I need to rewrite all the combinators to filter out noise values at
every turn?.

It's odd to call something that you are returning (presuambly) as
noise. Are you using lists as a sort of Maybe monad with [] as Nothing?

I think you'd have to show the code to get anything more concrete as a
reply.

--
Ben.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Paul Rubin@21:1/5 to luserdroog on Sun Nov 14 15:11:34 2021

luserdroog <mijoryx@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

I'd expect 'many' to return a list, an empty list in the case of zero
matches. What is the extra noise? Your PostScript example is
confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
something like [3, [4, [5]]], using square brackets to denote lists.

I didn't know parser combinators were even a thing in PostScript: or are
you trying to implement them? You could look at the Parsec paper to see
how they traditionally worked:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to Ben Bacarisse on Sun Nov 14 19:27:58 2021

On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's
my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this:

OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the important array structure.

So you want

[[3 [[4 [[5 []]]]]]]

?

I guess that's the big problem here. I'm not sure what I want. I keep having
to add extra code to clean up and delete the extra stuff. Ultimately the
result should be

[ 3 [ 4 [ 5 ] ] ]

The parser for arrays looks for the left bracket, then ...

/Jarray //begin-array
//value executeonly xthen
//value-separator //value executeonly xthen many then %{ps flatten ps} using
maybe
//end-array thenx
{ %filter-zeros first %ps
} using def

The `executeonly` are in there to prevent infinite recursion if the expanded code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

(a) char (b) char then

would -- if matched by the input -- return

[ (a) [ (b) [] ] ]

which I could then easily massage into

[ (a) (b) ]

But that led me into problems when I wanted to use the combinators
`xthen` and `thenx` which discard one of the two pieces. If the results
are just appended together in a list, then I've lost the information to
peel them back apart. So I changed `then` to just (cons) the pieces
together, and now `xthen` and `thenx` have an easy job.

And extra noise values pop up if I use combinators like `maybe`
and `many` which might succeed with zero matches. So, ...

`many` and `maybe` seem to be the chief
culprits, but then their results are propagated back by `alt`s and
`then`s all the way back to the top.

Do I need to make some kind of out-of-band signal for these "zeros"
that I can filter out later? The obvious problem here is that the array type is being used for too many things. But there's a paucity of
types in PostScript, sigh. For the JSON application, I have nametype objects available that don't have a JSON corollary.

Do I need to rewrite all the combinators to filter out noise values at every turn?.

It's odd to call something that you are returning (presuambly) as
noise. Are you using lists as a sort of Maybe monad with [] as Nothing?

Yes, I think that's what I'm doing, clumsily.

I think you'd have to show the code to get anything more concrete as a
reply.

--
Ben.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to Paul Rubin on Sun Nov 14 19:29:54 2021

On Sunday, November 14, 2021 at 5:11:37 PM UTC-6, Paul Rubin wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

I'd expect 'many' to return a list, an empty list in the case of zero matches. What is the extra noise? Your PostScript example is
confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
something like [3, [4, [5]]], using square brackets to denote lists.

I didn't know parser combinators were even a thing in PostScript: or are
you trying to implement them? You could look at the Parsec paper to see
how they traditionally worked:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf

It's something I've been trying to do in PostScript for a while now.
A lot of the saga is detailed in comp.lang.postscript.
Code is at github.com/luser-dr00g/pcomb/ps

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Ben Bacarisse@21:1/5 to luserdroog on Mon Nov 15 10:45:10 2021

luserdroog <mijoryx@yahoo.com> writes:

On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's
my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this: >> >
OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the
important array structure.

So you want

[[3 [[4 [[5 []]]]]]]

?

I guess that's the big problem here. I'm not sure what I want. I keep having to add extra code to clean up and delete the extra stuff. Ultimately the result should be

[ 3 [ 4 [ 5 ] ] ]

The parser for arrays looks for the left bracket, then ...

/Jarray //begin-array
//value executeonly xthen
//value-separator //value executeonly xthen many then %{ps flatten ps} using
maybe
//end-array thenx
{ %filter-zeros first %ps
} using def

The `executeonly` are in there to prevent infinite recursion if the expanded code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

(a) char (b) char then

would -- if matched by the input -- return

[ (a) [ (b) [] ] ]

which I could then easily massage into

[ (a) (b) ]

I think you need to pin down what you want. You made a remark about
using two-element arrays as cons cells. In that case, parsing (a) and
(b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.

--
Ben.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to Ben Bacarisse on Mon Nov 15 14:13:30 2021

On Monday, November 15, 2021 at 4:45:12 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's
my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this:

OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the >> > important array structure.

So you want

[[3 [[4 [[5 []]]]]]]

?

I guess that's the big problem here. I'm not sure what I want. I keep having
to add extra code to clean up and delete the extra stuff. Ultimately the result should be

[ 3 [ 4 [ 5 ] ] ]

The parser for arrays looks for the left bracket, then ...

/Jarray //begin-array
//value executeonly xthen
//value-separator //value executeonly xthen many then %{ps flatten ps} using
maybe
//end-array thenx
{ %filter-zeros first %ps
} using def

The `executeonly` are in there to prevent infinite recursion if the expanded
code ever gets printed (like in a stack dump while debugging). The /Jarray parser is one of the components of the //value parser.

Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

(a) char (b) char then

would -- if matched by the input -- return

[ (a) [ (b) [] ] ]

which I could then easily massage into

[ (a) (b) ]

I think you need to pin down what you want. You made a remark about
using two-element arrays as cons cells. In that case, parsing (a) and
(b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.

Yes, I think I may have presented an X/Y problem or started the story
from the middle. In all of my test cases for this version (regexs,
Postscript scanner, JSON parser) I had to write a `fix` function
to convert these lists into arrays that I can work with more simply.

But with each test case, I've had to re-write the `fix`. And it keeps
running into more problems. The final one that "broke the camel's back"
and sent me searching for the deeper design flaw was this:

( [ {"a":4,"b":5}, 6, {"c":"7"}] ) dup JSON-parse report

which yields this:

OK
[[<< /a 4 /b 5 >> [6 << /c (7) >>]] []]
remainder:[]

I can remove the trailing [] easily. But the extra array in the middle
grouping the 6 and the dictionary, I don't know where to patch in
to grammar to remove that one. So that seems to show that I'm
doing something wrong deeper down in the organization of the
program.

So, it appears I need to back up and rebuild the pieces more slowly,
making sure that the regex example doesn't need any kind of `fix`
before moving on to the more complicated ones.

On a similar front, I can rewrite `xthen` and `thenx` to do their
jobs without composing off of `then`. That way I don't need to
worry about the result of `then` needing to be taken apart again into
2 pieces.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to luserdroog on Sat Nov 20 16:49:39 2021

On Monday, November 15, 2021 at 4:13:31 PM UTC-6, luserdroog wrote:

On Monday, November 15, 2021 at 4:45:12 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

On Sunday, November 14, 2021 at 10:01:16 AM UTC-6, Ben Bacarisse wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

In my situation, I'm simulating everything in PostScript because it's >> > my favorite language. I'm simulating Lisp cons cells as 2-element
arrays. So for this JSON string,

( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse report

if I make no special effort, I get a resulting value that looks like this:

OK
[[3 [[4 [[5 [[] []]] []]] []]] []]
remainder:[]

All those little empty arrays need to just go away, but not any of the >> > important array structure.

So you want

[[3 [[4 [[5 []]]]]]]

?

I guess that's the big problem here. I'm not sure what I want. I keep having
to add extra code to clean up and delete the extra stuff. Ultimately the result should be

[ 3 [ 4 [ 5 ] ] ]

The parser for arrays looks for the left bracket, then ...

/Jarray //begin-array
//value executeonly xthen
//value-separator //value executeonly xthen many then %{ps flatten ps} using
maybe
//end-array thenx
{ %filter-zeros first %ps
} using def

The `executeonly` are in there to prevent infinite recursion if the expanded
code ever gets printed (like in a stack dump while debugging). The /Jarray
parser is one of the components of the //value parser.

Hmm. Initially I had the `then` combinator doing a Lisp-style (append) operation on my simulated lists, so something like

(a) char (b) char then

would -- if matched by the input -- return

[ (a) [ (b) [] ] ]

which I could then easily massage into

[ (a) (b) ]

I think you need to pin down what you want. You made a remark about
using two-element arrays as cons cells. In that case, parsing (a) and
(b) in sequence /should/ give [ (a) [ (b) [] ] ]. Massaging that into something else seems like the wrong strategy.

Yes, I think I may have presented an X/Y problem or started the story
from the middle. In all of my test cases for this version (regexs,
Postscript scanner, JSON parser) I had to write a `fix` function
to convert these lists into arrays that I can work with more simply.

[snip]

Ugh. I think it wasn't even an X/Y problem. It was a "doctor it hurts when
I move my arm like this; ... so don't move your arm like that" problem.

I want the "result" part of the "reply" structure (using new terms following usage from the Parsec document) to be any of the /usual/ PostScript types: integer, real, string, boolean, array, dictionary.

But I also need some way to arbitrarily combine or concatenate two
objects regardless of type. My `then` (aka `seq`) combinator needs to
do this. So I made a hack-y function that does the combining. If it has
two arrays, it composes the contents into a longer array. If it has one
array and some other object it extends the array by one and stuffs
the object in the front or back as appropriate. If it has two non-array
objects it makes a new 2-element array to contain them.

So, instead of building `xthen` and `thenx` off of `then` and needing
to cons, car, and cdr the stuff, I can write all 3 of these as a more
general parameterized function.

sequence{ p q u }{
{ /p exec +is-ok {
next x-xs force /q exec +is-ok {
next x-xs 3 1 roll /u exec exch consok
}{
x-xs 3 2 roll ( after ) exch cons exch cons cons
} ifelse
} if } ll } @func
then { {append} sequence }
xthen { {exch pop} sequence }
thenx { {pop} sequence }

append { 1 index zero eq { exch pop }{
dup zero eq { pop }{
1 index type /arraytype eq {
dup type /arraytype eq { compose }{ one compose } ifelse
}{ dup type /arraytype eq { curry }{ cons } ifelse } ifelse } ifelse } ifelse }

(`@func` is my own non-standard extension to PostScript that takes
a procedure body and list of parameters and wraps the procedure with
code that defines the arguments in a local dictionary. `ll` is my hack-y PostScript way of making lambdas with hard-patched parameters, it's
short for `load all literals.`)

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to luserdroog on Sat Nov 20 18:17:23 2021

On Saturday, November 13, 2021 at 10:54:09 PM UTC-6, luserdroog wrote:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

[snip]
Sigh. I already had this same problem before. It came up when I googled it. [sad trombone]

https://stackoverflow.com/q/55346600/733077

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to luserdroog on Sat Nov 20 19:55:26 2021

On Sunday, November 14, 2021 at 9:29:55 PM UTC-6, luserdroog wrote:

On Sunday, November 14, 2021 at 5:11:37 PM UTC-6, Paul Rubin wrote:

luserdroog <mij...@yahoo.com> writes:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

I'd expect 'many' to return a list, an empty list in the case of zero matches. What is the extra noise? Your PostScript example is
confusing. I'd expect ( [ 3, [ 4, [ 5 ] ] ] ) JSON-parse to give
something like [3, [4, [5]]], using square brackets to denote lists.

I didn't know parser combinators were even a thing in PostScript: or are you trying to implement them? You could look at the Parsec paper to see
how they traditionally worked:

https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/parsec-paper-letter.pdf

It's something I've been trying to do in PostScript for a while now.
A lot of the saga is detailed in comp.lang.postscript.
Code is at github.com/luser-dr00g/pcomb/ps

Ooops. Bad link. Here it at:

https://github.com/luser-dr00g/pcomb/tree/master/ps

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luserdroog@21:1/5 to luserdroog on Fri Dec 3 21:21:48 2021

On Saturday, November 20, 2021 at 8:17:24 PM UTC-6, luserdroog wrote:

On Saturday, November 13, 2021 at 10:54:09 PM UTC-6, luserdroog wrote:

Is there a systematic way to discard the extra noise that can occur
when using parser combinators? For example, the `many` combinator
which matches zero or more instances of its argument parser.
In the case of zero matches, it still needs to return a value.

[snip]
Sigh. I already had this same problem before. It came up when I googled it. [sad trombone]

https://stackoverflow.com/q/55346600/733077

Coda: The rewrite is proceeding well. I got the basic parsers typed up fresh and simplified from the previous round. And I got the regular expression
parser (test case) written with less gyrations than before. And I got the PostScript
scanner (test case) written but there I did end up needing a `fix` function.

Precisely because of my earlier decision that the result could be any type
or an array of any number of any type, when processing this result ... I do kinda need to know whether the thing is an array or not and handle them differently.

So, it's the same but different, you know? I need to call `fix` but it makes sense why I have to do it, and importantly *where* it will need to be done
to get the thing to work right.

I suppose the moral is, in this situation I can follow the Lisp stuff a
little less literally. And possibly more broadly, as I've lamented in both comp.lang.c and comp.lang.postscript.

Every few rewrites I try to clean up the lazy evaluation and be more general and strategic with it. But every time I just end up sprinkling in calls to `force`
until it works right.

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

Who's Online

System Info

Sysop:	Keyop
Location:	Huddersfield, West Yorkshire, UK
Users:	296
Nodes:	16 (2 / 14)
Uptime:	88:11:17
Calls:	6,658
Files:	12,203
Messages:	5,333,954

Discard extraneous structure returned by parser combinators?

Who's Online

System Info