On Monday, November 8, 2021 at 11:49:30 AM UTC-6, luser droog wrote:
On Sunday, November 7, 2021 at 6:40:43 PM UTC-6, luser droog wrote:
Here's a rough draft of the grammar for a PS tokenizer using my new functions.
A little more fleshed out, formatted, and slightly tested. I've been having
to futz around with the innards of several parsers like `then` and `many`
to get `xthen` and `thenx` to work reliably. I had been using `append` to
combine the results of two sequential parsers, and `append` works like
a Lisp list append; i.e., it scans to the end of the cdr chain and then
replaces the final null with the second list. That all works, for the most part.
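This is a rough Python analogue of the append described above, which scans to the end of the cdr chain and replaces the final null. It is not the pc11a.ps code. The `Cell` class and helper names are hypothetical. Because the sketch modifies its first argument, it is closer to Common Lisp's destructive `nconc` than to `append`, which copies.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class Cell:
    """One cons cell: car holds an element, cdr points at the next cell (None is nil)."""
    car: Any
    cdr: Optional["Cell"] = None


def from_list(xs: Iterable[Any]) -> Optional[Cell]:
    """Build a cdr chain from a Python iterable."""
    head: Optional[Cell] = None
    for x in reversed(list(xs)):
        head = Cell(x, head)
    return head


def to_list(cell: Optional[Cell]) -> List[Any]:
    """Walk a cdr chain back into a Python list."""
    out = []
    while cell is not None:
        out.append(cell.car)
        cell = cell.cdr
    return out


def append(xs: Optional[Cell], ys: Optional[Cell]) -> Optional[Cell]:
    """Scan to the last cell of xs and replace its nil cdr with ys (modifies xs)."""
    if xs is None:
        return ys
    last = xs
    while last.cdr is not None:
        last = last.cdr
    last.cdr = ys
    return xs
```

For example, `to_list(append(from_list("8#"), from_list("117")))` gives `['8', '#', '1', '1', '7']`.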
It fails when you try to do fancy stuff like `xthen` and `thenx`. These
are sequencing combinators like `then`, which runs one parser and
then runs the other on the remainder from the first. But `xthen` has the
extra trick of discarding the result of the first parser, and `thenx`
discards the result of the second parser.
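Here is a minimal Python sketch of these three sequencing combinators. It assumes a parser is a function from an input string to `(result, remainder)`, or `None` on failure. pc11a.ps uses `[/OK [result remainder]]` with a lazy remainder instead, so only the names `char`, `then`, `xthen`, and `thenx` come from the post. Here `then` keeps its two results as a pair, so `xthen` and `thenx` only need to pick one side of it.

```python
from typing import Any, Callable, Optional, Tuple

# A parser maps an input string to (result, remainder), or None on failure.
Result = Optional[Tuple[Any, str]]
Parser = Callable[[str], Result]


def char(c: str) -> Parser:
    """Match exactly the character c."""
    def parse(s: str) -> Result:
        return (c, s[1:]) if s[:1] == c else None
    return parse


def then(p: Parser, q: Parser) -> Parser:
    """Run p, then run q on p's remainder; pair up both results."""
    def parse(s: str) -> Result:
        r1 = p(s)
        if r1 is None:
            return None
        a, rest = r1
        r2 = q(rest)
        if r2 is None:
            return None
        b, rest = r2
        return (a, b), rest
    return parse


def xthen(p: Parser, q: Parser) -> Parser:
    """Sequence p and q but keep only q's result."""
    def parse(s: str) -> Result:
        r = then(p, q)(s)
        return None if r is None else (r[0][1], r[1])
    return parse


def thenx(p: Parser, q: Parser) -> Parser:
    """Sequence p and q but keep only p's result."""
    def parse(s: str) -> Result:
        r = then(p, q)(s)
        return None if r is None else (r[0][0], r[1])
    return parse


# Drop the backslash of an escape and keep the escaped character.
backslash_n = xthen(char("\\"), char("n"))
```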
These are great for discarding stuff during the parse. For example, when
processing escape codes, the simple ones like (\\), (\(), and (\)) are
handled completely just by discarding the leading backslash. And all
the escape handling is simplified by doing that wholesale
in all cases.
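This Python sketch shows the same strategy of dropping the backslash first and then deciding what the rest means. It mirrors the alternatives in the post's `escape` parser: `\\ \( \)` map to themselves, `n r t b f` map to control characters, and one to three octal digits give a character code. The name `decode_escape` is hypothetical. Masking the octal value to one byte follows the PLRM rule that high-order overflow is ignored.

```python
from typing import Optional, Tuple

# What each simple escape letter stands for once the backslash is dropped.
SIMPLE = {"\\": "\\", "(": "(", ")": ")",
          "n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f"}
OCTAL = "01234567"


def decode_escape(s: str) -> Optional[Tuple[str, str]]:
    """Decode one escape at the start of s; return (char, remainder) or None."""
    if not s.startswith("\\"):
        return None
    body = s[1:]  # discard the leading backslash wholesale
    if body[:1] in SIMPLE:
        return SIMPLE[body[0]], body[1:]
    # Otherwise take one to three octal digits.
    n = 0
    while n < 3 and n < len(body) and body[n] in OCTAL:
        n += 1
    if n == 0:
        return None  # unknown escape: fail, as the post's `escape` parser does
    # Keep only the low byte (high-order overflow is ignored).
    return chr(int(body[:n], 8) & 0xFF), body[n:]
```

For example, `decode_escape("\\117 x")` returns `("O", " x")`, matching the `aO` in the test output.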
But if you're appending results into one long list, then you've lost
the <first> vs. <second> structure! The obvious solution
was to replace the calls to `append` with calls to `cons`, which
just groups the two parts into a 2-element array: easy to grab
the two pieces out later.
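A small Python illustration of why appending loses the boundary and `cons` keeps it. Plain Python lists and tuples stand in for the PostScript structures, and the names follow the post only loosely.

```python
from typing import Any, List, Tuple


def append(xs: List[Any], ys: List[Any]) -> List[Any]:
    """Run two result lists together into one flat list."""
    return xs + ys


def cons(a: Any, b: Any) -> Tuple[Any, Any]:
    """Keep two results side by side in a 2-element pair."""
    return (a, b)


# Two different first/second splits collapse into the same appended list...
ambiguous = append(["a", "b"], ["c"]) == append(["a"], ["b", "c"])

# ...but stay distinct as pairs, so "drop the first part" is well defined.
pair = cons(["a", "b"], ["c"])
kept_by_xthen = pair[1]   # ['c']
kept_by_thenx = pair[0]   # ['a', 'b']
```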
Doing that caused a bug that took a while to track down. It broke
the handlers, i.e., all the procedures composed into the
parsers with `using`. In every one of them I was calling a function named
`flatten` that only knew how to deal with 1-D Lisp lists, so it went
wild when handed a weird non-list cons structure.
So now it all works by using a more powerful function called
`unwrap`, which can tease apart whatever weird cons tangle
is thrown at it. But you can't see any of this here; it's all inside
the `fix` function from pc11a.ps.
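The difference between the old `flatten` and `unwrap` can be sketched in Python. This is an assumption about their behaviour based only on the description above, not the contents of pc11a.ps. `flatten_1d` expects a flat list of characters and chokes on nested pairs. `unwrap` recurses through any mix of pairs and lists and collects the leaves left to right. The tangle below is a made-up shape of the kind nested `then`s could produce for `8#117`.

```python
from typing import Any, List


def flatten_1d(xs: List[str]) -> str:
    """The old assumption: a flat list of characters, joined into one string."""
    return "".join(xs)


def unwrap(x: Any) -> List[Any]:
    """Collect the leaves of any nesting of pairs and lists, left to right.

    An empty list (Lisp nil) contributes nothing.
    """
    if isinstance(x, (tuple, list)):
        leaves: List[Any] = []
        for item in x:
            leaves.extend(unwrap(item))
        return leaves
    return [x]


# A hypothetical cons tangle for the input "8#117".
tangle = ((("8", []), "#"), ["1", "1", "7"])
```

Here `"".join(unwrap(tangle))` recovers `"8#117"`, while `flatten_1d(tangle)` raises `TypeError` because its first element is a pair, not a character.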
With this fix, I only just now got hex strings to appear to work,
discarding any non-hex characters they contain. I still need to interpret
the hex characters and do some handling for procedures,
plus e notation.
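For the remaining step of interpreting the hex characters, here is a minimal Python sketch of turning the body of a `<...>` string into bytes. The PLRM pairs hex digits into bytes and treats an odd final digit as if it were followed by 0. It ignores only whitespace and raises `syntaxerror` on any other character. This sketch assumes the post's looser rule of skipping every non-hex character. The name `interpret_hex` is hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def interpret_hex(chars: str) -> bytes:
    """Turn the body of a <...> hex string into bytes.

    Every non-hex character is skipped (the post's rule; real PostScript
    only skips whitespace). An odd trailing digit is padded with 0.
    """
    digits = [c for c in chars if c in HEX_DIGITS]
    if len(digits) % 2:
        digits.append("0")
    return bytes(int(digits[i] + digits[i + 1], 16)
                 for i in range(0, len(digits), 2))
```

With this rule, the test input `<abc defg>` would become the three bytes `ab cd ef` instead of the list of single characters in the current output.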
Then there's a further challenge if I actually want to emulate the
`token` operator: I'll need to reliably recreate the remainder substring.
That string may or may not still be tucked into the lazy
remainder list in string form, so some fiddly business may
be needed to reconstruct it.
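Applied to a string, PostScript's `token` leaves the remaining substring, the token, and `true` on the stack, or just `false` if no token is found. One way to sidestep rebuilding the remainder is to track the offset where the match ended and slice the original string there. The Python sketch below does that. It substitutes a crude hypothetical regex (integers and names only) for the combinator grammar. It does not try to match the exact PLRM rules for consuming the whitespace after a token.

```python
import re
from typing import Any, Optional, Tuple

# Hypothetical stand-in grammar: a signed integer or a run of regular characters.
TOKEN = re.compile(r"\s*([+-]?\d+|[^\s()<>\[\]{}/%]+)")
INTEGER = re.compile(r"[+-]?\d+")


def token(s: str) -> Optional[Tuple[str, Any]]:
    """Return (post, value) like `string token`, or None where token gives false.

    The remainder is cut from the original string at the match offset,
    so nothing has to be rebuilt from a lazy remainder list.
    """
    m = TOKEN.match(s)
    if m is None:
        return None
    text = m.group(1)
    value = int(text) if INTEGER.fullmatch(text) else text
    return s[m.end():], value
```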
%errordict/typecheck{pq}put
(pc11a.ps)run <<
/interpret-octal { 0 exch { first 48 sub exch 8 mul add } forall }
/to-char { 1 string dup 0 4 3 roll put }
>> begin
/delimiters ( \t\n()/%[]<>{}) def
/delimiter delimiters anyof def
/octal (0)(7) range def
/digit (0)(9) range def
/alpha (a)(z) range (A)(Z) range alt def
/regular delimiters noneof def
/rad-digit //digit //alpha alt def
/rad-integer //digit //digit maybe then (#) char then //rad-digit some then def
/number //digit some def
/opt-number //digit many def
/integer (+-) anyof maybe //number then def
/real (+-) anyof maybe
//number (.) char then //opt-number then
(.) char //number then alt then def
/name //regular some def
/ps-char {-777 exec} def % forward declaration; element 0 is patched below
/escape (\\) char
(\\) char
(\() char alt
(\)) char alt
(n) char { pop (\n) one } using alt
(r) char { pop (\r) one } using alt
(t) char { pop (\t) one } using alt
(b) char { pop (\b) one } using alt
(f) char { pop (\f) one } using alt
//octal //octal maybe then //octal maybe then
{ fix interpret-octal to-char one } using alt
xthen def
/ps-string (\() char //ps-char executeonly many then (\)) char then def
% patch ps-char: an escape, a nested string, or any char except ( and )
//ps-char 0 //escape
//ps-string alt
(()) noneof alt put
/hex-char //digit (a)(f) range (A)(F) range alt alt def
/non-hex-char //hex-char (>) char alt none def
/hex-string (<) char
//non-hex-char many //hex-char xthen many then //non-hex-char many thenx
(>) char then def
/spaces ( \t\n) anyof many def
/object {-777 exec} def % forward declaration; element 0 is patched below
/ps-token //spaces //object executeonly xthen def
//object 0 //rad-integer { fix to-string cvi } using
//real { fix to-string cvr } using alt
//integer { fix to-string cvi } using alt
//name { fix to-string cvn cvx } using alt
(/) char //name then { fix to-string rest cvn cvlit } using alt
(/) char (/) char then //name then { fix to-string rest rest cvn load } using alt
//ps-string { fix to-string 1 1 index length 2 sub getinterval } using alt
//hex-string { fix 1 1 index length 2 sub getinterval } using alt
({) char //ps-token many then //spaces (}) char xthen then alt
//delimiter { fix to-string cvn cvx } using alt
put
/mytoken {
dup length 0 gt {
0 0 3 2 roll string-input //ps-token exec
}{ pop false } ifelse
} def
{ % scratch tests, disabled by the pop after the closing brace
0 0 (47) string-input //integer exec pc
0 0 (47) string-input //number exec pc
0 0 (8#117) string-input
//digit //digit maybe then (#) char then //rad-digit some then exec pc
%quit
0 0 (8#117) string-input //rad-integer exec pc
0 0 (1.17) string-input //real exec pc
} pop
(8#117) mytoken pc
(47) mytoken pc
(string) mytoken pc
([stuff) mytoken pc
(/litname) mytoken pc
(42.42) mytoken pc
((a\\117 \\\\string\\n)) mytoken ps second first print clear
/thing 12 def
(//thing) mytoken pc
(<abc defg>) mytoken pc
quit
$ gsnd -dNOSAFER pc11atoken.ps
GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
stack:
[/OK [79 []]]
:stack
stack:
[/OK [47 {0 2 () string-input}]]
:stack
stack:
[/OK [string []]]
:stack
stack:
[/OK [[ {0 1 (stuff) string-input}]]
:stack
stack:
[/OK [/litname []]]
:stack
stack:
[/OK [42.42 []]]
:stack
stack:
[/OK [(aO \\string\n) {0 18 () string-input}]]
:stack
aO \string
stack:
[/OK [12 []]]
:stack
stack:
[/OK [[(a) (b) (c) (d) (e) (f)] {0 10 () string-input}]]
:stack
--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)