C scanner
From luser droog@21:1/5 to All on Mon Dec 13 15:20:58 2021
Sticking with PS version 12 of the parser combinators, I finished the
usual 3 examples (regex, PS scanner, JSON parser) and they seemed
pretty good and concise. So I translated my C scanner over from
the C version 9. It looks pretty good to me, especially the helper
function `tokendef`, which makes the parser add a tag to its return value.
Wrapping one lazy-input function around another lazy-input function is
just weird. It seems to work when I step through it in my head, but it
still looks weird the way it's written. It makes more sense when you
look at how `lazy-input` builds the function, but that part isn't new so
I won't include it here.
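To make the wrapping less mysterious, here is a minimal self-contained sketch of the idea. It is not the `lazy-input` code from pc12.ps, and every `my-` name is made up for illustration. It assumes a lazy list is a 2-element array `[head tail]` whose tail is an executable procedure that builds the rest on demand, which is the shape the `remainder:` values in the output at the bottom appear to have. `my-pairs` plays the role of the outer stream: it pulls items from an inner stream and suspends its own continuation the same way.

```postscript
% Hypothetical sketch; the my- names are NOT part of pc12.ps.
% A lazy list is [ head tail ], where tail is an executable procedure
% that produces the next [ head tail ] when it is run.

/my-first { 0 get } def               % list -> head

/my-next {                            % list -> rest
    1 get
    dup xcheck { exec } if            % force the tail only if it is suspended
} def

/my-count-from {                      % n -> lazy list n, n+1, n+2, ...
    dup 1 add                         % n n+1
    [ exch /my-count-from cvx ] cvx   % n {n+1 my-count-from}
    2 array astore                    % [ n {n+1 my-count-from} ]
} def

/my-pairs {                           % list -> lazy list of [a b] pairs
    dup my-first exch my-next         % a rest
    dup my-first exch my-next         % a b rest'
    3 1 roll 2 array astore exch      % [a b] rest'
    [ exch /my-pairs cvx ] cvx        % [a b] {rest' my-pairs}
    2 array astore                    % [ [a b] {rest' my-pairs} ]
} def

0 my-count-from my-pairs
dup my-first ==                       % prints [0 1]
my-next dup my-first ==               % prints [2 3]
pop
```

Nothing past the second pair is ever computed, even though the inner stream is infinite. The sketch does not handle an inner stream that runs out, which the real scanner obviously has to.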
The big idea is at the bottom. Calling `token-input` with a string-input
and 2 zeros (its starting `r` and `c`) gives you a lazy stream of tagged
token structures. Calling `string-input` needs its own 2 zeros, so it
takes a lot of zeros to put them together.
%errordict/typecheck{ps pe quit}put
(pc12.ps)run {
tokendef{ 1 index cvlit { exch cons one } curry using def }
cvsstr{ dup length string cvs }
strcat{ 2 copy length exch length add string % a b s
3 2 roll 2 copy 0 exch putinterval % b s a
length 3 2 roll 3 copy putinterval pop pop }
prefix{ exch strcat cvn }
} pairs-begin
/keywords {
int char
float double struct
auto extern
register static
goto return sizeof
break continue
if else
for do while
switch case default
} cvlit def
keywords { cvsstr dup (k_) prefix exch str tokendef } forall
/keyword-names keywords { cvsstr (k_) prefix } map def
/symbols {
star (*) plusplus (++) plus (+) dot (.)
arrow (->) minusminus (--) minus (-)
bangeq (!=) bang (!) tilde (~)
ampamp (&&) amp (&) eqeq (==) equal (=)
caret (^) pipepipe (||) pipe (|)
slant (/) percent (%)
ltlt (<<) lteq (<=) less (<)
gtgt (>>) gteq (>=) greater (>)
lparen (\() rparen (\))
comma (,) semi (;) colon (:) quest (?)
lbrace ({) rbrace (}) lbrack ([) rbrack (])
} cvlit def
symbols 2 { aload pop str tokendef } fortuple
/symbol-names [ symbols 2 { first } fortuple ] def
/assignops {
pluseq (+=) minuseq (-=)
stareq (*=) slanteq (/=) percenteq (%=)
gtgteq (>>=) ltlteq (<<=)
ampeq (&=) careteq (^=) pipeeq (|=)
} cvlit def
assignops 2 { aload pop str tokendef } fortuple
/comment (/*) str (*) noneof many (*) char then some then (/) then def
/space ( \t\n) anyof //comment alt many def
/alpha_ (a)(z)range (A)(Z)range alt (_)char alt def
/digit (0)(9)range def
/identifier //alpha_ //alpha_ //digit alt many then tokendef
/integer //digit some tokendef
/floating //digit some (.) char then //digit many then
(.) char //digit some then alt
(eE) anyof (+-) anyof maybe then //digit some then maybe then tokendef
/escape (\\) char
//digit //digit maybe then //digit maybe then
('"bnrt\\) anyof alt then def
/char_ //escape ('\n) noneof alt def
/schar_ //escape ("\n) noneof alt def
/character (') char //char_ then (') char then tokendef
/astring (") char //schar_ many then (") char then tokendef
/constant //floating //integer alt //character alt //astring alt tokendef
/symbolic [ keyword-names {load} forall
symbol-names {load} forall
assignops 2{first load} fortuple
counttomark 1 sub {alt} repeat exch pop def
/ctoken //space //constant //symbolic alt //identifier alt xthen def
/token-input{r c in}
{ in dup //ctoken exec +not-ok { true }{ exch pop second xs-x false } ifelse }
{ 4 3 roll } % xs [x[r c]] r' c' -> [x[r c]] r' c' xs
{ token-input } lazy-input def
0 0 ( aname another) string-input //ctoken exec report
0 0 ( ++ / * ) string-input //ctoken exec report
0 0 ( 37,x,y ) string-input //ctoken exec report
0 0 0 0 ( 37,x,y{12+q;} ) string-input token-input
dup first ==
next dup first ==
next dup first ==
next dup first ==
next dup first ==
next dup first ==
pc
quit
$ gsnd -q -dNOSAFER pc12ctok.ps
OK
[[/identifier [(a) (n) (a) (m) (e)]]]
remainder:[[( ) [0 6]] {0 7 (another) string-input}]
OK
[[/plusplus [(+) (+)]]]
remainder:{0 3 ( / * ) string-input}
OK
[[/constant [[/integer [(3) (7)]]]]]
remainder:[[(,) [0 3]] {0 4 (x,y ) string-input}]
[[[/constant [[/integer [(3) (7)]]]]] [0 0]]
[[[/comma (,)]] [0 1]]
[[[/identifier (x)]] [0 2]]
[[[/comma (,)]] [0 3]]
[[[/identifier (y)]] [0 4]]
[[[/lbrace ({)]] [0 5]]
stack:
[[[[/lbrace ({)]] [0 5]] {0 6 {0 8 (12+q;} ) string-input} token-input}]
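In the `report` output above, multi-character lexemes come back as arrays of one-character strings, e.g. `[(a) (n) (a) (m) (e)]`. If you want an ordinary string back, something like the following would do it. `my-join` is a made-up name, not something taken from pc12.ps (which may well have its own equivalent), and it assumes every element of the array is a string.

```postscript
% Hypothetical helper, not part of pc12.ps.
% my-join: array-of-strings -> one concatenated string
/my-join {
    dup 0 exch { length add } forall   % arr total-length
    string 0                           % arr buf pos
    3 2 roll                           % buf pos arr
    {                                  % buf pos s
        3 copy putinterval             % copy s into buf at pos
        length add                     % buf pos+len
    } forall
    pop                                % buf
} def

[(a) (n) (a) (m) (e)] my-join ==       % prints (aname)
```

It makes two passes over the array: one to total the lengths so the buffer can be allocated once, and one to copy each piece into place.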