I've got my project successfully parsing the circa-1975 C syntax
from that old manual. I'd like to add parsers for K&R1 and c90
syntaxes.
How separate should these be? Should they be complete
separate grammars, or more piecewise selection?
My feeling is that separating them will be less headache, but maybe
there's some advantage to changing out smaller pieces of the grammar
in that it might be easier to make sure that they produce the same
structure compatible with the backend.
Any guidance in this area?
https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
On Thursday, 13 August 2020 at 00:32:56 UTC+2, luser droog wrote:
I've got my project successfully parsing the circa-1975 C syntax
from that old manual. I'd like to add parsers for K&R1 and c90
syntaxes.
How separate should these be? Should they be complete
separate grammars, or more piecewise selection? ...
Why not settle for one master dialect and use awk to translate between dialects?
[Probably because there is a great deal of C code written to comply with
the various versions of the standard, users want error messages that match the code they wrote rather than some intermediate code, and it would be quite an awk program that could reconcile all the differences among C flavors. -John]
On 13 August 2020 at 00:20, luser droog wrote:
I've got my project successfully parsing the circa-1975 C syntax
from that old manual. I'd like to add parsers for K&R1 and c90
syntaxes.
How separate should these be? Should they be complete
separate grammars, or more piecewise selection?
IMO this depends largely on the use made of the parser output
(diagnostics, backend...). C90 is much stricter than K&R and requires
more checks. Do you need extensive error diagnostics, or do you assume
that all source code is free of errors?
https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
You seem to implement an LL(1) parser? My C98 parser is LL(2), i.e. an
LL(1) parser with one or two locations where more lookahead is required.
Also, identifiers are classified as typenames or other identifiers
before they are used.
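Classifying identifiers as typenames before the parser sees them is the classic "lexer hack" for C. A minimal self-contained sketch of the idea, with hypothetical names that are not taken from pc9syn.c:

```c
/* Sketch of the "lexer hack": the lexer consults a symbol table so
   that identifiers already declared via typedef come back as TYPENAME
   tokens, keeping most of the grammar LL(1). Illustrative only. */
#include <string.h>

enum token { IDENT, TYPENAME };

#define MAX_TYPEDEFS 64
static const char *typedef_names[MAX_TYPEDEFS];
static int n_typedefs;

/* Called by the parser when it reduces a typedef declaration. */
static void declare_typedef(const char *name) {
    if (n_typedefs < MAX_TYPEDEFS)
        typedef_names[n_typedefs++] = name;
}

/* Called by the lexer to classify an identifier before handing
   it to the parser. */
static enum token classify(const char *name) {
    for (int i = 0; i < n_typedefs; i++)
        if (strcmp(typedef_names[i], name) == 0)
            return TYPENAME;
    return IDENT;
}
```

Once the parser registers a typedef, every later occurrence of that spelling lexes as TYPENAME, which resolves the declaration-versus-expression ambiguity without extra lookahead.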
For real-world testing (recommended!) a preprocessor is required and a
copy of the standard libraries of existing compiler(s).
Your test_syntax() source omits "=" in the variable declarations
(initializers). What about pointer syntax/semantics? If you add these
(and other) syntax differences conditionally (version-specific) to your
code, which way would look better to you? Which way would be safer to
maintain?
Nice code BTW :-)
It may be useful to consider what you would like to happen when you encounter syntax that is ambiguous, that works differently, or that belongs to a dialect other than the one you are parsing: produce a warning or an error, handle it quietly, fall over, or not care.
My friend, reporting the furthest position examined by the parser is something I have found useful in error cases, as a simple stopgap when using a combinator approach. Thinking about it, what you really want to see is the furthest failed position and the stack of rules above it. That requires meta-information when the code is written in the most natural way. For this reason and others, I believe it is good to represent your grammar in data structures, which is a further step in the direction of a compiler-compiler tool (or compiler-interpreter tool).
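The furthest-failed-position idea can be retrofitted onto combinators with a simple high-water mark. A hedged sketch in C; the names and the toy literal-matching "parser" are illustrative, not how pcomb records its state:

```c
/* Sketch of "furthest position" error reporting: every failing
   parser records the high-water mark of input it examined, plus
   the rule name active at that point. Hypothetical names. */
#include <stddef.h>

static size_t furthest;            /* furthest offset reached   */
static const char *furthest_rule;  /* rule active at that point */

static void note_failure(size_t pos, const char *rule) {
    if (pos >= furthest) {
        furthest = pos;
        furthest_rule = rule;
    }
}

/* A toy "parser": match a literal string at *pos, advancing on
   success and recording the failure point otherwise. */
static int lit(const char *input, size_t *pos, const char *s,
               const char *rule) {
    size_t i = *pos;
    for (; *s; s++, i++)
        if (input[i] != *s) {
            note_failure(i, rule);
            return 0;
        }
    *pos = i;
    return 1;
}
```

On a parse failure, reporting `furthest` and `furthest_rule` gives the "furthest failed position plus the rule above it" with almost no bookkeeping; a full rule stack would need each combinator to push and pop its name around the calls it makes.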
On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
My friend, reporting the furthest position examined by the parser is something I have found useful in error cases, as a simple stopgap when using a combinator approach. Thinking about it, what you really want to see is the furthest failed position and the stack of rules above it. That requires meta-information when the code is written in the most natural way. For this reason and others, I believe it is good to represent your grammar in data structures, which is a further step in the direction of a compiler-compiler tool (or compiler-interpreter tool).
Thanks. I've done some further investigation. I built my parsers following two papers: Hutton and Meijer, Monadic Parser Combinators
https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
and Hutton, Higher-Order Functions for Parsing
https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf
The first adds error reporting using monad transformers. [...]
But the second paper does it differently, and maybe in a way I can adopt
more easily. It redefines the parsers to no longer produce a list of
results, so there is no longer support for ambiguity. Then it defines
them to return a Maybe,

maybe * ::= Fail [char] | Error [char] | OK *

where the OK branch holds the parse tree, and Fail and Error both carry
an error message. It describes how a Fail can be transformed into an
Error, but it isn't entirely clear where the messages get injected.
I still need to do some thinking on it, but I believe I can rewrite the
parsers to follow this model and then decorate my grammar with possible
errors at each node.
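For reference, the Fail/Error/OK scheme from the paper could be transcribed into C (the project's implementation language) roughly as follows. All names are hypothetical; this is a sketch of the idea, not pcomb's design:

```c
/* Three-valued parse result after Hutton's second paper: FAIL is a
   recoverable failure (alternation may try another branch), ERROR is
   a committed failure carrying a message, OK carries the result.
   Illustrative names only. */

enum status { FAIL, ERROR, OK };

struct reply {
    enum status status;
    const char *msg;   /* for FAIL / ERROR */
    int value;         /* for OK; a real parser would hold a tree */
};

static struct reply ok(int v) {
    struct reply r = { OK, 0, v };
    return r;
}

static struct reply fail(const char *msg) {
    struct reply r = { FAIL, msg, 0 };
    return r;
}

/* Promote a recoverable FAIL to a hard ERROR, injecting the
   grammar-supplied message. This is one plausible place where
   "decorating the grammar with possible errors" hooks in: each
   decorated node wraps its body in a commit with its own message. */
static struct reply commit(struct reply r, const char *msg) {
    if (r.status == FAIL) {
        r.status = ERROR;
        r.msg = msg;
    }
    return r;
}
```

A choice combinator would then try its second alternative on FAIL but propagate ERROR unchanged, so a committed branch reports its decorated message instead of silently backtracking.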
Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
to pass a language id token between tokenizer and parser.
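The language-id idea might look roughly like this in C. The keyword lists below are abbreviated, and the dialect assignments (e.g. "const", "volatile", "signed", and "void" lexing as keywords only under C90) are my approximation for illustration, not a checked history:

```c
/* Sketch of a shared dialect tag between tokenizer and parser:
   words that became keywords only in later dialects lex as plain
   identifiers under the earlier grammars. Lists abbreviated;
   hypothetical names throughout. */
#include <string.h>

enum dialect { C1975, KR1, C90 };
enum token   { IDENT, KEYWORD };

static int in_list(const char **list, const char *w) {
    for (; *list; list++)
        if (strcmp(*list, w) == 0)
            return 1;
    return 0;
}

static enum token classify_word(enum dialect d, const char *w) {
    /* keywords common to all three dialects (abbreviated) */
    static const char *common[]    = { "int", "char", "if", "while",
                                       "return", 0 };
    /* assumed C90-only additions (abbreviated, approximate) */
    static const char *c90_extra[] = { "const", "volatile", "signed",
                                       "void", 0 };
    if (in_list(common, w))
        return KEYWORD;
    if (d == C90 && in_list(c90_extra, w))
        return KEYWORD;
    return IDENT;
}
```

With this arrangement the three grammars can share most rules, while the token stream itself differs per dialect; a 1975-mode program is free to use `const` as a variable name.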
P.S. The prototype is written in PostScript extended with function syntax.
https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript
--
l droog
[Why Postscript? I realize it's Turing complete, but it seems odd to run one's parser on a printer. -John]
But the language itself I just really enjoy. It's my "Lego blocks"
language. The RPN syntax removes all ambiguity about precedence and sequencing.
[Why Postscript? I realize it's Turing complete, but it seems odd to run one's parser on a printer. -John]
I discovered PostScript around '97 or '98. I was taking Computer Graphics,
and it was in an appendix to the textbook (Salman). At the same time
I was editor of the Honors College student magazine, so it really piqued
my interest as a graphics and typography language. ...
[Take a look at Forth. Many of the same advantages, runs a lot more places. -John]
luser droog <mijoryx@yahoo.com.dmarc.email> wrote:
[PostScript]
But the language itself I just really enjoy. It's my "Lego blocks" language. The RPN syntax removes all ambiguity about precedence and sequencing.
I recently had the doubtful pleasure of evaluating the formula
x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
+(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))
in Postscript. (Yes, really. Don't ask.)
How separate should these be? Should they be complete
separate grammars, or more piecewise selection? ...
[Really, it's up to you. My inclination would be to make them
separate but use some sort of macro setup so you can insert
common pieces into each of the grammars. -John]
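John's macro suggestion has a function-pointer analogue in C: keep one table of rule pointers per dialect, where most slots point at shared rules and only the slots that genuinely differ are overridden. A hypothetical sketch, not pcomb's actual structure:

```c
/* Sketch of piecewise grammar selection: each dialect is a table of
   rule pointers, sharing common rules and overriding the rest. The
   stub "rules" below just return a tag so the wiring is visible;
   real entries would be parser combinators. Hypothetical names. */

typedef int (*rule_fn)(const char *input);

/* Shared and dialect-specific rule stubs. */
static int expr_common(const char *in) { (void)in; return 1; }
static int decl_kr(const char *in)     { (void)in; return 2; }
static int decl_c90(const char *in)    { (void)in; return 3; }

struct grammar {
    rule_fn expr;   /* identical across dialects */
    rule_fn decl;   /* differs: K&R vs C90 declarations */
};

static const struct grammar kr1 = { expr_common, decl_kr  };
static const struct grammar c90 = { expr_common, decl_c90 };
```

Because both tables reference the same `expr_common`, a fix to the shared rule benefits every dialect at once, while the differing declaration rules stay cleanly separated; this gives much of the maintainability of "one grammar with conditionals" without interleaving version tests through the rule bodies.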
My impression is that the variation among Pascal dialects is larger than among C dialects, which argues for a unified parser in the C case, IMHO.
On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:
My impression is that the variation among Pascal dialects is larger than among C dialects, which argues for a unified parser in the C case, IMHO.
Pascal is more fragmented, but it's also much easier to parse than C. I think it's a wash.
(I also think the whole idea is horrifying and ought not to be pursued;
but.)
Elijah Stone <elr...@elronnd.net> wrote:
I did a C parser; it was not hard at all. In C (as in standard
Pascal) there are conflicts, but those conflicts can be resolved
easily using semantic info. Alternatively, for C one can use two-token
lookahead.
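The two-token lookahead mentioned here needs only a small peek buffer over the token stream. A minimal illustrative sketch (the token source is just an array here, and all names are made up):

```c
/* Sketch of two-token lookahead: the parser can inspect token k
   (k = 0 or 1) ahead of the cursor without consuming it. */

struct tokstream {
    const int *toks;   /* stand-in token codes */
    int len, pos;
};

/* Peek at the k-th pending token; -1 plays the role of EOF. */
static int peek(struct tokstream *s, int k) {
    int i = s->pos + k;
    return i < s->len ? s->toks[i] : -1;
}

/* Consume and return the next token. */
static int next(struct tokstream *s) {
    return s->pos < s->len ? s->toks[s->pos++] : -1;
}
```

With `peek(s, 1)` available, a rule can, for instance, look past an identifier to the following token before deciding between the declaration and expression alternatives, instead of threading semantic information into the lexer.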