There are many data formats which contain things like this:
A number, N
N occurrences of something
For example, 3 followed by the names of three students:
3
John Doe
Sally Smith
Judy Jones
I have a question about parsing such data. Is it the job of a parser to ensure
that the number of student names matches the number? Or, is it the job of the
parser to merely tokenize whatever is in the input and then create an abstract
syntax tree containing the tokens?
I imagine you will tell me, "it depends". But what is typically the case?
It is almost always done in the AST creation routines. Not only do you
generally get better error messages that way, as our insightful
moderator mentioned, but, curiously, the feature itself (extract a
number, turn it into a count, and apply that count to determine how
many items a list contains; and yes, those might be three distinct
operations) has not been implemented in any parser generator or lexer
generator that I have ever seen. That's a bizarre omission,
particularly since counted lists are a common feature of many
languages, such as networking protocols. Fixed counts aren't rare, but
a count held in a "register" or "variable" seems not to be supported.
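
To make that concrete, here is a minimal sketch in Python of checking the count in the AST creation step rather than in the grammar. It assumes the input is one count line followed by one name per line, as in the student example. Every name in it (StudentList, CountMismatch, parse_student_list, build_student_list) is made up for illustration and does not come from any parser generator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StudentList:
    """Hypothetical AST node: the declared count plus the names that followed."""
    declared_count: int
    names: List[str]


class CountMismatch(ValueError):
    """Raised when the declared count and the number of names disagree."""


def build_student_list(declared: int, names: List[str]) -> StudentList:
    # The check lives in the AST creation routine, so the error message
    # can name both the declared count and what was actually found.
    if declared != len(names):
        raise CountMismatch(
            f"count says {declared} names, but found {len(names)}"
        )
    return StudentList(declared, names)


def parse_student_list(text: str) -> StudentList:
    # "Parsing" here only splits the input into a count line and name lines
    # (blank lines are ignored); it does not enforce the count itself.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty input: expected a count line")
    try:
        declared = int(lines[0])
    except ValueError:
        raise ValueError(f"line 1: expected a count, got {lines[0]!r}") from None
    return build_student_list(declared, lines[1:])
```

With the three-student input from the question this returns a StudentList holding the three names. If one name is dropped, it raises CountMismatch with a message that reports both numbers, which is the kind of error message a purely grammatical failure rarely gives you.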