I've been lately dealing with a parser generator for LL grammars, and since it's inception I've always been blindy assuming the first element read from within the input file is going to be the start symbol or starting rule.
So I've been wondering all this time, just out of curiosity, if there exists a
method or algorithm to find out the start symbol of a given grammar?
While there would exist grammars we could recursively check to find out which it's start symbol is (i.e.: it's the only rule that used the rest of them, where checking every other resulted in dangling rules that weren't even called
in), there might be other grammars for which more than one rule yields full coverage (all of these obviously defining different languages) and so leading to ambiguity.
I only contemplate a simple coverage test, even though other techniques could exist, again, all of them leading to a point where we couldn't ascertain if one or the other is what the user meant.
So I'm wondering if this is even an issue in production-grade parser-generators out there?
Hey guys,
I've been lately dealing with a parser generator for LL grammars, and since it's inception I've always been blindy assuming the first element read from within the input file is going to be the start symbol or starting rule.
So I've been wondering all this time, just out of curiosity, if there exists a
method or algorithm to find out the start symbol of a given grammar?
I guess the answer is no.
While there would exist grammars we could recursively check to find out which it's start symbol is (i.e.: it's the only rule that used the rest of them, where checking every other resulted in dangling rules that weren't even called
in), there might be other grammars for which more than one rule yields full coverage (all of these obviously defining different languages) and so leading to ambiguity.
I only contemplate a simple coverage test, even though other techniques could exist, again, all of them leading to a point where we couldn't ascertain if one or the other is what the user meant.
While there would exist grammars we could recursively check to find out which it's start symbol is (i.e.: it's the only rule that used the rest of them, where checking every other resulted in dangling rules that weren't even called
in), there might be other grammars for which more than one rule yields full coverage (all of these obviously defining different languages) and so leading to ambiguity.
Surely, the start symbol of a context-free grammar is one which appears
only in the left hand side of a rule. If there is such a unique symbol,
it must be /the/ start symbol.
On the face of it, this problem does not smell of undecidability, or
even NP completeness. Where do you suspect is the difficulty?
Whether rules are dangling is also a graph property: is the graph
connected.
As Dodi noted, this is basically a graph analysis problem and the
graph may be disconnected (a forest). And our moderator has added
several insightful comments. E.g. you can "declare" a start symbol
and if not present default to some symbol, either the first one in the grammar, or some symbol from which all other symbols are reachable
(presuming the graph isn't disconnected), and the start symbol can be recursively defined, etc.
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 293 |
Nodes: | 16 (2 / 14) |
Uptime: | 242:06:45 |
Calls: | 6,624 |
Files: | 12,175 |
Messages: | 5,320,145 |