Question: Beyond the fact that bison doesn't like them, why are ambiguous >grammars usually bad?
My answer: Ambiguous grammars make it hard for users of the grammar to write >correct instances of the grammar. Stated another way, ambiguous grammars make >it easy to introduce errors into instances of the grammar.
Question: Opine about why languages are usually defined and implemented with >ambiguous grammars.
My answer: Writing a grammar that completely avoids ambiguities might result >in a grammar that is nightmarishly complex. It is easier/better to create a >simple grammar and then add rules/descriptions which explain and resolve the >ambiguities.
I am reading the (excellent) book "flex & bison" by John Levine. Chapter 7 talks about conflicts in grammars: shift/reduce and reduce/reduce conflicts.
At the end of the chapter are a few exercises/questions. I'd like to check with you on whether my answers to the questions are accurate and complete.
Question: Opine about why languages are usually defined and implemented with ambiguous grammars.
My answer: Writing a grammar that completely avoids ambiguities might result in a grammar that is nightmarishly complex. It is easier/better to create a simple grammar and then add rules/descriptions which explain and resolve the ambiguities.
As far as I know, the main cause of ambiguous grammar in programming languages
is the nested if-then-optional-else structure. If you require else, then it isn't ambiguous,
but people like the optional else. That usually comes out as a shift-reduce conflict,
and parser generators know how to handle that.
But you already have the reply from John Levine ...
[See previous message, where we fixed that with "fi" in 1968. -John]
Question: Opine about why languages are usually defined and implemented with ambiguous grammars.
On 2021-12-16, Roger L Costello wrote:
Question: Opine about why languages are usually defined and implemented withBut they aren't.
ambiguous grammars.
Languages are processed as a stream of characters or tokens, with hidden rules about how those relate together and the meaning that emerges.
All of the rules are hidden, including the entire grammar.
If you're only aware of some of the hidden rules, but not others, then
you see ambiguity.
But if you're only aware of some of the hidden rules, but not others,
then you are not working with the correct language.
For instance, I don't know of any mainstream language in which if/else
is actually ambiguous. They have a hidden rule like that the else goes
with the closest preceding if statement.
I also believe there is one more element at play: mathematics. People
study mathematics in school, and those who go on to do programming tend
to be ones who were more exposed to it or paid more attention.
People who are programmers actually had a first contact with formal
syntax in mathematics.
The conflation between syntax and semantics may ultimately come from
that place. Mathematicians design their notations deliberately, in such
ways that when they manipulate symbols, while observing certain rules,
they are actually preserving semantics. The notation directly enables semantically meaningful manipulation, as a tool of thought.
On Wednesday, December 29, 2021 at 2:28:34 PM UTC-8, Kaz Kylheku wrote:
(snip)
I also believe there is one more element at play: mathematics. People
study mathematics in school, and those who go on to do programming tend
to be ones who were more exposed to it or paid more attention.
This reminds me of learning associativity of exponentiation (**)
in Fortran IV (I believe it isn't in the Fortran 66 standard) before I learned it in algebra class. I suspect that there are others I learned
from programming before learning them in math class
People who are programmers actually had a first contact with formal
syntax in mathematics.
The conflation between syntax and semantics may ultimately come from
that place. Mathematicians design their notations deliberately, in such
ways that when they manipulate symbols, while observing certain rules,
they are actually preserving semantics. The notation directly enables
semantically meaningful manipulation, as a tool of thought.
I suspect that people learn some things in the first programming language that they learn, and then expect it to be the same in others. When it isn't, people get surprised or confused.
When I started with unix, I learned csh programming, and mostly
avoided sh (and successors). One reason for that is, as well as I
knew at the time, differences in the syntax and semantics of them.
[Fortran has always had ** exponentiation, starting with the original
version in 1956. It always bound tighter than +-*/ but wasn't
associative, A**B**C not allowed, -John]
On Wednesday, December 29, 2021 at 11:28:34 PM UTC+1, Kaz Kylheku wrote:
On 2021-12-16, Roger L Costello wrote:
Question: Opine about why languages are usually defined and implemented withBut they aren't.
ambiguous grammars.
Languages are processed as a stream of characters or tokens, with hidden
rules about how those relate together and the meaning that emerges.
All of the rules are hidden, including the entire grammar.
If you're only aware of some of the hidden rules, but not others, then
you see ambiguity.
But if you're only aware of some of the hidden rules, but not others,
then you are not working with the correct language.
For instance, I don't know of any mainstream language in which if/else
is actually ambiguous. They have a hidden rule like that the else goes
with the closest preceding if statement.
When designing a grammar and implementing a parser: the grammar can
either be unambiguous by design or unambiguous by accident. The
viewpoint that "there isn't any mainstream language in which if/else
is actually ambiguous" is actually the latter option: unambiguous by accident.
A primary reason why grammars in many mainstream languages (that don't
have a parser generated straight from a verified grammar) are
unambiguous isn't intentional design, but rather it is a consequence
of the fact that those parsers are directly implemented in a language
that is executing statements/expressions/instructions without
verifying consequences of the executions. Some examples of languages
with such execution properties: assembly language (such as: i386,
ARM), C, Haskell.
Contrary to the accidental approach, a parser
generator by design cares about consequences and it is verifying that
the specification actually is unambiguous despite the fact that in the
end the parser gets compiled down into machine instructions.
Verification means to search the whole search space (or at least a
reasonably large subspace of it) - but asm/C/Haskell will run a search
only if it is explicitly (step by step) forced by the programmer to
perform a search.
indeterminate. Early Fortran allowed the compiler to compile any mathematically equivalent, not just numerically equivalent, version
of an expression so A*B+A*C could turn into A*(B+C) which was great
for optimizing, not so much for predictable results. -John]
When I started programming from nothing, I saw BASIC examples in a
book which was doing things like:
10 X = 2
20 X = X + 1
The only language with formulas that I was coming from was math.
So, I thought, what? How can X be equal to X + 1; you cannot solve
this absurdity!
From then I knew that the people who program computers to understand
symbols are free thinkers who make them mean anything they want.
This reminds me of learning associativity of exponentiation (**)
in Fortran IV (I believe it isn't in the Fortran 66 standard) before I learned it in algebra class. I suspect that there are others I learned
from programming before learning them in math class
[Fortran has always had ** exponentiation, starting with the original
version in 1956. It always bound tighter than +-*/ but wasn't
associative, A**B**C not allowed, -John]
[Interesting take. In reality, of couse, BASIC borrowed that from Fortran. Algol
used := for assignment, different from = for equality comparison. -John]
Sysop: | Keyop |
---|---|
Location: | Huddersfield, West Yorkshire, UK |
Users: | 286 |
Nodes: | 16 (2 / 14) |
Uptime: | 91:11:46 |
Calls: | 6,496 |
Calls today: | 7 |
Files: | 12,100 |
Messages: | 5,277,688 |