There is no need to memorize undefined behaviours for a language -
indeed, such a thing is impossible since everything not defined by a
language standard is, by definition, undefined behaviour. (C and C++
are not special here - the unusual thing is just that their standards
say this explicitly.)
The real challenge from big languages and big standard libraries is not /writing/ code, it is /reading/ it. It doesn't really matter if a C programmer, when writing some code, does not know what the syntax "void foo(int a[static 10]);" means. (Most C programmers don't know it, and
never miss it.) But it can be a problem if they have to read and
understand code that uses something they don't know.
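For readers who have not met the "static" array-parameter syntax mentioned above, here is a minimal sketch of what it expresses. The function name sum_first_ten is made up for illustration and does not come from the thread.

```c
#include <stddef.h>

/* "static 10" is a promise from the caller to the callee: `a` points to
   the first element of an array holding at least 10 elements.
   Passing a shorter array or a null pointer is undefined behaviour;
   a compiler may warn about it but is not required to. */
int sum_first_ten(const int a[static 10])
{
    int total = 0;
    for (size_t i = 0; i < 10; i++) {
        total += a[i];
    }
    return total;
}
```

A caller would write something like "int v[10] = {0}; sum_first_ten(v);". The parameter is still an ordinary pointer; the keyword only adds a guarantee the compiler may exploit.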
The trick is to memorize the /defined/ behaviours, and stick to them.
David Brown <david.brown@hesbynett.no> wrote:
There is no need to memorize undefined behaviours for a language -
indeed, such a thing is impossible since everything not defined by a language standard is, by definition, undefined behaviour. (C and C++
are not special here - the unusual thing is just that their standards
say this explicitly.)
This is a rather C-centric view of things. The Fortran standard
uses a different model.
There are constraints, which are numbered. Any violation of such
a constraint needs to be reported by the compiler ("processor",
in Fortran parlance). If it fails to do so, this is a bug in
the compiler.
There are also phrases which have "shall" or "shall not". If this
is violated, this is an error in the program. Catching such a
violation is a good thing from a quality-of-implementation standpoint,
but is not required. Many run-time errors such as array overruns
fall into this category.
[...]
The real challenge from big languages and big standard libraries is not
/writing/ code, it is /reading/ it. It doesn't really matter if a C
programmer, when writing some code, does not know what the syntax "void
foo(int a[static 10]);" means. (Most C programmers don't know it, and
never miss it.) But it can be a problem if they have to read and
understand code that uses something they don't know.
Agreed.
On 06/01/2022 08:11, David Brown wrote:
The trick is to memorize the /defined/ behaviours, and stick to them.
Isn't the set of defined behaviours bigger than the set
of undefined behaviours? How do you know what is defined
if you don't know what is undefined?
For example, a = b + c is precisely defined in C and C++ for
floating point variables, but the result can be "undefined behaviour"
for ordinary 32 bit signed integer values.
If you want to stick to defined behaviours then you need
to add extra code. For example, CERT recommends:
    if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
        ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
    } else {
        sum = si_a + si_b;
    }
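The CERT-style check quoted above can be wrapped in a small helper so that the test runs before the addition ever happens. This is only a sketch: the name checked_add and the bool return convention are assumptions made for illustration, not part of the CERT text.

```c
#include <limits.h>
#include <stdbool.h>

/* Adds si_a and si_b without ever evaluating an overflowing signed
   addition. Returns true and stores the result in *sum on success;
   returns false and leaves *sum untouched if the sum would not fit
   in an int. */
bool checked_add(int si_a, int si_b, int *sum)
{
    if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
        ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        return false;               /* would overflow: caller handles it */
    }
    *sum = si_a + si_b;             /* defined: the range was checked above */
    return true;
}
```

The point of the ordering is that the comparisons themselves cannot overflow: INT_MAX - si_b is only evaluated when si_b is positive, and INT_MIN - si_b only when si_b is negative.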
On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
David Brown <david.brown@hesbynett.no> wrote:
There is no need to memorize undefined behaviours for a language -
indeed, such a thing is impossible since everything not defined by a
language standard is, by definition, undefined behaviour. (C and C++
are not special here - the unusual thing is just that their standards
say this explicitly.)
This is a rather C-centric view of things. The Fortran standard
uses a different model.
There are constraints, which are numbered. Any violation of such
a constraint needs to be reported by the compiler ("processor",
in Fortran parlance). If it fails to do so, this is a bug in
the compiler.
There are also phrases which have "shall" or "shall not". If this
is violated, this is an error in the program. Catching such a
violation is a good thing from quality of implementation standpoint,
but is not required. Many run-time errors such as array overruns
fall into this category.
This seems to me exactly like the C model. What difference do you see ?
Undefined behaviour, as far as language standards are concerned, are omnipresent in programming - for all languages.
Spiros Bousbouras <spibou@gmail.com> wrote:
On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
This is a rather C-centric view of things. The Fortran standard
uses a different model.
There are constraints, which are numbered. Any violation of such
a constraint needs to be reported by the compiler ("processor",
in Fortran parlance). If it fails to do so, this is a bug in
the compiler.
There are also phrases which have "shall" or "shall not". If this
is violated, this is an error in the program. Catching such a
violation is a good thing from quality of implementation standpoint,
but is not required. Many run-time errors such as array overruns
fall into this category.
This seems to me exactly like the C model. What difference do you see ?
First, I see a difference in result. Highly intelligent and
knowledgeable people argue vehemently if a program should be able
to use undefined behavior or not, and a lot of vitriol is directed
against compiler writers who use the assumption that undefined
behavior cannot happen in their compilers for optimization,
especially if it turns out that existing code was broken and no
longer works after a compiler upgrade (just read a few of Linus
Torvalds's comments on that matter).
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
    #include <unistd.h>
    #include <string.h>
    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }
not more and not less erroneous than
    int main()
    {
        int *p = 0;
        *p = 42;
    }
whereas I would argue that there is an important difference between
the two.
If the C standard replaced "the behavior is undefined" with "the
program is in error, and the subsequent behavior is undefined"
or something along those lines, the discussion would be much
muted.
(Somebody may point out to me that this is what the standard is
actually saying. If so, that would sort of reinforce my argument
that it should be clearer :-)
On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
Spiros Bousbouras <spibou@gmail.com> wrote:
On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
This is a rather C-centric view of things. The Fortran standard
uses a different model.
There are constraints, which are numbered. Any violation of such
a constraint needs to be reported by the compiler ("processor",
in Fortran parlance). If it fails to do so, this is a bug in
the compiler.
There are also phrases which have "shall" or "shall not". If this
is violated, this is an error in the program. Catching such a
violation is a good thing from quality of implementation standpoint,
but is not required. Many run-time errors such as array overruns
fall into this category.
This seems to me exactly like the C model. What difference do you see ?
First, I see a difference in result. Highly intelligent and
knowledgeable people argue vehemently if a program should be able
to use undefined behavior or not, and a lot of vitriol is directed
against compiler writers who use the assumption that undefined
behavior cannot happen in their compilers for optimization,
especially if it turns out that existing code was broken and no
longer works after a compiler upgrade (just read a few of Linus
Torvalds's comments on that matter).
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
The C standard is in no position to say that some programme is in
error. This would require near omniscience from the standard
writers.
Spiros Bousbouras <spibou@gmail.com> wrote:
On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
Thomas Koenig <tkoenig@netcologne.de> wrote:
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
The C standard is in no position to say that some programme is in
error. This would require near omniscience from the standard
writers.
A standard (or other specification document) is certainly able to
state that some construct is in error. To grab an often-quoted
example:
J3/18-007r1, the Fortran 2018 interpretation document, states in
subclause 9.5.3, "Array elements and array sections",
# The value of a subscript in an array element shall be within the
# bounds for its dimension.
No omniscience required to write or understand that sentence.
This puts the burden on the programmer. The compiler might catch
such an error and abort the program, or other unpredictable
things such as overwriting an unrelated variable might also happen.
Reading a language standard can be hard. Quite often, information
is scattered throughout the text and needs to be pieced together
to find the necessary information, especially definition of terms
which are crucial to understanding. Most programmers do not
read standards (at least final committee drafts can usually be
found these days on the Internet), but compiler writers should at
least be familiar with what they are implementing.
Programmers often rely on books, but these can also get things wrong.
Because programmers are human, they also can get ticked off when being
told that a construct they have used for years has been illegal
for decades :-|
Spiros Bousbouras <spibou@gmail.com> wrote:
This seems to me exactly like the C model. What difference do you see ?
First, I see a difference in result. Highly intelligent and
knowledgeable people argue vehemently if a program should be able
to use undefined behavior or not, and a lot of vitriol is directed
against compiler writers who use the assumption that undefined
behavior cannot happen in their compilers for optimization,
especially if it turns out that existing code was broken and no
longer works after a compiler upgrade (just read a few of Linus
Torvalds's comments on that matter).
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
#include <unistd.h>
#include <string.h>
int main()
{
char a[] = "Hello, world!\n";
write (1, a, strlen(a));
return 0;
}
not more and not less erroneous than
int main()
{
int *p = 0;
*p = 42;
}
whereas I would argue that there is an important difference between
the two.
If the C standard replaced "the behavior is undefined" with "the
program is in error, and the subsequent behavior is undefined"
or something along those lines, the discussion would be much
muted.
(Somebody may point out to me that this is what the standard is
actually saying. If so, that would sort of reinforce my argument
that it should be clearer :-)
[Fortran has in principle historically allowed rather aggressive optimization, e.g., A*B+A*C can turn into A*(B+C). On the other hand, in the real world, when IBM improved their optimizing compiler Fortran H into Fortran X, the developers said any new optimization had to produce bit identical results
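To see why rewriting A*B+A*C as A*(B+C) is not a harmless transformation in floating point, here is a small sketch. It assumes IEEE 754 double arithmetic and a compiler that is not run in a "fast math" mode; the function names are invented for illustration, and the volatile temporaries are only there to stop the compiler from rearranging the expressions itself.

```c
#include <float.h>

/* Computes a*b + a*c exactly as written, one rounded operation at a time. */
double distributed(double a, double b, double c)
{
    volatile double ab = a * b;
    volatile double ac = a * c;
    return ab + ac;
}

/* Computes the algebraically equivalent a*(b + c). */
double factored(double a, double b, double c)
{
    volatile double s = b + c;   /* may overflow even when a*b + a*c does not */
    return a * s;
}
```

With a = 0.5 and b = c = DBL_MAX, the first form yields DBL_MAX, because each product is exactly DBL_MAX/2. The second form overflows to infinity in the intermediate sum b + c. The two expressions are equal over the reals but not over doubles, which is why a requirement of bit-identical results rules such rewrites out.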
David Brown <david.brown@hesbynett.no> writes:
Undefined behaviour, as far as language standards are concerned, are
omnipresent in programming - for all languages.
Please prove this astounding assertion. My impression is that managed languages define everything, at least to some extent, and leave
nothing undefined. If they allowed nasal demons, the appeal of
managed languages would evaporate instantly.
The big question here, is why do you think Fortran is any different? In theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C in
regards to undefined behaviour.
(And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
#include <unistd.h>
#include <string.h>
int main()
{
char a[] = "Hello, world!\n";
write (1, a, strlen(a));
return 0;
}
C does not have a "write" function in the standard library. So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.
It just means it is defined
elsewhere, not in the C standards.
I see C conflating two separate concepts: Program errors and
behavior that is outside the standard. "Undefined behavior is
always a programming error" does not work; that would make
#include <unistd.h>
#include <string.h>
int main()
{
char a[] = "Hello, world!\n";
write (1, a, strlen(a));
return 0;
}
David Brown <david.brown@hesbynett.no> wrote:
The big question here, is why do you think Fortran is any different? In
theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C in
regards to undefined behaviour.
I am not sure how to better explain it. I will try a bit, but
this will be my last reply to you in this thread. We seem to have
a fundamental difference in our understanding, and seem to be
unable to resolve it.
(And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)
Sort of.
At the risk of boring most readers of this group, a very short, but (hopefully) pertinent introduction of how modern compilers work:
There is no compiler (if you mean a single binary) that handles both
C and Fortran. They are separate front ends to common middle
and back ends.
C does not have a "write" function in the standard library. So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.
When interpreting a language standard, you _must_ follow the
definitions in the standards if they exist, you cannot use everyday interpretations.
Subclause 3.4.3 (N2596) defines
# undefined behavior
# behavior, upon use of a nonportable or erroneous program
# construct or of erroneous data, for which this document imposes
# no requirements
write() is nonportable and the C standard imposes no requirements
on it. Therefore, the program above invokes undefined behavior.
David Brown <david.brown@hesbynett.no> writes:
Undefined behaviour, as far as language standards are concerned, are omnipresent in programming - for all languages.
Please prove this astounding assertion. My impression is that managed languages define everything, at least to some extent, and leave
nothing undefined. If they allowed nasal demons, the appeal of
managed languages would evaporate instantly.
On 10/01/2022 13:04, Thomas Koenig wrote:
David Brown <david.brown@hesbynett.no> wrote:
The big question here, is why do you think Fortran is any different? In theory, there isn't a difference - nothing you have said here convinces
me that there is any fundamental difference between Fortran and C in
regards to undefined behaviour.
I am not sure how to better explain it. I will try a bit, but
this will be my last reply to you in this thread. We seem to have
a fundamental difference in our understanding, and seem to be
unable to resolve it.
Fair enough. Maybe in a future discussion, one of us will have an
"Aha!" moment and understand the other's viewpoint, and progress will be
made - until then, there's no point in going around in circles. I'll
snip bits of your post here, and try to minimise new points (unless I
get that "Aha!") - but be sure I am reading and appreciating your entire post.
(And there's no difference in the
implementations - the most commonly used Fortran compilers also handle
C, C++, and perhaps other languages.)
Sort of.
At the risk of boring most readers of this group, a very short, but
(hopefully) pertinent introduction of how modern compilers work:
There is no compiler (if you mean a single binary) that handles both
C and Fortran. They are separate front ends to common middle
and back ends.
Yes. But it is the middle end that handles most of the optimisations, including those based on undefined behaviour. The front end determines whether code can have undefined behaviour and in what circumstances.
C does not have a "write" function in the standard library. So the
behaviour of "write" is not defined by the C standards - but that does
not mean the behaviour is undefined.
When interpreting a language standard, you _must_ follow the
definitions in the standards if they exist, you cannot use everyday
interpretations.
Subclause 3.4.3 (N2596) defines
# undefined behavior
# behavior, upon use of a nonportable or erroneous program
# construct or of erroneous data, for which this document imposes
# no requirements
write() is nonportable and the C standard imposes no requirements
on it. Therefore, the program above invokes undefined behavior.
No. (As always, this is based on my interpretation of the standards -
consider everything to have "IMHO" attached.) The implementation of
"write" is outside the scope of the standards, and is therefore
undefined as far as the standards are concerned. That does not make it undefined behaviour in the program - it just means the standards don't
say what "write" should do.
This leaves a lot of room for Fortran and C to have entirely different defined/undefined behaviors.
Even the front end for one single language can have a lot of switches affecting what is defined or not.
On 2022-01-08, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
David Brown <david.brown@hesbynett.no> writes:
Undefined behaviour, as far as language standards are concerned, are omnipresent in programming - for all languages.
Please prove this astounding assertion. My impression is that managed
languages define everything, at least to some extent, and leave
nothing undefined. If they allowed nasal demons, the appeal of
managed languages would evaporate instantly.
The Lisp-like programming language Scheme has unspecified order of
argument evaluation. And you can stuff side effects into argument expressions, like in C.
Its built-in imperative have undefined return values.
ANSI Common Lisp leaves the effects undefined of modifying literals,
just like C. ANSI Lisp code that perpetrates some kind of error is
safe only if compiled in safe mode; if you compile with reduced safety,
e.g. (declare (optimize (safety 0))), then errors become undefined
behavior, including type errors. If you declare that some quantity is
a fixnum integer, and request safety 0 speed 3, and then it turns
out that it's other than an integer, woe to that code.
However, in these cases you're invoking the safety escape hatch;
it's not like C where you are shackled by chains of undefined behavior
which make themselves felt every time you squirm.
On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:
(big snip)
This leaves a lot of room for Fortran and C to have entirely different
defined/undefined behaviors.
Even the front end for one single language can have a lot of switches
affecting what is defined or not.
I suppose so. But more usual, the compiler works to the least
common denominator.
For one, C requires static variables, and especially external ones, to initialize to zero, but Fortran doesn't. Fortran compilers that use C compiler middle and back ends, tend to zero such variables.
I suspect that there are many more that I don't know about.
As long as the cost is small, and it satisfies both standards,
not much reason not to do it.
Fortran has stricter rules on aliasing than C. I don't actually know
about any effect on C programs, though, but it might be that
compilers do the same for C.
gah4 <gah4@u.washington.edu> wrote:
On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:
For one, C requires static variables, and especially external ones, to
initialize to zero, but Fortran doesn't. Fortran compilers that use C
compiler middle and back ends, tend to zero such variables.
This is more a matter of operating system and linker conventions
than of compilers.
Looking at the ELF standard, one finds
.bss
This section holds uninitialized data that contribute to the program's
memory image. By definition, the system initializes the data with zeros
when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.
which, unsurprisingly, matches exactly what C is doing.
Anybody who writes a Fortran compiler for an ELF system will
use .bss for COMMON blocks, because it is easiest. Initialization
with zeros then happens automatically.
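On the C side, the guarantee being discussed looks like the sketch below: objects with static storage duration and no explicit initializer start out as zero, whatever section the implementation happens to place them in. The variable and function names are illustrative only.

```c
#include <stddef.h>

static int call_count;      /* no initializer: the C standard requires 0 */
static double table[4];     /* every element is required to start at 0.0 */

/* Returns 1 on the first call, 2 on the second, and so on. */
int next_call_number(void)
{
    return ++call_count;
}

/* Sums the table; yields 0.0 as long as nothing has written to it. */
double table_sum(void)
{
    double total = 0.0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        total += table[i];
    }
    return total;
}
```

On an ELF system such objects would typically end up in .bss, so the zeroing costs nothing in the executable file. That is an implementation detail rather than something the language standard says.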
I suspect that there are many more that I don't know about.
As long as the cost is small, and it satisfies both standards,
not much reason not to do it.
Fortran has stricter rules on aliasing than C. I don't actually know
about any effect on C programs, though, but it might be that
compilers do the same for C.
The rules are different, and unless C is the intermediate language,
a good compiler will hand the corresponding hints to the middle end.
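As a rough C-side illustration of such aliasing hints: C99's "restrict" qualifier lets the programmer promise, per pointer, roughly what Fortran assumes for dummy arguments by default. The function below is a made-up example, not taken from any compiler or from the thread.

```c
#include <stddef.h>

/* The restrict qualifiers promise that, during the call, the arrays
   reached through out, x and y do not overlap. A compiler may then
   vectorize or reorder the loads and stores freely. Calling this with
   overlapping arrays breaks the promise and is undefined behaviour. */
void add_arrays(size_t n,
                double *restrict out,
                const double *restrict x,
                const double *restrict y)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = x[i] + y[i];
    }
}
```

Without the qualifiers, a C compiler has to allow for out overlapping x or y, unless it can prove otherwise or inserts a run-time overlap check.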
[I have used Fortran systems that initialized otherwise undefined
data to a value that would trap, to help find use-before-set errors.
-John]