• Re: what is defined, was for or against equality

    From Thomas Koenig@21:1/5 to David Brown on Thu Jan 6 16:43:05 2022
    David Brown <david.brown@hesbynett.no> schrieb:

    There is no need to memorize undefined behaviours for a language -
    indeed, such a thing is impossible since everything not defined by a
    language standard is, by definition, undefined behaviour. (C and C++
    are not special here - the unusual thing is just that their standards
    say this explicitly.)

    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    [...]

    The real challenge from big languages and big standard libraries is not /writing/ code, it is /reading/ it. It doesn't really matter if a C programmer, when writing some code, does not know what the syntax "void foo(int a[static 10]);" means. (Most C programmers don't know it, and
    never miss it.) But it can be a problem if they have to read and
    understand code that uses something they don't know.

    Agreed.
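    For reference, a minimal sketch of what that declaration promises (the
    function name is only for illustration):

    void foo(int a[static 10]);
    /* The caller promises that 'a' points to the first element of an
       array of at least 10 ints and is not a null pointer; a compiler
       may warn about calls that visibly break the promise, and may
       optimise on the assumption that it holds. */

    int buf[10];
    /* foo(buf);    fine: at least 10 elements       */
    /* foo(NULL);   breaks the promise - undefined   */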

  • From Martin Ward@21:1/5 to David Brown on Fri Jan 7 14:02:50 2022
    On 06/01/2022 08:11, David Brown wrote:
    The trick is to memorize the /defined/ behaviours, and stick to them.

    Isn't the set of defined behaviours bigger than the set
    of undefined behaviours? How do you know what is defined
    if you don't know what is undefined?

    For example, a = b + c is precisely defined in C and C++ for
    floating point variables, but the result can be "undefined behaviour"
    for ordinary 32 bit signed integer values.

    If you want to stick to defined behaviours then you need
    to add extra code. For example, CERT recommends:

      if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
          ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
      } else {
        sum = si_a + si_b;
      }

    --
    Martin

    Dr Martin Ward | Email: martin@gkc.org.uk | http://www.gkc.org.uk
    G.K.Chesterton site: http://www.gkc.org.uk/gkc | Erdos number: 4

  • From Spiros Bousbouras@21:1/5 to Thomas Koenig on Fri Jan 7 13:21:29 2022
    On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    There is no need to memorize undefined behaviours for a language -
    indeed, such a thing is impossible since everything not defined by a language standard is, by definition, undefined behaviour. (C and C++
    are not special here - the unusual thing is just that their standards
    say this explicitly.)

    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    This seems to me exactly like the C model. What difference do you see ?

    Regarding the more general issue, it seems to me that undefined behaviour is
    a red herring (which I think is the point David was making). Every time one writes code in any language , one must have an expectation on how the code is supposed to behave and some reasoning on why the code they wrote will behave according to their expectations. The reasoning will be based (apart from general rules from logic and mathematics) on what the standard of the programming language specifies (if the language has a standard) , what the translator/compiler documentation specifies , what the documentation of any libraries they use specifies and so forth.

    For example, let's say that I write in C

    int a = INT_MAX + 1 ;

    with the expectation that a will get the value INT_MIN. The onus is on me
    to provide a reasoning why the code above will meet my expectation. If I
    cannot provide such a reasoning then from my point of view the code is
    already undefined. The fact that the C standard also says that the code is undefined is irrelevant. Even if the C standard specified for example that signed integer arithmetic uses wraparound, unless I could point to the place
    in the standard where it said so, the code is still undefined from my point
    of view so I should not use it.

    But let's say that I have the above code and I intend to compile it with
    GCC using the -fwrapv flag. Then my expectation is actually justified
    based on the GCC documentation for what -fwrapv means and the parts
    of the C standard which define what the various symbols in

    int a = INT_MAX + 1 ;

    mean. I'm not going to provide a proof because it should be obvious. But
    any such proof would not need to cite any part of the C standard which explicitly mentions undefined behaviour.
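    (A minimal sketch of that expectation, assuming a GCC-compatible
    compiler; -fwrapv is GCC's documented flag that defines signed
    overflow as two's-complement wrap-around:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int a = INT_MAX + 1 ;  /* defined only because of -fwrapv */
        printf("%d\n", a) ;    /* expected to print INT_MIN       */
        return 0 ;
    }

    compiled as "gcc -fwrapv wrap.c".)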


    The only occasion where an explicit mention of undefined behaviour would be relevant would be if the C standard (or any standard) were contradictory i.e. it said in some place that some construct has a certain defined behaviour and it said in some other place that the same construct has undefined behaviour. But with a popular language like C , if such contradictions existed , they would be caught early and corrected.

  • From David Brown@21:1/5 to Thomas Koenig on Fri Jan 7 12:06:12 2022
    On 06/01/2022 17:43, Thomas Koenig wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    There is no need to memorize undefined behaviours for a language -
    indeed, such a thing is impossible since everything not defined by a
    language standard is, by definition, undefined behaviour. (C and C++
    are not special here - the unusual thing is just that their standards
    say this explicitly.)

    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    C has basically the same concept.

    (IIRC, C++ has a few constraints, such as the "one definition rule",
    where the standard says no diagnostics are necessary, because
    identifying the error would mean the compiler has to see multiple
    translation units at once. Compilers often diagnose these if they have
    some kind of link-time optimisation or program-at-once mode.)


    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    That is the same in C. From 4.2 "Conformance" :

    """
    If a “shall” or “shall not” requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard
    by the words “undefined behavior” or by the omission of any explicit definition of behavior. There is no difference in emphasis among these
    three; they all describe “behavior that is undefined”.
    """

    The only difference I see from what you describe of Fortran (I have not
    read any Fortran standards) is that the C standards also note that
    behaviour that is not defined in the standards is undefined behaviour as
    far as the standards are concerned. That is a tautology, of course, and applies equally to Fortran and any other language.


    It is quite possible that the details of which behaviours are defined or
    not varies between the languages - things like division by 0,
    out-of-bounds array access, etc., may be different. As I understand it, passing aliased pointers or array references as different parameters to
    the same function can lead to undefined behaviour in Fortran, whereas it
    is defined in C (unless you use "restrict").


    [...]

    The real challenge from big languages and big standard libraries is not
    /writing/ code, it is /reading/ it. It doesn't really matter if a C
    programmer, when writing some code, does not know what the syntax "void
    foo(int a[static 10]);" means. (Most C programmers don't know it, and
    never miss it.) But it can be a problem if they have to read and
    understand code that uses something they don't know.

    Agreed.

  • From David Brown@21:1/5 to Martin Ward on Fri Jan 7 15:56:22 2022
    On 07/01/2022 15:02, Martin Ward wrote:
    On 06/01/2022 08:11, David Brown wrote:
    The trick is to memorize the /defined/ behaviours, and stick to them.

    Isn't the set of defined behaviours bigger than the set
    of undefined behaviours? How do you know what is defined
    if you don't know what is undefined?

    You know what is "defined" because you can find the definition for it - everything else is undefined. You could enumerate all defined
    behaviours for a language - after all, the documentation (language
    standards, compiler manual, library documentation, etc.) is finite. It
    doesn't really make sense to try to find how many undefined behaviours
    there are - it's like asking how many things there are that are not apples.

    Language standards tell you the defined behaviour for a language.
    Anything that is not there, is undefined - that's simply what the word "undefined" means.

    Note that there are many other things besides language standards that
    define behaviour of code in practice - compilers or interpreters can add
    their own definitions to things that are not defined by the language
    standards, as can additional standards such as POSIX.

    If you write a function "foo" - perhaps written in the same language
    (such as C), perhaps in a completely different language - then its
    behaviour is not defined by the language standards. It is not mentioned anywhere in those documents, so it is undefined. (That is different
    from functions whose behaviour is specified in the standard, such as
    "memcpy".)

    Undefined behaviour, as far as language standards are concerned, is omnipresent in programming - for all languages. The problem only comes
    when you attempt to execute something that does not have its behaviour
    defined /anywhere/. Then it is incorrect code - a bug.


    When I learned to program (i.e., during my university education rather
    than from books, magazines and trial and error previous to that), we
    were very clear about how a function is specified. You have a
    pre-condition and a post-condition. The function can assume the
    pre-condition is logically "true", and it will guarantee that the post-condition is true at the exit. (Typically you also have an
    "invariant" that is a clause in both parts, but that is just for
    convenience.) If the function is called when the pre-condition is
    false, the function has no obligation to do anything - it can give an
    error, launch nasal daemons, give the answer it thinks the programmer
    hoped for, or anything else. The behaviour is undefined.

    This concept has existed since the dawn of programming:

    """
    On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into
    the machine wrong figures, will the right answers come out?' I am not
    able rightly to apprehend the kind of confusion of ideas that could
    provoke such a question.

    Charles Babbage
    """


    The C standards contain a fair number of explicit undefined behaviours.
    They do that for convenience and clarity, and often to encourage
    compiler developers towards greater efficiency rather than run-time
    checks, and to encourage programmers towards not assuming particular
    behaviours even if one compiler happens to define the behaviour. So a
    compiler writer knows that they can assume "a + b" never overflows (for
    integer arithmetic), and a programmer knows that they can't assume
    signed arithmetic is wrapping even if the compiler they are using at the
    time /guarantees/ wrapping behaviour. (I have never seen a C compiler
    that guarantees this without explicit flags.)
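    A classic illustration of that assumption in action (the function name
    is only for illustration): because signed overflow is undefined, a
    compiler is free to fold the comparison below to a constant 1, whereas
    with an unsigned parameter, or with gcc's -fwrapv, it must keep the
    real test.

    int always_greater(int x)
    {
        return x + 1 > x;   /* commonly compiled as 'return 1;' */
    }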

    C is a language that expects the programmer to take responsibility for
    his or her code, and ensure that it is correct. Fortunately, good
    compiler developers know this is difficult and provide tools to help
    people find their bugs. Thus you have a language that can give
    efficient results, /and/ provide good debugging and run-time checking,
    as long as you get good tools and understand how to use them.



    For example, a = b + c is precisely defined in C and C++ for
    floating point variables, but the result can be "undefined behaviour"
    for ordinary 32 bit signed integer values.


    Actually, it is not precisely defined for floating point operations - if
    there is an "exceptional condition" during the evaluation (the result is
    not mathematically defined or not in the range of representable values
    for its type), the behaviour is undefined. That applies to all
    expressions - integer and floating point.

    Now, it is very common (but certainly not universal) for C
    implementations to use IEEE floating point formats and rules. These
    provide the "mathematical definitions" for floating point operations,
    including handling of calculations outside the normal ranges. But if
    you are not using these, such calculations could result in undefined
    behaviour. (For example, if you use "gcc -ffast-math", the compiler
    will assume that all expressions are normal finite numbers - that's
    perfectly valid for C, and can be very much more efficient on a lot of targets.)

    Signed integer overflow is undefined behaviour on most compilers (the
    size is not necessarily 32-bit). The only one I know that defines the behaviour is gcc (and compatibles, such as clang and icc) with the
    "-fwrapv" flag enabled.

    And of course that makes perfect sense. It is logical to assume that if
    you add two positive numbers, you get a positive number - it is
    illogical to suppose that sometimes the "correct" answer will be
    negative. Some programming languages (such as Java) specifically define
    signed integer arithmetic to be wrapping - the result is that sometimes
    you get the wrong answer in Java, while in C you would get undefined
    behaviour. Wrong answers are less helpful - leaving the behaviour
    undefined means you get more efficient code and that you can use
    debugging tools (such as gcc's -fsanitize=undefined) to help find the
    errors in your code.
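    For instance, a build along these lines (assuming gcc or clang; the
    file name is hypothetical) reports the overflow at run time instead of
    letting it pass silently:

    /* overflow.c - compile with: gcc -fsanitize=undefined overflow.c */
    #include <limits.h>

    int main(void)
    {
        volatile int a = INT_MAX;  /* volatile keeps the compiler from folding  */
        int b = a + 1;             /* the sanitizer reports signed overflow here */
        return b != 0;
    }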


    If you want to stick to defined behaviours then you need
    to add extra code. For example, CERT recommends:

      if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
          ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
      } else {
        sum = si_a + si_b;
      }


    That is /not/ code to "stick to defined behaviours". It is code to
    identify problems and perhaps find some way to handle them (depending on
    what the "handle error" code does).

    You can "stick to defined behaviour" much more simply:

    int sum = (unsigned int) si_a + (unsigned int) si_b;

    The behaviour is fully defined, and the result will be wrong if there is
    an overflow - just like when you use a language that has fully defined
    signed integer arithmetic by wrapping.
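    Spelled out as a small helper (a sketch only, with a hypothetical
    name): the unsigned addition itself is fully defined, and converting
    the out-of-range result back to int is implementation-defined in the C
    standards but wraps on mainstream two's-complement compilers.

    static int wrap_add(int a, int b)
    {
        return (int) ((unsigned int) a + (unsigned int) b);
    }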


    The answer here is /not/ to worry about what happens when your
    expressions overflow and you get undefined behaviour. The answer is to
    think about the code you are writing, and make sure that the types and expressions you write are appropriate for the values you have. Check
    your values for validity when you get them in (from files, user input,
    etc.), then write code that is correct for the full range of values.
    Simple. (Well, as simple as any programming!)

  • From Spiros Bousbouras@21:1/5 to Martin Ward on Sat Jan 8 03:41:55 2022
    On Fri, 7 Jan 2022 14:02:50 +0000
    Martin Ward <martin@gkc.org.uk> wrote:
    On 06/01/2022 08:11, David Brown wrote:
    The trick is to memorize the /defined/ behaviours, and stick to them.

    Isn't the set of defined behaviours bigger than the set
    of undefined behaviours?

    That depends on how you define those sets. For example, any finite string is
    a potential C source code and, of strings of length N (for any value of N), only a very small percentage have defined behaviour. But regardless, you
    need to know at least some defined behaviours to be able to programme at all and, as long as you stick to those, you are not using any undefined
    behaviours.

    How do you know what is defined
    if you don't know what is undefined?

    As David has already said, you know by reading the definitions. And this is
    the only way to know. Trying to guess what you're getting at, perhaps you
    are thinking of someone who learns some C, then makes some unwarranted assumptions from what they have learned and then has those assumptions scaled back by coming across explicit mentions of "undefined behaviour" in the C standard. Perhaps some people do behave this way. For example someone who already knows assembly and begins to learn C may assume that all address manipulations which would be legal in assembly are also legal using C
    pointers. The correct remedy is not to make unwarranted assumptions to begin with, whether one learns C or any other programming language. There is an infinite number of unwarranted assumptions one can make and the C standard
    can only caution against a finite number of them.

    For example, a = b + c is precisely defined in C and C++ for
    floating point variables, but the result can be "undefined behaviour"
    for ordinary 32 bit signed integer values.

    If you want to stick to defined behaviours then you need
    to add extra code. For example, CERT recommends:

      if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
          ((si_b < 0) && (si_a < (INT_MIN - si_b)))) {
        /* Handle error */
      } else {
        sum = si_a + si_b;
      }

    Whether you need to add code such as the above will depend on what you
    already know about the types and values of si_a and si_b .

  • From Thomas Koenig@21:1/5 to Spiros Bousbouras on Sat Jan 8 09:31:06 2022
    Spiros Bousbouras <spibou@gmail.com> schrieb:
    On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    There is no need to memorize undefined behaviours for a language -
    indeed, such a thing is impossible since everything not defined by a
    language standard is, by definition, undefined behaviour. (C and C++
    are not special here - the unusual thing is just that their standards
    say this explicitly.)

    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    This seems to me exactly like the C model. What difference do you see ?

    First, I see a difference in result. Highly intelligent and
    knowledgeable people argue vehemently about whether a program should be
    able to use undefined behavior or not, and a lot of vitriol is directed
    against compiler writers who use the assumption that undefined
    behavior cannot happen in their compilers for optimization,
    especially if it turns out that existing code was broken and no
    longer works after a compiler upgrade (just read a few of Linus
    Torvalds' comments on that matter).

    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    #include <unistd.h>
    #include <string.h>

    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }

    not more and not less erroneous than

    int main()
    {
        int *p = 0;
        *p = 42;
    }

    whereas I would argue that there is an important difference between
    the two.

    If the C standard replaced "the behavior is undefined" with "the
    program is in error, and the subsequent behavior is undefined"
    or something along those lines, the discussion would be much
    muted.

    (Somebody may point out to me that this is what the standard is
    actually saying. If so, that would sort of reinforce my argument
    that it should be clearer :-)

  • From Anton Ertl@21:1/5 to David Brown on Sat Jan 8 17:52:02 2022
    David Brown <david.brown@hesbynett.no> writes:
    Undefined behaviour, as far as language standards are concerned, is omnipresent in programming - for all languages.

    Please prove this astounding assertion. My impression is that managed languages define everything, at least to some extent, and leave
    nothing undefined. If they allowed nasal demons, the appeal of
    managed languages would evaporate instantly.

    - anton
    --
    M. Anton Ertl
    anton@mips.complang.tuwien.ac.at
    http://www.complang.tuwien.ac.at/anton/
    [Things like .NET define a lot but they still are at the mercy
    of their environment when you ask for a variable sized chunk of
    storage. -John]

  • From Spiros Bousbouras@21:1/5 to Thomas Koenig on Sat Jan 8 22:28:00 2022
    On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    Spiros Bousbouras <spibou@gmail.com> schrieb:
    On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    This seems to me exactly like the C model. What difference do you see ?

    First, I see a difference in result. Highly intelligent and
    knowledgeable people argue vehemently about whether a program should be
    able to use undefined behavior or not, and a lot of vitriol is directed
    against compiler writers who use the assumption that undefined
    behavior cannot happen in their compilers for optimization,
    especially if it turns out that existing code was broken and no
    longer works after a compiler upgrade (just read a few of Linus
    Torvalds' comments on that matter).

    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    The C standard is in no position to say that some programme is in
    error. This would require near omniscience from the standard
    writers.

    #include <unistd.h>
    #include <string.h>

    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }

    not more and not less erroneous than

    int main()
    {
        int *p = 0;
        *p = 42;
    }

    whereas I would argue that there is an important difference between
    the two.

    The only difference I see between the two is that the first is defined
    by POSIX and the second is not. According to POSIX the first is required
    to print something on stdout. I cannot imagine any extension which
    would make the second programme do something useful and a conforming implementation may well compile it as essentially a no-op.

    But with something like

    int main(void) {
        int *p = 0 ;
        *p = 42 ;
        .... do other stuff ...
        return 0 ;
    }

    the C standard allows for a conforming implementation to do something
    useful like perhaps store 42 to address 0.

    If the C standard replaced "the behavior is undefined" with "the
    program is in error, and the subsequent behavior is undefined"
    or something along those lines, the discussion would be much
    muted.

    (Somebody may point out to me that this is what the standard is
    actually saying. If so, that would sort of reinforce my argument
    that it should be clearer :-)

    No , it most definitely does not say that nor could it possibly say
    that.

  • From Thomas Koenig@21:1/5 to Spiros Bousbouras on Sun Jan 9 00:09:19 2022
    Spiros Bousbouras <spibou@gmail.com> schrieb:
    On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    Spiros Bousbouras <spibou@gmail.com> schrieb:
    On Thu, 6 Jan 2022 16:43:05 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    This is a rather C-centric view of things. The Fortran standard
    uses a different model.

    There are constraints, which are numbered. Any violation of such
    a constraint needs to be reported by the compiler ("processor",
    in Fortran parlance). If it fails to do so, this is a bug in
    the compiler.

    There are also phrases which have "shall" or "shall not". If this
    is violated, this is an error in the program. Catching such a
    violation is a good thing from quality of implementation standpoint,
    but is not required. Many run-time errors such as array overruns
    fall into this category.

    This seems to me exactly like the C model. What difference do you see ?

    First, I see a difference in result. Highly intelligent and
    knowledgeable people argue vehemently about whether a program should be
    able to use undefined behavior or not, and a lot of vitriol is directed
    against compiler writers who use the assumption that undefined
    behavior cannot happen in their compilers for optimization,
    especially if it turns out that existing code was broken and no
    longer works after a compiler upgrade (just read a few of Linus
    Torvalds' comments on that matter).

    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    The C standard is in no position to say that some programme is in
    error. This would require near omniscience from the standard
    writers.

    A standard (or other specification document) is certainly able to
    state that some construct is in error. To grab an often-quoted
    example:

    J3/18-007r1, the Fortran 2018 interpretation documents, states in
    subclause 9.5.3, "Array elements and array sections",

    # The value of a subscript in an array element shall be within the
    # bounds for its dimension.

    No omniscience required to write or understand that sentence.

    This puts the burden on the programmer. The compiler might catch
    such an error and abort the program, or other unpredictable
    things such as overwriting an unrelated variable might also happen.

    Reading a language standard can be hard. Quite often, information
    is scattered throughout the text and needs to be pieced together
    to find the necessary information, especially definition of terms
    which are crucial to understanding. Most programmers do not
    read standards (at least final committee drafts can usually be
    found these days on the Internet), but compiler writers should at
    least be familiar with what they are implementing.

    Programmers often rely on books, but these can also get things wrong.

    Because programmers are human, they also can get ticked off when being
    told that a construct they have used for years has been illegal
    for decades :-|

    Having a good standard is crucial to being able to write good compilers.

  • From Spiros Bousbouras@21:1/5 to Thomas Koenig on Sun Jan 9 21:30:13 2022
    On Sun, 9 Jan 2022 00:09:19 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    Spiros Bousbouras <spibou@gmail.com> schrieb:
    On Sat, 8 Jan 2022 09:31:06 -0000 (UTC)
    Thomas Koenig <tkoenig@netcologne.de> wrote:
    I see C conflating two separate concepts: Programm errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    The C standard is in no position to say that some programme is in
    error. This would require near omniscience from the standard
    writers.

    A standard (or other specification document) is certainly able to
    state that some construct is in error. To grab an often-quoted
    example:

    J3/18-007r1, the Fortran 2018 interpretation documents, states in
    subclause 9.5.3, "Array elements and array sections",

    # The value of a subscript in an array element shall be within the
    # bounds for its dimension.

    No omniscience required to write or understand that sentence.

    This puts the burden on the programmer. The compiler might catch
    such an error and abort the program, or other unpredictable
    things such as overwriting an unrelated variable might also happen.

    I haven't read any Fortran standards so I can only go by the above quote.
    Only the programmer knows what their requirements are and why they think that the code they wrote will meet those requirements. My idea of error is that either the code does not meet the requirements or it does so only by accident and the programmer does not have a correct reasoning as to why their code
    will meet those requirements. You seem to be reading the quote as saying

    No matter what the programmer requirements and no matter what extensions
    their Fortran implementation offers , the programmer requirements will
    not be justifiably met if they use an array subscript outside the bounds
    for its dimension.

    Perhaps some Fortran implementation gives information as to the layout of distinct variables so that one knows what will be overwritten by writing off the bounds of some array and it will be overwritten in the way the programmer wants. Unlikely (especially for Fortran) but it cannot be excluded. I can imagine a C implementation for small embedded systems which does provide such information and a programmer using it to reduce the number of instructions to achieve a desired result. A more realistic example is the following :

    #include <stdio.h>

    int main(void) {
        int a = 12 , b = 14 ;
        printf("%2$d %1$d\n" , a , b) ;
        return 0 ;
    }

    The above code has undefined behaviour according to the C standard. It is defined according to POSIX. Whether it is in error depends on whether the programmer really wanted to print
    14 12

    and no standards committee can possibly know this. So I still think that your reading requires omniscience from the Fortran standard writers. But perhaps there are other parts of the standard which justify your reading. For example some parts of the Common Lisp standard do state that an implementation must
    not extend some construct to provide useful functionality beyond what the standard specifies. I don't remember precisely how it states it and I can't find those parts now.

    Reading a language standard can be hard. Quite often, information
    is scattered throughout the text and needs to be pieced together
    to find the necessary information, especially definition of terms
    which are crucial to understanding. Most programmers do not
    read standards (at least final committee drafts can usually be
    found these days on the Internet), but compiler writers should at
    least be familiar with what they are implementing.

    Programmers often rely on books, but these can also get things wrong.

    C books at least usually don't go into the fine details of undefined
    behaviour. To hone one's instincts in this area one should spend a few
    months systematically reading comp.lang.c while consulting a draft
    of the standard !

    Because programmers are human, they also can get ticked off when being
    told that a construct they have used for years has been illegal
    for decades :-|

    This may happen but my impression with C is that the strongest complaints
    come from people who

    - have read the C standard (or at least the relevant parts of it)

    - know that their code has undefined behaviour and know what the term means

    - they do not rely on any compiler extensions

    yet still feel certain (dare I say "entitled" ?) that their code ought to behave in a certain way. For an extreme example see Robert M. Hyatt of
    crafty fame (a chess programme which has won awards in the past) : http://www.open-chess.org/viewtopic.php?f=5&t=2519 .
    [Fortran used to require that arrays were stored in column major order, that double precision took twice the space of real and integer, and you were allowed to use EQUIVALENCE and adjustable dimensions in argument arrays to do overlaying
    assuming that layout. Dunno how much more modern Fortran has deprecated it. -John]

  • From David Brown@21:1/5 to Thomas Koenig on Sun Jan 9 23:00:46 2022
    On 08/01/2022 10:31, Thomas Koenig wrote:
    Spiros Bousbouras <spibou@gmail.com> schrieb:

    This seems to me exactly like the C model. What difference do you see ?

    First, I see a difference in result. Highly intelligent and
    knowledgeable people argue vehemently about whether a program should be
    able to use undefined behavior or not, and a lot of vitriol is directed
    against compiler writers who use the assumption that undefined
    behavior cannot happen in their compilers for optimization,
    especially if it turns out that existing code was broken and no
    longer works after a compiler upgrade (just read a few of Linus
    Torvalds' comments on that matter).

    People want compilers to do what the programmer meant, not what he or
    she wrote. And in particular, if a compiler did one thing once, they
    want it to continue to do the same thing with the same code - as long as
    they got what they wanted the first time round.

    This is, of course, entirely natural for humans. But it is not natural
    for computer programs like compilers.

    Linus Torvalds is known for blowing his top on matters that he either
    does not understand, or when he has mixed his personal opinions with
    facts, or while only looking at a small part of the big picture. (He is
    also known as an incredible programmer, a world-class project leader,
    and a charismatic visionary who revolutionised the software world - but
    that's beside the point here!).

    A key example of his complaints in this area revolves around a function
    that was something equivalent to:

    int foo(int * p) {
        int x = *p;
        if (!p) return -1;
        return x;
    }

    His complaint was that the compiler saw that "*p" was accessed, and
    therefore assumed "p" could not be zero and optimised away the test.
    The compiler did exactly what it was asked to do - the optimisation is perfectly valid according to the C standards and additional definitions
    given by the compiler. But it was not what the programmer wanted, and
    not what older versions of the compiler had done.
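    The conventional fix, shown here only as a sketch, is to test the
    pointer before dereferencing it; then there is no prior access for the
    compiler to reason from, and the check cannot be optimised away:

    int foo(int * p) {
        if (!p) return -1;
        return *p;
    }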

    Of course, when a new optimisation simply makes object code more
    efficient, programmers want that - they don't /always/ want the compiler
    to handle things the way older versions did. They want the compiler to
    read their minds and see what they meant to write, and generate optimal
    code for that.


    None of this is helped by the fact that C code often has to work
    efficiently on a variety of targets and compilers, and some compilers
    give extra guarantees about how they interpret code beyond the
    definitions given in the C standards. Many more compilers can be relied
    upon in practice to work in particular ways, though they don't guarantee
    or document it, and this means the most efficient code that works in
    practice on one compiler may be wrong and give incorrect results on
    another compiler. You can write C code that is correct and widely
    portable, but you can't write C code that is correct, optimally
    efficient, and widely portable.



    The big question here, is why do you think Fortran is any different? In theory, there isn't a difference - nothing you have said here convinces
    me that there is any fundamental difference between Fortran and C in
    regards to undefined behaviour. (And there's no difference in the implementations - the most commonly used Fortran compilers also handle
    C, C++, and perhaps other languages.)

    I believe it is a matter of who writes Fortran programs, and what these programs do. Now, I don't know or use Fortran myself, so I might be
    wrong here. However, it seems to me that Fortran is typically used by experienced professional programmers and for scientific or numerical programming. C is used by a much wider range of programmers, for a much
    wider range of programming tasks. I think it is inevitable that you'll
    get more people programming in C when they are not fully sure of what
    they are doing, more code where subtle mistakes can be made, more people
    using C when other languages would have been better choices, and more C programmers who are likely to blame their tools for their own mistakes.




    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    #include <unistd.h>
    #include <string.h>

    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }


    C does not have a "write" function in the standard library. So the
    behaviour of "write" is not defined by the C standards - but that does
    not mean the behaviour is undefined. It just means it is defined
    elsewhere, not in the C standards. If the programmer doesn't know what
    the "write" function does or how it is specified, then it might be
    undefined behaviour - certainly it is bad programming.


    not more and not less erroneous than

    int main()
    {
        int *p = 0;
        *p = 42;
    }

    whereas I would argue that there is an important difference between
    the two.


    There is no fundamental difference - if you know the behaviour is
    defined, it is defined. (The program is then correct or incorrect
    depending on how that definition matches your requirements.) If not, it
    is undefined (and incorrect). In neither case is the behaviour defined
    by the C standard, but the behaviour could be defined by something else (library documentation or external definition of "write", or a C
    compiler that specifically says it defines the behaviour of
    dereferencing null pointers).

    If the C standard replaced "the behavior is undefined" with "the
    program is in error, and the subsequent behavior is undefined"
    or something along those lines, the discussion would be much
    muted.


    That sounds like you dislike the "time travel" aspect of C's undefined behaviour. Many would agree with that - they don't like the idea that undefined behaviour later in the program can be used to change the
    behaviour of code earlier on. The C standard considers undefined
    behaviour to be program-wide - if you execute something that has
    undefined behaviour (remembering that this means there is no definition /anywhere/ of what will happen), the whole program is wrong and you
    can't expect anything from it.

    People often find this disturbing. They think perhaps it is fair enough
    that dereferencing a null pointer can crash a program, but it shouldn't
    affect things that came before it.

    However, there are two key points to think about. First, the standard's handling of undefined behaviour means that a compiler /can/ use UB to
    change the object code generated for earlier source code, not that it
    /must/ do so. A compiler always balances efficient code generation with ease-of-use and ease-of-debugging. The ideal balance point will depend
    on the programmer writing the code, so compiler flags are used to tune
    it, but surprises can still happen.

    The other point is to consider how the standards could say anything
    else. If the standards required observable behaviour to be completed
    before undefined behaviour occurred, the results would be terrible. Dereferencing a null pointer or dividing by zero could cause a complete
    crash (remember the "Windows for Warships" affair? A single divide by
    zero brought the whole ship network down, leaving it dead in the water
    for hours). That means the compiler would need to make sure any
    volatile writes had hit main memory before reading a pointer. It would
    have to ensure all file stream buffers were flushed to disk before doing
    a division. You can be sure Linus Torvalds would have a thing or two to
    say about such a compiler.

    (Somebody may point out to me that this is what the standard is
    actually saying. If so, that would sort of reinforce my argument
    that it should be clearer :-)
    [Fortran has in principle historically allowed rather aggressive optimization, e.g., A*B+A*C can turn into A*(B+C). On the other hand, in the real world, when IBM improved their optimizing compiler Fortran H into Fortran X, the developers said any new optimization had to produce bit identical results
    to what the old compiler did. So this is not a new issue. -John]

  • From David Brown@21:1/5 to Anton Ertl on Sun Jan 9 23:53:52 2022
    On 08/01/2022 18:52, Anton Ertl wrote:
    David Brown <david.brown@hesbynett.no> writes:
    Undefined behaviour, as far as language standards are concerned, is
    omnipresent in programming - for all languages.

    Please prove this astounding assertion. My impression is that managed languages define everything, at least to some extent, and leave
    nothing undefined. If they allowed nasal demons, the appeal of
    managed languages would evaporate instantly.


    Certainly managed languages define far more than unmanaged languages.
    But equally certainly, they do not define everything.

    In Python, I can write :

    x = flooble(123)

    Nowhere in any part of the documentation for Python is a definition of
    what the function "flooble" should do. Calling it is /undefined
    behaviour/ as far as the language standards are concerned.

    Certainly some aspects of calling it - such as the calling convention -
    are defined. What should happen if the function does not exist is
    defined. But the language and the standards do not define the behaviour
    of "flooble".


    Being "undefined behaviour as far as the language standards are
    concerned" does not mean you can get nasal daemons, it means that the
    language standards do not say what will happen. When one says "Division
    by 0 is undefined behaviour in C", that is what is meant - as a compiler
    or a host OS could give you well-defined and predictable behaviour when
    you attempt to divide by 0.

    A managed language may put limits on the kind of effect of undefined
    behaviour. In Python (at least, CPython), it is possible to call
    externally defined functions in shared libraries - even if the Python
    bytecode interpreter limits possible effects of pure Python code,
    calling external functions gets around those limits. I suppose you
    could have a more locked-down managed language that does not allow any
    external code, and has additional tracking on things like data space
    usage, time usage, and other resources to stop run-away code.

    Within such a closed language, you could have defined behaviour for all
    code, since any code run or functions called would be in the same
    language and have their definitions clear to the interpreter.


    Personally, I don't see minimising undefined behaviour as part of the
    appeal of managed languages. I make as much effort not to divide by
    zero or work with invalid references in my Python code as I do in my C
    code - it doesn't much matter if the program stops with Python exception
    or a crash. I use Python for the convenience of working with strings, dictionaries, and other data structures with little concern for memory management, for its libraries, and other high-level features.

    When running unknown code - such as javascript from a website - it is
    vital that the effect of any code is limited. Code may have behaviour
    that is undefined by the language standards, but it will be defined by
    other parts of the code or by its environment (browser, built-in
    libraries, etc.). And while it may crash the javascript program or hang
    the browser, it should never be able to launch nasal daemons.

  • From Thomas Koenig@21:1/5 to David Brown on Mon Jan 10 12:04:02 2022
    David Brown <david.brown@hesbynett.no> schrieb:

    The big question here, is why do you think Fortran is any different? In theory, there isn't a difference - nothing you have said here convinces
    me that there is any fundamental difference between Fortran and C in
    regards to undefined behaviour.

    I am not sure how to better explain it. I will try a bit, but
    this will be my last reply to you in this thread. We seem to have
    a fundamental difference in our understanding, and seem to be
    unable to resolve it.

    (And there's no difference in the
    implementations - the most commonly used Fortran compilers also handle
    C, C++, and perhaps other languages.)

    Sort of.

    At the risk of boring most readers of this group, a very short, but
    (hopefully) pertinent introduction of how modern compilers work:

    A front end translates the source to an abstract syntax tree (which
    you can view with gfortran with -fdump-fortran-original) and from
    that into an intermediate representation (which you can view with
    gfortran, or with gcc in general, with -fdump-tree-original).
    This intermediate representation is then optimized, in
    an architecture-independent way (usually using SSA) and then
    translated into assembler or directly to object code using a
    "back end", of which many compilers also have several.

    An example: The program

    print *,"Hello, world"
    end

    is translated into (code only)

    WRITE UNIT=6 FMT=-1
    TRANSFER 'Hello, world'
    DT_END

    and then, in the intermediate representation.

    MAIN__ ()
    {
      {
        struct __st_parameter_dt dt_parm.0;

        dt_parm.0.common.filename = &"hello.f90"[1]{lb: 1 sz: 1};
        dt_parm.0.common.line = 2;
        dt_parm.0.common.flags = 128;
        dt_parm.0.common.unit = 6;
        _gfortran_st_write (&dt_parm.0);
        _gfortran_transfer_character_write (&dt_parm.0, &"Hello, world"[1]{lb: 1 sz: 1}, 12);
        _gfortran_st_write_done (&dt_parm.0);
      }
    }

    There is no compiler (if you mean a single binary) that handles both
    C and Fortran. They are separate front ends to common middle
    and back ends.

    And there are certainly differences in the code that the front
    ends hand to the middle end, so saying that there is "no
    difference in the implementations" is not correct.

    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    #include <unistd.h>
    #include <string.h>

    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }


    C does not have a "write" function in the standard library. So the
    behaviour of "write" is not defined by the C standards - but that does
    not mean the behaviour is undefined.

    When interpreting a language standard, you _must_ follow the
    definitions in the standards if they exist; you cannot use everyday
    interpretations.

    Subclause 3.4.3 (N2596) defines

    # undefined behavior

    # behavior, upon use of a nonportable or erroneous program
    # construct or of erroneous data, for which this document imposes
    # no requirements

    write() is nonportable and the C standard imposes no requirements
    on it. Therefore, the program above invokes undefined behavior.



    It just means it is defined
    elsewhere, not in the C standards.

    Nope, see above.

    (If you replaced every occurrence of "undefined behavior" in the C
    standard with "WRTLPFMFT behavior" and "the behavior is undefined"
    with "the behavior is WRTLPFMFT", the meaning of the standard
    would not change.)
    [It seems like nitpicking here. Yes, the C and POSIX standards are
    different things, but we all know how common it is to use them
    together. -John]

  • From gah4@21:1/5 to Thomas Koenig on Mon Jan 10 16:58:55 2022
    On Saturday, January 8, 2022 at 10:11:55 AM UTC-8, Thomas Koenig wrote:

    (snip)

    I see C conflating two separate concepts: program errors and
    behavior that is outside the standard. "Undefined behavior is
    always a programming error" does not work; that would make

    #include <unistd.h>
    #include <string.h>

    int main()
    {
        char a[] = "Hello, world!\n";
        write (1, a, strlen(a));
        return 0;
    }

    Without the:

    #include <unistd.h>

    I agree that this would be undefined behavior. But with the include file,
    you are agreeing to use whatever standard the include file belongs to.

    The include file declares the arguments to write(), and, more importantly,
    indicates that you will either supply write() in another file or use an
    otherwise supplied library that defines it.

  • From David Brown@21:1/5 to Thomas Koenig on Tue Jan 11 18:16:28 2022
    On 10/01/2022 13:04, Thomas Koenig wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    The big question here, is why do you think Fortran is any different? In
    theory, there isn't a difference - nothing you have said here convinces
    me that there is any fundamental difference between Fortran and C in
    regards to undefined behaviour.

    I am not sure how to better explain it. I will try a bit, but
    this will be my last reply to you in this thread. We seem to have
    a fundamental difference in our understanding, and seem to be
    unable to resolve it.


    Fair enough. Maybe in a future discussion, one of us will have an
    "Aha!" moment and understand the other's viewpoint, and progress will be
    made - until then, there's no point in going around in circles. I'll
    snip bits of your post here, and try to minimise new points (unless I
    get that "Aha!") - but be sure I am reading and appreciating your entire
    post.

    (And there's no difference in the
    implementations - the most commonly used Fortran compilers also handle
    C, C++, and perhaps other languages.)

    Sort of.

    At the risk of boring most readers of this group, a very short, but (hopefully) pertinent introduction of how modern compilers work:


    There is no compiler (if you mean a single binary) that handles both
    C and Fortran. They are separate front ends to common middle
    and back ends.

    Yes. But it is the middle end that handles most of the optimisations, including those based on undefined behaviour. The front end determines
    whether code can have undefined behaviour and in what circumstances.

    C does not have a "write" function in the standard library. So the
    behaviour of "write" is not defined by the C standards - but that does
    not mean the behaviour is undefined.

    When interpreting at a language standard, you _must_ follow the
    definitions in the standards if they exist, you cannot use everyday interpretations.

    Subclause 3.4.3 (N2596) defines

    # undefined behavior

    # behavior, upon use of a nonportable or erroneous program
    # construct or of erroneous data, for which this document imposes
    # no requirements

    write() is nonportable and the C standard imposes no requirements
    on it. Therefore, the program above invokes undefined behavior.

    No. (As always, this is based on my interpretation of the standards -
    consider everything to have "IMHO" attached.) The implementation of
    "write" is outside the scope of the standards, and is therefore
    undefined as far as the standards are concerned. That does not make it undefined behaviour in the program - it just means the standards don't
    say what "write" should do.

  • From Kaz Kylheku@21:1/5 to Anton Ertl on Tue Jan 11 16:55:54 2022
    On 2022-01-08, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    Undefined behaviour, as far as language standards are concerned, is omnipresent in programming - for all languages.

    Please prove this astounding assertion. My impression is that managed languages define everything, at least to some extent, and leave
    nothing undefined. If they allowed nasal demons, the appeal of
    managed languages would evaporate instantly.

    The Lisp-like programming language Scheme has unspecified order of
    argument evaluation. And you can stuff side effects into argument
    expressions, like in C.

    Its built-in imperative operations have undefined return values.

    ANSI Common Lisp leaves the effects undefined of modifying literals,
    just like C. ANSI Lisp code that perpetrates some kind of error is
    safe only if compiled in safe mode; if you compile with reduced safety,
    e.g. (declare (optimize (safety 0))), then errors become undefined
    behavior, including type errors. If you declare that some quantity is
    a fixnum integer, and request safety 0 speed 3, and then it turns
    out that it's other than an integer, woe to that code.
    However, in these cases you're invoking the safety escape hatch;
    it's not like C where you are shackled by chains of undefined behavior
    which make themselves felt every time you squirm.

  • From Kaz Kylheku@21:1/5 to David Brown on Tue Jan 11 19:19:31 2022
    On 2022-01-11, David Brown <david.brown@hesbynett.no> wrote:
    On 10/01/2022 13:04, Thomas Koenig wrote:
    David Brown <david.brown@hesbynett.no> schrieb:

    The big question here is: why do you think Fortran is any different? In theory, there isn't a difference - nothing you have said here convinces
    me that there is any fundamental difference between Fortran and C in
    regards to undefined behaviour.

    I am not sure how to better explain it. I will try a bit, but
    this will be my last reply to you in this thread. We seem to have
    a fundamental difference in our understanding, and seem to be
    unable to resolve it.

    Fair enough. Maybe in a future discussion, one of us will have an
    "Aha!" moment and understand the other's viewpoint, and progress will be
    made - until then, there's no point in going around in circles. I'll
    snip bits of your post here, and try to minimise new points (unless I
    get that "Aha!") - but be sure I am reading and appreciating your entire post.

    (And there's no difference in the
    implementations - the most commonly used Fortran compilers also handle
    C, C++, and perhaps other languages.)

    Sort of.

    At the risk of boring most readers of this group, a very short, but
    (hopefully) pertinent introduction to how modern compilers work:


    There is no compiler (if you mean a single binary) that handles both
    C and Fortran. They are separate front ends to common middle
    and back ends.

    Yes. But it is the middle end that handles most of the optimisations, including those based on undefined behaviour. The front end determines whether code can have undefined behaviour and in what circumstances.

    More precisely, optimizations are based on the absence of undefined
    behavior: the assumption that contracts are being upheld.

    More precisely, that contracts are being upheld in the face of the
    inability to determine and diagnose statically whether they are
    violated; i.e. there is a "blind trust". (Though there do exist
    situations in which, in principle, undefined behavior is easily
    deducible at translation time, with no requirement that the compiler
    diagnose it.)
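
    (A hedged illustration, not code from the thread: here the undefined
    behaviour is obvious at translation time, yet a conforming compiler
    is not required to reject the program or even warn about it.)

    int deref_null(void)
    {
        int *p = 0;
        return *p;   /* undefined behaviour, visible statically;
                        no diagnostic is required */
    }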

    Front-ends for different languages are written to the respective
    requirements of those languages. Their first aim is to handle
    well-defined constructs and situations. They target the intermediate
    language of the compiler's middle end. That language has its own contracts.
    The front end for each respective language has to ensure that every
    situation in which behavior is defined (contract is upheld) is
    translated to reliable intermediate code whose contract is upheld.
    Care has to be taken that the intermediate code is expressed in the
    right way so that it will not change behavior in invalid ways due to optimizations.

    This leaves a lot of room for Fortran and C to have entirely different defined/undefined behaviors.

    Even the front end for one single language can have a lot of switches
    affecting what is defined or not.

    There could be a switch which says that overflowing integer addition has
    two's complement wrapping behavior. In that case, the compiler then
    selects the intermediate instructions which provide that behavior
    reliably (possibly simulating signed arithmetic with unsigned), and
    also disables any inferences in the front end that might be based on the assumption that overflow has not occurred.
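
    As a sketch of what such a switch amounts to (assumptions mine, not a
    description of any particular compiler), wrapping signed addition can
    be simulated with unsigned arithmetic, which never overflows in the
    undefined-behaviour sense:

    #include <stdint.h>

    /* Two's complement wrapping add for 32-bit signed integers.  The
       unsigned addition is fully defined; converting the result back to
       int32_t is implementation-defined in ISO C, but yields the wrapped
       value on the usual two's complement targets. */
    int32_t wrapping_add(int32_t a, int32_t b)
    {
        return (int32_t)((uint32_t)a + (uint32_t)b);
    }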

    C does not have a "write" function in the standard library. So the
    behaviour of "write" is not defined by the C standards - but that does
    not mean the behaviour is undefined.

    When interpreting a language standard, you _must_ follow the
    definitions in the standards if they exist; you cannot use everyday
    interpretations.

    Subclause 3.4.3 (N2596) defines

    # undefined behavior

    # behavior, upon use of a nonportable or erroneous program
    # construct or of erroneous data, for which this document imposes
    # no requirements

    write() is nonportable and the C standard imposes no requirements
    on it. Therefore, the program above invokes undefined behavior.

    No. (As always, this is based on my interpretation of the standards -

    Yes; using any function that is not in the C program, or in the
    standard, is ISO C undefined behavior.

    A program which includes <unistd.h> is not required to compile
    according to ISO C; it can fail with an error message about the
    header not being defined. Or, #include <unistd.h> is allowed, in
    a conforming implementation, to bring in tokens which have nothing
    to do with POSIX.

    Furthermore, a program which calls write, and does not provide such a
    function itself, is not required to successfully link. If it does link,
    there is no requirement that this symbol is a function described by
    POSIX.

    POSIX implementations have to go out of their way to allow C programs
    to use write as an external name, which ISO C allows.

    For instance, the GNU C Library defines write as a weak symbol for
    some identifier which resembles __libc_write: the "strong" symbol.

    The C library internally uses only that __libc_write: it never calls
    write, because user code could replace it:

    int write(char *x) { ... }   /* the program's own function named "write" ... */

    double write = 42.0;         /* ... or even a data object with that name */

    When the application defines the external name write, the weak symbol
    coming from glibc yields; it is suppressed in favor of the program's definition.
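
    A rough, self-contained sketch of that mechanism using GCC extensions
    (the names here are made up for illustration; this is not glibc
    source):

    /* toylib.c */
    #include <stdio.h>

    void __toylib_greet(void) { puts("library version of greet"); }

    /* Export "greet" only as a weak alias for the strong symbol, so a
       program that defines its own external "greet" overrides it at
       link time; internal library callers would use __toylib_greet and
       remain unaffected. */
    void greet(void) __attribute__((weak, alias("__toylib_greet")));

    /* main.c -- the program's own strong definition wins over the alias */
    #include <stdio.h>
    void greet(void) { puts("program version of greet"); }
    int main(void) { greet(); return 0; }

    Linking the two files together and running the result prints the
    program's version, much as described above for write and __libc_write.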

    consider everything to have "IMHO" attached.) The implementation of
    "write" is outside the scope of the standards, and is therefore
    undefined as far as the standards are concerned. That does not make it undefined behaviour in the program - it just means the standards don't
    say what "write" should do.

    Right; it's "ISO C formal undefined behavior", not "behavior that is
    not defined by any party whatsoever" ... though it could well be.

    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From gah4@21:1/5 to Kaz Kylheku on Tue Jan 11 14:18:56 2022
    On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:

    (big snip)

    This leaves a lot of room for Fortran and C to have entirely different defined/undefined behaviors.

    Even the front end for one single language can have a lot of switches affecting what is defined or not.

    I suppose so. But more usually, the compiler works to the least
    common denominator.

    For one, C requires static variables, and especially external ones, to
    be initialized to zero, but Fortran doesn't. Fortran compilers that use C
    compiler middle and back ends tend to zero such variables.

    I suspect that there are many more that I don't know about.
    As long as the cost is small, and it satisfies both standards,
    not much reason not to do it.

    Fortran has stricter rules on aliasing than C. I don't actually know
    of any effect on C programs, though it might be that compilers apply
    the same rules to C.

    One that is not C or Fortran, but IEEE 754, is the effect of
    relational operators with NaN. Comparisons with NaN,
    except for "not equal", return false. That means that compilers
    have to be careful when optimizing such comparisons, and especially that
    "greater than or equal" is not the logical complement of "less than".
    (I haven't looked at how compilers handle this, or, even more,
    how the hardware handles it.)
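
    A small C illustration of that point (behaviour as required by IEEE
    754 for quiet NaNs; the comments show the expected output):

    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double x = 1.0, nan_val = NAN;

        /* every ordered comparison involving a NaN is false: */
        printf("%d %d %d %d\n",
               x < nan_val, x <= nan_val, x > nan_val, x >= nan_val); /* 0 0 0 0 */

        /* so (x >= y) is not the logical complement of (x < y): */
        printf("%d %d\n", x >= nan_val, !(x < nan_val));              /* 0 1 */

        /* == is false, != is true: */
        printf("%d %d\n", nan_val == nan_val, nan_val != nan_val);    /* 0 1 */
        return 0;
    }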

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From George Neuner@21:1/5 to 480-992-1380@kylheku.com on Tue Jan 11 22:01:37 2022
    On Tue, 11 Jan 2022 16:55:54 -0000 (UTC), Kaz Kylheku <480-992-1380@kylheku.com> wrote:

    On 2022-01-08, Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
    David Brown <david.brown@hesbynett.no> writes:
    Undefined behaviour, as far as language standards are concerned, is omnipresent in programming - for all languages.

    Please prove this astounding assertion. My impression is that managed
    languages define everything, at least to some extent, and leave
    nothing undefined. If they allowed nasal demons, the appeal of
    managed languages would evaporate instantly.

    The Lisp-like programming language Scheme has unspecified order of
    argument evaluation. And you can stuff side effects into argument expressions, like in C.

    In Scheme the order of evaluation for let expressions similarly is
    unspecified.

    There is at least one Scheme which deliberately randomizes the order
    of function argument and let evaluation. And there are parallel
    Schemes which evaluate function arguments and lets in parallel.


    Its built-in imperative operations have undefined return values.

    ANSI Common Lisp leaves the effects undefined of modifying literals,
    just like C. ANSI Lisp code that perpetrates some kind of error is
    safe only if compiled in safe mode; if you compile with reduced safety,
    e.g. (declare (optimize (safety 0))), then errors become undefined
    behavior, including type errors. If you declare that some quantity is
    a fixnum integer, and request safety 0 speed 3, and then it turns
    out that it's other than an integer, woe to that code.
    However, in these cases you're invoking the safety escape hatch;
    it's not like C where you are shackled by chains of undefined behavior
    which make themselves felt every time you squirm.

    And Lisp's optimization settings can be changed per function or per
    compilation unit as well as globally. ["declaim" vs "declare"]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to gah4@u.washington.edu on Wed Jan 12 19:02:48 2022
    gah4 <gah4@u.washington.edu> schrieb:
    On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:

    (big snip)

    This leaves a lot of room for Fortran and C to have entirely different
    defined/undefined behaviors.

    Even the front end for one single language can have a lot of switches
    affecting what is defined or not.

    I suppose so. But more usually, the compiler works to the least
    common denominator.

    For one, C requires static variables, and especially external ones, to be initialized to zero, but Fortran doesn't. Fortran compilers that use C compiler middle and back ends tend to zero such variables.

    This is more a matter of operating system and linker conventions
    than of compilers.

    Looking at the ELF standard, one finds

    .bss

    This section holds uninitialized data that contribute to the program's
    memory image. By definition, the system initializes the data with zeros
    when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

    which, unsurprisingly, matches exactly what C is doing.

    Anybody who writes a Fortran compiler for an ELF system will
    use .bss for COMMON blocks, because it is easiest. Initialization
    with zeros then happens automatically.
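
    A quick way to see this on a typical ELF system (a sketch; the exact
    nm output and compiler defaults vary by toolchain):

    $ cat bssdemo.c
    int uninitialized_global[1000];        /* no initializer */
    int initialized_global[1000] = { 1 };  /* has an initializer */
    $ cc -c -fno-common bssdemo.c
    $ nm bssdemo.o
    0000000000000000 D initialized_global
    0000000000000000 B uninitialized_global

    The 'B' marks a .bss symbol (zero-filled at program start, occupying
    no file space), 'D' a .data symbol; with -fcommon the uninitialized
    variable would instead appear as a 'C' (common) symbol.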

    I suspect that there are many more that I don't know about.
    As long as the cost is small, and it satisfies both standards,
    not much reason not to do it.

    Fortran has stricter rules on aliasing than C. I don't actually know
    of any effect on C programs, though it might be that compilers apply
    the same rules to C.

    The rules are different, and unless C is the intermediate language,
    a good compiler will hand the corresponding hints to the middle end.
    [I have used Fortran systems that initialized otherwise undefined data to a value that would
    trap, to help find use-before-set errors. -John]

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From David Brown@21:1/5 to Thomas Koenig on Thu Jan 13 08:24:32 2022
    On 12/01/2022 20:02, Thomas Koenig wrote:
    gah4 <gah4@u.washington.edu> schrieb:
    On Tuesday, January 11, 2022 at 11:47:26 AM UTC-8, Kaz Kylheku wrote:

    For one, C requires static variables, and especially external ones, to
    be initialized to zero, but Fortran doesn't. Fortran compilers that use C
    compiler middle and back ends tend to zero such variables.

    This is more a matter of operating system and linker conventions
    than of compilers.

    Looking at the ELF standard, one finds

    .bss

    This section holds uninitialized data that contribute to the program's
    memory image. By definition, the system initializes the data with zeros
    when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

    which, unsurprisingly, matches exactly what C is doing.

    Anybody who writes a Fortran compiler for an ELF system will
    use .bss for COMMON blocks, because it is easiest. Initialization
    with zeros then happens automatically.

    I was under the impression that FORTRAN compilers typically put data in
    the ".common" section of object files. A key difference between .common
    and .bss is that (with standard linker setup) duplicate symbols in .bss
    are an error, while duplicate symbols in .common are merged. But in C
    startup code, .common is also zeroed (FORTRAN may have different startup
    code here - with no experience of the language, I don't know such details).

    The use of ".common" by C compilers such as gcc was common practice
    precisely to improve compatibility with FORTRAN in the early days, and
    it let people write "int global_x;" in headers and have everything work,
    rather than the correct practice of "extern int global_x;" in headers
    and a single "int global_x;" in one object file. The big disadvantages
    are that if you have "int local_x;" in two files, and don't use static,
    they'll be merged with no error, and if you have "int global_x;" in one
    file and "double global_x;" in another, it's a mess. Modern gcc now
    uses "-fno-common" to avoid this.


    I suspect that there are many more that I don't know about.
    As long as the cost is small, and it satisfies both standards,
    not much reason not to do it.

    Fortran has stricter rules on aliasing than C. I don't actually know
    of any effect on C programs, though it might be that compilers apply
    the same rules to C.

    The rules are different, and unless C is the intermediate language,
    a good compiler will hand the corresponding hints to the middle end.

    AFAIUI the difference in aliasing rules is that in FORTRAN, pointer or
    array parameters are assumed not to alias, while in C the compiler must
    assume that they might alias, unless you use "restrict". Are there
    other differences?
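
    For comparison, a minimal C sketch: "restrict" gives the C compiler
    roughly the no-aliasing guarantee that Fortran assumes for its dummy
    arguments by default.

    #include <stddef.h>

    /* Without restrict, the compiler must allow for dst and src
       overlapping; with restrict, the caller promises they do not,
       which permits Fortran-style optimisation of the loop. */
    void scale(double * restrict dst, const double * restrict src,
               size_t n, double k)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = k * src[i];
    }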

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Thomas Koenig@21:1/5 to Thomas Koenig on Thu Jan 13 11:17:13 2022
    Thomas Koenig <tkoenig@netcologne.de> schrieb:

    [I have used Fortran systems that initialized otherwise undefined
    data to a value that would trap, to help find use-before-set errors.
    -John]

    That is usually still available, but optional. A short example:

    $ cat a.f90
    program main
    print *,a
    end program main
    $ gfortran -g -ffpe-trap=invalid -finit-real=snan a.f90
    $ ./a.out

    Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

    with a backtrace pointing to the offending line.

    It does not necessarily work on COMMON blocks, though.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)