• Add @ to basic character set?

    From Philipp Klaus Krause@21:1/5 to All on Sat Dec 5 08:58:17 2020
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Philipp Klaus Krause on Sat Dec 5 10:53:40 2020
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

  • From David Brown@21:1/5 to James Kuyper on Sat Dec 5 17:15:24 2020
    On 05/12/2020 16:53, James Kuyper wrote:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.


    @ is used in existing C implementations as an extension feature. In
    particular, a number of embedded C compilers allow a syntax like
    "uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
    absolute address 0x1234". If people are consistent about using spaces,
    this could easily be solved if @ is made a letter by simply making @
    alone into a keyword. But if those compilers accept "uint8_t
    reg@0x1234", then that fails.

    Another cause for concern is if the symbol is used in identifiers, then
    these could cause trouble for assemblers and/or linkers on some systems.
    (This applies to the common extension of allowing $ as a "letter" in C
    - the gcc manual notes that this is not supported on some targets due to
    the meaning of $ in assembly on those targets.)

    The standards committee are always reluctant to make changes that could
    interfere with known implementations and existing code, even if the
    conflict is with an implementation-specific extension to the language.

    I also think it makes sense to reserve such symbols for future purposes
    in C or C++ - good punctuation symbols are too useful to waste as
    letters. For example, the proposed "metaclasses" in C++ suggests using
    $ as part of the syntax, which I think is a very good idea. It is
    perhaps more likely that @ would find similar use in future C++ features
    than future C features, but no one would benefit from adding new
    conflicts between those languages.

  • From Philipp Klaus Krause@21:1/5 to All on Sat Dec 5 20:55:24 2020
    On 05.12.20 at 17:15, David Brown wrote:

    @ is used in existing C implementations as an extension feature. In
    particular, a number of embedded C compilers allow a syntax like
    "uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
    absolute address 0x1234". If people are consistent about using spaces,
    this could easily be solved if @ is made a letter by simply making @
    alone into a keyword. But if those compilers accept "uint8_t
    reg@0x1234", then that fails.

    Another cause for concern is if the symbol is used in identifiers, then
    these could cause trouble for assemblers and/or linkers on some systems.
    (This applies to the common extension of allowing $ as a "letter" in C
    - the gcc manual notes that this is not supported on some targets due to
    the meaning of $ in assembly on those targets.)

    The standards committee are always reluctant to make changes that could
    interfere with known implementations and existing code, even if the
    conflict is with an implementation-specific extension to the language.

    None of these would be a problem for adding it to the basic source
    character set. The characters allowed in identifiers are distinct from
    the basic source character set: e.g. ] is in the basic source character
    set, but not allowed in identifiers. By adding @ to the basic source
    character set, we can portably use it in comments, string literals, and
    character constants.

    Philipp

  • From Keith Thompson@21:1/5 to James Kuyper on Sat Dec 5 14:17:30 2020
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17. I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Francis Glassborow@21:1/5 to Keith Thompson on Sun Dec 6 12:25:42 2020
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17. I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.

    3) Instead of speculating that their inclusion would cause problems for
    some programmers, we need evidence that that is the case. Considering
    that it would be hard to use a modern computer system without having
    both @ and $ available (think mobile and portable computer technology),
    I would be surprised if it were a serious problem for anyone.

    Just my 2c/p/d

    Francis

  • From David Brown@21:1/5 to Francis Glassborow on Sun Dec 6 13:47:40 2020
    On 06/12/2020 13:25, Francis Glassborow wrote:
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII
    and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17.  I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'.  A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful.  (Most programmers probably already assume they can be.)


    Agreed.


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.


    There is also the difference that C is used on a much wider range of
    systems than C++, especially older systems. C++ is able to drop support
    for odder systems (such as those with more limited character sets, or
    stranger integer representations) simply because it has not been used on
    such systems.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.


    As long as they are only available (by the standard) for use in strings
    and comments, not identifiers, there should be no conflict unless they
    can't be represented in the source (for comments) or execution (for
    string literals) character set of the system. But if these characters
    are supported by the relevant character sets, then any real-world
    compiler (such as any that supports ASCII) will already support them
    as extended characters.

    In other words, there is not actually anything significantly useful to
    be gained by putting these characters in the basic character set.
    Equally, there is no real risk in doing so. It is purely a hypothetical
    issue, AFAICS. And the C standards committee are not known for spending
    extra effort on something that makes no difference in reality.

    3) Instead of speculating that their inclusion would cause problems to
    some programmers we need evidence that that is the case. Considering
    that it would be hard to use a modern computer system without having
    both @ and $ available (think mobile and portable computer technology)
    I would be surprised if it would be a serious problem for anyone.

    Just my 2c/p/d

    Francis

  • From Richard Damon@21:1/5 to David Brown on Sun Dec 6 08:42:49 2020
    On 12/6/20 7:47 AM, David Brown wrote:
    On 06/12/2020 13:25, Francis Glassborow wrote:
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17.  I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'.  A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful.  (Most programmers probably already assume they can be.)


    Agreed.


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.


    There is also the difference that C is used on a much wider range of
    systems than C++, especially older systems. C++ is able to drop support
    for odder systems (such as those with more limited character sets, or
    stranger integer representations) simply because it has not been used on
    such systems.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.


    As long as they are only available (by the standard) for use in strings
    and comments, not identifiers, there should be no conflict unless they
    can't be represented in the source (for comments) or execution (for
    string literals) character set of the system. But if these characters
    are supported by the relevant character sets, then in any real-world
    compiler (such as any that support ASCII), they will already be
    supported as extended characters.

    In other words, there is not actually anything significantly useful to
    be gained by putting these characters in the basic character set.
    Equally, there is no real risk in doing so. It is purely a hypothetical
    issue, AFAICS. And the C standards committee are not known for spending
    extra effort on something that makes no difference in reality.


    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included prevents some
    systems from having a conforming implementation, and the committee has
    traditionally avoided gratuitously making systems non-conforming.

    The only case that can be made for including them is that programs
    using those characters might then become strictly conforming programs
    instead of merely conforming programs. But strict conformance isn't
    that big of a deal in practice, as virtually all real programs are
    going to fail strict conformance because they depend on some aspect of
    the environment (like how I/O actually works).

  • From Keith Thompson@21:1/5 to Richard Damon on Sun Dec 6 14:07:12 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    Implementations that can't support those characters are likely to be
    for tiny exotic target systems, and very likely won't be conforming
    anyway, and so could simply ignore the addition of those characters
    to the basic character set.

    The only case that can be made to make them part, is that then programs
    that use those characters might be able to become strictly conforming
    programs instead of just being conforming programs, but strict
    conformance isn't really that big of a deal in practicality, as
    virtually all real programs are going to fail strict conformance because
    they are going to depend on some aspect of the environment (like how I/O
    actually works)

    I suppose I agree that it's not that big a deal. Code that uses
    those characters is *practically* 100% portable already, and I haven't
    found a way to coax either gcc or clang to warn about puts("$@`").
    The benefit would be minor, and the cost would be very close to zero
    (unless an implementation as I've described above actually exists).
    It would be one less thing to think about when writing code that's
    intended to be as portable as possible.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Sun Dec 6 17:44:50 2020
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    with a national character set that doesn't define code point 64 as '@'.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.


    Implementations that can't support those characters are likely to be
    for tiny exotic target systems, and very likely won't be conforming
    anyway, and so could simply ignore the addition of those characters
    to the basic character set.

    The only case that can be made to make them part, is that then programs
    that use those characters might be able to become strictly conforming
    programs instead of just being conforming programs, but strict
    conformance isn't really that big of a deal in practicality, as
    virtually all real programs are going to fail strict conformance because
    they are going to depend on some aspect of the environment (like how I/O
    actually works)

    I suppose I agree that it's not that big a deal. Code that uses
    those characters is *practically* 100% portable already, and I haven't
    found a way to coax either gcc or clang to warn about puts("$@`").
    The benefit would be minor, and the cost would be very close to zero
    (unless an implementation as I've described above actually exists).
    It would be one less thing to think about when writing code that's
    intended to be as portable as possible.


  • From Keith Thompson@21:1/5 to Richard Damon on Sun Dec 6 15:49:36 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three character in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    with a national character set that doesn't define code point 64 as '@'.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    The following characters are also not part of the invariant character
    set: # [ \ ] ^ { | } ~ (We have trigraphs for those. I *do not*
    suggest adding trigraphs for @ $ `.)

    C++ has already dropped trigraphs because support for the old 7-bit
    national character sets was considered unimportant. (But C++17
    did not add @ $ ` to its basic character set.) I understand that
    C has different issues than C++, but in my opinion adding @ $
    ` to C's basic character set would cause no actual harm.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:30:09 2020
    On 06.12.20 at 23:07, Keith Thompson wrote:


    Implementations that can't support […] are likely to be
    for tiny exotic target systems,

    I made that mistake before, with N2576. Spoiler: ctype.h would be hard
    to provide for freestanding implementations targeting IBM mainframes.

    I don't expect @ $ ` to be a problem for tiny targets. But I am not
    familiar with IBM mainframes using EBCDIC, nor with the unusual
    character sets that might still be in use in parts of Asia.

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:17:37 2020
    On 05.12.20 at 23:17, Keith Thompson wrote:


    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)


    ` is a bit different from the other two: some EBCDIC code pages that
    contain $ and @ do not contain it, e.g. code page 410 (Cyrillic). AFAIK,
    one can currently write the basic character set (with use of digraphs
    for { and }) in EBCDIC code page 410, which would no longer be possible
    if ` were added.

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:31:38 2020
    I think that at this point, WG14 mostly thinks that adding trigraphs was
    a mistake. But a mistake that can't be undone.

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 07:24:32 2020
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three character in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to attempt to conform to a future C standard that adds '@' to the basic character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.


    The following characters are also not part of the invariant character
    set: # [ \ ] ^ { | } ~ (We have trigraphs for those. I *do not*
    suggest adding trigraphs for @ $ `.)

    The issue is that trigraphs were created to solve the character-set
    problem, and were adopted precisely so that those national bodies
    would allow the C language to become a standard. They were done a bit
    hastily, and it shows, but they did provide a solution that satisfied the
    national bodies requesting one.


    C++ has already dropped trigraphs because support for the old 7-bit
    national character sets was considered unimportant. (But C++17
    did not add @ $ ` to its basic character set.) I understand that
    C has different issues than C++, but in my opinion adding @ $
    ` to C's basic character set would cause no actual harm.


    C++ has historically been much less concerned about backwards
    compatibility issues.

    I will add that while you think it causes little harm (besides making
    programs stored in these national character sets perhaps no longer
    conforming), it also adds little benefit. As has been pointed out,
    basically all existing implementations include them in the extended
    character set (when the encoding has those characters), so there is no
    change to the programmer as far as allowing them in comments or strings,
    which would be their only universal usage.

  • From Thomas David Rivers@21:1/5 to Philipp Klaus Krause on Sun Dec 6 16:11:14 2020
    Philipp Klaus Krause wrote:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html


    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where pass-by-reference
    is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters are pass-by-reference.)

    - Dave R. -

    --
    rivers@dignus.com Work: (919) 676-0847
    Get your mainframe programming tools at http://www.dignus.com

  • From Keith Thompson@21:1/5 to Thomas David Rivers on Mon Dec 7 12:19:48 2020
    Thomas David Rivers <rivers@dignus.com> writes:
    Philipp Klaus Krause wrote:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where
    pass-by-reference is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters
    are pass-by-reference.)

    You can do the same thing with a compound literal starting in C99:

    #include <stdio.h>

    void FOO(int *a, int *b) {
        printf("%d %d\n", *a, *b);
    }

    int main(void) {
        FOO(&(int){1}, &(int){2});
    }

    I suspect the extension predates compound literals.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 13:10:59 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/7/20 3:16 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET
    values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
    puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]


    One problem is that that file is NOT compatible with ISO646-FR, as the
    '#' character in it would not be a hash (number sign) but the character
    £, which is illegal in C. It is one of the encodings that NEEDS
    trigraphs or digraphs in the files to use C.

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    I'd still be interested in seeing an existing implementation that
    does support ISO646-FR or something similar, and that would become non-conforming if '@' were made part of the basic character set.

    I recognize that the burden of proof is on any proposal to make a
    change to the standard, but so far I've seen no evidence that such a
    change would actually break anything (at least anything that isn't
    already broken).

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 12:16:14 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.
    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET
    values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 15:51:34 2020
    On 12/7/20 3:16 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.
    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.
    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]


    One problem is that that file is NOT compatible with ISO646-FR, as the
    '#' character in it would not be a hash (number sign) but the character
    £, which is illegal in C. It is one of the encodings that NEEDS
    trigraphs or digraphs in the files to use C.

  • From Andreas Schwab@21:1/5 to Keith Thompson on Mon Dec 7 23:08:13 2020
    On Dez 07 2020, Keith Thompson wrote:

    I just used "iconv -l" to get what I presume is a list of valid CHARSET values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    That is the first byte of the UTF-8 representation of <U00A3>, which is
    what 0x23 translates to in ISO646-FR.

    Andreas.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

  • From Andreas Schwab@21:1/5 to Keith Thompson on Mon Dec 7 23:52:05 2020
    On Dez 07 2020, Keith Thompson wrote:

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in
    an ISO646-FR context, you have to convert them first.

    Andreas.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

  • From Keith Thompson@21:1/5 to Andreas Schwab on Mon Dec 7 15:27:07 2020
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in
    an ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 18:54:02 2020
    On 12/7/20 6:27 PM, Keith Thompson wrote:
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in an
    ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.


    Actually, unless the files use characters outside the basic set, all
    that it requires is encoding the problematic characters as trigraphs or
    digraphs, which will work for all users.

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 16:10:02 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/7/20 6:27 PM, Keith Thompson wrote:
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in an
    ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.

    Actually, unless the files use characters outside the basic set, all
    that it requires is encoding the problematic characters as trigraphs or
    digraphs, which will work for all users.

    Yes, that could work too. (It would break the implementation for
    modes in which trigraphs are disabled, but since such modes are
    non-conforming it's not relevant to the current discussion.)

    Still, it doesn't provide what I'm looking for: an example of an
    existing real-world conforming implementation that would not be
    conforming if '@' et al were added to the basic character set.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Andreas Schwab on Mon Dec 7 18:31:03 2020
    On 12/7/20 5:52 PM, Andreas Schwab wrote:
    On Dez 07 2020, Keith Thompson wrote:

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in a ISO646-FR context, you have to convert them first.

    Andreas.


    It is perhaps a weakness in GCC that there seems to be just one
    global file encoding parameter, so you need different versions of the
    headers for each encoding of your source files. I think you can make
    parallel directories and change the system directory path for each.

    Now, you likely don't REALLY need all those different copies, as you
    could make just one copy for the encodings that are missing any of the
    needed characters and replace those characters with trigraph or digraph
    encodings.

    You could of course just always use that encoded file, but that version
    would be less readable to those using the more 'normal' character sets.

  • From Thomas David Rivers@21:1/5 to Keith Thompson on Mon Dec 7 17:02:15 2020
    Keith Thompson wrote:

    Thomas David Rivers <rivers@dignus.com> writes:


    Philipp Klaus Krause wrote:



    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html



    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where
    pass-by-reference is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters are pass-by-reference.)



    You can do the same thing with a compound literal starting in C99:

    #include <stdio.h>

    void FOO(int *a, int *b) {
        printf("%d %d\n", *a, *b);
    }

    int main(void) {
        FOO(&(int){1}, &(int){2});
    }

    I suspect the extension predates compound literals.



    Yep - this extension predates those.

    And - very clever use of them! It certainly does what someone would need
    in this situation.

    - Dave R. -



    --
    rivers@dignus.com Work: (919) 676-0847
    Get your mainframe programming tools at http://www.dignus.com

  • From Philipp Klaus Krause@21:1/5 to All on Thu Mar 11 22:50:26 2021
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html


    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    Philipp

  • From Keith Thompson@21:1/5 to Philipp Klaus Krause on Thu Mar 11 15:40:28 2021
    Philipp Klaus Krause <pkk@spth.de> writes:
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    It's not *necessary*, but I wouldn't object to it.

    If this change is going to be made, I'd advocate also adding $
    (mentioned in the proposal) and ` (not mentioned). None of @,
    $, and ` are required for any C tokens, but many implementations
    allow $ in identifiers. @, $, and ` are the only printable ASCII characters
    that are not part of the C basic character sets. All are commonly
    used in character constants and string literals. (`, backtick,
    is used in Markdown and some other languages.)

    The *basic* characters are those that are required for all
    implementations. The set of *extended* characters is
    implementation-defined, and may be empty. The @, $, and ` characters
    are extended characters in most or all current implementations. If @, $,
    and ` are going to be required, I think they should be in the basic
    character set. That's the point of the distinction between basic and
    extended characters.

    Both ASCII and the EBCDIC code pages that support them represent
    all these characters in one byte. Their representations should be
    required to fit in a byte, since that already applies to all the
    other basic characters; allowing them to be multi-byte wouldn't
    help portability and would add complexity.

    The vast majority of implementations already conform to this proposal,
    except perhaps for a minor documentation update.

    The only reasons I can think of *not* to make this change are (a) *any*
    change to the standard needs to justify the work needed to make the
    change and this one isn't really necessary, and (b) apparently some
    EBCDIC codepages don't support all these characters. If the latter
    affects any actual implementations, they could pick some other
    printable characters to stand in (similar things have been done in
    the past for old ASCII variants).

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Philipp Klaus Krause@21:1/5 to All on Fri Mar 12 15:25:25 2021
    Am 12.03.21 um 00:40 schrieb Keith Thompson:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    It's not *necessary*, but I wouldn't object to it.

    If this change is going to be made, I'd advocate also adding $
    (mentioned in the proposal) and ` (not mentioned). None of @,
    $, and ` are required for any C tokens, but many implementations
    allow $ in identifiers. @, $, and ` are the only ASCII characters
    that are not part of the C basic character sets. All are commonly
    used in character constants and string literals. (`, backtick,
    is used in Markdown and some other languages.)

    ` makes sense. However, I don't know if WG14 wants it, so I'd make that
    a separate question in the same paper.


    The *basic* characters are those that are required for all
    implementations. The set of *extended* characters is
    implementation-defined, and may be empty. The @, $, and ` characters
    are extended characters in most or all current implementations. If @, $,
    and ` are going to be required, I think they should be in the basic
    character set. That's the point of the distinction between basic and
    extended characters.

    On the other hand, currently, using universal character names for
    characters in the basic source character set is not allowed, so moving
    characters into the basic source character set can actually break things.

    Also, there is undefined behaviour when a character outside the basic
    source character set is encountered in a source file, except in an
    identifier, a character constant, a string literal, a header name, a
    comment, or a preprocessing token that is never converted to a token.
    Since some implementations use @ and $ for special purposes, it makes
    sense to keep this undefined behaviour.

    Philipp

  • From Tim Rentsch@21:1/5 to Philipp Klaus Krause on Sat Jul 10 08:46:23 2021
    Philipp Klaus Krause <pkk@spth.de> writes:

    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    I would vote against the proposal, because it does nothing useful.
