• Add @ to basic character set?

    From Philipp Klaus Krause@21:1/5 to All on Sat Dec 5 08:58:17 2020
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From James Kuyper@21:1/5 to Philipp Klaus Krause on Sat Dec 5 10:53:40 2020
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

  • From David Brown@21:1/5 to James Kuyper on Sat Dec 5 17:15:24 2020
    On 05/12/2020 16:53, James Kuyper wrote:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.


    @ is used in existing C implementations as an extension feature. In
    particular, a number of embedded C compilers allow a syntax like
    "uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
    absolute address 0x1234". If people are consistent about using spaces,
    this could easily be solved if @ is made a letter by simply making @
    alone into a keyword. But if those compilers accept "uint8_t
    reg@0x1234", then that fails.

    Another cause for concern is if the symbol is used in identifiers, then
    these could cause trouble for assemblers and/or linkers on some systems.
    (This applies to the common extension of allowing $ as a "letter" in C
    - the gcc manual notes that this is not supported on some targets due to
    the meaning of $ in assembly on those targets.)

    The standards committee are always reluctant to make changes that could
    interfere with known implementations and existing code, even if the
    conflict is with an implementation-specific extension to the language.

    I also think it makes sense to reserve such symbols for future purposes
    in C or C++ - good punctuation symbols are too useful to waste as
    letters. For example, the proposed "metaclasses" in C++ suggests using
    $ as part of the syntax, which I think is a very good idea. It is
    perhaps more likely that @ would find similar use in future C++ features
    than future C features, but no one would benefit from adding new
    conflicts between those languages.

  • From Philipp Klaus Krause@21:1/5 to All on Sat Dec 5 20:55:24 2020
    On 05.12.20 at 17:15, David Brown wrote:

    @ is used in existing C implementations as an extension feature. In
    particular, a number of embedded C compilers allow a syntax like
    "uint8_t reg @ 0x1234;" to mean "reg is a uint8_t object located at
    absolute address 0x1234". If people are consistent about using spaces,
    this could easily be solved if @ is made a letter by simply making @
    alone into a keyword. But if those compilers accept "uint8_t
    reg@0x1234", then that fails.

    Another cause for concern is if the symbol is used in identifiers, then
    these could cause trouble for assemblers and/or linkers on some systems.
    (This applies to the common extension of allowing $ as a "letter" in C
    - the gcc manual notes that this is not supported on some targets due to
    the meaning of $ in assembly on those targets.)

    The standards committee are always reluctant to make changes that could
    interfere with known implementations and existing code, even if the
    conflict is with an implementation-specific extension to the language.

    None of these would be a problem for adding it to the basic source
    character set. The characters allowed in identifiers are distinct from
    the basic source character set: e.g. ] is in the basic source character
    set, but not allowed in identifiers. By adding @ to the basic source
    character set, we can portably use it in comments, string literals, and
    character constants.

    Philipp

  • From Keith Thompson@21:1/5 to James Kuyper on Sat Dec 5 14:17:30 2020
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17. I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Francis Glassborow@21:1/5 to Keith Thompson on Sun Dec 6 12:25:42 2020
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17. I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.

    3) Instead of speculating that their inclusion would cause problems for
    some programmers, we need evidence that that is the case. Considering
    that it would be hard to use a modern computer system without having
    both @ and $ available (think mobile and portable computer technology),
    I would be surprised if it were a serious problem for anyone.

    Just my 2c/p/d

    Francis

  • From David Brown@21:1/5 to Francis Glassborow on Sun Dec 6 13:47:40 2020
    On 06/12/2020 13:25, Francis Glassborow wrote:
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII
    and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17.  I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'.  A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful.  (Most programmers probably already assume they can be.)


    Agreed.


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.


    There is also the difference that C is used on a much wider range of
    systems than C++, especially older systems. C++ is able to drop support
    for odder systems (such as those with more limited character sets, or
    stranger integer representations) simply because it has not been used on
    such systems.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.


    As long as they are only available (by the standard) for use in strings
    and comments, not identifiers, there should be no conflict unless they
    can't be represented in the source (for comments) or execution (for
    string literals) character set of the system. But if these characters
    are supported by the relevant character sets, then any real-world
    compiler (such as any that supports ASCII) will already support them
    as extended characters.

    In other words, there is not actually anything significantly useful to
    be gained by putting these characters in the basic character set.
    Equally, there is no real risk in doing so. It is purely a hypothetical
    issue, AFAICS. And the C standards committee are not known for spending
    extra effort on something that makes no difference in reality.

    3) Instead of speculating that their inclusion would cause problems to
    some programmers we need evidence that that is the case. Considering
    that it would be hard to use a modern computer system without having
    both @ and $ available (think mobile and portable computer technology)
    I would be surprised if it would be a serious problem for anyone.

    Just my 2c/p/d

    Francis

  • From Richard Damon@21:1/5 to David Brown on Sun Dec 6 08:42:49 2020
    On 12/6/20 7:47 AM, David Brown wrote:
    On 06/12/2020 13:25, Francis Glassborow wrote:
    On 05/12/2020 22:17, Keith Thompson wrote:
    James Kuyper <jameskuyper@alumni.caltech.edu> writes:
    On 12/5/20 2:58 AM, Philipp Klaus Krause wrote:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    '@' is not in the ISO/IEC 646 invariant subset; in the Danish, Dutch,
    French, French Canadian, German, Italian, Spanish, Swedish, and Swiss
    national variants, that code point is assigned to some other character.
    With UTF-8 (on Unix-like systems) and UTF-16 (on Windows systems)
    becoming so commonplace, that is less of a concern than it used to be,
    but it is still something the committee is likely to pay attention to.
    There are other characters that already are part of the C basic
    character set that aren't in the invariant subset: "# [ ] { } \ | ~ ^".
    However, all of those characters played an important role in C syntax
    long before ISO/IEC 646; that's not the case for '@'. Trigraphs were
    invented to allow those characters to be used on systems that didn't
    support them natively.

    Apparently the C++ committee felt that it was of so little concern that
    they removed trigraphs in C++17.  I don't know of any plans to do the
    same in C.

    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'.  A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful.  (Most programmers probably already assume they can be.)


    Agreed.


    1) Trigraphs were proving to be a road-block for C++. In addition they
    are so rarely used (certainly in C++) that many (probably most)
    programmers fail to recognise them. WG14 appears reluctant to remove
    things even when they have no practical use in modern code. The argument
    that they are needed for legacy systems is, I think, very weak;
    compilers will continue to support them where necessary by providing
    legacy code switches.


    There is also the difference that C is used on a much wider range of
    systems than C++, especially older systems. C++ is able to drop support
    for odder systems (such as those with more limited character sets, or
    stranger integer representations) simply because it has not been used on
    such systems.

    2) As one design feature of C is portability, it is time that the three
    characters you mention were added to the basic character set. I do not
    see how that would have a negative effect on implementations that
    already use them for extensions. Those uses do not (or should not) rely
    on them not being part of the basic character set.


    As long as they are only available (by the standard) for use in strings
    and comments, not identifiers, there should be no conflict unless they
    can't be represented in the source (for comments) or execution (for
    string literals) character set of the system. But if these characters
    are supported by the relevant character sets, then in any real-world
    compiler (such as any that support ASCII), they will already be
    supported as extended characters.

    In other words, there is not actually anything significantly useful to
    be gained by putting these characters in the basic character set.
    Equally, there is no real risk in doing so. It is purely a hypothetical
    issue, AFAICS. And the C standards committee are not known for spending
    extra effort on something that makes no difference in reality.


    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included prevents some
    systems from having a conforming implementation, and the committee has
    traditionally avoided gratuitously making systems non-conforming.

    The only case that can be made for including them is that programs
    using those characters might then become strictly conforming programs
    instead of merely conforming programs. But strict conformance isn't
    that big of a deal in practice, as virtually all real programs are
    going to fail strict conformance because they depend on some aspect of
    the environment (like how I/O actually works).

  • From Keith Thompson@21:1/5 to Richard Damon on Sun Dec 6 14:07:12 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    Implementations that can't support those characters are likely to be
    for tiny exotic target systems, and very likely won't be conforming
    anyway, and so could simply ignore the addition of those characters
    to the basic character set.

    The only case that can be made to make them part, is that then programs
    that use those characters might be able to become strictly conforming
    programs instead of just being conforming programs, but strict
    conformance isn't really that big of a deal in practicality, as
    virtually all real programs are going to fail strict conformance because
    they are going to depend on some aspect of the environment (like how I/O
    actually works)

    I suppose I agree that it's not that big a deal. Code that uses
    those characters is *practically* 100% portable already, and I haven't
    found a way to coax either gcc or clang to warn about puts("$@`").
    The benefit would be minor, and the cost would be very close to zero
    (unless an implementation as I've described above actually exists).
    It would be one less thing to think about when writing code that's
    intended to be as portable as possible.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Sun Dec 6 17:44:50 2020
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    with a national character set that doesn't define code point 64 as '@'.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.


    Implementations that can't support those characters are likely to be
    for tiny exotic target systems, and very likely won't be conforming
    anyway, and so could simply ignore the addition of those characters
    to the basic character set.

    The only case that can be made to make them part, is that then programs
    that use those characters might be able to become strictly conforming
    programs instead of just being conforming programs, but strict
    conformance isn't really that big of a deal in practicality, as
    virtually all real programs are going to fail strict conformance because
    they are going to depend on some aspect of the environment (like how I/O
    actually works)

    I suppose I agree that it's not that big a deal. Code that uses
    those characters is *practically* 100% portable already, and I haven't
    found a way to coax either gcc or clang to warn about puts("$@`").
    The benefit would be minor, and the cost would be very close to zero
    (unless an implementation as I've described above actually exists).
    It would be one less thing to think about when writing code that's
    intended to be as portable as possible.


  • From Keith Thompson@21:1/5 to Richard Damon on Sun Dec 6 15:49:36 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three character in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    with a national character set that doesn't define code point 64 as '@'.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    The following characters are also not part of the invariant character
    set: # [ \ ] ^ { | } ~ (We have trigraphs for those. I *do not*
    suggest adding trigraphs for @ $ `.)

    C++ has already dropped trigraphs because support for the old 7-bit
    national character sets was considered unimportant. (But C++17
    did not add @ $ ` to its basic character set.) I understand that
    C has different issues than C++, but in my opinion adding @ $
    ` to C's basic character set would cause no actual harm.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:30:09 2020
    On 06.12.20 at 23:07, Keith Thompson wrote:


    Implementations that can't support […] are likely to be
    for tiny exotic target systems,

    I made that mistake before, with N2576. Spoiler: ctype.h would be hard
    to provide for freestanding implementations targeting IBM mainframes.

    I don't expect @ $ ` to be a problem for tiny targets. But I am not
    familiar with IBM mainframes using EBCDIC, nor with the unusual
    character sets that might still be in use in parts of Asia.

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:17:37 2020
    On 05.12.20 at 23:17, Keith Thompson wrote:


    There are three printable ASCII characters that aren't in C's basic
    character set: '$', '`', and '@'. A guarantee that all three can be
    used in string literals, character constants, and comments could be
    useful. (Most programmers probably already assume they can be.)


    ` is a bit different from the other two: some EBCDIC code pages that
    contain $ and @ do not contain it, e.g. code page 410 (Cyrillic). AFAIK,
    one can currently write the basic character set (with use of digraphs
    for { and }) in EBCDIC code page 410, which would no longer be possible
    if ` were added.

  • From Philipp Klaus Krause@21:1/5 to All on Mon Dec 7 09:31:38 2020
    I think that at this point, WG14 mostly thinks that adding trigraphs was
    a mistake. But a mistake that can't be undone.

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 07:24:32 2020
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some system
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three character in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to attempt to conform to a future C standard that adds '@' to the basic character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.


    The following characters are also not part of the invariant character
    set: # [ \ ] ^ { | } ~ (We have trigraphs for those. I *do not*
    suggest adding trigraphs for @ $ `.)

    The issue is that trigraphs were created to solve the character-set
    problem, and were adopted precisely so that those national bodies
    would allow the C language to become a standard. They were done a bit
    hastily, and it shows, but they did provide a solution that satisfied the
    national bodies requesting one.


    C++ has already dropped trigraphs because support for the old 7-bit
    national character sets was considered unimportant. (But C++17
    did not add @ $ ` to its basic character set.) I understand that
    C has different issues than C++, but in my opinion adding @ $
    ` to C's basic character set would cause no actual harm.


    C++ has historically been much less concerned about backwards
    compatibility issues.

    I will add that while you think it causes little harm (besides making
    programs stored in these national character sets perhaps no longer
    conforming), it also adds little benefit. As has been pointed out,
    basically all existing implementations include them in the extended
    character set (when the encoding has those characters), so there is no
    change to the programmer as far as allowing them in comments or strings,
    which would be their only universal usage.

  • From Thomas David Rivers@21:1/5 to Philipp Klaus Krause on Sun Dec 6 16:11:14 2020
    Philipp Klaus Krause wrote:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html


    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where pass-by-reference
    is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters are pass-by-reference.)

    - Dave R. -

    --
    rivers@dignus.com Work: (919) 676-0847
    Get your mainframe programming tools at http://www.dignus.com

  • From Keith Thompson@21:1/5 to Thomas David Rivers on Mon Dec 7 12:19:48 2020
    Thomas David Rivers <rivers@dignus.com> writes:
    Philipp Klaus Krause wrote:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where
    pass-by-reference is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters
    are pass-by-reference.)

    You can do the same thing with a compound literal starting in C99:

    #include <stdio.h>

    void FOO(int *a, int *b) {
        printf("%d %d\n", *a, *b);
    }

    int main(void) {
        FOO(&(int){1}, &(int){2});
    }

    I suspect the extension predates compound literals.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 13:10:59 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/7/20 3:16 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.

    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET
    values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
    puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]


    One problem is that that file is NOT compatible with ISO646-FR, as the
    '#' character in it would not be a hash (number sign) but the character
    £, which is illegal in C. It is one of the encodings that NEEDS
    trigraphs or digraphs in the files to use C.

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    I'd still be interested in seeing an existing implementation that
    does support ISO646-FR or something similar, and that would become non-conforming if '@' were made part of the basic character set.

    I recognize that the burden of proof is on any proposal to make a
    change to the standard, but so far I've seen no evidence that such a
    change would actually break anything (at least anything that isn't
    already broken).

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 12:16:14 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.
    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.

    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET
    values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 15:51:34 2020
    On 12/7/20 3:16 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 6:49 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/6/20 5:07 PM, Keith Thompson wrote:
    Richard Damon <Richard@Damon-Family.org> writes:
    [...]
    The issue with making them part of the basic character set is that it
    makes any system that can't do this, because it uses a strange character
    set, non-conforming. Since systems ARE allowed to add any characters
    they want to the source or execution character set, those that currently
    support them can do so. Forcing them to be included drops some systems
    from being able to have a conforming implementation, and the committee
    has traditionally avoided gratuitously making systems non-conforming.
    (Context: The ASCII characters '@', '$', and '`'.)

    I'd be interested in seeing an implementation for which this would
    be relevant. Such an implementation (a) would be unable to (easily)
    represent those three characters in source code and/or during
    execution *and* (b) would otherwise conform to the hypothetical
    edition of the C standard that would add them to the basic character
    set if it were not for this change.

    As was mentioned, all that you need is to want to support ISO/IEC 646
    for a national character set that doesn't define code point 64 as @.

    This includes Canadian, French, German, Irish, and a number of others.
    See https://en.wikipedia.org/wiki/ISO/IEC_646 for a chart of these.

    What C implementations support those character sets (and are likely to
    attempt to conform to a future C standard that adds '@' to the basic
    character set)?

    gcc (and many others) with the right choice of file encoding options.
    The key point here is that this change would be telling a number of
    national bodies that their whole national character set (and thus in
    some respects their language) will no longer be supported.

    OK. Can you explain precisely how to invoke gcc with the right choice
    of file encoding options? I've found this option in the gcc manual:

    '-finput-charset=CHARSET'
    Set the input character set, used for translation from the
    character set of the input file to the source character set used by
    GCC. If the locale does not specify, or GCC cannot get this
    information from the locale, the default is UTF-8. This can be
    overridden by either the locale or this command-line option.
    Currently the command-line option takes precedence if there's a
    conflict. CHARSET can be any encoding supported by the system's
    'iconv' library routine.

    but I had never used it.

    I just used "iconv -l" to get what I presume is a list of valid CHARSET values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    If there's a way to invoke gcc telling it to use a character set that
    doesn't include those characters, that would be a good refutation
    to my point. If doing so is actually useful in some contexts,
    it would be an even better refutation. So far I'm not convinced,
    but I'm prepared to be.

    My impression is that the old 7-bit national character sets are
    no longer relevant, and that dropping support for them in the
    C standard (more precisely, updating the C standard in a manner
    that's inconsistent with those character sets) would be very nearly
    harmless. I'm looking for evidence that that's not the case.

    [...]


    One problem is that that file is NOT compatible with ISO646-FR, as the
    '#' character in it would not be a hash (number sign) but the character
    £, which is illegal in C. It is one of the encodings that NEEDS
    trigraphs or digraphs in the files to use C.

  • From Andreas Schwab@21:1/5 to Keith Thompson on Mon Dec 7 23:08:13 2020
    On Dez 07 2020, Keith Thompson wrote:

    I just used "iconv -l" to get what I presume is a list of valid CHARSET values (there are over 1000 of them), which led me to this:

    gcc -std=c11 -pedantic-errors -finput-charset=ISO646-FR -c c.c

    With this source file:

    #include <stdio.h>
    int main(void) {
        puts("$@`");
    }

    it produced a cascade of errors, starting with:

    In file included from <command-line>:31:
    /usr/include/stdc-predef.h:18:1: error: stray ‘\302’ in program
    18 | #ifndef _STDC_PREDEF_H
    | ^

    It looks like something translated the # character to \302 (0xc2).
    I have no idea why. (And it didn't complain about "$@`".)

    That is the first byte of the UTF-8 representation of <U00A3>, which is
    what 0x23 translates to in ISO646-FR.

    Andreas.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

  • From Andreas Schwab@21:1/5 to Keith Thompson on Mon Dec 7 23:52:05 2020
    On Dez 07 2020, Keith Thompson wrote:

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in
    an ISO646-FR context, you have to convert them first.

    Andreas.

    --
    Andreas Schwab, schwab@linux-m68k.org
    GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
    "And now for something completely different."

  • From Keith Thompson@21:1/5 to Andreas Schwab on Mon Dec 7 15:27:07 2020
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in
    an ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Keith Thompson on Mon Dec 7 18:54:02 2020
    On 12/7/20 6:27 PM, Keith Thompson wrote:
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in an
    ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.


    Actually, unless the files use characters outside the basic set, all
    that it requires is encoding the problematic characters as trigraphs or
    digraphs, which will work for all users.

  • From Keith Thompson@21:1/5 to Richard Damon on Mon Dec 7 16:10:02 2020
    Richard Damon <Richard@Damon-Family.org> writes:
    On 12/7/20 6:27 PM, Keith Thompson wrote:
    Andreas Schwab <schwab@linux-m68k.org> writes:
    On Dez 07 2020, Keith Thompson wrote:
    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in an
    ISO646-FR context, you have to convert them first.

    I suppose that would work (and would break the implementation for my
    normal use).

    That's not a reasonable thing to expect a user to do. If that's the
    simplest way to get the implementation to support ISO646-FR, then I'd
    say the implementation doesn't support ISO646-FR.

    Actually, unless the files use characters outside the basic set, all
    that it requires is encoding the problematic characters as trigraphs or
    digraphs, which will work for all users.

    Yes, that could work too. (It would break the implementation for
    modes in which trigraphs are disabled, but since such modes are
    non-conforming it's not relevant to the current discussion.)

    Still, it doesn't provide what I'm looking for: an example of an
    existing real-world conforming implementation that would not be
    conforming if '@' et al were added to the basic character set.

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

  • From Richard Damon@21:1/5 to Andreas Schwab on Mon Dec 7 18:31:03 2020
    On 12/7/20 5:52 PM, Andreas Schwab wrote:
    On Dez 07 2020, Keith Thompson wrote:

    The first file it complains about, /usr/include/stdc-predef.h,
    is part of the implementation (specifically part of glibc).
    Either the implementation doesn't support ISO646-FR, or there's
    some configuration I would need to perform to make it support it.

    The system files are encoded in UTF-8, so if you want to use them in a ISO646-FR context, you have to convert them first.

    Andreas.


    It is perhaps a weakness in GCC that there seems to be just one
    global file encoding parameter, so you need different versions of the
    headers for each encoding of your source files. I think you can make
    parallel directories and change the system directory path for each.

    Now, you likely don't REALLY need all those different copies, as you
    could make just one copy for the encodings that are missing any of the
    needed characters and replace those characters with trigraph or digraph
    encodings.

    You could of course just always use that encoded file, but that version
    would be less readable to those using the more 'normal' character sets.

  • From Thomas David Rivers@21:1/5 to Keith Thompson on Mon Dec 7 17:02:15 2020
    Keith Thompson wrote:

    Thomas David Rivers <rivers@dignus.com> writes:


    Philipp Klaus Krause wrote:



    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html



    Just to add to the "used as an extension" list of compilers; the Dignus
    compilers (and the SAS/C compilers) for the mainframe use @ to be similar
    to &, except that it can accept an rvalue. If an rvalue is present
    after a @, then the address of a copy is generated. The copy is
    declared within the innermost scope.

    This is helpful in some situations on the mainframe where
    pass-by-reference is the norm, as in:

    FOO(@1, @2);

    (where FOO is defined in some other language, e.g. PL/I, where the
    parameters are pass-by-reference.)



    You can do the same thing with a compound literal starting in C99:

    #include <stdio.h>

    void FOO(int *a, int *b) {
        printf("%d %d\n", *a, *b);
    }

    int main(void) {
        FOO(&(int){1}, &(int){2});
    }

    I suspect the extension predates compound literals.



    Yep - this extension predates those.

    And - very clever use of them! It certainly does what someone would need
    in this situation.

    - Dave R. -



    --
    rivers@dignus.com Work: (919) 676-0847
    Get your mainframe programming tools at http://www.dignus.com

  • From Philipp Klaus Krause@21:1/5 to All on Thu Mar 11 22:50:26 2021
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html


    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    Philipp

  • From Keith Thompson@21:1/5 to Philipp Klaus Krause on Thu Mar 11 15:40:28 2021
    Philipp Klaus Krause <pkk@spth.de> writes:
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    It's not *necessary*, but I wouldn't object to it.

    If this change is going to be made, I'd advocate also adding $
    (mentioned in the proposal) and ` (not mentioned). None of @,
    $, and ` are required for any C tokens, but many implementations
    allow $ in identifiers. @, $, and ` are the only printable ASCII characters
    that are not part of the C basic character sets. All are commonly
    used in character constants and string literals. (`, backtick,
    is used in Markdown and some other languages.)

    The *basic* characters are those that are required for all
    implementations. The set of *extended* characters is
    implementation-defined, and may be empty. The @, $, and ` characters
    are extended characters in most or all current implementations. If @, $,
    and ` are going to be required, I think they should be in the basic
    character set. That's the point of the distinction between basic and
    extended characters.

    Both ASCII and the EBCDIC code pages that support them represent
    all these characters in one byte. Their representations should be
    required to fit in a byte, since that already applies to all the
    other basic characters; allowing them to be multi-byte wouldn't
    help portability and would add complexity.

    The vast majority of implementations already conform to this proposal,
    except perhaps for a minor documentation update.

    The only reasons I can think of *not* to make this change are (a) *any*
    change to the standard needs to justify the work needed to make the
    change and this one isn't really necessary, and (b) apparently some
    EBCDIC codepages don't support all these characters. If the latter
    affects any actual implementations, they could pick some other
    printable characters to stand in (similar things have been done in
    the past for old ASCII variants).

    --
    Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
    Working, but not speaking, for Philips Healthcare
    void Void(void) { Void(); } /* The recursive call of the void */

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Philipp Klaus Krause@21:1/5 to All on Fri Mar 12 15:25:25 2021
    Am 12.03.21 um 00:40 schrieb Keith Thompson:
    Philipp Klaus Krause <pkk@spth.de> writes:
    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:
    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    It's not *necessary*, but I wouldn't object to it.

    If this change is going to be made, I'd advocate also adding $
    (mentioned in the proposal) and ` (not mentioned). None of @,
    $, and ` are required for any C tokens, but many implementations
    allow $ in identifiers. @, $, and ` are the only ASCII characters
    that are not part of the C basic character sets. All are commonly
    used in character constants and string literals. (`, backtick,
    is used in Markdown and some other languages.)

    ` makes sense. However, I don't know if WG14 wants it, so I'd make that
    a separate question in the same paper.


    The *basic* characters are those that are required for all
    implementations. The set of *extended* characters is
    implementation-defined, and may be empty. The @, $, and ` characters
    are extended characters in most or all current implementations. If @, $,
    and ` are going to be required, I think they should be in the basic
    character set. That's the point of the distinction between basic and
    extended characters.

    On the other hand, currently, using universal character names for
    characters in the basic source character set is not allowed, so moving
    characters into the basic source character set can actually break things.

    Also, there is undefined behaviour when a character outside the basic
    source character set is encountered in a source file, except in an
    identifier, a character constant, a string literal, a header name, a
    comment, or a preprocessing token that is never converted to a token.
    Since some implementations use @ and $ for special purposes, it makes
    sense to keep this undefined behaviour.

    Philipp

  • From Tim Rentsch@21:1/5 to Philipp Klaus Krause on Sat Jul 10 08:46:23 2021
    Philipp Klaus Krause <pkk@spth.de> writes:

    Am 05.12.20 um 08:58 schrieb Philipp Klaus Krause:

    I wonder if it would make sense to add @ to the basic character set.
    Virtually everyone is using it in comments and strings already anyway
    (for email addresses), and I don't see anything preventing
    implementations from supporting it, as it is available in both ASCII and
    common EBCDIC code pages:

    http://www.colecovision.eu/stuff/proposal-basic-@.html

    After some discussion and thought, IMO, the way forward is to add @ to
    the source and execution character sets, but not the basic source
    character set:

    http://www.colecovision.eu/stuff/proposal-@.html

    Do you think this proposal makes sense as is? If yes, do you have a
    preference for adding them as single bytes vs. not specifying if they
    are single bytes? If yes, why?

    I would vote against the proposal, because it does nothing useful.
