After decades I'm again writing some C code and intended to use some
Unicode characters for output. I'm using C99. I have two questions.
I am able to inline the character in the code like: printf ("█\n");
But I also want to make it a printf argument: printf ("%c\n", '█');
which doesn't work (at least not in the depicted way).
And I want to declare such characters, like: char ch = '█';
which also doesn't work, and neither does: wchar_t ch = '█';
And ideally the character should not be copy/pasted into the code
but given by some standard representation like '\u2588' (or so).
Without giving all the gory details about the "problems of Unicode",
are there practical answers to those questions that "simply work"
and reliably?
I have experimented and observed that working with strings at least
*seems* to work: char * ch = "\u2588"; printf ("%s\n", ch);
Is that an acceptable/reliable and the usual way in C to tackle the
issue?
Thanks.
Janis
My own approach would be to do as much as possible in my own code.
A lot depends on whether or not you need to pass your own characters (of whatever type) to some external library which expects a specific type such as wchar_t. There are many different scenarios, so I will cover the one most likely to occur in my own code:
- No external library involved.
- Output encoded in UTF-8
- The text editor I use to write the code stores everything as UTF-8.
With the above assumptions I would simply use ordinary C strings, put UTF-8 text in them like "ΑΒΓΔΕΖΗΘ...", and output them in the ordinary way.
It's not guaranteed to work (that depends on how the compiler encodes string literals and on what the terminal expects), but it most likely will.
[...]
And ideally the character should not be copy/pasted into the code
but given by some standard representation like '\u2588' (or so).
Why is that? It seems to me that it makes the code harder to understand.
What works reliably depends a lot on what you're trying to do. Unicode in general is messy.
I have experimented and observed that working with strings at least
*seems* to work: char * ch = "\u2588"; printf ("%s\n", ch);
Is that an acceptable/reliable and the usual way in C to tackle the
issue?
If you do

char *ch = "\u2588";
size_t i;
for (i = 0; ch[i] != 0; i++) {
    /* convert to unsigned char so the bytes print as 0..255 even if char is signed */
    printf("%d ", (unsigned char) ch[i]);
}
puts("");

what output do you get? I will guess that you see the bytes
226 150 136.
On Sat, 9 Dec 2023 17:59:32 +0100
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
In my case the characters are just "graphical candy", so it's not
important to "read" them; a comment behind the \u encoding appears
to me to be sufficient.
Well, it's your code. If it is some kind of "art" based on block
characters, then it may be even more important to be able to see it
in the source.
On Sat, 9 Dec 2023 15:59:08 +0100
jak <nospam@please.ty> wrote:
To explain myself better: if I write a program that prints an extended
Unicode character and my terminal uses UTF-8, then unless the program
converts the character from Unicode to UTF-8 I will not see anything.
To prove it I will send the character to a file:
$ cat foo.c
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>
#include <locale.h>

int main()
{
    wchar_t wch = L'\u2588';
    FILE *fp;

    setlocale(LC_ALL, "");
    if ((fp = fopen("char.txt", "wb")) != NULL)
    {
        fwprintf(fp, L"%lc", wch);
        fclose(fp);
    }
    return 0;
}
$ hexdump -C char.txt
00000000  e2 96 88  |...|
00000003
As you can see, the character code is not the same as the one I sent.
In what way is it not the same as what you sent? With hexdump you
can only hope to see octets, regardless of what the octets encode. So
you read back the octets which are the UTF-8 encoding of codepoint
U+2588. What you got is exactly what I would expect to see. If you
use a terminal which supports UTF-8 and has the necessary font and
you do

cat char.txt

what do you see? I expect you will see the block character.

$ gcc foo.c -o foo
$ foo
$ od -t x1 char.txt
0000000 e2 96 88
$ tcc foo.c
$ foo
$ od -t x1 char.txt
0000000 88 25
With Python it is easy to highlight the conversion:

$ python
>>> u'\u2588'.encode('utf-8')
b'\xe2\x96\x88'
On 2023/12/9 15:04, Janis Papanagnou wrote:
[...] intended to use some Unicode characters for output. [...]
printf("%c",ch), the ch must <0xFF, <255
In C, the character must be a character of the ASCII table,
i.e. < (int)255. A string is a collection of characters.
On Wed, 13 Dec 2023 11:05:45 +0800, spender wrote:
printf("%c",ch), the ch must <0xFF, <255
Not quite.
1) ch /must/ represent an integer value.
2) ch /should/ represent a C char value. Note that a C char /is not/
defined as an 8-bit unsigned quantity, but as a CHAR_BIT quantity,
with implementation-defined sign, where CHAR_BIT is /at least/
8 bits. [...]
On 12/12/23 22:05, spender wrote:
printf("%c",ch), the ch must <0xFF, <255
The only 'ch' in the code that you responded to was declared as
"char *", not char, [...]
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
On Wed, 13 Dec 2023 11:05:45 +0800, spender wrote:
printf("%c",ch), the ch must <0xFF, <255
Not quite.
1) ch /must/ represent an integer value.
More specifically, it must have a type that is or promotes
to int, or a type that is or promotes to unsigned int, with
a value that is in the common range of int and unsigned int.
Not quite. "If no l length modifier is present, the int argument
is converted to an unsigned char, and the resulting character is
written." For example printf("%c", -193) is equivalent to
printf("%c", 63), which assuming an ASCII-based character set will
print '?'.
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
The rule for arguments to printf() is the same as the rule for
accessing variadic arguments using va_arg(). That has always
been true, although not expressed clearly in early versions of
the C standard. Fortunately that shortcoming is addressed in
the upcoming C23 (is it still not yet ratified?): in N3096,
paragraph 9 in section 7.23.6.1 says in part
fprintf shall behave as if it uses va_arg with a type
argument naming the type resulting from applying the
default argument promotions to the type corresponding
to the conversion specification [...]
and the rule for va_arg (in 7.16.1.1 p2) says in part
one type is a signed integer type, the other type is
the corresponding unsigned integer type, and the value
is representable in both types
So supplying an unsigned int argument is okay, provided of
course the value is in the range of values of signed int.
Re-reading what you wrote, I think I misunderstood your intent (and I
think what you wrote was ambiguous).
"%c" specifies an int argument.
You wrote:
More specifically, it must have a type that is or promotes to int,
or a type that is or promotes to unsigned int, with a value that is
in the common range of int and unsigned int.
I read that as:
More specifically,
(it must have a type that is or promotes to int, or a type that is
or promotes to unsigned int),
with a value that is in the common range of int and unsigned int.
which would incorrectly imply that a negative int value is not allowed.
It's now clear to me that what you meant was:
More specifically,
(it must have a type that is or promotes to int),
or
(a type that is or promotes to unsigned int, with a value that is in
the common range of int and unsigned int).
I agree with that.