I'm using ANSI escape codes ("\033[%d;%dH") to position Unicode
characters in a terminal window. The indices to provide for %d
are suited for (e.g.) the Latin character sets, but not for
character sets where characters require more than one unit for
the displayed glyph, e.g. the Chinese characters. So with
a Latin character set I'd use indices 1, 2, 3, ... and for the
Asian sets I'd use 1, 3, 5, ... to position the characters on
the screen. My question:
Is the size that the character glyphs need for representation
on a terminal somehow retrievable, so that I get, say, for
Unicode character \U0041 a value of 1 and for \U30ee a value
of 2, so that I can automate the display on a terminal?
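
To make the positioning scheme in the question concrete, here is a minimal Python sketch. It assumes that a character whose Unicode East Asian Width property is W (wide) or F (fullwidth) occupies two columns and that everything else occupies one; that is a simplification, not something any particular terminal guarantees. The names `cell_width`, `columns` and `place_row` are invented for this example.

```python
import unicodedata


def cell_width(ch: str) -> int:
    """Assumed column count: 2 for East Asian Wide/Fullwidth, otherwise 1."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def columns(text: str, start_col: int = 1) -> list:
    """Return the 1-based start column of each character in text."""
    cols = []
    col = start_col
    for ch in text:
        cols.append(col)
        col += cell_width(ch)  # advance by the width of the glyph just placed
    return cols


def place_row(text: str, row: int, start_col: int = 1) -> str:
    """Prefix each character with the cursor-position sequence ESC [ row ; col H."""
    return "".join(
        f"\033[{row};{col}H{ch}"
        for ch, col in zip(text, columns(text, start_col))
    )


if __name__ == "__main__":
    print(columns("ABC"))                 # Latin letters: [1, 2, 3]
    print(columns("\u30ee\u30ee\u30ee"))  # wide characters: [1, 3, 5]
```

Writing the result of `place_row(text, row)` to the terminal would then put each character at its computed column, provided the terminal agrees with the assumed widths.
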
On 27/12/2021 at 14.07, Janis Papanagnou wrote:
> Is the size that the character glyphs need for representation
> on a terminal somehow retrievable [...]?
Quick search reveals: https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
On 27.12.2021 14:39, marrgol wrote:
> Quick search reveals:
> https://unix.stackexchange.com/questions/245013/get-the-display-width-of-a-string-of-characters
Interesting, Stephane asked that question. And wc -L seems to be
the solution; non-standard but at least works on my system. Thanks!
$ printf "\U30ee" | wc -L
2
$ printf "\U0041" | wc -L
1
Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> Interesting, Stephane asked that question. And wc -L seems to be
> the solution; non-standard but at least works on my system. Thanks!
> $ printf "\U30ee" | wc -L
> 2
> $ printf "\U0041" | wc -L
> 1
Internally, `wc -L` uses the POSIX `wcwidth()` function.
https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcwidth.html
I'm not 100% clear on how the number of column positions for a given character is defined.
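
For readers without a C compiler at hand, the idea behind `wcwidth()` can be approximated in Python. The sketch below is not the POSIX function and not what `wc -L` does internally; it is a rough stand-in built on the standard `unicodedata` module, under the assumption that combining marks and format characters take 0 columns, East Asian Wide/Fullwidth characters take 2, and other printable characters take 1. The names `char_width` and `string_width` are invented for this example.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Rough stand-in for wcwidth(): columns for one character, -1 if non-printable."""
    if ch == "\0":
        return 0
    category = unicodedata.category(ch)
    if category == "Cc":                # control characters have no width
        return -1
    if category in ("Mn", "Me", "Cf"):  # combining marks and format characters
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(text: str) -> int:
    """Rough stand-in for wcswidth(): total columns, -1 if any character is non-printable."""
    total = 0
    for ch in text:
        width = char_width(ch)
        if width < 0:
            return -1
        total += width
    return total


if __name__ == "__main__":
    print(char_width("\u0041"))  # 1
    print(char_width("\u30ee"))  # 2
```

The answers come from the Unicode data bundled with the Python interpreter, so newer characters such as emoji may be classified differently by an older interpreter, much as `wcwidth()` depends on the system's locale data.
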
On 27.12.2021 22:38, Keith Thompson wrote:
> Internally, `wc -L` uses the POSIX `wcwidth()` function.
Yes, that function seems to be the standard base for a couple of tools.
It's good to have access to that function on Linux in such a simple
way. (Not sure how reliable that is, though; see below.)
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/wcwidth.html
> I'm not 100% clear on how the number of column positions for a given character is defined.
The issue seems to be quite a mess. In the SE thread Stephane gave a link
to an article on the Unicode topic that I found interesting and amusing:
https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/#combining-characters-and-character-width
On 27.12.2021 15:56, Janis Papanagnou wrote:
> $ printf "\U30ee" | wc -L
> 2
> $ printf "\U0041" | wc -L
> 1
Just tried that for the Unicode smileys that start at U+1F600 (128512)
in the Unicode tables, but for these symbols 'wc -L' returns 0, as if
they required no space at all. Too bad.
Janis Papanagnou <janis_papanagnou@hotmail.com> writes:
> Just tried that for the Unicode-smileys starting in the Unicode tables
> from position U+1F600 (128512), but for these symbols 'wc -L' returns
> 0, as if these symbols wouldn't require any space. - Too bad.
$ printf "\U1f600" | wc -L
2
Maybe a locale setting?