Forum: >>> Magnum BBS <<<

Re: Using "textwrap" package for unwrappable languages (Japanese)

From Peter J. Holzer@21:1/5 to c.buhtz--- via Python-list on Wed Aug 30 14:07:06 2023

On 2023-08-30 11:32:02 +0000, c.buhtz--- via Python-list wrote:

I use the "textwrap" package to wrap longer text passages. It works well
with English.
But the source string is translated via gettext before it is wrapped.

With languages like Japanese or Chinese this would IMHO result in
unwrapped text. Japanese rules allow breaking a line nearly wherever you
want.

How can I handle it with "textwrap"?

At runtime I don't know which language is really used. So I'm not able
to decide between using "textwrap" and just inserting "\n" every 65
characters.

I don't have a solution but want to add another caveat: Japanese
characters are usually double-width. So (unless your line length is 130
characters for English) you would want to add that line break every 32
characters. (unicodedata.east_asian_width() seems to be the canonical
way to find the width of a character, but it returns a code (like 'W'
or 'Na'), not a number.)
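
A minimal sketch of turning that code into a number, under the assumption that 'W' and 'F' occupy two columns and everything else occupies one. The helper names char_columns and display_width are hypothetical, and the sketch ignores combining characters and the 'A' (ambiguous) category, which a real terminal may render differently:

```python
import unicodedata


def char_columns(ch: str) -> int:
    """Return an assumed column count for a single character."""
    # Assumption: 'W' (wide) and 'F' (fullwidth) take two columns,
    # every other category (including 'A', ambiguous) takes one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def display_width(text: str) -> int:
    """Sum the assumed column counts of all characters in text."""
    return sum(char_columns(ch) for ch in text)


print(display_width("abc"))     # 3
print(display_width("日本語"))  # 6
```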

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"


From c.buhtz@posteo.jp@21:1/5 to All on Wed Aug 30 11:32:02 2023

Hi,

I use the "textwrap" package to wrap longer text passages. It works well
with English.
But the source string is translated via gettext before it is
wrapped.

With languages like Japanese or Chinese this would IMHO result in
unwrapped text. Japanese rules allow breaking a line nearly wherever you
want.

How can I handle it with "textwrap"?

At runtime I don't know which language is really used. So I'm not able
to decide using "textwrap" or just inserting "\n" every 65 characters.

Another approach would be to let the translators handle the line breaks.
But I would like to avoid it because some of them don't know what "\n"
means and they don't know the length rule (in my case 65 characters).

Any ideas about it?

Kind
Christian

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From c.buhtz@posteo.jp@21:1/5 to All on Wed Aug 30 13:18:25 2023

Dear Peter,

thanks for your reply. That is a very interesting topic.

I was a bit wrong. I realized that textwrap.wrap() does insert line breaks
when "words" are too long. So even a string without any blank space will
get wrapped.
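
A short sketch of that behaviour, using a made-up sample string of 100 hiragana characters without any spaces. It relies on the break_long_words option of textwrap, which defaults to True; note that textwrap counts characters, not display columns:

```python
import textwrap

# Hypothetical sample: 100 hiragana characters, no whitespace at all.
text = "あ" * 100

# With the default break_long_words=True the "word" is split at the width.
lines = textwrap.wrap(text, width=65)
print([len(line) for line in lines])  # [65, 35]

# With break_long_words=False the over-long "word" stays on one line.
unbroken = textwrap.wrap(text, width=65, break_long_words=False)
print([len(line) for line in unbroken])  # [100]
```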

Am 30.08.2023 14:07 schrieb Peter J. Holzer via Python-list:

another caveat: Japanese
characters are usually double-width. So (unless your line length is 130 characters for English) you would want to add that line break every 32 characters.

I don't get your calculation here. The original line length is 130, but
for "double-width" characters you would break at 32 instead of 65?


Then I will do something like this

unicodedata.east_asian_width(mystring[0])

W is "wide". But there is also "F" (full-width).
What is the difference between "wide" and "full-width"?

My application supports (currently 46) languages including Simplified
and Traditional Chinese, Vietnamese, Korean, Japanese and languages
written in Cyrillic.


From Peter J. Holzer@21:1/5 to c.buhtz--- via Python-list on Wed Aug 30 19:52:50 2023

On 2023-08-30 13:18:25 +0000, c.buhtz--- via Python-list wrote:

Am 30.08.2023 14:07 schrieb Peter J. Holzer via Python-list:

another caveat: Japanese characters are usually double-width. So
(unless your line length is 130 characters for English) you would
want to add that line break every 32 characters.

I don't get your calculation here. The original line length is 130, but
for "double-width" characters you would break at 32 instead of 65?

No, I wrote "*unless* your original line length was 130 characters".

I assumed that you want your line to be 65 latin characters wide since
this is what fits nicely on an A4 (or letter) page with a bit of a
margin on both sides. Or on an 80 character terminal screen or window.
And it's also generally considered to be a good line length for
readability.

But Asian "full width" or "wide" characters are twice as wide, so you
can fit only half as many in a single line. Hence 65 // 2 = 32.
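
The arithmetic above can be sketched as a greedy splitter that counts display columns instead of characters. This is only an illustration under stated assumptions: wrap_by_columns is a hypothetical helper, 'W' and 'F' are assumed to take two columns, and the sketch breaks between any two characters, ignoring spaces and Japanese line-breaking rules:

```python
import unicodedata


def wrap_by_columns(text: str, max_columns: int = 65) -> list[str]:
    """Split text so that no line exceeds max_columns display columns."""
    lines, current, used = [], [], 0
    for ch in text:
        # Assumption: wide and fullwidth characters take two columns.
        cols = 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        if current and used + cols > max_columns:
            lines.append("".join(current))
            current, used = [], 0
        current.append(ch)
        used += cols
    if current:
        lines.append("".join(current))
    return lines


# 70 wide characters at 65 columns: 32 fit per line (64 columns used).
print([len(line) for line in wrap_by_columns("あ" * 70)])  # [32, 32, 6]
# Narrow characters still get the full 65 per line.
print([len(line) for line in wrap_by_columns("a" * 70)])   # [65, 5]
```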

But that was only my assumption. I considered it possible that you
started with 130 characters per line (many terminals back in the day had
a 132 character mode, and that's also approximately the line length in
landscape mode or when using a compressed typeface - so 132 is also a
common length limit, although rarely for text (too wide to read
comfortably) and more for code, tables, etc.), divided that by two and
arrived at 65 Japanese characters per line that way. So I mentioned that
to indicate that I had considered the possibility but concluded that it
probably wasn't what you meant.

(And as usual when I write a short sentence to clarify something
I wind up writing 4 paragraphs clarifying the clarification :-/)

Then I will do something like this

unicodedata.east_asian_width(mystring[0])

W is "wide". But there is also "F" (full-width).
What is the difference between "wide" and "full-width"?

I'm not an expert on Japanese typography by any means. But they have
some fullwidth variants of Latin characters and halfwidth variants of
katakana characters. I assume that the categories 'F' and 'H' are for
those, while "normal" Japanese characters are 'W':

>>> unicodedata.east_asian_width("\N{DIGIT ONE}")
'Na'
>>> unicodedata.east_asian_width("\N{FULLWIDTH DIGIT ONE}")
'F'
>>> unicodedata.east_asian_width("\N{KATAKANA LETTER ME}")
'W'
>>> unicodedata.east_asian_width("\N{HALFWIDTH KATAKANA LETTER ME}")
'H'
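
To see which categories actually occur in a translated string, the same lookups can be tallied in one pass. A small sketch; width_categories is a hypothetical helper name and the sample string simply reuses the four characters from the session above:

```python
import unicodedata
from collections import Counter


def width_categories(text: str) -> Counter:
    """Count how often each East Asian Width category occurs in text."""
    return Counter(unicodedata.east_asian_width(ch) for ch in text)


# The four characters queried above, in the same order.
sample = (
    "\N{DIGIT ONE}"
    "\N{FULLWIDTH DIGIT ONE}"
    "\N{KATAKANA LETTER ME}"
    "\N{HALFWIDTH KATAKANA LETTER ME}"
)
counts = width_categories(sample)
print(counts["Na"], counts["F"], counts["W"], counts["H"])  # 1 1 1 1
```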

hp


