Forum: >>> Magnum BBS <<<

Documentation for utf8decode and ushow (nee unicodeshow)

From David Newall@21:1/5 to All on Sun Mar 6 23:54:58 2022

Hello All,

I've almost finished my UTF-8 decode and Unicode show work. Significant changes are:

1. unicodeshow is now ushow. The names all got a bit long.
2. The decoder has been rewritten based on code by Thompson and Pike in
Plan 9. They wrote an incredibly clever test for overlong sequences.
3. Documentation. Yes, really! https://davidnewall.com/software/utf8show/PostScript%20UTF8%20Extension%20Reference.pdf
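
For anyone who hasn't met the overlong problem: UTF-8 can spell a small code point with more bytes than it needs (0xC0 0x80 for NUL, say), and a decoder has to reject those. Below is a minimal C sketch of one common way to do it, by comparing the decoded value against the smallest value that genuinely needs that many bytes. It is an illustration only, not the code from utf8decode or from Plan 9, and the names decode_utf8 and min_for_len are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not taken from utf8decode or Plan 9.
 * Decodes one code point from s (n bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated, malformed, overlong, or outside the Unicode range. */
size_t decode_utf8(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Smallest code point that genuinely needs 1, 2, 3 or 4 bytes
     * (index 0 is unused). */
    static const uint32_t min_for_len[5] = { 0, 0x0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t cp;

    if (n == 0)
        return 0;

    if (s[0] < 0x80)      { len = 1; cp = s[0]; }         /* 0xxxxxxx */
    else if (s[0] < 0xC0) { return 0; }                   /* stray 10xxxxxx */
    else if (s[0] < 0xE0) { len = 2; cp = s[0] & 0x1F; }  /* 110xxxxx */
    else if (s[0] < 0xF0) { len = 3; cp = s[0] & 0x0F; }  /* 1110xxxx */
    else if (s[0] < 0xF8) { len = 4; cp = s[0] & 0x07; }  /* 11110xxx */
    else                  { return 0; }                   /* invalid lead byte */

    if (n < len)
        return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* continuation must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* The overlong test: a value that would have fit in a shorter
     * sequence is rejected with one comparison after decoding. */
    if (cp < min_for_len[len])
        return 0;

    /* Also reject UTF-16 surrogates and values past U+10FFFF. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

With this sketch, the two-byte sequence 0xC3 0xA9 decodes to U+00E9 and consumes 2 bytes, while 0xC0 0x80 and 0xE0 0x80 0xAF are both rejected as overlong.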

Have I left anything out?

Regards,

David

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From luser droog@21:1/5 to David Newall on Thu Mar 10 22:26:23 2022

On Sunday, March 6, 2022 at 6:55:09 AM UTC-6, David Newall wrote:

> Hello All,
>
> I've almost finished my UTF-8 decode and Unicode show work. Significant changes are:
>
> 1. unicodeshow is now ushow. The names all got a bit long.
> 2. The decoder has been rewritten based on code by Thompson and Pike in
> Plan 9. They wrote an incredibly clever test for overlong sequences.
> 3. Documentation. Yes, really! https://davidnewall.com/software/utf8show/PostScript%20UTF8%20Extension%20Reference.pdf
>
> Have I left anything out?
>
> Regards,
>
> David

Beautiful!

Found just a few typos:
p. 3, operator summary: kushow omits the map argument
p. 6: "it's CharProcs dictionary" has an errant apostrophe

possible improvement: Maybe add example code for the ushow details.
Sure, it's everywhere else. And you see it soon enough by scrolling either
up or down to the nearest example. But if you bee-line to the simple case,
you don't get the full view/narrow scope base case model all in a nutshell,
you know? Does that make sense?

possible improvement: show off more fancy characters in the examples?
This is friggin' brilliant. Crow it to the murder!

From Anthk@21:1/5 to David Newall on Tue Dec 13 20:51:12 2022

On 2022-03-06, David Newall <davidn@davidnewall.com> wrote:

> Hello All,
>
> I've almost finished my UTF-8 decode and Unicode show work. Significant changes are:
>
> 1. unicodeshow is now ushow. The names all got a bit long.
> 2. The decoder has been rewritten based on code by Thompson and Pike in
> Plan 9. They wrote an incredibly clever test for overlong sequences.
> 3. Documentation. Yes, really! https://davidnewall.com/software/utf8show/PostScript%20UTF8%20Extension%20Reference.pdf
>
> Have I left anything out?
>
> Regards,
>
> David

I couldn't get your examples running in Ghostscript:

## code
openbsd$ gs utf8test.ps
GPL Ghostscript 9.56.1 (2022-04-04)
Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Error: /stackunderflow in --def--
Operand stack:
Map
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1
3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
Dictionary stack:
--dict:761/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)--
Current allocation mode is local
Current file position is 561

## eof

Any tips on using it? For instance, an example of how to
integrate your code as a "library"?

From David Newall@21:1/5 to Ander GM on Thu Dec 29 17:49:29 2022

On 14/12/22 07:51, Anthk wrote:

> On 2022-03-06, David Newall <davidn@davidnewall.com> wrote:
>
>> I've almost finished my UTF-8 decode and Unicode show work
>
> I couldn't get your examples running in Ghostscript:
>
> ## code
> openbsd$ gs utf8test.ps
> [...]
> Any tips on using it, for instance, with an example how to
> integrate your code as a "library"?

You'll see some document manager directives in that file:

%%IncludeResource: procset utf8decode
%%IncludeResource: procset unicodeshow
%%IncludeResource: font unifontmedium
%%IncludeResource: encoding unifontmedium

The first two, utf8decode and unicodeshow, are in the same directory.
The last two define unifontmedium, a GNU font available at
https://fontlibrary.org/en/font/gnu-unifont. The encoding was generated
from that font using FontForge.

Put the unifontmedium files in the correct place (or adjust the PostScript
in utf8test) and pass the other files to GS before utf8test:

$ gs -DNOSAFER utf8decode unicodeshow unifontmedium.t42 utf8test.ps

From Richard L. Hamilton@21:1/5 to David Newall on Thu Dec 29 11:46:54 2022

In article <6224af22$1@news.ausics.net>,
David Newall <davidn@davidnewall.com> writes:
[...]

> 2. The decoder has been rewritten based on code by Thompson and Pike in
> Plan 9. They wrote an incredibly clever test for overlong sequences.

Seeing as how they invented UTF-8 encoding, I'm not surprised they had some elegant ideas for how to work with it. :-)

http://doc.cat-v.org/bell_labs/utf-8_history

--

get |fortune

377 I/O error: smart remark generator failed

Lasik/PRK theme music:
"In the Hall of the Mountain King", from "Peer Gynt"
(read act 2, scene 6 of the play if that doesn't make sense)

