• Documentation for utf8decode and ushow (née unicodeshow)

    From David Newall@21:1/5 to All on Sun Mar 6 23:54:58 2022
    Hello All,

    I've almost finished my UTF-8 decode and Unicode show work. Significant changes are:

    1. unicodeshow is now ushow. The names all got a bit long.
    2. The decoder has been rewritten based on code by Thompson and Pike in
    Plan 9. They wrote an incredibly clever test for overlong sequences.
    3. Documentation. Yes, really! https://davidnewall.com/software/utf8show/PostScript%20UTF8%20Extension%20Reference.pdf
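
    For readers who have not met overlong sequences: a code point must use
    the shortest encoding that can hold it, so a decoder has to reject, for
    example, C0 80 as a two-byte spelling of U+0000. The sketch below shows
    one common way to test for that in C: decode first, then compare the
    result against the largest value the next-shorter form could hold. It is
    an illustration only, not the Plan 9 source and not the code in the
    utf8decode procset; the function name utf8_decode_one and the max_shorter
    table are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from Plan 9 or utf8decode): decode one UTF-8
 * sequence from s[0..n). Returns the number of bytes consumed (1 to 4) and
 * stores the code point in *out, or returns 0 if the input is empty,
 * truncated, malformed, overlong, a surrogate, or above U+10FFFF.
 */
int utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    /* max_shorter[len] is the largest code point that fits in len-1 bytes. */
    static const uint32_t max_shorter[] = { 0, 0, 0x7F, 0x7FF, 0xFFFF };
    uint32_t cp;
    int len, i;

    if (n == 0)
        return 0;

    if (s[0] < 0x80) {                      /* 0xxxxxxx: plain ASCII */
        *out = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx: 2-byte form */
        len = 2; cp = s[0] & 0x1F;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx: 3-byte form */
        len = 3; cp = s[0] & 0x0F;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx: 4-byte form */
        len = 4; cp = s[0] & 0x07;
    } else {
        return 0;                           /* stray continuation or bad lead */
    }

    if (n < (size_t)len)
        return 0;                           /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a 10xxxxxx continuation */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Overlong test: the value would have fitted in a shorter form. */
    if (cp <= max_shorter[len])
        return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                           /* UTF-16 surrogates are invalid */
    if (cp > 0x10FFFF)
        return 0;                           /* beyond the Unicode range */

    *out = cp;
    return len;
}
```

    A caller would loop over a buffer, advancing by the return value and
    treating 0 as a decoding error. Doing the range comparison after
    decoding keeps the overlong check to a single test per sequence
    instead of special-casing individual lead bytes.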

    Have I left anything out?

    Regards,

    David

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From luser droog@21:1/5 to David Newall on Thu Mar 10 22:26:23 2022
    Beautiful!

    Found just a few typos:
    p. 3 operator summary: kushow omits map argument
    p.6 "it's CharProcs dictionary" errant apostrophe

    possible improvement: Maybe add example code for the ushow details.
    Sure, it's everywhere else. And you see it soon enough by scrolling either
    up or down to the nearest example. But if you bee-line to the simple case,
    you don't get the full view/narrow scope base case model all in a nutshell,
    you know? Does that make sense?

    possible improvement: show off more fancy characters in the examples?
    This is friggin' brilliant. Crow it to the murder!

  • From Anthk@21:1/5 to David Newall on Tue Dec 13 20:51:12 2022
    I couldn't get your examples running in Ghostscript:

    ## code
    openbsd$ gs utf8test.ps
    GPL Ghostscript 9.56.1 (2022-04-04)
    Copyright (C) 2022 Artifex Software, Inc. All rights reserved.
    This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
    see the file COPYING for details.
    Error: /stackunderflow in --def--
    Operand stack:
    Map
    Execution stack:
    %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1
    3 %oparray_pop 1833 1 3 %oparray_pop --nostringval-- %errorexec_pop .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
    Dictionary stack:
    --dict:761/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)--
    Current allocation mode is local
    Current file position is 561

    ## eof

    Any tips on using it, for instance, with an example how to
    integrate your code as a "library"?

  • From David Newall@21:1/5 to Ander GM on Thu Dec 29 17:49:29 2022
    On 14/12/22 07:51, Anthk wrote:
    > On 2022-03-06, David Newall <davidn@davidnewall.com> wrote:
    >> I've almost finished my UTF-8 decode and Unicode show work
    >
    > I couldn't get your examples running in Ghostscript:
    >
    > ## code
    > openbsd$ gs utf8test.ps
    > [...]
    > Any tips on using it, for instance, with an example how to
    > integrate your code as a "library"?

    You'll see some document manager directives in that file:

    %%IncludeResource: procset utf8decode
    %%IncludeResource: procset unicodeshow
    %%IncludeResource: font unifontmedium
    %%IncludeResource: encoding unifontmedium

    The first two, utf8decode and unicodeshow, are in the same directory.
    The last two define unifontmedium, a GNU font available at
    https://fontlibrary.org/en/font/gnu-unifont. The encoding was generated
    from that font using fontforge.

    Put all the unifontmedium files in the correct place (or adjust the
    PostScript in utf8test) and pass the other files to GS before utf8test:

    $ gs -DNOSAFER utf8decode unicodeshow unifontmedium.t42 utf8test.ps

  • From Richard L. Hamilton@21:1/5 to David Newall on Thu Dec 29 11:46:54 2022
    In article <6224af22$1@news.ausics.net>,
    David Newall <davidn@davidnewall.com> writes:
    > [...]
    > 2. The decoder has been rewritten based on code by Thompson and Pike in
    > Plan 9. They wrote an incredibly clever test for overlong sequences.

    Seeing as how they invented UTF-8 encoding, I'm not surprised they had some elegant ideas for how to work with it. :-)

    http://doc.cat-v.org/bell_labs/utf-8_history

    --
    get |fortune
    377 I/O error: smart remark generator failed

    Lasik/PRK theme music:
    "In the Hall of the Mountain King", from "Peer Gynt"
    (read act 2, scene 6 of the play if that doesn't make sense)
