Hello All,
I've written some PostScript to allow me to print UTF8-encoded
strings:
% print a single unicode codepoint: [...]
% integer unicodeshow -
/unicodeshow {
/utfshow { [...]
  UTF8_ACCEPT 0 UTF8_ACCEPT  % prev codep current
  4 -1 roll {
    decode
    dup UTF8_ACCEPT eq { 1 index unicodeshow } if
David Newall <davidn@davidnewall.com>:
I've written some PostScript to allow me to print UTF8-encoded
strings:
This is great!
Doesn't "x glyphshow y glyphshow" lose the kerning between x and y?
(I'm not really sure)
I've written some PostScript to allow me to print UTF8-encoded strings
...
I also use a table which Adobe published ("UNICODE translation table for non-ASCII characters"), which they say is for going from a glyph name to
a Unicode codepoint. I (ab)use it in the reverse direction. I turned
it into a dictionary keyed on the codepoint.
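The inversion can be sketched like this; the /uni2glyph name and the three entries are illustrative only, standing in for the few thousand pairs in Adobe's published table:

```postscript
% sketch: the inverted Adobe table, keyed on codepoint
/uni2glyph 3 dict dup begin
  16#00E9 /eacute def       % LATIN SMALL LETTER E WITH ACUTE
  16#00FC /udieresis def    % LATIN SMALL LETTER U WITH DIAERESIS
  16#2026 /ellipsis def     % HORIZONTAL ELLIPSIS
end def
% uni2glyph 16#00E9 get glyphshow   % draws the eacute glyph
```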
Hi All,
I'm soliciting opinions...
On 21/1/22 9:56 pm, David Newall wrote:
I've written some PostScript to allow me to print UTF8-encoded
strings ...
I also use a table which Adobe published ("UNICODE translation
table for non-ASCII characters"), which they say is for going from
a glyph name to a Unicode codepoint. I (ab)use it in the reverse
direction. I turned it into a dictionary keyed on the codepoint.

Many (most?) fonts have glyphs which aren't in Adobe's table, or which
are named differently. Fontforge can write a table of glyphs in a
font and their corresponding codepoints. Using that table,
unicodeshow looks more like this:
% lookup a unicode codepoint (int) in a list of known glyphs (dict)
% and display the glyph found.
% dict int unicodeshow -
/unicodeshow {
2 copy known { get } { pop pop /.notdef } ifelse glyphshow
} bind def
While this looks much neater, it requires pre-generating a dictionary
for each font used.
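For illustration, usage would look something like this; /SomeFont and the SomeFontGlyphs dictionary are made-up names standing in for a real font and its pre-generated Fontforge table:

```postscript
% hypothetical usage of the dict-based unicodeshow
/SomeFont findfont 12 scalefont setfont
72 720 moveto
SomeFontGlyphs 16#00E9 unicodeshow   % eacute, or .notdef if absent
```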
I can't decide which approach is better.
I'm not delighted by needing to add a dictionary that's specific to
the current font to utfshow and unicodeshow because it feels wrong.
Opinions? Would adding to a font dictionary break things?
(I'm looking at you, Acrobat and Distiller.)
On 21/1/22 9:56 pm, David Newall wrote:
I've written some PostScript to allow me to print UTF8-encoded
strings
There was an error in unicodeshow. I wasn't attempting /uniXXXX for codepoints that weren't in Adobe's table.
Apparently it's also not uncommon to use /uXXXX through /uXXXXXX (4
to 6 hex digits), so I check for those, too.
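The /uniXXXX names can be formed on the fly. This is only a sketch, and it handles just the fixed four-digit form (codepoints up to U+FFFF); the /uXXXX through /uXXXXXX variants would need a similar routine without the fixed padding:

```postscript
% integer uniname name
% Form a glyph name of the shape /uniXXXX (zero-padded, uppercase hex).
/uniname {
  16 4 string cvrs                  % hex digits, 1-4 chars, uppercase
  (uni0000) dup length string copy  % fresh 7-byte buffer
  dup 7 3 index length sub          % offset = 7 - (number of digits)
  4 -1 roll putinterval             % overwrite the trailing zeros
  cvn
} bind def
% e.g. 16#41 uniname  ->  /uni0041
```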
Opinions? Would adding to a font dictionary break things?
(I'm looking at you, Acrobat and Distiller.)
Regards,
David
On Sun, 23 Jan 2022 14:10:12 +1100,
David Newall <dav...@davidnewall.com> wrote:
[...]
Opinions? Would adding to a font dictionary break things?
(I'm looking at you, Acrobat and Distiller.)

Don't know about that, I only use Ghostscript. But if the reason to add
a lookup is speed, a possible optimization could be not to call
unicodeshow on each codepoint, but identify string intervals where all
bytes are either <= 127 or > 127. Call show on the former, and utfshow
on the latter.
C.
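C.'s run-splitting suggestion could be sketched as below. It assumes the utfshow from earlier in the thread, and relies on the fact that every byte of a multi-byte UTF-8 sequence is > 127, so high-byte runs are always whole sequences:

```postscript
% string runsplit -
% Walk a string, drawing maximal ASCII runs with show and maximal
% high-byte (UTF-8) runs with utfshow.
/runsplit {
  4 dict begin
  /s exch def
  /i 0 def
  {
    i s length ge { exit } if
    /ascii s i get 128 lt def       % classification of the run at i
    /j i def
    {                               % advance j to the end of the run
      j s length ge { exit } if
      s j get 128 lt ascii ne { exit } if
      /j j 1 add def
    } loop
    s i j i sub getinterval         % the maximal run
    ascii { show } { utfshow } ifelse
    /i j def
  } loop
  end
} bind def
```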
On Saturday, January 22, 2022 at 9:10:23 PM UTC-6, David Newall wrote:
Opinions? Would adding to a font dictionary going to break things?
(I'm looking at you, Acrobat and Distiller.)
I don't see how that could be a problem unless the additions conflict
with existing names. It's possible that findfont will give you a dictionary without write access. But you could copy everything into a new dictionary
and then call `definefont` on that and you should be good to go.
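The copy-and-redefine step is the classic font-copying idiom (the same one the tutorials use for re-encoding); /Helvetica-UTF8 and the /GlyphTable entry are made-up names for illustration:

```postscript
% sketch: copy the font dict (all entries except FID, which belongs
% to the interpreter) into a writable dict, add what we need, and
% register the result under a new name
/Helvetica findfont
dup length 1 add dict begin
  { 1 index /FID ne { def } { pop pop } ifelse } forall
  % extra entries go here, e.g.
  % /GlyphTable myCodepointToGlyphDict def
currentdict end
/Helvetica-UTF8 exch definefont pop
```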
Adobe's table (or one similar to it) is included in Ghostscript (AdobeGlyphList), and maybe other interpreters, too.
If you know you are dealing with modern fonts that include the uni/u
aliases, you can get rid of the Adobe table lookup altogether... You
don't need the canonical glyph names for those fonts.
I think if a font has a mapping between unicode points and glyphs that
you can extract (with Fontforge or whatever), then it surely also has
uni/u aliases. The Adobe table is for older fonts that don't have them,
so it's the only lookup table you need.
I'm not delighted by needing to add a dictionary that's specific to
the current font to utfshow and unicodeshow because it feels wrong.
Also, having to pre-process the files to insert the tables is not good.
a possible optimization could be not to call
unicodeshow on each codepoint, but identify string intervals where all
bytes are either <= 127 or > 127. Call show on the former, and utfshow
on the latter.
No font is guaranteed to use any of these names and many fonts that
I've examined use different names for unicode values (and different
values for some names.)
If you know you are dealing with modern fonts that include the uni/u aliases, you can get rid of the Adobe table lookup altogether... You
don't need the canonical glyph names for those fonts.
No font that I've examined includes uni/u names for every glyph, or
even for most glyphs.
One can't rely on any pre-determined glyph name, nor any
pre-determined lookup table. What a mess.
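Given that, probing the font at run time may be the only safe option. A sketch, valid only for Type 1 fonts (other font types keep their glyph data elsewhere, hence the fallback):

```postscript
% name hasglyph bool
% True if the current font's CharStrings dictionary defines the name.
/hasglyph {
  currentfont /CharStrings known
  { currentfont /CharStrings get exch known }
  { pop false } ifelse
} bind def
% e.g. /uni0041 hasglyph  ->  true or false, depending on the font
```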
After your comment about older fonts, I examined Courier, a Type 1
font (https://web.archive.org/web/20010617080950/http://www.ctan.org/tex-archive/fonts/psfonts/courier/).
The CharStrings dictionary breaks my assumptions and my code completely
fails.
After your comment about older fonts, I examined Courier, a Type 1
font
(https://web.archive.org/web/20010617080950/http://www.ctan.org/tex-archive/fonts/psfonts/courier/).
The CharStrings dictionary breaks my assumptions and my code completely
fails.

What assumptions?