How does this look to you? I am testing a da_DK.utf8 MSG
januar februar marts
ma ti on to fr lø sø ma ti on to fr lø sø ma ti on to fr lø
didn't agree with the chr.codes I had
and the slashed capital O is to be found among the graphical chrs
of PC8 as code 216. This is often used as a replacement for zero.
didn't agree with the chr.codes I had
That is because it cannot be mapped out to PC8. latin1 will work
and will be mapped out to 0xf8 (dec 248 if you prefer).
Near as I can figure dec 148 in PC8 would be the "LATIN SMALL LETTER
O WITH DIAERESIS" which in latin1 is dec 246 or the ö character in
utf8.
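For anyone who wants to check those numbers, here is a minimal sketch in Python (my choice of language, nobody in this thread is using it). It assumes Python's built-in 'cp437' codec is a fair stand-in for PC8, which is only speculation at this point in the thread. The helper name can_encode is made up for illustration.

```python
# Assumption: Python's 'cp437' codec stands in for PC8.

def can_encode(ch, codec):
    """Return True if ch has a code in the given 8-bit encoding."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# dec 148 in CP437 is the o with two dots on top
cp437_148 = bytes([148]).decode("cp437")
print(cp437_148, ord(cp437_148))        # the same letter is dec 246 in latin1

# the small slashed o has a code in latin1 but none in CP437
slashed_o_latin1 = "\u00f8".encode("latin-1")
print(slashed_o_latin1.hex(), slashed_o_latin1[0])
print(can_encode("\u00f8", "cp437"))
```

If the codec really does match PC8, this agrees with the figures above: 148 is the o with diaeresis, and the slashed o only maps in latin1.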
and the slashed capital O is to be found among the graphical chrs
of PC8 as code 216.
I'll have to take your word for that since I have never found a map
for PC8. I have seen speculation that it is the same as CP437.
Is it?
'...' En Møøse hade en gång min syster ...
What is this .................^^ in Latin 1?
However, if diaeresis is the same as the 'divide' sign
In Latin 1 it's represented by chr code D8
In Latin 1 it's represented by chr code D8 or dec.216 which
happens to be the same as in CP 437.
"IBM OS/2 Warp 4" "Keyboards and Code Pages"
'...' En Møøse hade en gång min syster ...
What is this .................^^ in Latin 1?
Near as I can figure dec 148 in PC8 would be the "LATIN SMALL LETTER
O WITH DIAERESIS" which in latin1 is dec 246 or the ö character in
utf8.
The expression 'diaeresis' doesn't exist in my vocabulary or dictionary. However, if diaeresis is the same as the 'divide' sign on the numeric keyboard I agree. That comes out as the Umlaut 'o' in when translated
from Latin 1.
However, if diaeresis is the same as the 'divide' sign
It is the 'o' character with two dots on top. The 'o' character
with the 'divide' sign - I call it the slashed 'o' which hardcore
encoding gurus call 'LATIN SMALL LETTER O WITH STROKE' - ....
In Latin 1 it's represented by chr code D8
That is 'LATIN CAPITAL LETTER O WITH STROKE' and also doesn't exist
in CP437.
In Latin 1 it's represented by chr code D8 or dec.216 which
happens to be the same as in CP 437.
No it isn't. According to
https://en.wikipedia.org/wiki/Code_page_437 D8 or dec.216 is a line
drawing character and in latin1 it is 'LATIN CAPITAL LETTER O WITH
STROKE' or character 'Ø' in utf8.
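The D8 disagreement is easy to see by decoding the same byte both ways. This is only a sketch, assuming Python's 'cp437' and 'latin-1' codecs reflect the code pages being discussed; the variable names are mine.

```python
import unicodedata

raw = bytes([0xD8])                      # dec 216

d8_cp437 = raw.decode("cp437")
d8_latin1 = raw.decode("latin-1")
d8_cp437_name = unicodedata.name(d8_cp437)
d8_latin1_name = unicodedata.name(d8_latin1)

print(f"cp437   0xD8 -> U+{ord(d8_cp437):04X} {d8_cp437_name}")
print(f"latin-1 0xD8 -> U+{ord(d8_latin1):04X} {d8_latin1_name}")
```

One byte value, two different characters: a box drawing character in CP437 and the capital slashed O in latin1.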
"IBM OS/2 Warp 4" "Keyboards and Code Pages"
I found a pdf online entitled "OS/2 Warp Server for e-business,
Keyboards and Codepages" and do not see PC8 listed in there.
It does have 'Codepage 437' and 'Codepage 819 - ISO 8859-1' and
comparing them shows the same results I have stated above.
the degree sign in code pages 437, 850 and in 819 as B0 dec.176.
'...' En Møøse hade en gång min syster ...
What is this .................^^ in Latin 1?
F8 or dec.248 (not a character in CP437). Yes it is and represents
..... the second and third characters in Møøse,
and E5 or dec.229 (86 or dec.134 in CP437) for the second character
in gång.
"LATIN SMALL LETTER A WITH RING ABOVE" which I believe in Swedish is
called the small letter angstrom. Please correct me if I am wrong.
The expression 'diaeresis' doesn't exist in my vocabulary or dictionary.
However, if diaeresis is the same as the 'divide' sign on the numeric keyboard I agree. That comes out as the Umlaut 'o' in when translated
from Latin 1.
https://www.google.com/search?q="O+WITH+DIAERESIS"
looking at the above, one can see that "diaeresis" is "two dots on
top"...
the O or o with the forward slash like the divided-by symbol is its
own separate vowel character/letter in Scandinavian...
diaeresis and umlaut look the same (two dots on top) but they
signify different pronunciations...
"The diaeresis and the umlaut are diacritics marking two distinct
phonological phenomena. The diaeresis represents the phenomenon
also known as diaeresis or hiatus in which a vowel letter is
pronounced separately from an adjacent vowel and not as part of a
digraph or diphthong. The umlaut (/ˈʊmlaʊt/), in contrast,
indicates a sound shift. These two diacritics originated separately;
the diaeresis is considerably older."
in unicode, both are coded the same so something like HTML ä is
both a-umlaut and a-diaeresis in the same way that the hyphen and
minus are represented by the same character glyph...
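The point about unicode not distinguishing the two can be checked against the character database that ships with Python. A small sketch, with variable names of my own choosing:

```python
import html
import unicodedata

a_dots = html.unescape("&auml;")                    # HTML names the entity after the umlaut
official_name = unicodedata.name(a_dots)            # Unicode names it after the diaeresis
decomposed = unicodedata.normalize("NFD", a_dots)   # base letter plus combining mark
mark_name = unicodedata.name(decomposed[1])
hyphen_name = unicodedata.name("-")                 # the ASCII hyphen/minus key

print(f"U+{ord(a_dots):04X}", official_name)
print([f"U+{ord(c):04X}" for c in decomposed], mark_name)
print(hyphen_name)
```

There is one code point for the letter and one combining mark for the two dots, whichever name the language in question gives them.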
OK, the divide sign on the numerical keypad is a dash with dots
above and below the dash.
Thanks for that 'Ø' addition to my UTF conversion table.
"LATIN SMALL LETTER A WITH RING ABOVE" which I believe in Swedish is called the small letter angstrom. Please correct me if I am wrong.
Correct, but so far I can't recall having seen that letter in a
Danish text, but I may be wrong. Let's hear what Benny says <BG>.
However, if diaeresis is the same as the 'divide' sign
OK, the divide sign on the numerical keypad is a dash with dots above
and below the dash.
On 2019 Feb 23 12:33:00, you wrote to Maurice Kinal:
OK, the divide sign on the numerical keypad is a dash with dots above
and below the dash.
not on my keyboard... it is the "/" character...
OK, the divide sign on the numerical keypad is a dash with dots
above and below the dash.
On my keyboard it is the '/' character but I've seen some keyboards
that use that divide sign. I see it as F6 or dec.246 in CP850.
That translates to the '÷' character in utf8 - usually written as
U+00F7 or \u00f7 in bash.
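The same lookup can be sketched in Python, assuming its 'cp850' and 'cp437' codecs match the code pages meant here:

```python
divide = bytes([0xF6]).decode("cp850")      # dec 246 in CP850
code_point = f"U+{ord(divide):04X}"
same_in_cp437 = bytes([0xF6]).decode("cp437") == divide
latin1_f6 = bytes([0xF6]).decode("latin-1")  # the same byte read as latin1

print(divide, code_point, same_in_cp437)
print(latin1_f6)
```

Note that the same byte read as latin1 is the o with two dots, which may explain some of the earlier confusion between the two.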
other than the 'Møøse' part which is neither Swedish nor Dansk. It
is a bogus word for moose which requires the Norwegian slashed small
'o' characters to enhance the taglines. That will always be the
same no matter what language. For example in German it would be
"Ein Møøse hat meine Schwester einmal gebissen ..."
So the small angstrom is in the tagline below simply because I am
replying to you .....
That translates to the '÷' character in utf8 - usually written
as U+00F7 or \u00f7 in bash.
This is also a new sign for my UTF conversion whatever use I may
have.
If I translate that German line to Swedish, Norwegian or Danish
Note that I am replying on a different machine this time since I am
in the middle of a major overhaul on the raspi3b+ which will take at
least another 32 hours.
That translates to the '÷' character in utf8 - usually written
as U+00F7
An excellent online source for utf8 characters is http://www.utf8-chartable.de/
up in hex editors to the corresponding utf8 characters.
For example U+00F7 will show up as a hex 'c3 b7' pair
whereas the small slashed 'o' characters in Møøse show up as
'c3 b8' hex pairs.
-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' starts }=-
÷ ÷
-={ '<Esc>:read !echo -e "\u00f7 \xc3\xb7"' ends }=-
Imagine that!!! It works!!!
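The hex pairs a hex editor would show can also be produced without an editor. A minimal sketch in Python; the helper name utf8_hex is made up for this example:

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode("utf-8"))

print(utf8_hex("\u00f7"))               # the divide sign
print(utf8_hex("\u00f8"))               # the small slashed o
print(utf8_hex("M\u00f8\u00f8se"))      # the whole word, ASCII letters stay one byte

# going the other way, the byte pair decodes back to the same character
round_trip = b"\xc3\xb7".decode("utf-8") == "\u00f7"
print(round_trip)
```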
If I translate that German line to Swedish, Norwegian or Danish
-={ '<Esc>:read !trans -b -no-ansi -s swedish -t english "En Møøse
If you have a better Swedish translation I would like to see it but
the Møøse stays, no matter what my sister or anyone
That character pair I see as C3B7.
Thanks for that URL.
Well I can't give you a better translation than the ones I gave.
They are solely based on your German tag line.
That character pair I see as C3B7.
As it should be in a non-utf8 environment. Also in most European
languages, the first byte of the pair will be C3.
Yes, that is true for letters but for various other characters
the first byte is DA.
Yes, that is true for letters but for various other characters
the first byte is DA.
The only characters that are prefixed (start) with DA, range from
U+0680 (DA80) to U+06BF (DABF) and are all Arabic characters
(letters).
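That range can be worked out by pairing the leading byte with every possible trailing byte (80 to BF) and decoding the result. A rough sketch in Python; range_for_lead is a name invented for this example and it only handles two-byte leads:

```python
def range_for_lead(lead):
    """Lowest and highest code point reachable with a two-byte lead byte."""
    chars = [bytes([lead, trail]).decode("utf-8") for trail in range(0x80, 0xC0)]
    return ord(chars[0]), ord(chars[-1])

for lead in (0xC2, 0xC3, 0xDA):
    low, high = range_for_lead(lead)
    print(f"{lead:02X}: U+{low:04X} to U+{high:04X}")
```

The C3 line covers U+00C0 to U+00FF, which is where the accented latin1 letters live, so it is no surprise that C3 turns up so often in European text.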
... Gråt inte för mig jag har vi.
I was looking for the hyphen and citation characters and the
euro sign.
With decimal interpretation I get the citation mark as 218 128
157.
I was looking for the hyphen and citation characters and the
euro sign.
The unicode for it is "U+20AC" which takes 24 bits in utf8 and thus
will show up as three hex bytes in a hex editor; e2 82 ac
Does that help? I am not sure which 8 bit encoding has the euro
sign other than latin9 and there it is a4 which is dec 164.
It doesn't exist in either cp437 or cp850. Both the MS encodings
cp1250 and cp1252 show it as dec 128.
With decimal interpretation I get the citation mark as 218 128
157.
Converting e2 82 ac to decimal gives me 226 130 172.
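A sketch that collects the euro sign's code in each of the encodings mentioned, assuming Python's codec names 'iso8859-15' (latin9), 'cp1250', 'cp1252', 'cp437' and 'cp850' correspond to the encodings meant above. The helper code_in is hypothetical. The last line does the same for U+201D, the right double quotation mark, on the guess that this is the citation mark in question.

```python
EURO = "\u20ac"

def code_in(ch, codec):
    """Decimal byte values of ch in codec, or None if it has no code there."""
    try:
        return list(ch.encode(codec))
    except UnicodeEncodeError:
        return None

for codec in ("utf-8", "iso8859-15", "cp1250", "cp1252", "cp437", "cp850"):
    print(f"{codec:11} {code_in(EURO, codec)}")

# assumption: the citation mark is U+201D
print(code_in("\u201d", "utf-8"))
```

For U+201D the two trailing values come out as 128 157, the same as reported, with only the leading byte differing.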
OK, the code 218 128 162 that i interpreted as hyphen actually
is the longer 'dash'.
God natt min vän
OK, the code 218 128 162 that i interpreted as hyphen actually
is the longer 'dash'.
I am not sure what you mean but using 218 (DA) as the leading byte
means you are restricted to a 2 byte or 16 bit character and not a
24 bit character that is required for euro sign in utf8. The way
the leading byte works is like this;
dec 218 = bin 11011010
                ^
The first zero shows that there are two leading ones which means
there is only one trailing byte following.
So that means 218 128 is read as one character and the 162 is ignored.
For the utf8 euro character the prefix is;
dec 226 = bin 11100010
                 ^
and as you can see the first zero yields three leading ones which is
three bytes or 24 bits.
For the record 218 128 is U+0680 which we already know to be a 16
bit Arabic character.
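The leading-ones rule described above can be sketched in a few lines of Python. The function name sequence_length is invented here, and the sketch only looks at the leading byte; it makes no attempt to validate the trailing bytes.

```python
def sequence_length(lead):
    """Count the leading 1 bits of a UTF-8 lead byte to get the sequence length."""
    ones = 0
    mask = 0x80
    while lead & mask:
        ones += 1
        mask >>= 1
    if ones == 0:
        return 1                      # 0xxxxxxx is plain ASCII, one byte
    if ones == 1 or ones > 4:
        raise ValueError(f"{lead} is not a valid lead byte")
    return ones

for lead in (218, 226):
    print(lead, f"{lead:08b}", sequence_length(lead))

# 218 128 decodes on its own; a stray 162 after it is not part of the character
two_bytes = bytes([218, 128]).decode("utf-8")
with_stray = bytes([218, 128, 162]).decode("utf-8", errors="replace")
print(f"U+{ord(two_bytes):04X}", [f"U+{ord(c):04X}" for c in with_stray])
```

With errors="replace" Python substitutes U+FFFD for the leftover 162 rather than silently dropping it; a strict decode would raise an error instead.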
Thank you. Buenas noches mi amigo. :-)
God natt min vän
That may be all that is needed but if not, I can always include
the leading byte. Kind of cut and try <BG>.
I don't need more than 16-bit characters for that editor.
I knew that this letter should be the capital angstrom
'a' with two dots on top. To convert it I entered 195 184 to
translate it
How about something like this instead;

 86 | 134 = 00E5 | C3 A5 | 195 165
hex | dec   UTF8 |  hex  |   dec
The above matches the small angstrom.
Also I am using IBM437 for PC-8 and near as I can tell they match
perfectly but I'll let you be the judge.
As far as 24 bit characters those are mostly symbols and line drawing characters from what I see, and it looks like all the text characters
are 16 bit and the leading byte is C3 (195).
For the degree symbol found at the end of temperatures I get a 16 bit character except with a C2 (194) as the leading byte;
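Rows in that format could be generated rather than typed in by hand. A minimal sketch, assuming (as above) that IBM437 is the right map for PC-8 and that Python's 'cp437' codec implements it; table_row is a made-up helper name.

```python
def table_row(cp437_code):
    """One row: CP437 hex | dec = code point | UTF-8 hex | UTF-8 dec."""
    ch = bytes([cp437_code]).decode("cp437")
    utf8 = ch.encode("utf-8")
    hex_part = " ".join(f"{b:02X}" for b in utf8)
    dec_part = " ".join(str(b) for b in utf8)
    return f"{cp437_code:02X} | {cp437_code} = {ord(ch):04X} | {hex_part} | {dec_part}"

print(table_row(0x86))    # small a with ring above
print(table_row(0xF8))    # degree sign, leading byte C2 instead of C3
```

Looping table_row over range(0x80, 0x100) would give the whole upper half of the code page, line drawing characters included, and those are the ones that come out as three bytes.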
I don't need more than 16-bit characters for that editor.
Other than the occasional Euro sign I suspect so.