I'm posting this here, because this group seems to have fairly intelligent members who have more than basic knowledge on things computerish.
I'm working with LTspice and started using command scripts to generate measurement output files. But they display in my editor as all the characters being separated by nulls. On asking in the LTspice group about this, it seems that while most of the textual output files spit out something viewable in a text editor (namely UTF-8), this one file is generated in UTF-16!
The whole point of using the command script is to facilitate getting the results expediently, I now have to convert the durn files before I can usefully view them.
There's no facility to write anything into this file other than simple text. Given the other file formats this program generates are either UTF-8 or Western (ISO-8859-1), can anyone think of a reason why they would spit out UTF-16 for this one file format???
LTspice is free, but it's not cheap. Every time I use it, I run into problems like this, that waste my time trying to work around them. It's like the user interface was designed by asylum inmates, *for* asylum inmates.
I know there's no real fix for this. I'm not looking for ways to convert the file and I can't change LTspice. I'm mostly just venting my frustration for the last week of dealing with the poor documentation and the religious fanaticism of the support group.
On Saturday, 18 March 2023 at 19:09:59 UTC+1, Lorem Ipsum wrote:
> The whole point of using the command script is to facilitate getting the results expediently, I now have to convert the durn files before I can usefully view them.
FWIW the free Notepad++ text editor has a menu item Encoding for such conversions.
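For anyone who would rather script the conversion than click through an editor menu, here is a minimal Python sketch. It is not something LTspice provides. The helper names `decode_utf16` and `utf16_to_utf8` are made up for illustration, and treating a file without a byte-order mark as little-endian is an assumption. The last lines also show where the nulls in the editor come from: in UTF-16 every ASCII character occupies two bytes, one of which is zero.

```python
import codecs

def decode_utf16(raw: bytes) -> str:
    """Decode UTF-16 bytes to a Python string (hypothetical helper)."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the byte-order mark,
        # picks the matching byte order, and drops the mark.
        return raw.decode("utf-16")
    # Assumption: data without a byte-order mark is little-endian.
    return raw.decode("utf-16-le")

def utf16_to_utf8(raw: bytes) -> bytes:
    """Re-encode UTF-16 bytes as UTF-8."""
    return decode_utf16(raw).encode("utf-8")

# Where the nulls come from: each ASCII character becomes two bytes.
as_utf16 = "vout=1.2".encode("utf-16-le")
as_utf8 = utf16_to_utf8(as_utf16)
print(as_utf16)  # b'v\x00o\x00u\x00t\x00=\x001\x00.\x002\x00'
print(as_utf8)   # b'vout=1.2'
```

For a real file, the bytes would come from something like `Path("results.log").read_bytes()`; that file name is only a placeholder.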
> ...
> I know there's no real fix for this. I'm not looking for ways to convert the file and I can't change LTspice. I'm mostly just venting my frustration for the last week of dealing with the poor documentation and the religious fanaticism of the support group.
> The whole point of using the command script is to facilitate getting the results expediently, I now have to convert the durn files before I can usefully view them.
Googling around I've found this thread:
https://www.quora.com/When-should-UTF-16-encoding-be-preferred-over-UTF-8
To me the conclusion is:
„UTF-16 should only be used for interoperability with existing APIs that are incompatible with UTF-8. Absent such requirements, UTF-8 should be preferred to UTF-16. UTF-8 has a few clear advantages over UTF-16, such as:
* compatibility with ASCII
* self-synchronizing property
* endianness-independence
On the other hand, UTF-16 has zero clear advantages over UTF-8. While UTF-16 does take up less space than UTF-8 for some Asian languages, you can always just compress the UTF-8 encoding. The case for using UTF-8 everywhere is so compelling that this should be considered a minor inconvenience.”
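The three properties listed in the quote can be checked in a few lines. This is only an illustrative Python sketch. The helper `char_start` and the sample strings are made up for this example and are not from the Quora thread.

```python
# 1. ASCII compatibility: pure ASCII text is byte-for-byte identical in UTF-8.
ascii_text = "LTspice"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# 2. Endianness: UTF-16 comes in two byte orders, UTF-8 has a single form.
le = "A".encode("utf-16-le")  # b'A\x00'
be = "A".encode("utf-16-be")  # b'\x00A'

# 3. Self-synchronization: UTF-8 continuation bytes always look like
#    10xxxxxx, so from any byte offset one can back up to the start
#    of the character that contains it.
def char_start(buf: bytes, i: int) -> int:
    """Return the offset of the first byte of the character covering buf[i]."""
    while i > 0 and (buf[i] & 0xC0) == 0x80:
        i -= 1
    return i

# 'k', GREEK CAPITAL OMEGA (2 bytes), space, EURO SIGN (3 bytes)
encoded = "k\u03a9 \u20ac".encode("utf-8")
print(same_bytes, le, be)
print(char_start(encoded, 6))  # 4: the euro sign starts at byte offset 4
```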
Yes, I can convert the file many ways. But that is a silly step. Here's a file you can't use, but you can use this other program to convert it to a format that works for.
On Saturday, March 18, 2023 at 1:35:16 PM UTC-5, Lorem Ipsum wrote:
> Yes, I can convert the file many ways. But that is a silly step. Here's a file you can't use, but you can use this other program to convert it to a format that works for.
UTF was a new toy, just as color was decades ago, when one could go
to an office and see women who changed their display background
to magenta and printed many color memos, so that 100 dollar ink jets
replaced 2 dollar ribbons.
Like the 5 year old who, after getting into her mother's makeup, stands
in front of a mirror and proudly gazes at her new visage: heavily
powdered face and rouged cheeks with smeared crimson lips and eyes
darkened almost black.
On Sunday, March 19, 2023 at 12:17:19 PM UTC-4, Zbig wrote:
> The case for using UTF-8 everywhere is so compelling that this should be considered a minor inconvenience.
Meanwhile, in the LTspice group, I'm being labeled a troll for talking about this.
I get that various groups have a common interest and may not be very interested in hearing about issues with a tool. But the LTspice group seems to really come down on people for even mentioning that problems exist.
People don't have that shortcoming here. They mostly just come down on people for not much at all. lol
> On the other hand, UTF-16 has zero clear advantages over UTF-8.
UTF-16 has an advantage that seeking to a specific
character offset is O(1) whereas it's O(n) for UTF-8.
Likewise seeking backwards through a string is easier for UTF-16.
Ron AARON <clf@8th-dev.com> writes:
> UTF-16 has an advantage that seeking to a specific
> character offset is O(1) whereas it's O(n) for UTF-8.
Wrong. Even seeking to a specific code point offset is O(n) for
UTF-16. Even UTF-32 does not give us O(1) character seeking, because
a character can be composed of several code points; UTF-32 does give
us O(1) code-point seeking, but why would one want that?
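To make the distinction between code units, code points and characters concrete, here is a small Python sketch. The helper names `utf16_units` and `unit_index_of_code_point` are invented for the illustration. It shows that finding the n-th code point in UTF-16 needs a scan because of surrogate pairs, and that one visible character can consist of several code points.

```python
import struct
import unicodedata

def utf16_units(s: str) -> list[int]:
    """Return the 16-bit code units of s in UTF-16."""
    raw = s.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

def unit_index_of_code_point(units: list[int], n: int) -> int:
    """Find the code-unit index of the n-th code point by scanning: O(n)."""
    i = 0
    for _ in range(n):
        # A high surrogate (0xD800..0xDBFF) starts a two-unit pair.
        i += 2 if 0xD800 <= units[i] <= 0xDBFF else 1
    return i

s = "a\U0001F4A9b"                 # the middle code point lies outside the BMP
units = utf16_units(s)             # 4 code units for 3 code points
b_index = unit_index_of_code_point(units, 2)
print([hex(u) for u in units], b_index)

# One user-perceived character, two code points (e + combining acute accent).
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))
```

Even in UTF-32, `decomposed` would occupy two fixed-width slots, which is the point about characters versus code points made above.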
In article <2023Mar19.190719@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> So for this Taiwanese web page UTF-16 is *bigger* by a factor 1.89
> than UTF-8.
Viewing the ridiculous waste of website bandwidth for pictures,
I think size is hardly relevant.
Working with D1 I come across source files with comments in Chinese. I
can decipher them with my Youdao pen (or Google) and I prefer this
situation over no comments.
While at the moment English is the "lingua franca" of the Internet
and science, Chinese will become more important.
Zbig <zbigniew2011@gmail.com> writes:
> While UTF-16 does take up less space than UTF-8 for some Asian languages
Often claimed, but often not true. E.g., consider the web page
https://ctee.com.tw/news/tech/823656.html
This is encoded in UTF-8. Let's see how big it would be in UTF-16:
  wget https://ctee.com.tw/news/tech/823656.html
  recode utf8..utf16 <823656.html >823656-utf16.html
  ls -l 823656*
This shows:
  -rw-r--r-- 1 anton users 175148 Mar 19 19:06 823656-utf16.html
  -rw-r--r-- 1 anton users  92601 Mar 19 19:05 823656.html
So for this Taiwanese web page UTF-16 is *bigger* by a factor 1.89
than UTF-8.
- anton
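The effect measured above can be reproduced on a toy scale without downloading anything. This is a hedged Python sketch, not the measurement from the post. The helper `encoded_sizes` and the tiny sample strings are made up, and real pages will give different ratios. Bare CJK text is smaller in UTF-16, but as soon as ASCII markup surrounds it the balance tips the other way.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 bytes, UTF-16 bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

cjk = "\u4e2d\u6587"           # two Chinese characters, 3 bytes each in UTF-8
markup = "<p>" + cjk + "</p>"  # the same text wrapped in ASCII markup

print(encoded_sizes(cjk))      # (6, 4): UTF-16 is smaller for bare CJK text
print(encoded_sizes(markup))   # (13, 18): UTF-16 is bigger once markup is added
```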
> While at the moment English is the "lingua franca" of the Internet
> and science, Chinese will become more important.
Back to ideograms?
What can't be done in 7-bit ASCII isn't worth doing. Less is Moore.
dxforth <dxforth@gmail.com> writes:
> What can't be done in 7-bit ASCII isn't worth doing. Less is Moore.
Touché.
On Monday, March 20, 2023 at 7:11:08 AM UTC-5, dxforth wrote:
> Back to ideograms?
Pure ideograms are very nice in an environment comprised of many
dialects. Knowing the ideograms, one sounds in his mind his
own dialect without translation to disturb the mind's harmony.
But Chinese got corrupted long ago when some progressive
"improved" it by adding phoneme elements to characters and
the LC (language committee) got carried away and produced thousands
of characters providing job security for the scribes.
Romaji works in Japan, so I would assume it should work in China
and elsewhere.
On 21/03/2023 3:02 am, S Jack wrote:
> On Monday, March 20, 2023 at 7:11:08 AM UTC-5, dxforth wrote:
Humans do have a habit of romanticising language and culture - especially
if they view it as being on the ascendency or as possessing something they don't. The expression of the human condition in any language is fine by me but let's keep it simple :)
On Monday, March 20, 2023 at 7:42:00 PM UTC-4, dxforth wrote:
> On 21/03/2023 7:13 am, Paul Rubin wrote:
>> dxforth <dxf...@gmail.com> writes:
>>> What can't be done in 7-bit ASCII isn't worth doing. Less is Moore.
>> Touché.
> Precisely :)
Users in various countries may differ. For example, the euro glyph is common ( € ) .
On 21/03/2023 7:13 am, Paul Rubin wrote:
> dxforth <dxf...@gmail.com> writes:
>> What can't be done in 7-bit ASCII isn't worth doing. Less is Moore.
> Touché.
Precisely :)
On 22/03/2023 8:20 pm, Doug Hoffman wrote:
> Users in various countries may differ. For example, the euro glyph is common ( € ) .
Assuming an ASCII world, one byte should be plenty - 128 slots for ASCII
and 128 slots for whatever else one believes is important.
On Monday, March 20, 2023 at 7:53:10 AM UTC-4, none albert wrote:
> While at the moment English is the "lingua franca" of the Internet
> and science, Chinese will become more important.
<sidebar>
As the famous American baseball player Yogi Berra was reputed to say:
"It's hard to tell what's gonna happen, especially when it's in the future"
I won't be around to see it, but China is on a demographic precipice.
Some are saying that by early in the third quarter of this century the population will be half of the current number.
Who knows what that does? It might put Hindi in the ascent.
</sidebar>
> Assuming an ASCII world, one byte should be plenty - 128 slots for ASCII
> and 128 slots for whatever else one believes is important.
I think I'll use shift-option-2 for € (whatever of the remaining 128 slots that is). Wonder
what others will use?
On Wednesday, March 22, 2023 at 6:06:58 AM UTC-4, dxforth wrote:
> Assuming an ASCII world, one byte should be plenty - 128 slots for ASCII
> and 128 slots for whatever else one believes is important.
UTF-8 is code compatible with ASCII, while supporting as many characters as you would like. If you use ASCII, it is also UTF-8 encoded, automagically. I'm pretty sure UTF-8 includes the euro glyph without machinations.
Lorem Ipsum <gnuarm.deletethisbit@gmail.com> writes:
> I'm pretty sure UTF-8 includes the euro glyph without machinations.
The codepoint is U+20AC so the utf-8 encoding is 3 bytes long. In
Windows-1252 it has a single byte encoding (0x80). It doesn't seem to
exist in ISO-8859-1. In ISO-8859-15 it is 0xa4. Especially in the
Forth milieu on limited systems, I can understand the attraction of
having a single byte encoding for every character, even if that limits
the character set. I think Unicode was originally intended to be a 16
bit character set corresponding to the Unicode BMP (basic multilingual
plane), but the BMP ran out of characters and now we have a contorted
mess with slightly over 20 bits but plenty of literally crap characters
(viz. U+1F4A9, the poop emoji).
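The byte values quoted above are easy to check with Python's standard codecs. This is just a verification sketch. The variable names are arbitrary, and `cp1252` is Python's name for the Windows-1252 codec.

```python
euro = "\u20ac"  # EURO SIGN

utf8_bytes = euro.encode("utf-8")         # three bytes
cp1252_bytes = euro.encode("cp1252")      # single byte 0x80
latin9_bytes = euro.encode("iso-8859-15") # single byte 0xa4

# ISO-8859-1 has no euro sign, so encoding fails.
try:
    euro.encode("iso-8859-1")
    in_latin1 = True
except UnicodeEncodeError:
    in_latin1 = False

# A code point beyond the BMP needs a surrogate pair in UTF-16.
poop_utf16 = "\U0001F4A9".encode("utf-16-be")

print(utf8_bytes, cp1252_bytes, latin9_bytes, in_latin1, poop_utf16)
```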
On 23/03/2023 5:59 am, Lorem Ipsum wrote:
> UTF-8 is code compatible with ASCII, while supporting as many characters as you would like. If you use ASCII, it is also UTF-8 encoded, automagically. I'm pretty sure UTF-8 includes the euro glyph without machinations.
Sure but who is going to implement UTF-8 when ASCII will do? AFAIK
for every currency there is a corresponding ASCII abbreviation e.g. AUD
And they'll keep adding garbage characters because we've got all that
space now. And then the space aliens will show up and we'll need
characters for their language, but we won't have any space left, so
they'll zap us.
On Wednesday, March 22, 2023 at 9:05:55 PM UTC-4, dxforth wrote:
> Sure but who is going to implement UTF-8 when ASCII will do? AFAIK
> for every currency there is corresponding ASCII abbreviation e.g. AUD
Didn't you read the post that started this thread???
> ...
> I know there's no real fix for this. I'm not looking for ways to convert the file and I can't change LTspice. I'm mostly just venting my frustration for the last week of dealing with the poor documentation and the religious fanaticism of the support group.
(at one point I wanted to run the program I had used to
produce Figure 1 of <https://www.complang.tuwien.ac.at/anton/euroforth2005/papers/ertl%26paysan05.pdf>,
but had trouble finding a font that supported all the scripts I had
used). But over time, more stuff seems to be supported.
dxforth <dxf...@gmail.com> writes:
> Sure but who is going to implement UTF-8 when ASCII will do? AFAIK
> for every currency there is corresponding ASCII abbreviation e.g. AUD
For AUD there is even an ASCII character: $
However, this demonstrates the advantage of currency codes over
currency signs: Currency signs are ambiguous.
The currency code for the Euro is EUR.
- anton