I'm trying, and failing, to write the proper charset in my meta tag. Help, please!
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But perfectly decent characters like é, ×, ² show up as a question mark in a lozenge. I figured out that that's because my HTML files are all plain text, one byte per character, which is not UTF-8 when I use characters above 127.
So I changed the charset to latin-1, and then to iso-8859-1. With each of them, characters 160-255 display correctly, but the W3C's validator gives this error message:
Bad value “text/html; charset=iso-8859-1” for attribute “content” on element “meta”: “charset=” must be followed by “utf-8”
So what charset should I use to represent a file where every character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?
To make things even more murky, at https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-charset I found this gem: "If the attribute is present, its value must be an ASCII case-insensitive match for the string "utf-8", because UTF-8 is the only valid encoding for HTML5 documents."
If that's true, it sounds very much like I can't generate my web pages unless I code every 160-255 character as a six-byte &#nnn; string, which is not only a pain but makes editing harder.
(I tried looking at character encodings in Vim, and indeed it does have a utf-8 option, but after I do my editing I run all my pages through a very complicated awk script, and it looks like awk can't handle UTF-8, at least not in Windows.)
--
Stan Brown, Tehachapi, California, USA
https://BrownMath.com/
https://OakRoadSystems.com/
HTML 4.01 spec: http://www.w3.org/TR/html401/
validator: http://validator.w3.org/
CSS 2.1 spec: http://www.w3.org/TR/CSS21/
validator: http://jigsaw.w3.org/css-validator/
Why We Won't Help You: http://preview.tinyurl.com/WhyWont
On Thu, 15 Oct 2020, Eli the Bearded wrote:
I can't tell for sure without seeing your page, [...]
Just tell us the URL (the web address) where we can see your page, and we will thus discover
• what charset you are really using
• what the web server says about it
• what the web page tells about it
• what the default charset would be if none of the above applies
and whether these four contradict each other.
In comp.infosystems.www.authoring.html,
Stan Brown <the_stan_brown@fastmail.fm> wrote:
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But perfectly decent characters like é, ×, ² show up as a question
mark in a lozenge. I figured out that that's because my HTML files
are all plain text, 8 characters per byte, which is not UTF8 when I
use characters above 127.
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
So I changed the charset to latin-1, and then to iso-8859-1. With
each of them, characters 160-255 display correctly, but the W3C's
validator gives this error message:
Bad value “text/html; charset=iso-8859-1” for attribute
“content” on element “meta”: “charset=” must be followed by “utf-8”
So what charset should I use to represent a file where every...
character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?
I found this gem: "If the attribute is present, its value must be an
ASCII case-insensitive match for the string "utf-8", because UTF-8 is
the only valid encoding for HTML5 documents."
I can't tell for sure without seeing your page, but I think you are running into this: the declared document type specifies an allowed list of "charset"s, and your document must be conformant to that document type. One fix is to declare your document to be a type that allows the charset you feel you need to use, eg some variant of HTML4. Another fix is to find a compatible charset from the allowed list.
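Eli's first fix can be sketched concretely. A minimal, hypothetical HTML 4.01 document that declares iso-8859-1, which the HTML 4.01 DTD permits while an HTML5 validator would reject:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html401/strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Charset example</title>
</head>
<body>
<p>caf&eacute; &times; &sup2;</p>
</body>
</html>
```

(The body uses named entities only to keep this sketch pure ASCII; with the iso-8859-1 declaration the raw bytes would also work.)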
Stan Brown:
I'm trying, and failing, to write the proper charset in my meta tag.
Help, please!
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But perfectly decent characters like é, ×, ² show up as a question
mark in a lozenge. I figured out that that's because my HTML files
are all plain text, 8 characters per byte, which is not UTF8 when I
use characters above 127.
So I changed the charset to latin-1, and then to iso-8859-1. With
each of them, characters 160-255 display correctly, but the W3C's
validator gives this error message:
Bad value “text/html; charset=iso-8859-1” for attribute
“content” on element “meta”: “charset=” must be followed by “utf-8”
Did you try <meta charset="ISO-8859-1">?
On Fri, 16 Oct 2020 10:06:59 +0200, Arno Welzel wrote:
Stan Brown:
I'm trying, and failing, to write the proper charset in my meta tag.
Help, please!
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
But perfectly decent characters like é, ×, ² show up as a question
mark in a lozenge. I figured out that that's because my HTML files
are all plain text, 8 characters per byte, which is not UTF8 when I
use characters above 127.
So I changed the charset to latin-1, and then to iso-8859-1. With
each of them, characters 160-255 display correctly, but the W3C's
validator gives this error message:
Bad value “text/html; charset=iso-8859-1” for attribute
“content” on element “meta”: “charset=” must be followed by “utf-8”
Did you try <meta charset="ISO-8859-1">?
Yes. In HTML 4.01 and 5, same problem as in the longer form
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Stan Brown:
On Fri, 16 Oct 2020 10:06:59 +0200, Arno Welzel wrote:
Did you try <meta charset="ISO-8859-1">?
Yes. In HTML 4.01 and 5, same problem as in the longer form
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Indeed - HTML 4 does not know anything about the charset attribute, and for HTML 5 using UTF-8 is a requirement. In fact this is the *only* allowed encoding for HTML 5. So you have to convert your existing documents to UTF-8 before publishing them.
Also see here:
<https://html.spec.whatwg.org/multipage/semantics.html#character-encoding-declaration>
4.2.5.4 Specifying the document's character encoding
The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it.
To enforce the above rules, authoring tools must default to using UTF-8
for newly-created documents.
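Converting existing Latin-1 files, as Arno suggests, can be done mechanically. A minimal sketch in Python (the function name and the in-place re-encoding approach are mine, not from the thread):

```python
from pathlib import Path

def to_utf8(path, src_encoding="iso-8859-1"):
    """Re-encode a text file from src_encoding to UTF-8 in place."""
    p = Path(path)
    text = p.read_text(encoding=src_encoding)   # decode the old bytes
    p.write_text(text, encoding="utf-8")        # write the same characters as UTF-8

# "é" is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9
# in UTF-8; the visible text is unchanged, only the byte layout differs.
```

The same job can of course be done with iconv or an editor's encoding option; the point is that no hand-editing of the content is needed.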
4.2.5.4 Specifying the document's character encoding
The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it.
To enforce the above rules, authoring tools must default to using UTF-8
for newly-created documents.
Well, heck! It seems unfortunate that they would retroactively change
the HTML 4.01 standard, which I am 100% certain allowed other
charsets for quite a few years.
It seems like my only options are to completely redesign how I produce Web pages, or to declare utf-8, but only use characters 000-127 and use numeric references for everything >=160, which will bloat my documents.
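For what it's worth, the numeric-reference fallback Stan describes can at least be automated; Python's codec machinery does exactly this substitution (a sketch, not from the thread):

```python
def to_ascii_refs(text):
    """Replace every non-ASCII character with a decimal &#NNN; reference,
    leaving a pure-ASCII string."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# to_ascii_refs("café ×") == "caf&#233; &#215;"
```

It does not remove the bloat objection, but it removes the "pain" part: the source can stay in Latin-1 and be filtered on the way out.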
Stan Brown wrote:
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Just remove it, unless it matches the actual encoding used.
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use; I see only "Content-Type: text/html" appearing¹).
By many browsers it is also interpreted as if it were a declaration of the encoding used in the document – this is why it works and will probably work as long as HTML4 documents exist and are interpreted by browsers. But strictly speaking, it is not a usage of anything that is well-defined in HTML4.
meta_charset is indeed a declaration of the encoding used in the document, albeit meaningless as there is no choice.
¹) The full answer of the web server to the browser's request for https://brownmath.com/Charsets/charset_utf-8_html4.htm was:
HTTP/1.1 200 OK
Server: nginx
Date: Fri, 16 Oct 2020 16:12:13 GMT
Content-Type: text/html
Content-Length: 798
Connection: keep-alive
Last-Modified: Fri, 16 Oct 2020 13:43:53 GMT
ETag: "31e-5b1c9f48d5840"
alt-svc: quic=":443"; ma=86400; v="43,39"
Host-Header: 5d77dd967d63c3104bced1db0cace49c
X-Proxy-Cache: MISS
Accept-Ranges: bytes
So, you are not in a hurry to change anything, but you should have a plan for the future. You can even validate your non-UTF-8 HTML files:
* Declare them as HTML4, otherwise it will complain that only UTF-8 is allowed.
* Before starting the validator, check "More Options" and fill in the correct encoding.
I tried it out with https://brownmath.com/Charsets/charset_utf-8_html4.htm, and it worked.
I consider the behaviour of the validator extremely user-unfriendly. When people use habits that were not only tolerated but even recommended in the past, it could give a hint that, and why, they are no longer supported, and what to do instead.
It seems like my only options are to completely redesign how I
produce Web pages, or to declare utf-8, but only use characters 000-
127 and use numeric references for everything >=160, which will bloat
my documents.
I am not sure it requires a complete redesign. When I changed to UTF-8, I had only to tell the editor I use that it should encode in UTF-8 instead of ISO-8859-1. Well, I work on a Unix system, and the editor I use is emacs, which has such an option.
On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:
Stan Brown wrote:
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Just remove it, unless it matches the actual encoding used.
Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5
version passed validation. (The W3C validator failed the HTML4.01
version with "obsolete DOCTYPE", which seems a bit harsh.) The
revised examples are at <URL:https://brownmath.com/Charsets/>.
I know that encoding is complicated, but just because the characters
are displayed correctly in my browsers, is it safe to assume they'll
be correct in (the great majority of) other browsers?
I guess in a way I'm asking: what figures out the document encoding
if it's not specified, the Web server or the user-agent? If it's the
server, then the fact that they worked for me says they should work
for anyone. But if it's the browser, maybe not so much.
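On Stan's question, the server only *labels* the encoding, via the charset parameter of the Content-Type header; if that parameter is absent, the user-agent has to guess. The label itself is easy to inspect; a small sketch using the Python stdlib (the helper name is mine):

```python
from email.message import Message

def charset_of(content_type_value):
    """Return the charset parameter of a Content-Type header value,
    lower-cased, or None if no charset parameter is present."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_content_charset()

# charset_of("text/html") is None              -> the browser must guess
# charset_of("text/html; charset=ISO-8859-1")  -> "iso-8859-1"
```

A header without the parameter, like the one Stan's server sends, is exactly the "browser decides" case.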
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
Eli the Bearded:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that it's all not plain.
What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?
And why do you define "ASCII = plain"? Even ASCII has its history of
changes and not all 7-bit characters had the same meaning in the past:
<https://www.aivosto.com/articles/charsets-7bit.html>
In comp.infosystems.www.authoring.html,
Arno Welzel <usenet@arnowelzel.de> wrote:
Eli the Bearded:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that it's all not plain.
What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?
It is not "plain" in the sense of how documents without content types
should be interpreted according to the RFCs I remember reading. Consider
RFC-2045 - Multipurpose Internet Mail Extensions (MIME) Part One:
5.2. Content-Type Defaults
Default RFC 822 messages without a MIME Content-Type header are taken
by this protocol to be plain text in the US-ASCII character set,
which can be explicitly specified as:
Content-type: text/plain; charset=us-ascii
This default is assumed if no Content-Type header field is specified.
It is also recommend that this default be assumed when a
syntactically invalid Content-Type header field is encountered. In
the presence of a MIME-Version header field and the absence of any
Content-Type header field, a receiving User Agent can also assume
that plain US-ASCII text was the sender's intent. Plain US-ASCII
^^^^^^^^^^^^^^
text may still be assumed in the absence of a MIME-Version or the
^^^^^^^^^^^^^^^^^^^^^^^^^
presence of an syntactically invalid Content-Type header field, but
the sender's intent might have been otherwise.
and
RFC-2046 - Multipurpose Internet Mail Extensions (MIME) Part Two:
4.1.2. Charset Parameter
A critical parameter that may be specified in the Content-Type field
for "text/plain" data is the character set. This is specified with a
"charset" parameter, as in:
Content-type: text/plain; charset=iso-8859-1
Unlike some other parameter values, the values of the charset
parameter are NOT case sensitive. The default character set, which
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
must be assumed in the absence of a charset parameter, is US-ASCII.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
And why do you define "ASCII = plain"? Even ASCII has its history of
changes and not all 7-bit characters had the same meaning in the past:
Agreed that ASCII was not created in its final form.
<https://www.aivosto.com/articles/charsets-7bit.html>
The last change to ASCII there is in 1986. The last change there that involved the characters enumerated by ASCII was in 1977. The list of things that were important for computers in 1977 that are still important today is very small. ASCII, awkward as it is for many purposes, remains a bedrock upon which other, better things, are built. I just don't call UTF-8, eg, "plain text".
Elijah
------
notes that the unicode character table is written in US-ASCII
Helmut Richter:
[...]
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use, I see only "Content-Type: text/html" appearing¹).[...]
Because it is not for the server but for the *browser*.
In fact this meta element is used *instead* sending a HTTP response
header. That's why it is called "http-equiv" - it should be treated by
the *browser* in the same way as the respective HTTP header for the Content-Type.
What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?
It is not "plain" in the sense of how documents without content types
should be interpreted according to the RFCs I remember reading.
and start and end tags and entity references included.
On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:
Stan Brown wrote:
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Just remove it, unless it matches the actual encoding used.
Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5
version passed validation.
I know that encoding is complicated, but just because the characters
are displayed correctly in my browsers, is it safe to assume they'll
be correct in (the great majority of) other browsers?
I guess in a way I'm asking: what figures out the document encoding
if it's not specified, the Web server or the user-agent?
Stan Brown wrote:
On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:
What I tried to say is that declaring an encoding that is not the
actual encoding used (or compatible with it) is worse than not
declaring the encoding at all. This gives the user agent a chance
to guess right, as opposite to applying wrong information.
I know that encoding is complicated, but just because the characters
are displayed correctly in my browsers, is it safe to assume they'll
be correct in (the great majority of) other browsers?
The practical way is to use
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
and to ignore what the validator says about it. You can even use
automated ignoring by using the W3C validator tools for hiding messages
by type.
WHATWG and W3C just wish to promote UTF-8 on all pages at any cost. That’s why they specify that only UTF-8 is kosher and make the validator nag about it.
The theoretically most correct way is to make the server send HTTP
headers specifying the encoding. I have no idea how to do that when
using Nginx. You might need access to the server configuration files.
Stan Brown:
On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:
Stan Brown wrote:
I have this line in the <head> of my Web pages:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Just remove it, unless it matches the actual encoding used.
Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5 version passed validation. (The W3C validator failed the HTML4.01
version with "obsolete DOCTYPE", which seems a bit harsh.) The
revised examples are at <URL:https://brownmath.com/Charsets/>.
Well this is just by chance correct. In fact your server does not send
any charset at all:
HTTP/2 200 OK
server: nginx
date: Fri, 16 Oct 2020 22:16:16 GMT
content-type: text/html
content-length: 784
last-modified: Fri, 16 Oct 2020 13:44:01 GMT
etag: "310-5b1c9f5076a40"
alt-svc: quic=":443"; ma=86400; v="43,39"
host-header: 5d77dd967d63c3104bced1db0cace49c
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
QUESTION 1: Any reason you suggest that rather than the simpler
<meta charset="windows-1252">
? This page says the two forms are equivalent in HTML5:
QUESTION 2: It would be awfully convenient to type a Windows
apostrophe (8-bit character 146) rather than &#146; or &rsquo;. If
I specify a charset of windows-1252, am I safe to do that, or should
I still stay away from Windows characters 128-159?
QUESTION 3: If I should still stay away from 128-159, even with a windows-1252 declaration, is there any particular reason you suggest windows-1252 rather than iso-8859-1?
I think I can get that access, probably via some override file in my
root directory. In fact, there's already a .htaccess file there with
one AddType, so I think it must be an Apache server or a workalike.
I should be able to add
AddType text/plain;charset=windows-1252
AddType text/html;charset=windows-1252
and have the server emit the desired headers.
But the stackoverflow
article above makes the point that we still want to include a charset
in each file, for the folks who download a file for later reading.
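A caution on the AddType lines above: in Apache, AddType maps file extensions to media types and takes extension arguments, so it would not be used bare like that; the charset parameter is normally attached with AddCharset or AddDefaultCharset instead. A hedged sketch of the .htaccess lines, assuming the server really is Apache or honours Apache-style overrides:

```apache
# Send "Content-Type: ...; charset=windows-1252" for text/html and
# text/plain responses, without listing extensions:
AddDefaultCharset windows-1252

# Or attach the charset per extension:
AddCharset windows-1252 .html .htm .txt
```

Whether these take effect depends on the host allowing the relevant overrides.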
On Sat, 17 Oct 2020, Stan Brown wrote:
QUESTION 2: It would be awfully convenient to type a Windows
apostrophe (8-bit character 146) rather than &#146; or &rsquo;. If
I specify a charset of windows-1252, am I safe to do that, or should
I still stay away from Windows characters 128-159?
There is no reason to stay away from code points that are defined in the code.
(I don’t have the problem, though. If I want a real apostrophe like the one in the preceding sentence, I just type it (on my keyboard AltGr+'), and it lands in the file as the UTF-8 representation of that character.)
QUESTION 3: If I should still stay away from 128-159, even with a
windows-1252 declaration, is there any particular reason you suggest
windows-1252 rather than iso-8859-1? I know they're the same for 32-
127 and 160-255, but in my mind windows-1252 suggests that I'll be
using Windows 128-159, and iso-8859-1 does not.
If you use these code points, you have to specify windows-1252; if not,
the effect is the same for the two code names.
Helmut Richter wrote:
On Sat, 17 Oct 2020, Stan Brown wrote:
QUESTION 2: It would be awfully convenient to type a Windows
apostrophe (8-bit character 146) rather than &#146; or &rsquo;. If
I specify a charset of windows-1252, am I safe to do that, or should
I still stay away from Windows characters 128-159?
There is no reason to stay away from code points that are defined in the code.
Well, apart from some code points not being assigned to any character, or some assigned characters being somewhat questionable. (For example, how often would it make sense to use the florin sign ƒ?) Sorry, today is my nitpicking day.
(I don’t have the problem, though. If I want a real apostrophe like the one in the preceding sentence, I just type it (on my keyboard AltGr+'),
I just press the key labeled with the Ascii apostrophe ('). Well, that’s how I use my personal keyboard layout when typing text (as opposed to code); using the standard Finnish international layout I need to use AltGr+'.
and it lands in the file as the UTF-8 representation of that character.
This depends on the software that processes the typed characters.
QUESTION 3: If I should still stay away from 128-159, even with a windows-1252 declaration, is there any particular reason you suggest windows-1252 rather than iso-8859-1? I know they're the same for 32-
127 and 160-255, but in my mind windows-1252 suggests that I'll be
using Windows 128-159, and iso-8859-1 does not.
If you use these code points, you have to specify windows-1252; if not,
the effect is the same for the two code names.
No, the effect is always the same on all browsers in use nowadays (possibly excluding some you might see in a museum of technology).
Yes, but I hate to write iso-8859-1 when it is a lie, whereas windows-1252 would work exactly the same and would be true.
Stan Brown wrote:
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
QUESTION 1: Any reason you suggest that rather than the simpler
<meta charset="windows-1252">
No good reason. I just wrote the original format because I learned it 25 years or so ago.
QUESTION 2: It would be awfully convenient to type a Windows
apostrophe (8-bit character 146) rather than &#146; or &rsquo;. If
I specify a charset of windows-1252, am I safe to do that, or should
I still stay away from Windows characters 128-159?
You’re safe. Twenty years ago it was different.
QUESTION 3: If I should still stay away from 128-159, even with a windows-1252 declaration, is there any particular reason you suggest windows-1252 rather than iso-8859-1?
The reason is that browsers treat iso-8859-1 as windows-1252, and HTML5
made this the rule. In the old times it was different, mainly in the
sense that browsers running on Unix platforms actually treated
iso-8859-1 declared data so that octets 128–159 were control characters
and sometimes had odd effects.
I think I can get that access, probably via some override file in my
root directory. In fact, there's already a .htaccess file there with
one AddType, so I think it must be an Apache server or a workalike.
I should be able to add
AddType text/plain;charset=windows-1252
AddType text/html;charset=windows-1252
and have the server emit the desired headers.
I’m afraid Nginx does not support .htaccess but has other tools.
But the stackoverflow
article above makes the point that we still want to include a charset
in each file, for the folks who download a file for later reading.
That’s a valid point, because browsers probably still haven’t learned to save a web page locally in a proper way. That is, they don’t use the HTTP headers when saving the file. This is understandable, ...
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. […]
HTML is by definition not plain text.
Eli the Bearded wrote:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
That’s nonsense. Plain text is just text, as opposed to “rich text”, like MS Word format, or HTML.
Helmut Richter wrote:
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to
declare the content type and encoding via the HTTP protocol. […]
Not at all. How did you get that idea?
It is not the job of a Web server to *interpret* the body of an HTTP message in order to generate a header for that HTTP message. Parsing and interpreting HTML, for example, is solely the domain of an HTML user agent.
Instead, both HTML elements are a *substitute* – an *equivalent* – for the
Content-Type HTTP header field, to be used by the Web _browser_, if that header field is not sent by the Web server.
The various HTML Specifications make that very clear.
Helmut Richter wrote:
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. […]
Not at all. How did you get that idea? It is not a job of a Web server to *interpret* the body of a HTTP message in order to generate a header for
that HTTP message. Parsing and interpreting HTML, for example, is solely
the domain of a HTML user agent.
Helmut Richter wrote:
You should notice that
<meta http-equiv="Content-Type" content="text/html; charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to
declare the content type and encoding via the HTTP protocol. […]
Not at all. How did you get that idea? It is not a job of a Web server to *interpret* the body of a HTTP message in order to generate a header for
that HTTP message. Parsing and interpreting HTML, for example, is solely
the domain of a HTML user agent.
Instead, both HTML elements are a *substitute* – an *equivalent* – for the
Content-Type HTTP header field, to be used by the Web _browser_, if that header field is not sent by the Web server.
The various HTML Specifications make that very clear.
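The precedence described above (HTTP header first, meta element only as a browser-side fallback) can be sketched in a few lines. This is a hypothetical, much-simplified resolver written for illustration; the function name and the windows-1252 fallback are my assumptions, and real browsers do considerably more sniffing than this:

```python
import re

def pick_encoding(content_type_header, html_bytes):
    """Return the charset a browser-like consumer would use.

    content_type_header: value of the HTTP Content-Type field, or None
    html_bytes: the raw response body
    """
    # 1. A charset in the HTTP header always wins.
    if content_type_header:
        m = re.search(r'charset=([\w-]+)', content_type_header, re.I)
        if m:
            return m.group(1).lower()
    # 2. Otherwise fall back to a charset declared in the document
    #    itself, via <meta charset=...> or <meta http-equiv=...>.
    head = html_bytes[:1024].decode('ascii', errors='replace')
    m = re.search(r'<meta\s+charset=["\']?([\w-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    m = re.search(r'charset=([\w-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    return 'windows-1252'  # common legacy default, simplified

# Header wins over the meta element:
print(pick_encoding('text/html; charset=iso-8859-1',
                    b'<meta charset="utf-8">'))       # iso-8859-1
# No header: the meta element is used as a substitute:
print(pick_encoding(None, b'<meta charset="utf-8">'))  # utf-8
```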
Stan Brown:
On Thu, 15 Oct 2020 14:31:10 -0700, I started this thread with:
I'm trying, and failing, to write the proper charset in my meta tag.
Help, please!
A very big thank-you to all those who responded! I have learned quite
a lot in the past few days, and you were a big help in that. Here are
changes completed or in progress:
Thank you for this summary of your findings.
Eli the Bearded wrote:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
Nonsense. "Plain text" means - literally - content that can be read
by a person as opposed to "binary" data; that is, content where byte
sequences represent characters, in particular digits and letters.
(As an aside, I'm seeing that my stance that US-ASCII is "plain text"
and "plain text" does not necessarily mean "text/plain" is an unpopular
one. I'm tired of arguing the point, but no one has convinced me that
I'm wrong.)
Elijah
------
utf-8 in the sheets, ascii in the style sheets
In comp.infosystems.www.authoring.html,
Thomas 'PointedEars' Lahn <cljs@PointedEars.de> wrote:
Eli the Bearded wrote:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
Nonsense. "Plain text" means - literally - content that can be read
by a person as opposed to "binary" data; that is, content where byte
sequences represent characters, in particular digits and letters.
So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
those are _not plain text_.
(As an aside, I'm seeing that my stance that US-ASCII is "plain text"
and "plain text" does not necessarily mean "text/plain" is an unpopular
one. I'm tired of arguing the point, but no one has convinced me that
I'm wrong.)
Elijah
------
utf-8 in the sheets, ascii in the style sheets
Is PostScript plain text?
Phillip Helbig (undress to reply):
[...]
Is PostScript plain text?
It can be:
<http://paulbourke.net/dataformats/postscript/>
Arno Welzel wrote:
Phillip Helbig (undress to reply):
[...]
Is PostScript plain text?
It can be:
<http://paulbourke.net/dataformats/postscript/>
That old document seems to say that PostScript is plain text, since you
can create, edit, and read a PostScript file using a text editor. But
that’s not how “plain text” is defined in MIME:
The simplest and most important subtype of "text" is "plain". This
indicates plain text that does not contain any formatting commands or
directives. Plain text is intended to be displayed "as-is", that is,
no interpretation of embedded formatting commands, font attribute
specifications, processing instructions, interpretation directives,
or content markup should be necessary for proper display.
https://tools.ietf.org/html/rfc2046#section-4.1.3
ObHTML: Similarly, HTML is not plain text.
Technically, PostScript isn’t even classified as text; the media type
for it is application/postscript. This does not mean that it would be impossible to write PostScript using a text editor.
ObHTML: For XHTML, the media type application/xhtml+xml is specified.
Arno Welzel wrote:
Phillip Helbig (undress to reply):
Is PostScript plain text?
It can be:
That old document seems to say that PostScript is plain text, since you
can create, edit, and read a PostScript file using a text editor. But
that’s not how “plain text” is defined in MIME:
But even application/xhtml+xml is in fact plain text which is
*interpreted* as XHTML.
The important point is, that the content of a file of that type can be
read as plain text as well.
In comp.infosystems.www.authoring.html,
Thomas 'PointedEars' Lahn <cljs@PointedEars.de> wrote:
Eli the Bearded wrote:
The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
it's all not plain.
Nonsense. "Plain text" means - literally - content that can be read
by a person as opposed to "binary" data; that is, content where byte
sequences represent characters, in particular digits and letters.
So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
those are _not plain text_.
[Ex falso quodlibet]
Arno Welzel wrote:
But even application/xhtml+xml is in fact plain text which is
*interpreted* as XHTML.
The important point is, that the content of a file of that type can be
read as plain text as well.
Please read this as plain
text.
Arno Welzel wrote:
Phillip Helbig (undress to reply):
[...]
Is PostScript plain text?
It can be:
<http://paulbourke.net/dataformats/postscript/>
That old document seems to say that PostScript is plain text, since you
can create, edit, and read a PostScript file using a text editor. But
that’s not how “plain text” is defined in MIME:
https://tools.ietf.org/html/rfc2046#section-4.1.3
Please read this as plain
text.
In comp.infosystems.www.authoring.html,
Jukka K. Korpela <jukkakk@gmail.com> wrote:
Please read
this as plain
text.
Reading it as plain text is trivial.
So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
those are _not plain text_.
No, of course not. Not all code points of US-ASCII or Unicode represent digits and letters. In particular, the first 32 code points do not; they represent non-printable control characters or are left unassigned. That
is, they represent *data*, but not necessarily *text*.
[Ex falso quodlibet]
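The claim above about the first 32 code points is easy to check with the standard library (an illustrative one-off, nothing more): Unicode classifies all of them as controls, not as letters or digits.

```python
# Code points 0-31 are the C0 control characters: Unicode puts every
# one of them in general category "Cc" (control), so none of them is
# a letter or a digit.
import unicodedata

cats = {unicodedata.category(chr(cp)) for cp in range(32)}
print(cats)  # {'Cc'}
```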
Thomas 'PointedEars' Lahn wrote:
Helmut Richter wrote:
You should notice that
<meta http-equiv="Content-Type" content="text/html;
charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to
declare the content type and encoding via the HTTP protocol. […]
Not at all. How did you get that idea?
Perhaps from the HTML specifications.
It is not a job of a Web server to *interpret* the body of a HTTP message
in order to generate a header for that HTTP message. Parsing and
interpreting HTML, for example, is solely the domain of a HTML user
agent.
Instead, both HTML elements are a *substitute* – an *equivalent* – for
the Content-Type HTTP header field, to be used by the Web _browser_, if
that header field is not sent by the Web server.
The various HTML Specifications make that very clear.
“HTTP servers may read the content of the document HEAD to generate
header fields corresponding to any elements defining a value for the
attribute HTTP-EQUIV.”
https://www.w3.org/MarkUp/html-spec/html-spec_5.html#SEC5.2.5
Since that’s not how things actually worked, HTML5 specs don’t even mention the possibility of servers using <meta> tags. Neither do they prohibit such things; they don’t really deal with the operation of
servers.
The early HTML5 drafts/specs didn’t even allow <meta
http-equiv=...> and instead used the <meta charset=...> invention,
which was, from the beginning, meant to be handled by user agents.
Arno Welzel wrote:
But even application/xhtml+xml is in fact plain text which is
*interpreted* as XHTML.
The important point is, that the content of a file of that type can be
read as plain text as well.
Please read this as plain
text.
Eli the Bearded wrote:
In comp.infosystems.www.authoring.html,
Jukka K. Korpela <jukkakk@gmail.com> wrote:
Please read
this as plain
text.
Reading it as plain text is trivial.
Didn’t someone quote this from the relevant RFC:
Plain text is intended to be displayed "as-is", that is,
Are you saying that displaying the character sequence “as-is” is proper
display?
On Tue, 20 Oct 2020, Thomas 'PointedEars' Lahn wrote:
Helmut Richter wrote:
You should notice that
<meta http-equiv="Content-Type" content="text/html;
charset=any-code">
(HTML before HTML5 as well)
and
<meta charset="utf-8"> (HTML5 only, only utf-8 allowed)
have different meanings. meta_http-equiv is a hint to the web server to
declare the content type and encoding via the HTTP protocol. […]
Not at all. How did you get that idea? It is not a job of a Web server
to *interpret* the body of a HTTP message in order to generate a header
for that HTTP message. Parsing and interpreting HTML, for example, is
solely the domain of a HTML user agent.
Thank you for repeating <huuk9tF11ncU1@mid.individual.net>. I understood
that one as well, though.
Jukka K. Korpela:
Eli the Bearded wrote:
In comp.infosystems.www.authoring.html,
Jukka K. Korpela <jukkakk@gmail.com> wrote:
Please read
this as plain
text.
Reading it as plain text is trivial.
Didn’t someone quote this from the relevant RFC:
Plain text is intended to be displayed "as-is", that is,
Which is possible:
Ampersand, Hash, Five, Zero, Colon...
In comp.infosystems.www.authoring.html,
Thomas 'PointedEars' Lahn <dciwam@PointedEars.de> wrote:
I note the lack of an attribution there[*].
My writing:
So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
those are _not plain text_.
Thomas's reply:
No, of course not. Not all code points of US-ASCII or Unicode represent
digits and letters. In particular, the first 32 code points do not; they
represent non-printable control characters or are left unassigned. That
is, they represent *data*, but not necessarily *text*.
Control characters between 0 and 31 are either generally not used in
output or have very well defined meanings in output.
[Ex falso quodlibet]
[*] This is not something I wrote, although the >> implies it was in
my article. So perhaps the lack of attribution was deliberate?
Ampersand, Hash, Five, Zero, Colon...
[...]
Are you saying that displaying the character sequence “as-is” is proper
display?
Yes. You did not ask for "interpret what this text means".
A pseudonymous coward and liar trolled:
[*] This is not something I wrote, although the >> implies it was in
my article. So perhaps the lack of attribution was deliberate?
Why are you lying?
[Ex falso quodlibet]
Jukka K. Korpela:
Arno Welzel wrote:
But even application/xhtml+xml is in fact plain text which is
*interpreted* as XHTML.
The important point is, that the content of a file of that type can be
read as plain text as well.
Please read this as plain
text.
Is this the way *you* create your XHTML files?
In comp.infosystems.www.authoring.html,
Thomas 'PointedEars' Lahn <usenet@PointedEars.de> wrote:
A pseudonymous coward and liar trolled:
Ever the classy person there.
[*] This is not something I wrote, although the >> implies it was in
my article. So perhaps the lack of attribution was deliberate?
Why are you lying?
$ lynx -source -dump 'news:<eli$2010201433@qaz.wtf>' | grep quodlibet
$ lynx -source -dump 'news:<2173853.ElGaqSPkdT@PointedEars.de>' | grep quodlibet
[Ex falso quodlibet]
$
Arno Welzel wrote:
Ampersand, Hash, Five, Zero, Colon...
[...]
Are you saying that displaying the character sequence “as-is” is proper
display?
Yes. You did not ask for "interpret what this text means".
For HTML (which is what we are discussing here), “proper display” means
Jukka K. Korpela:
Arno Welzel wrote:
Ampersand, Hash, Five, Zero, Colon...
[...]
Are you saying that displaying the character sequence “as-is” is proper
display?
Yes. You did not ask for "interpret what this text means".
For HTML (which is what we are discussing here), “proper display” means
"proper display" is not required to read something as plain text.
You can even print this on a sheet of paper and give it to someone to
type it in and you get the same file again which can again be
displayed using a web browser.
Try this with a PNG image or an MP3 file.
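The round-trip argument above can be made concrete: decoding text and re-encoding it reproduces the bytes exactly, whereas arbitrary binary data such as a PNG header need not decode at all. A small illustrative sketch (the sample strings are my own):

```python
# Text round-trips: bytes -> str -> bytes is lossless for a valid encoding.
html_bytes = '<p>caf\u00e9 &times; 2</p>'.encode('utf-8')
assert html_bytes.decode('utf-8').encode('utf-8') == html_bytes

# Binary does not: the PNG signature starts with 0x89, which is not
# a valid first byte of any UTF-8 sequence.
png_magic = b'\x89PNG\r\n\x1a\n'
try:
    png_magic.decode('utf-8')
except UnicodeDecodeError:
    print('not decodable as text')
```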
I think the two of you are actually using different terminology. To
Arno, and to me, "plain text" is not something with no codes in it,
it's something where a "text editor" can see all the characters.
I think Jukka is equating "plain text" to type="text/plain". I won't
say that's wrong, but it's not the only interpretation.
For HTML (which is what we are discussing here), “proper display” means
displaying the content as defined in HTML specifications. It would be
inappropriate for a browser to display the tags, the character
references, the comments, etc., as-is. It would mean rendering an HTML
document as plain text (which it is not, by definition), refusing to do
the job of a browser.
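The two readings at issue in this subthread (the literal character sequence versus what an HTML renderer shows) can be demonstrated with a few lines of stdlib Python; the sample string is my own:

```python
from html import unescape

src = 'caf&eacute; &#50; &times; &#50;'
print(src)            # plain-text reading: the literal character sequence
print(unescape(src))  # HTML reading: café 2 × 2
```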