• Charset

    From Stan Brown@21:1/5 to All on Thu Oct 15 14:31:10 2020
    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.
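
    A minimal Python sketch of the mismatch described above, assuming the file was saved as ISO-8859-1 (the sample string is only an illustration). Bytes above 127 that are valid ISO-8859-1 characters are not valid UTF-8 on their own, so a UTF-8 decoder substitutes U+FFFD, the replacement character that browsers typically draw as a question mark in a lozenge:

```python
# Bytes as an ISO-8859-1 editor would save them: one byte per character.
latin1_bytes = "é × ²".encode("iso-8859-1")

# Decoding those bytes as UTF-8 fails for each byte above 127, because
# 0xE9, 0xD7 and 0xB2 are not complete UTF-8 sequences here; with
# errors="replace" each one becomes U+FFFD instead of raising an error.
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
as_latin1 = latin1_bytes.decode("iso-8859-1")

print(latin1_bytes)
print(as_utf8)
print(as_latin1)
```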

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”

    So what charset should I use to represent a file where every
    character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1
    character set?

    To make things even more murky, at
    https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-charset
    I found this gem: "If the attribute is present, its value must be an
    ASCII case-insensitive match for the string "utf-8", because UTF-8 is
    the only valid encoding for HTML5 documents."
    If that's true, it sounds very much like I can't generate my web
    pages unless I code every 160-255 character as a six-byte &#nnn;
    string, which is not only a pain but makes editing harder.
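
    For what it's worth, the numeric-reference route can be automated rather than typed by hand. A small Python sketch, assuming the source text is already available as a Unicode string; `to_ascii_with_refs` is a hypothetical helper name, not part of any tool mentioned in this thread:

```python
def to_ascii_with_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#nnn; reference."""
    # The "xmlcharrefreplace" error handler emits &#NNN; for each character
    # that the ASCII codec cannot encode and leaves ASCII untouched.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(to_ascii_with_refs("café × 2²"))
```

    The result is pure ASCII, so it is byte-for-byte identical whether the page is declared as utf-8, iso-8859-1, or windows-1252.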

    (I tried looking at character encodings in Vim, and indeed it does
    have a utf-8 option, but after I do my editing I run all my pages
    through a very complicated awk script, and it looks like awk can't
    handle UTF-8, at least not in Windows.)

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/
    HTML 4.01 spec: http://www.w3.org/TR/html401/
    validator: http://validator.w3.org/
    CSS 2.1 spec: http://www.w3.org/TR/CSS21/
    validator: http://jigsaw.w3.org/css-validator/
    Why We Won't Help You: http://preview.tinyurl.com/WhyWont

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli the Bearded@21:1/5 to Stan Brown on Thu Oct 15 22:56:44 2020
    In comp.infosystems.www.authoring.html,
    Stan Brown <the_stan_brown@fastmail.fm> wrote:
    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”

    So what charset should I use to represent a file where every
    character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?
    ...
    I found this gem: "If the attribute is present, its value must be an
    ASCII case-insensitive match for the string "utf-8", because UTF-8 is
    the only valid encoding for HTML5 documents."

    I can't tell for sure without seeing your page, but I think you are
    running into this: the declared document type specifies a list of
    "charset"s that conforming documents of that type are allowed to use.
    One fix is to declare your document to be a type that allows the
    charset you feel you need to use, e.g. some variant of HTML4. Another
    fix is to find a compatible charset from the allowed list.

    https://en.wikipedia.org/wiki/Character_encodings_in_HTML#Permitted_encodings

    That suggests that UTF-8 is not required, but only recommended. And
    since you are coming from a Windows environment perhaps Windows-1252 is
    the right HTML5 charset to use. It is a superset of ISO-8859-1, using
    some additional characters between 128 and 159 as "printable" instead of
    as control characters.
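
    The relationship can be checked directly. A short Python sketch, assuming Python's built-in `iso-8859-1` and `windows-1252` codecs; the byte values are chosen only for illustration:

```python
# Bytes 160-255 mean the same thing in both encodings.
shared = bytes([0xE9, 0xD7, 0xB2])  # é × ²
same_upper_range = shared.decode("iso-8859-1") == shared.decode("windows-1252")

# Bytes 128-159 differ: printable characters in windows-1252,
# C1 control characters in ISO-8859-1.
quotes = bytes([0x93, 0x94])
as_cp1252 = quotes.decode("windows-1252")  # curly double quotes
as_latin1 = quotes.decode("iso-8859-1")    # two invisible control characters

print(same_upper_range)
print(repr(as_cp1252), repr(as_latin1))
```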

    (I tried looking at character encodings in Vim, and indeed it does
    have a utf-8 option, but after I do my editing I run all my pages
    through a very complicated awk script, and it looks like awk can't
    handle UTF-8, at least not in Windows.)

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/
    HTML 4.01 spec: http://www.w3.org/TR/html401/

    Ooo, look there.

    https://www.w3.org/TR/html401/charset.html#doc-char-set

    5.2.1 Choosing an encoding

    [...] This specification does not mandate which character encodings
    a user agent must support.

    validator: http://validator.w3.org/
    CSS 2.1 spec: http://www.w3.org/TR/CSS21/

    That's an HTML4 document in ISO-8859-1.

    validator: http://jigsaw.w3.org/css-validator/
    Why We Won't Help You: http://preview.tinyurl.com/WhyWont

    Okay, you validated. But then you didn't provide a URL or a complete
    example.

    Elijah
    ------
    is not averse to & encodings to put Unicode into US-ASCII

  • From JJ@21:1/5 to Stan Brown on Fri Oct 16 09:11:21 2020
    On Thu, 15 Oct 2020 14:31:10 -0700, Stan Brown wrote:
    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”

    So what charset should I use to represent a file where every
    character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?

    To make things even more murky, at
    https://developer.mozilla.org/en-US/docs/Web/HTML/Element/meta#attr-charset
    I found this gem: "If the attribute is present, its value must be an
    ASCII case-insensitive match for the string "utf-8", because UTF-8 is
    the only valid encoding for HTML5 documents."
    If that's true, it sounds very much like I can't generate my web
    pages unless I code every 160-255 character as a six-byte &#nnn;
    string, which is not only a pain but makes editing harder.

    (I tried looking at character encodings in Vim, and indeed it does
    have a utf-8 option, but after I do my editing I run all my pages
    through a very complicated awk script, and it looks like awk can't
    handle UTF-8, at least not in Windows.)

    It depends on which character set the text editor/processor you're using saves in.

    If the software doesn't have any setting regarding character set, and...

    If it's native Windows software that is not cross-platform, the character
    set should be `Windows-1252`, assuming that the OS locale is U.S. English
    (it is also the Windows code page for other Western European locales such
    as French and German). Otherwise, another Windows-NNNN character set
    applies, depending on the current locale, e.g. `Windows-1250` for Central
    European languages (Polish, Czech, Hungarian, etc.).

    If it's DOS software, i.e. a pure DOS program rather than a text-mode
    Windows program, then the character set should be `cp437` for U.S.
    English. Otherwise, it's another cpNNN code page.

    Cross-platform software mostly uses UTF-8. In case it doesn't, the
    character set could be cpNNN, iso-XXX, or Windows-NNNN, depending
    on the active locale.
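
    To illustrate why the locale matters, here is a hedged Python sketch that decodes one and the same byte under three of the code pages named above. The byte 0xE8 is an arbitrary example, and the codec names are Python's, not necessarily the labels a browser accepts:

```python
raw = bytes([0xE8])  # a single byte above 127, chosen arbitrarily

# Decode the same byte under three different legacy code pages.
decoded = {codec: raw.decode(codec) for codec in ("cp1252", "cp1250", "cp437")}

for codec, char in decoded.items():
    print(codec, repr(char))
```

    Each code page maps the byte to a different character, which is why a file with bytes above 127 cannot be interpreted reliably without knowing which encoding wrote it.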

  • From Arno Welzel@21:1/5 to All on Fri Oct 16 10:06:59 2020
    Stan Brown:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”


    Did you try <meta charset="ISO-8859-1">?


    --
    Arno Welzel
    https://arnowelzel.de

  • From Helmut Richter@21:1/5 to Eli the Bearded on Fri Oct 16 10:09:31 2020
    On Thu, 15 Oct 2020, Eli the Bearded wrote:

    I can't tell for sure without seeing your page, [...]

    Just tell us the URL (the web address) where we can see your page and thus
    will discover

    – what charset you are really using
    – what the web server says about it
    – what the web page tells about it
    – what the default charset would be if none of the above applies

    and whether these four contradict each other.

    --
    Helmut Richter

  • From Stan Brown@21:1/5 to Helmut Richter on Fri Oct 16 06:52:23 2020
    On Fri, 16 Oct 2020 10:09:31 +0200, Helmut Richter wrote:

    On Thu, 15 Oct 2020, Eli the Bearded wrote:

    I can't tell for sure without seeing your page, [...]

    Just tell us the URL (the web address) where we can see your page and thus will discover

    – what charset you are really using
    – what the web server says about it
    – what the web page tells about it
    – what the default charset would be if none of the above applies

    and whether these four contradict each other.

    Sorry, I should have done that in my original.

    https://brownmath.com/Charsets/

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Stan Brown@21:1/5 to Eli the Bearded on Fri Oct 16 06:51:12 2020
    On Thu, 15 Oct 2020 22:56:44 +0000 (UTC), Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Stan Brown <the_stan_brown@fastmail.fm> wrote:
    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”

    So what charset should I use to represent a file where every
    character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?
    ...
    I found this gem: "If the attribute is present, its value must be an
    ASCII case-insensitive match for the string "utf-8", because UTF-8 is
    the only valid encoding for HTML5 documents."

    I can't tell for sure without seeing your page,

    A fair point:
    https://brownmath.com/Charsets/

    I created every combination of HTML 4.01 Strict or HTML 5 with utf-8, iso-8859-1, windows-1252, and latin-1.

    but I think you are
    running into this: the declared document type specifies a list of
    "charset"s that conforming documents of that type are allowed to use.
    One fix is to declare your document to be a type that allows the
    charset you feel you need to use, e.g. some variant of HTML4.

    HTML 4.01 Strict fails validation also, with charset UTF-8.

    Another fix is to find a compatible
    charset from the allowed list.

    Yes, I tried that, but the compatible charsets iso-8859-1, latin-1,
    and windows-1252 all fail validation. I didn't see any others in the
    list under "Encodings" on the W3C site, but maybe I missed one.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Stan Brown@21:1/5 to Arno Welzel on Fri Oct 16 06:56:04 2020
    On Fri, 16 Oct 2020 10:06:59 +0200, Arno Welzel wrote:

    Stan Brown:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”


    Did you try <meta charset="ISO-8859-1">?

    Yes. In HTML 4.01 and 5, same problem as in the longer form
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Arno Welzel@21:1/5 to All on Fri Oct 16 16:43:45 2020
    Stan Brown:

    On Fri, 16 Oct 2020 10:06:59 +0200, Arno Welzel wrote:

    Stan Brown:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 bits per character, which is not UTF-8 when I
    use characters above 127.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly, but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”


    Did you try <meta charset="ISO-8859-1">?

    Yes. In HTML 4.01 and 5, same problem as in the longer form
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Indeed - HTML 4 does not know anything about the charset attribute and
    for HTML 5 using UTF-8 is a requirement. In fact this is the *only*
    allowed encoding for HTML 5. So you have to convert your existing documents
    to UTF-8 before publishing them.

    Also see here:

    <https://html.spec.whatwg.org/multipage/semantics.html#character-encoding-declaration>

    4.2.5.4 Specifying the document's character encoding

    A character encoding declaration is a mechanism by which the character
    encoding used to store or transmit a document is specified.

    The Encoding standard requires use of the UTF-8 character encoding and
    requires use of the "utf-8" encoding label to identify it. Those
    requirements necessitate that the document's character encoding
    declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character
    encoding declaration is present or not, the actual character encoding
    used to encode the document must be UTF-8. [ENCODING]

    To enforce the above rules, authoring tools must default to using UTF-8
    for newly-created documents.



    --
    Arno Welzel
    https://arnowelzel.de

  • From Stan Brown@21:1/5 to Arno Welzel on Fri Oct 16 08:16:00 2020
    On Fri, 16 Oct 2020 16:43:45 +0200, Arno Welzel wrote:

    Stan Brown:

    On Fri, 16 Oct 2020 10:06:59 +0200, Arno Welzel wrote:
    Did you try <meta charset="ISO-8859-1">?

    Yes. In HTML 4.01 and 5, same problem as in the longer form
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Indeed - HTML 4 does not know anything about the charset attribute and
    for HTML 5 using UTF-8 is a requirement. In fact this is the *only*
    allowed encoding for HTML 5. So you have to convert your existing documents
    to UTF-8 before publishing them.

    Also see here:

    <https://html.spec.whatwg.org/multipage/semantics.html#character-encoding-declaration>

    4.2.5.4 Specifying the document's character encoding
    ...

    The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it.
    ...
    To enforce the above rules, authoring tools must default to using UTF-8
    for newly-created documents.

    Well, heck! It seems unfortunate that they would retroactively change
    the HTML 4.01 standard, which I am 100% certain allowed other
    charsets for quite a few years.

    It seems like my only options are to completely redesign how I
    produce Web pages, or to declare utf-8, but only use characters 000-
    127 and use numeric references for everything >=160, which will bloat
    my documents.


    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Helmut Richter@21:1/5 to Stan Brown on Fri Oct 16 18:49:12 2020
    On Fri, 16 Oct 2020, Stan Brown wrote:

    4.2.5.4 Specifying the document's character encoding
    ...

    The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it.
    ...
    To enforce the above rules, authoring tools must default to using UTF-8
    for newly-created documents.

    This *did* surprise me. I had thought that <meta charset="..."> would have a meaning beyond recognising that one has no choice. Well, I switched to UTF-8 before I switched to HTML5, so I did not notice that as a problem. After all, UTF-8 has existed for 17 years now. And my native tongue requires many more non-ASCII characters than English does, so there is more to change.

    Well, heck! It seems unfortunate that they would retroactively change
    the HTML 4.01 standard, which I am 100% certain allowed other
    charsets for quite a few years.

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to
    declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use, I see only "Content-Type: text/html" appearing¹). By many browsers it is also interpreted as if it were a declaration of the encoding used in the document – this is why it works and will probably work as long as HTML4 documents exist and are interpreted by browsers. But strictly speaking, it is not a usage of anything that is well-defined in HTML4. – meta_charset is indeed a declaration of the encoding used in the document, albeit meaningless as there is no choice.

    ¹) The full answer of the web server to the browser's request for https://brownmath.com/Charsets/charset_utf-8_html4.htm was:

    HTTP/1.1 200 OK
    Server: nginx
    Date: Fri, 16 Oct 2020 16:12:13 GMT
    Content-Type: text/html
    Content-Length: 798
    Connection: keep-alive
    Last-Modified: Fri, 16 Oct 2020 13:43:53 GMT
    ETag: "31e-5b1c9f48d5840"
    alt-svc: quic=":443"; ma=86400; v="43,39"
    Host-Header: 5d77dd967d63c3104bced1db0cace49c
    X-Proxy-Cache: MISS
    Accept-Ranges: bytes
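
    Whether a response like the one above declares an encoding can be checked mechanically. A minimal Python sketch using the standard library's `email.message.Message` to parse the header value; `charset_from_content_type` is a hypothetical helper name:

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns None when no charset parameter is present.
    return msg.get_content_charset()


# The header actually sent for the test page carries no charset parameter.
print(charset_from_content_type("text/html"))
# What a server that does declare the encoding would send.
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
```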

    So, you are not in a hurry to change anything, but you should have a plan for the future. You can even validate your non-UTF-8 HTML files:

    * Declare them as HTML4, otherwise it will complain that only UTF-8 is allowed.
    * Before starting the validator, check „More Options“ and fill in the correct encoding.

    I tried it out with https://brownmath.com/Charsets/charset_utf-8_html4.htm, and it worked.

    I consider the behaviour of the validator extremely user-unfriendly. When people follow habits that were not only tolerated but even recommended in the past, it could give a hint that they are no longer supported, why, and what to do instead.

    It seems like my only options are to completely redesign how I
    produce Web pages, or to declare utf-8, but only use characters 000-
    127 and use numeric references for everything >=160, which will bloat
    my documents.

    I am not sure it requires a complete redesign. When I changed to UTF-8, I only had to tell my editor to encode in UTF-8 instead of ISO-8859-1. Well, I work on a Unix system, and the editor I use is emacs, which has such an option. Windows has the problem that it sometimes changes the encoding without any notice to the user. When I do have to use Windows, I use Notepad++, which also has an option to control the encoding to be used. (People always working on Windows will perhaps have better recommendations; I just needed *anything* capable of reliably producing UTF-8 output.)

    For recoding the existing web pages, I had a little script.
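
    Such a script is not shown in this thread; the following is only a sketch of what one could look like in Python, assuming the existing pages are in windows-1252 (which covers ISO-8859-1 text as well). The function name and default are hypothetical, and the sketch does not touch any charset declaration inside the page:

```python
from pathlib import Path


def recode_to_utf8(path, source_encoding="windows-1252"):
    """Rewrite one file as UTF-8. Return True if it was changed."""
    p = Path(path)
    raw = p.read_bytes()
    try:
        # Pure ASCII and already-converted files decode cleanly as UTF-8;
        # leave those alone so the script can be run repeatedly.
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        pass
    # Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    text = raw.decode(source_encoding)
    # Work on bytes throughout so line endings are preserved as they are.
    p.write_bytes(text.encode("utf-8"))
    return True
```

    Calling it once per page, for example over `Path(".").glob("*.htm")`, converts a directory while skipping files that are already UTF-8.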

    I warn you against building up a legacy workplace consisting of more and more legacy work-arounds. It is less work to switch to UTF-8, but there is no need to do it all in one night.

    --
    Helmut Richter

  • From Jukka K. Korpela@21:1/5 to Stan Brown on Fri Oct 16 23:42:47 2020
    Stan Brown wrote:

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Just remove it, unless it matches the actual encoding used.

    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge.

    Apparently it is not utf-8 encoded.

    I figured out that that's because my HTML files
    are all plain text,

    HTML is by definition not plain text.

    So I changed the charset to latin-1, and then to iso-8859-1. With
    each of them, characters 160-255 display correctly,

    Fine. Stop there. Latin-1 and iso-8859-1 are equivalent, and so is
    windows-1252 in practice.

    but the W3C's
    validator gives this error message:
    Bad value “text/html; charset=iso-8859-1” for attribute
    “content” on element “meta”: “charset=” must be followed by “utf-8”

    Ignore it.

    So what charset should I use to represent a file where every
    character is 8 bits, and those 8 bits match the iso-8859-1 or latin-1 character set?

    Windows-1252. But the “validator” still says it’s wrong.

    If that's true, it sounds very much like I can't generate my web
    pages unless I code every 160-255 character as a six-byte &#nnn;
    string, which is not only a pain but makes editing harder.

    You can conform to utf-8 by doing so, or by actually using utf-8.

    But as opposed to using latin-1, it only amounts to worshipping
    whatever WHATWG (and W3C) declared holy. And there are obvious risks
    whenever you edit your pages using a tool that does not conform to the
    same confession.

  • From Jukka K. Korpela@21:1/5 to Eli the Bearded on Fri Oct 16 23:30:41 2020
    Eli the Bearded wrote:

    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    That’s nonsense. Plain text is just text, as opposed to “rich text”, like MS Word format, or HTML.

  • From Stan Brown@21:1/5 to Jukka K. Korpela on Fri Oct 16 15:00:30 2020
    On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Just remove it, unless it matches the actual encoding used.

    Brilliant! I tried with no <meta .. charset> tag. The characters were
    displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5
    version passed validation. (The W3C validator failed the HTML4.01
    version with "obsolete DOCTYPE", which seems a bit harsh.) The
    revised examples are at <URL:https://brownmath.com/Charsets/>.

    I know that encoding is complicated, but just because the characters
    are displayed correctly in my browsers, is it safe to assume they'll
    be correct in (the great majority of) other browsers?

    I guess in a way I'm asking: what figures out the document encoding
    if it's not specified, the Web server or the user-agent? If it's the
    server, then the fact that they worked for me says they should work
    for anyone. But if it's the browser, maybe not so much.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Arno Welzel@21:1/5 to All on Sat Oct 17 00:14:21 2020
    Helmut Richter:

    [...]
    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use, I see only "Content-Type: text/html" appearing¹).
    [...]

    Because it is not for the server but for the *browser*.

    In fact this meta element is used *instead of* sending an HTTP response
    header. That's why it is called "http-equiv" - it should be treated by
    the *browser* in the same way as the respective HTTP header for the Content-Type.


    --
    Arno Welzel
    https://arnowelzel.de

  • From Stan Brown@21:1/5 to Helmut Richter on Fri Oct 16 15:19:52 2020
    On Fri, 16 Oct 2020 18:49:12 +0200, Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use, I see only "Content-Type: text/html" appearing¹). By many browsers it is also interpreted as if it were a declaration of the encoding used in the document – this is why it works and will probably work as long as HTML4 documents exist and are interpreted by browsers. But strictly speaking, it is not a usage of anything that is well-defined in HTML4. – meta_charset is indeed a declaration of the encoding used in the document, albeit meaningless as there is no choice.

    ¹) The full answer of the web server to the browser's request for https://brownmath.com/Charsets/charset_utf-8_html4.htm was:

    HTTP/1.1 200 OK
    Server: nginx
    Date: Fri, 16 Oct 2020 16:12:13 GMT
    Content-Type: text/html
    Content-Length: 798
    Connection: keep-alive
    Last-Modified: Fri, 16 Oct 2020 13:43:53 GMT
    ETag: "31e-5b1c9f48d5840"
    alt-svc: quic=":443"; ma=86400; v="43,39"
    Host-Header: 5d77dd967d63c3104bced1db0cace49c
    X-Proxy-Cache: MISS
    Accept-Ranges: bytes

    Interesting. The server doesn't seem to send information about
    document encoding. I guess that means the browser is left to figure
    it out?

    So, you are not in a hurry to change anything, but you should have a plan for the future. You can even validate your non-UTF-8 HTML files:

    * Declare them as HTML4, otherwise it will complain that only UTF-8 is allowed.
    * Before starting the validator, check „More Options“ and fill in the correct encoding.

    I tried it out with https://brownmath.com/Charsets/charset_utf-8_html4.htm, and it worked.

    Actually in my build procedure I don't use the W3C validator. I use
    NSGMLS, an ancient tool that parses against the DOCTYPE specified in
    the document, using the referenced DOCTYPE file. If I'm not
    mistaken, there is no DOCTYPE file for <!DOCTYPE html>, so if I want
    to start publishing HTML5 pages (I do), I'll have to find a command-
    line tool that returns an appropriate pass or fail value, so that my
    makefile knows to stop or keep going.

    My build sequence is:
    1. Do manual edits to source files (not the HTML documents), using
    Vim. The source files contain a mix of ordinary text and HTML plus a
    lot of macros, #includes, and function calls.
    2. Run a MAKE to rebuild the HTML pages that need it.

    For each HTML page to be rebuilt:
    a. Run the awk script that processes includes, macros and functions
    into static HTML.
    b. Call the local validator, which gives a status result.
    c. If the page validates, go to the next file.
    d. If the page fails to validate, stop.
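
    The pass/fail step in this sequence could be approximated for the encoding question alone. The following Python sketch is not an HTML validator and not a replacement for NSGMLS; it only assumes that the goal is a status value a makefile can act on when a generated page is not valid UTF-8. The function names are hypothetical:

```python
import sys


def first_invalid_utf8(path):
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_files(paths):
    """Return 0 if every file is valid UTF-8, 1 otherwise (a make-style status)."""
    status = 0
    for path in paths:
        offset = first_invalid_utf8(path)
        if offset is not None:
            print(f"{path}: invalid UTF-8 at byte {offset}", file=sys.stderr)
            status = 1
    return status
```

    A wrapper script could end with `sys.exit(check_files(sys.argv[1:]))` so that make stops on a nonzero status.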

    I consider the behaviour of the validator extremely user-unfriendly.
    When people follow habits that were not only tolerated but even
    recommended in the past, it could give a hint that they are no
    longer supported, why, and what to do instead.

    Indeed yes!

    It seems like my only options are to completely redesign how I
    produce Web pages, or to declare utf-8, but only use characters 000-
    127 and use numeric references for everything >=160, which will bloat
    my documents.

    I am not sure it requires a complete redesign. When I changed to UTF-8, I had only to tell the editor used that it should encode in UTF-8 instead of ISO-8859-1. Well, I work on a Unix system, and the editor used is emacs, which
    has such an option.

    Vim does too. There are two problems: (a) I haven't figured out how
    to do editing in that mode, and (b) according to what I read on the
    Web, awk can't handle UTF-8 files correctly if they contain any
    multi-byte characters.
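
    One property of UTF-8 is relevant to point (b), independent of any particular awk build: every byte of a multi-byte sequence has a value of 128 or above, so ASCII markup delimiters never occur inside an encoded character. A Python sketch of what that means for byte-oriented processing (the sample strings are illustrative only):

```python
utf8_line = "café<br>naïve".encode("utf-8")

# No byte of a multi-byte UTF-8 sequence is in the ASCII range, so an
# ASCII pattern such as b"<br>" can only match real markup.
parts = utf8_line.split(b"<br>")
rejoined = b"<p>".join(parts).decode("utf-8")

# Counting and indexing are where bytes and characters diverge.
char_count = len("café")
byte_count = len("café".encode("utf-8"))

print(rejoined)
print(char_count, byte_count)
```

    Byte-wise splitting and joining on ASCII patterns leaves the text intact; functions that count or index characters are where a byte-oriented tool would give different answers.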

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Arno Welzel@21:1/5 to All on Sat Oct 17 00:22:07 2020
    Stan Brown:

    On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Just remove it, unless it matches the actual encoding used.

    Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5
    version passed validation. (The W3C validator failed the HTML4.01
    version with "obsolete DOCTYPE", which seems a bit harsh.) The
    revised examples are at <URL:https://brownmath.com/Charsets/>.

    Well this is just by chance correct. In fact your server does not send
    any charset at all:

    HTTP/2 200 OK
    server: nginx
    date: Fri, 16 Oct 2020 22:16:16 GMT
    content-type: text/html
    content-length: 784
    last-modified: Fri, 16 Oct 2020 13:44:01 GMT
    etag: "310-5b1c9f5076a40"
    alt-svc: quic=":443"; ma=86400; v="43,39"
    host-header: 5d77dd967d63c3104bced1db0cace49c

    I know that encoding is complicated, but just because the characters
    are displayed correctly in my browsers, is it safe to assume they'll
    be correct in (the great majority of) other browsers?

    That depends on your audience.

    I guess in a way I'm asking: what figures out the document encoding
    if it's not specified, the Web server or the user-agent? If it's the

    Browsers use default character sets or try to detect the encoding.

    server, then the fact that they worked for me says they should work
    for anyone. But if it's the browser, maybe not so much.

    The server has nothing to do with it - see above: no indication at all
    what encoding is used.
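
    A quick way to check a header dump like the one above for a charset, as a
    Python sketch. It reuses the standard library's mail header parser, so the
    status line ("HTTP/2 200 OK") has to be left out; the function name is
    made up for this example.

```python
from email import message_from_string
from typing import Optional


def charset_from_headers(raw_headers: str) -> Optional[str]:
    """Return the charset parameter of the Content-Type header, or None."""
    return message_from_string(raw_headers).get_content_charset()


headers = (
    "content-type: text/html\n"
    "content-length: 784\n"
)
print(charset_from_headers(headers))
```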


    --
    Arno Welzel
    https://arnowelzel.de

  • From Arno Welzel@21:1/5 to All on Sat Oct 17 00:29:51 2020
    Eli the Bearded:

    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?

    And why do you define "ASCII = plain"? Even ASCII has its history of
    changes and not all 7-bit characters had the same meaning in the past:

    <https://www.aivosto.com/articles/charsets-7bit.html>

    --
    Arno Welzel
    https://arnowelzel.de

  • From Eli the Bearded@21:1/5 to usenet@arnowelzel.de on Fri Oct 16 23:58:37 2020
    In comp.infosystems.www.authoring.html,
    Arno Welzel <usenet@arnowelzel.de> wrote:
    Eli the Bearded:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.
    What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?

    It is not "plain" in the sense of how documents without content types
    should be interpreted according to the RFCs I remember reading. Consider

    RFC-2045 - Multipurpose Internet Mail Extensions (MIME) Part One:

    5.2. Content-Type Defaults

    Default RFC 822 messages without a MIME Content-Type header are taken
    by this protocol to be plain text in the US-ASCII character set,
    which can be explicitly specified as:

    Content-type: text/plain; charset=us-ascii

    This default is assumed if no Content-Type header field is specified.
    It is also recommend that this default be assumed when a
    syntactically invalid Content-Type header field is encountered. In
    the presence of a MIME-Version header field and the absence of any
    Content-Type header field, a receiving User Agent can also assume
    that plain US-ASCII text was the sender's intent. Plain US-ASCII
    ^^^^^^^^^^^^^^
    text may still be assumed in the absence of a MIME-Version or the
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    presence of an syntactically invalid Content-Type header field, but
    the sender's intent might have been otherwise.

    and

    RFC-2046 - Multipurpose Internet Mail Extensions (MIME) Part Two:

    4.1.2. Charset Parameter

    A critical parameter that may be specified in the Content-Type field
    for "text/plain" data is the character set. This is specified with a
    "charset" parameter, as in:

    Content-type: text/plain; charset=iso-8859-1

    Unlike some other parameter values, the values of the charset
    parameter are NOT case sensitive. The default character set, which
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    must be assumed in the absence of a charset parameter, is US-ASCII.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    And why do you define "ASCII = plain"? Even ASCII has its history of
    changes and not all 7-bit characters had the same meaning in the past:

    Agreed that ASCII was not created in its final form.

    <https://www.aivosto.com/articles/charsets-7bit.html>

    The last change to ASCII there is in 1986. The last change there that
    involved the characters enumerated by ASCII was in 1977. The list of
    things that were important for computers in 1977 that are still
    important today is very small. ASCII, awkward as it is for many
    purposes, remains a bedrock upon which other, better things, are
    built. I just don't call UTF-8, eg, "plain text".

    Elijah
    ------
    notes that the unicode character table is written in US-ASCII

  • From The Doctor@21:1/5 to *@eli.users.panix.com on Sat Oct 17 01:25:37 2020
    In article <eli$2010161958@qaz.wtf>,
    Eli the Bearded <*@eli.users.panix.com> wrote:

    Move with the times!
    --
    Member - Liberal International This is doctor@@nl2k.ab.ca Ici doctor@@nl2k.ab.ca
    Yahweh, Queen & country!Never Satan President Republic!Beware AntiChrist rising!
    Look at Psalms 14 and 53 on Atheism https://www.empire.kred/ROOTNK?t=94a1f39b BC save the Province; on 24 October 2020, vote Liberal and not NDP!

  • From Phillip Helbig (undress to reply@21:1/5 to the_stan_brown@fastmail.fm on Sat Oct 17 06:03:49 2020
    In article <MPG.39f3b7d5df3b90e798fd73@news.individual.net>, Stan Brown <the_stan_brown@fastmail.fm> writes:

    I know that encoding is complicated, but just because the characters
    are displayed correctly in my browsers, is it safe to assume they'll
    be correct in (the great majority of) other browsers?

    In general, no.

  • From Helmut Richter@21:1/5 to Arno Welzel on Sat Oct 17 10:36:35 2020
    On Sat, 17 Oct 2020, Arno Welzel wrote:

    Helmut Richter:

    [...]
    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. This hint is actually ignored by the web server you use, I see only "Content-Type: text/html" appearing¹).
    [...]

    Because it is not for the server but for the *browser*.

    In fact this meta element is used *instead of* sending an HTTP response
    header. That's why it is called "http-equiv" - it should be treated by
    the *browser* in the same way as the respective HTTP header for the
    Content-Type.

    Sounds reasonable. Thank you.

    So the author of the web page can be a little more sure that the browser
    feels obliged to respect it. As far as I know, browsers also honour
    <meta charset="iso-8859-1"> even though it does not conform to the
    standard.

    --
    Helmut Richter

  • From Jukka K. Korpela@21:1/5 to Eli the Bearded on Sat Oct 17 16:33:13 2020
    Eli the Bearded wrote:

    What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?

    It is not "plain" in the sense of how documents without content types
    should be interpreted according to the RFCs I remember reading.

    The RFC that defines “plain text” as a media type (text/plain) is RFC
    2046. It has been updated by other RFCs, but this fundamental definition
    has not changed:

    (1) text -- textual information. The subtype "plain" in
    particular indicates plain text containing no
    formatting commands or directives of any sort. Plain
    text is intended to be displayed "as-is". No special
    software is required to get the full meaning of the
    text, aside from support for the indicated character
    set.

    Thus, HTML is by definition not plain text. It is required to contain
    markup, and it is not intended to be displayed “as-is”, with <!doctype
    and start and end tags and entity references included.

    that plain US-ASCII text was the sender's intent. Plain US-ASCII
    ^^^^^^^^^^^^^^
    text may still be assumed in the absence of a MIME-Version or the
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    presence of an syntactically invalid Content-Type header field, but
    the sender's intent might have been otherwise.

    Statements like this just formalize the idea that e-mail content is to
    be taken as Ascii encoded plain text, unless specified otherwise.
    “Plain” and “US-ASCII” are two distinct attributes here. A Content-Type header may override either or both of them.
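
    The two defaults can be seen separately in Python's mail library, which
    follows these MIME rules. A minimal sketch; the function name is made up
    for this example, and it says nothing about how browsers treat HTTP
    responses.

```python
from email.message import Message
from typing import Optional, Tuple


def effective_type(content_type: Optional[str]) -> Tuple[str, str]:
    """Return (media type, charset) for a mail body, applying the MIME defaults."""
    msg = Message()
    if content_type is not None:
        msg["Content-Type"] = content_type
    # The media type defaults to text/plain, the charset to us-ascii.
    return msg.get_content_type(), msg.get_content_charset("us-ascii")


print(effective_type(None))
print(effective_type("text/plain; charset=utf-8"))
print(effective_type("text/html; charset=ISO-8859-1"))
```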

  • From Jukka K. Korpela@21:1/5 to Stan Brown on Sat Oct 17 16:15:23 2020
    Stan Brown wrote:

    On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Just remove it, unless it matches the actual encoding used.

    Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5
    version passed validation.

    What I tried to say is that declaring an encoding that is not the
    actual encoding used (or compatible with it) is worse than not
    declaring the encoding at all. Leaving it out gives the user agent a
    chance to guess right, as opposed to applying wrong information.

    I know that encoding is complicated, but just because the characters
    are displayed correctly in my browsers, is it safe to assume they'll
    be correct in (the great majority of) other browsers?

    Encoding isn’t that complicated, but guessing the encoding is. The
    WHATWG description deals with the overall process rather than specific
    heuristics, but it seems very probable that browsers will guess
    correctly between windows-1252 and utf-8 if actual non-Ascii data
    appears within 1,000 or so characters in the HTML file. But of course it
    is not completely safe.
    https://html.spec.whatwg.org/multipage/parsing.html#determining-the-character-encoding
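
    Why that particular guess is usually easy can be shown with a much
    simplified Python sketch. This is not what any browser actually runs; it
    only relies on the fact that text with non-Ascii characters in
    windows-1252 is almost never valid UTF-8.

```python
def guess_encoding(data: bytes) -> str:
    """Guess between utf-8 and windows-1252 by checking for valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "windows-1252"
    # Pure Ascii data also ends up here; either label would decode it the same.
    return "utf-8"


print(guess_encoding("caf\u00e9".encode("cp1252")))
print(guess_encoding("caf\u00e9".encode("utf-8")))
```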

    I guess in a way I'm asking: what figures out the document encoding
    if it's not specified, the Web server or the user-agent?

    In theory, it could also be the server. There is no law against a server
    scan of a document to guess the encoding and to add an HTTP header
    accordingly. But I have not heard of such things, and it does not sound
    productive. So it’s the user agent.

    The practical way is to use

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

    and to ignore what the validator says about it. You can even use
    automated ignoring by using the W3C validator tools for hiding messages
    by type.

    WHATWG and W3C just wish to promote UTF-8 on all pages at any cost.
    That’s why they specify that only UTF-8 is kosher and make the validator
    nag about it.

    The theoretically most correct way is to make the server send HTTP
    headers specifying the encoding. I have no idea how to do that when
    using Nginx. You might need access to the server configuration files.

  • From Stan Brown@21:1/5 to Jukka K. Korpela on Sat Oct 17 13:03:04 2020
    On Sat, 17 Oct 2020 16:15:23 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:
    What I tried to say is that declaring an encoding that is not the
    actual encoding used (or compatible with it) is worse than not
    declaring the encoding at all. This gives the user agent a chance
    to guess right, as opposite to applying wrong information.

    I know that encoding is complicated, but just because the characters
    are displayed correctly in my browsers, is it safe to assume they'll
    be correct in (the great majority of) other browsers?

    [Answer: not completely safe]

    The practical way is to use

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

    QUESTION 1: Any reason you suggest that rather than the simpler
    <meta charset="windows-1252">
    ? This page says the two forms are equivalent in HTML5:
    https://stackoverflow.com/questions/4696499/meta-charset-utf-8-vs-meta-http-equiv-content-type

    and to ignore what the validator says about it. You can even use
    automated ignoring by using the W3C validator tools for hiding messages
    by type.

    Yes, I found the vnu validator as a windows binary here: https://github.com/validator/validator/releases/tag/20.6.30
    and I noticed that one option lets me filter out messages.

    I'm quite excited about this -- it should make it possible to switch
    everything from valid HTML 4.01 to valid HTML5, _and_ do better
    validation. (I found that vnu even parses CSS inside <style>...
    </style>.)

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    QUESTION 3: If I should still stay away from 128-159, even with a
    windows-1252 declaration, is there any particular reason you suggest
    windows-1252 rather than iso-8859-1? I know they're the same for
    32-127 and 160-255, but in my mind windows-1252 suggests that I'll be
    using Windows 128-159, and iso-8859-1 does not.

    WHATWG and W3C just wish to promote UTF-8 on all pages at any cost.
    That’s why they specify that only UTF-8 is kosher and make the validator
    nag about it.

    The theoretically most correct way is to make the server send HTTP
    headers specifying the encoding. I have no idea how to do that when
    using Nginx. You might need access to the server configuration files.

    I think I can get that access, probably via some override file in my
    root directory. In fact, there's already a .htaccess file there with
    one AddType, so I think it must be an Apache server or a workalike.
    I should be able to add
    AddType text/plain;charset=windows-1252
    AddType text/html;charset=windows-1252
    and have the server emit the desired headers. But the stackoverflow
    article above makes the point that we still want to include a charset
    in each file, for the folks who download a file for later reading.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Helmut Richter@21:1/5 to Stan Brown on Sat Oct 17 22:35:32 2020
    On Sat, 17 Oct 2020, Stan Brown wrote:

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    There is no reason to stay away from code points that are defined in the
    code.

    (I don’t have the problem, though. If I want a real apostrophe like the
    one in the preceding sentence, I just type it (on my keyboard AltGr+'),
    and it lands in the file as the UTF-8 representation of that character.
    When I look into that file on the screen or when I print it, I see exactly
    the apostrophe I typed in – as an apostrophe, not as a code point number.
    No need ever to use &#... or to worry about code point numbers.)

    QUESTION 3: If I should still stay away from 128-159, even with a
    windows-1252 declaration, is there any particular reason you suggest
    windows-1252 rather than iso-8859-1? I know they're the same for
    32-127 and 160-255, but in my mind windows-1252 suggests that I'll be
    using Windows 128-159, and iso-8859-1 does not.

    If you use these code points, you have to specify windows-1252; if not,
    the effect is the same for the two code names.

    --
    Helmut Richter

  • From Arno Welzel@21:1/5 to All on Sat Oct 17 23:41:54 2020
    Eli the Bearded:

    In comp.infosystems.www.authoring.html,
    Arno Welzel <usenet@arnowelzel.de> wrote:
    Eli the Bearded:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.
    What exactly is not "plain" in a text encoded as UTF-8 or Windows-1252?

    It is not "plain" in the sense of how documents without content types
    should be interpreted according to the RFCs I remember reading. Consider

    RFC-2045 - Multipurpose Internet Mail Extensions (MIME) Part One:

    5.2. Content-Type Defaults

    Default RFC 822 messages without a MIME Content-Type header are taken
    by this protocol to be plain text in the US-ASCII character set,
    which can be explicitly specified as:

    Read carefully:

    "plain text in the US-ASCII character set"

    This means "plain text" *and* "in the US-ASCII character set".

    There is no definition that "plain text" must be US-ASCII only.

    Content-type: text/plain; charset=us-ascii

    This default is assumed if no Content-Type header field is specified.

    Yes - because there is an RFC which defines a specific context where a
    missing Content-Type means that content is understood as text encoded in US-ASCII. However *if* there is a content type then "text/plain" is also
    valid with other encodings:

    Content-type: text/plain; charset=utf-8

    This is no less "plain text" than the one using US-ASCII. That's why I
    would not say that "plain text" is the same as "plain text using US-ASCII".

    See for example: <https://arnowelzel.de/samples/plain-text-utf8/>


    --
    Arno Welzel
    https://arnowelzel.de

  • From Stan Brown@21:1/5 to Arno Welzel on Sat Oct 17 18:37:51 2020
    On Sat, 17 Oct 2020 00:22:07 +0200, Arno Welzel wrote:

    Stan Brown:

    On Fri, 16 Oct 2020 23:42:47 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    Just remove it, unless it matches the actual encoding used.

    Brilliant! I tried with no <meta .. charset> tag. The characters were displayed correctly in the HTML5 and HTML4.01 versions, and the HTML5 version passed validation. (The W3C validator failed the HTML4.01
    version with "obsolete DOCTYPE", which seems a bit harsh.) The
    revised examples are at <URL:https://brownmath.com/Charsets/>.

    Well this is just by chance correct. In fact your server does not send
    any charset at all:

    HTTP/2 200 OK
    server: nginx
    date: Fri, 16 Oct 2020 22:16:16 GMT
    content-type: text/html
    content-length: 784
    last-modified: Fri, 16 Oct 2020 13:44:01 GMT
    etag: "310-5b1c9f5076a40"
    alt-svc: quic=":443"; ma=86400; v="43,39"
    host-header: 5d77dd967d63c3104bced1db0cace49c

    Thanks for this. Apparently nginx accepts Apache directives in the
    .htaccess file. I've added them. I used W3C's i18n checker at https://validator.w3.org/i18n-checker/
    to verify that the server now sends "charset windows-1252". And in
    both Firefox and Chrome, the Windows-1252 characters are displayed
    correctly even if I have a conflicting charset declared in the actual
    html file.
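
    That observation matches a simple precedence rule, sketched here in
    Python. It is deliberately simplified: it ignores byte order marks and any
    sniffing the browser may do, and the fallback value is an assumption made
    for this example.

```python
from typing import Optional


def effective_charset(http_charset: Optional[str],
                      meta_charset: Optional[str],
                      fallback: str = "windows-1252") -> str:
    """Pick the charset to use: HTTP header first, then <meta>, then a fallback."""
    return http_charset or meta_charset or fallback


# The server says windows-1252, the file itself claims utf-8.
print(effective_charset("windows-1252", "utf-8"))
```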

    If you have a different browser, and if you care to check, could you
    let me know how
    https://brownmath.com/Charsets/charset_utf-8_html5.htm
    shows up in your browser, whether the Windows characters in the last
    paragraph are displayed?

    And I'll change my scripts to declare a charset of windows-1252(*)
    instead of utf-8, and <!DOCTYPE html>, and run everything through
    W3C's command-line verifier. Fun!

    (*)Unless someone thinks I should use iso-8859-1. But I'd kind of
    like to use the Windows characters in code points 128-159: having the
    quotes and dashes instead of &...; codes would simplify editing.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Jukka K. Korpela@21:1/5 to Stan Brown on Sun Oct 18 13:49:41 2020
    Stan Brown wrote:

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

    QUESTION 1: Any reason you suggest that rather than the simpler
    <meta charset="windows-1252">

    No good reason. I just wrote the original format because I learned it 25
    years or so ago

    ? This page says the two forms are equivalent in HTML5:

    They are. The people who worked on HTML5 detected that all browsers
    treat <meta charset="windows-1252"> as equivalent to the defined format.
    I’m not sure what kind of accident this was, but anyway they made it a rule.

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    You’re safe. Twenty years ago it was different.

    QUESTION 3: If I should still stay away from 128-159, even with a windows-1252 declaration, is there any particular reason you suggest windows-1252 rather than iso-8859-1?

    The reason is that browsers treat iso-8859-1 as windows-1252, and HTML5
    made this the rule. In the old times it was different, mainly in the
    sense that browsers running on Unix platforms actually treated
    iso-8859-1 declared data so that octets 128–159 were control characters
    and sometimes had odd effects.

    I think I can get that access, probably via some override file in my
    root directory. In fact, there's already a .htaccess file there with
    one AddType, so I think it must be an Apache server or a workalike.
    I should be able to add
    AddType text/plain;charset=windows-1252
    AddType text/html;charset=windows-1252
    and have the server emit the desired headers.

    I’m afraid Nginx does not support .htaccess but has other tools.

    But the stackoverflow
    article above makes the point that we still want to include a charset
    in each file, for the folks who download a file for later reading.

    That’s a valid point, because browsers probably still haven’t learned to save a web page locally in a proper way. That is, they don’t use the
    HTTP headers when saving the file. This is understandable, since file
    systems generally lack a file type concept that involves character
    encoding, and to save the encoding information in the file itself, the
    browser would need to insert a tag there. This means that the browser
    would need to 1) save the file as a serialization of the browser’s
    internal data structure for it, with a meta element inserted, thereby
    producing something that might differ very much from the original file,
    or 2) to operate on the document as text and inserting a meta element at
    the appropriate place.
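
    Option 2 could look something like the following Python sketch. It is a
    naive illustration only: it assumes the document has an explicit <head>
    start tag and does not check for a meta element that is already there.
    The function name is made up for this example.

```python
import re


def add_meta_charset(html: str, charset: str) -> str:
    """Insert <meta charset="..."> right after the first <head> start tag."""
    tag = '<meta charset="%s">' % charset
    return re.sub(
        r"(<head\b[^>]*>)",
        lambda match: match.group(1) + tag,
        html,
        count=1,
        flags=re.IGNORECASE,
    )


page = "<html><head><title>t</title></head><body></body></html>"
print(add_meta_charset(page, "windows-1252"))
```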

  • From Jukka K. Korpela@21:1/5 to Helmut Richter on Sun Oct 18 15:06:11 2020
    Helmut Richter wrote:

    On Sat, 17 Oct 2020, Stan Brown wrote:

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    There is no reason to stay away from code points that are defined in the code.

    Well, apart from some code points not being assigned to any character,
    or some assigned characters being somewhat questionable. (For example,
    how often would it make sense to use the florin sign ƒ?) Sorry, today is
    my nitpicking day.
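
    The unassigned positions can be listed with a short Python sketch. It
    relies on Python's cp1252 codec, which refuses to decode the bytes that
    Windows-1252 leaves unassigned; browsers handle those bytes differently,
    so this only shows where the gaps are.

```python
from typing import List


def unassigned_in_cp1252() -> List[int]:
    """List the byte values in 128-159 that Python's cp1252 codec cannot decode."""
    missing = []
    for byte in range(128, 160):
        try:
            bytes([byte]).decode("cp1252")
        except UnicodeDecodeError:
            missing.append(byte)
    return missing


print(unassigned_in_cp1252())
print(bytes([131]).decode("cp1252"))  # byte 131 is the florin sign
```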

    (I don’t have the problem, though. If I want a real apostrophe like the
    one in the preceding sentence, I just type it (on my keyboard AltGr+'),

    I just press the key labeled with the Ascii apostrophe ('). Well, that’s
    how I use my personal keyboard layout when typing text (as opposed to
    code), and using the standard Finnish international layout I need to use
    AltGr+'

    and it lands in the file as the UTF-8 representation of that character.

    This depends on the software that processes the typed characters.

    QUESTION 3: If I should still stay away from 128-159, even with a
    windows-1252 declaration, is there any particular reason you suggest
    windows-1252 rather than iso-8859-1? I know they're the same for
    32-127 and 160-255, but in my mind windows-1252 suggests that I'll be
    using Windows 128-159, and iso-8859-1 does not.

    If you use these code points, you have to specify windows-1252; if not,
    the effect is the same for the two code names.

    No, the effect is always the same on all browsers in use nowadays
    (possibly excluding some you might see in a museum of technology).

    Browsers treat iso-8859-1 as an alias for windows-1252. Technically,
    they are two distinct encodings and differ in the 128–159 range, but in
    the HTML context, they are the same. You can see this by creating a test
    document with loads of characters in that range, in windows-1252
    encoding, and declaring the document as iso-8859-1 encoded in all
    possible ways. It’s still processed and shown as windows-1252 encoded.
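
    Outside the browser the two encodings really are distinct, as a short
    Python sketch shows. It uses Python's own latin-1 and cp1252 codecs, which
    do not apply the HTML aliasing described above.

```python
from typing import List


def differing_bytes() -> List[int]:
    """Return the byte values that latin-1 and cp1252 decode differently."""
    diffs = []
    for byte in range(256):
        raw = bytes([byte])
        latin1 = raw.decode("latin-1")
        # Bytes that cp1252 leaves unassigned become U+FFFD here.
        cp1252 = raw.decode("cp1252", errors="replace")
        if latin1 != cp1252:
            diffs.append(byte)
    return diffs


diffs = differing_bytes()
print(min(diffs), max(diffs), len(diffs))
print(repr(bytes([146]).decode("latin-1")), repr(bytes([146]).decode("cp1252")))
```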

  • From Helmut Richter@21:1/5 to Jukka K. Korpela on Sun Oct 18 15:34:47 2020
    On Sun, 18 Oct 2020, Jukka K. Korpela wrote:

    Helmut Richter wrote:

    On Sat, 17 Oct 2020, Stan Brown wrote:

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    There is no reason to stay away from code points that are defined in the code.

    Well, apart from some code points not being assigned to any character, or some
    assigned characters being somewhat questionable. (For example, how often would
    it make sense to use the florin sign ƒ?) Sorry, today is my nitpicking day.

    If you need it, you can use it. There are many usable characters I have
    never used.

    (I don’t have the problem, though. If I want a real apostrophe like the one in the preceding sentence, I just type it (on my keyboard AltGr+'),

    I just press the key labeled with the Ascii apostrophe ('). Well, that’s how I
    use my personal keyboard layout when typing text (as opposed to code), and
    using the standard Finnish international layout I need to use AltGr+'

    and it lands in the file as the UTF-8 representation of that character.

    This depends on the software that processes the typed characters.

    Yes, of course. My remark has another background: Instead of thinking how
    to produce a character for this or that purpose, I have once and for all
    set up my software so that each „ä“, „é“, or „ע“ is the same whatever the
    purpose: in a letter to be sent, as part of a command, as text in an HTML
    page, as part of a filename, or anything else. It is represented as the
    same bit pattern in all uses, at least in all where I can control that
    bit pattern (PDF does it differently AFAIK). Depending on the underlying
    system, it may be some work until everything fits together, but from then
    onward it is much easier. It is of no use to make an exception just for
    one application. But everybody may do so if they like.

    QUESTION 3: If I should still stay away from 128-159, even with a
    windows-1252 declaration, is there any particular reason you suggest
    windows-1252 rather than iso-8859-1? I know they're the same for
    32-127 and 160-255, but in my mind windows-1252 suggests that I'll be
    using Windows 128-159, and iso-8859-1 does not.

    If you use these code points, you have to specify windows-1252; if not,
    the effect is the same for the two code names.

    No, the effect is always the same on all browsers in use nowadays
    (possibly excluding some you might see in a museum of technology).

    Yes, but I hate to write iso-8859-1 when it is a lie, whereas windows-1252 would work exactly the same and would be true. In effect, one relies on a (common and arguably user-friendly) bug in the browsers. This is the same
    as writing a comma instead of an opening single quote which looks the
    same, just because the comma is typed faster. Such tricks may be adequate
    if there is no other work-around for a problem but not on a regular basis.

    --
    Helmut Richter

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Stan Brown@21:1/5 to Helmut Richter on Sun Oct 18 07:29:58 2020
    On Sun, 18 Oct 2020 15:34:47 +0200, Helmut Richter wrote:
    Yes, but I hate to write iso-8859-1 when it is a lie, whereas windows-1252 would work exactly the same and would be true.

    I happened to read Jukka's followup before yours, but I think you put
    my feeling into better words than I could.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Stan Brown@21:1/5 to Jukka K. Korpela on Sun Oct 18 07:26:07 2020
    On Sun, 18 Oct 2020 13:49:41 +0300, Jukka K. Korpela wrote:

    Stan Brown wrote:

    <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">

    QUESTION 1: Any reason you suggest that rather than the simpler
    <meta charset="windows-1252">

    No good reason. I just wrote the original format because I learned it 25 years or so ago.

    Got it; thanks! I think I'll use the shorter form, then. Fortunately
    it's inside an include file, so I only need to change it once.

    QUESTION 2: It would be awfully convenient to type a Windows
    apostrophe (8-bit character 146) rather than &#8217; or &#x2019;. If
    I specify a charset of windows-1252, am I safe to do that, or should
    I still stay away from Windows characters 128-159?

    You’re safe. Twenty years ago it was different.

    That's good news; thanks. (And how bizarre it is to talk about
    "twenty years ago" in a Web context. Where did the time go?)

    QUESTION 3: If I should still stay away from 128-159, even with a windows-1252 declaration, is there any particular reason you suggest windows-1252 rather than iso-8859-1?

    The reason is that browsers treat iso-8859-1 as windows-1252, and HTML5
    made this the rule. In the old times it was different, mainly in the
    sense that browsers running on Unix platforms actually treated
    iso-8859-1 declared data so that octets 128–159 were control characters
    and sometimes had odd effects.
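
    To make the difference concrete, here is a small Python sketch (not from the thread; the variable names are only illustrative). It decodes octet 146 both ways. It relies on the fact that Python's "iso-8859-1" codec follows the ISO standard literally, unlike browsers, which as described above treat that label as windows-1252.

```python
# Octet 146 (0x92): the "Windows apostrophe" discussed above.
raw = bytes([146])

# windows-1252 assigns a printable character to this octet.
as_cp1252 = raw.decode("windows-1252")

# Python's strict ISO-8859-1 codec maps it to a C1 control character,
# which is how the label used to be handled by some Unix browsers.
as_latin1 = raw.decode("iso-8859-1")

# Octets 160-255 decode identically under both encodings.
upper = bytes(range(160, 256))
same_upper = upper.decode("windows-1252") == upper.decode("iso-8859-1")

print(f"U+{ord(as_cp1252):04X}", f"U+{ord(as_latin1):04X}", same_upper)
```

    The two labels only disagree in the 128-159 range, which is why the choice does not matter for pages that stay within 32-127 and 160-255.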

    Wow! I would never have guessed that: I take those character sets
    literally.

    I think I can get that access, probably via some override file in my
    root directory. In fact, there's already a .htaccess file there with
    one AddType, so I think it must be an Apache server or a workalike.
    I should be able to add
    AddType text/plain;charset=windows-1252
    AddType text/html;charset=windows-1252
    and have the server emit the desired headers.

    I’m afraid Nginx does not support .htaccess but has other tools.

    Hmm ... it seems to work for me as though it were Apache. I added
    these lines to my existing .htaccess file:

    AddType 'text/html; charset=windows-1252' htm
    AddType 'text/html; charset=windows-1252' html

    and then tried a couple of retrievals with W3C's i18n tool at <URL:https://validator.w3.org/i18n-checker/>, and the output showed
    that the server was now sending windows-1252. Am I misinterpreting
    something, or is that tool not reliable?
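
    For a quick local cross-check of what such a tool reports, the charset can be pulled out of a Content-Type value with Python's standard library. This is only a sketch; the helper name charset_from_content_type is made up for illustration, and the header strings are typed in by hand rather than fetched from a server.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


# The value the .htaccess lines above are meant to produce.
declared = charset_from_content_type("text/html; charset=windows-1252")
# A server that sends no charset parameter at all.
missing = charset_from_content_type("text/html")
print(declared, missing)
```

    With a live page, the same helper could be fed the Content-Type value of the response, for example from urllib.request.urlopen(url).headers["Content-Type"].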

    But the stackoverflow
    article above makes the point that we still want to include a charset
    in each file, for the folks who download a file for later reading.

    That’s a valid point, because browsers probably still haven’t learned to
    save a web page locally in a proper way. That is, they don’t use the
    HTTP headers when saving the file. This is understandable, ...

    Makes sense.

    Thanks again for your help!

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Stan Brown@21:1/5 to All on Sun Oct 18 08:03:43 2020
    On Thu, 15 Oct 2020 14:31:10 -0700, I started this thread with:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    A very big thank-you to all those who responded! I have learned quite
    a lot in the past few days, and you were a big help in that. Here are
    changes completed or in progress:

    * Server now declares web pages as windows-1252 character set, which
    is what they are.

    * Learned about W3C's i18n tool, an easy way to check headers
    relevant to encoding as sent by the server: <URL:https://validator.w3.org/i18n-checker/>

    * Dumped HTML 4.01 and now use <!DOCTYPE html>. I had put that off
    for far too long.

    * Replaced <meta http-equiv ...> with <meta charset ...>. This is
    redundant for online viewing, but may be helpful for viewing saved
    Web pages off line.

    * W3C's command-line checker (vnu) is installed and is now part of my
    build process (replacing NSGMLS). Not only does it validate according
    to HTML, it checks inline CSS, both <style> and style="..."
    attributes.

    * Figured out the --filterfile option in vnu, so that it suppresses
    messages about my non-utf-8 character set. (And they implemented that
    right: if the suppressed messages are the only errors, the checker
    returns an exit status of 0, not 1.)

    * Now using actual characters for the whole range 32-255, instead of
    &#...; in the range 128-255. That includes Windows quote marks and
    dashes, for instance, which will reduce file sizes and of course be
    easier for me to read in the raw files.
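
    The last item can be automated. The following is a rough Python sketch, not the awk build described in this thread; the function name refs_to_cp1252_chars is invented here. It assumes only decimal references (&#nnn;) are in use and deliberately leaves ASCII escapes and anything outside windows-1252 untouched.

```python
import re


def refs_to_cp1252_chars(html_text):
    """Replace decimal references such as &#233; with the literal character
    when that character exists in windows-1252; leave all others alone."""

    def replace(match):
        try:
            code = int(match.group(1))
            if code < 128:
                # Keep ASCII escapes such as &#38; and &#60; as they are.
                return match.group(0)
            char = chr(code)
            # Raises UnicodeEncodeError (a ValueError) if not in windows-1252.
            char.encode("cp1252")
        except (ValueError, OverflowError):
            return match.group(0)
        return char

    return re.sub(r"&#(\d+);", replace, html_text)


sample = "It&#8217;s 5&#215;3 &#8805; 2 &#38; 1"
result = refs_to_cp1252_chars(sample)
print(result)
```

    The result can then be written out with encoding="cp1252"; references such as &#8805; stay as they are because that character has no windows-1252 octet.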

    At this point I'm not converting to utf-8, though perhaps in the
    future. But despite W3C pushing for it very hard, I've learned that
    it's not necessary.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Thomas 'PointedEars' Lahn@21:1/5 to Eli the Bearded on Tue Oct 20 02:08:13 2020
    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Stan Brown <the_stan_brown@fastmail.fm> wrote:
    I have this line in the <head> of my Web pages:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    But perfectly decent characters like é, ×, ² show up as a question
    mark in a lozenge. I figured out that that's because my HTML files
    are all plain text, 8 characters per byte, which is not UTF8 when I
    use characters above 127.

    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    Nonsense. “Plain text” means – literally – content that can be read by a
    person as opposed to “binary” data; that is, content where byte sequences represent characters, in particular digits and letters.

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Thomas 'PointedEars' Lahn@21:1/5 to Helmut Richter on Tue Oct 20 02:14:22 2020
    Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea? It is not a job of a Web server to *interpret* the body of a HTTP message in order to generate a header for
    that HTTP message. Parsing and interpreting HTML, for example, is solely
    the domain of a HTML user agent.

    Instead, both HTML elements are a *substitute* – an *equivalent* – for the Content-Type HTTP header field, to be used by the Web _browser_, if that
    header field is not sent by the Web server.

    The various HTML Specifications make that very clear.

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Thomas 'PointedEars' Lahn@21:1/5 to Jukka K. Korpela on Tue Oct 20 02:23:05 2020
    Jukka K. Korpela wrote:

    HTML is by definition not plain text.

    That is plain false.

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Thomas 'PointedEars' Lahn@21:1/5 to Jukka K. Korpela on Tue Oct 20 02:21:41 2020
    Jukka K. Korpela wrote:

    Eli the Bearded wrote:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.

    That’s nonsense. Plain text is just text, as opposed to “rich text”, like MS Word format, or HTML.

    By contrast to the Rich Text Format and MS Word format(s), HTML *is* a plain-text format because whether a file is a “plain text” file does not depend on the presentation of the content, only on the meaning of the octet sequences.

    Therefore its media type “text/html” belongs to (and starts with) the type “text”. By contrast, the media type for Rich Text Format (.rtf) files is “application/rtf”, and that for MS Word 2007+ documents (.docx) is “application/vnd.openxmlformats-officedocument.wordprocessingml.document”.

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Jukka K. Korpela@21:1/5 to Lahn on Tue Oct 20 10:08:18 2020
    Lahn wrote:

    Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to
    declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea?

    Perhaps from the HTML specifications.

    The original idea was that servers could parse the start of an HTML
    document and use <meta http-equiv=...> content to generate HTTP headers.

    This didn’t happen. Instead, servers use various configuration files or settings to decide on HTTP response headers. Browsers, on the other
    hand, started using <meta> tags at least to some extent, e.g. when
    server response does not specify charset.
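
    The order of precedence described here can be sketched in a few lines of Python. This is a simplification, not the algorithm browsers actually implement: the function name pick_encoding, the regular expression, and the windows-1252 fallback are assumptions made for the example. The 1024-byte window mirrors the HTML rule that a charset declaration must appear within the first 1024 bytes of the document.

```python
import re

# Matches both <meta charset=...> and the older http-equiv/content form.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE
)


def pick_encoding(http_charset, body, default="windows-1252"):
    """Prefer the HTTP header, then a <meta> declaration, then a default."""
    if http_charset:
        return http_charset.lower()
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return default  # assumed fallback; real browsers vary by locale


page = b'<!DOCTYPE html><meta charset="windows-1252"><title>x</title>'
print(pick_encoding(None, page))
print(pick_encoding("utf-8", page))
```

    The second call shows why a wrong server header is worse than none: it overrides whatever the file itself declares.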

    It is not a job of a Web server to
    *interpret* the body of a HTTP message in order to generate a header for
    that HTTP message. Parsing and interpreting HTML, for example, is solely
    the domain of a HTML user agent.

    Instead, both HTML elements are a *substitute* – an *equivalent* – for the
    Content-Type HTTP header field, to be used by the Web _browser_, if that header field is not sent by the Web server.

    The various HTML Specifications make that very clear.

    ”HTTP servers may read the content of the document HEAD to generate
    header fields corresponding to any elements defining a value for the
    attribute HTTP-EQUIV.” https://www.w3.org/MarkUp/html-spec/html-spec_5.html#SEC5.2.5

    ”HTTP servers may use the property name specified by the HTTP-EQUIV
    attribute to create an RFC 822 style header in the HTTP response.” https://www.w3.org/TR/2018/SPSD-html32-20180315/#meta

    ”http-equiv = name [CI]
    This attribute may be used in place of the name attribute. HTTP servers
    use this attribute to gather information for HTTP response message headers.” https://www.w3.org/TR/html401/struct/global.html#h-7.4.4.2

    Since that’s not how things actually worked, HTML5 specs don’t even
    mention the possibility of servers using <meta> tags. Neither do they
    prohibit such things; they don’t really deal with the operation of
    servers. The early HTML5 drafts/specs didn’t even allow <meta
    http-equiv=...> and instead used the <meta charset=...> invention, which
    was, from the beginning, meant to be handled by user agents.

  • From Helmut Richter@21:1/5 to Thomas 'PointedEars' Lahn on Tue Oct 20 10:09:44 2020
    On Tue, 20 Oct 2020, Thomas 'PointedEars' Lahn wrote:

    Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea? It is not a job of a Web server to *interpret* the body of a HTTP message in order to generate a header for
    that HTTP message. Parsing and interpreting HTML, for example, is solely
    the domain of a HTML user agent.

    Thank you for repeating <huuk9tF11ncU1@mid.individual.net>. I understood
    that one as well, though.

    --
    Helmut Richter

  • From Arno Welzel@21:1/5 to All on Tue Oct 20 15:34:52 2020
    Thomas 'PointedEars' Lahn:

    Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to
    declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea? It is not a job of a Web server to *interpret* the body of a HTTP message in order to generate a header for
    that HTTP message. Parsing and interpreting HTML, for example, is solely
    the domain of a HTML user agent.

    Instead, both HTML elements are a *substitute* – an *equivalent* – for the
    Content-Type HTTP header field, to be used by the Web _browser_, if that header field is not sent by the Web server.

    The various HTML Specifications make that very clear.

    JFTR - HTML 4.01 already mentioned that servers parse the document and
    use meta elements to create response headers, even though I have never
    seen this in real world implementations:

    <https://www.w3.org/TR/html401/struct/global.html#adef-http-equiv>

    "http-equiv = name [CI]

    This attribute may be used in place of the name attribute. HTTP servers
    use this attribute to gather information for HTTP response message headers."

    It seems there is a module for Apache 2 to deal with this - but I doubt
    this is still in use anywhere:

    <https://metacpan.org/pod/Apache2::HttpEquiv>

    --
    Arno Welzel
    https://arnowelzel.de

  • From Stan Brown@21:1/5 to Arno Welzel on Tue Oct 20 06:46:29 2020
    On Tue, 20 Oct 2020 15:36:53 +0200, Arno Welzel wrote:

    Stan Brown:

    On Thu, 15 Oct 2020 14:31:10 -0700, I started this thread with:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    A very big thank-you to all those who responded! I have learned quite
    a lot in the past few days, and you were a big help in that. Here are changes completed or in progress:
    [...]

    Thank you for this summary of your findings.

    It seemed the least I could do, after all the help I received. I've
    learned a huge amount these last few days, and now I'm in the process
    of bringing my Web pages up to date.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Arno Welzel@21:1/5 to All on Tue Oct 20 15:36:53 2020
    Stan Brown:

    On Thu, 15 Oct 2020 14:31:10 -0700, I started this thread with:

    I'm trying, and failing, to write the proper charset in my meta tag.
    Help, please!

    A very big thank-you to all those who responded! I have learned quite
    a lot in the past few days, and you were a big help in that. Here are
    changes completed or in progress:
    [...]

    Thank you for this summary of your findings.


    --
    Arno Welzel
    https://arnowelzel.de

  • From Eli the Bearded@21:1/5 to cljs@PointedEars.de on Tue Oct 20 18:40:51 2020
    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <cljs@PointedEars.de> wrote:
    Eli the Bearded wrote:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.
    Nonsense. "Plain text" means - literally - content that can be read
    by a person as opposed to "binary" data; that is, content where byte sequences represent characters, in particular digits and letters.

    So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
    on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
    those are _not plain text_.

    (As an aside, I'm seeing that my stance that US-ASCII is "plain text"
    and "plain text" does not necessarily mean "text/plain" is an unpopular
    one. I'm tired of arguing the point, but no one has convinced me that
    I'm wrong.)

    Elijah
    ------
    utf-8 in the sheets, ascii in the style sheets

  • From Helmut Richter@21:1/5 to Eli the Bearded on Tue Oct 20 20:43:49 2020
    On Tue, 20 Oct 2020, Eli the Bearded wrote:

    (As an aside, I'm seeing that my stance that US-ASCII is "plain text"

    This why ,d,,

    and "plain text" does not necessarily mean "text/plain" is an unpopular
    one. I'm tired of arguing the point, but no one has convinced me that
    I'm wrong.)

    Elijah
    ------
    utf-8 in the sheets, ascii in the style sheets


  • From Phillip Helbig (undress to reply@21:1/5 to *@eli.users.panix.com on Tue Oct 20 19:29:41 2020
    In article <eli$2010201433@qaz.wtf>, Eli the Bearded
    <*@eli.users.panix.com> writes:

    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <cljs@PointedEars.de> wrote:
    Eli the Bearded wrote:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.
    Nonsense. "Plain text" means - literally - content that can be read
    by a person as opposed to "binary" data; that is, content where byte sequences represent characters, in particular digits and letters.

    So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
    on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
    those are _not plain text_.

    (As an aside, I'm seeing that my stance that US-ASCII is "plain text"
    and "plain text" does not necessarily mean "text/plain" is an unpopular
    one. I'm tired of arguing the point, but no one has convinced me that
    I'm wrong.)

    Elijah
    ------
    utf-8 in the sheets, ascii in the style sheets

    Is PostScript plain text?

  • From Arno Welzel@21:1/5 to All on Wed Oct 21 04:27:37 2020
    Phillip Helbig (undress to reply):

    [...]
    Is PostScript plain text?

    It can be:

    <http://paulbourke.net/dataformats/postscript/>


    --
    Arno Welzel
    https://arnowelzel.de

  • From Jukka K. Korpela@21:1/5 to Arno Welzel on Wed Oct 21 13:52:17 2020
    Arno Welzel wrote:

    Phillip Helbig (undress to reply):

    [...]
    Is PostScript plain text?

    It can be:

    <http://paulbourke.net/dataformats/postscript/>

    That old document seems to say that PostScript is plain text, since you
    can create, edit, and read a PostScript file using a text editor. But
    that’s not how ”plain text” is defined in MIME:

    The simplest and most important subtype of "text" is "plain". This
    indicates plain text that does not contain any formatting commands or
    directives. Plain text is intended to be displayed "as-is", that is,
    no interpretation of embedded formatting commands, font attribute
    specifications, processing instructions, interpretation directives,
    or content markup should be necessary for proper display.
    https://tools.ietf.org/html/rfc2046#section-4.1.3

    ObHTML: Similarly, HTML is not plain text.

    Technically, PostScript isn’t even classified as text; the media type
    for it is application/postscript. This does not mean that it would be impossible to write PostScript using a text editor.

    ObHTML: For XHTML, the media type application/xhtml+xml is specified.

  • From Arno Welzel@21:1/5 to All on Wed Oct 21 18:08:06 2020
    Jukka K. Korpela:

    Arno Welzel wrote:

    Phillip Helbig (undress to reply):

    [...]
    Is PostScript plain text?

    It can be:

    <http://paulbourke.net/dataformats/postscript/>

    That old document seems to say that PostScript is plain text, since you
    can create, edit, and read a PostScript file using a text editor. But that’s not how ”plain text” is defined in MIME:

    The simplest and most important subtype of "text" is "plain". This
    indicates plain text that does not contain any formatting commands or
    directives. Plain text is intended to be displayed "as-is", that is,
    no interpretation of embedded formatting commands, font attribute
    specifications, processing instructions, interpretation directives,
    or content markup should be necessary for proper display.
    https://tools.ietf.org/html/rfc2046#section-4.1.3

    ObHTML: Similarly, HTML is not plain text.

    Correct - HTML has to be interpreted by a browser to get the final
    display. Nevertheless you still can also edit it with a text editor
    which does not know anything about HTML at all.

    Technically, PostScript isn’t even classified as text; the media type
    for it is application/postscript. This does not mean that it would be impossible to write PostScript using a text editor.

    ObHTML: For XHTML, the media type application/xhtml+xml is specified.

    But even application/xhtml+xml is in fact plain text which is
    *interpreted* as XHTML.

    The important point is, that the content of a file of that type can be
    read as plain text as well.


    --
    Arno Welzel
    https://arnowelzel.de

  • From Eli the Bearded@21:1/5 to Jukka K. Korpela on Wed Oct 21 18:04:43 2020
    In comp.infosystems.www.authoring.html,
    Jukka K. Korpela <jukkakk@gmail.com> wrote:
    Arno Welzel wrote:
    Phillip Helbig (undress to reply):
    Is PostScript plain text?
    It can be:

    It "can" be plain text (but is not text/plain).

    That old document seems to say that PostScript is plain text, since you
    can create, edit, and read a PostScript file using a text editor. But that’s not how ”plain text” is defined in MIME:

    Broken "smart" quotes, woo-hoo. But more seriously, the real objection
    to calling Postscript plain "text" is that Postscript very often contains
    binary data, either in 7-bit clean encoded form (e.g. hex, base64, or
    base85) or as actual raw binary inclusions. The language makes it easy to
    say "the next 1289683 octets are data" and not worry about encoding
    the data.
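
    As an illustration of the "7-bit clean" forms mentioned above, here is a short Python sketch using the standard base64 module. The four octets are the start of a typical JPEG (JFIF) file and stand in for a real image; nothing here is taken from an actual Postscript file.

```python
import base64

# First four octets of a typical JPEG file, standing in for image data.
blob = bytes([0xFF, 0xD8, 0xFF, 0xE0])

as_hex = base64.b16encode(blob)
as_base64 = base64.b64encode(blob)
# adobe=True adds the <~ ... ~> framing used for Ascii85 data.
as_ascii85 = base64.a85encode(blob, adobe=True)

for label, encoded in [("hex", as_hex), ("base64", as_base64), ("ascii85", as_ascii85)]:
    print(label, encoded.decode("ascii"))
```

    Every one of these outputs survives a 7-bit channel, at the cost of size; the raw form keeps the size but makes the file unsafe to treat as text.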

    Alas I can't find an example on this computer, but I have seen actual
    JPEG files inlined in Postscript. Since Postscript is a programming
    language it is easy enough to have a program that can interpret a binary
    blob to simplify the creation of programs using raster images. Or to
    have "self extracting" compressed programs.

    For reasons like that, alone, giving Postscript an "application/" MIME
    type is quite reasonable.

    Elijah
    ------
    also there is something reasonable about calling programs "application/"

  • From Jukka K. Korpela@21:1/5 to Arno Welzel on Wed Oct 21 21:27:04 2020
    Arno Welzel wrote:

    But even application/xhtml+xml is in fact plain text which is
    *interpreted* as XHTML.

    The important point is, that the content of a file of that type can be
    read as plain text as well.

    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64; &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E; &#x74;&#x65;&#x78;&#x74;&#x2E;

  • From Thomas 'PointedEars' Lahn@21:1/5 to Eli the Bearded on Wed Oct 21 20:29:21 2020
    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <cljs@PointedEars.de> wrote:
    Eli the Bearded wrote:
    The only thing that is plain text is US-ASCII, 0 to 127. Beyond that
    it's all not plain.
    Nonsense. "Plain text" means - literally - content that can be read
    by a person as opposed to "binary" data; that is, content where byte
    sequences represent characters, in particular digits and letters.

    So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
    on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
    those are _not plain text_.

    No, of course not. Not all code points of US-ASCII or Unicode represent
    digits and letters. In particular, the first 32 code points do not; they represent non-printable control characters or are left unassigned. That
    is, they represent *data*, but not necessarily *text*.

    [Ex falso quodlibet]


    PointedEars
    --
    Prototype.js was written by people who don't know javascript for people
    who don't know javascript. People who don't know javascript are not
    the best source of advice on designing systems that use javascript.
    -- Richard Cornford, cljs, <f806at$ail$1$8300dec7@news.demon.co.uk>

  • From Thomas 'PointedEars' Lahn@21:1/5 to Jukka K. Korpela on Wed Oct 21 20:49:26 2020
    Jukka K. Korpela wrote:

    Arno Welzel wrote:
    But even application/xhtml+xml is in fact plain text which is
    *interpreted* as XHTML.

    The important point is, that the content of a file of that type can be
    read as plain text as well.

    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64;

    &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    | This definition is NOT what is commonly being used to distinguish which
    | *files* are considered plain text and “binary” *files* by software
    | developers; they use common sense instead (which arguably some people do
    | not appear to have):

    q.e.d.

    *facepalm*

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Thomas 'PointedEars' Lahn@21:1/5 to Jukka K. Korpela on Wed Oct 21 20:48:18 2020
    Jukka K. Korpela wrote:

    Arno Welzel wrote:
    Phillip Helbig (undress to reply):
    [...]
    Is PostScript plain text?

    It can be:

    <http://paulbourke.net/dataformats/postscript/>

    That old document seems to say that PostScript is plain text, since you
    can create, edit, and read a PostScript file using a text editor. But that’s not how ”plain text” is defined in MIME:

    This definition is NOT what is commonly being used to distinguish which
    *files* are considered plain text and “binary” *files* by software developers; they use common sense instead (which arguably some people do not appear to have):

    "Plain text" *files* are *human*-readable¹, while "binary" files are not.
    I wager that further information can be found in the standards that define
    the Unix operating system as various tools standardized there are using this definition.

    Therefore, for software developers and authors who actually *write* HTML – HTML can be *written* with a *plain-text editor* like Vim, Emacs, Atom etc.;
    it does not need to be generated by a special application like graphics software – (instead of only discussing it), HTML *is* considered a plain-text *file* format, as I explained before.

    For clarification, see also <https://en.wikipedia.org/wiki/Plain_text>

    https://tools.ietf.org/html/rfc2046#section-4.1.3

    | Updated by: 2646, 3798, 5147, 6657, 8098
    | […]
    | November 1996

    _______
    ¹ substitute the name of your favorite intelligent fully-biological species

    PointedEars
    --
    Sometimes, what you learn is wrong. If those wrong ideas are close to the
    root of the knowledge tree you build on a particular subject, pruning the
    bad branches can sometimes cause the whole tree to collapse.
    -- Mike Duffy in cljs, <news:Xns9FB6521286DB8invalidcom@94.75.214.39>

  • From Eli the Bearded@21:1/5 to Jukka K. Korpela on Wed Oct 21 18:53:18 2020
    In comp.infosystems.www.authoring.html,
    Jukka K. Korpela <jukkakk@gmail.com> wrote:
    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64; &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Reading it as plain text is trivial. Ampersand hash lower-case-x five
    zero semicolon. Ampersand hash lower-case-x six upper-case-C semicolon.
    Et cetera. As text/plain it leaves a lot to be desired.

    Elijah
    ------ %77%72%6f%74%65%20%61%20%43%4c%49%20%74%6f%6f%6c%20%66%6f%72%20%74%68%69%73
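
    For readers who would rather not decode the two strings above by hand, a short Python sketch using only the standard library does the interpretation step that "as-is" display skips. The variable names are illustrative.

```python
import html
from urllib.parse import unquote

# The HTML character references from the quoted message.
refs = (
    "&#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64; "
    "&#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E; "
    "&#x74;&#x65;&#x78;&#x74;&#x2E;"
)
# The percent-encoded signature line.
percent = "%77%72%6f%74%65%20%61%20%43%4c%49%20%74%6f%6f%6c%20%66%6f%72%20%74%68%69%73"

decoded_refs = html.unescape(refs)
decoded_percent = unquote(percent)
print(decoded_refs)
print(decoded_percent)
```

    Both inputs are pure ASCII, yet neither is readable until a format-specific decoder has run over it.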

  • From Jukka K. Korpela@21:1/5 to Eli the Bearded on Wed Oct 21 22:08:27 2020
    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Jukka K. Korpela <jukkakk@gmail.com> wrote:
    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64;
    &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Reading it as plain text is trivial.

    Didn’t someone quote this from the relevant RFC:
    Plain text is intended to be displayed "as-is", that is,
    no interpretation of embedded formatting commands, font attribute
    specifications, processing instructions, interpretation directives,
    or content markup should be necessary for proper display.

    Do I need to point out that it says that “no interpretation of [...]
    content markup should be necessary for proper display”?

    Are you saying that displaying the character sequence “as-is” is proper display?

  • From Eli the Bearded@21:1/5 to dciwam@PointedEars.de on Wed Oct 21 19:16:27 2020
    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <dciwam@PointedEars.de> wrote:


    I note the lack of an attribution there[*].

    My writing:
    So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
    on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
    those are _not plain text_.

    Thomas's reply:

    No, of course not. Not all code points of US-ASCII or Unicode represent digits and letters. In particular, the first 32 code points do not; they represent non-printable control characters or are left unassigned. That
    is, they represent *data*, but not necessarily *text*.

    Control characters between 0 and 31 are either generally not used in
    output or have very well defined meanings in output. Many of them are
    used in _input_ though, eg in Unix <ctrl-c>, <ctrl-d>, <ctrl-w> are all
    things I've used today. Calling <tab> or <line feed> "non-printable"
    might be true for some narrow version of "non-printable", but that same
    narrow version of "non-printable" also holds for the 33rd entry in
    US-ASCII, U+0020, which you explicitly left out of your example.

    The lexicographer Jesse Sheidlower was once asked what his favorite
    punctuation mark was: https://www.theatlantic.com/culture/archive/2012/09/writers-favorite-punctuation-marks/323287/

    I once participated in a similar exercise, and in the end I
    concluded that the humble space is the punctuation mark to beat.
    People tend to argue for the expressiveness of the semicolon, or
    the esoteric old-fashionedness of the diaeresis. But these are all
    seasonings. The meat of it is the space, and if you've ever tried
    to read manuscripts from the era before the space was regularly
    used, you'll know just how important it is. It's what gives us
    words instead of a big lump.

    ALLCAPSTEXTWITHNOWHITESPACEISPLAINTEXTANDEVENMIMETEXTPLAINBUTTHATDOESNOTMEANITISEASILYREAD

    The thirty-three codepoints between U+0000 and U+0020 (inclusive) are
    all punctuation marks of a sort, some of which never found general use.
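
    As a side note on how Unicode itself classifies that range, here is a small sketch using Python's standard `unicodedata` module. It only reports general categories for code points 0 through 32 (the space, U+0020, being the last); it does not settle what counts as "printable" or as "punctuation" in the sense argued above.

```python
import unicodedata

# General category of each of the first 33 code points, U+0000..U+0020.
categories = {cp: unicodedata.category(chr(cp)) for cp in range(0x21)}

# "Cc" is the Unicode general category for control characters.
controls = [cp for cp, cat in categories.items() if cat == "Cc"]

print(len(controls))                # how many of the 33 are control characters
print(categories[0x09])             # category of TAB
print(categories[0x20])             # category of the space character
print(unicodedata.name(chr(0x20)))  # its Unicode name
```

    Under that classification, tab and line feed sit with the control characters, while the space is a separator rather than a control character.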

    [Ex falso quodlibet]

    [*] This is not something I wrote, although the >> implies it was in
    my article. So perhaps the lack of attribution was deliberate?

    Elijah
    ------
    boustrophedonic inscriptions are plain text but not easily read

  • From Thomas 'PointedEars' Lahn@21:1/5 to Jukka K. Korpela on Wed Oct 21 21:24:26 2020
    Jukka K. Korpela wrote:

    Lahn wrote:
    ^^^^
    Helmut Richter wrote:
    ^^^^^^^^^^^^^^
    (sic)

    You don’t *like* me, I get it. No need to point it out every time!

    What an obnoxious character :-(

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to
    declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea?

    Perhaps from the HTML specifications.

    Perhaps, but that would be a common misconception. (How common it was/is became apparent when Apache’s “AddDefaultCharset” directive had to be removed from/disabled in the default configuration, and the display errors showed up in the very same bug report because the server used to display the report was an Apache server which was still misconfigured in that way.¹)

    No non-obsolete HTML Specification specifies this. In fact, only version
    2.0 did, and it was literally made obsolete decades ago.

    The next version, HTML 3.2 of 1997, already clarified:

    ,-<https://www.w3.org/TR/2018/SPSD-html32-20180315/#meta>
    |
    | […] This can't be used to set certain HTTP headers though, see the HTTP
    | specification for details.

    Since a Web server "now" must provide at least “Content-Type: text/html” (see below) for a resource to be parsed as HTML if it is requested via HTTP,
    it is not intended for

    <meta http-equiv='Content-Type' content='text/html; charset=foo'>

    to supersede the server-specified encoding.

    It is not a job of a Web server to *interpret* the body of an HTTP message
    in order to generate a header for that HTTP message. Parsing and
    interpreting HTML, for example, is solely the domain of an HTML user
    agent.

    Instead, both HTML elements are a *substitute* – an *equivalent* – for the Content-Type HTTP header field, to be used by the Web _browser_, if
    that header field is not sent by the Web server.

    The various HTML Specifications make that very clear.
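
    To make the precedence concrete, here is a rough sketch of how a user agent might choose an encoding: the charset from the HTTP Content-Type header wins, and a <meta> declaration is consulted only when the header carries none. The function names and the regular expressions are illustrative assumptions, not taken from any browser; real parsers also check for a byte order mark and apply further heuristics that this sketch ignores. The 1024-byte window is meant to mirror the prescan limit in current HTML parsing rules.

```python
import re
from typing import Optional

# Simplified pattern for a charset declaration inside a <meta> tag.
META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?([A-Za-z0-9_.:-]+)""", re.I
)


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    match = re.search(r"""charset\s*=\s*["']?([^\s;"']+)""", content_type, re.I)
    return match.group(1).lower() if match else None


def pick_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Header charset first; otherwise a <meta> declaration near the start."""
    from_header = charset_from_header(content_type)
    if from_header:
        return from_header
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">'
print(pick_encoding("text/html; charset=utf-8", page))  # header decides
print(pick_encoding("text/html", page))                 # meta is the fallback
```

    In this sketch the <meta> element never overrides what the server sent, which is the behaviour described above.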

    “HTTP servers may read the content of the document HEAD to generate
    header fields corresponding to any elements defining a value for the
    attribute HTTP-EQUIV.”
    https://www.w3.org/MarkUp/html-spec/html-spec_5.html#SEC5.2.5

    This was nonsense to begin with, because, as I indicated, it would require
    an HTTP *server* to interpret HTML (and keep up-to-date with the respective
    current HTML standard as well), and tacitly assume that no
    non-US-ASCII-compatible code sequences occur before the respective META
    element.

    Unicode 1.0 was introduced in 1992 already, and other character encodings
    than US-ASCII existed before, so this was a clear oversight in this specification that became an IETF standards track document (RFC 1866)
    in 1995-11.

    It is obsolete since 2000-06: <https://tools.ietf.org/html/rfc2854>

    Since that’s not how things actually worked, HTML5 specs don’t even mention the possibility of servers using <meta> tags. Neither do they prohibit such things; they don’t really deal with the operation of
    servers.

    So it is not reasonable to assume that this would work. AISB.

    The early HTML5 drafts/specs didn’t even allow <meta
    http-equiv=...> and instead used the <meta charset=...> invention,

    Questionable. Evidence?

    which was, from the beginning, meant to be handled by user agents.

    Yes.

    ______
    ¹ <https://bz.apache.org/bugzilla/show_bug.cgi?id=23421>
    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Arno Welzel@21:1/5 to All on Wed Oct 21 21:21:09 2020
    Jukka K. Korpela:

    Arno Welzel wrote:

    But even application/xhtml+xml is in fact plain text which is
    *interpreted* as XHTML.

    The important point is, that the content of a file of that type can be
    read as plain text as well.

    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64; &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Is this the way *you* create your XHTML files?


    --
    Arno Welzel
    https://arnowelzel.de

  • From Arno Welzel@21:1/5 to All on Wed Oct 21 21:22:28 2020
    Jukka K. Korpela:

    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Jukka K. Korpela <jukkakk@gmail.com> wrote:
    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64;
    &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Reading it as plain text is trivial.

    Didn’t someone quote this from the relevant RFC:
    Plain text is intended to be displayed "as-is", that is,

    Which is possible:

    Ampersand, Hash, Five, Zero, Colon...

    [...]
    Are you saying that displaying the character sequence “as-is” is proper display?

    Yes. You did not ask for "interpret what this text means".


    --
    Arno Welzel
    https://arnowelzel.de

  • From Thomas 'PointedEars' Lahn@21:1/5 to Helmut Richter on Wed Oct 21 21:26:38 2020
    Helmut Richter wrote:

    On Tue, 20 Oct 2020, Thomas 'PointedEars' Lahn wrote:
    Helmut Richter wrote:

    You should notice that

    <meta http-equiv="Content-Type" content="text/html; charset=any-code">
    (HTML before HTML5 as well)

    and

    <meta charset="utf-8"> (HTML5 only, only utf-8 allowed)

    have different meanings. meta_http-equiv is a hint to the web server to
    declare the content type and encoding via the HTTP protocol. […]

    Not at all. How did you get that idea? It is not a job of a Web server
    to *interpret* the body of an HTTP message in order to generate a header
    for that HTTP message. Parsing and interpreting HTML, for example, is
    solely the domain of an HTML user agent.

    Thank you for repeating <huuk9tF11ncU1@mid.individual.net>. I understood
    that one as well, though.

    Go to hell.

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Arno Welzel@21:1/5 to All on Wed Oct 21 21:27:13 2020
    Arno Welzel:

    Jukka K. Korpela:

    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Jukka K. Korpela <jukkakk@gmail.com> wrote:
    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64;
    &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Reading it as plain text is trivial.

    Didn’t someone quote this from the relevant RFC:
    Plain text is intended to be displayed "as-is", that is,

    Which is possible:

    Ampersand, Hash, Five, Zero, Colon...

    Well, I forgot the x after the ampersand...



    --
    Arno Welzel
    https://arnowelzel.de

  • From Thomas 'PointedEars' Lahn@21:1/5 to All on Wed Oct 21 21:36:16 2020
    A pseudonymous coward and liar trolled:

    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <dciwam@PointedEars.de> wrote:


    I note the lack of an attribution there[*].

    Have your eyes checked.

    My writing:
    So, by that rule, anything in RAM, on magnetic disk, on magnetic tape,
    on SSD, on DVD-R or CD-ROM, in transit over ethernet or wifi, all of
    those are _not plain text_.

    Thomas's reply:

    See below.

    No, of course not. Not all code points of US-ASCII or Unicode represent
    digits and letters. In particular, the first 32 code points do not; they
    represent non-printable control characters or are left unassigned. That
    is, they represent *data*, but not necessarily *text*.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    Control characters between 0 and 31 are either generally not used in
    output or have very well defined meanings in output.

    You don’t say. *facepalm*

    [Ex falso quodlibet]

    [*] This is not something I wrote, although the >> implies it was in
    my article. So perhaps the lack of attribution was deliberate?

    Why are you lying?

    <https://www.netmeister.org/news/learn2quote.html>

    Score adjusted

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Jukka K. Korpela@21:1/5 to Arno Welzel on Wed Oct 21 22:50:06 2020
    Arno Welzel wrote:

    Ampersand, Hash, Five, Zero, Colon...

    [...]
    Are you saying that displaying the character sequence “as-is” is proper display?

    Yes. You did not ask for "interpret what this text means".

    For HTML (which is what we are discussing here), “proper display” means
    displaying the content as defined in HTML specifications. It would be
    inappropriate for a browser to display the tags, the character
    references, the comments, etc., “as-is”. It would mean rendering an HTML
    document as plain text (which it is not, by definition), refusing to do
    the job of a browser.

  • From Eli the Bearded@21:1/5 to usenet@PointedEars.de on Wed Oct 21 20:07:55 2020
    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <usenet@PointedEars.de> wrote:
    A pseudonymous coward and liar trolled:

    Ever the classy person there.

    [Ex falso quodlibet]
    [*] This is not something I wrote, although the >> implies it was in
    my article. So perhaps the lack of attribution was deliberate?
    Why are you lying?

    $ lynx -source -dump 'news:<eli$2010201433@qaz.wtf>' |grep quodlibet
    $ lynx -source -dump 'news:<2173853.ElGaqSPkdT@PointedEars.de>' |grep quodlibet
    [Ex falso quodlibet]
    $

    Elijah
    ------
    still recalls <eli$1603151953@qz.little-neck.ny.us>

  • From Stan Brown@21:1/5 to Arno Welzel on Wed Oct 21 14:38:48 2020
    On Wed, 21 Oct 2020 21:21:09 +0200, Arno Welzel wrote:

    Jukka K. Korpela:

    Arno Welzel wrote:

    But even application/xhtml+xml is in fact plain text which is
    *interpreted* as XHTML.

    The important point is, that the content of a file of that type can be
    read as plain text as well.

    &#x50;&#x6C;&#x65;&#x61;&#x73;&#x65;&#x20;&#x72;&#x65;&#x61;&#x64; &#x74;&#x68;&#x69;&#x73;&#x20;&#x61;&#x73;&#x20;&#x70;&#x6C;&#x61;&#x69;&#x6E;
    &#x74;&#x65;&#x78;&#x74;&#x2E;

    Is this the way *you* create your XHTML files?

    Reminds me of the old days at my college computing center, when we
    would have to key in a series of octal codes to cold boot the Univac
    1107 after repairs.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Thomas 'PointedEars' Lahn@21:1/5 to Eli the Bearded on Thu Oct 22 03:28:03 2020
    Eli the Bearded wrote:

    In comp.infosystems.www.authoring.html,
    Thomas 'PointedEars' Lahn <usenet@PointedEars.de> wrote:
    A pseudonymous coward and liar trolled:

    Ever the classy person there.

    At least you can’t say now that there was no proper attribution :-p

    [Ex falso quodlibet]
    [*] This is not something I wrote, although the >> implies it was in
    my article. So perhaps the lack of attribution was deliberate?
    Why are you lying?

    $ lynx -source -dump 'news:<eli$2010201433@qaz.wtf>' |grep quodlibet
    $ lynx -source -dump 'news:<2173853.ElGaqSPkdT@PointedEars.de>' |grep quodlibet
    [Ex falso quodlibet]
    $

    Oh honey, the buses don’t go where you live, yes?

    It was a SUMMARY of what you wrote as indicated by the BRACKETS. As you
    could have READ in “How do I quote correctly in Usenet?” which I REFERRED YOU TO.

    As you can’t be smart enough to understand Latin, there is the translation:

    <https://en.wikipedia.org/wiki/Principle_of_explosion>

    *facepalm*

    --
    PointedEars
    FAQ: <http://PointedEars.de/faq> | <http://PointedEars.de/es-matrix> <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2 | Please do not cc me./Bitte keine Kopien per E-Mail.

  • From Arno Welzel@21:1/5 to All on Thu Oct 22 10:19:37 2020
    Jukka K. Korpela:

    Arno Welzel wrote:

    Ampersand, Hash, Five, Zero, Colon...

    [...]
    Are you saying that displaying the character sequence “as-is” is proper display?

    Yes. You did not ask for "interpret what this text means".

    For HTML (which is what we are discussing here), “proper display” means

    "proper display" is not required to read something as plain text.

    You can even print this on a sheet of paper and give it to someone to
    type it in, and you get the same file again, which can again be
    displayed using a web browser.

    Try this with a PNG image or an MP3 file.


    --
    Arno Welzel
    https://arnowelzel.de

  • From Stan Brown@21:1/5 to Arno Welzel on Thu Oct 22 10:08:15 2020
    On Thu, 22 Oct 2020 10:19:37 +0200, Arno Welzel wrote:

    Jukka K. Korpela:

    Arno Welzel wrote:

    Ampersand, Hash, Five, Zero, Colon...

    [...]
    Are you saying that displaying the character sequence “as-is” is proper display?

    Yes. You did not ask for "interpret what this text means".

    For HTML (which is what we are discussing here), “proper display” means

    "proper display" is not required to read something as plain text.

    You can even print this on a sheet of paper and give it to someone to
    type it in, and you get the same file again, which can again be
    displayed using a web browser.

    Try this with a PNG image or an MP3 file.

    I think the two of you are actually using different terminology. To
    Arno, and to me, "plain text" is not something with no codes in it,
    it's something where a "text editor" can see all the characters.

    I think Jukka is equating "plain text" to type="text/plain". I won't
    say that's wrong, but it's not the only interpretation.

    --
    Stan Brown, Tehachapi, California, USA
    https://BrownMath.com/
    https://OakRoadSystems.com/

  • From Jukka K. Korpela@21:1/5 to Stan Brown on Thu Oct 22 21:09:04 2020
    Stan Brown wrote:

    I think the two of you are actually using different terminology. To
    Arno, and to me, "plain text" is not something with no codes in it,
    it's something where a "text editor" can see all the characters.

    I think Jukka is equating "plain text" to type="text/plain". I won't
    say that's wrong, but it's not the only interpretation.

    It is the definition given in the RFC for MIME types (media types), so I
    would argue that when discussing e.g. whether HTML is plain text, it is
    the correct definition.

    You are confusing plain text, subtype text/plain, with the broader
    concept of text, major type text. Note that HTML is labelled and served
    as text/html (unless an application type is used), specifically
    distinguishing HTML text from other types of text, such as plain text or
    Rich Text Format (text/rtf).
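
    The major type versus subtype distinction can be seen in a small sketch using Python's standard `mimetypes` module, which maps file names to media types. The helper name `media_type` and the sample file names are made up for illustration, and the mapping depends on the platform's type tables, so treat the result as indicative only.

```python
import mimetypes
from typing import Tuple


def media_type(filename: str) -> Tuple[str, str]:
    """Split the guessed media type of a file name into (major type, subtype)."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is None:
        raise ValueError(f"no media type known for {filename!r}")
    major, _, subtype = guessed.partition("/")
    return major, subtype


# Both share the major type "text" but differ in subtype.
for name in ("page.html", "notes.txt"):
    print(name, media_type(name))
```

    Both names fall under the major type "text", and only the second gets the subtype "plain", which is the distinction drawn above.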

  • From Phillip Helbig (undress to reply@21:1/5 to jukkakk@gmail.com on Thu Oct 22 18:05:48 2020
    In article <rmq3df$ddn$1@dont-email.me>, "Jukka K. Korpela"
    <jukkakk@gmail.com> writes:

    For HTML (which is what we are discussing here), “proper display” means
    displaying the content as defined in HTML specifications. It would be
    inappropriate for a browser to display the tags, the character
    references, the comments, etc., “as-is”. It would mean rendering an HTML
    document as plain text (which it is not, by definition), refusing to do
    the job of a browser.

    Jukka knows his stuff! Just today I came across
    jkorpela.fi/forms/file.html and from there to a lot of really
    interesting stuff concerning HTML, the web, character encodings, and so
    on.
