INN (and perhaps other servers) has the possibility to provide keywords in overview data.
I'm wondering whether:
- it shouldn't be advertised as ":keywords" instead of "Keywords:full"
as the header field is not in the original article.
Article "metadata" is data about articles that does not occur within the article itself, so I will go for ":keywords".
BUT only two metadata items are defined in RFC 3977, ":lines" and
":bytes". RFC 3977 says: "To avoid the risk of a clash with a future
registered extension, the names of metadata items defined by private extensions SHOULD begin with ":x-"."
So, perhaps it's better to name it ":x-keywords"?
So, I'll go for ":x-keywords", maybe, instead of "Keywords".
For OVER, I think the value of this metadata item SHOULD consist of the metadata name, a single space, and then the value, as explained in RFC
3977: "For all subsequent fields that contain headers, the content MUST
be the entire header line other than the trailing CRLF. For all
subsequent fields that contain metadata, the field consists of the
metadata name, a single space, and then the value."
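To make the two field layouts quoted above concrete, here is a minimal Python sketch. The function name `overview_extra_field` is hypothetical and is not taken from INN; it only illustrates how an extra (non-mandatory) overview field would look depending on whether it is advertised as a header or as a metadata item.

```python
def overview_extra_field(name: str, value: str) -> str:
    """Format one extra overview field following the RFC 3977 text quoted above.

    `name` is either a header field name ("Keywords") or a metadata
    item name starting with a colon (":keywords").
    """
    if name.startswith(":"):
        # Metadata item: metadata name, a single space, then the value.
        return f"{name} {value}"
    # Header field: the entire header line, without the trailing CRLF.
    return f"{name}: {value}"


if __name__ == "__main__":
    print(overview_extra_field("Keywords", "a,b,c,d"))
    print(overview_extra_field(":keywords", "a,b,c,d"))
```

With the current behaviour the field reads "Keywords: a,b,c,d"; advertised as a metadata item it would read ":keywords a,b,c,d" instead.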
So, perhaps it's better to name it ":x-keywords"?
Since RFC 3977, there has been RFC 6648 which deprecates the use of "X-" prefix and similar constructs in application protocols. That's why I
did not propose that name but directly ":keywords".
I am half-tempted to advertise ":keywords" instead of Keywords in the
next release so as to comply with the protocol (the keywords are not
present in the article itself), and properly handle "HDR :keywords" vs
"HDR Keywords" results, the same way "HDR Lines" return the real header
field if present.
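The "HDR :keywords" vs "HDR Keywords" distinction can be sketched as a toy lookup in Python. Everything here is an assumption for illustration: the function name `hdr_value`, the dictionaries standing in for a stored article and for server-generated metadata, and the case-insensitive matching of metadata names. It is not how INN implements HDR.

```python
def hdr_value(field: str, article_headers: dict, generated_metadata: dict) -> str:
    """Return the value a server might send for `HDR field` on one article.

    article_headers:    header fields really present in the article.
    generated_metadata: items computed by the server, keyed by lowercase
                        names such as ":keywords" (hypothetical storage).
    """
    if field.startswith(":"):
        # Metadata item: comes from server-side data, never from the article.
        return generated_metadata.get(field.lower(), "")
    # Real header field: header names are matched case-insensitively.
    wanted = field.lower()
    for name, value in article_headers.items():
        if name.lower() == wanted:
            return value
    # The article lacks this header field: empty content.
    return ""


if __name__ == "__main__":
    headers = {"Subject": "test"}
    metadata = {":keywords": "a,b,c,d"}
    print(repr(hdr_value(":keywords", headers, metadata)))
    print(repr(hdr_value("Keywords", headers, metadata)))
```

In this sketch an article without a Keywords header field gives an empty result for "HDR Keywords", while "HDR :keywords" returns the generated list.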
I think it's the right choice even if I don't see how this header can be useful in any way (because the words are totally unusable).
Perhaps it would be better to encode the words rather than remove the non-ASCII characters?
INN (and perhaps other servers) has the possibility to provide keywords
in overview data. It advertises "Keywords:full" in response to LIST OVERVIEW.FMT and then adds "Keywords: a,b,c,d" in OVER responses. No Keywords header field is added to the articles, and the content of an existing one is kept at the beginning of the generated one in the overview.
I'm wondering whether:
- it shouldn't be advertised as ":keywords" instead of "Keywords:full" as
the header field is not in the original article.
I am unsure though if such a change would break implementations that look
for it in overview (but is there any such news client? ...)
I'm wondering whether:
- it shouldn't be advertised as ":keywords" instead of "Keywords:full" as
the header field is not in the original article.
I believe that's correct. Keywords:full would imply that it's a copy of a header in the article named Keywords.
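For readers less familiar with LIST OVERVIEW.FMT, the difference a client would see can be sketched in Python. The helper name `classify_overview_entry` is hypothetical; the sketch only assumes the convention discussed in this thread, namely that entries starting with a colon are metadata items and entries ending in ":full" are copies of header fields.

```python
from typing import Tuple


def classify_overview_entry(entry: str) -> Tuple[str, str]:
    """Classify one line of a LIST OVERVIEW.FMT response.

    Returns ("metadata", name) for items such as ":bytes" or ":keywords",
    and ("header", name) for header fields such as "Subject:" or
    "Keywords:full".
    """
    entry = entry.strip()
    if entry.startswith(":"):
        # Data about the article that does not occur in the article itself.
        return ("metadata", entry)
    if entry.lower().endswith(":full"):
        # Extra header field, sent as the full header line in OVER.
        return ("header", entry[: -len(":full")])
    # One of the leading header entries, e.g. "Subject:".
    return ("header", entry.rstrip(":"))


if __name__ == "__main__":
    for line in ("Subject:", ":bytes", "Keywords:full", ":keywords"):
        print(line, "->", classify_overview_entry(line))
```

A client that classifies entries this way would treat "Keywords:full" as a promise that the article carries a Keywords header field, which is not the case for the generated keywords.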
Astonishingly, we don't seem to have set up an IANA registry for metadata names in LIST OVERVIEW.FMT, which would have been the normal way of doing
it, so I think we can just use :keywords without telling anybody.
It's kind of an interesting idea, but text tokenization is a lot more complicated than that code, as you're discovering with its total lack of understanding of anything other than English. If the body is
base64-encoded (or even quoted-printable), I suspect it will similarly collapse like a house of cards, since I doubt it understands MIME
structure. And let's not even mention trying to tokenize languages that
are farther afield from English.
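As an illustration of the MIME point only, here is a rough Python sketch of what "understanding MIME structure" would mean before any tokenization: decode the transfer encoding and charset first, then split into words. It uses the standard library `email` package; the function name `body_words` is hypothetical, and the naive `\w+` split is an assumption that still does nothing useful for languages written without spaces. This is not the keyword generation code in INN.

```python
import email
import re
from email import policy


def body_words(raw_article: bytes) -> list:
    """Return lowercase word tokens from the text/plain body of an article."""
    msg = email.message_from_bytes(raw_article, policy=policy.default)
    # Pick the plain-text part, whether or not the article is multipart.
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return []
    # get_content() undoes base64 / quoted-printable and decodes the charset.
    text = part.get_content()
    # Naive tokenization: runs of Unicode word characters.
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    raw = (
        b"Subject: test\r\n"
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: quoted-printable\r\n"
        b"\r\n"
        b"Caf=C3=A9 cr=C3=A8me\r\n"
    )
    print(body_words(raw))
```

Tokenizing the raw bytes instead would yield fragments of the encoding ("caf", "c3", "a9", ...) rather than words.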
I'm honestly not sure it's worth the effort of trying to fix, although of course now that we've talked about it someone will probably wonder if it
will solve their problems and experiment with it again. :)