• [inn] Header folded immediately after header name results in "invalid s

    From Adam W.@21:1/5 to All on Tue Jun 6 19:49:53 2023
    Hi,

    inn version 2.7.0.

    Let's try to post this message (minimal example):

    #v+
    From: test@test.test
    Newsgroups: alt.test
    Subject:
    test

    test
    #v-

    The Subject header body is folded here, but immediately after the header
    name. The real life example is when Message-ID body is too long to fit in
    the same line with the header name (and that's why I detected it and
    started digging).

    When I try to post it, I get this message:

    441 Invalid syntax encountered in Subject header field body (unexpected byte or empty content line)

    The check is done in nnrpd/post.c, which calls the function
    IsValidHeaderBody (from libinn, headers.c) to perform some sanity
    checks on the header body.

    When calling this function with an argument "\n test", this check is
    triggered:

    /* Folding detected. We expect CRLF or lone LF as some parts
     * of INN code internally remove CR.
     * Check that the line that has just been processed is not
     * "empty" and that the following character marks the beginning
     * of a continuation line. */
    if (emptycontentline || !ISWHITE(p[1])) {
        printf("false1 p[1]=%d\n", p[1]);
        return false;
    }

    I believe this test is wrong, as the function will always return false
    here: emptycontentline is still true when the header body (value) is
    folded just after the header name. It's been there since 2016, so I doubt
    I'm the only one who encountered this error... or maybe that's not an
    error and this header is in fact wrong and should not be folded (and it's
    a bug in formail then)?
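
    A minimal Python sketch of the rule behind this check may make the
    behaviour easier to see. It is not INN's C code: the function name
    is_valid_header_body is hypothetical, and the logic is a simplified model
    that assumes the body is passed without the header name and uses LF or
    CRLF line endings.

```python
def is_valid_header_body(body: str) -> bool:
    """Simplified model of the folding rule (not INN's implementation).

    Every line of the body, including the first one, must contain at least
    one non-whitespace character, and every continuation line must start
    with a space or a tab.
    """
    if not body:
        return False
    for index, line in enumerate(body.split("\n")):
        line = line.rstrip("\r")  # tolerate CRLF as well as lone LF
        if line.strip(" \t") == "":
            return False  # "empty" content line
        if index > 0 and line[0] not in " \t":
            return False  # not a valid continuation line
    return True


print(is_valid_header_body("\n test"))      # False
print(is_valid_header_body("test\n more"))  # True
```

    In this model "\n test" is rejected because the first line of the body
    has no content, while a body folded after some content is accepted.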

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Urs Janßen@21:1/5 to Adam W. on Tue Jun 6 22:04:01 2023
    Adam W. wrote:
    The Subject header body is folded here, but immediately after the header name. The real life example is when Message-ID body is too long to fit in

    [RFC 5536 2.2]
    | o Every line of a header field body (including the first and any
    | that are subsequently folded) MUST contain at least one non-
    | whitespace character.
    |
    | NOTE: This means that no header field body defined by or
    | referenced by this document can be empty. As a result, rather
    | than using the <unstructured> syntax from Section 3.2.5 of
    | [RFC5322], this document uses a stricter definition:
    |
    | unstructured = *WSP VCHAR *( [FWS] VCHAR ) *WSP
    |
    | NOTE: The [RFC5322] specification sometimes uses [FWS] at the
    | beginning or end of ABNF describing header field content. This
    | specification uses *WSP in such cases, also in cases where this
    | specification redefines constructs from [RFC5322]. This is
    | done for consistency with the restriction described here, but
    | the restriction applies to all header fields, not just those
    | where ABNF is defined in this document.

  • From Russ Allbery@21:1/5 to Adam W. on Tue Jun 6 17:03:23 2023
    gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.) writes:

    The real life example is when Message-ID body is too long to fit in the
    same line with the header name (and that's why I detected it and started digging).

    Urs posted the correct standards reference. I just wanted to add that you should never fold the Message-ID for Netnews articles. (Which is a
    somewhat obvious consequence of the standards rule about having some non-whitespace characters on the header field line, but is worth
    reiterating separately.)

    Header field lines do not have to be folded unless they contain RFC 2047 encoding (which the Message-ID header field may not), and may be as long
    as 998 octets, and the maximum length of an FQDN is 254 octets if I
    remember correctly. That leaves plenty of space. News software does not expect a folded Message-ID and I suspect INN is far from the only piece of software that would break. It may not look aesthetically pleasing to have
    a long line, but the standard effectively requires it.

    I would, in general, never fold Path, Message-ID, or Date, since these are critical fields for the Netnews protocol and a lot of existing software
    isn't going to handle it correctly. I think there are provisions in the standard for folding Path (I don't recall whether we restricted the Date
    syntax to disallow folding), but if you want to maximize how interoperable
    the articles are, it's best to avoid those provisions.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

  • From Russ Allbery@21:1/5 to urs@buil.tin.org on Tue Jun 6 22:39:43 2023
    Urs Janßen <urs@buil.tin.org> writes:
    Russ Allbery wrote:

    Header field lines do not have to be folded unless they contain RFC 2047
    encoding (which the Message-ID header field may not), and may be as long
    as 998 octets, and the maximum length of an FQDN is 254 octets if I
    remember correctly. That leaves plenty of space. News software does not
    expect a folded Message-ID and I suspect INN is far from the only piece of

    It's even more restrictive (RFC 5536 3.1.3)

    | msg-id = "<" msg-id-core ">"
    | ; maximum length is 250 octets
    | msg-id-core = id-left "@" id-right
    [...]
    | The <msg-id> MUST NOT be more than 250 octets in length.

    Oh, right! I had completely forgotten that we limited the length of the message ID.

    Unfortunately (one of the changes from RFC 1036 which I did not like as
    some programs may not be aware of this) Path can "now" be folded (RFC
    5536 3.1.5):

    | path = "Path:" SP *WSP path-list tail-entry *WSP CRLF
    | path-list = *( path-identity [FWS] [path-diagnostic] "!" )

    Yeah, in retrospect that was probably a mistake.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

  • From Urs Janßen@21:1/5 to Russ Allbery on Wed Jun 7 05:35:40 2023
    Russ Allbery wrote:
    Header field lines do not have to be folded unless they contain RFC 2047 encoding (which the Message-ID header field may not), and may be as long
    as 998 octets, and the maximum length of an FQDN is 254 octets if I
    remember correctly. That leaves plenty of space. News software does not expect a folded Message-ID and I suspect INN is far from the only piece of

    It's even more restrictive (RFC 5536 3.1.3)

    | msg-id = "<" msg-id-core ">"
    | ; maximum length is 250 octets
    | msg-id-core = id-left "@" id-right
    [...]
    | The <msg-id> MUST NOT be more than 250 octets in length.
    |
    | NOTE: The length restriction ensures that systems that accept
    | message identifiers as a parameter when referencing an article
    | (e.g., [RFC3977]) can rely on a bounded length.
    |
    | Observe that <msg-id> includes the < and >.
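
    The 250-octet limit is easy to check mechanically. The sketch below is
    only an illustration: msg_id_within_limit is a hypothetical helper that
    checks the length and the surrounding angle brackets, and it does not
    validate the rest of the msg-id syntax.

```python
def msg_id_within_limit(msg_id: str, limit: int = 250) -> bool:
    """Return True if msg_id, including "<" and ">", fits within the limit.

    Only the length and the angle brackets are checked; the id-left and
    id-right syntax is deliberately ignored in this sketch.
    """
    octets = msg_id.encode("utf-8")
    has_brackets = msg_id.startswith("<") and msg_id.endswith(">")
    return has_brackets and len(octets) <= limit


# The count includes the angle brackets, as the RFC text observes.
print(msg_id_within_limit("<" + "a" * 246 + "@b>"))  # True  (250 octets)
print(msg_id_within_limit("<" + "a" * 247 + "@b>"))  # False (251 octets)
```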

    software that would break. It may not look aesthetically pleasing to have
    a long line, but the standard effectively requires it.

    I would, in general, never fold Path, Message-ID, or Date, since these are critical fields for the Netnews protocol and a lot of existing software
    isn't going to handle it correctly. I think there are provisions in the standard for folding Path (I don't recall whether we restricted the Date syntax to disallow folding), but if you want to maximize how interoperable the articles are, it's best to avoid those provisions.

    Unfortunately (one of the changes from RFC 1036 which I did not like as
    some programs may not be aware of this) Path can "now" be folded (RFC 5536 3.1.5):

    | path = "Path:" SP *WSP path-list tail-entry *WSP CRLF
    | path-list = *( path-identity [FWS] [path-diagnostic] "!" )

  • From Julien ÉLIE@21:1/5 to All on Wed Jun 7 08:40:41 2023
    Hi Urs and Russ,

    Unfortunately (one of the changes from RFC 1036 which I did not like as
    some programs may not be aware of this) Path can "now" be folded (RFC
    5536 3.1.5):

    Does RFC 1036 really say that Path cannot be folded?
    I cannot find the reference.

    2. Message Format
    The Internet convention of continuation header lines (beginning with
    a blank or tab) is allowed.

    2.1.6. Path

    The names may be
    separated by any punctuation character or characters (except "."
    which is considered part of the hostname). Thus, the following are
    valid entries:

    cbosgd!mhuxj!mhuxt
    cbosgd, mhuxj, mhuxt
    @cbosgd.ATT.COM,@mhuxj.ATT.COM,@mhuxt.ATT.COM
    teklabs, zehntel, sri-unix@cca!decvax

    However, the existing convention of placing the
    host name and an "!" at the front of the path, and of starting the
    path with the host name, an "!", and the user name, should be
    maintained when possible.




    | path = "Path:" SP *WSP path-list tail-entry *WSP CRLF
    | path-list = *( path-identity [FWS] [path-diagnostic] "!" )

    Yeah, in retrospect that was probably a mistake.


    FWIW, "Son of 1036" (RFC 1849) already said in 1994:

    5.6. Path

    NOTE: This syntax has the disadvantage of containing no white
    space, making it impossible to continue a Path header across
    several lines. Implementors of relayers and reading agents are
    warned that it is intended that the successor to this Draft will
    change the definition of path delimiter to:

    path-delimiter = "!" [ space ]

    and are urged to fix their software to handle (i.e., ignore) white
    space following the exclamation points. They are urged to hurry;
    some ill-behaved systems reportedly already feel free to add such
    white space.

    NOTE: [RFC1036] allows considerably more flexibility in choice of
    delimiter, in theory, but this flexibility has never been used,
    and most news software does not implement it properly. The
    grammar reflects the current reality. Note, in particular, that
    [RFC1036] treats "_" as a delimiter, but in fact it is known to
    appear in relayer names occasionally.



    So I believe RFC 5536 was just documenting what was going on with "!"
    and spaces, and naturally folding on whitespace.
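
    As an illustration of the advice to ignore white space following the
    exclamation points, here is a rough Python sketch. The helper name
    split_path is hypothetical, the input is assumed to be the Path body
    without the header name, and path diagnostics are not interpreted (empty
    entries such as those produced by "!!" are simply dropped).

```python
import re
from typing import List


def split_path(path_body: str) -> List[str]:
    """Split a possibly folded Path body into its "!"-separated entries.

    Folding (a line break followed by a space or tab) is removed first,
    then white space around each entry is ignored.
    """
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", path_body)
    entries = [entry.strip(" \t") for entry in unfolded.split("!")]
    return [entry for entry in entries if entry]


print(split_path("a.example!\n b.example!not-for-mail"))
# ['a.example', 'b.example', 'not-for-mail']
```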

    --
    Julien ÉLIE

    « Ça n'a été qu'un coup de glaive dans l'eau. » (Astérix)

  • From Urs Janßen@21:1/5 to All on Wed Jun 7 07:10:44 2023
    Julien ÉLIE wrote:
    Hi Urs and Russ,
    Does RFC 1036 really say that Path cannot be folded?

    Sorry, in fact it was RFC 1849 (5.6.) not RFC 1036 as you already found
    out.

    So I believe RFC 5536 was just documenting what was going on with "!"
    and spaces, and naturally folding on whitespace.

    I still think allowing FWS in Path is/was more asking for trouble than
    needed.
    I know that a lot of older software had/has issues with FWS in
    general and 992 bytes is plenty of space for Path, esp. if you avoid the
    often quite useless verification info.
    The same goes for at least Date, Newsgroups and Followup-To.

  • From Urs Janßen@21:1/5 to All on Wed Jun 7 07:30:22 2023
    Julien ÉLIE wrote:
    Hi Urs and Russ,
    Does RFC 1036 really say that Path cannot be folded?

    Sorry, in fact it was RFC 1849 (5.6.) not RFC 1036 as you already found
    out.

    In fact I had Son of RFC 1036 in mind, the de facto standard between around
    1992 and 2010, when it became RFC 1849 (just to document the current state).

    So I believe RFC 5536 was just documenting what was going on with "!"
    and spaces, and naturally folding on whitespace.

    I still think allowing FWS in Path is/was more asking for trouble than
    needed.
    I know that a lot of older software had/has issues with FWS in
    general and 992 bytes is plenty of space for Path, esp. if you avoid the
    often quite useless verification info.
    The same goes for at least Date, Newsgroups and Followup-To.

  • From Julien ÉLIE@21:1/5 to All on Wed Jun 7 10:21:24 2023
    Hi Urs,

    I still think allowing FWS in Path is/was more asking for trouble than needed.
    I know that a lot of older software had/has issues with FWS in
    general and 992 bytes is plenty of space for Path, esp. if you avoid the often quite useless verification info.

    Yes I understand the point and the change from Son of 1036.

    If the Path header field did not allow FWS, the problem that would have
    had to be dealt with in RFC 5536/5537 is Path trimming. As an FQDN can
    have about 254 characters, and path diagnostics can also add a MISMATCH
    with an additional FQDN, and a server can add several path identities...
    in theory we could reach the maximum allowed length in only 1 hop!
    Or 3 hops if not taking into account path diagnostics and several path identities.
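
    The "3 hops" figure can be reproduced with a little arithmetic. This is
    only a back-of-the-envelope sketch: the constant and function names are
    made up for the example, it uses the 998-octet line limit and the
    254-octet FQDN length mentioned in this thread, and it ignores the tail
    entry and any path diagnostics.

```python
MAX_LINE_OCTETS = 998   # maximum header line length mentioned in the thread
PATH_PREFIX = "Path: "  # header name, colon and the mandatory space
FQDN_OCTETS = 254       # approximate maximum FQDN length used in the thread


def path_budget() -> int:
    """Octets left for the Path body on a single unfolded line."""
    return MAX_LINE_OCTETS - len(PATH_PREFIX)


def identities_that_fit(identity_octets: int = FQDN_OCTETS) -> int:
    """How many maximum-length path identities, each followed by "!", fit."""
    return path_budget() // (identity_octets + 1)


print(path_budget())          # 992
print(identities_that_fit())  # 3
```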

    It would have been something to explain, limit and standardize, instead
    of using the already existing FWS mechanism specifically designed to
    handle such folding in large header fields.

    Well, we cannot do much about that now.
    Maybe, to improve interoperability, an RFC erratum could be added to
    say that Path folding SHOULD be avoided when possible, and SHOULD occur
    only when the addition of a path identity or diagnostic would make the
    header field larger than 998 bytes?



    The same goes for at least Date, Newsgroups and Followup-To.

    As for Date, the CFWS is only at the end of the field, so parsing
    should normally succeed using only the first line:

    date-time = [ day-of-week "," ] date time [CFWS]
    day-of-week = ([FWS] day-name) / obs-day-of-week

    NOTE: The [RFC5322] specification sometimes uses [FWS] at the
    beginning or end of ABNF describing header field content. This
    specification uses *WSP in such cases, also in cases where this
    specification redefines constructs from [RFC5322].


    Maybe the suggested RFC erratum about Path folding could include these 3 additional fields.

    --
    Julien ÉLIE

    « Le tennis c'est comme le ping-pong, sauf qu'au tennis, les joueurs
    sont debout sur la table. »

  • From Urs Janßen@21:1/5 to All on Wed Jun 7 09:53:20 2023
    Julien ÉLIE wrote:
    If the Path header field did not allow FWS, the problem that would have
    had to be dealt with in RFC 5536/5537 is Path trimming. As an FQDN can
    have about 254 characters, and path diagnostics can also add a MISMATCH
    with an additional FQDN, and a server can add several path identities...

    I am not a fan of adding such diagnostics to the Path, as they cannot be
    trusted anywhere except on the hop that adds them.

    Several path identities might be useful in large scale setups but that could/should be limited to 2-3.

    Well, we cannot do much about that now.

    Fortunately, FWS in Path does not really appear in practice (at least in non-binary Usenet).

  • From Adam W.@21:1/5 to Russ Allbery on Wed Jun 7 10:02:33 2023
    Russ Allbery <eagle@eyrie.org> wrote:

    but if you want to maximize how interoperable the articles are,

    Actually, in this specific case, I don't :)

    I'm feeding some mailing lists into my own inn to read with my own tin (I
    don't like mailing lists; I think newsgroups are much better suited for this purpose). As long as both inn and tin accept it, I'm fine with that.

    Back to the Message-ID. It seems that it's folded by some Microsoft tools
    (what a surprise) -- some kind of online Outlook or something. Here's the sample Message-ID:

    Message-ID:
    <YT3PR01MB6374D9DE3A7128AF7DE72EC1A253A@YT3PR01MB6374.CANPRD01.PROD.OUTLOOK.COM>

    It came in this format from a group on groups.io (a mailing groups
    server), formail didn't unfold it (I'm using formail to add and remove
    some headers before feeding article to rpost) and it was rejected by my
    inn.

    Actually, there are more headers formatted this way, but they're not
    needed (for example x-ms-exchange-antispam-messagedata-0).

    I'm wondering if there are some tools to unfold it (to reformat headers, because sometimes they should be folded), or should I reinvent the wheel
    and write something to do it... are (any of) you aware of such tools?

  • From Russ Allbery@21:1/5 to Adam W. on Wed Jun 7 08:23:54 2023
    gof-cut-this-news@cut-this-chmurka.net.invalid (Adam W.) writes:

    Back to the Message-ID. It seems that it's folded by some Microsoft
    tools (what a surprise) -- some kind of online Outlook or
    something. Here's the sample Message-ID:

    Message-ID:
    <YT3PR01MB6374D9DE3A7128AF7DE72EC1A253A@YT3PR01MB6374.CANPRD01.PROD.OUTLOOK.COM>

    Oh, sigh, now I understand your problem.

    Yeah, alas, you're going to have to add custom processing to unfold those
    for INN, which is really annoying. Email messages often play fast and
    loose with message IDs and do all sorts of weird things with them (I saw
    one mail server that reused the same message ID for every message it sent
    out) because they're much less important in email than in Usenet.

    I'm wondering if there are some tools to unfold it (to reformat headers, because sometimes they should be folded), or should I reinvent the wheel
    and write something to do it... are (any of) you aware of such tools?

    I don't know of any. When I was doing this sort of thing, I wrote my own
    mail to news gatewaying in Perl. (News::Gateway is still on CPAN, but I'm waiting for it to be 30 years old before I try to update it, apparently.)
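
    For the custom processing mentioned above, a small filter may be enough.
    The Python sketch below is one possible approach, not an existing tool:
    unfold_headers is a hypothetical helper, it assumes the message uses LF
    line endings and that a blank line separates headers from the body, and
    by default it unfolds only Message-ID so that other headers keep their
    folding.

```python
def unfold_headers(message: str, names=("message-id",)) -> str:
    """Unfold the named header fields and leave everything else untouched.

    names holds lower-case header names; continuation lines of those
    headers are joined to the header line with a single space.
    """
    head, sep, body = message.partition("\n\n")
    out = []
    unfolding = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t") and out:
            # Continuation line: join it only for the selected headers.
            if unfolding:
                prev = out[-1]
                joiner = "" if prev.endswith((" ", "\t")) else " "
                out[-1] = prev + joiner + line.strip(" \t")
                continue
        else:
            # Start of a new header field: decide whether to unfold it.
            name = line.partition(":")[0].strip().lower()
            unfolding = name in names
        out.append(line)
    return "\n".join(out) + sep + body


sample = "Message-ID:\n <id@example.invalid>\nSubject: a\n b\n\nbody\n"
print(unfold_headers(sample))
```

    Such a filter could sit between formail and rpost in the pipeline
    described earlier; the header names to unfold are passed in lower case.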

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>
