• Relay: List of headers that should appear only once in an article?

    From Franck@21:1/5 to All on Sun May 15 08:59:27 2022
    Hello,

    When INJECTING an article, some headers should not appear and others
    should appear only once. I do the checks in a way similar to INN
    (nnrpd/post.c) but with some additions, such as checking the expected
    format of the header value.

    When RELAYING an article, I currently only test the uniqueness of a
    few headers that are useful for relaying, but I would like to do better.

    Is there an exhaustive list of headers that should only appear once that
    I could test when receiving an article in a server-to-server exchange?

    Does the declaration of headers in innd/innd.c match this requirement?


    Thanks for your help.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Henning Hucke@21:1/5 to Franck on Sun May 15 12:47:12 2022
    On 2022-05-15, Franck <my@mail.is.invalid> wrote:

    Hey Franck,

    [...]
    Is there an exhaustive list of headers that should only appear once that
    I could test when receiving an article in a server to server exchange?
    [...]

    Do you (already?) know about RFC 5536 and possibly RFC 3977
    (https://www.rfc-editor.org/)?

    Best regards,
    Henning
    --
    Applause, n:
    The echo of a platitude from the mouth of a fool.
    -- Ambrose Bierce

  • From Russ Allbery@21:1/5 to Franck on Sun May 15 09:31:51 2022
    Franck <my@mail.is.invalid> writes:

    When RELAYING an article, I am currently only testing the unicity of a
    few headers that are useful for the relay but I would like to do better.

    Is there an exhaustive list of headers that should only appear once that
    I could test when receiving an article in a server to server exchange?

    It is to some degree up to you. Relaying agents are allowed to be strict
    about RFC 5536 compliance, or they're allowed to relay messages that don't follow the standard but can still be processed.

    I think it's generally better for Usenet to be fairly strict, but the
    minimum is probably the headers that have direct protocol effect and which therefore make the article ambiguous in ways that affect relaying if there
    are duplicate headers, namely:

    Control
    Date
    Distribution
    Injection-Date
    Message-ID
    Newsgroups
    Path
    Supersedes

    Not directly affecting relaying, but still probably nonsensical if
    duplicated and thus probably worth rejecting because readers or serving
    agents won't be able to make sense of the article if they're duplicated,
    are:

    Archive
    Content-Transfer-Encoding
    Content-Type
    Expires
    Followup-To
    Injection-Info
    MIME-Version
    References

    RFC 5536 also requires that all of the following headers occur at most once:

    Approved
    Lines
    Organization
    Summary
    User-Agent
    Xref

    but duplication is unlikely to cause major practical problems (although
    some servers may honor Lines instead of counting lines for themselves and
    get confused, and duplicated Xref headers would cause serious problems when
    copying article numbers from another server).
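
    As a rough illustration of the three groups listed above, a relaying
    agent could count header names and report duplicates per group. This is
    only a sketch: the group labels and function names are invented for the
    example and are not taken from INN or from any RFC.

```python
from collections import Counter

# Groups copied from the lists above; the labels are illustrative only.
PROTOCOL = {"control", "date", "distribution", "injection-date",
            "message-id", "newsgroups", "path", "supersedes"}
READER = {"archive", "content-transfer-encoding", "content-type", "expires",
          "followup-to", "injection-info", "mime-version", "references"}
RFC5536_ONLY = {"approved", "lines", "organization", "summary",
                "user-agent", "xref"}


def header_names(raw_headers):
    """Return lower-cased field names, skipping folded continuation lines."""
    names = []
    for line in raw_headers.splitlines():
        if not line or line[0] in " \t":
            continue  # blank line or continuation of the previous field
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip().lower())
    return names


def duplicated_by_group(raw_headers):
    """Map each group label to the sorted names that occur more than once."""
    counts = Counter(header_names(raw_headers))
    dups = {name for name, count in counts.items() if count > 1}
    return {
        "protocol": sorted(dups & PROTOCOL),
        "reader": sorted(dups & READER),
        "rfc5536": sorted(dups & RFC5536_ONLY),
    }
```

    A strict relay could reject on any non-empty group, while a more lenient
    one might reject only when the "protocol" group is non-empty.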

    I may be forgetting some of the MIME headers since I didn't refresh my
    memory from the relevant RFCs.

    In some cases it's arguable that the article is still sensible if the
    header is duplicated but both copies of the header have the same value
    (the most common duplication error in my experience), but it's still technically invalid to duplicate them.

    The list in innd/innd.c controls what headers are exposed to posting
    filters, so is sort of absurdly long and includes all kinds of things that probably aren't directly relevant.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

  • From Franck@21:1/5 to All on Sun May 15 20:54:30 2022
    Hi Russ,

    Thanks for your detailed reply.

    It is to some degree up to you. Relaying agents are allowed to be strict about RFC 5536 compliance, or they're allowed to relay messages that don't follow the standard but can still be processed.

    For the moment, I plan to be strict about RFC 5536.

    Control
    Date
    Distribution
    Injection-Date
    Message-ID
    Newsgroups
    Path
    Supersedes

    Checked to be unique.

    Archive
    Content-Transfer-Encoding
    Content-Type
    Expires
    Followup-To
    Injection-Info
    MIME-Version
    References

    Checked to be unique.

    Approved
    Lines
    Organization
    Summary
    User-Agent
    Xref

    Checked to be unique.


    What about:

    - From
    - Subject

    - Archived-At
    - Bcc
    - Cc
    - Keywords
    - Reply-To
    - Sender
    - To

    - Also-Control
    - Article-Name
    - Article-Updates
    - Date-Received
    - Nntp-Posting-Date
    - Nntp-Posting-Host
    - Posting-Version
    - Relay-Version
    - See-Also
    - X-Complaints-To
    - X-Trace

    - Cancel-Key
    - Cancel-Lock

    PS: My software does not conform to RFC 8315 for the moment.

    but duplication is unlikely to cause major practical problems (although
    some servers may honor Lines instead of counting lines for themselves and
    get confused, and duplicated Xref headers would cause serious problems when
    copying article numbers from another server).

    My software counts lines for itself.

    I may be forgetting some of the MIME headers since I didn't refresh my
    memory from the relevant RFCs.

    :-) I need to refresh mine also!

    In some cases it's arguable that the article is still sensible if the
    header is duplicated but both copies of the header have the same value
    (the most common duplication error in my experience), but it's still technically invalid to duplicate them.

    Added to my to-do list :-)

    The list in innd/innd.c controls what headers are exposed to posting
    filters, so is sort of absurdly long and includes all kinds of things that probably aren't directly relevant.

    Ok, thanks for the explanations.

    I do not manage filters through calls to Perl or Python, but they can be
    set for injection and/or relay (a sort of postfilter and cleanfeed), and I
    will use this (long) list to initialize a combo box in the GUI part of the
    software.

    I implemented filters to be either "fixed" or "user-configurable".

    Fixed ones are used to keep the software RFC/draft compliant and to
    reject articles, for example:

    - "POSTED" in "Path" => article already seen,
    - "World" in "Distribution",
    - number of occurrences (21 message-IDs max) in "References",
    - number of occurrences of terms in "Injection-Info",
    - dates way too far in the past/future.

    And so on...
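
    Two of the fixed filters above (the References limit and the date
    window) could be sketched roughly as follows. The limit of 21 comes
    from the list above; the date thresholds, the message-ID pattern, and
    the function names are assumptions made for the example only.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

MAX_REFERENCES = 21                 # limit mentioned in the list above
MAX_AGE = timedelta(days=30)        # assumed threshold, not from the thread
MAX_FUTURE = timedelta(days=1)      # assumed threshold, not from the thread

# Loose approximation of a message-ID: anything between angle brackets.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def too_many_references(value, limit=MAX_REFERENCES):
    """True if the References value holds more than `limit` message-IDs."""
    return len(MSGID_RE.findall(value)) > limit


def date_out_of_range(value, now=None):
    """True if the Date value is unparsable or outside the allowed window."""
    now = now or datetime.now(timezone.utc)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return True  # unparsable date: treat it like a rejection
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" yields a naive value
    return when < now - MAX_AGE or when > now + MAX_FUTURE
```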

    Or to reject articles based on a configuration option, for example:

    - FQDN/path entry in "Path" => article already seen,
    - max number of "Newsgroups", "Followup-To".

    And so on...

    Filters are also available to check the body of the article.

    Configurable filters will be added only once I code the GUI part of
    the software.


    Again,
    Thanks for your help.

  • From Russ Allbery@21:1/5 to Franck on Sun May 15 12:42:11 2022
    Franck <my@mail.is.invalid> writes:

    What about :

    - From
    - Subject

    Oh, sorry, yes, I knew I forgot something. Those should go into the
    "readers are going to have a hard time understanding the message"
    category of rejecting duplicates.

    - Archived-At
    - Bcc
    - Cc
    - Keywords
    - Reply-To
    - Sender
    - To

    Reply-To is probably in the From and Subject territory.

    None of the rest are Usenet headers, so I think this is just a question of whether you want to generally impose a uniqueness rule on email headers
    that happen to show up in Usenet articles. I think that's a reasonable
    thing to do if you want, but you'll be dropping articles that will be
    readable and not pose any practical problems.

    (A Bcc header showing up in a Usenet post indicates that someone or
    someone's software was very confused about how everything works.)

    - Nntp-Posting-Date
    - Nntp-Posting-Host
    - X-Complaints-To
    - X-Trace

    These are all obsolete trace headers that should be replaced by
    Injection-Info, but which are still in use in the wild. They're all informational headers that don't carry any protocol meaning, so this is a
    bit like the email header case. Maybe a slightly stronger argument for uniqueness since they are relevant to abuse situations.

    - Also-Control

    I'm fairly sure that this is thoroughly obsolete and not honored by
    anything, although maybe I would be surprised. If it is honored, you'd
    want it to be unique, but really nothing should honor it.

    - Article-Name
    - Article-Updates
    - Date-Received
    - Posting-Version
    - Relay-Version
    - See-Also

    I think these are all just random headers that nothing uses. (Some of
    them, like Relay-Version, are just so incredibly obsolete that nothing
    will use them any more.)

    - Cancel-Key
    - Cancel-Lock

    Oh, good call, RFC 8315 says these MUST NOT appear more than once, so may
    as well reject messages with more than one of them as well.

    In some cases it's arguable that the article is still sensible if the
    header is duplicated but both copies of the header have the same value
    (the most common duplication error in my experience), but it's still
    technically invalid to duplicate them.

    Added to my to-do list :-)

    Note that for relaying you can't fix this (by dropping one of the
    duplicates) since that's an unpermitted alteration of the article. You
    have to either accept it or reject it.

    On injection, it may not be a bad idea to toss duplicate header fields
    that have the same content before rejecting articles that still have duplicates. It's just a bit friendlier to broken posting agents; whether
    you want to be friendly to such things is a bit of a judgment call.
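
    The injection-time behaviour described above could look roughly like
    the sketch below: exact repeats of a must-be-unique field are dropped,
    and fields that still conflict are reported so the article can be
    rejected. The function and parameter names are hypothetical, and this
    must not be applied when relaying, where the article cannot be altered.

```python
def drop_identical_duplicates(fields, unique_names):
    """Drop exact repeats of fields that must be unique (injection only).

    fields: list of (name, value) pairs in article order.
    unique_names: lower-cased names that may appear at most once.
    Returns (cleaned_fields, sorted names that still have conflicting values).
    """
    first_value = {}   # lower-cased name -> first value seen
    cleaned = []
    conflicts = set()
    for name, value in fields:
        key = name.lower()
        if key not in unique_names:
            cleaned.append((name, value))   # repeatable field, keep as-is
        elif key not in first_value:
            first_value[key] = value.strip()
            cleaned.append((name, value))
        elif first_value[key] == value.strip():
            continue                        # identical repeat: drop it
        else:
            conflicts.add(key)              # differing values: not repairable
            cleaned.append((name, value))
    return cleaned, sorted(conflicts)
```

    If the returned conflict list is non-empty, the injecting agent would
    reject the post; otherwise it continues with the cleaned field list.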

    I do not manage filters with calls to perl or python but they can be set
    for injection and/or relay (Some sort of postfilter and cleanfeed) and I
    will use this (long) list to initialize a combo in the GUI part of the software.

    I'm not sure I would. I don't think having a long list of headers that
    you care about was the right design for INN's filters. It's just hard to
    fix now.

    I'd be more inclined to populate a dropdown with protocol headers that
    people are likely to care about and then let people type in the names of additional headers if they care.
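
    One way to picture that design: a short suggested list plus a validity
    check for anything typed by hand. The shortlist below is an assumed
    selection for illustration, and the function name is hypothetical; the
    character rule follows the RFC 5322 definition of a field name
    (printable US-ASCII except the colon).

```python
import re

# Assumed shortlist for the dropdown (illustrative selection only).
SUGGESTED_HEADERS = ["Control", "Date", "Distribution", "From",
                     "Injection-Date", "Message-ID", "Newsgroups", "Path",
                     "References", "Subject", "Supersedes"]

# RFC 5322 field name: one or more printable US-ASCII characters, no colon.
FIELD_NAME_RE = re.compile(r"[\x21-\x39\x3b-\x7e]+")


def resolve_header_choice(text):
    """Map dropdown or typed input to a header name, or None if invalid."""
    text = text.strip()
    for known in SUGGESTED_HEADERS:
        if known.lower() == text.lower():
            return known      # canonical spelling from the shortlist
    if FIELD_NAME_RE.fullmatch(text):
        return text           # user-supplied header, kept as typed
    return None               # empty, or contains a space, colon, etc.
```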

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

  • From Franck@21:1/5 to All on Mon May 16 17:52:57 2022
    Hi Russ,

    Thank you for the reply, I have taken into account almost all of your
    remarks concerning the headers.


    Added to my to-do list :-)

    Note that for relaying you can't fix this (by dropping one of the
    duplicates) since that's an unpermitted alteration of the article. You
    have to either accept it or reject it.

    Of course.

    On injection, it may not be a bad idea to toss duplicate header fields
    that have the same content before rejecting articles that still have duplicates. It's just a bit friendlier to broken posting agents; whether
    you want to be friendly to such things is a bit of a judgment call.

    Considered but not ranked #1 on my todo-list.

    I do not manage filters with calls to perl or python but they can be set
    for injection and/or relay (Some sort of postfilter and cleanfeed) and I
    will use this (long) list to initialize a combo in the GUI part of the
    software.

    I'm not sure I would. I don't think having a long list of headers that
    you care about was the right design for INN's filters. It's just hard to
    fix now.

    I'd be more inclined to populate a dropdown with protocol headers that
    people are likely to care about and then let people type in the names of additional headers if they care.

    Sold.
    I'll use "netnews" headers (https://www.iana.org/assignments/message-headers/message-headers.xhtml)
    to populate the dropdown.

    Thanks for the help.

  • From Julien ÉLIE@21:1/5 to All on Mon May 16 20:49:54 2022
    Hi Franck and Russ,

    When INJECTING an article, some headers should not appear and
    others should appear only once. I do the checks in a way similar
    to INN (nnrpd/post.c)

    Incidentally, I see that I once wrote at the end of the header table:

    /* The Comments and Original-Sender header fields can appear more than once
     * in the headers of an article.  Consequently, we MUST NOT put them here.
     */

    So all the header fields listed there are supposed to appear only once
    (and this is tested by StripOffHeaders()).



    - Article-Name

    It is "Article-Names" (with an "s"). I mention it in case you
    implemented it without the "s".



    I do not manage filters with calls to perl or python but they can be set
    for injection and/or relay (Some sort of postfilter and cleanfeed) and I
    will use this (long) list to initialize a combo in the GUI part of the
    software.

    I'm not sure I would. I don't think having a long list of headers that
    you care about was the right design for INN's filters. It's just hard to
    fix now.

    Indeed!

    https://github.com/InterNetNews/inn/issues/73
    "Provide the entire article headers to innd filters, probably in a
    special key in the hash similar to the BODY key."

    This would have saved the need to manually add useful header fields.
    At least, there aren't new ones every year to add! Fortunately, it has
    been a long time since the last addition (following a user request).


    Amongst other header fields not mentioned in this thread, I would check
    that X-PGP-Key and X-PGP-Sig are unique.

    --
    Julien ÉLIE

    « Lots of people want to ride with you in the limo, but what you want is
    someone who will take the bus with you when the limo breaks down. »
    (Oprah Winfrey)

  • From Franck@21:1/5 to All on Tue May 17 07:26:33 2022
    Hi Julien,

     /* The Comments and Original-Sender header fields can appear more than once
      * in the headers of an article.  Consequently, we MUST NOT put them here.
      */

    So all the header fields listed there are supposed to appear only once
    (and is tested by StripOffHeaders()).

    Since this table is part of the nnrpd code, I assume that it is only
    used for injection (POST), not for transit (IHAVE)?

    I just want to offer the (configurable) possibility of rejecting
    articles received from a feed with duplicate headers that don't really
    make sense. Otherwise only essential headers will be tested for uniqueness.

    - Article-Name

    It is "Article-Names" (with an "s").  I mention it in case you
    implemented it without the "s".

    I checked for safety but it's just a typo in the message, it's
    implemented as "Article-Names".

      https://github.com/InterNetNews/inn/issues/73
    "Provide the entire article headers to innd filters, probably in a
    special key in the hash similar to the BODY key."

    This would have saved the need to manually add useful header fields.
    At least, there aren't new ones every year to add!  Hopefully it has
    been a long time since the last addition (following a user request).

    My implementation checks the headers one by one, EXCEPT if an
    error has already been detected (since the article will obviously be
    rejected).

    Moreover, all the checks (filters or not) are only performed if the
    article is not already in error, in order to avoid wasting system
    resources.
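
    The "stop at the first error" behaviour described above can be pictured
    as a small pipeline of checks. This is only a sketch of the idea, not
    the actual implementation: the article is modelled as a dict of
    lower-cased header names, and the two example checks are invented.

```python
def check_has_message_id(article):
    """Return an error string, or None if the check passes."""
    return None if "message-id" in article else "missing Message-ID"


def check_has_path(article):
    return None if "path" in article else "missing Path"


def run_checks(article, checks):
    """Run checks in order and stop at the first one reporting an error."""
    for check in checks:
        error = check(article)
        if error is not None:
            return error   # later (possibly costly) checks never run
    return None
```

    Ordering the cheap header checks before body filters keeps the cost of
    rejecting an obviously broken article low.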

    Amongst other header fields not mentioned in this thread, I would check
    that X-PGP-Key and X-PGP-Sig are unique.

    Thanks for mentioning them; they will be checked for uniqueness in transit.

    Have a nice day,
    Franck

  • From Julien ÉLIE@21:1/5 to All on Tue May 17 21:01:59 2022
    Good evening Franck,

    /* The Comments and Original-Sender header fields can appear more than once
     * in the headers of an article.  Consequently, we MUST NOT put them here.
     */

    So all the header fields listed there are supposed to appear only once
    (and is tested by StripOffHeaders()).

    Since this table is part of the "nnrp.c" code, I assume that it is only
    used for injection (POST) not for transit (IHAVE)? No?

    Yes, that's right. This is enforced at injection time.


    I just want to offer the (configurable) possibility of rejecting
    articles received from a feed with duplicate headers that don't really
    make sense. Otherwise only essential headers will be tested for uniqueness.

    Ah, OK. I thought you were looking for header fields that were supposed
    to be unique (as you answered "Checked to be unique" in previous
    discussions).
    If you are looking for the essential header fields whose duplication
    could be harmful to other software, this is indeed not that list from
    nnrpd; the discussion in this thread answered that.


    Amongst other header fields not mentioned in this thread, I would
    check that X-PGP-Key and X-PGP-Sig are unique.

    Thanks for mentioning them; they will be checked for uniqueness in transit.

    You're welcome.

    --
    Julien ÉLIE

    « – Is that a good position, being a scribe?
    – Oh, it's a seated position. » (Astérix)
