• Renumber newsgroups by post date

    From Eli@21:1/5 to All on Sun Apr 23 15:06:43 2023
    How can I best renumber the articles in all newsgroups so that they are sorted by post date?

    Julien's FAQ states that sorting the history file can be done with the following command:

    sort -t '~' -k3n <history > history.sorted

    But it is probably not enough to replace the history file with the sorted file after this, as this does not affect the overview data.

    Or should this be done via news.daily, such as:

    news.daily delayrm lowmark expireover expireoverflags="-p -e" flags="-p"

    I use tradindexed.

    Can someone help me with this?

    Thank you in advance.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Frank@21:1/5 to All on Sun Apr 23 16:08:22 2023
    Eli:

    How can I best renumber the article numbers of all newsgroups, sorted by post date.

    Why?! Renumbering articles is generally a bad idea. Newsreaders rely on
    them to tell what's new and what to mark as read.

    If a server renumbers its articles, its readers would need to throw out
    their newsrc and show everything as unread.

    Can someone help me with this?

    - stop accepting articles on the existing server
    - make a sorted list of storage tokens (you'll probably need to write a
    script for this: loop over the history, make a list of tokens and their posting dates, then sort it)
    - set up a new empty server
    - use the sorted list to feed articles into the new server
    - swap servers
    - start accepting articles again
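    As a rough illustration of the second step, here is a minimal Python sketch (not an INN tool; the script name is hypothetical) that reads the history file and prints storage tokens ordered by posting date. It assumes the history(5) line layout `[HASH]<TAB>arrived~expires~posted<TAB>@TOKEN@`, where expires may be "-". Entries without a token, such as remembered Message-IDs, are skipped.

```python
"""Print INN storage tokens from a history file, oldest posting date first.

Assumed line layout (see history(5)):
    [HASH]<TAB>arrived~expires~posted<TAB>@TOKEN@
"""
import sys
from typing import Iterable, List, Tuple


def tokens_by_post_date(lines: Iterable[str]) -> List[Tuple[int, str]]:
    entries: List[Tuple[int, str]] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2]:
            continue  # no storage token, nothing to feed
        times = fields[1].split("~")
        if len(times) < 3 or not times[2].isdigit():
            continue  # no usable posting date
        entries.append((int(times[2]), fields[2]))
    # Python's sort is stable: equal dates keep their history order.
    entries.sort(key=lambda entry: entry[0])
    return entries


if __name__ == "__main__":
    # Hypothetical usage: python3 sort_tokens.py < history > tokens.sorted
    for _posted, token in tokens_by_post_date(sys.stdin):
        print(token)
```

    innxmit(8) reads batch files made of storage tokens, so a list like this could drive the feed to the new server. Check the batch-file format documented for your INN version before relying on that.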

  • From Eli@21:1/5 to Frank on Tue Apr 25 09:34:09 2023
    On 23 Apr 2023 at 18:08:22 CEST, "Frank" <franky@xxx.yyy> wrote:

    Eli:

    How can I best renumber the article numbers of all newsgroups, sorted by post
    date.

    Why?! Renumbering articles is generally a bad idea. Newsreaders rely on
    them to tell what's new and what to mark as read.

    If a server renumbers its articles, its readers would need to throw out
    their newsrc and show everything as unread.

    It is a new server and has no readers yet, so that isn't a problem.

    Can someone help me with this?

    - stop accepting articles on the existing server
    - make a sorted list of storage tokens (you'll probably need to write a script for this: loop over the history, make a list of tokens and their posting dates, then sort it)
    - set up a new empty server
    - use the sorted list to feed articles into the new server
    - swap servers
    - start accepting articles again

    With 300,000,000 articles, the server is quite large and transferring these articles to another server over a single connection takes forever.

    Sorting the history file by post date is no problem, but is it possible to rebuild the overview databases based on this sorted history file?

    Perhaps this is more of a question for Julien to answer?

  • From Matija Nalis@21:1/5 to Eli on Fri Apr 28 15:11:26 2023
    On Tue, 25 Apr 2023 09:34:09 GMT, Eli <eliistheman@gmail.com> wrote:
    On 23 Apr 2023 at 18:08:22 CEST, "Frank" <franky@xxx.yyy> wrote:
    If a server renumbers its articles, its readers would need to throw out
    their newsrc and show everything as unread.

    It is a new server and has no readers yet, so that isn't a problem.

    I don't think sorting the history file will help in your case.

    IIRC (but it has been some time, so take this with a grain of salt), in tradspool, each newsgroup is a separate directory, and each message is
    stored in a separate file, whose filename is the number of the article.

    So e.g. /var/spool/inn2/news/admin/peering/1234 will be the name of the file for message number 1234 in `news.admin.peering`. The file named 1233 would be the
    message shown before it, and 1235 the message after it. If
    you want them to be in chronological order (so that `article 1233` shows an
    OLDER message than `article 1234`), you'll need to rename those files
    accordingly.

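    The layout Matija describes can be written as a tiny Python helper. This is a sketch that assumes <patharticles> is the spool root (/var/spool/inn2 in his example) and that each dot-separated part of the group name becomes one directory level. The function name is hypothetical.

```python
from pathlib import PurePosixPath


def tradspool_path(patharticles: str, group: str, artnum: int) -> PurePosixPath:
    """Path of article `artnum` of `group` in a tradspool spool (assumed layout)."""
    # news.admin.peering -> <patharticles>/news/admin/peering/<artnum>
    return PurePosixPath(patharticles, *group.split("."), str(artnum))


print(tradspool_path("/var/spool/inn2", "news.admin.peering", 1234))
# -> /var/spool/inn2/news/admin/peering/1234
```
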
    So, if you do not want to involve another server, such a renumbering would
    require the following (hopefully I'm not forgetting an important step):

    - stop/pause inn
    - for each group, rename files so their numbers sequentially follow the
    chronological order of `Date` headers in their content (you might need
    to write a relatively simple script for that; I don't know if any exist
    already)
    - do "ctlinnd renumber ''" to update active file with new low/high watermarks
    - rm old tradindexed overviews (just to be safe) & force tradindexed
    overview rebuild from scratch for each group.
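    Here is a minimal Python sketch of the renaming step for one group directory. It assumes each article's posting date has already been collected as epoch seconds, and the function names are hypothetical, not INN tools. New numbers start at the group's current lowest number. Files are moved through temporary names so that no article overwrites one that has not been moved yet. Crosspost symlinks and Xref headers are not handled here.

```python
import os
from typing import Dict


def renumber_plan(dates: Dict[int, int]) -> Dict[int, int]:
    """Map old article number -> new number so numbers follow posting dates.

    `dates` maps each article number in the group to its posting date
    (epoch seconds). Numbering starts at the group's current lowest number;
    articles with equal dates keep their old relative order.
    """
    if not dates:
        return {}
    first = min(dates)
    ordered = sorted(dates, key=lambda num: (dates[num], num))
    return {old: first + i for i, old in enumerate(ordered)}


def apply_plan(groupdir: str, plan: Dict[int, int]) -> None:
    """Rename the article files of one tradspool group directory.

    Pass 1 moves every file that changes number to a temporary name,
    pass 2 moves it to its final name, so nothing is overwritten.
    """
    moves = {old: new for old, new in plan.items() if old != new}
    for old in moves:
        os.rename(os.path.join(groupdir, str(old)),
                  os.path.join(groupdir, f"{old}.renum-tmp"))
    for old, new in moves.items():
        os.rename(os.path.join(groupdir, f"{old}.renum-tmp"),
                  os.path.join(groupdir, str(new)))
```
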

    Note that even in the simplest case, because you need to open and parse at
    least a dozen lines from each of 300M files, then sort and mass-rename them,
    and finally rebuild the overviews, this will likely also take significant time.

    Depending on the underlying storage you might be able to gain speed
    increase via parallelizing this process by processing multiple groups at
    the same time. Or you might not (e.g. with HDDs, it is quite possible
    that disk thrashing would make it much slower if you try to parallelize).

    In a more advanced case (if your overview records are fine and not
    suspect), you could probably parse them instead of the article files to gain speed, though that might make the script more complicated.

    - use the sorted list to feed articles into the new server

    With 300,000,000 articles, the server is quite large and transferring these articles to another server over a single connection takes forever.

    Using another server is the alternative, yes. You can run it in parallel over multiple connections, but only with per-group parallelism (not at article
    level!).

    I.e. you must feed all articles in news.admin.peering sequentially in
    wanted chronological order, but you can feed news.software.nntp (and any
    number of other groups) at the same time (but articles in each group must
    be fed sequentially). Using CNFS on new server might make it faster too
    (but it has its own quirks to be aware of).

    Sorting the history file by post date is no problem, but is it possible to rebuild the overview databases based on this sorted history file?

    The history file is used to detect duplicate articles when receiving them, so
    they can be promptly refused. It is, AFAIR, not exposed to users otherwise.

    What is shown to NNRP users IIRC depends on the per-group overview databases (tradindexed in your case?), e.g. via `XOVER 1234-`, which in turn depend on
    how the articles are numbered in the spool itself (determined by filenames in
    the case of tradspool), e.g. via `ARTICLE 1234`.

    --
    Opinions above are GNU-copylefted.

  • From Eli@21:1/5 to All on Fri Apr 28 18:32:45 2023
    On 28 Apr 2023 at 15:11:26 CEST, "Matija Nalis" <mnalis-news@voyager.hr>
    wrote:

    On Tue, 25 Apr 2023 09:34:09 GMT, Eli <eliistheman@gmail.com> wrote:
    On 23 Apr 2023 at 18:08:22 CEST, "Frank" <franky@xxx.yyy> wrote:
    If a server renumbers its articles, its readers would need to throw out
    their newsrc and show everything as unread.

    It is a new server and has no readers yet, so that isn't a problem.

    So if you do not want to involve another server; to do such renumber,
    one would need to (hopefully I'm not forgetting important step):

    Thank you very much for the detailed explanation. However, I wonder if
    manually renumbering the article files works, since crossposts are stored as symbolic links. But it might be worth a try.

    With 300,000,000 articles, the server is quite large and transferring these articles to another server over a single connection takes forever.

    Using another server is the alternative, yes. You can run it in parallel over multiple connections, but only with per-group parallelism (not at article
    level!).

    With single or multiple connections, things will probably go wrong again due
    to the crossposts.

    As an example: Suppose there are two newsgroups, named A and B.
    Both newsgroups have articles from the years 2003 to 2023.

    First, newsgroup A is transferred to the new server.
    Newsgroup A has an article from 2022 that has been crossposted to newsgroup B. Since newsgroup B does not yet have articles on the new server, this article will get article number 1 in newsgroup B. So the same problem arises again on the new server.

    Newsgroup B (new server):
    Article number 1: 2022
    Article number 2: 2003

    So it seems that renumbering by posted date is not possible at all due to the crossposts.

  • From Matija Nalis@21:1/5 to Eli on Sat Apr 29 22:12:47 2023
    On Fri, 28 Apr 2023 18:32:45 GMT, Eli <eliistheman@gmail.com> wrote:
    On 28 Apr 2023 at 15:11:26 CEST, "Matija Nalis" <mnalis-news@voyager.hr>
    So if you do not want to involve another server; to do such renumber,
    one would need to (hopefully I'm not forgetting important step):

    Thank you very much for the detailed explanation. However, I wonder if manually renumbering the article files works, since crossposts are stored as symbolic links. But it might be worth a try.

    I'm not sure, but I think crossposts may have been stored as hardlinks instead? If that is true, then they wouldn't mind such renaming.

    But if they are indeed symlinks, then yes, your script would need to fix them too (by looking at the Newsgroups header, doing readdir() in each group, and using readlink(2) to see where each link points, until it finds the ones that need to be
    fixed). That would obviously make it even slower, yes.

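    If the crossposts do turn out to be symlinks, the fix-up pass could look roughly like this Python sketch. It assumes all renames are done first and recorded as "old absolute path -> new absolute path". It then rereads every link in the given group directories with os.readlink() and re-points the ones whose target moved. Relative links stay relative. This is an illustration under those assumptions, not a tested INN tool.

```python
import os
from typing import Dict, Iterable


def fix_symlinks(groupdirs: Iterable[str], moved: Dict[str, str]) -> int:
    """Re-point crosspost symlinks whose target article file was renamed.

    `moved` maps each renamed file's old absolute path to its new absolute
    path. Returns the number of links rewritten.
    """
    fixed = 0
    for groupdir in groupdirs:
        groupdir = os.path.abspath(groupdir)
        with os.scandir(groupdir) as it:
            entries = list(it)  # snapshot: the directory is modified below
        for entry in entries:
            if not entry.is_symlink():
                continue
            target = os.readlink(entry.path)
            old_abs = os.path.normpath(os.path.join(groupdir, target))
            new_abs = moved.get(old_abs)
            if new_abs is None:
                continue
            # Keep the link's style: relative links stay relative.
            if os.path.isabs(target):
                new_target = new_abs
            else:
                new_target = os.path.relpath(new_abs, groupdir)
            tmp = entry.path + ".relink-tmp"
            os.symlink(new_target, tmp)
            os.replace(tmp, entry.path)  # swap the link in one rename
            fixed += 1
    return fixed
```
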
    With 300,000,000 articles, the server is quite large and transferring these
    articles to another server over a single connection takes forever.

    Using another server is the alternative, yes. You can run it in parallel over multiple connections, but only with per-group parallelism (not at article
    level!).

    With single or multiple connections, things will probably go wrong again due to the crossposts.

    Ah yes, you are correct, crossposts would break parallelism with multiple connections.

    But, it should still work for single connection, given good preparation
    (see below).

    So it seems that renumbering by posted date is not possible at all due to the crossposts.

    You'd first have to create a list of all messages sorted by date (a sorted history file would be great for that, were it not for the fact that it
    contains ONLY articles that arrived in the last xx days, and not ALL of them).

    And then you would simply feed the articles from that sorted list to the
    new (empty) server.

    They would arrive on the new server just as they did in real life,
    in chronological order, to one group or the other, and even crossposts
    would arrive correctly (as message "X" would be in all cases after all
    older ones, but before all newer ones, regardless of group(s) they were
    posted to).
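    A toy Python model shows why a single feed sorted by date avoids the crosspost problem: each group simply hands out its next number to every arriving article, crossposts included. The model is a deliberate simplification, not innd's actual code. It uses years instead of full dates and ignores the active file and expiry.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (posting year, Message-ID, newsgroups); years keep the example short
Article = Tuple[int, str, List[str]]


def assign_numbers(feed: List[Article]) -> Dict[str, Dict[int, int]]:
    """Give each article the next free number in every group it is posted to.

    Returns group -> {article number: posting year}.
    """
    next_num: Dict[str, int] = defaultdict(lambda: 1)
    numbered: Dict[str, Dict[int, int]] = defaultdict(dict)
    for year, _msgid, groups in feed:
        for group in groups:
            numbered[group][next_num[group]] = year
            next_num[group] += 1
    return numbered


a_old = (2003, "<a@example>", ["A"])
b_old = (2003, "<b@example>", ["B"])
cross = (2022, "<x@example>", ["A", "B"])

# Group A fed completely before group B:
print(dict(assign_numbers([a_old, cross, b_old])["B"]))  # {1: 2022, 2: 2003}
# One feed of all articles, sorted by posting date:
by_date = sorted([a_old, cross, b_old], key=lambda art: art[0])
print(dict(assign_numbers(by_date)["B"]))  # {1: 2003, 2: 2022}
```
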

    --
    Opinions above are GNU-copylefted.

  • From Julien ÉLIE@21:1/5 to All on Sat Apr 29 23:01:37 2023
    Hi Eli,

    As an example: Suppose there are two newsgroups, named A and B.
    Both newsgroups have articles from the years 2003 to 2023.

    First, newsgroup A is transferred to the new server.
    Newsgroup A has an article from 2022 that has been crossposted to newsgroup B.
    Since newsgroup B does not yet have articles on the new server, this article will get article number 1 in newsgroup B. So the same problem arises again on the new server.

    Newsgroup B (new server):
    Article number 1: 2022
    Article number 2: 2003

    So it seems that renumbering by posted date is not possible at all due to the crossposts.

    If you're renumbering the articles like Matija suggested for tradspool,
    you won't encounter that problem, as you are not transferring articles from
    one server to another but rebuilding the history file and overview data
    from your renumbered tradspool.

    What you are describing is a pullnews-like scenario ("newsgroup A is transferred to the new server").

    --
    Julien ÉLIE

    « I had some words with my wife, and she had some paragraphs with me. »
    (Sigmund Freud)

  • From Julien ÉLIE@21:1/5 to All on Sat Apr 29 23:17:13 2023
    Hi Matija,

    - for each group, rename files so their numbers sequentially follow the
    chronological order of `Date` headers in their content (you might need
    to write a relatively simple script for that; I don't know if any exist
    already)

    FWIW, in <patharticles>, the dates can be obtained with something like:
    grep -m 1 '^Date: ' *
    and the header field values converted to epoch with the convdate tool,
    like in:
    convdate -n 'Fri, 28 Apr 2023 15:11:26 +0200'
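    The same lookup can be written in Python using only the standard library. This sketch reads just the header section of an article file and converts the first Date header with email.utils.parsedate_to_datetime, which should give the same epoch value as `convdate -n`. The function name is hypothetical, and folded (multi-line) Date headers are not handled.

```python
import email.utils
from datetime import timezone
from typing import Optional


def posting_epoch(path: str) -> Optional[int]:
    """Epoch seconds of the first Date header in an article file, or None.

    Only the header section (up to the first empty line) is scanned, so a
    "Date:" line inside the body is never used.
    """
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if not line:
                break  # end of headers
            if line[:5].lower() == b"date:":
                text = line[5:].decode("ascii", "replace").strip()
                try:
                    dt = email.utils.parsedate_to_datetime(text)
                except (TypeError, ValueError):
                    return None
                if dt.tzinfo is None:  # "-0000" zone: treat as UTC
                    dt = dt.replace(tzinfo=timezone.utc)
                return int(dt.timestamp())
    return None
```
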


    You'll also need to update the Xref header fields in the articles.

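    A crossposted article's Xref header lists its number in every group, so the renumbering map has to be built for all groups before any Xref is rewritten. Here is a minimal Python sketch of the per-header rewrite. It assumes the usual `Xref: server group:number ...` format, and the function name is hypothetical.

```python
from typing import Dict


def rewrite_xref(header: str, renum: Dict[str, Dict[int, int]]) -> str:
    """Rewrite an "Xref: server group:num ..." header after renumbering.

    `renum` maps group -> {old number: new number}; groups or numbers that
    are not in the map are left untouched.
    """
    name, _, value = header.partition(":")
    fields = value.split()
    if name.strip().lower() != "xref" or not fields:
        return header
    out = [fields[0]]  # the server name
    for item in fields[1:]:
        group, sep, num = item.rpartition(":")
        if sep and num.isdigit():
            new = renum.get(group, {}).get(int(num))
            if new is not None:
                item = f"{group}:{new}"
        out.append(item)
    return "Xref: " + " ".join(out)
```
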
    --
    Julien ÉLIE

    « Ferpectly! » (Obélix)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sun Apr 30 18:23:26 2023
    On 29 Apr 2023 at 23:17:13 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Matija,

    - for each group, rename files so their numbers sequentially follow the
    chronological order of `Date` headers in their content (you might need
    to write a relatively simple script for that; I don't know if any exist already)

    FWIW, in <patharticles>, the dates can be obtained with something like:
    grep -m 1 '^Date: ' *
    and the header field values converted to epoch with the convdate tool,
    like in:
    convdate -n 'Fri, 28 Apr 2023 15:11:26 +0200'


    You'll also need to update the Xref header fields in the articles.

    Thank you Julien.

  • From Eli@21:1/5 to All on Sun Apr 30 18:22:41 2023
    On 29 Apr 2023 at 22:12:47 CEST, "Matija Nalis" <mnalis-news@voyager.hr>
    wrote:

    On Fri, 28 Apr 2023 18:32:45 GMT, Eli <eliistheman@gmail.com> wrote:
    On 28 Apr 2023 at 15:11:26 CEST, "Matija Nalis" <mnalis-news@voyager.hr>
    So if you do not want to involve another server; to do such renumber,
    one would need to (hopefully I'm not forgetting important step):

    Thank you very much for the detailed explanation. However, I wonder if
    manually renumbering the article files works, since crossposts are stored as symbolic links. But it might be worth a try.

    I'm not sure, but I think crossposts may have been stored as hardlinks instead?
    If that is true, then they wouldn't mind such renaming.

    But if they are indeed symlinks, then yes, your script would need to fix them too (by looking at the Newsgroups header, doing readdir() in each group, and using readlink(2) to see where each link points, until it finds the ones that need to be
    fixed). That would obviously make it even slower, yes.

    With 300,000,000 articles, the server is quite large and transferring these
    articles to another server over a single connection takes forever.

    Using another server is the alternative, yes. You can run it in parallel over multiple connections, but only with per-group parallelism (not at article
    level!).

    With single or multiple connections, things will probably go wrong again due to the crossposts.

    Ah yes, you are correct, crossposts would break parallelism with multiple connections.

    But, it should still work for single connection, given good preparation
    (see below).

    So it seems that renumbering by posted date is not possible at all due to the
    crossposts.

    You'd first have to create a list of all messages sorted by date (a sorted history file would be great for that, were it not for the fact that it contains ONLY articles that arrived in the last xx days, and not ALL of them).

    And then you would simply feed the articles from that sorted list to the
    new (empty) server.

    They would arrive on the new server just as they did in real life,
    in chronological order, to one group or the other, and even crossposts would arrive correctly (as message "X" would be in all cases after all
    older ones, but before all newer ones, regardless of group(s) they were posted to).

    I'll give it a try soon.
    Thanks for the info.

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri May 5 18:01:27 2023
    On 29 Apr 2023 at 23:01:37 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    As an example: Suppose there are two newsgroups, named A and B.
    Both newsgroups have articles from the years 2003 to 2023.

    First, newsgroup A is transferred to the new server.
    Newsgroup A has an article from 2022 that has been crossposted to newsgroup B.
    Since newsgroup B does not yet have articles on the new server, this article will get article number 1 in newsgroup B. So the same problem arises again on
    the new server.

    Newsgroup B (new server):
    Article number 1: 2022
    Article number 2: 2003

    So it seems that renumbering by posted date is not possible at all due to the
    crossposts.

    If you're renumbering the articles like Matija suggested for tradspool,
    you won't encounter that problem, as you are not transferring articles from one server to another but rebuilding the history file and overview data
    from your renumbered tradspool.

    What you are describing is a pullnews-like scenario ("newsgroup A is transferred to the new server").

    Is there a way to do the same when using the timecaf storage?

  • From Julien ÉLIE@21:1/5 to All on Fri May 5 20:09:07 2023
    Hi Eli,

    If you're renumbering the articles like Matija suggested for tradspool,
    you won't encounter that problem, as you are not transferring articles from
    one server to another but rebuilding the history file and overview data
    from your renumbered tradspool.

    Is there a way to do the same when using the timecaf storage?

    Renumbering articles stored in timecaf buffers in place? No, that's not
    simple at all; you would need to rewrite the whole CAF (index + articles).
    Only tradspool can be done with "rudimentary" grep/sed commands.

    --
    Julien ÉLIE

    « I had some words with my wife, and she had some paragraphs with me. »
    (Sigmund Freud)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri May 5 18:44:16 2023
    On 5 May 2023 at 20:09:07 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    If you're renumbering the articles like Matija suggested for tradspool,
    you won't encounter that problem, as you are not transferring articles from
    one server to another but rebuilding the history file and overview data
    from your renumbered tradspool.

    Is there a way to do the same when using the timecaf storage?

    Renumbering articles stored in timecaf buffers in place? No, that's not simple at all; you would need to rewrite the whole CAF (index + articles).
    Only tradspool can be done with "rudimentary" grep/sed commands.

    That is sad to hear.

    So would it actually be better if pullnews downloaded all articles per newsgroup and ignored the crossposts? Just download everything first, save the articles in their folders and add the Xref field, nothing more. Then, when all articles from all newsgroups have been downloaded, use 'ctlinnd renumber' to import the articles into INN and rebuild the overview data? Would filtering
    still work in this case?

  • From Julien ÉLIE@21:1/5 to All on Fri May 5 21:37:05 2023
    Hi Eli,

    So would it actually be better if pullnews downloaded all articles per newsgroup and ignored the crossposts? Just download everything first, save the articles in their folders and add the Xref field, nothing more.

    I'm wondering whether you could just:

    - Download all the articles with "pullnews -r" (it will write a file
    with all the articles within). You may run several instances of
    pullnews to have several files.

    - Parse the articles within these files (they are separated with "#!
    rnews <size>" lines) to take the dates and write the articles in a new
    batch file, ordered by posting date.

    - Inject these batch files into innd (with rnews). No need to change
    any Xref header fields. The articles will be treated in order, assigned
    new Xref, and you'll have article numbers and history file sorted as you
    want.
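    The second step could be sketched in Python as follows; the function names are hypothetical. The sketch assumes each batch entry is a `#! rnews <size>` line followed by exactly <size> bytes of article. It sorts an in-memory index of (date, original position, file, offset, size) rather than the articles themselves, then writes one merged batch. Articles without a parseable Date header go to the end. At 300 million articles even this index would be large, so writing it to a text file and sorting that with sort(1) would scale better.

```python
import email.utils
from contextlib import ExitStack
from datetime import timezone
from typing import BinaryIO, Iterator, List, Tuple


def read_batch(f: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, article) for every "#! rnews <size>" entry in a batch."""
    while True:
        line = f.readline()
        if not line:
            return
        if not line.startswith(b"#! rnews "):
            raise ValueError(f"unexpected batch line: {line[:40]!r}")
        size = int(line.split()[2])
        offset = f.tell()
        article = f.read(size)
        if len(article) != size:
            raise ValueError("truncated batch")
        yield offset, article


def post_date(article: bytes) -> float:
    """Epoch seconds of the Date header; undated articles sort last."""
    for line in article.split(b"\n"):
        line = line.rstrip(b"\r")
        if not line:
            break  # end of the header section
        if line[:5].lower() == b"date:":
            try:
                dt = email.utils.parsedate_to_datetime(
                    line[5:].decode("ascii", "replace").strip())
            except (TypeError, ValueError):
                break
            if dt.tzinfo is None:
                dt = dt.replace(tzinfo=timezone.utc)
            return dt.timestamp()
    return float("inf")


def merge_by_date(paths: List[str], out_path: str) -> int:
    """Merge several rnews batches into one, ordered by posting date."""
    index = []
    for path in paths:
        with open(path, "rb") as f:
            for offset, article in read_batch(f):
                index.append((post_date(article), len(index), path, offset, len(article)))
    index.sort()  # by date, then by original position (unique)
    with ExitStack() as stack:
        files = {p: stack.enter_context(open(p, "rb")) for p in set(paths)}
        out = stack.enter_context(open(out_path, "wb"))
        for _date, _seq, path, offset, size in index:
            src = files[path]
            src.seek(offset)
            out.write(b"#! rnews %d\n" % size)
            out.write(src.read(size))
    return len(index)
```

    For example, `merge_by_date(["rnews01.batch", "rnews02.batch"], "sorted.batch")` would produce a single batch that could then be injected with rnews.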


    Would filtering still work in this case?

    It will work as all the articles will be processed by innd.

    --
    Julien ÉLIE

    « – Is Rotomagnus that way?
    – Maybe so. » (Astérix)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri May 5 20:58:59 2023
    On 5 May 2023 at 21:37:05 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    I'm wondering whether you could just:

    - Download all the articles with "pullnews -r" (it will write a file
    with all the articles within). You may run several instances of
    pullnews to have several files.

    - Parse the articles within these files (they are separated with "#!
    rnews <size>" lines) to take the dates and write the articles in a new
    batch file, ordered by posting date.

    - Inject these batch files into innd (with rnews). No need to change
    any Xref header fields. The articles will be treated in order, assigned
    new Xref, and you'll have article numbers and history file sorted as you want.

    Hi Julien,

    I let pullnews -r export about 4000 articles to a batch file named 'rnews01.batch'.
    The articles in the batch file are complete (i.e. headers and bodies), each separated by '#! rnews <bytes>'.

    Then I used 'rnews -v rnews01.batch'
    But unfortunately INN doesn't accept the articles.

    Each article is refused with the error:
    "rnews01.batch: rejected 437 No body [Path: not-for-mail ...]"

    Any suggestion?

  • From Eli@21:1/5 to Eli on Fri May 5 21:10:38 2023
    On 5 May 2023 at 22:58:59 CEST, "Eli" <eliistheman@gmail.com> wrote:

    On 5 May 2023 at 21:37:05 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    I'm wondering whether you could just:

    - Download all the articles with "pullnews -r" (it will write a file
    with all the articles within). You may run several instances of
    pullnews to have several files.

    - Parse the articles within these files (they are separated with "#!
    rnews <size>" lines) to take the dates and write the articles in a new
    batch file, ordered by posting date.

    - Inject these batch files into innd (with rnews). No need to change
    any Xref header fields. The articles will be treated in order, assigned
    new Xref, and you'll have article numbers and history file sorted as you
    want.

    Hi Julien,

    I let pullnews -r export about 4000 articles to a batch file named 'rnews01.batch'.
    The articles in the batch file are complete (i.e. headers and bodies), each separated by '#! rnews <bytes>'.

    Then I used 'rnews -v rnews01.batch'
    But unfortunately INN doesn't accept the articles.

    Each article is refused with the error:
    "rnews01.batch: rejected 437 No body [Path: not-for-mail ...]"

    Any suggestion?

    I see something strange in the news log.
    For each of the above articles it says:
    "May 5 22:36:04.708 - not-for-mail <msg-id>^M 437 No body"

    Note the '^M'. It seems it doesn't understand this newline character? The 'rnews01.batch' file contains these '^M' characters at the end of each line.

  • From Julien ÉLIE@21:1/5 to All on Fri May 5 23:38:53 2023
    Hi Eli,

    I've tried to convert the batchfile using 'dos2unix' and also 'sed -e "s/\r//g"'

    It won't work because the <size> changes...
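    Until pullnews is fixed, a workaround would be to convert each article separately and write a fresh `#! rnews <size>` line with the new byte count. Here is a minimal Python sketch (script and function names hypothetical). It assumes the size in the original batch counts the CR bytes. That is why the declared size no longer matches after a blanket dos2unix, and why rnews would then read past the end of an article into the next one.

```python
import sys
from typing import BinaryIO


def crlf_batch_to_lf(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy an rnews batch, turning CRLF into LF inside each article and
    writing every "#! rnews <size>" line with the corrected byte count."""
    count = 0
    while True:
        line = src.readline()
        if not line:
            return count
        if not line.startswith(b"#! rnews "):
            raise ValueError(f"unexpected batch line: {line[:40]!r}")
        size = int(line.split()[2])  # split() also drops a trailing CR
        article = src.read(size)
        if len(article) != size:
            raise ValueError("truncated batch")
        article = article.replace(b"\r\n", b"\n")
        dst.write(b"#! rnews %d\n" % len(article))
        dst.write(article)
        count += 1


if __name__ == "__main__":
    # Hypothetical usage: python3 crlf_fix.py rnews01.batch rnews01.lf.batch
    with open(sys.argv[1], "rb") as src, open(sys.argv[2], "wb") as dst:
        print(crlf_batch_to_lf(src, dst), "articles converted")
```
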


    but apart from the errors disappearing, the articles are not
    transferred at all. The news log and the other logs remain completely empty.

    Aren't these articles already in your spool?
    If the Message-IDs are already in the history, rnews won't try to send them.

    --
    Julien ÉLIE

    « Better to light a candle than to curse the darkness. » (Lao Zi)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri May 5 21:38:58 2023
    On 5 May 2023 at 23:36:44 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    I see something strange in the news log.
    For each of the above articles it says:
    "May 5 22:36:04.708 - not-for-mail <msg-id>^M 437 No body"

    Note the '^M'. It seems it doesn't understand this newline character?
    The 'rnews01.batch' file contains these '^M' characters at the end of each line.

    Indeed, I'll have a look. Either by having pullnews write articles with
    mere LF, or/and having rnews understand CRLF.

    Unfortunately, if you change CRLF by hand, <size> becomes wrong in "#!
    rnews <size>"...

    Ah, that's why dos2unix doesn't work!
    I look forward to your solution.

    Thanks.

  • From Eli@21:1/5 to Eli on Fri May 5 21:37:00 2023
    On 5 May 2023 at 23:10:38 CEST, "Eli" <eliistheman@gmail.com> wrote:

    On 5 May 2023 at 22:58:59 CEST, "Eli" <eliistheman@gmail.com> wrote:

    On 5 May 2023 at 21:37:05 CEST, "Julien ÉLIE"
    <iulius@nom-de-mon-site.com.invalid> wrote:

    I'm wondering whether you could just:

    - Download all the articles with "pullnews -r" (it will write a file
    with all the articles within). You may run several instances of
    pullnews to have several files.

    - Parse the articles within these files (they are separated with "#!
    rnews <size>" lines) to take the dates and write the articles in a new
    batch file, ordered by posting date.

    - Inject these batch files into innd (with rnews). No need to change
    any Xref header fields. The articles will be treated in order, assigned new Xref, and you'll have article numbers and history file sorted as you want.

    Hi Julien,

    I let pullnews -r export about 4000 articles to a batch file named
    'rnews01.batch'.
    The articles in the batch file are complete (i.e. headers and bodies), each separated by '#! rnews <bytes>'.

    Then I used 'rnews -v rnews01.batch'
    But unfortunately INN doesn't accept the articles.

    Each article is refused with the error:
    "rnews01.batch: rejected 437 No body [Path: not-for-mail ...]"

    Any suggestion?

    I see something strange in the news log.
    For each of the above articles it says:
    "May 5 22:36:04.708 - not-for-mail <msg-id>^M 437 No body"

    Note the '^M'. It seems it doesn't understand this newline character? The 'rnews01.batch' file contains these '^M' characters at the end of each line.

    I've tried to convert the batch file using 'dos2unix' and also 'sed -e "s/\r//g"', but apart from the errors disappearing, the articles are not transferred at all. The news log and the other logs remain completely empty.

  • From Julien ÉLIE@21:1/5 to All on Fri May 5 23:36:44 2023
    Hi Eli,

    I see something strange in the news log.
    For each of the above articles it says:
    "May 5 22:36:04.708 - not-for-mail <msg-id>^M 437 No body"

    Note the '^M'. It seems it doesn't understand this newline character? The 'rnews01.batch' file contains these '^M' characters at the end of each line.

    Indeed, I'll have a look. Either by having pullnews write articles with
    mere LF, or/and having rnews understand CRLF.

    Unfortunately, if you change CRLF by hand, <size> becomes wrong in "#!
    rnews <size>"...

    Thanks for the report! Seems like "-r" is not a widely-used parameter...

    --
    Julien ÉLIE

    « A woman's dress should be like a lawyer's plea: long enough to
    cover the subject, short enough to hold attention. »

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri May 5 22:05:32 2023
    On 5 May 2023 at 23:38:53 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    I've tried to convert the batchfile using 'dos2unix' and also 'sed -e
    "s/\r//g"'

    It won't work because the <size> changes...


    but apart from the errors disappearing, the articles are not
    transferred at all. The news log and the other logs remain completely empty.

    Aren't these articles already in your spool?
    If the Message-IDs are already in the history, rnews won't try to send them.

    I tried again with another newsgroup, making sure the articles were not yet in the spool. For that batch (after dos2unix), only the first article was imported, but its body also contained the header of the second article. So it seems that dos2unix removes the distinction between the articles.

  • From Julien ÉLIE@21:1/5 to All on Sat May 6 00:13:51 2023
    Hi Eli,

    The 'rnews01.batch' file contains these '^M' characters at the end of each line.

    Indeed, I'll have a look. Either by having pullnews write articles with
    mere LF, or/and having rnews understand CRLF.

    I look forward to your solution.

    Could you please test this following patch?

    I've tested it with 2 articles in an rnews batch generated with rnews,
    and it was imported fine.
    (The first 2 fixes for Xref and Bytes are not needed in your case, but
    should be fixed in the final commit as well as how $tx_len is computed.)


    --- a/frontends/pullnews.in
    +++ b/frontends/pullnews.in
    @@ -1150,7 +1150,7 @@ sub crossFeedGroup {
    my $xref_h
    = "Xref: "
    . $upstreamParams->{$server}->{name}
    - . " $group:$i\n";
    + . " $group:$i\r\n";
    splice(@{$article}, $idx_blank_pre_body, 0, $xref_h);
    $tx_len += length($xref_h);
    $idx_blank_pre_body++;
    @@ -1162,7 +1162,7 @@ sub crossFeedGroup {
    # field is not counted, as well as header fields removed by
    # pullnews.
    my $bytes_real_count = $tx_len + scalar(@{$article});
    - my $bytes_h = "Bytes: $bytes_real_count\n";
    + my $bytes_h = "Bytes: $bytes_real_count\r\n";
    splice(@{$article}, $idx_blank_pre_body, 0, $bytes_h);
    $tx_len += length($bytes_h);
    $idx_blank_pre_body++;
    @@ -1207,8 +1207,13 @@ sub crossFeedGroup {
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sat May 6 06:38:28 2023
    On 6 May 2023 at 00:13:51 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Could you please test this following patch?

    I've tested it with 2 articles in an rnews batch generated with rnews,
    and it was imported fine.
    (The first 2 fixes for Xref and Bytes are not needed in your case, but
    should be fixed in the final commit as well as how $tx_len is computed.)

    Works!

    After applying the patch, all articles were successfully imported.

    Thanks Julien.

  • From Julien ÉLIE@21:1/5 to All on Sat May 6 09:28:56 2023
    Hi Eli,

    As an example: Suppose there are two newsgroups, named A and B.
    Both newsgroups have articles from the years 2003 to 2023.

    First, newsgroup A is downloaded using 'pullnews -r'.
    Then, newsgroup B is downloaded using 'pullnews -r'.

    Both groups are downloaded into two separate batch files.

    When the download is finished, the batch file created for newsgroup A is fed to INN using rnews.

    The 2 batch files have to be merged into one, ordered by posting date, and
    not fed separately to INN.

    --
    Julien ÉLIE

    « Everything is in everything, and vice versa. » (Pierre Dac)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sat May 6 07:22:30 2023
    On 5 May 2023 at 21:37:05 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    So it would actually be better if pullnews downloaded all articles per newsgroup and ignored the crossposts. Just download everything first, save the
    articles in their folders and add the Xref field. Nothing more.

    I'm wondering whether you could just:

    - Download all the articles with "pullnews -r" (it will write a file
    with all the articles within). You may run several instances of
    pullnews to have several files.

    - Parse the articles within these files (they are separated with
    "#! rnews <size>" lines) to take the dates and write the articles in a new
    batch file, ordered by posting date.

    - Inject these batch files into innd (with rnews). No need to change
    any Xref header fields. The articles will be treated in order, assigned
    new Xref, and you'll have article numbers and history file sorted as you want.
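    The parsing and reordering step could look roughly like the following Python sketch. It is not part of INN; the script name merge_batches.py is hypothetical, and it assumes LF line endings, exact byte counts on the "#! rnews <size>" lines, unfolded Date header fields, and batches small enough to hold in memory. Articles whose Date cannot be parsed are put at the end. Because it takes all batch files at once, every article (crossposts included) goes through one date-ordered stream; a second copy of a crossposted article is left in, on the expectation that innd refuses it as a duplicate Message-ID.

```python
"""Merge rnews batch files into one batch ordered by posting date.

Hypothetical helper (merge_batches.py), not part of INN.
"""
import re
import sys
from datetime import timezone
from email.utils import parsedate_to_datetime

HEADER_RE = re.compile(rb"#! rnews (\d+)\r?\n")


def split_batch(data):
    """Yield the raw bytes of each article in an rnews batch."""
    pos = 0
    while pos < len(data):
        eol = data.find(b"\n", pos) + 1
        match = HEADER_RE.fullmatch(data[pos:eol]) if eol else None
        if match is None:
            raise ValueError(f"expected '#! rnews' line at offset {pos}")
        size = int(match.group(1))
        article = data[eol:eol + size]
        if len(article) != size:
            raise ValueError(f"truncated article at offset {eol}")
        yield article
        pos = eol + size


def posting_date(article):
    """Sort key: the Date header as a UTC timestamp, or +inf if unusable."""
    head = article.split(b"\n\n", 1)[0]
    for line in head.split(b"\n"):
        if line[:5].lower() == b"date:":
            text = line[5:].decode("ascii", "replace").strip()
            try:
                when = parsedate_to_datetime(text)
            except (TypeError, ValueError):
                return float("inf")
            if when.tzinfo is None:
                # No usable zone (e.g. "-0000"); assume UTC.
                when = when.replace(tzinfo=timezone.utc)
            return when.timestamp()
    return float("inf")


def merge_batches(batches):
    """Return a single rnews batch with every article sorted by date."""
    articles = [art for data in batches for art in split_batch(data)]
    articles.sort(key=posting_date)  # stable: equal dates keep input order
    return b"".join(b"#! rnews %d\n" % len(art) + art for art in articles)


if __name__ == "__main__":
    # Usage: python merge_batches.py A.batch B.batch > merged.batch
    if len(sys.argv) < 2:
        print("usage: merge_batches.py BATCH...", file=sys.stderr)
    else:
        contents = []
        for path in sys.argv[1:]:
            with open(path, "rb") as handle:
                contents.append(handle.read())
        sys.stdout.buffer.write(merge_batches(contents))
```

    The merged file is then a single batch to hand to rnews, instead of one batch per newsgroup.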

    Hi Julien,

    I don't think this solves the problem either.

    As an example: Suppose there are two newsgroups, named A and B.
    Both newsgroups have articles from the years 2003 to 2023.

    First, newsgroup A is downloaded using 'pullnews -r'.
    Then, newsgroup B is downloaded using 'pullnews -r'.

    Both groups are downloaded into two separate batch files.

    When the download is finished, the batch file created for newsgroup A is fed to INN using rnews.

    It contains an article from 2022 that has been crossposted to newsgroup B.

    Since the batch file for newsgroup B has not been fed to INN yet, this article
    will get article number 1 in newsgroup B. So the problem arises again.

    After both batch files have been fed to INN, newsgroup B looks like:
    Article number 1: 2022
    Article number 2: 2003

    I haven't tried this in practice yet, but it doesn't seem like this method
    will work. Or am I wrong?
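    The ordering problem in this example can be reproduced with a toy model in Python. It is a deliberate simplification, not INN's code: each newsgroup hands out its next article number in arrival order, and an article whose Message-ID has already been seen is rejected as a duplicate (the assumed fate of the second copy of a crosspost). The message IDs and years are made up to mirror the example.

```python
"""Toy model of per-newsgroup article numbering (not INN's actual code)."""
from collections import defaultdict


def assign_numbers(feed):
    """Number articles in arrival order.

    feed: iterable of (message_id, year, groups) tuples.
    Returns {group: [(article_number, year), ...]}.
    """
    seen = set()
    next_number = defaultdict(lambda: 1)
    numbered = defaultdict(list)
    for message_id, year, groups in feed:
        if message_id in seen:
            continue  # duplicate Message-ID: rejected, gets no new number
        seen.add(message_id)
        for group in groups:
            numbered[group].append((next_number[group], year))
            next_number[group] += 1
    return dict(numbered)


# The batch for A holds the 2022 crosspost to B; the batch for B holds it too.
batch_a = [("<a1@example>", 2003, ["A"]), ("<x@example>", 2022, ["A", "B"])]
batch_b = [("<b1@example>", 2003, ["B"]), ("<x@example>", 2022, ["A", "B"])]

# Fed one after the other: B gets 2022 as number 1 and 2003 as number 2.
print(assign_numbers(batch_a + batch_b)["B"])  # [(1, 2022), (2, 2003)]
# Merged and sorted by date first: both groups come out in date order.
print(assign_numbers(sorted(batch_a + batch_b, key=lambda art: art[1]))["B"])  # [(1, 2003), (2, 2022)]
```

    In this model, sorting the combined feed by date before numbering is what keeps every group chronological.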

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sat May 6 07:39:10 2023
    On 6 May 2023 at 09:28:56 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,
    The 2 batch files have to be merged into one, ordered by posting date, and
    not fed separately to INN.

    Oh yes, that makes sense of course :D

    Thank you!

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed May 17 07:10:24 2023
    On 10 May 2023 at 22:40:09 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    I see in the source code that there is a hard-coded number for the
    maximum number of articles a single CAF file has room for:

    [storage/timecaf/caf.h]
    /*
    ** Number of slots to put in TOC by default. Can be raised if we ever get
    ** more than 256*1024=262144 articles in a file (frightening thought).
    */

    The "262145: CAF_ERR_ARTWONTFIT" error corresponds to it.

    Hi Julien,

    Is it possible to rebuild the history and overview data with just the timecaf files?

    This is in case the system crashes and I have only backed up the timecaf files.

  • From Julien ÉLIE@21:1/5 to All on Wed May 17 18:22:05 2023
    Hi Eli,

    Incidentally, in case you start any other thread about INN or any other
    news server, please do that in the news.software.nntp newsgroup. This
    one, news.admin.peering, is not the appropriate one for that kind of
    question. Besides, you may find useful information in other
    discussions taking place in news.software.nntp.

    Nobody has popped up yet to say that we were off-charter :-)

    --
    Julien ÉLIE

    « We always hold hands. If I let go, she shops. »

  • From Julien ÉLIE@21:1/5 to All on Wed May 17 18:17:00 2023
    Hi Eli,

    I see in the source code that there is a hard-coded number for the
    maximum number of articles a single CAF file has room for:

    [storage/timecaf/caf.h]
    /*
    ** Number of slots to put in TOC by default. Can be raised if we ever get
    ** more than 256*1024=262144 articles in a file (frightening thought).
    */

    Is it possible to rebuild the history and overview data with just the timecaf files?

    Yes, with "makehistory -O -F".


    This is in case the system crashes and I have only backed up the timecaf files.

    Of course, I hope it won't crash.

    --
    Julien ÉLIE

    « When I was little, I used to skip swimming lessons. » (Philippe Geluck)

  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed May 17 17:38:39 2023
    On 17 May 2023 at 18:22:05 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Incidentally, in case you start any other thread about INN or any other
    news server, please do that in the news.software.nntp newsgroup.

    You are absolutely correct.

    Will do.
