It seems like it should be possible to use the same batch file that you generate for innxmit (when feeding all articles to another server, for example) with innfeed, but I can't seem to get this to work as expected. I was
hoping to leverage innfeed's multiple connections to get through this process faster. Streaming 290,000,000 articles via a single connection is s-l-o-w.
I create a custom innfeed configuration file (a copy of the stock innfeed.conf with the PID file changed and the remote peer added), place
the batch file in /usr/local/news/spool/innfeed/peername, and start innfeed using the custom configuration file. It seems to read through the entire
batch file, but it only offers a small number of articles to the remote server, and the input file is removed.
I'm not sure why it isn't going through the entire batch. Any ideas?
This is what is logged:
May 13 00:29:27 spool1 innfeed[28888]: ME starting at Sat May 13 00:29:27 2023 (INN 2.7.0)
May 13 00:29:27 spool1 innfeed[28888]: loading /usr/local/news/etc/spoolfeed.conf
May 13 00:29:30 spool1 innfeed[28888]: gatekeeper new hand-prepared backlog file
May 13 00:29:30 spool1 innfeed[28888]: gatekeeper grabbing external tape file
May 13 00:29:30 spool1 innfeed[28888]: gatekeeper:0 connected
May 13 00:29:30 spool1 innfeed[28888]: gatekeeper remote MODE STREAM
May 13 00:31:58 spool1 innfeed[15156]: ME time 600511 idle 600511(26) blstats 0(26) stsfile 0(0) write 0(0)
May 13 00:36:28 spool1 innfeed[28888]: ME received shutdown signal
May 13 00:36:28 spool1 innfeed[28888]: gatekeeper checkpoint seconds 418 offered 17 accepted 0 refused 17 rejected 0 missing 0 accsize 0 rejsize 0 spooled 0 on_close 0 unspooled 17 deferred 0/0.0 requeued 0 queue 0.0/10:100,0,0,0,0,0
May 13 00:36:28 spool1 innfeed[28888]: gatekeeper final seconds 418 offered 17 accepted 0 refused 17 rejected 0 missing 0 accsize 0 rejsize 0 spooled 0 on_close 0 unspooled 17 deferred 0/0.0 requeued 0 queue 0.0/10:100,0,0,0,0,0
May 13 00:36:28 spool1 innfeed[28888]: gatekeeper:0 checkpoint seconds 418 offered 17 accepted 0 refused 17 rejected 0 accsize 0 rejsize 0
May 13 00:36:28 spool1 innfeed[28888]: gatekeeper:0 final seconds 418 offered 17 accepted 0 refused 17 rejected 0 accsize 0 rejsize 0
May 13 00:36:28 spool1 innfeed[28888]: gatekeeper global seconds 418 offered 17 accepted 0 refused 17 rejected 0 missing 0 accsize 0 rejsize 0 spooled 0 unspooled 17
May 13 00:36:28 spool1 innfeed[28888]: ME global seconds 421 offered 17 accepted 0 refused 17 rejected 0 missing 0 accsize 0 rejsize 0 spooled 0 unspooled 17
May 13 00:36:28 spool1 innfeed[28888]: ME finishing at Sat May 13 00:36:28 2023
> I was hoping to leverage innfeed's multiple connections to get through this process
> faster. Streaming 290,000,000 articles via a single connection is s-l-o-w.
>
> So I see that innfeed wants:
> @token@ <message-id>
> How would I go about creating a batch in the format innfeed needs from my history file?
Hi Jesse,
Couldn't you run several innxmit instances in parallel?
If you're worrying about a chronological feed, you may split the file containing your tokens to feed into several interleaved parts. For
instance with 4 innxmit instances, one file with tokens 1, 5, 9, etc.
another file with tokens 2, 6, 10, etc.
We may assume they will be fed at a similar pace.
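As a rough illustration of the interleaved split described above, here is a minimal sketch. The function name split_interleaved and the PREFIX.N output naming are made up for this example (they are not part of INN), and it assumes the batch has one entry per line.

```shell
# Hypothetical helper (not an INN command): split a batch file into N
# interleaved parts. Line 1 goes to PREFIX.0, line 2 to PREFIX.1, ...,
# line N+1 back to PREFIX.0, and so on.
split_interleaved() {
    batch=$1    # input batch file, one entry per line
    parts=$2    # number of output files, for example 4
    prefix=$3   # output files are named PREFIX.0, PREFIX.1, ...
    awk -v n="$parts" -v p="$prefix" '
        { print > (p "." ((NR - 1) % n)) }
    ' "$batch"
}
```

With 4 parts, PREFIX.0 receives entries 1, 5, 9, etc. and PREFIX.1 receives entries 2, 6, 10, etc. Each part could then be handed to its own innxmit instance.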
Last time we spoke about that, running "sm -H '@token@'" to retrieve the Message-ID was not fast enough (especially if you have to run it on
millions of articles). I don't know how else it could be done from the command line.
Otherwise, innfeed should be modified to look up the Message-ID, when not given, in a similar way to innxmit.
As innfeed is normally run from a newsfeeds entry, and innd has the Message-ID at
that time, innd gives it along with the storage token, which saves the lookup. innfeed needs it because of the NNTP protocol (IHAVE <mid>). That's certainly why innfeed expects that format.
So I see that innfeed wants:
@token@ <message-id>
The history file does not contain the Message-ID. The first 17 lines of the batch file I used for innfeed do have one, but I assume that is because I
cancelled innxmit and it rewrote the batch file: I believe innxmit does the lookup prior to sending and writes the result out to the batch when it exits...
How would I go about creating a batch in the format innfeed needs from my history file?
Hmm, I'm wondering whether innfeed really makes use of the <message-id>
innd gives, apart from the IHAVE <message-id> exchange with the remote
peer. I've not tested but maybe you could try a batch with:
@token1@ <1@dumbid>
@token2@ <2@dumbid>
@token3@ <3@dumbid>
where @tokenX@ are the tokens you want to send, followed with an
arbitrary Message-ID (yet different for each token). Are the 3 articles
fed OK?
If the remote server is INN, it won't mind receiving an article with a Message-ID different than the one in the NNTP command used to send it.
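A batch in that shape could be produced along these lines. This is only a sketch of the untested idea above: add_dummy_ids is a made-up helper name, the "dumbid" domain is just the placeholder used in the example, and it assumes the storage tokens have already been extracted into a file with one token per line.

```shell
# Hypothetical helper (not an INN command): turn a file of storage tokens,
# one per line, into "@token@ <N@dumbid>" lines. N is the input line number,
# so every dummy Message-ID is different.
# Lines whose first field does not look like @HEX@ are skipped.
add_dummy_ids() {
    awk '$1 ~ /^@[0-9A-Fa-f]+@$/ { printf "%s <%d@dumbid>\n", $1, NR }' "$1"
}
```

For example, "add_dummy_ids tokens.txt > peername" would write the batch, where tokens.txt and peername are placeholder file names.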
I haven't checked the source code, but I would expect innfeed to use the message ID in the CHECK command to avoid reading the whole article from
disk in the (very common) case that the remote server declines the CHECK.
> If the remote server is INN, it won't mind receiving an article with a
> Message-ID different than the one in the NNTP command used to send it.
This remains true, of course. You'll put a bunch of bogus message IDs in
the remote server's conflict cache, so it's not exactly the friendliest
thing to do, but for cooperating servers it might work.
That said, the message ID is also given on the TAKETHIS command line, and
I'm not sure if innfeed gets that from the article or the input batch.
If from the input batch, a server would be entirely within its rights
to reject a message where the message ID on the TAKETHIS command line
was different than the message ID in the article. (I'm not sure if
any do.)
Yes, it would totally be within its rights.
Hi Russ,
> I haven't checked the source code, but I would expect innfeed to use the
> message ID in the CHECK command to avoid reading the whole article from
> disk in the (very common) case that the remote server declines the CHECK.
I've just quickly had a look, and I do not see any parsing of headers.
For CHECK, IHAVE and TAKETHIS, the Message-ID used is taken from artMsgId(article) where article->msgid has been set with newArticle()
using the input from innd.
So this kludge may really work for Jesse. I look forward to reading
your results!
On May 16, 2023 at 3:12:37 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:
> Hi Russ,
> I've just quickly had a look, and I do not see any parsing of headers.
> For CHECK, IHAVE and TAKETHIS, the Message-ID used is taken from
> artMsgId(article) where article->msgid has been set with newArticle()
> using the input from innd.
> So this kludge may really work for Jesse. I look forward to reading
> your results!
Indeed it does with a small test:
Batch file:
@05000000940900000000001C496C00000000@ <1@dumbid>
@0500000021AE0000000000020A2100000000@ <2@dumbid>
@0500000002D8000000000006802500000000@ <3@dumbid>
Messages accepted on the remote server with actual Message-IDs:
May 16 17:03:08.665 + news.blueworldhosting.com <u3n1li$20fql$3@dont-email.me> 3641
May 16 17:03:08.787 + news.blueworldhosting.com <%RD7M.3041094$vBI8.1207177@fx15.iad> 2474
May 16 17:03:08.791 + news.blueworldhosting.com <ac766a8d-a700-4b72-80d4-6fd11376fe9dn@googlegroups.com> 6717
This seems to be working great, moving around 95,000 articles every ten minutes according to innfeed logs, but I'm seeing a low rate of the following message in /var/log/news/news on the destination server:
(null) 439 Bad "Message-ID" header field
There are 101 occurrences of this error out of a few hundred thousand articles
fed. Should I be concerned about that, and is there any way for me to find out
which articles are problematic?
Source server is INN 2.7.0 and destination is 2.7.1. Besides the difference in overview method, the servers have pretty much identical configurations.
Hi Jesse,
> Messages accepted on the remote server with actual Message-IDs
That's good news :)
> This seems to be working great, moving around 95,000 articles every ten
> minutes according to innfeed logs, but I'm seeing a low rate of the
> following message in /var/log/news/news on the destination server:
> (null) 439 Bad "Message-ID" header field
> There are 101 occurrences of this error out of a few hundred thousand articles
> fed. Should I be concerned about that, and is there any way for me to find out
> which articles are problematic?
> Source server is INN 2.7.0 and destination is 2.7.1. Besides the difference in
> overview method, the servers have pretty much identical configurations.
Strange.
It would be worthwhile investigating a few Message-IDs.
If your remote peer has the following logs:
<mid1> accepted
(null) 439 Bad "Message-ID" header field
<mid3> accepted
You may try to run on your feeding peer:
grephistory '<mid1>'
It will give the storage token of that article.
Then look at your innfeed-batch file, and try to retrieve some tokens
after it with an "sm '@token@'" command. One of them should be the
article with <mid2>. (And <mid3> afterwards.)
Do you see anything special with that article?
Were they first received with INN 2.7.0 or a previous version?
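The "retrieve some tokens after it" step could be sketched as follows. tokens_after is a made-up helper name (not an INN command), and it assumes the innfeed batch file has the storage token as the first field of each line.

```shell
# Hypothetical helper (not an INN command): print the COUNT tokens that
# follow TOKEN in an innfeed batch file, then stop reading the file.
tokens_after() {
    batch=$1    # innfeed batch file, "@token@ <message-id>" per line
    token=$2    # token reported by grephistory for the known article
    count=$3    # how many following tokens to print
    awk -v t="$token" -v n="$count" '
        BEGIN   { if (n + 0 < 1) exit }
        found   { print $1; if (--n < 1) exit }
        $1 == t { found = 1 }
    ' "$batch"
}
```

Each token printed could then be inspected by hand, for instance with "sm -H '@token@'" to look at the headers only.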
Now I have a repeatable process.
I'm not sure the ordering of articles received will be exact, though, since
I'm using 20 connections against the batch.
I do have a question though, I see some entries are being made to /usr/local/news/spool/innfeed/batch.output. Innfeed's manpage makes a short reference to this file in that it is where entries go that could not be processed for some reason, but it does not explain what innfeed does with this
.output file. Does it ever get reprocessed automatically?
Yes, they are reprocessed automatically. This is parameterized with the backlog* keys in innfeed.conf.
Many thanks for this question! I agree the innfeed(8) manual page should
say they are processed. The current wording gives the impression that
they are not. I'll add a sentence about that.
FYI, every backlog-rotate-period seconds, something like this happens:
if [ ! -f PEER.input ]; then
    if [ -f PEER ]; then
        mv PEER PEER.input
    elif [ -f PEER.output ]; then
        mv PEER.output PEER
    fi
fi