For the past week I've been pulling articles into a news server running FreeBSD and INN 2.7.0rc1, and got to a stopping point after adding a few hundred gigabytes to the spool, so I decided to run news.daily since I hadn't run it since this endeavor began.

I understand that during certain portions of the news.daily process the server gets paused. However, I am noticing very strange behavior while expireover is running, when the server claims (according to the logs) that it is not paused.

INN is unresponsive to ctlinnd commands for ~5 minutes and then suddenly becomes responsive, and the upstream innfeed process (on another VM) reports a non-responsive connection during these periods. NNRPd is responding normally for the most part: I can AUTH and issue GROUP, BODY, and similar commands, but when I issue a POST I get no response from the server until my connection times out.

I can't find anything useful in the logs as to what is happening while INN is not responding, nor do I see any other underlying OS or I/O issue. The expireover process is using 100% of one CPU core and quite a bit of I/O, but not enough that it should cause this kind of unresponsiveness. The OS is responsive and other applications/daemons on the box respond as expected, so I don't see a server-wide performance problem.

I've put this server under taxing I/O load while injecting articles over the last week and never saw this behavior until expireover had been running for a little over 4 hours. Any ideas?
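For what it's worth, this is roughly how I've been probing innd from the console while expireover runs (ctlinnd's "mode" subcommand simply asks innd to report its current state):

% time ctlinnd mode

During the bad stretches that sits there for ~5 minutes before the reply comes back.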
top:
last pid: 53975; load averages: 1.22, 1.20, 1.15
up 4+20:06:35 22:59:56
75 processes: 2 running, 73 sleeping
CPU: 2.1% user, 0.0% nice, 10.6% system, 0.0% interrupt, 87.3% idle
Mem: 160M Active, 6103M Inact, 62M Laundry, 8652M Wired, 905M Free
ARC: 5910M Total, 3785M MFU, 1455M MRU, 912K Anon, 91M Header, 578M Other
3725M Compressed, 8526M Uncompressed, 2.29:1 Ratio
Swap: 2048M Total, 2048M Free
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 8478 news        1 103    0    66M    42M CPU5    5 434:17  99.86% expireover
55548 root        1  20    0    32M  8916K select  3   3:45   0.09% snmpd
53975 root        1  20    0    14M  3440K CPU3    3   0:00   0.05% top
31561 root        4  20    0    49M    18M select  0   6:12   0.05% vmtoolsd
53913 news        1  24    4    37M    16M sbwait  6   0:00   0.03% nnrpd
29975 root        1  20    0    13M  2292K select  1   4:26   0.01% syslogd
-----
iostat:
root@spool1:/var/log # iostat
       tty            da0             cd0            pass0            cpu
 tin  tout  KB/t  tps  MB/s  KB/t tps  MB/s  KB/t tps  MB/s  us ni sy in id
   1   728  43.4  222   9.4  17.2   0   0.0   0.0   0   0.0   6  0  6  0 88
-----
systat:
/0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
Load Average ||||||
/0% /10 /20 /30 /40 /50 /60 /70 /80 /90 /100
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
news expireover XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
root idle XXXXXXXXXXXXXXX
-----
Cheers,
Jesse
How is your expire configuration set up?
On Jun 23, 2022 at 11:05:31 PM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:
<snip>
> I've put this server under taxing I/O load while injecting articles over
> the last week and never saw this behavior until expireover had been
> running for a little over 4 hours. Any ideas?
So after killing the few nnrpd processes that were running, the performance issue has resolved. There were only two active processes: my reader, and one from UZANTO-RETO (<https://www.uzantoreto.com/>), which scans my server. They make slow requests and were not hammering the box, and my connection was idle. The expireover process is still running the same as before, but article delivery and other commands are good.

As soon as I start NNRPd and make a connection the problem returns, even though I am doing nothing. WTH.
-Jesse
There was an 18-minute gap in innd activity according to the news.notice
log where it was completely unresponsive (no response on port 119, no response to ctlinnd commands, etc.). During this period truss did not
output anything for innd or its children (unfortunately I wasn't
attached when the condition began). When things sprang to life what I
saw with truss looked like history or overview read/writes and articles started flowing.
What I'm seeing in the logs is the period of unresponsiveness ends with
innd logging its stats. There are some interesting entries after that,
such as "SERVER cant sendto CCreader" and I have a lot of "Connection
reset by peer" and "broken pipe" messages. The latter two appear
frequently for a few minutes and subside.
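(For reference, I attach to innd roughly like this; the PID file path is from my layout and may differ elsewhere:

% truss -f -o /tmp/innd.truss -p `cat /usr/local/news/run/innd.pid`

where -f also follows anything innd forks and -o writes the syscall trace to a file.)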
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
> My apologies, I did leave out some critical details - I am using
> tradspool/tradindexed.
Do you by chance have a really large group that would take a long time to expire? It's been a while since I wrote the locking strategy for tradindexed, but expiring a group locks it for write and receiving an
article in the same group will also attempt to lock it for write, and innd
is single-threaded. So if you receive an article for a group while that group is being expired, innd will wait for the lock until the expiration
of that group finishes. If that group is extremely large, this could well take ten minutes.
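Schematically, the effect is something like this toy program (a minimal illustration of a blocking POSIX write lock, not INN's actual code; the file stands in for a tradindexed group index):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Take or release a write lock on the whole file, blocking if needed. */
static void set_lock(int fd, short type)
{
    struct flock fl;
    fl.l_type = type;       /* F_WRLCK or F_UNLCK */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* length 0 = the entire file */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        perror("fcntl");
        exit(1);
    }
}

int main(void)
{
    int fd = open("group.index", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    set_lock(fd, F_WRLCK);              /* "expireover" locks the group */

    if (fork() == 0) {                  /* "innd" receives an article... */
        int fd2 = open("group.index", O_RDWR);
        printf("writer: waiting for the group lock\n");
        set_lock(fd2, F_WRLCK);         /* ...and blocks right here */
        printf("writer: got it, overview written\n");
        return 0;
    }

    sleep(5);                           /* a long expire of a huge group */
    set_lock(fd, F_UNLCK);              /* expire finishes */
    wait(NULL);
    return 0;
}

And because innd is single-threaded, everything else it would normally be servicing stalls behind that one blocked writer.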
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
> There was an 18-minute gap in innd activity according to the news.notice
> log where it was completely unresponsive (no response on port 119, no
> response to ctlinnd commands, etc.). During this period truss did not
> output anything for innd or its children (unfortunately I wasn't
> attached when the condition began). When things sprang to life what I
> saw with truss looked like history or overview read/writes and articles
> started flowing.
>
> What I'm seeing in the logs is the period of unresponsiveness ends with
> innd logging its stats. There are some interesting entries after that,
> such as "SERVER cant sendto CCreader" and I have a lot of "Connection
> reset by peer" and "broken pipe" messages. The latter two appear
> frequently for a few minutes and subside.
This feels like lock contention. What overview method are you using? I tried to find that in the thread but may have missed it.
I agree, but I unfortunately lack the technical knowledge to understand why this is happening, or what, if anything, I could do to resolve it.
My apologies, I did leave out some critical details - I am using tradspool/tradindexed.

Additional observations:

- While innd is unresponsive, nnrpd was happily serving clients, and the only indicator something was amiss from the client perspective is that posting would fail with a timeout reported by the client.
- Article flow from the upstream feeder was at its lowest point during this period.
- The problem resolved as expireover was nearing completion based on these timestamps:
The last connection issue reported from the upstream feeder:
Jun 24 06:25:44 feed1 innfeed[91878]: <spool1.usenet.blueworldhosting.com>:1 cxnsleep non-responsive connection
Expire output:
Expire messages:
expireover start Thu Jun 23 15:39:00 CDT 2022: ( -z/var/log/news/expire.rm -Z/var/log/news/expire.lowmark)
Article lines processed 53326142
Articles dropped 56726
Overview index dropped 57054
expireover end Fri Jun 24 07:06:07 CDT 2022
lowmarkrenumber begin Fri Jun 24 07:06:07 CDT 2022: (/var/log/news/expire.lowmark)
lowmarkrenumber end Fri Jun 24 07:06:07 CDT 2022
expirerm start Fri Jun 24 07:06:07 CDT 2022
expirerm end Fri Jun 24 07:09:14 CDT 2022
expire begin Fri Jun 24 07:09:44 CDT 2022: (-v1)
Article lines processed 32835813
Articles retained 32805948
Entries expired 29865
expire end Fri Jun 24 08:52:30 CDT 2022
all done Fri Jun 24 08:52:30 CDT 2022
As a follow-up, would you recommend a different overview storage mechanism?
My understanding is that buffindexed would probably make my situation worse.
ovdb is a non-starter for me, and ovsqlite has *serious* performance issues. I started building this box with the intent to use ovsqlite, but when I started injecting articles the throughput rate was abysmal.
On Jun 24, 2022 at 12:56:11 PM CDT, "Russ Allbery" in <87a6a2gd38.fsf@hope.eyrie.org> wrote:
> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> My apologies, I did leave out some critical details - I am using
>> tradspool/tradindexed.
>
> Do you by chance have a really large group that would take a long time to
> expire? It's been a while since I wrote the locking strategy for
> tradindexed, but expiring a group locks it for write and receiving an
> article in the same group will also attempt to lock it for write, and
> innd is single-threaded. So if you receive an article for a group while
> that group is being expired, innd will wait for the lock until the
> expiration of that group finishes. If that group is extremely large,
> this could well take ten minutes.
I injected articles from 2003-present for almost all of the Big8 groups, so many groups have hundreds of thousands of articles.

I will admit it has been 10+ years since I've used INN with a spool containing this many articles, but long ago I helped another admin move to a new server. That box was receiving a full feed; I don't recall the number of articles in the spool, but it was several terabytes (my spool is ~150GB for comparison). When we ran expireover on that spool the average run time was 6-8 hours, and I do not recall experiencing any issues like I am seeing. My expireover run, on a fraction of the data and on faster hardware, was ~14 hours. Is that expected?
"Russ Allbery" in <87a6a2gd38.fsf@hope.eyrie.org> wrote:
Do you by chance have a really large group that would take a long time
to expire? It's been a while since I wrote the locking strategy for
tradindexed, but expiring a group locks it for write and receiving an
article in the same group will also attempt to lock it for write, and
innd is single-threaded. So if you receive an article for a group
while that group is being expired, innd will wait for the lock until
the expiration of that group finishes. If that group is extremely
large, this could well take ten minutes.
I injected articles from 2003-present for almost all of the Big8 groups,
so many groups have hundreds of thousands of articles.
I will admit, it has been 10+ years since I've used INN with a spool containing this many articles, but long ago helped another admin move to
a new server. That box was receiving a full feed, I don't recall the
number of articles in the spool, but it was several terabytes (my size
is ~150GB for comparison). When we ran expireover on that spool the
average run time was 6-8 hours and do not recall experiencing any issues
like I am seeing. My expireover run on a fraction of the amount of data
and on faster hardware was ~14 hours. Is that expected?
Do you by chance have a really large group that would take a long time to expire? It's been a while since I wrote the locking strategy for tradindexed, but expiring a group locks it for write and receiving an
article in the same group will also attempt to lock it for write, and innd
is single-threaded. So if you receive an article for a group while that group is being expired, innd will wait for the lock until the expiration
of that group finishes. If that group is extremely large, this could well take ten minutes.
Hi Jesse,
> As a follow-up, would you recommend a different overview storage mechanism?
Incidentally, I was planning to make a video about exactly that subject: explaining how to set up each of the overview storage mechanisms, and comparing them.
I would recommend either tradindexed or ovsqlite.
Nonetheless, I do not have a spool as huge as yours. It seems that performance is not satisfactory with 53 million overview lines :-/
> ovdb is a non-starter for me, and ovsqlite has *serious* performance
> issues. I started building this box with the intent to use ovsqlite, but
> when I started injecting articles the throughput rate was abysmal.
Did you try to increase transrowlimit when injecting your articles?
Here is my expire.ctl:

*:A:1:never:never
*.jobs*:A:1:90:90
*.test:A:1:300:300
news.lists.filters:A:1:10:10
Hi Jesse,
> *:A:1:never:never
> *.jobs*:A:1:90:90
> *.test:A:1:300:300
> news.lists.filters:A:1:10:10
You may also want to expire control.cancel sooner... It may get very large...
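For instance, something like this (the two-day retention is only an example):

control.cancel:A:1:2:2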
To sort your newsgroups by increasing article counts, run:
% getlist -R counts | sort -n -k 4
The fourth field contains the number of articles.
I've not looked at the complexity, but couldn't the expiration of the overview of a newsgroup be done in batches of 100,000 articles, for instance? Lock the group, process the first 100,000 overview lines (remembering that n lines have been dropped), release the lock, then acquire a new lock and process the next 100,000 lines starting from line (100,000 - n), and so on.
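Schematically, the bookkeeping might look like the toy sketch below; this is not INN code, just the idea, with a mutex standing in for the per-group overview lock and an in-memory array standing in for the group's overview lines:

#include <pthread.h>
#include <stdio.h>

#define BATCH 100000L
#define TOTAL 500000L

/* Toy overview "lines": a negative entry marks an article to expire. */
static long lines[TOTAL];
static long count = TOTAL;
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

static long expire_in_batches(void)
{
    long rd = 0, wr = 0, dropped = 0;

    while (rd < count) {
        pthread_mutex_lock(&group_lock);   /* re-acquire for each batch */
        long end = rd + BATCH < count ? rd + BATCH : count;
        while (rd < end) {
            if (lines[rd] >= 0)
                lines[wr++] = lines[rd];   /* keep: compact it downward */
            else
                dropped++;                 /* drop: expired overview line */
            rd++;
        }
        pthread_mutex_unlock(&group_lock); /* window for innd to write */
        /* The next batch resumes 'dropped' positions earlier in the
           compacted data, which the separate write cursor tracks
           automatically. */
    }

    pthread_mutex_lock(&group_lock);
    count = wr;                            /* final truncate */
    pthread_mutex_unlock(&group_lock);
    return dropped;
}

int main(void)
{
    for (long i = 0; i < TOTAL; i++)
        lines[i] = (i % 7 == 0) ? -1 : i;  /* every 7th article expires */
    printf("dropped %ld of %ld lines\n", expire_in_batches(), TOTAL);
    return 0;
}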
Deferring the article back to the peer probably won't work since not all peers honor deferrals and I think the article is already stored before we
try to write overview.
Thanks for the reminder about adding control.cancel... Right now it shouldn't be an issue:
[news@spool1 ~]$ getlist -R counts | sort -n -k 4 | grep control.cancel
control.cancel 15447 1 15447 n
However, I can see where ignoring it could cause problems in the future. Do you also recommend "control"?
Somehow I've missed the getlist utility all these years... pretty spiffy.
Looks like I have a few groups that are indeed larger than even I expected. I haven't paid attention to the high/low counts and whatnot since beginning the injection project.
rec.arts.tv 1367076 1 1366870 y
If I change:

*:A:1:never:never

to:

*:A:never:never:never

would that stop expireover from traversing all the groups and overview data, except for the specific groups listed later in expire.ctl?
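That is, something like this (assuming later, more specific entries still override the catch-all, as they appear to in my current file):

*:A:never:never:never
*.jobs*:A:1:90:90
*.test:A:1:300:300
news.lists.filters:A:1:10:10
control.cancel:A:1:2:2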
>> ovdb is a non-starter for me, and ovsqlite has *serious* performance
>> issues. I started building this box with the intent to use ovsqlite, but
>> when I started injecting articles the throughput rate was abysmal.
>
> Did you try to increase transrowlimit when injecting your articles?
I am going to give ovsqlite another try. I was reading through the
man page and configuration file, paying attention to this recommendation:
    A transaction occurs every transrowlimit articles or transtimelimit
    seconds, whichever is smaller.  You are encouraged to keep the default
    value for row limits and, instead, adjust the time limit according to
    how many articles your news server usually accepts per second during
    normal operation (you can find statistics about incoming articles in
    your daily Usenet reports).

    Inserting or deleting a database row within a transaction is very fast,
    whereas committing a transaction is slow, especially on rotating
    storage.  Setting transaction limits too low leads to poor performance.
    When rebuilding overview data, it may be worth temporarily raising
    these values, though.
Do you recommend increasing both transrowlimit and transtimelimit? I am also looking at the pagesize and cachesize parameters.
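For the rebuild I'm considering something along these lines (my reading is that these parameters live in inn.conf; the numbers are invented purely for illustration and would be dialed back after the bulk load):

transrowlimit:   100000
transtimelimit:  600
cachesize:       20000

From what I can tell, pagesize only takes effect when the database is first created, so I'd set that before starting over.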