• INN odd behavior during expireover?

    From Jesse Rehmer@21:1/5 to All on Fri Jun 24 04:05:31 2022
    For the past week I've been pulling articles into a news server running FreeBSD and INN 2.7.0rc1, and got to a stopping point after adding a few hundred gigabytes to the spool, so I decided to run news.daily since I hadn't since this endeavor began.

    I understand during certain portions of the news.daily process the server gets paused. However, I am noticing very strange behavior when the server claims it is not paused (according to logs), but while expireover is running.

    INN is unresponsive to ctlinnd commands for ~5 minutes and then suddenly becomes responsive, and the upstream innfeed process (on another VM) reports a non-responsive connection during these periods. nnrpd is responding normally
    for the most part; I can AUTH and issue GROUP, BODY, and similar commands, but when I issue a POST I get no response from the server until my connection
    times out.

    I can't find anything useful in the logs as to what is happening while INN is not responding, nor do I see any other underlying OS or I/O issue. The expireover process is using 100% of one CPU core, and quite a bit of I/O, but not enough that it should cause this kind of unresponsiveness. The OS is responsive and other applications/daemons on the box respond as expected, so I don't see a server-wide performance problem.

    I've put this server under taxing I/O load while injecting articles over the last week and never saw this behavior until expireover had been running for a
    little over 4 hours. Any ideas?

    top:
    last pid: 53975; load averages: 1.22, 1.20, 1.15

    up 4+20:06:35 22:59:56
    75 processes: 2 running, 73 sleeping
    CPU: 2.1% user, 0.0% nice, 10.6% system, 0.0% interrupt, 87.3% idle
    Mem: 160M Active, 6103M Inact, 62M Laundry, 8652M Wired, 905M Free
    ARC: 5910M Total, 3785M MFU, 1455M MRU, 912K Anon, 91M Header, 578M Other
    3725M Compressed, 8526M Uncompressed, 2.29:1 Ratio
    Swap: 2048M Total, 2048M Free

    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
    8478 news 1 103 0 66M 42M CPU5 5 434:17 99.86% expireover 55548 root 1 20 0 32M 8916K select 3 3:45 0.09% snmpd 53975 root 1 20 0 14M 3440K CPU3 3 0:00 0.05% top 31561 root 4 20 0 49M 18M select 0 6:12 0.05% vmtoolsd
    53913 news 1 24 4 37M 16M sbwait 6 0:00 0.03% nnrpd 29975 root 1 20 0 13M 2292K select 1 4:26 0.01% syslogd -----

    iostat:
    root@spool1:/var/log # iostat
    tty da0 cd0 pass0 cpu
    tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
    1 728 43.4 222 9.4 17.2 0 0.0 0.0 0 0.0 6 0 6 0 88
    -----

    systat:
    /0 /1 /2 /3 /4 /5 /6 /7 /8 /9 /10
    Load Average ||||||

    /0% /10 /20 /30 /40 /50 /60 /70 /80 /90 /100
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    news expireover XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    root idle XXXXXXXXXXXXXXX
    -----

    Cheers,
    Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to jesse.rehmer@blueworldhosting.com on Fri Jun 24 04:42:39 2022
    On Jun 23, 2022 at 11:05:31 PM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:
    <snip>
    I've put this server under taxing I/O load while injecting articles over the last and never saw this behavior until expireover had been running for a little over 4 hours. Any ideas?
    <snip>

    So after killing off the few nnrpd processes that were running, the
    performance issue resolved. There were only two active processes: my reader, and one from UZANTO-RETO (<https://www.uzantoreto.com/>), which scans my server. They make slow requests and were not hammering the box, and my connection was idle. The expireover process is still running same as before,
    but article delivery and other commands are good.

    As soon as I start nnrpd and make a connection, the problem returns even though I am doing nothing. WTH.

    -Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From The Doctor@21:1/5 to jesse.rehmer@blueworldhosting.com on Fri Jun 24 04:15:44 2022
    In article <4353171044.279cb5a2@freebsd-inject1.usenet.blueworldhosting.com>, Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
    >For the past week I've been pulling articles into a news server running FreeBSD
    >and INN 2.7.0rc1, and got to a stopping point after adding a few hundred
    >gigabytes to the spool, so I decided to run news.daily since I hadn't since
    >this endeavor began.
    <snip>

    How is your expire configuration configured?
    --
    Member - Liberal International This is doctor@@nl2k.ab.ca Ici doctor@@nl2k.ab.ca
    Yahweh, Queen & country!Never Satan President Republic!Beware AntiChrist rising!
    Look at Psalms 14 and 53 on Atheism https://www.empire.kred/ROOTNK?t=94a1f39b Saying good things is evil when they are lies. -unknown Beware https://mindspring.com

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Fri Jun 24 05:17:29 2022
    On Jun 23, 2022 at 11:15:44 PM CDT, "The Doctor" <The Doctor> wrote:

    How is your expire configuration configured?

    On this box it is the first time the expire process has been run since injecting ~52,000,000 articles.

    expire.ctl is pretty simple, though I understand that, given the history I've
    brought over, the *.jobs* and *.test groups are going to take a long time.

    *:A:1:never:never
    *.jobs*:A:1:90:90
    *.test:A:1:300:300
    news.lists.filters:A:1:10:10

    I've put this hardware and INN under much heavier load. In tests prior to filling the spool this server achieved article delivery throughput rates
    of almost 100 articles per second (input stream 600-800Mbps using text and binary articles), and kept up with output of 1.2Gbps to downstream nodes without delay, using far more disk I/O and throughput than I'm seeing now while the expireover process is running.

    What I'm seeing now just doesn't make any sense.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 4575903189.1ff43929@freebsd-inject1 on Fri Jun 24 10:47:01 2022
    On Jun 23, 2022 at 11:42:39 PM CDT, "Jesse Rehmer" in <4575903189.1ff43929@freebsd-inject1.usenet.blueworldhosting.com> wrote:

    <snip>
    >As soon as I start nnrpd and make a connection the problem returns even
    >though I am doing nothing. WTH.
    <snip>

    There was an 18-minute gap in innd activity according to the news.notice log where it was completely unresponsive (no response on port 119, no response to ctlinnd commands, etc.). During this period truss did not output anything for innd or its children (unfortunately I wasn't attached when the condition began). When things sprang to life what I saw with truss looked like history
    or overview read/writes and articles started flowing.

    What I'm seeing in the logs is the period of unresponsiveness ends with innd logging its stats. There are some interesting entries after that, such as "SERVER cant sendto CCreader" and I have a lot of "Connection reset by peer" and "broken pipe" messages. The latter two appear frequently for a few minutes and subside.

    news.notice:

    Jun 24 05:10:50 spool1 innd[39682]: rejecting[python] <part26of137.j3RKrZ2e0nT3100BTRqM@camelsystem-powerpost.local> 437 Binary (yEnc)
    Jun 24 05:10:50 spool1 innd[39682]: rejecting[python] <ozVqI.399066$DJ2.392163@fx42.iad> 437 Binary (yEnc)
    Jun 24 05:10:51 spool1 innd[39682]: rejecting[python] <ozVqI.216709$Qf2.141390@fx38.iad> 437 Binary (yEnc)
    Jun 24 05:28:51 spool1 innd[39682]: ME time 1106338 hishave 2281(1872) hisgrep 3391(1340) hiswrite 2113(761) hissync 0(0) idle 12369(8902) artclean 7(955) artwrite 1303(761) artcncl 0(0) hisgrep/artcncl 0(0) hishave/artcncl 0(0) artlog/artcncl 0(0) hiswrite/artcncl 0(0) sitesend 16(761) overv 1080211(761) perl 27(761) python 2857(1868) nntpread 135(9724) artparse 182(8378) artlog/artparse 0(0) artlog 36(955) datamove 27(10983)
    Jun 24 05:28:51 spool1 innd[39682]: ME HISstats 0 hitpos 926 hitneg 0 missed 946 dne
    Jun 24 05:28:51 spool1 innd[39682]: ME status seconds 3613 accepted 27217 refused 14 rejected 10902 duplicate 0 accepted size 248355872 duplicate size 0 rejected size 3782013184
    Jun 24 05:28:51 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> status seconds 3613 accepted 1346 refused 5 rejected 50 duplicate 0 accepted size 124000704 duplicate size 0 rejected size 91707
    Jun 24 05:28:51 spool1 innd[39682]: localhost status seconds 1856 accepted 25871 refused 9 rejected 10852 duplicate 0 accepted size 124355152 duplicate size 0 rejected size 3781921280
    Jun 24 05:28:51 spool1 innd[39682]: ctlinnd command s
    Jun 24 05:28:51 spool1 innd[39682]: SERVER cant sendto CCreader bytes 211 Connection refused
    Jun 24 05:28:51 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> connected 38 streaming allowed
    Jun 24 05:28:51 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> connected 77 streaming allowed
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> connected 292 streaming allowed
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:30> cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:38> cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:77> cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> connected 295 streaming allowed
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:165> cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:165> cant write: Broken pipe
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:165> checkpoint seconds 1131 accepted 6 refused 1 rejected 0 duplicate 0 accepted size 27845 duplicate size 0 rejected size 0
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com:165> closed seconds 1131 accepted 6 refused 1 rejected 0 duplicate 0 accepted size 27845 duplicate size 0 rejected size 0
    Jun 24 05:28:52 spool1 innd[39682]: localhost:176 cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: localhost:176 cant write: Broken pipe
    Jun 24 05:28:52 spool1 innd[39682]: localhost:176 checkpoint seconds 1108 accepted 0 refused 0 rejected 68 duplicate 0 accepted size 0 duplicate size 0 rejected size 27010432
    Jun 24 05:28:52 spool1 innd[39682]: localhost:176 closed seconds 2815 accepted 0 refused 0 rejected 4268 duplicate 0 accepted size 0 duplicate size 0 rejected size 1689575168
    Jun 24 05:28:52 spool1 innd[39682]: <feed1.usenet.blueworldhosting.com> connected 165 streaming allowed
    Jun 24 05:28:52 spool1 innd[39682]: localhost:188 cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: localhost:188 cant write: Broken pipe
    Jun 24 05:28:52 spool1 innd[39682]: localhost:188 checkpoint seconds 1089 accepted 44 refused 0 rejected 0 duplicate 0 accepted size 78612 duplicate size 0 rejected size 0
    Jun 24 05:28:52 spool1 innd[39682]: localhost:188 closed seconds 2679 accepted 10005 refused 9 rejected 39 duplicate 0 accepted size 24383800 duplicate size 0 rejected size 441789
    Jun 24 05:28:52 spool1 innd[39682]: localhost:190 cant read: Connection reset by peer
    Jun 24 05:28:52 spool1 innd[39682]: localhost:190 cant write: Broken pipe

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jesse Rehmer on Fri Jun 24 09:06:51 2022
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

    > There was an 18-minute gap in innd activity according to the news.notice
    > log where it was completely unresponsive (no response on port 119, no
    > response to ctlinnd commands, etc.). During this period truss did not
    > output anything for innd or its children (unfortunately I wasn't
    > attached when the condition began). When things sprang to life what I
    > saw with truss looked like history or overview read/writes and articles
    > started flowing.

    > What I'm seeing in the logs is the period of unresponsiveness ends with
    > innd logging its stats. There are some interesting entries after that,
    > such as "SERVER cant sendto CCreader" and I have a lot of "Connection
    > reset by peer" and "broken pipe" messages. The latter two appear
    > frequently for a few minutes and subside.

    This feels like lock contention. What overview method are you using? I
    tried to find that in the thread but may have missed it.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jesse Rehmer on Fri Jun 24 10:56:11 2022
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

    > My apologies, I did leave out some critical details - I am using
    > tradspool/tradindexed.

    Do you by chance have a really large group that would take a long time to expire? It's been a while since I wrote the locking strategy for
    tradindexed, but expiring a group locks it for write and receiving an
    article in the same group will also attempt to lock it for write, and innd
    is single-threaded. So if you receive an article for a group while that
    group is being expired, innd will wait for the lock until the expiration
    of that group finishes. If that group is extremely large, this could well
    take ten minutes.
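    The waiting described here can be made concrete with a toy flock() demo (a hypothetical illustration of the locking pattern, not INN's actual code): while one descriptor holds the exclusive lock on a group's index file, a second exclusive-lock attempt on the same file conflicts, which is made visible below with LOCK_NB instead of blocking.

```python
import fcntl
import os
import tempfile

# Hypothetical sketch of tradindexed-style per-group write locking, not
# INN's actual code. "expire_fd" stands in for expireover rewriting a
# group's overview index; "innd_fd" stands in for single-threaded innd
# trying to store an incoming article into the same group.
path = os.path.join(tempfile.mkdtemp(), "misc.test.IDX")

expire_fd = open(path, "w")  # expireover's handle on the group index
innd_fd = open(path, "w")    # innd's handle on the same index

# expireover takes the exclusive (write) lock for the whole group rewrite.
fcntl.flock(expire_fd, fcntl.LOCK_EX)

# innd also wants a write lock. A non-blocking attempt fails immediately;
# a blocking attempt (what innd effectively does) would stall right here
# until the group's expiration finishes.
try:
    fcntl.flock(innd_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    blocked = False
except OSError:
    blocked = True

# Once expiration releases the lock, innd's write proceeds normally.
fcntl.flock(expire_fd, fcntl.LOCK_UN)
fcntl.flock(innd_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # now succeeds
```

    Because innd is single-threaded, that one blocked lock attempt stalls everything innd manages, which matches the symptoms in this thread.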

    > Additional observations:
    >
    > - While innd is unresponsive, nnrpd was happily serving clients and the
    >   only indicator something was amiss from the client perspective is
    >   posting would fail with a timeout reported by the client.

    Right, only innd would wait for the lock, so it would fail to spawn new
    nnrpd processes if you use the single port model, and nnrpd can't POST
    because it hands articles off to innd, but nnrpd doesn't care (and indeed
    won't even lock the group overview for read unless it sees that the
    overview data has been replaced, which only happens at the end of
    expiration and takes a much shorter length of time).

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 87a6a2gd38.fsf@hope.eyrie.org on Fri Jun 24 18:12:10 2022
    On Jun 24, 2022 at 12:56:11 PM CDT, "Russ Allbery" in <87a6a2gd38.fsf@hope.eyrie.org> wrote:

    > Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    >
    >> My apologies, I did leave out some critical details - I am using
    >> tradspool/tradindexed.
    >
    > Do you by chance have a really large group that would take a long time to
    > expire? It's been a while since I wrote the locking strategy for
    > tradindexed, but expiring a group locks it for write and receiving an
    > article in the same group will also attempt to lock it for write, and innd
    > is single-threaded. So if you receive an article for a group while that
    > group is being expired, innd will wait for the lock until the expiration
    > of that group finishes. If that group is extremely large, this could well
    > take ten minutes.

    I injected articles from 2003-present for almost all of the Big8 groups, so many groups have hundreds of thousands of articles.

    I will admit, it has been 10+ years since I've used INN with a spool
    containing this many articles, but long ago I helped another admin move to a new server. That box was receiving a full feed; I don't recall the number of articles in the spool, but it was several terabytes (my spool is ~150GB for comparison). When we ran expireover on that spool the average run time was 6-8 hours, and I do not recall experiencing any issues like I am seeing. My
    expireover run, on a fraction of the amount of data and on faster hardware, was ~14 hours. Is that expected?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 87edzegi5g.fsf@hope.eyrie.org on Fri Jun 24 17:18:24 2022
    On Jun 24, 2022 at 11:06:51 AM CDT, "Russ Allbery" in <87edzegi5g.fsf@hope.eyrie.org> wrote:

    > Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    >
    >> There was an 18-minute gap in innd activity according to the news.notice
    >> log where it was completely unresponsive (no response on port 119, no
    >> response to ctlinnd commands, etc.). During this period truss did not
    >> output anything for innd or its children (unfortunately I wasn't
    >> attached when the condition began). When things sprang to life what I
    >> saw with truss looked like history or overview read/writes and articles
    >> started flowing.
    >
    >> What I'm seeing in the logs is the period of unresponsiveness ends with
    >> innd logging its stats. There are some interesting entries after that,
    >> such as "SERVER cant sendto CCreader" and I have a lot of "Connection
    >> reset by peer" and "broken pipe" messages. The latter two appear
    >> frequently for a few minutes and subside.
    >
    > This feels like lock contention. What overview method are you using? I
    > tried to find that in the thread but may have missed it.

    I agree, but unfortunately I lack the technical knowledge to understand why this would be happening, and what, if anything, I could do to resolve it.

    My apologies, I did leave out some critical details - I am using tradspool/tradindexed. Additional observations:

    - While innd is unresponsive, nnrpd was happily serving clients, and the only indicator something was amiss from the client perspective is that posting would fail with a timeout reported by the client.
    - Article flow from the upstream feeder was at its lowest point during this period.
    - The problem resolved as expireover was nearing completion, based on these timestamps:

    The last connection issue reported from the upstream feeder:

    Jun 24 06:25:44 feed1 innfeed[91878]: <spool1.usenet.blueworldhosting.com>:1 cxnsleep non-responsive connection

    Expire output:

    Expire messages:
    expireover start Thu Jun 23 15:39:00 CDT 2022: ( -z/var/log/news/expire.rm -Z/var/log/news/expire.lowmark)
    Article lines processed 53326142
    Articles dropped 56726
    Overview index dropped 57054
    expireover end Fri Jun 24 07:06:07 CDT 2022
    lowmarkrenumber begin Fri Jun 24 07:06:07 CDT 2022: (/var/log/news/expire.lowmark)
    lowmarkrenumber end Fri Jun 24 07:06:07 CDT 2022
    expirerm start Fri Jun 24 07:06:07 CDT 2022
    expirerm end Fri Jun 24 07:09:14 CDT 2022
    expire begin Fri Jun 24 07:09:44 CDT 2022: (-v1)
    Article lines processed 32835813
    Articles retained 32805948
    Entries expired 29865
    expire end Fri Jun 24 08:52:30 CDT 2022
    all done Fri Jun 24 08:52:30 CDT 2022

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 9110422123.e5d78522@freebsd-inject1 on Fri Jun 24 17:51:00 2022
    On Jun 24, 2022 at 12:18:24 PM CDT, "Jesse Rehmer" in <9110422123.e5d78522@freebsd-inject1.usenet.blueworldhosting.com> wrote:

    <snip>

    Did some more poking around, and this may just be coincidental, but I found errors from pyClean in /usr/local/news/pyclean/log/traceback. The timestamp of the traceback file coincides roughly with when I noticed the issue, but there are no timestamps within the individual traceback messages.

    -rw-rw-r-- 1 news news 2097 Jun 23 21:11 traceback

    root@spool1:/usr/local/news/pyclean/log # cat traceback
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 702, in filter
        if art[Lines] and int(art[Lines]) != int(art[__LINES__]):
    ValueError: invalid literal for int() with base 10: "20\r\n AT&T Launches Free Wi-Fi Service Throughout August in Select\r\n McDonald's Outlets;
    $2.99 a Day Thereafter\r\n - Jul 29, 2003 07:00 AM (PR Newswire)"
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 1001, in filter
        if not self.bin_allowed:
    AttributeError: Filter instance has no attribute 'bin_allowed'
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 1001, in filter
        if not self.bin_allowed:
    AttributeError: Filter instance has no attribute 'bin_allowed'
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 1001, in filter
        if not self.bin_allowed:
    AttributeError: Filter instance has no attribute 'bin_allowed'
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 702, in filter
        if art[Lines] and int(art[Lines]) != int(art[__LINES__]):
    ValueError: invalid literal for int() with base 10: 'l54'
    Traceback (most recent call last):
      File "/usr/local/news/bin/filter/filter_innd.py", line 324, in filter_art
        return self.pyfilter.filter(art)
      File "/usr/local/news/bin/filter/filter_innd.py", line 1001, in filter
        if not self.bin_allowed:
    AttributeError: Filter instance has no attribute 'bin_allowed'
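    For what it's worth, the ValueError tracebacks come from handing a corrupted Lines: header straight to int(). A defensive parse along these lines (a hypothetical helper, not pyClean's actual code) would treat malformed values as absent instead of raising:

```python
def parse_lines_header(value):
    """Return the Lines: header as an int, or None if it is malformed.

    Hypothetical helper, not pyClean's actual code: articles injected
    from old archives can carry corrupted headers such as 'l54', or a
    value with article body text folded into it, which int() rejects
    with ValueError.
    """
    tokens = value.split() if value else []
    first = tokens[0] if tokens else ""
    # Accept only a plain decimal count; anything else is treated as missing.
    return int(first) if first.isdigit() else None

# Values taken from the tracebacks above:
assert parse_lines_header("l54") is None
assert parse_lines_header("20\r\n AT&T Launches Free Wi-Fi Service") == 20
assert parse_lines_header("154") == 154
```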

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 24 21:08:48 2022
    Hi Jesse,

    > As a follow-up, would you recommend a different overview storage mechanism?

    Incidentally, I was planning to make a video about exactly that subject: explaining how to set up each of the overview storage mechanisms, and
    comparing them.
    I would recommend either tradindexed or ovsqlite.
    Nonetheless, I do not have a spool as huge as yours. It seems that performance is not satisfactory with 53 million overview lines :-/


    > My understanding is that buffindexed would probably make my situation worse

    Yes, do *not* use it in your situation!


    > ovdb is a non-starter for me, and ovsqlite has *serious* performance issues.
    > I started building this box with the intent to use ovsqlite, but when I
    > started injecting articles the throughput rate was abysmal.

    Did you try to increase transrowlimit when injecting your articles?
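    For context on why transrowlimit matters: committing a SQLite transaction forces a sync, so inserting overview rows one tiny transaction at a time is far slower than batching many rows per commit, which is roughly what a larger transrowlimit buys. A minimal sketch with Python's sqlite3 (illustrative schema, not ovsqlite's actual layout):

```python
import sqlite3

# Illustrative schema only, not ovsqlite's actual layout. The point is
# the batching: one explicit transaction (and hence one commit) per
# "batch" rows, instead of one per row.
def insert_rows(conn, rows, batch):
    for i in range(0, len(rows), batch):
        with conn:  # one transaction per batch of overview rows
            conn.executemany(
                "INSERT INTO overview(groupname, artnum, data) VALUES (?, ?, ?)",
                rows[i:i + batch],
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE overview(groupname TEXT, artnum INTEGER, data TEXT)")
rows = [("misc.test", n, "Subject: test article") for n in range(10000)]
insert_rows(conn, rows, batch=1000)  # 10 commits instead of 10000
count = conn.execute("SELECT COUNT(*) FROM overview").fetchone()[0]
```

    On disk (unlike :memory:), each commit costs at least one fsync, which is where the per-row overhead comes from during bulk injection.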


    In case it could help, here are the results of the benchmark I performed
    last week with 3,713,040 overview lines on a VPS with 2 vCores at 2.4 GHz
    and 4 GB of RAM.


    Duration of expireover
    ----------------------

    ovdb -> 0:14
    compressed ovdb -> 0:21
    tradindexed -> 0:25
    ovsqlite -> 0:27
    compressed ovsqlite -> 0:44
    buffindexed -> 1:21


    Duration of an overview rebuild
    -------------------------------

    tradindexed -> 1:30
    ovdb -> 2:16
    buffindexed -> 2:25
    compressed ovdb -> 3:36
    ovsqlite -> 3:48
    compressed ovsqlite -> 5:31


    Yes, I agree ovsqlite (version 3.34.1 here) is slow when adding new
    articles, but quite fast at expiry (and I do not think it locks any
    table).

    --
    Julien ÉLIE

    "The friends of truth are those who seek it, not those who boast of
    having found it." (Condorcet)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 9433024807.7c4bdc6e@freebsd-inject1 on Fri Jun 24 18:48:25 2022
    On Jun 24, 2022 at 1:12:10 PM CDT, "Jesse Rehmer" in <9433024807.7c4bdc6e@freebsd-inject1.usenet.blueworldhosting.com> wrote:

    <snip>

    As a follow-up, would you recommend a different overview storage mechanism?
    My understanding is that buffindexed would probably make my situation worse, ovdb is a non-starter for me, and ovsqlite has *serious* performance issues.
    I started building this box with the intent to use ovsqlite, but when I
    started injecting articles the throughput rate was abysmal.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to Jesse Rehmer on Fri Jun 24 11:59:13 2022
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    "Russ Allbery" in <87a6a2gd38.fsf@hope.eyrie.org> wrote:

    Do you by chance have a really large group that would take a long time
    to expire? It's been a while since I wrote the locking strategy for
    tradindexed, but expiring a group locks it for write and receiving an
    article in the same group will also attempt to lock it for write, and
    innd is single-threaded. So if you receive an article for a group
    while that group is being expired, innd will wait for the lock until
    the expiration of that group finishes. If that group is extremely
    large, this could well take ten minutes.

    I injected articles from 2003-present for almost all of the Big8 groups,
    so many groups have hundreds of thousands of articles.

    I'm not sure hundreds of thousands would be enough to take that long.
    Maybe? It's been a long time since I've benchmarked this stuff, and never
    with modern hardware. It feels more likely to me if you have a single
    group with tens of millions of articles in it.

    I will admit it has been 10+ years since I've used INN with a spool
    containing this many articles, but long ago I helped another admin move
    to a new server. That box was receiving a full feed; I don't recall the
    number of articles in the spool, but it was several terabytes (my spool
    is ~150GB for comparison). When we ran expireover on that spool, the
    average run time was 6-8 hours, and I do not recall experiencing any
    issues like I am seeing. My expireover run, on a fraction of the data
    and on faster hardware, took ~14 hours. Is that expected?

    Were you using tradindexed before I rewrote it to make it safer, maybe?

    You could try ovsqlite, which will have much finer-grained locking rather
    than the all-or-nothing approach that tradindexed uses (and kind of has to
    use due to its structure).

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 24 21:24:51 2022
    Hi Russ,

    Do you by chance have a really large group that would take a long time
    to expire? It's been a while since I wrote the locking strategy for
    tradindexed, but expiring a group locks it for write and receiving an
    article in the same group will also attempt to lock it for write, and
    innd is single-threaded. So if you receive an article for a group while
    that group is being expired, innd will wait for the lock until the
    expiration of that group finishes. If that group is extremely large,
    this could well take ten minutes.

    I've not looked at the complexity, but couldn't the expiration of the
    overview of a newsgroup be done in batches of 100,000 articles, for
    instance?
    Lock the group, process the first 100,000 overview lines (remembering
    that n lines have been dropped), release the lock, try to acquire a new
    lock, and process the next 100,000 lines starting from line
    (100,000 - n), etc.
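    In toy Python terms, the idea would be something like this (a sketch of
    the proposal only, not INN code; all names here are made up):

```python
import threading

CHUNK = 100_000  # overview lines processed per lock acquisition

def expire_in_batches(lines, keep, lock):
    """Sketch of batched overview expiry: instead of holding the group's
    write lock for the whole run, take it per CHUNK of lines so that innd
    can slip in and store new articles between batches."""
    kept = []
    pos = 0
    while pos < len(lines):
        with lock:  # short critical section per batch
            batch = lines[pos:pos + CHUNK]
            kept.extend(line for line in batch if keep(line))
        pos += CHUNK  # lock released here; a writer may run before the next batch
    return kept
```

    The open question, of course, is what happens to the index offsets
    between batches.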

    --
    Julien ÉLIE

    « Le café est un breuvage qui fait dormir quand on n'en prend pas. »
    (Alphonse Allais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to t95241$kphs$1@news.trigofacile.com on Fri Jun 24 19:22:27 2022
    On Jun 24, 2022 at 2:08:48 PM CDT, "Julien ÉLIE" in <t95241$kphs$1@news.trigofacile.com> wrote:

    Hi Jesse,

    As a follow-up, would you recommend a different overview storage mechanism?

    Incidentally, I was planning to make a video about exactly that subject: explaining how to set up each of the overview storage mechanisms, and comparing them.
    I would recommend either tradindexed or ovsqlite.
    Nonetheless, I do not have a spool as huge as yours. It seems that performance is not satisfactory with 53 million overview lines :-/

    And I'm only about half-way finished injecting articles... :)

    ovdb is a non-starter for me, and ovsqlite has *serious* performance issues.
    I started building this box with the intent to use ovsqlite, but when I
    started injecting articles the throughput rate was abysmal.

    Did you try to increase transrowlimit when injecting your articles?

    I have zero SQLite knowledge. I did glance at the options in ovsqlite.conf, but lacking the knowledge to tune these values well, I hastily switched to tradindexed, since I knew I would get the expected throughput with no tinkering.

    If you have suggestions for tuning, I'm open to giving ovsqlite another try, especially after seeing your benchmark results for expiry and rebuild.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 24 21:37:03 2022
    Hi Jesse,

    *:A:1:never:never
    *.jobs*:A:1:90:90
    *.test:A:1:300:300
    news.lists.filters:A:1:10:10

    You may also want to expire control.cancel sooner... It may get very
    large...

    To sort your newsgroups by increasing article counts, run:

    % getlist -R counts | sort -n -k 4

    The fourth field contains the number of articles.
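    For instance, an expire.ctl line such as this one (30 days is only an
    example figure; pick whatever suits your retention policy):

```
control.cancel:A:1:30:30
```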

    --
    Julien ÉLIE

    « Le café est un breuvage qui fait dormir quand on n'en prend pas. »
    (Alphonse Allais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to t953ov$kqir$1@news.trigofacile.com on Fri Jun 24 19:53:14 2022
    On Jun 24, 2022 at 2:37:03 PM CDT, "Julien ÉLIE" in <t953ov$kqir$1@news.trigofacile.com> wrote:

    Hi Jesse,

    *:A:1:never:never
    *.jobs*:A:1:90:90
    *.test:A:1:300:300
    news.lists.filters:A:1:10:10

    You may also want to expire control.cancel sooner... It may get very large...

    To sort your newsgroups by increasing article counts, run:

    % getlist -R counts | sort -n -k 4

    The fourth field contains the number of articles.

    Thanks for the reminder about adding control.cancel... Right now it shouldn't be an issue:

    [news@spool1 ~]$ getlist -R counts | sort -n -k 4 | grep control.cancel
    control.cancel 15447 1 15447 n

    However, I can see where ignoring it could cause problems in the future. Do you
    also recommend "control"?

    Somehow I've missed the getlist utility all these years... pretty spiffy.
    Looks like I have a few groups that are indeed larger than even I expected. I haven't paid attention to the high/low counts and whatnot since beginning the injection project.

    ---------------
    rec.outdoors.rv-travel 355495 1 355482 y
    alt.folklore.computers 377199 1 377133 y
    talk.politics.guns 417832 1 417766 y
    comp.sys.mac.system 443025 1 442909 y
    alt.politics 460595 1 460585 y
    soc.culture.polish 515936 1 515825 y
    comp.lang.c 519799 1 517421 y
    soc.culture.jewish 521143 1 520973 y
    alt.fan.rush-limbaugh 529383 1 529297 y
    comp.sys.mac.advocacy 540625 1 540607 y
    rec.crafts.metalworking 585043 1 585042 y
    rec.arts.sf.written 632786 1 632665 y
    soc.culture.usa 671665 1 671598 y
    soc.culture.israel 749617 1 749617 y
    alt.atheism 769678 1 769661 y
    rec.sport.pro-wrestling 777234 1 777234 y
    rec.arts.drwho 798601 1 798600 y
    rec.sport.football.college 828057 1 828057 y
    rec.sport.tennis 849759 1 849759 y
    rec.arts.tv 1367076 1 1366870 y
    ---------------

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Russ Allbery@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Jun 24 12:57:26 2022
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

    I've not looked at the complexity, but couldn't the expiration of the
    overview of a newsgroup be done in batches of 100,000 articles, for
    instance? Lock the group, process the first 100,000 overview lines
    (remembering that n lines have been dropped), release the lock, try to
    acquire a new lock, and process the next 100,000 lines starting from
    line (100,000 - n), etc.

    The basic problem is that the expire process involves rewriting the data
    file, which invalidates all of the offset information stored in the
    index. That's why it holds a write lock on the index while expiring.

    Multiple passes would potentially require rewriting the data file and
    updating the offsets multiple times, which I would worry would cause
    other performance issues. (Back when I was testing this, I think
    expiration was mostly I/O bound and writing the data file was the
    expensive bit.)

    I can think of a couple of possible approaches:

    * Store articles that arrive in groups being expired but queue the
      overview information and store it later once the lock is released.
      This would require some complex state tracking in innd that may be
      tricky to get right and not lose data.

    * Perform the bulk of the expiration without holding a lock, and then
      take the lock at the end, find only the articles that have come in
      during expiration, and add them to the new data file before replacing
      the data file. I think this would work and is probably a simpler
      place to put the complexity, but it will require some careful design
      to make sure there are no race conditions and we can find all the
      articles that came in after expiry started.
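    Roughly, the second approach looks like this in toy Python terms (purely
    illustrative, not INN's actual code; every name here is made up):

```python
import threading

def expire_group(live_entries, keep, lock):
    """Sketch of the second approach: do the expensive expire pass on a
    snapshot without the lock, then lock briefly at the end to fold in
    articles that arrived while we were working."""
    snapshot = list(live_entries)               # read without the lock
    kept = [e for e in snapshot if keep(e)]     # the slow rewrite, unlocked
    with lock:                                  # short lock at the end
        kept.extend(live_entries[len(snapshot):])  # late arrivals
        live_entries[:] = kept                  # swap in the new data
    return live_entries
```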

    Deferring the article back to the peer probably won't work since not all
    peers honor deferrals and I think the article is already stored before we
    try to write overview.

    Using a proper database fixes this problem correctly (but obviously at the
    risk of introducing other issues). In an ideal world, it would be nice to
    have one overview mechanism that's good enough we could just have everyone
    use it.

    --
    Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

    Please post questions rather than mailing me directly.
    <https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to 87sfntg7h5.fsf@hope.eyrie.org on Fri Jun 24 20:08:13 2022
    On Jun 24, 2022 at 2:57:26 PM CDT, "Russ Allbery" in <87sfntg7h5.fsf@hope.eyrie.org> wrote:

    Deferring the article back to the peer probably won't work since not all peers honor deferrals and I think the article is already stored before we
    try to write overview.

    I'm not a developer by any means, but based on what I see with truss, I
    believe this statement to be correct.

    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    read(282,"fNCus+94CPtlu8N1+G\r\n\tx9e9mKrN"...,7223) = 2752 (0xac0)
    msync(0x808e00000,2101248,MS_ASYNC) = 0 (0x0)
    openat(AT_FDCWD,"/usr/local/news/spool/articles/talk/politics/guns/417838",O_WRONLY|O_CREAT|O_EXCL,0664) = 109 (0x6d)
    writev(109,[{"Sender: Nomen Nescio <nobody@diz"...,2936},{"news.usenet.blueworldhosting.com"...,33},{"spool1.usenet.blueworldhosting.c"...,35},{"usenet.blueworldhosting.com!feed"...,133},{"spool1.usenet.blueworldhosting.c"...,172},{"\r\n\r\nIn article <a90a66d9-444"...,606}],6) = 3915 (0xf4b)
    close(109) = 0 (0x0)
    link("/usr/local/news/spool/articles/talk/politics/guns/417838","/usr/local/news/spool/articles/alt/politics/obama/143985") = 0 (0x0)
    link("/usr/local/news/spool/articles/talk/politics/guns/417838","/usr/local/news/spool/articles/alt/politics/nationalism/white/5158") = 0 (0x0)
    link("/usr/local/news/spool/articles/talk/politics/guns/417838","/usr/local/news/spool/articles/rec/arts/tv/1367080") = 0 (0x0)
    link("/usr/local/news/spool/articles/talk/politics/guns/417838","/usr/local/news/spool/articles/alt/politics/democrats/176951") = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(173,"417838\tRe: Do you support laws "...,443) = 443 (0x1bb)
    lseek(173,0x0,SEEK_CUR) = 290699959 (0x1153bab7)
    pwrite(165,"\M-|\M-8S\^Q\0\0\0\0\M-;\^A\0\0"...,56,0x16509d8) = 56 (0x38)
    msync(0x802d64000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(254,"143985\tRe: Do you support laws "...,443) = 443 (0x1bb)
    lseek(254,0x0,SEEK_CUR) = 82669496 (0x4ed6fb8)
    pwrite(182,"\M-}m\M-m\^D\0\0\0\0\M-;\^A\0\0"...,56,0x7b0880) = 56 (0x38)
    msync(0x802b31000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    openat(AT_FDCWD,"/usr/local/news/spool/overview/a/p/n/w/alt.politics.nationalism.white.IDX",O_RDWR|O_CREAT,0664) = 109 (0x6d)
    fstat(109,{ mode=-rw-rw-r-- ,inode=34168327,size=288792,blksize=131072 }) = 0 (0x0)
    fcntl(109,F_GETFD,) = 0 (0x0)
    fcntl(109,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    openat(AT_FDCWD,"/usr/local/news/spool/overview/a/p/n/w/alt.politics.nationalism.white.DAT",O_RDWR|O_APPEND|O_CREAT,0664) = 113 (0x71)
    fcntl(113,F_GETFD,) = 0 (0x0)
    fcntl(113,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    close(35) = 0 (0x0)
    close(86) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(113,"5158\tRe: Do you support laws th"...,441) = 441 (0x1b9)
    lseek(113,0x0,SEEK_CUR) = 3775657 (0x399ca9)
    pwrite(109,"\M-p\M^Z9\0\0\0\0\0\M-9\^A\0\0\0"...,56,0x46818) = 56 (0x38)
    msync(0x802b31000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(211,"1367080\tRe: Do you support laws"...,444) = 444 (0x1bc)
    lseek(211,0x0,SEEK_CUR) = 643936438 (0x2661b0b6)
    pwrite(57,"\M-z\M-.a&\0\0\0\0\M-<\^A\0\0\0"...,56,0x4902888) = 56 (0x38)
    msync(0x802d32000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(229,"176951\tRe: Do you support laws "...,443) = 443 (0x1bb)
    lseek(229,0x0,SEEK_CUR) = 107740676 (0x66bfe04)
    pwrite(9,"I\M-|k\^F\0\0\0\0\M-;\^A\0\0\0\0"...,56,0x9733d0) = 56 (0x38)
    msync(0x802b30000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(5,"[F274954F241C01B17699DD5A2DDD219"...,98) = 98 (0x62)
    msync(0x80de00000,398255200,MS_ASYNC) = 0 (0x0)
    msync(0x825a00000,298691400,MS_ASYNC) = 0 (0x0)
    lseek(7,0x0,SEEK_SET) = 0 (0x0)
    write(7,"dbz 6 49781900 14 66\n32810510 3"...,111) = 111 (0x6f)
    write(1,"Jun 24 15:05:02.205 + feed1.usen"...,100) = 100 (0x64)
    write(1," 3912 inpaths!\n",15) = 15 (0xf)
    write(112,"news.usenet.blueworldhosting.com"...,194) = 194 (0xc2)
    write(282,"239 <ed02c4f9d54f3b99bfe0840ae0c"...,50) = 50 (0x32)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Fri Jun 24 20:14:58 2022
    On Jun 24, 2022 at 3:08:13 PM CDT, "Jesse Rehmer" in <0129301868.eccd854a@freebsd-inject1.usenet.blueworldhosting.com> wrote:

    Argh... I pasted the wrong snippet, this is better, I think:

    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
    select(283,{ 12 14 15 17 282 },{ },0x0,{ 449.000000 }) = 1 (0x1)
    sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0 (0x0)
    read(282,"dhosting.com junk:1181312\r\n\r"...,7223) = 2286 (0x8ee)
    openat(AT_FDCWD,"/usr/local/news/spool/articles/uk/media/radio/archers/2289",O_WRONLY|O_CREAT|O_EXCL,0664) = 232 (0xe8)
    writev(232,[{"Path: ",6},{"news.usenet.blueworldhosting.com"...,33},{"spool1.usenet.blueworldhosting.c"...,35},{"usenet.blueworldhosting.com!feed"...,942},{"spool1.usenet.blueworldhosting.c"...,62},{"\r\n\r\nOn Fri, 24 Jun 2022 10:0"...,2261}],6) = 3339 (0xd0b)
    close(232) = 0 (0x0)
    openat(AT_FDCWD,"/usr/local/news/spool/overview/u/m/r/a/uk.media.radio.archers.IDX",O_RDWR|O_CREAT,0664) = 232 (0xe8)
    fstat(232,{ mode=-rw-rw-r-- ,inode=5370741,size=128128,blksize=128512 }) = 0 (0x0)
    fcntl(232,F_GETFD,) = 0 (0x0)
    fcntl(232,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    openat(AT_FDCWD,"/usr/local/news/spool/overview/u/m/r/a/uk.media.radio.archers.DAT",O_RDWR|O_APPEND|O_CREAT,0664) = 270 (0x10e)
    fcntl(270,F_GETFD,) = 0 (0x0)
    fcntl(270,F_SETFD,FD_CLOEXEC) = 0 (0x0)
    close(274) = 0 (0x0)
    close(283) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(270,"2289\tRe: OT: Boodly Asda\tNick "...,323) = 323 (0x143)
    lseek(270,0x0,SEEK_CUR) = 898638 (0xdb64e)
    pwrite(232,"\v\M-5\r\0\0\0\0\0C\^A\0\0\0\0\0"...,56,0x1f480) = 56 (0x38)
    msync(0x802d7d000,4096,MS_ASYNC) = 0 (0x0)
    fcntl(8,F_SETLKW,0x7fffffffcf10) = 0 (0x0)
    write(5,"[4FAD1BE6FF8B411EF21EBF2100E369F"...,98) = 98 (0x62)
    write(1,"Jun 24 15:11:55.536 + feed1.usen"...,100) = 100 (0x64)
    write(1," 3336 inpaths!\n",15) = 15 (0xf)
    write(112,"news.usenet.blueworldhosting.com"...,204) = 204 (0xcc)
    write(282,"239 <ab6cbhtfltggcj4f9fsp5o1nh2o"...,50) = 50 (0x32)
    sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Fri Jun 24 22:39:36 2022
    Hi Jesse,

    Thanks for the reminder about adding control.cancel... Right now it shouldn't be an issue:

    [news@spool1 ~]$ getlist -R counts | sort -n -k 4 | grep control.cancel
    control.cancel 15447 1 15447 n

    However, I can see where ignoring could cause problems in the future. Do you also recommend "control"?

    I would recommend having control.newgroup, control.rmgroup and control.checkgroups. Then there shouldn't be anything (but crap) in
    control.
    Traffic is very low in these groups so only expiring control.cancel
    should be OK.


    Somehow I've missed the getlist utility all these years... pretty spiffy.

    :-)


    Looks like I have a few groups that are indeed larger than even I expected. I haven't paid attention to the high/low counts and whatnot since beginning the injection project.

    rec.arts.tv 1367076 1 1366870 y

    Is it your largest group? (It represents about 2.5% of all your articles.)

    --
    Julien ÉLIE

    « Le café est un breuvage qui fait dormir quand on n'en prend pas. »
    (Alphonse Allais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Sat Jun 25 07:52:23 2022
    If I change:

    *:A:1:never:never

    to:

    *:A:never:never:never

    Would that stop expireover from traversing through all the groups and overview data, except specific groups listed later in expire.ctl?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sat Jun 25 22:30:51 2022
    Hi Jesse,

    If I change:

    *:A:1:never:never

    to:

    *:A:never:never:never

    Would that stop expireover from traversing through all the groups and overview
    data, except specific groups listed later in expire.ctl?

    No, expireover still has a look at all the overview lines to check
    whether the related articles still exist in the spool. (Yet, as you're
    using tradspool, this check is in fact not done because it is considered
    expensive, unless you give the -s flag to expireover.)

    In the source code of tdx_data_expire_start(), there's a note about
    exactly the point you are raising:

    ** There's probably some room for optimization here for newsgroups that
    ** don't expire so that the files don't have to be rewritten, or
    ** newsgroups where all the data at the end of the file is still good
    ** and just needs to be moved as-is.


    However, you may want to run expireover only on a subset of newsgroups,
    like you say. Instead of running it on all your newsgroups, you may
    want to run "news.daily noexpireover" (plus the usual flags you're
    using), and a few individual overview expiry runs:

    grep -E 'news.lists.filters |\.jobs|\.test ' <pathdb>/active \
        | expireover -f - -Z <pathtmp>/lowmark
    ctlinnd lowmark <pathtmp>/lowmark
    rm -f <pathtmp>/lowmark

    The grep command extracts the newsgroups from your active file.

    (And why not expire the others monthly, and not all of them the same day?)
    Just ideas.

    --
    Julien ÉLIE

    « Le café est un breuvage qui fait dormir quand on n'en prend pas. »
    (Alphonse Allais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to t95241$kphs$1@news.trigofacile.com on Fri Jul 1 00:27:31 2022
    On Jun 24, 2022 at 2:08:48 PM CDT, "Julien ÉLIE" in <t95241$kphs$1@news.trigofacile.com> wrote:

    ovdb is a non-starter for me, and ovsqlite has *serious* performance issues.
    I started building this box with the intent to use ovsqlite, but when I
    started injecting articles the throughput rate was abysmal.

    Did you try to increase transrowlimit when injecting your articles?

    Hi Julien - I am going to give ovsqlite another try. I was reading through the man page and configuration file, paying attention to this recommendation:

    A transaction occurs every transrowlimit articles or transtimelimit
    seconds, whichever is smaller. You are encouraged to keep the default
    value for row limits and, instead, adjust the time limit according to
    how many articles your news server usually accepts per second during
    normal operation (you can find statistics about incoming articles in
    your daily Usenet reports).
    Inserting or deleting a database row within a transaction is very fast
    whereas committing a transaction is slow, especially on rotating
    storage. Setting transaction limits too low leads to poor
    performance. When rebuilding overview data, it may be worth
    temporarily raising these values, though.

    Do you recommend increasing both transrowlimit and transtimelimit? I am also looking at the pagesize and cachesize parameters.
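    For context, here is the kind of change I have in mind in ovsqlite.conf
    (assuming the usual key: value syntax; the numbers are guesses on my
    part, not recommendations):

```
# Illustrative values only -- not recommendations.
transrowlimit:  10000
transtimelimit: 120
```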

    Thanks,

    Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From =?UTF-8?Q?Julien_=c3=89LIE?=@21:1/5 to All on Sun Jul 3 23:32:05 2022
    Hi Jesse,

    I am going to give ovsqlite another try. I was reading through the
    man page and configuration file, paying attention to this recommendation:

    A transaction occurs every transrowlimit articles or transtimelimit
    seconds, whichever is smaller. You are encouraged to keep the default
    value for row limits and, instead, adjust the time limit according to
    how many articles your news server usually accepts per second during
    normal operation (you can find statistics about incoming articles in
    your daily Usenet reports).
    Inserting or deleting a database row within a transaction is very fast whereas committing a transaction is slow, especially on rotating
    storage. Setting transaction limits too low leads to poor
    performance. When rebuilding overview data, it may be worth
    temporarily raising these values, though.

    Do you recommend increasing both transrowlimit and transtimelimit? I am also looking at the pagesize and cachesize parameters.

    When testing on my news server, which is a relatively fast VPS, I did
    not notice any change from a tenfold increase of transrowlimit or
    transtimelimit during an overview rebuild.
    This recommendation comes from the implementer of ovsqlite, and I
    believe it may be useful on slower systems (with rotating hard drives or
    a slow CPU).

    I kept the defaults for pagesize and cachesize. I'm not an SQLite
    expert, and ovsqlite is fast enough on my test news server. Of
    course, if you find tweaks that make it even faster on your news
    server, feel free to share them and I'll be happy to add them to our
    documentation for other users.

    --
    Julien ÉLIE

    « La médecine est un métier dangereux. Ceux qui ne meurent pas peuvent
    vous faire un procès. » (Coluche)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)