Hello,
On my private INN leaf system, I am running INN 2.7.1 with groupbaseexpiry=true,
ovmethod=ovdb, hismethod=hisv6, and tradspool.
I am having an issue where the news.daily job is:
* For SOME articles, correctly expiring them from all places
* For others, even in the same group, removing them from lownumber in active
(and presumably also overview) but leaving them on disk (and sometimes in
history).
I noticed this when doing a routine examination of what was using space on this
system, which is getting a full text-only feed.
I picked the group at the top of the storage space list under alt. (Please do
not derail; this group was picked solely by its disk space usage, and is not one
I read anyhow; this is a purely technical question)
news:~/articles/alt/atheism$ ls | wc -l
118089
news:~/articles/alt/atheism$ grep ^alt.atheism\  /var/lib/news/active
alt.atheism 0000289856 0000288979 y
According to active, there are 877 articles in that group. But according to ls,
over 100,000.
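As a quick check of the arithmetic, here is a minimal Python sketch that parses the active line shown above and computes how many articles the water marks allow for. The helper names are made up for illustration. It assumes the usual active field order of group, high mark, low mark, flag, and treats the range as inclusive.

```python
def parse_active_line(line: str) -> tuple[str, int, int, str]:
    """Split one active-file line into (group, high, low, flag)."""
    group, high, low, flag = line.split()
    return group, int(high), int(low), flag


def max_article_count(high: int, low: int) -> int:
    """Upper bound on articles implied by the water marks (inclusive range)."""
    return max(high - low + 1, 0)


group, high, low, flag = parse_active_line("alt.atheism 0000289856 0000288979 y")
print(group, max_article_count(high, low))
```

Counted inclusively, the range holds 878 article numbers, one more than the plain difference of 877. Either way it is nowhere near the 118089 files that ls reports.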
I added a specific line to expire.ctl for this hierarchy for testing:
alt.atheism.*:A:5:5:5
alt.atheism:A:5:5:5
My expire.log shows:
expireover start Tue Jan 16 04:18:25 CST 2024: ( -z/var/log/news/expire.rm -Z/var/log/news/expire.lowmark)
Article lines processed 9050290
Articles dropped 16771
Overview index dropped 17663
expireover end Tue Jan 16 04:37:41 CST 2024
lowmarkrenumber begin Tue Jan 16 04:37:41 CST 2024: (/var/log/news/expire.lowmark)
lowmarkrenumber end Tue Jan 16 04:37:41 CST 2024
expirerm start Tue Jan 16 04:37:41 CST 2024
expirerm end Tue Jan 16 04:38:02 CST 2024
expire begin Tue Jan 16 04:38:32 CST 2024: (-v1)
Article lines processed 8051376
Articles retained 5934390
Entries expired 2116986
expire end Tue Jan 16 04:52:01 CST 2024
all done Tue Jan 16 04:52:01 CST 2024
Well that's weird. My expire.list file does have some alt/atheism/* files in it, and THOSE files are gone. But:
news:~/articles/alt/atheism$ ls -ltr | head
total 584041
-rw-rw-r-- 4 news news 5967 Aug 30 2021 1
-rw-rw-r-- 4 news news 2612 Aug 30 2021 2
-rw-rw-r-- 4 news news 2609 Aug 30 2021 3
-rw-rw-r-- 4 news news 3596 Aug 30 2021 16
-rw-rw-r-- 4 news news 3511 Aug 30 2021 24
-rw-rw-r-- 4 news news 4217 Aug 30 2021 25
-rw-rw-r-- 4 news news 3303 Aug 30 2021 27
-rw-rw-r-- 4 news news 2362 Aug 30 2021 28
-rw-rw-r-- 4 news news 3994 Aug 30 2021 32
Clearly something isn't right here.
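The second column of the ls -l output above is the hard-link count (4 for each of these files), so every one of them is also reachable from other directory entries. A small Python sketch for listing leftovers below the low water mark together with their link counts might look like this. The function name and the idea of passing the spool directory as an argument are illustrative, not part of INN.

```python
import os


def files_below_low_mark(spool_dir: str, low: int) -> list[tuple[int, int]]:
    """Return (article_number, hard_link_count) for numeric files below `low`."""
    found = []
    for entry in os.scandir(spool_dir):
        # Only plain files whose names are article numbers are of interest.
        if entry.name.isdigit() and entry.is_file(follow_symlinks=False):
            number = int(entry.name)
            if number < low:
                found.append((number, os.lstat(entry.path).st_nlink))
    return sorted(found)
```

Called with the group's spool directory and the low mark from active (288979 here), it returns the article numbers that active no longer covers. A link count above 1 means the same file is still present under at least one other name.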
Looking at the message-IDs from these, for some of them (for instance, the very
oldest in the list) grephistory doesn't show any entries. For others -- such as
this one from February 2023, nearly a year ago:
news:~/articles/alt/atheism$ grephistory -l 'c0eeuhhbc3ebr3t8vlvfu4ftd4bdbb49u7@4ax.com'
[43098AA196CF303E586EDE4477A98426] 1676098879~-~1676097578 @0500000002F00000000000030D4000000000@
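To read the timestamps in that line without converting them by hand, a sketch like the following can help. It assumes the middle field is arrival~expires~posted in Unix epoch seconds, with "-" standing for a missing Expires header, which is my understanding of the history format; the function name is made up.

```python
from datetime import datetime, timezone


def parse_history_line(line: str) -> dict:
    """Parse '[hash] arrived~expires~posted @token@' into its parts."""
    hash_field, dates, token = line.split()
    arrived, expires, posted = dates.split("~")

    def to_utc(value: str):
        # "-" marks a field with no value (assumed: no Expires header).
        if value == "-":
            return None
        return datetime.fromtimestamp(int(value), tz=timezone.utc)

    return {
        "hash": hash_field.strip("[]"),
        "arrived": to_utc(arrived),
        "expires": to_utc(expires),
        "posted": to_utc(posted),
        "token": token,
    }


record = parse_history_line(
    "[43098AA196CF303E586EDE4477A98426] 1676098879~-~1676097578 "
    "@0500000002F00000000000030D4000000000@"
)
print(record["arrived"].date())  # 2023-02-11
```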
And those dates match. But this is clearly outside what is listed in active.
ovdb seems to match active:
news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
alt.atheism: counted: low: 288979, high: 289856, count: 878
news:~/articles/alt/atheism$ ovdb_stat -g alt.atheism
alt.atheism: groupstats: low: 288979, high: 289856, count: 878, flag: y
news:~/articles/alt/atheism$ ovdb_stat -i alt.atheism
alt.atheism: flags: none
alt.atheism: gid: 752; Stored in: ov00027
alt.atheism: last expired: 2024-01-16 04:20:27 CST
alt.atheism: by process id: 138688
Even expire.list is weird:
alt/atheism/288826
alt/atheism/288827
alt/atheism/288828
alt/atheism/288829
alt/atheism/288830
alt/atheism/288833
alt/atheism/288834
alt/atheism/288835
alt/atheism/288836
alt/atheism/288840
alt/atheism/288842
I don't know how to explain those gaps; 288831 and 288832 do exist on disk and
should be expired, for instance.
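To enumerate gaps like these rather than spotting them by eye, a short Python sketch (the function name is illustrative) can take the paths from expire.list and report which article numbers are skipped between the lowest and highest entry:

```python
def missing_numbers(paths: list[str]) -> list[int]:
    """Article numbers absent between the lowest and highest listed path."""
    listed = {int(path.rsplit("/", 1)[1]) for path in paths}
    if not listed:
        return []
    return [n for n in range(min(listed), max(listed) + 1) if n not in listed]


excerpt = [
    "alt/atheism/288826", "alt/atheism/288827", "alt/atheism/288828",
    "alt/atheism/288829", "alt/atheism/288830", "alt/atheism/288833",
    "alt/atheism/288834", "alt/atheism/288835", "alt/atheism/288836",
    "alt/atheism/288840", "alt/atheism/288842",
]
print(missing_numbers(excerpt))
```

For the excerpt above this reports 288831, 288832, 288837, 288838, 288839 and 288841. Each reported number can then be checked on disk and with grephistory.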
I believe this is happening in a number of other groups as well. That is, it's
not specific to this one. This is just the biggest example.
So my questions are:
1) Why is this happening?
2) Once I fix #1, how can I clean up the articles already left behind? I could brute-force it with 'find . -mtime +x -delete', but that might not leave the history in a consistent state.
I am using the default cronjob for calling news.daily expireover lowmark delayrm.
Thanks,
John
In reading some more docs, I have a clue, but it is not an answer.
The expireover manpage documents these options:
-e Remove articles from the news spool and all overview databases
as soon as they expire out of any newsgroup to which they are
posted, rather than retain them until they expire out of all
newsgroups. -e and -k cannot be used at the same time. This
flag is ignored if groupbaseexpiry is false.
-k Retain all overview information for an article, as well as the
article itself, until it expires out of all newsgroups to
which it was posted. This can cause articles to stick around
in a newsgroup for longer than the expire.ctl rules indicate,
when they're crossposted. -e and -k cannot be used at the
same time. This flag is ignored if groupbaseexpiry is false.
It doesn't state which is the default.
- Since a tradspool article is hardlinked into every newsgroup it is
posted to, how does -e even know where to remove it from? Same for
ctlinnd cancel and such. It would have to rm/unlink it from multiple
places if crossposted.
- Part of the reason I picked tradspool was to have more fine-grained
control over expiry. I had thought that it would simply unlink a
crossposted article from any group whose rules would expire it. It
sounds like it's all-or-nothing, and perhaps -k is the default.
John Goerzen <jgoerzen@complete.org> writes:
In reading some more docs, I have a clue, but it is not an answer.
The expireover manpage documents these options:
-e Remove articles from the news spool and all overview databases
as soon as they expire out of any newsgroup to which they are
posted, rather than retain them until they expire out of all
newsgroups. -e and -k cannot be used at the same time. This
flag is ignored if groupbaseexpiry is false.
-k Retain all overview information for an article, as well as the
article itself, until it expires out of all newsgroups to
which it was posted. This can cause articles to stick around
in a newsgroup for longer than the expire.ctl rules indicate,
when they're crossposted. -e and -k cannot be used at the
same time. This flag is ignored if groupbaseexpiry is false.
It doesn't state which is the default.
I believe the default is neither of those: the overview information is
removed as the article expires from each group, but the article itself is
retained until it has expired from all groups.
Crossposting was my first guess about what behavior you were seeing, but
then I decided it couldn't be that since you had thousands of unexpired
messages. But now that you say that you keep everything forever except
certain groups, it all makes sense. Your example group, alt.atheism, is
notorious for getting tons of crossposted troll posts from tons of other
groups, so much of its traffic is crossposted.
- Since a tradspool article is hardlinked into every newsgroup it is
posted to, how does -e even know where to remove it from? Same for
ctlinnd cancel and such. It would have to rm/unlink it from multiple
places if crossposted.
It uses the Xref header of the article to find all the groups to which it
was posted. It has to retrieve that anyway to know what expiration rules
to apply. This is why although the standard doesn't require it, INN
requires the Xref header be present in the overview database.
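To illustrate how the Xref header identifies every location of a crossposted article, here is a minimal Python sketch. It is not INN's code. It assumes the usual header layout of a host name followed by group:number pairs, and it maps each pair to a tradspool-style relative path the way the expire.list entries in this thread look (dots in the group name become slashes). The host and the second group in the example are hypothetical.

```python
def parse_xref(header: str) -> tuple[str, list[tuple[str, int]]]:
    """Return (host, [(group, article_number), ...]) from an Xref header."""
    value = header.split(":", 1)[1] if header.lower().startswith("xref:") else header
    host, *locations = value.split()
    pairs = [loc.rsplit(":", 1) for loc in locations]
    return host, [(group, int(number)) for group, number in pairs]


def spool_paths(header: str) -> list[str]:
    """Relative tradspool-style paths for every group in the header."""
    _, locations = parse_xref(header)
    return [f"{group.replace('.', '/')}/{number}" for group, number in locations]


# Hypothetical header: host and the second group are invented for the example.
example = "Xref: news.example.com alt.atheism:288831 alt.test:42"
print(spool_paths(example))
```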
- Part of the reason I picked tradspool was to have more fine-grained
control over expiry. I had thought that it would simply unlink a
crossposted article from any group whose rules would expire it. It
sounds like it's all-or-nothing, and perhaps -k is the default.
I forget why we don't do that. I think it's because the links aren't
always hard links; sometimes they're symlinks and in that case you can't
just delete the article out of each group independently? But I'm not
really sure.
I am a little fuzzy on the relationship between the triple of (overview, history, article storage on disk).
It sounds like you're saying:
- The overview information is per-group (which it pretty much has to be,
given what overview is for). So expireover could remove the article
from overview for a group per expire.ctl without removing the hardlink
to it from that group, or the history entry.
- Then after it is expired from all groups (I guess by checking the Xref
header?) it is finally removed from the history and on-disk as well.
Did I get that right? I think this could explain the behavior I was
seeing, of ovdb and active not showing all the old articles, but them
still being on disk and (mostly?) in history.
In that case, the only "penalty" I am paying here is the cost of the
directory entry, since the inode and data are already spoken for with the
other crossposts.
So, expireover -k would modify that by not even removing it from
overview either, until all references are removed.
And -e would change it to remove from everywhere as soon as even one of
the Xrefs expires it.
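The three behaviors as summarized in this exchange can be written down as a small Python model. This is only a sketch of the description given here (default, -k, -e), not a reading of the expireover source, and the function name and the second group are invented for the example.

```python
def expiry_actions(expired: dict[str, bool], mode: str = "default"):
    """Model which overview entries go and whether the article is removed.

    `expired` maps each crossposted group to whether its expire.ctl rule
    says the article is due. `mode` is "default", "k" or "e".
    Returns (groups whose overview entry is removed, article_removed).
    """
    any_expired = any(expired.values())
    all_expired = all(expired.values())
    if mode == "e":
        # Gone everywhere as soon as any one group expires it.
        removed_overview = list(expired) if any_expired else []
        article_removed = any_expired
    elif mode == "k":
        # Nothing is touched until every group has expired it.
        removed_overview = list(expired) if all_expired else []
        article_removed = all_expired
    else:
        # Overview goes per group; the article waits for all groups.
        removed_overview = [g for g, gone in expired.items() if gone]
        article_removed = all_expired
    return sorted(removed_overview), article_removed


state = {"alt.atheism": True, "alt.test": False}
for mode in ("default", "k", "e"):
    print(mode, expiry_actions(state, mode))
```

In the default case the model removes the alt.atheism overview entry while keeping the article file, which is the pattern seen on disk in this thread.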
I forget why we don't do that. I think it's because the links aren't
always hard links; sometimes they're symlinks and in that case you
can't just delete the article out of each group independently? But I'm
not really sure.
I've been wondering if I could use different storage classes to solve
this problem. Or maybe that would introduce more; I've never had more
than one storage class before.
What is the mechanism when an article is crossposted into groups that
are in different classes?
On 2024-01-17, Russ Allbery <eagle@eyrie.org> wrote:
John Goerzen <jgoerzen@complete.org> writes:
In reading some more docs, I have a clue, but it is not an answer.
It doesn't state which is the default.
I believe the default is neither of those: the overview information is
removed as the article expires from each group, but the article itself is
retained until it has expired from all groups.
Hi Russ, and thanks for the reply!
I am a little fuzzy on the relationship between the triple of (overview, history, article storage on disk).
It sounds like you're saying:
- The overview information is per-group (which it pretty much has to be, given
what overview is for). So expireover could remove the article from overview
for a group per expire.ctl without removing the hardlink to it from that
group, or the history entry.
- Then after it is expired from all groups (I guess by checking the Xref
header?) it is finally removed from the history and on-disk as well.
Did I get that right? I think this could explain the behavior I was seeing, of
ovdb and active not showing all the old articles, but them still being on disk
and (mostly?) in history.
In that case, the only "penalty" I am paying here is the cost of the directory
entry, since the inode and data are already spoken for with the other crossposts.
So, expireover -k would modify that by not even removing it from overview either, until all references are removed.
And -e would change it to remove from everywhere as soon as even one of the Xrefs expires it.
If you want to keep tradspool, you may want to add the 'X' flag to the
expire.ctl entry for alt.atheism, and for any other similar group from which
you wish to expire crossposted articles. With that flag, an article that
expires from that group is also removed from the other groups it was
crossposted to. Using the '-e' flag with expireover may result in behavior
you do not expect if you have many entries in expire.ctl.
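As a sketch of what such an entry could look like, the Python snippet below parses an expire.ctl-style line and reports whether the X flag is set. The line "alt.atheism:AX:5:5:5" is an assumption based on my reading of the expire.ctl documentation (X combined with one of M, U or A, in the five-field pattern:flags:min:default:max layout used earlier in the thread). Check expire.ctl(5) for your INN version before using it. The function name is made up.

```python
def parse_expire_ctl_line(line: str) -> dict:
    """Split a 'pattern:flags:min:default:max' entry into named fields."""
    pattern, flags, minimum, default, maximum = line.strip().split(":")
    return {
        "pattern": pattern,
        "flags": flags,
        "min": minimum,
        "default": default,
        "max": maximum,
        # X is assumed to mean: drop the article from every group it is in.
        "remove_from_all_groups": "X" in flags,
    }


# Assumed form of the entry with the X flag added to the existing A flag.
entry = parse_expire_ctl_line("alt.atheism:AX:5:5:5")
print(entry["pattern"], entry["remove_from_all_groups"])
```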
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
If you want to keep tradspool, you may want to add the 'X' flag to the
expire.ctl entry for alt.atheism, and any other similar group that you
wish to expire crossposted articles.
Oh, good call, I completely forgot about that feature.
news:~/articles/alt/atheism$ ls -ltr | head
total 312503
-rw-rw-r-- 4 news news 5967 Aug 30 2021 1
-rw-rw-r-- 4 news news 2612 Aug 30 2021 2
-rw-rw-r-- 4 news news 2609 Aug 30 2021 3
-rw-rw-r-- 4 news news 3596 Aug 30 2021 16
-rw-rw-r-- 4 news news 3511 Aug 30 2021 24
-rw-rw-r-- 4 news news 4217 Aug 30 2021 25
-rw-rw-r-- 4 news news 3303 Aug 30 2021 27
-rw-rw-r-- 4 news news 2362 Aug 30 2021 28
-rw-rw-r-- 4 news news 3994 Aug 30 2021 32
news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
alt.atheism: counted: low: 289303, high: 290329, count: 1023
OK, so from incidental inspection, MOST but not ALL of the 60,000 articles still
in alt.atheism are visible to grephistory. These from 2021 are clearly outside
the expire range. The article count is down from the roughly 120,000 that it was before adding -e to expireover, but still the 60,000 files on disk is way higher than the 1023 articles that ovdb thinks I have.
Yes, they can be moved between classes, and even between different storage methods.
Can one move groups between classes if the storage method is the
same? (I note the warning "arbitrary but permanent" and am not
sure how it might apply here.)
I've been wondering if I could use different storage classes to
solve this problem.