• force expiration by path?

    From Dave McGuire@21:1/5 to All on Sun Dec 10 20:12:04 2023
    Hi folks. Can anyone tell me if there's a way to tell INN to expire
    a set of articles, as a one-time operation, based on their path?

    I'm sure it's obvious that my goal is to get rid of all the Google
    spam from the spool. I just filtered them in my cleanfeed configuration
    but would like to purge the articles that are already there, as my
    server is set up with a long expiration period.

    A perusal of the docs for expire and such has turned up nothing, so
    I'd appreciate some advice on whether or not there's a way to do this.

    Thanks,
    -Dave

    --
    Dave McGuire, President/Curator
    Large Scale Systems Museum
    New Kensington, PA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Dave McGuire on Sun Dec 10 23:14:02 2023
    On 12/10/23 19:12, Dave McGuire wrote:
    > Hi folks.  Can anyone tell me if there's a way to tell INN to expire
    > a set of articles, as a one-time operation, based on their path?

    Maybe and it depends. (More below.)

    > I'm sure it's obvious that my goal is to get rid of all the Google
    > spam from the spool.  I just filtered them in my cleanfeed configuration
    > but would like to purge the articles that are already there, as my
    > server is set up with a long expiration period.

    I was doing that very thing as we type this thread. -- I just checked
    and a long running command finished.

    time says that my command ran for:

    84021.02s user 19364.71s system 29% cpu 98:13:31.85 total

    This is a tradspool on a four (spinning rust) drive ZFS pool.

    Seeing as how I'm using tradspool, I'm able to delete files from the
    spool directory.

    I suspect that this isn't proper, much less pure, from an INN sense. I
    bet I should have extracted the article number and passed a
    cancel message to INN, likely via ctlinnd. But, I did a hack and I'll
    deal with it if / when it becomes a problem.
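
    For reference, a minimal sketch of that more INN-friendly route. It
    assumes ctlinnd's cancel command, which takes a Message-ID rather than
    an article number. extract_msgid is a hypothetical helper name, and the
    article path in the usage comment is made up.

```shell
#!/bin/sh
# Sketch: print the Message-ID of a tradspool article file.
# extract_msgid is a hypothetical helper, not part of INN.
extract_msgid() {
    # $1: path to an article file.
    # Stop at the first blank line so only the header block is read.
    awk '/^$/ { exit }
         tolower($1) == "message-id:" { print $2; exit }' "$1"
}

# Usage (the path is made up; ctlinnd must be run as the news user):
#   ctlinnd cancel "$(extract_msgid /var/spool/news/articles/comp/misc/12345)"
```

    Reading only the header block avoids picking up a Message-ID line that
    happens to be quoted in an article body.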

    That being said, I did a find across /var/spool/news/articles and had it
    exec a script per article that looked for Message-IDs that ended with @googlegroups.com.

    This is actually the second time I've done this. The first time I did
    it the process removed nearly seven million articles. Then I found out
    that the Message-ID had a different pattern, likely as fields grew over
    time. So I re-ran the process with a more forgiving format.

    # Remove the article file given as $1 if its Message-ID matches the
    # Google Groups pattern.
    export LC_ALL=C
    if grep -Eq '^Message-ID: <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups\.com>$' "${1}"; then
        printf 'X'
        rm -- "${1}"
    fi

    I'm sure there are other ways to do this. But it worked for me. I was
    able to let it run in the background in a window.

    # zsh syntax: bash rejects the nested ${${...}...} expansion.
    time (clear; find "$(pwd)" -type d | while read -r DIR; do
        echo -n "${TS}${${DIR/\/var\/spool\/news\/articles\//}//\//.}${FS}"
        find "${DIR}" -maxdepth 1 -type f -exec /root/remove-google-groups-news-posting-if-its-spam.sh {} \;
    done; echo)

    The echo / ${TS} / ${FS} isn't important, much less required. It's
    there because I wanted to update the window title to be the newsgroup
    that was being worked on.

    I'm sure there are better ways to do this. But this has worked for me
    to do exactly what you're asking to do.
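
    A possible variation, sketched under the assumption that find supports
    -exec ... {} +: let find hand the articles to grep in batches and only
    print the matching file names, so the list can be reviewed before
    anything is removed. list_gg_articles is a hypothetical name; the
    pattern is the five-group one from the script above.

```shell
#!/bin/sh
# Sketch: list tradspool article files whose Message-ID looks like a
# Google Groups ID. Nothing is deleted here.
# list_gg_articles is a hypothetical helper name.
export LC_ALL=C

list_gg_articles() {
    # $1: spool directory to scan, e.g. /var/spool/news/articles
    find "$1" -type f -exec grep -lE \
        '^Message-ID: <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups\.com>$' \
        {} +
}

# Usage (review candidates.txt before removing anything; this assumes
# no whitespace in spool paths, which holds for newsgroup names):
#   list_gg_articles /var/spool/news/articles > candidates.txt
#   xargs rm -- < candidates.txt
```

    Running one grep per batch of files instead of one script per article
    avoids a process start for every article, which may matter on a spool
    with millions of files.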

    > A perusal of the docs for expire and such has turned up nothing, so
    > I'd appreciate some advice on whether or not there's a way to do this.

    I'm not aware of anything built in to INN that will do this. But this
    is one way that you can do this outside of INN.

    N.B. what I did is possibly very specific to the tradspool method. I
    have no idea about other methods. It may be possible, but would likely
    require using ctlinnd to cancel the articles.



    --
    Grant. . . .

  • From Julien ÉLIE@21:1/5 to All on Mon Dec 11 21:38:03 2023
    Hi Dave,

    > Can anyone tell me if there's a way to tell INN to expire
    > a set of articles, as a one-time operation, based on their path?

    Grant's method naturally works on tradspool and you can use it.

    In a more general case, you can parse the history file (in <pathdb> as
    set in inn.conf), retrieve the headers of each article (sm -H) and run
    the regexps you wish on these headers.
    As you're asking for a search based on the Path header field, the
    following command will write to a googlegroups.tokens file the storage
    tokens of articles sent from Google Groups:

    perl -ne 'chomp; our ($hash, $timestamps, $_) = split " ";
        print "$_\n" if $_ and qx/sm -q -H "$_" | grep Path/
            =~ /!google-groups\.googlegroups\.com!not-for-mail$/' \
        history > googlegroups.tokens

    The command will take a bit of time to run, as INN retrieves every article.
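
    The same Path test can be sketched in plain shell, assuming sm prints
    the headers in native format with an unfolded Path line.
    path_is_google_groups is a hypothetical helper; the loop in the
    comments assumes sm is in PATH and the history file is in the current
    directory.

```shell
#!/bin/sh
# Sketch: decide from an article's headers (on stdin) whether its Path
# ends with the Google Groups injection suffix.
# path_is_google_groups is a hypothetical helper name.
path_is_google_groups() {
    # Keep only Path lines, then test the tail; -q reports via exit status.
    grep -i '^Path:' |
        grep -q '!google-groups\.googlegroups\.com!not-for-mail$'
}

# Usage (the storage token is the third field of a history line, when present):
#   awk 'NF >= 3 { print $3 }' history | while read -r token; do
#       sm -q -H "$token" </dev/null | path_is_google_groups &&
#           printf '%s\n' "$token"
#   done > googlegroups.tokens
```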

    Then, to delete these articles from your spool, just run "sm -d"
    on them. Something like:

    xargs sm -d < googlegroups.tokens


    Before doing that, check that your regexp worked, by retrieving a few
    storage tokens and verifying they're coming from Google Groups. You can
    see the contents of an article with:

    sm -R '@...token...@'

    (-R in uppercase)
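
    A sketch of that spot check, assuming sm is in PATH: print the Path
    header for the first few tokens in the file. show_sample_paths is a
    hypothetical helper, and the SM variable is a hypothetical override
    that makes it possible to try the function without INN.

```shell
#!/bin/sh
# Sketch: show the Path header of the first N tokens in a token file.
# show_sample_paths and the SM override are hypothetical, not part of INN.
show_sample_paths() {
    # $1: file with one storage token per line
    # $2: number of tokens to check
    head -n "$2" "$1" | while read -r token; do
        printf '%s: ' "$token"
        # </dev/null keeps sm from reading the remaining tokens on stdin.
        "${SM:-sm}" -q -H "$token" </dev/null | grep -i '^Path:' ||
            printf '(no Path header found)\n'
    done
}

# Usage:
#   show_sample_paths googlegroups.tokens 5
```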


    > A perusal of the docs for expire and such has turned up nothing, so
    > I'd appreciate some advice on whether or not there's a way to do this.

    The next run of news.daily will properly clean the overview, etc.

    --
    Julien ÉLIE

    « Qui habet aures audiendi, audiat. » (Évangiles)

  • From Dave McGuire@21:1/5 to Grant Taylor on Wed Dec 13 21:42:32 2023
    On 12/11/23 00:14, Grant Taylor wrote:
    > That being said, I did a find across /var/spool/news/articles and had it
    > exec a script per article that looked for Message-IDs that ended with
    > @googlegroups.com.

    > The echo / ${TS} / ${FS} isn't important, much less required.  It's
    > there because I wanted to update the window title to be the newsgroup
    > that was being worked on.
    >
    > I'm sure there are better ways to do this.  But this has worked for me
    > to do exactly what you're asking to do.

    Hi Grant, thank you, I'll give this a shot. The window title thing
    is a nice touch. :)

    Thanks,
    -Dave

    --
    Dave McGuire, President/Curator
    Large Scale Systems Museum
    New Kensington, PA

  • From Grant Taylor@21:1/5 to Dave McGuire on Wed Dec 13 21:54:51 2023
    On 12/13/23 20:42, Dave McGuire wrote:
    > Hi Grant, thank you, I'll give this a shot.  The window title thing
    > is a nice touch. :)

    Hi Dave,

    You're welcome.

    Please let me know if it works or if you have questions.



    --
    Grant. . . .

  • From Tom Furie@21:1/5 to Grant Taylor on Thu Dec 14 06:32:28 2023
    Grant Taylor <gtaylor@tnetconsulting.net> writes:

    > export LC_ALL=C
    > egrep -lm1 "^Message-ID: <[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+@googlegroups.com>$"

    Just an aside in case you need to do something like this again...

    A pattern like

    "^Message-ID: <([[:alnum:]]+-)+[[:alnum:]]+[[:alpha:]]"

    before the "@" should allow the hyphenated grouping to expand
    arbitrarily without intervention required to modify the pattern. I'm
    pretty certain that's all hex (probably a hash of something) until the
    "n@google...", and I don't think I've ever noticed anything other than
    "n" immediately preceding the "@", so I guess the pattern could be
    something like

    "^Message-ID: <([0-9a-f]+-)+[0-9a-f]+n@google..."

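    To try such a pattern before pointing it at a spool, a small sketch
    follows. It assumes the elided tail is the googlegroups.com domain
    used in Grant's pattern, and the Message-IDs in the usage comments are
    made up for illustration. is_gg_msgid is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: check one Message-ID header line against the generalized
# hex-groups-plus-"n" pattern. The "@googlegroups\.com>$" tail is an
# assumption borrowed from the earlier five-group pattern.
export LC_ALL=C

is_gg_msgid() {
    # $1: a complete header line, e.g. 'Message-ID: <...>'
    printf '%s\n' "$1" |
        grep -Eq '^Message-ID: <([0-9a-f]+-)+[0-9a-f]+n@googlegroups\.com>$'
}

# Usage (made-up IDs):
#   is_gg_msgid 'Message-ID: <0a1b2c3d-4e5f-6a7b-8c9d-0e1f2a3b4c5dn@googlegroups.com>' && echo match
#   is_gg_msgid 'Message-ID: <abc-def-123@googlegroups.com>' || echo no match
```
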
  • From Dave McGuire@21:1/5 to Grant Taylor on Sun Jan 7 13:22:24 2024
    On 12/13/23 22:54, Grant Taylor wrote:
    >> Hi Grant, thank you, I'll give this a shot.  The window title thing
    >> is a nice touch. :)
    >
    > Hi Dave,
    >
    > You're welcome.
    >
    > Please let me know if it works or if you have questions.

    Hi Grant, yes it did indeed work. Thank you for your advice.

    -Dave

    --
    Dave McGuire, President/Curator
    Large Scale Systems Museum
    New Kensington, PA

  • From Grant Taylor@21:1/5 to Dave McGuire on Tue Jan 9 22:14:58 2024
    On 1/7/24 12:22, Dave McGuire wrote:
    > Hi Grant, yes it did indeed work.  Thank you for your advice.

    Hi Dave,

    Thank you for the follow up. I'm glad that it worked for you. :-)



    --
    Grant. . . .
