• Newbie question

    From Eli@21:1/5 to All on Fri Mar 10 21:04:24 2023
    Hi,

    This question has probably been asked before, but I couldn't find it, so here it is.

    I will soon be setting up a text-only news server for public access, but what is the minimum storage capacity my server needs for all non-binary newsgroups?

    Thanks, Eli.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From U.ee@21:1/5 to Eli on Sat Mar 11 00:14:07 2023
    Hello!

    On 10.03.23 23:04, Eli wrote:
    Hi,

    This question has probably been asked before, but I couldn't find it, so here it is.

    I will soon be setting up a text-only news server for public access, but what is the minimum storage capacity my server needs for all non-binary newsgroups?

    Thanks, Eli.

    It depends on how much spam and how many low-value articles you can filter out.
    20-30 GB per year is comfortable.
    You can do with far less if you have a curated list of groups and a good
    spam filter.

    Best regards,
    U.ee

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gary Goog@21:1/5 to Eli on Fri Mar 10 22:05:00 2023
    On 10/03/2023 21:04, Eli wrote:
    I will soon be setting up a text-only news server for public access, but what is the minimum storage capacity my server needs for all non-binary newsgroups?


    Anything you can afford. Hard disk prices keep getting cheaper. Are you
    going to require registration, or will it be an open server like Paganini,
    aioe, and mixmin? Open servers are quite popular and you'll get more
    users.

    Make sure you don't filter or censor anything or block anybody on it,
    otherwise you will become a hate figure and a target for hackers.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Eli on Fri Mar 10 15:27:57 2023
    On 3/10/23 3:18 PM, Eli wrote:
    SSD (NVMe) disks are not that cheap, but are they necessary for a
    news server or are HDD disks fine?

    SSD / NVMe drives would be my default as they are faster, quieter, and
    consume less power.

    That being said, spinning rust is perfectly fine.

    I was thinking to start with 2x 1.92 TB SSD or is that not enough
    for all non-binary groups?

    LOL That's WAY MORE than you need.

    I have my /transit/ news server on a 10 GB file system. It purges
    things that are older than 30 days.

    My /private/ news server has 147 GB of articles (with ZFS on-the-fly
    compression) going back to November 2018.

    I really have no idea how much data these newsgroups take up.

    It really depends on which groups / message sizes / retention period you keep.

    Please feel free to reach out to me when you're ready to peer. I'm
    happy to peer with you. I don't mind being the first peer and helping
    you get your newsmaster feet wet. (Some people don't want to be the
    first peer.)



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to Gary Goog on Fri Mar 10 22:18:25 2023
    On 10 Mar 2023 at 23:05:00 CET, "Gary Goog" <invalid@invalid.net> wrote:

    On 10/03/2023 21:04, Eli wrote:
    I will soon be setting up a text-only news server for public access, but what
    is the minimum storage capacity my server needs for all non-binary newsgroups?


    Anything you can afford. Hard disk prices keep getting cheaper. Are you
    going to require registration, or will it be an open server like Paganini,
    aioe, and mixmin? Open servers are quite popular and you'll get more
    users.

    SSD (NVMe) disks are not that cheap, but are they necessary for a news server or are HDD disks fine?

    I was thinking to start with 2x 1.92 TB SSD or is that not enough for all non-binary groups?

    I really have no idea how much data these newsgroups take up.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to U.ee on Fri Mar 10 22:21:38 2023
    On 10 Mar 2023 at 23:14:07 CET, "U.ee" <admin@invalid.usenet.ee> wrote:

    Hello!

    It depends on how much spam and how many low-value articles you can filter out.
    20-30 GB per year is comfortable.
    You can do with far less if you have a curated list of groups and a good
    spam filter.

    Best regards,
    U.ee

    If that is for all non-binary newsgroups, then that's not bad. I expected considerably more.

    Thanks for the info.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Eli on Sat Mar 11 01:15:31 2023
    On Mar 10, 2023 at 3:04:24 PM CST, "Eli" <eliistheman@gmail.com> wrote:

    Hi,

    This question has probably been asked before, but I couldn't find it, so here it is.

    I will soon be setting up a text-only news server for public access, but what is the minimum storage capacity my server needs for all non-binary newsgroups?

    Thanks, Eli.

    Storage-capacity-wise, I've got 20 years of the Big8 consuming ~750 GB. On a server with ZFS using CNFS buffers with INN, this compresses down to about 300 GB with default ZFS compression.
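
    For anyone unfamiliar with ZFS, that compression is a one-line dataset
    setting. A minimal sketch, assuming a hypothetical pool/dataset named
    tank/news that holds the spool:

        # enable transparent on-the-fly compression (lz4 is a common choice)
        zfs set compression=lz4 tank/news
        # check how well the stored articles actually compress
        zfs get compressratio tank/news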

    Cheers,
    Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Sat Mar 11 10:54:19 2023
    On 10.03.2023 at 22:05:00, Gary Goog wrote:

    On 10/03/2023 21:04, Eli wrote:
    I will soon be setting up a text-only news server for public
    access, but what is the minimum storage capacity my server needs
    for all non-binary newsgroups?


    Anything you can afford. Hard disk prices keep getting cheaper. Are
    you going to require registration, or will it be an open server like
    Paganini, aioe, and mixmin? Open servers are quite popular and you'll
    get more users.

    Make sure you don't filter or censor anything or block anybody on it, otherwise you will become a hate figure and a target for hackers.

    Such servers will likely be abused by trolls, name forgers and spammers.
    This causes many people to filter out any post coming from such a
    server.

    In the German de.* hierarchy, many people filtered out mixmin and aioe
    because they were abused by trolls.

    I recommend requiring registration and terminating accounts that
    abuse the server in that way.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Sat Mar 11 10:55:09 2023
    On 10.03.2023 at 22:18:25, Eli wrote:

    SSD (NVMe) disks are not that cheap, but are they necessary for a
    news server or are HDD disks fine?

    20 years ago Usenet was much more popular and SSDs didn't exist.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marco Moock@21:1/5 to All on Sat Mar 11 10:52:50 2023
    On 10.03.2023 at 21:04:24, Eli wrote:

    I will soon be setting up a text-only news server for public access,
    but what is the minimum storage capacity my server needs for all
    non-binary newsgroups?

    It mostly depends on how long you will keep old articles.

    Look at some statistics:
    https://www.eternal-september.org/stats/index.html

    Expect to need at least 50 MB per day.
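
    As a rough back-of-envelope check, assuming that daily rate holds:

        50 MB/day x 365 days = about 18 GB/year

    which is in line with the 20-30 GB per year figure mentioned earlier in
    this thread.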

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Timo@21:1/5 to All on Sun Mar 12 02:11:24 2023
    On 10.03.2023 at 22:04, Eli wrote:
    Hi,

    This question has probably been asked before, but I couldn't find it, so here it is.

    I will soon be setting up a text-only news server for public access, but what is the minimum storage capacity my server needs for all non-binary newsgroups?

    Thanks, Eli.

    Hi Eli,

    the storage requirement for a news server without binaries is rather small.

    If you set up a server based on Debian or Ubuntu, plan around 15-20 GB,
    because the log files will quickly fill up your disk if there are errors.

    For the pure data of the newsgroups, it depends on how long you want to
    keep the articles and how many newsgroups you want to provide.

    I keep a relatively large portfolio of groups on the server and accumulate
    about 40-45 GB per year (with a spam filter).

    If you have the possibility, take a good 500 GB HDD, then you have a lot
    of space and don't have to worry about it for the time being.

    I have had rather negative experiences with SSDs when operating a news
    server, as they age very quickly with the enormous number of write cycles.

    Greetings,

    --
    Timo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Timo on Sun Mar 12 00:20:01 2023
    On 3/11/23 6:11 PM, Timo wrote:
    If you set up a server based on Debian or Ubuntu, plan around 15-20 GB, because the log files will quickly fill up your disk if there are errors.

    I would *STRONGLY* suggest checking out logrotate or the like if
    you're not using it.
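
    A minimal sketch of such a logrotate rule, assuming INN's logs live
    under /var/log/news (adjust to your pathlog setting):

        /var/log/news/*.log {
            daily
            rotate 7
            compress
            missingok
            notifempty
        }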



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From San Kirtan Dass@21:1/5 to Grant Taylor on Sun Mar 12 04:24:34 2023
    On 3/12/23 01:20, Grant Taylor wrote:
    On 3/11/23 6:11 PM, Timo wrote:
    If you set up a server based on Debian or Ubuntu, plan around 15-20
    GB, because the log files will quickly fill up your disk if there are
    errors.

    I would *STRONGLY* suggest checking out log-rotate or the likes if
    you're not using it.

    I would rather just set up inotify scripts to truncate or delete log
    files to prevent them from filling up a lot of space.

    Does INN2 require any of the data in the log files for operation?

    Is it safe to delete the log files once they reach a certain size?

    What about truncating the log files to X lines every Y hours or when
    inotify reports a size limit?

    --

    San Kirtan Dass

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to San Kirtan Dass on Sun Mar 12 11:57:38 2023
    On 3/12/23 3:24 AM, San Kirtan Dass wrote:
    I would rather just set up inotify scripts to truncate or delete log
    files to prevent them from filling up a lot of space.

    Okay. I'm not sure why you would want to re-invent the wheel differently.

    Does INN2 require any of the data in the log files for operation?

    I don't think that /log/ files are required for operation. There are
    other files that grow that are used for operation; e.g. lists of
    messages that are waiting to be fed to peers.

    Is it safe to delete the log files once they reach a certain size?

    I /think/ so.

    You could try renaming them so that they aren't at the path & file name
    that INN is looking for and then HUP / restart INN and see if you have problems. It would be easy to put them back if you needed to.
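
    Something along these lines (untested sketch; the log path is an
    assumption, and "ctlinnd flushlogs" is the documented way to make INN
    close and reopen its logs):

        mv /var/log/news/news.notice /var/log/news/news.notice.old
        ctlinnd flushlogs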

    What about truncating the log files to X lines every Y hours or when
    inotify reports a size limit?

    Simply truncating files without doing anything else is likely to cause
    some corruption and / or uncontrolled disk consumption. You can reduce
    the size of the file on disk, but anything with an open file handle may
    not know that the file size has shrunk and may therefore do the wrong
    thing the next time it writes to the file.

    I'm curious why you want to go the inotify route as opposed to simply a
    cron job that periodically checks the size of file(s) and takes proper
    action if they are over a threshold (size and / or age).
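
    A cron-based sketch of that idea, assuming the logs live under
    /var/log/news and that anything over 100 MB warrants rotation:

        # hourly: if any log exceeds 100 MB, rotate the logs the INN way
        0 * * * * find /var/log/news -name '*.log' -size +100M | grep -q . && ctlinnd flushlogs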

    This type of script is -- as I understand it -- exactly what logrotate
    does and you can easily alter how frequently cron runs it.

    This is why I say that it feels like you're re-inventing the wheel.

    If you want to re-invent the wheel, by all means go ahead and do so. I
    just suggest you check out existing wheels, logrotate (et al.), /before/
    you re-invent a new wheel to see what they do and / or don't do that you
    want done.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to All on Sun Mar 12 18:24:31 2023
    On 10 Mar 2023 at 22:04:24 CET, "Eli" <eliistheman@gmail.com> wrote:

    Does INN automatically populate the database with all existing articles from a NEW peer, or only new articles that come in? If not, is there a way to download all existing articles from a (commercial) news server via INN?

    I know this is a lot of data to download :)

    Eli.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Eli on Sun Mar 12 16:30:42 2023
    On 3/12/23 12:24 PM, Eli wrote:
    Does INN automatically populate the database with all existing
    articles from a NEW peer, or only new articles that come in?

    INN, et al., only receive articles provided to them. They don't pull
    articles /themselves/. There are some tools to pull articles for this
    purpose. Some of these tools do come with INN.

    This is how all the different news servers I've messed with have behaved.

    If not, is there a way to download all existing articles from a
    (commercial) news server via INN?

    As said above, there are some tools that can be used to pull messages.
    I believe that `suck` is one such tool.

    I know this is a lot of data to download :)

    The download isn't the hard part. The hard part will be getting those
    messages into your local INN instance. You'll need to (temporarily)
    disable default protections which reject older articles.

    There are probably other ways to get older articles into INN, e.g.
    modifying the spool directly, but there be dragons.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue Mar 14 12:42:11 2023
    Hi San,

    If not, is there a way to download all existing articles from a
    (commercial) news server via INN?

    As said above, there are some tools that can be used to pull messages. I believe that `suck` is one such tool.

    Yes, suck (an external program) does the job.
    There's also pullnews, shipped with INN:
    https://www.eyrie.org/~eagle/software/inn/docs/pullnews.html



    The download isn't the hard part.  The hard part will be getting those messages into your local INN instance.  You'll need to (temporarily)
    disable default protections which reject older articles.

    Run these commands before starting the pull. The first one disables
    the rejection of old articles, and the other two disable spam & abuse
    filtering.

    ctlinnd param c 0
    ctlinnd perl n
    ctlinnd python n

    After pullnews or suck have completed, then re-activate these protections:

    ctlinnd param c 10
    ctlinnd perl y
    ctlinnd python y
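
    For a one-shot import, the whole dance can be scripted, e.g. (sketch;
    assumes ctlinnd and pullnews are in the news user's PATH and that a
    pullnews.marks file is already set up):

        ctlinnd param c 0 && ctlinnd perl n && ctlinnd python n
        pullnews -t 3
        ctlinnd param c 10 && ctlinnd perl y && ctlinnd python y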

    --
    Julien ÉLIE

    « Quousque tandem ? » (Cicéron)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue Mar 14 12:41:49 2023
    Hi Grant and San,

    What about truncating the log files to X lines every Y hours or when
    inotify reports a size limit?

    Simply truncating files without doing anything else is likely to cause
    some corruption and / or uncontrolled disk consumption.  You can reduce
    the size of the file on disk, but anything with an open file handle may
    not know that the file size has shrunk and may therefore do the wrong
    thing the next time it writes to the file.

    Indeed.
    FWIW, INN comes with a program doing that (scanlogs):
    https://www.eyrie.org/~eagle/software/inn/docs/scanlogs.html

    """
    scanlogs invokes "ctlinnd flushlogs" to close the news and error log
    files, rename them to add .old to the file names and open fresh news and
    error logs; the active file is also flushed to disk, along with the
    history database.

    By default, scanlogs rotates and cleans out the logs. It keeps up to
    logcycles old compressed log files in pathlog/OLD (the logcycles
    parameter can be set in inn.conf). scanlogs also keeps archives of the
    active file in this directory.
    """


    I'm curious why you want to go the inotify route as opposed to simply a
    cron job that periodically checks the size of file(s) and takes proper
    action if they are over a threshold (size and / or age).

    Isn't it enough to run "news.daily" every day out of cron?
    https://www.eyrie.org/~eagle/software/inn/docs/news.daily.html

    It performs log rotation (with scanlogs) amongst other things.
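
    For reference, a typical crontab entry for the news user looks roughly
    like this (the time and path are assumptions; the keywords are described
    in the news.daily man page):

        30 3 * * * /usr/lib/news/bin/news.daily expireover lowmark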

    I really doubt that INN log files will fill up a 1 TB disk in one day...
    but in case one wishes to check for that, I would then suggest running
    scanlogs when inotify or a dedicated cron job checking the available
    disk space reports that something should be done.

    --
    Julien ÉLIE

    « It's the drop that makes the amphora overflow! » (Assurancetourix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to All on Tue Mar 14 12:03:14 2023
    On 3/14/23 5:42 AM, Julien ÉLIE wrote:
    Run these commands before starting the pull. The first one disables the rejection of old articles, and the other two disable spam & abuse filtering.

        ctlinnd param c 0
        ctlinnd perl n
        ctlinnd python n

    After pullnews or suck have completed, then re-activate these protections:

        ctlinnd param c 10
        ctlinnd perl y
        ctlinnd python y

    Thank you for this information Julien. I'm copying it to my INN /
    Usenet tips & tricks collection.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 15 08:29:26 2023
    On 14 Mar 2023 at 12:42:11 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi San,

    If not, is there a way to download all existing articles from a
    (commercial) news server via INN?

    As said above, there are some tools that can be used to pull messages. I
    believe that `suck` is one such tool.

    Yes, suck (an external program) does the job.
    There's also pullnews, shipped with INN:
    https://www.eyrie.org/~eagle/software/inn/docs/pullnews.html

    Hi Julien,

    Pullnews is exactly what I was looking for and it works like a charm.
    Thank you very much for this.

    Can multiple pullnews instances be launched side by side?
    Or does this corrupt the INN databases?

    Just a quick question about the settings in expire.ctl.
    I never want the old messages from any newsgroup to be automatically deleted (expired), even if they are 20 years old.
    I have the 'groupbaseexpiry' on 'false' (or is 'true' better?).
    Is '0:1:99990:never' in expire.ctl the correct setting for this?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Kerr Avon@21:1/5 to Grant Taylor on Thu Mar 16 14:40:17 2023
    On Tue, 14 Mar 2023 12:03:14 -0600, Grant Taylor wrote:

    Thank you for this information Julien. I'm copying it to my INN /
    Usenet tips & tricks collection.

    I think I'd pay some good money or at least a few chocolate fish to read
    those notes Grant :)

    --
    Agency News | news.bbs.nz

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Kerr Avon on Wed Mar 15 21:53:18 2023
    On 3/15/23 7:40 PM, Kerr Avon wrote:
    I think I'd pay some good money or at least a few chocolate fish to
    read those notes Grant :)

    Chuckle.

    Most of the things in the folder are related to establishing new peers
    or Usenet software like INN & cleanfeed, my peering card, and the like.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Mar 16 09:21:11 2023
    Hi Eli,

    Can multiple pullnews instances be launched side by side?

    Yes, though you have to use a different set of newsgroups for each
    instance. Otherwise, they would do the same work and it wouldn't run much faster.

    For instance:

    pullnews -t 3 -c pullnews.marks1
    pullnews -t 3 -c pullnews.marks2
    ...


    with several groups in pullnews.marks1 and other groups in
    pullnews.marks2. And run 2 instances side by side.
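
    Each config file lists a server followed by its groups; for instance, a
    pullnews.marks1 along these lines (hypothetical host and groups; server
    lines start in column 1, group lines are indented):

        news.upstream.example
            news.software.nntp
            news.admin.net-abuse.usenet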


    Or does this corrupt the INN databases?

    No, it won't corrupt anything.


    Just a quick question about the settings in expire.ctl.
    I never want the old messages from any newsgroup to be automatically deleted (expired), even if they are 20 years old.
    I have the 'groupbaseexpiry' on 'false' (or is 'true' better?).
    Is '0:1:99990:never' in expire.ctl the correct setting for this?

    With groupbaseexpiry set to false, I would use:

    *:never:never:never

    If you use CNFS buffers, make sure you have enough allocated space for
    them (otherwise they will wrap, and articles will self-expire).
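
    Putting that together, a minimal keep-everything expire.ctl could look
    like this (the /remember/ line sets how many days history remembers
    Message-IDs; 11 is the value used in INN's sample configuration):

        /remember/:11
        *:never:never:never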

    --
    Julien ÉLIE

    « An election campaign: the art of winning the votes of the poor with
    the money of the rich by promising each of the two to protect them
    from the other. » (Oscar Ameringer)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Tom Furie on Thu Mar 16 08:51:20 2023
    Tom Furie <tom@furie.org.uk> writes:

    On 2023-03-16, Neodome Admin <admin@neodome.net> wrote:

    There are no meaningful text articles bigger than 64 KB. Actually,
    the maximum size is probably 32 KB or less.

    There are several regularly posted FAQs, etc, which are larger than
    that.

    Cheers,
    Tom

    I said "meaningful", Tom :-)

    Seriously, it's not 1995, 2001, or even 2008. No one reads those
    FAQs, at least on Usenet. We might pretend all we want, but that's just
    the way things are. Those FAQs are nothing more than regular spam in
    most of the newsgroups where they are posted. How many times have you
    visited a group and found nothing except Google Groups drug scams and
    those FAQs? Probably a lot of times, huh?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Neodome Admin on Thu Mar 16 09:14:47 2023
    On Mar 16, 2023 at 3:51:20 AM CDT, "Neodome Admin" <admin@neodome.net> wrote:

    I said "meaningful", Tom :-)

    Seriously, it's not 1995, 2001, or even 2008. No one reads those
    FAQs, at least on Usenet. We might pretend all we want, but that's just
    the way things are. Those FAQs are nothing more than regular spam in
    most of the newsgroups where they are posted. How many times have you
    visited a group and found nothing except Google Groups drug scams and
    those FAQs? Probably a lot of times, huh?

    I've found many of them useful over the years. If you're not being fed them, it seems difficult to judge their value.

    For people getting into retro computing (Atari, Amiga, etc.), some of those 700+KB FAQ articles are gold.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From DV@21:1/5 to All on Thu Mar 16 08:54:46 2023
    Neodome Admin wrote this:

    Seriously, it's not 1995, 2001, or even 2008. No one reads those
    FAQs.

    I do.

    --
    Denis

    News servers and web gateways: <http://usenet-fr.yakakwatik.org> Newsreaders: <http://usenet-fr.yakakwatik.org/lecteurs-de-news.html>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Eli on Thu Mar 16 08:24:06 2023
    Eli <eliistheman@gmail.com> writes:

    I was thinking to start with 2x 1.92 TB SSD or is that not enough for all non-binary groups?

    It will be more than enough.

    Just don't ask your peers for full-size articles even if they are posted
    to non-binary groups. You literally don't need them even if they are
    posted to text groups. There are no meaningful text articles bigger than
    64 KB. Actually, the maximum size is probably 32 KB or less.
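
    If you want to enforce that, inn.conf has a knob for it; a sketch,
    assuming a 64 KB cap:

        # in inn.conf: reject incoming articles larger than 64 KB (0 = no limit)
        maxartsize: 65536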

    Good luck.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Neodome Admin on Thu Mar 16 08:27:08 2023
    On 2023-03-16, Neodome Admin <admin@neodome.net> wrote:

    There are no meaningful text articles bigger than 64 KB. Actually,
    the maximum size is probably 32 KB or less.

    There are several regularly posted FAQs, etc, which are larger than
    that.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Jesse Rehmer on Thu Mar 16 09:43:56 2023
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

    For people getting into retro computing (Atari, Amiga, etc.), some of those 700+KB FAQ articles are gold.

    I don't doubt that. I doubt that regular posting of a 700+KB FAQ is doing
    any good. I doubt that anything in those FAQs is more useful than
    information that can be found with Google or DuckDuckGo. We're not
    living in the era of Altavista, after all. And if there is some kind of
    gem hidden there, one simply doesn't need to post it to a newsgroup
    regularly along with 700+KB of irrelevant text. Plus, I'm pretty sure that
    if there are any questions, one can just ask in a retro-computing group
    and expect an answer... unless that group is dead, of course.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Neodome Admin on Thu Mar 16 09:35:57 2023
    On 2023-03-16, Neodome Admin <admin@neodome.net> wrote:

    I look at the server stats and even though there is no open posting
    anymore I still see hundreds of people reading via my servers. And after
    all these years not a single one of them ever complained that they can't
    read some article. And it's not like I was running the server for a year
    or two.

    That's your server, run it however you like. The person you advised to
    limit article sizes to 64 or 32 KB might like to know that there are
    larger articles which may be considered of interest; that's their call
    to make.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From DV@21:1/5 to Neodome Admin on Thu Mar 16 09:39:32 2023
    Neodome Admin wrote:

    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one cares. I'm sorry guys. No one needs those FAQs.

    I say it again: I do. You should stop repeating that *no one* needs
    them, unless you think I don't exist.

    --
    Denis

    News servers and web gateways: <http://usenet-fr.yakakwatik.org> Newsreaders: <http://usenet-fr.yakakwatik.org/lecteurs-de-news.html>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Tom Furie on Thu Mar 16 10:08:13 2023
    Tom Furie <tom@furie.org.uk> writes:

    On 2023-03-16, Neodome Admin <admin@neodome.net> wrote:

    I look at the server stats and even though there is no open posting
    anymore I still see hundreds of people reading via my servers. And after
    all these years not a single one of them ever complained that they can't
    read some article. And it's not like I was running the server for a year
    or two.

    That's your server, run it however you like. The person you advised to limit article sizes to 64 or 32 KB might like to know that there are
    larger articles which may be considered of interest; that's their call
    to make.

    Absolutely. You are correct. However, that person was asking for advice
    on running a text-only Usenet server, and that's exactly what I
    provided. In my opinion, there are no larger articles which may be
    considered of interest, if text Usenet is the interest.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to dv@reply-to.not.invalid on Thu Mar 16 09:25:42 2023
    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one
    cares. I'm sorry guys. No one needs those FAQs.

    I look at the server stats and even though there is no open posting
    anymore I still see hundreds of people reading via my servers. And after
    all these years not a single one of them ever complained that they can't
    read some article. And it's not like I was running the server for a year
    or two.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to dv@reply-to.not.invalid on Thu Mar 16 10:19:20 2023
    DV <dv@reply-to.not.invalid> writes:

    Neodome Admin wrote:

    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one
    cares. I'm sorry guys. No one needs those FAQs.

    I say it again: I do. You should stop repeating that *no one* needs
    them, unless you think I don't exist.

    I don't think you don't exist. I think you belong to binary Usenet, and
    you're free to read and post anything you want as long as all parties
    involved agree on that.

    No, seriously, I have no problems with 700+KB posts. If you want, I can
    set up a script posting any 700+KB FAQ you want, to any newsgroup, using
    your name, as often as you want, and even more often. What do you say?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Richard Kettlewell@21:1/5 to Jesse Rehmer on Thu Mar 16 13:07:12 2023
    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    "Neodome Admin" <admin@neodome.net> wrote:
    I said "meaningful", Tom :-)

    Seriously, it's not 1995, 2001, or even 2008. No one reads those
    FAQs, at least on Usenet. We might pretend all we want, but that's just
    the way things are. Those FAQs are nothing more than regular spam in
    most of the newsgroups where they are posted. How many times have you
    visited a group and found nothing except Google Groups drug scams and
    those FAQs? Probably a lot of times, huh?

    I've found many of them useful over the years. If you're not being fed
    them, it seems difficult to judge their value.

    For people getting into retro computing (Atari, Amiga, etc.), some of
    those 700+KB FAQ articles are gold.

    How many of them contain any information that can’t be found on the web?

    --
    https://www.greenend.org.uk/rjk/

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to invalid@invalid.invalid on Thu Mar 16 13:57:52 2023
    On Mar 16, 2023 at 8:07:12 AM CDT, "Richard Kettlewell" <invalid@invalid.invalid> wrote:

    Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
    "Neodome Admin" <admin@neodome.net> wrote:
    I said "meaningful", Tom :-)

    Seriously, it's not 1995, 2001, or even 2008. No one reads those
    FAQs, at least on Usenet. We might pretend all we want, but that's just
    the way things are. Those FAQs are nothing more than regular spam in
    most of the newsgroups where they are posted. How many times have you
    visited a group and found nothing except Google Groups drug scams and
    those FAQs? Probably a lot of times, huh?

    I've found many of them useful over the years. If you're not being fed
    them, it seems difficult to judge their value.

    For people getting into retro computing (Atari, Amiga, etc.), some of
    those 700+KB FAQ articles are gold.

    How many of them contain any information that can’t be found on the web?

    The same can be said for all of Usenet if you take that stance.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Neodome Admin on Thu Mar 16 12:30:16 2023
    On 3/16/23 3:43 AM, Neodome Admin wrote:
    I don't doubt that.

    So you agree that the content of the articles does have some value to
    some people.

    I doubt that regular posting of a 700+KB FAQ is doing any good.

    What's your primary objection? The frequency or the size of the posts?

    I doubt that anything in those FAQs is more useful than information
    that can be found with Google or DuckDuckGo. We're not living in the
    era of Altavista, after all. And if there is some kind of gem hidden
    there, one simply doesn't need to post it to a newsgroup regularly
    along with 700+KB of irrelevant text.

    I think that there is some value in having some unrequested information
    put in front of you.

    I've seen many things that I didn't know that I wanted to know put in
    front of me.

    I've also been mildly interested in something and seen something new (to
    me) done with it that really piques my interest and causes me to actively
    investigate it.

    I believe there is some value in things being put in front of me for my perusal.

    Plus, I'm pretty sure that if there are any questions, one can just
    ask in a retro-computing group and expect an answer... unless that
    group is dead, of course.

    It's really hard to ask a question about something if you don't know
    that said something exists.

    I don't mind quarterly or even monthly posting of FAQs. I do have an
    objection to super large FAQs. -- I think I have my server configured
    to accept 128 kB articles.

    Even at 1 MB, such a post is only a few seconds' worth of audio / video,
    as -- purportedly -- admin@Neodome pointed out in a different message.
    These messages really are not much to worry about. -- My news server sees
    the equivalent of 50 or more such messages in traffic per day. So one of
    these per month, much less per quarter, is not even worth complaining
    about.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Neodome Admin on Thu Mar 16 12:21:52 2023
    On 3/16/23 4:19 AM, Neodome Admin wrote:
    I think you belong to binary Usenet, and you're free to read and post anything you want as long as all parties involved agree on that.

    Wait a minute.

    We're talking about a /text/ post consisting entirely of printable ASCII
    meant to be read by a human. That's very much /text/. It's not
    binary encoded as text.

    You are free to have your own opinion of the value of such FAQ posts.
    But the fact that you see no value in them doesn't make them any less of a text post.

    You are free to run your server however you want. But I think others
    should think long and hard before following the advice that you're posting.

    No, seriously, I have no problems with 700+KB posts. If you want, I can
    set up a script posting any 700+KB FAQ you want, to any newsgroup, using
    your name, as often as you want, and even more often. What do you say?

    Stop it.

    I know that you know that would be a form of abuse.

    I expect better than that from a fellow newsmaster.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ted Heise@21:1/5 to Grant Taylor on Thu Mar 16 18:51:12 2023
    On Thu, 16 Mar 2023 12:30:16 -0600,
    Grant Taylor <gtaylor@tnetconsulting.net> wrote:
    On 3/16/23 3:43 AM, Neodome Admin wrote:

    I doubt that anything in those FAQs is more useful than
    information that can be found with Google or DuckDuckGo. We're
    not living in the era of Altavista, after all. And if there is
    some kind of gem hidden there, one simply doesn't need to post
    it to a newsgroup regularly along with 700+KB of irrelevant text.

    I think that there is some value in having some unrequested
    information put in front of you.

    I've seen many things that I didn't know that I wanted to know
    put in front of me.

    I've also been mildly interested in something and seen
    something new (to me) done with it that really piques my
    interest and causes me to actively investigate it.

    I believe there is some value in things being put in front of
    me for my perusal.

    Plus, I'm pretty sure that if there are any questions, one can
    just ask in a retro-computing group and expect an
    answer... unless that group is dead, of course.

    It's really hard to ask a question about something if you don't
    know that said something exists.

    I don't mind quarterly or even monthly posting of FAQs. I do
    have an objection to super large FAQs. -- I think I have my
    server configured to accept 128 kB articles.

    Lots of good points, Grant. As usual. :)

    FWIW, I maintain the FAQ (for alt.recovery.aa). It's just under
    40k in size and posts from a cron job on the first day of each month.

    Also, on a weekly basis (except the first week of each month)
    another cron job posts a very short FAQ pointer, directing folks
    to where it can be found on the web. The pointer idea was gleaned
    from another group (alt.os.linux.slackware).
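
    In crontab terms that setup is roughly the following (sketch; the
    script names are hypothetical, and % must be escaped in crontab):

        # full FAQ at 00:05 on the 1st of each month
        5 0 1 * * $HOME/bin/post-faq
        # short pointer on Mondays, skipped during the first week of the month
        5 0 * * 1 [ "$(date +\%d)" -gt 7 ] && $HOME/bin/post-faq-pointer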

    --
    Ted Heise <theise@panix.com> West Lafayette, IN, USA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From llp@21:1/5 to All on Thu Mar 16 23:20:11 2023
    Neodome Admin <admin@neodome.net> wrote the following:

    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one
    cares. I'm sorry guys. No one needs those FAQs.

    I agree.
    And a lot of them are outdated.

    I look at the server stats and even though there is no open posting
    anymore I still see hundreds of people reading via my servers. And after
    all these years not a single one of them ever complained that they can't
    read some article. And it's not like I was running the server for a year
    or two.

    ;-)

    --

    New usenet server for fr.* and news.*
    news.usenet.ovh
    Feel free to contact us: contact@usenet.ovh

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From 😉 Good Guy 😉@21:1/5 to All on Thu Mar 16 23:00:00 2023
    This is a multi-part message in MIME format.
    The main message is in the html section of this post but you are not able to read it because you are using an unapproved news-client. Please try these links to amuse yourself:

    <https://i.imgur.com/Fk6rn62.png>
    <https://i.imgur.com/Mxpx9bh.png>
    <https://i.imgur.com/8y9HXmL.png>



    On 16/03/2023 22:20, llp wrote:

    Neodome Admin <admin@neodome.net> wrote the following:

    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one
    cares. I'm sorry guys. No one needs those FAQs.

    I agree.
    And a lot of them are outdated.

    FAQ's should be on the web and not posted to newsgroups to clog up
    bandwidth unnecessarily. Why can't people use common sense and realise
    that this is 2023, when most big providers will give you free hosting
    for static web pages? To give you an example, try GitHub Pages,
    CloudFlare Pages, Microsoft Azure, Google Cloud Platform, Netlify and
    others. I have used all of them and they are wonderful for getting
    started with web pages and having a basic presence on the web. You can
    attach your custom domain as well, so people like "Tim Skirvin
    <tskirvin@killfile.org>" and others can learn something even at the
    ripe age of 89.

    The FAQ's posted are so crammed that no one bothers to read them even
    if there is anything new in them. They should just post a link once a
    week, or whenever they are bored with their life and want to do
    something to troll.

    --
    https://contact.mainsite.tk

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Mar 17 14:22:15 2023
    On 16 Mar 2023 at 09:21:11 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Can multiple pullnews instances be launched side by side?

    Yes, though you have to use a different set of newsgroups for each
    instance. Otherwise, they would do the same thing and it won't run much faster.

    For instance:

    pullnews -t 3 -c pullnews.marks1
    pullnews -t 3 -c pullnews.marks2

    In the pullnews logs I see many of these lines:
    x DEBUGGING 55508 421
    x DEBUGGING 55509 421

    What does this mean and what causes it?
    After each such line it takes about 2 minutes until the next article is downloaded and this slows down the download enormously.

    I use debugging level 4.

    Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to All on Fri Mar 17 14:57:02 2023
    Is it possible in pullnews to pre-skip articles above a certain number of bytes, instead of downloading the whole article first?

    Maybe by making a small change in the perl script?

    Eli

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Mar 17 20:49:23 2023
    Hi Eli,

    Is it possible in pullnews to pre-skip articles above a certain number of bytes, instead of downloading the whole article first?

    Currently not.


    Maybe by making a small change in the perl script?

    I would suggest these additional lines:


    @@ -928,6 +928,13 @@
                     push @{$article}, "\n" if not $is_control_art;
                 }
             }
    +
    +        my $overview = $fromServer->xover($i);
    +        # Skip the article if its size is more than 100,000 bytes.
    +        if ($$overview{$i}[5] and $$overview{$i}[5] > 100000) {
    +            $skip_article = 1;
    +        }
    +
             if (not $skip_article
                 and (not $header_only or $is_control_art or $add_bytes_header))
             {


    Before downloading an article, we just retrieve its overview data,
    which contains the article's size (at index 5 of the returned array)
    amongst a few other fields.
    Of course, change 100,000 to fit your needs.

    I've quickly tested it, and I believe it works.

    I may add a dedicated option to pullnews and integrate it in a future
    release, if that may prove to be useful for others.

    --
    Julien ÉLIE

    « Caramel is a guest of the palate that threatens the crown. » (Tristan
    Bernard)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Mar 17 20:21:45 2023
    Hi Eli,

    pullnews -t 3 -c pullnews.marks1
    pullnews -t 3 -c pullnews.marks2

    In the pullnews logs I see many of these lines:
    x DEBUGGING 55508 421
    x DEBUGGING 55509 421

    What does this mean and what causes it?
    After each such line it takes about 2 minutes until the next article is downloaded and this slows down the download enormously.

    It means that article numbers 55508 and 55509 were not found on the
    server (x). My guess is that the connection has timed out (421 is a special internal code).

    Jesse reported a bug which sounds like that a few months ago.

    Could you please download the latest version of pullnews and try it?

    https://raw.githubusercontent.com/InterNetNews/inn/main/frontends/pullnews.in

    Just grab that file, rename it without .in, and change the first 2 lines
    to fit what your current pullnews script has (it is the path to Perl and
    the INN::Config module).
    Then you can run that script. It will work with your version of INN.
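
    Roughly (sketch; the installed path /usr/lib/news/bin/pullnews is an
    assumption, check where yours actually lives):

        wget https://raw.githubusercontent.com/InterNetNews/inn/main/frontends/pullnews.in
        mv pullnews.in pullnews
        # copy the first two lines (Perl path and INN::Config) from the installed script
        head -2 /usr/lib/news/bin/pullnews
        chmod +x pullnews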

    --
    Julien ÉLIE

    « An election campaign: the art of winning the votes of the poor with
    the money of the rich by promising each of the two to protect them
    from the other. » (Oscar Ameringer)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Mar 17 20:27:59 2023
    On Mar 17, 2023 at 2:49:23 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    I may add a dedicated option to pullnews and integrate it in a future release, if that may prove to be useful for others.

    Hi Julien,

    I would likely use it. :)

    -Jesse

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Mar 17 22:19:35 2023
    Hi Jesse,

    I may add a dedicated option to pullnews and integrate it in a future
    release, if that may prove to be useful for others.

    I would likely use it. :)

    :-)

    OK, I'll see how to properly implement it.
    The quick patch sends an "OVER n" command for each article number (#n). I plan
    on sending a global "OVER n-high" command to retrieve all the sizes at
    once for a given newsgroup. It will save time because fewer
    commands/answers will be sent.
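
    For the curious, this is roughly what that exchange looks like at the
    protocol level (illustrative session; the response fields are really
    TAB-separated, shown spaced here, and the byte count is the field that
    Net::NNTP's xover() returns at index 5):

        OVER 656-658
        224 Overview information follows
        656  Re: a subject  user@example.com  date  <sr92nq$1ie38$1@news.trigofacile.com>  references  1230  27
        .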

    --
    Julien ÉLIE

    « Sol attigit talos. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Mar 17 23:42:15 2023
    On 17 Mar 2023 at 20:21:45 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    In the pullnews logs I see many of these lines:
    x DEBUGGING 55508 421
    x DEBUGGING 55509 421

    What does this mean and what causes it?
    After each such line it takes about 2 minutes until the next article is
    downloaded and this slows down the download enormously.

    It means that article numbers 55508 and 55509 were not found on the
    server (x). My guess is that the connection has timed out (421 is a special internal code).

    Jesse reported a bug which sounds like that a few months ago.

    Could you please download the latest version of pullnews and try it?

    https://raw.githubusercontent.com/InterNetNews/inn/main/frontends/pullnews.in

    Just grab that file, rename it without .in, and change the first 2 lines
    to fit what your current pullnews script has (it is the path to Perl and
    the INN::Config module).
    Then you can run that script. It will work with your version of INN.

    Hello Julien,

    The latest version seems to have fixed it.

    Thank you

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Mar 17 23:44:16 2023
    On 17 Mar 2023 at 20:49:23 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Is it possible in pullnews to pre-skip articles above a certain number of
    bytes, instead of downloading the whole article first?

    Currently not.


    Maybe by making a small change in the perl script?

    I would suggest these additional lines:


    @@ -928,6 +928,13 @@
                     push @{$article}, "\n" if not $is_control_art;
                 }
             }
    +
    +        my $overview = $fromServer->xover($i);
    +        # Skip the article if its size is more than 100,000 bytes.
    +        if ($$overview{$i}[5] and $$overview{$i}[5] > 100000) {
    +            $skip_article = 1;
    +        }
    +
             if (not $skip_article
                 and (not $header_only or $is_control_art or $add_bytes_header))
             {

    I've quickly tested it, and I believe it works.

    I may add a dedicated option to pullnews and integrate it in a future release, if that may prove to be useful for others.

    Hello Julien,

    This also works perfectly.

    Thanks again :)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Sat Mar 18 08:18:39 2023
    Hi Eli,

    The latest version seems to have fixed it.

    Glad to hear it. Thanks for the confirmation that the fix works. (It has
    not been released yet; the INN 2.7.1 release is scheduled for April.)

    --
    Julien ÉLIE

    « Sol attigit talos. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Sun Mar 19 12:11:09 2023
    Hi Eli and Jesse,

    I may add a dedicated option to pullnews and integrate it in a future
    release, if that may prove to be useful for others.

    This also works perfectly.

    I've just committed a proper patch, which will be shipped with INN
    2.7.1. (You can grab it at the same URL as provided before.)


    -L size

    Specify the largest wanted article size in bytes. The default is to
    download all articles, whatever their size. When this option is
    used, pullnews will first retrieve the overview data (if available) of
    each newsgroup to process, so as to obtain article sizes before
    deciding which articles to actually download.



    % ./pullnews -d 1 -L 1000
    [...]

    . DEBUGGING 656 -- not downloading article <sr92nq$1ie38$1@news.trigofacile.com> which has 1230 bytes
    x DEBUGGING 657 -- article unavailable 423 No such article number 657
    . DEBUGGING 658 -- not downloading article <ssjtm9$6u9s$1@news.trigofacile.com> which has 1042 bytes


    And naturally, those articles are downloaded when a larger size is
    specified with the -L flag.

    Hope it suits your needs :-)

    --
    Julien ÉLIE

    « He had just enough learning to misquote. » (Byron)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sun Mar 19 14:53:05 2023
    On Mar 19, 2023 at 6:11:09 AM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli and Jesse,

    Hope it suits your needs :-)

    Thanks, Julien, I am giving it a shot, but I think I may have encountered a
    new bug.

    When running with -d 1, sometimes when I hit CTRL-C to stop the process it wipes out the pullnews.marks file. It does not do this every time; it seems to happen if I stop the process while it is retrieving overview information.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Sun Mar 19 16:40:08 2023
    Hi Jesse,

    When running with -d 1, sometimes when I hit CTRL-C to stop the process it wipes out the pullnews.marks file. It does not do this every time; it seems to happen if I stop the process while it is retrieving overview information.

    When you say "wipe out", does it mean you have an empty pullnews.marks
    file? Or a pullnews.marks file with wrong article numbers?

    Does it happen only with "-d 1"?
    I'm unsure what could cause that, as I've not changed the way the
    configuration file is handled :-/
    It is saved when pullnews receives a SIGINT (Ctrl+C for instance), and
    it writes the last article number processed.


    I've tried to reproduce it with "-d 1", but do not see anything
    suspicious in pullnews.marks. The last line in standard output is
    "Saving config" after Ctrl+C.

    --
    Julien ÉLIE

    « When a newly married couple smiles, everyone knows why. When a ten-
    year married couple smiles, everyone wonders why. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Mar 20 10:51:48 2023
    On 19 Mar 2023 at 12:11:09 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli and Jesse,

    I may add a dedicated option to pullnews and integrate it in a future
    release, if that may prove to be useful for others.

    I've just committed a proper patch, which will be shipped with INN
    2.7.1. (You can grab it at the same URL as provided before.)

    -L size

    Specify the largest wanted article size in bytes. The default is to
    download all articles, whatever their size. When this option is
    used, pullnews will first retrieve the overview data (if available) of
    each newsgroup to process, so as to obtain article sizes before
    deciding which articles to actually download.

    Thank you Julien and I will test the new version soon.

    However, I have my doubts about the fact that this new version first downloads the overview data of the entire newsgroup. That can take a while for a newsgroup
    with a few million messages. I don't know if I like that.

    But maybe I misunderstood you.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Eli on Mon Mar 20 11:12:11 2023
    On 2023-03-20, Eli <eliistheman@gmail.com> wrote:
    However, I have my doubts about the fact that this new version first downloads
    the overview data of the entire newsgroup. That can take a while for a newsgroup
    with a few million messages. I don't know if I like that.

    You need the overview to get the article size before downloading,
    whether that's all at once or per article. I imagine it's more efficient
    to get it all up front and then filter out the unwanted articles.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to Tom Furie on Mon Mar 20 13:06:41 2023
    On 20 Mar 2023 at 12:12:11 CET, "Tom Furie" <tom@furie.org.uk> wrote:

    On 2023-03-20, Eli <eliistheman@gmail.com> wrote:
    However, I have my doubts about the fact that this new version first downloads
    the overview data of the entire newsgroup. That can take a while for a
    newsgroup
    with a few million messages. I don't know if I like that.

    You need the overview to get the article size before downloading,
    whether that's all at once or per article. I imagine it's more efficient
    to get it all up front and then filter out the unwanted articles.

    Cheers,
    Tom

    It may be a bit more efficient, but I still see more disadvantages than advantages. For example, if I want to change the max. file size or the regex for option -m, that is no longer possible after the entire overview has already been downloaded and the filtering has already been processed.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Mar 20 16:02:15 2023
    On Mar 19, 2023 at 10:40:08 AM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Jesse,

    When running with -d 1, sometimes when I hit CTRL-C to stop the process it wipes out the pullnews.marks file. It does not do this every time; it seems to happen if I stop the process while it is retrieving overview information.

    When you say "wipe out", does it mean you have an empty pullnews.marks
    file? Or a pullnews.marks file with wrong article numbers?

    Does it happen only with "-d 1"?
    I'm unsure what could cause that, as I've not changed the way the configuration file is handled :-/
    It is saved when pullnews receives a SIGINT (Ctrl+C for instance), and
    it writes the last article number processed.


    I've tried to reproduce it with "-d 1", but do not see anything
    suspicious in pullnews.marks. The last line in standard output is
    "Saving config" after Ctrl+C.

    Here is what I'm seeing in a session that I did not kill, but was killed off after the upstream host cut me off for time limit:

    . DEBUGGING 361445 -- not downloading already existing message <45af67cb$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361449 -- not downloading already existing message <45af6a64$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361451 -- not downloading already existing message <xn0f1cwlp7d8yw000IdSub@news.individual.net> code=223

    Transfer to server failed (436): Flushing log and syslog files

    When I start the command again:

    [news@spool1 ~]$ pullnews -d 1 -O -c pullnews4.marks -L 200000 -t 3 -G it.sport.calcio,it.sport.calcio.estero,it.sport.calcio.fiorentina,it.sport.calcio.genoa,it.sport.calcio.inter
    Mon Mar 20 11:00:14 2023 start

    No servers!

    [news@spool1 ~]$ cat pullnews4.marks
    [news@spool1 ~]$

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Mon Mar 20 20:07:23 2023
    Hi Jesse,
    Here is what I'm seeing in a session that I did not kill, but was killed off after the upstream host cut me off for time limit:

    . DEBUGGING 361445 -- not downloading already existing message <45af67cb$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361449 -- not downloading already existing message <45af6a64$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361451 -- not downloading already existing message <xn0f1cwlp7d8yw000IdSub@news.individual.net> code=223

    Transfer to server failed (436): Flushing log and syslog files

    Hmm, this log line does not correspond to a time limit enforced by the
    upstream host. It is generated by the downstream server to which you
    are sending articles. The "Flushing log and syslog files" message
    appears during log rotation (INN is paused a very short moment).


    When I start the command again:

    [news@spool1 ~]$ pullnews -d 1 -O -c pullnews4.marks -L 200000 -t 3 -G it.sport.calcio,it.sport.calcio.estero,it.sport.calcio.fiorentina,it.sport.calcio.genoa,it.sport.calcio.inter
    Mon Mar 20 11:00:14 2023 start

    No servers!

    [news@spool1 ~]$ cat pullnews4.marks
    [news@spool1 ~]$

    Gosh!

    Don't you have anything else after "Transfer to server failed (436):
    Flushing log and syslog files"?
    No "can't open pullnews4.marks" error?

    I'm a bit surprised, the configuration file is saved this way:

    open(FILE, ">$groupFile") || die "can't open $groupFile: $!\n";
    print LOG "\nSaving config\n" unless $quiet;
    print FILE "# Format: (date is epoch seconds)\n";
    print FILE "# hostname[:port][_tlsmode] [username password]\n";
    print FILE "# group date high\n";
    foreach $server ( ... )
    print [...]

    close FILE;


    You don't even have the "Saving config" debug line in your console, nor
    the 3 initial # lines written in the new pullnews4.marks file...
    Sounds like open() failed, or close() failed...

    Could you try adding an explicit error message?

    close(FILE) or die "can't close $groupFile: $!\n";



    Don't you have in mind anything that could explain why the file couldn't
    be written? (lack of disk space, wrong permissions on the file because pullnews was not started with the right user, etc.)

    --
    Julien ÉLIE

    « Have you noticed that, at the table, the dishes you are served put
    the words into your mouth? » (Raymond Devos)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Mon Mar 20 19:38:39 2023
    Hi Eli,

    However, I have my doubts about the fact that this new version first downloads
    the overview data of the entire newsgroup. That can take a while for a
    newsgroup with a few million messages. I don't know if I like that.

    You need the overview to get the article size before downloading,
    whether that's all at once or per article. I imagine it's more efficient
    to get it all up front then filter out the unwanted articles.

    Indeed, downloading the overview in a single command is overall faster
    than article by article.

    Suppose the group contains article numbers 1 to 1,000,000 and the last
    time pullnews ran, it retrieved article 800,000.
    Then on a new run, it will first ask for the overview of articles 800,001 to 1,000,000 in a single command, and it will get a single (long) answer.
    Then pullnews will actually download only the articles known to be smaller
    than the maximum size wanted.
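
    Schematically, the exchange with the upstream server looks like this (an
    illustrative session; "example.group" and the numbers are made up, and
    older servers use XOVER instead of OVER):

    GROUP example.group
    211 1000000 1 1000000 example.group
    OVER 800001-1000000
    224 Overview information follows
    (one tab-separated overview line per article; its byte-count field is
    what -L is compared against)
    .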

    Otherwise, the easiest way would be to retrieve overview data article
    by article. It will take more time overall, but I agree the user
    experience is better as he does not have the feeling of a "hang" during
    the download of the whole overview.
    I could tweak it to download in batches of 100 articles for instance,
    but it's more work to do :( I may have to do so eventually...



    It may be a bit more efficient, but I still see more disadvantages than advantages. For example, if I want to change the max. file size or want to change the regex for option -m, this is no longer possible after the entire overview has already been downloaded and the filtering has already been processed.

    No, it will still be possible.
    Overview data is downloaded each time you run pullnews. It does not
    save it for later re-use if you "reset" the newsgroup in pullnews.marks.
    It downloads overview data between the last retrieved article and the
    last existing article in the newsgroup.

    --
    Julien ÉLIE

    « Give laugh to all but smile to one,
    Give cheeks to all but lips to one,
    Give love to all but Heart to one,
    Let everybody love you
    But you love one. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Mar 20 18:58:59 2023
    On Mar 20, 2023 at 1:38:39 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    However, I have my doubts about the fact that this new version first downloads
    the overview data of the entire newsgroup. That can take a while for a
    newsgroup with a few million messages. I don't know if I like that.

    You need the overview to get the article size before downloading,
    whether that's all at once or per article. I imagine it's more efficient
    to get it all up front then filter out the unwanted articles.

    Indeed, downloading the overview in a unique command is globally faster
    than article by article.

    Suppose the group contains article numbers 1 to 1,000,000 and the last
    time pullnews ran, it retrieved article 800,000.
    Then on a new run, it will first ask the overview of articles 800,001 to 1,000,000 in a unique command, and it will get a unique (long) answer.
    Then pullnews will actually download only articles known to be smaller
    than the maximum size wanted.

    Otherwise, the easiest way would be to retrieve overview data, article
    by article. It will take globally more time, but I agree the user
    experience is better as he does not have the feeling of a "hang" during
    the download of the whole overview.
    I could tweak it to download by bunches of 100 articles for instance,
    but it's more work to do :( I may have to do so finally...

    Maybe it would be sufficient to print a message indicating it is retrieving overview data to inform the user of what is happening to account for the
    pause?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Mar 20 19:37:09 2023
    On Mar 20, 2023 at 2:07:23 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Jesse,
    Here is what I'm seeing in a session that I did not kill, but was killed off
    after the upstream host cut me off for time limit:

    . DEBUGGING 361445 -- not downloading already existing message
    <45af67cb$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361449 -- not downloading already existing message
    <45af6a64$0$20803$5fc30a8@news.tiscali.it> code=223
    . DEBUGGING 361451 -- not downloading already existing message
    <xn0f1cwlp7d8yw000IdSub@news.individual.net> code=223

    Transfer to server failed (436): Flushing log and syslog files

    Hmm, this log line does not correspond to a time limit enforced by the upstream host. It is generated by the downstream server to which you
    are sending articles. The "Flushing log and syslog files" message
    appears during log rotation (INN is paused a very short moment).

    You are right, I have too many screen sessions running pullnews and mixed two up.

    When I start the command again:

    [news@spool1 ~]$ pullnews -d 1 -O -c pullnews4.marks -L 200000 -t 3 -G
    it.sport.calcio,it.sport.calcio.estero,it.sport.calcio.fiorentina,it.sport.calcio.genoa,it.sport.calcio.inter
    Mon Mar 20 11:00:14 2023 start

    No servers!

    [news@spool1 ~]$ cat pullnews4.marks
    [news@spool1 ~]$

    Gosh!

    Don't you have anything else after "Transfer to server failed (436):
    Flushing log and syslog files"?
    No "can't open pullnews4.marks" error?

    The output I provided was a copy/paste from the end of the session and the commands I ran after.


    I'm a bit surprised, the configuration file is saved this way:

    open(FILE, ">$groupFile") || die "can't open $groupFile: $!\n";
    print LOG "\nSaving config\n" unless $quiet;
    print FILE "# Format: (date is epoch seconds)\n";
    print FILE "# hostname[:port][_tlsmode] [username password]\n";
    print FILE "# group date high\n";
    foreach $server ( ... )
    print [...]

    close FILE;


    You don't even have the "Saving config" debug line in your console, nor
    the 3 initial # lines written in the new pullnews4.marks file...
    Sounds like open() failed, or close() failed...

    Could you try to add an explicit message error ?

    close(FILE) or die "can't close $groupFile: $!\n";

    Sure, may be a few days before I reply back with results!

    Don't you have in mind anything that could explain why the file couldn't
    be written? (lack of disk space, wrong permissions on the file because pullnews was not started with the right user, etc.)

    Definitely no reason, plenty of space, no permission issues, I'm running pullnews as the "news" user.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Mon Mar 20 22:11:55 2023
    Hi Eli and Jesse,

    I could tweak it to download by bunches of 100 articles for instance,
    but it's more work to do :( I may have to do so finally...

    Maybe it would be sufficient to print a message indicating it is retrieving overview data to inform the user of what is happening to account for the pause?

    It was easier to change than I thought.
    Overview data is now retrieved in chunks of "progress width" articles
    at a time. This corresponds to the value of -C (50 by default),
    that is to say the number of "x", "." and similar characters shown on one
    progress line.
    This way, downloading overview data does not take much time and is
    almost unnoticeable by the user.
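
    So, with flags like the ones used earlier in this thread, an invocation
    such as

    pullnews -C 100 -L 200000 -G it.sport.calcio

    now fetches overview data in bunches of 100 article numbers at a time
    (the command is only an illustration).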

    This improved version of pullnews is available at the same URL as
    before. Thanks again for the feedback.

    --
    Julien ÉLIE

    « Sit next to a pretty girl for an hour and it feels like a minute;
    sit on a hot stove for a minute and it feels like an hour:
    that is relativity. » (Einstein)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to Eli on Tue Mar 21 14:06:50 2023
    On 21 Mar 2023 at 14:49:16 CET, "Eli" <eliistheman@gmail.com> wrote:

    Hello Julien,

    In some newsgroups I get the following error while using pullnews:

    DEBUGGING 560 Post 436: Msg: <Can't store article>

    Then pullnews quits.
    Can this be avoided, as it is very annoying?

    In the latest version of pullnews (the one from the link you posted earlier)
    it quits with the error:

    Transfer to server failed (436): Can't store article

    It seems that this only happens with some old posted articles. But still very annoying.

    The new pullnews version is working great BTW. It is much faster than the current one.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to All on Tue Mar 21 13:49:16 2023
    Hello Julien,

    In some newsgroups I get the following error while using pullnews:

    DEBUGGING 560 Post 436: Msg: <Can't store article>

    Then pullnews quits.
    Can this be avoided, as it is very annoying?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Tue Mar 21 19:41:22 2023
    On 21 Mar 2023 at 20:18:32 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    In some newsgroups I get the following error while using pullnews:

    DEBUGGING 560 Post 436: Msg: <Can't store article>

    Then pullnews quits.
    Can this be avoided as it is very annoying.

    Do you happen to have other logs in <pathlog>/news.err or news.notice?
    It would be useful to understand why innd did not manage to store the
    article provided by pullnews.

    I don't see any reports in the other logs.

    It is an unusual error. Do all the
    newsgroups match an entry in storage.conf?

    Yes, I think this is enough and all I have:

    method tradspool {
        newsgroups: *
        class: 0
    }


    In the latest version of pullnews (the one from the link you posted earlier)
    it quits with the error:

    Transfer to server failed (436): Can't store article

    Didn't previous versions of pullnews report the same error?

    Yes it did.

    It seems that this only happens with some old posted articles. But still very
    annoying.

    Only old posts in some newsgroups? Do they have something special?
    (article number > 2^31, unusual headers, etc.)

    Here are 3 that I have on hand at the moment:

    <42258b3e_2@127.0.0.1>
    <Xns9735CDDF6F44CYouandmeherecom@216.113.192.29>
    <1186867877_2111@sp6iad.superfeed.net>

    If you cannot access the articles then let me know and I'll post the headers here.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue Mar 21 20:18:32 2023
    Hi Eli,

    In some newsgroups I get the following error while using pullnews:

    DEBUGGING 560 Post 436: Msg: <Can't store article>

    Then pullnews quits.
    Can this be avoided as it is very annoying.

    Do you happen to have other logs in <pathlog>/news.err or news.notice?
    It would be useful to understand why innd did not manage to store the
    article provided by pullnews. It is an unusual error. Do all the
    newsgroups match an entry in storage.conf?


    In the latest version of pullnews (the one from the link you posted earlier) it quits with the error:

    Transfer to server failed (436): Can't store article

    Didn't previous versions of pullnews report the same error?


    It seems that this only happens with some old posted articles. But still very annoying.

    Only old posts in some newsgroups? Do they have something special?
    (article number > 2^31, unusual headers, etc.)

    --
    Julien ÉLIE

    « It is only the first step that is difficult. » (Mme du Deffand)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue Mar 21 22:04:47 2023
    Hi Eli,

    Here are 3 that I have on hand at the moment:

    <42258b3e_2@127.0.0.1>
    <Xns9735CDDF6F44CYouandmeherecom@216.113.192.29>
    <1186867877_2111@sp6iad.superfeed.net>

    If you cannot access the articles then let me know and I'll post the headers here.

    I've tried to inject the first one on my news server, and do not see any problem... I don't know why it cannot be stored on yours.
    (I've only added "trigofacile.test" to the list of newsgroups as I do
    not carry alt.*)

    235 Article transferred OK


    X-Proxy-User: $$ch0zr$fsnj
    Newsgroups: alt.2600.qna,alt.2600.warez,alt.2600.414,alt.2600a,alt.2600hz,alt.266,alt.2d,alt.2eggs.sausage.beans.tomatoes.2toast.largetea.cheerslove,alt,alt.3d,trigofacile.test
    Subject: An Easier Way To Make Money
    From: mytest@usenet.com
    Date: Wed, 2 Mar 2005 09:49:51 GMT
    X-Newsreader: News Rover 10.2.0 (http://www.NewsRover.com)
    Message-ID: <42258b3e_2@127.0.0.1e>
    Lines: 140
    X-Comments: This message was posted through <A href
    X-Comments2: IMPORTANT: Newsfeeds.com does not condone,
    X-Report: Please report illegal or inappropriate use to
    Organization: Newsfeeds.com http://www.newsfeeds.com 100,000+ UNCENSORED Newsgroups.
    Path: news.trigofacile.com!news-out.spamkiller.net!spool9-east!not-for-mail
    Xref: news.trigofacile.com trigofacile.test:713

    ...

    --
    Julien ÉLIE

    « The more RAM a computer has, the faster it can generate an error
    message. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Tue Mar 21 21:42:29 2023
    On 21 Mar 2023 at 22:04:47 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    I've tried to inject the first one on my news server, and do not see any problem... I don't know why it cannot be stored on yours.
    (I've only added "trigofacile.test" to the list of newsgroups as I do
    not carry alt.*)

    235 Article transferred OK

    I'll keep looking for a cause.
    Thank you very much for your time. I really appreciate it.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 07:47:21 2023
    Hi Eli,

    Transfer to server failed (436): Can't store article

    I'll keep looking for a cause.

    As it seems you are using the tradspool storage system, could you please
    try:
    scanspool -n -v

    Though probably not related to overview, could you also try:
    tdx-util -A
    (if you're using tradindexed)

    --
    Julien ÉLIE

    « Those are your onions, not mine! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Grant Taylor on Wed Mar 22 06:19:30 2023
    Grant Taylor <gtaylor@tnetconsulting.net> writes:

    On 3/16/23 4:19 AM, Neodome Admin wrote:
    I think you belong to binary Usenet, and you're free to read and
    post anything you want as long as all parties involved agree on
    that.

    Wait a minute.

    We're talking about a /text/ post consisting of entirely printable
    ASCII meant to be read by a human. That's very much so /text/. It's
    not binary encoded in text.

    They are not posts *created* by humans, and this is my problem with
    them. Of course, if we try to be completely logical about this, there
    can be posts created by humans with binary files attached, etc., and no
    one cares about those.

    No, seriously, I have no problems with 700+KB posts. If you want, I can
    set up a script posting any 700+KB FAQ you want, to any newsgroup, using
    your name, as often as you want, and even more often. What do you say?

    Stop it.

    I know that you know that would be a form of abuse.

    You are correct, Grant. It was sarcasm.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Neodome Admin@21:1/5 to Grant Taylor on Wed Mar 22 07:18:29 2023
    Grant Taylor <gtaylor@tnetconsulting.net> writes:

    On 3/16/23 3:43 AM, Neodome Admin wrote:
    I don't doubt that.

    So you agree that the content of the articles does have some value to
    some people.

    Same as binary MIME attachments to legit Usenet messages written by real people. They have some value for me if they add to the conversation. Is
    there really a reason to avoid them now when I literally use more memory
    on my 256 GB iPhone to store pictures of random dogs and cats than I use
    on my server to store 2 years of unfiltered text Usenet? By unfiltered I
    mean completely unfiltered, all Google Groups spam and other junk
    included.

    I just find it technically much simpler to differentiate by the article
    size. Bigger than some value - binary. Smaller - text Usenet.

    Thus my advice.

    I doubt that regular posting of 700+KB FAQ is doing any good.

    What's your primary objection? The frequency or the size of the posts?

    I doubt that anything in those FAQs is more useful than information
    that can be found with Google or DuckDuckGo. We're not living in an
    era of Altavista, after all. And if there is some kind of gem hidden
    there, one simply doesn't need to post it to a newsgroup regularly with
    700+KB of irrelevant text.

    I think that there is some value in having some unrequested
    information put in front of you.

    I've seen many things that I didn't know that I wanted to know put in
    front of me.

    I've also been mildly interested in something and seen something new
    (to me) done with it that really piques my interest and causes me to
    actively investigate it.

    I believe there is some value in things being put in front of me for
    my perusal.

    FAQs are a little bit of a different story than other messages. Like I said, my
    main problem with them is that they're not written by people, and
    thus I don't see the need to treat them any differently than spam and
    binaries. After all, all those binary messages can also be useful for
    someone; maybe an even bigger number of people will find them more useful
    compared to FAQs.

    I think that legit text conversations in binary newsgroups bring more to
    Usenet as a communication platform than bi-weekly FAQs in dead text
    newsgroups; thus they are the ones that deserve to be preserved for
    future readers. BTW, currently it's not being done by text Usenet
    servers.

    Plus, I'm pretty sure that if there are any questions, one can just
    ask a question in a retro-computing group and expect an
    answer... unless that group is dead, of course.

    It's really hard to ask a question about something if you don't know
    that said something exists.

    I don't mind quarterly or even monthly posting of FAQs. I do have an objection to super large FAQs. -- I think I have my server
    configured to accept 128 kB articles.

    Even at 1 MB, this is only a few seconds worth of audio / video as -- purportedly -- admin@Neodome pointed out in a different message.
    These messages really are not much to sneeze at. -- My news server
    sees 50 or more of these messages worth of traffic per day. So, one
    of these per month, much less quarter, not even worth complaining
    about.

    You are correct. If there are FAQs bigger than 64 kB, the amount of data
    they consume is minuscule compared even to the Google Groups
    spam. Actually, thinking of it, I might receive them anyway from one of
    the peers who set their newsfeeds incorrectly, and probably still didn't
    fix it. I just never complained about it because it's not a problem from a technical point of view.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 11:54:40 2023
    On 22 Mar 2023 at 07:47:21 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Transfer to server failed (436): Can't store article

    I'll keep looking for a cause.

    As it seems you are using the tradspool storage system, could you please
    try:
    scanspool -n -v

    Though probably not related to overview, could you also try:
    tdx-util -A
    (if you're using tradindexed)

    Since these commands take quite a long time, I will wait until all pullnews sessions are done and then let you know.

    Something else:
    How can I reset a newsgroup that has already been fully downloaded so that pullnews starts downloading all posts again?

    Can this be done by:
    1) 'ctlinnd rmgroup newsgroup'
    2) 'ctlinnd newgroup newsgroup'

    or is there a better way?

    Thanks again, and apologies for all my questions.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Eli on Wed Mar 22 15:23:25 2023
    On Mar 22, 2023 at 6:54:40 AM CDT, "Eli" <eliistheman@gmail.com> wrote:

    On 22 Mar 2023 at 07:47:21 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Transfer to server failed (436): Can't store article

    I'll keep looking for a cause.

    As it seems you are using the tradspool storage system, could you please
    try:
    scanspool -n -v

    Though probably not related to overview, could you also try:
    tdx-util -A
    (if you're using tradindexed)

    Since these commands take quite a long time, I will wait with this until all pullnews sessions are done and let you know.

    Something else:
    How can I reset a newsgroup that has already been fully downloaded so that pullnews starts downloading all posts again?

    Can this be done by:
    1) 'ctlinnd rmgroup newsgroup'
    2) 'ctlinnd newgroup newsgroup'

    or is there a better way?

    Thank again and apologies for all my questions.

    If you've already made a full pass over the group with pullnews and want to make another full pass, I think the easiest is to modify the pullnews.marks count for that group and set it to 1. That should cause pullnews to start from the beginning.
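
    For example, if the line for this group in pullnews.marks reads (numbers
    are placeholders; the layout is the "# group date high" format quoted
    earlier in this thread):

    news.software.nntp 1679500000 223456

    then setting the last field ("high") to 1 makes the next run start over:

    news.software.nntp 1679500000 1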

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Neodome Admin on Wed Mar 22 11:57:55 2023
    On 3/22/23 12:19 AM, Neodome Admin wrote:
    They are not posts *created* by humans, and this is my problem with
    them.

    Okay. I think that's a reasonable differentiation. I also think that
    it's one that's harder to programmatically determine.

    Of course, if we try to be completely logical about this, there
    can be posts created by humans with binary files attached, etc.,
    and no one cares about those.

    ACK

    Hence the programmatically comment. ;-)

    You are correct, Grant. It was sarcasm.

    Ah. That obviously didn't come across. -- I've been dealing with
    recruiters whose intelligence / willingness to exert effort seems to
    range between that of a slug and a track superstar. My calibration is
    out of whack at the moment.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Neodome Admin on Wed Mar 22 12:09:28 2023
    On 3/22/23 1:18 AM, Neodome Admin wrote:
    Same as binary MIME attachments to legit Usenet messages written
    by real people. They have some value for me if they add to the
    conversation.

    Fair.

    Is there really a reason to avoid them now when I literally use more
    memory on my 256 GB iPhone to store pictures of random dogs and cats
    than I use on my server to store 2 years of unfiltered text Usenet? By unfiltered I mean completely unfiltered, all Google Groups spam and
    other junk included.
    ACK

    I just find it technically much simpler to differentiate by the
    article size. Bigger than some value - binary. Smaller - text Usenet.

    I agree that the size is a likely indicator of binary or not.

    Though, I wonder if we are now in the day & age that we could create
    filters that either:

    - detect multiple strings of text with white space between them, thus
    words.
    - detect the standard encoding methods; e.g. 76 x [A-Za-z0-9+/=] for
    base64 (see the sketch below).
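
    A minimal sketch of the second idea in Perl (illustrative only; it
    assumes @body holds the article's body lines, and the 60-76 length
    window and 0.7 threshold are arbitrary choices):

    # Count body lines that look like base64-encoded chunks: up to 76
    # characters from the base64 alphabet, optionally '='-padded.
    my ($lines, $b64) = (0, 0);
    for my $line (@body) {
        $lines++;
        $b64++ if $line =~ m{^[A-Za-z0-9+/]{60,76}={0,2}$};
    }
    # Call the article binary when most of its lines look encoded.
    my $is_binary = $lines && $b64 / $lines > 0.7;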

    Thus my advice.

    Fair enough. Your server, your rules.

    FAQs are a little bit of a different story than other messages. Like I said,
    my main problem with them is that they're not written by people,
    and thus I don't see the need to treat them any differently than spam
    and binaries. After all, all those binary messages can also be useful
    for someone; maybe an even bigger number of people will find them more
    useful compared to FAQs.

    I understand what you're saying.

    But is there anything to differentiate the FAQs posted by automation and
    an FAQ copied from a template and pasted into the news reader by a
    human? ;-)

    The original text was almost certainly written by a human. Even if the
    current form it is in is an amalgamation of copy & paste et al.

    I think that legit text conversations in binary newsgroups bring more
    to the Usenet as communication platform than bi-weekly FAQs in dead
    text newsgroups, thus they are the ones that deserve to be preserved
    for the future readers.

    I can agree with that.

    BTW, currently it's not being done by text Usenet servers.

    Agreed.

    I suspect that's based on older methods of identifying / handling binary attachments.

    You are correct. If there are FAQs bigger than 64 kB, the amount of
    data they consume is minuscule compared even to the Google Groups
    spam. Actually, thinking of it, I might receive them anyway from one
    of the peers who set their newsfeeds incorrectly, and probably still
    didn't fix it. I just never complained about it because it's not a
    problem from a technical point of view.

    We all have things that we could improve on. I choose to focus on the
    things with bigger impact.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 18:56:56 2023
    On 22 Mar 2023 at 19:38:07 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Something else:
    How can I reset a newsgroup that has already been fully downloaded so that
    pullnews starts downloading all posts again?

    Can this be done by:
    1) 'ctlinnd rmgroup newgroup'
    2) 'ctlinnd newgroupgroup'

    or is there a better way?

    +1 for Jesse's way.

    I have a question about these ctlinnd rmgroup/newgroup commands. Do you happen to have already used them to "reset" a newsgroup?
    It would explain the "Can't store" errors if you also did not purge the tradspool files in <pathspool> for some newsgroups. Files named with
    article numbers "1", "2", "3", etc. will still be present in your spool.
    If you recreate a newsgroup with ctlinnd rmgroup/newgroup, it just
    recreates it in the active file, without wiping the spool. Article
    numbering is reset to 1, and INN will try to store articles in already existing "1", "2", etc. files.

    Hi Julien,

    No, unfortunately I didn't.
    I have not deleted or reset anything at all.
    The problem still occurs intermittently and I don't understand why.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to jesse.rehmer@blueworldhosting.com on Wed Mar 22 18:54:08 2023
    On 22 Mar 2023 at 16:23:25 CET, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    On Mar 22, 2023 at 6:54:40 AM CDT, "Eli" <eliistheman@gmail.com> wrote:

    On 22 Mar 2023 at 07:47:21 CET, "Julien ÉLIE"
    <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Transfer to server failed (436): Can't store article

    I'll keep looking for a cause.

    As it seems you are using the tradspool storage system, could you please
    try:
    scanspool -n -v

    Though probably not related to overview, could you also try:
    tdx-util -A
    (if you're using tradindexed)

    Since these commands take quite a long time, I will wait with this until all
    pullnews sessions are done and let you know.

    Something else:
    How can I reset a newsgroup that has already been fully downloaded so that
    pullnews starts downloading all posts again?

    Can this be done by:
    1) 'ctlinnd rmgroup newsgroup'
    2) 'ctlinnd newgroup newsgroup'

    or is there a better way?

    Thank again and apologies for all my questions.

    If you've already made a full pass over the group with pullnews and want to make another full pass, I think the easiest is to modify the pullnews.marks counts for that group and set to 1. That should cause pullnews to start from the beginning.

    Hi Jesse,

    That's what I thought at first too, but this prevents all existing files in
    the spool from being downloaded again and all messages are treated as
    '-- not downloading already existing message'.

    My question is therefore how you can completely reset a newsgroup so that everything is downloaded again.

    This is in particular for the newsgroup 'news.lists.filters'. This group contains the references to the 'spam' messages that NoCeM then deletes. I want to reset this newsgroup 'news.lists.filters' so that all messages are checked locally again and, in case of spam, removed.

    But besides this newsgroup I also want to reset other newsgroups.
    I hope this is possible.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to All on Wed Mar 22 18:40:34 2023
    On Mar 22, 2023 at 1:09:28 PM CDT, "Grant Taylor" <gtaylor@tnetconsulting.net> wrote:

    Though, I wonder if we are now in the day & age that we could create
    filters that either:

    - detect multiple strings of text with white space between them, thus words.
    - detect the standard encoding methods; e.g. 76 x [A-Za-z0-9+/=] for
    base64

    Diablo has this article type detection built in and allows you to filter based on types in newsfeed definitions. Cleanfeed and pyClean do the same for INN. It's not perfect, but pretty damn effective.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 19:38:07 2023
    Hi Eli,

    Something else:
    How can I reset a newsgroup that has already been fully downloaded so that pullnews starts downloading all posts again?

    Can this be done by:
    1) 'ctlinnd rmgroup newsgroup'
    2) 'ctlinnd newgroup newsgroup'

    or is there a better way?

    +1 for Jesse's way.

    I have a question about these ctlinnd rmgroup/newgroup commands. Do you
    happen to have already used them to "reset" a newsgroup?
    It would explain the "Can't store" errors if you also did not purge the tradspool files in <pathspool> for some newsgroups. Files named with
    article numbers "1", "2", "3", etc. will still be present in your spool.
    If you recreate a newsgroup with ctlinnd rmgroup/newgroup, it just
    recreates it in the active file, without wiping the spool. Article
    numbering is reset to 1, and INN will try to store articles in already
    existing "1", "2", etc. files.

    --
    Julien ÉLIE

    « One must never speak dryly to a Numidian. » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Grant Taylor on Wed Mar 22 19:54:14 2023
    On 2023-03-22, Grant Taylor <gtaylor@tnetconsulting.net> wrote:

    In the context of detecting encoded binary attachments, I feel like that should be relatively easy to do.

    Oh, there's no problem for it catching binaries, that's a non-issue. I'm talking about methods for catching the still ever prevalent text spam.

    I've been wondering if it might be possible to use something like
    spamassassin with bayesian learning on a newsfeed though I haven't
    got to the point of trying to implement anything yet.
    I don't know what SpamAssassin will think of news articles.

    I don't imagine it will have any problem with the bodies, but the
    headers will likely be a different matter since I doubt spamassassin
    knows anything about them. Maybe some custom rulesets to inform it what
    to look at...

    I wonder if it would be possible to leverage something like the milter
    interface to SpamAssassin so that you don't need to integrate and/or
    fork SpamAssassin.

    Yes, I was thinking of interfacing that way, or feeding everything off to spamd.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Tom Furie on Wed Mar 22 13:36:19 2023
    On 3/22/23 1:33 PM, Tom Furie wrote:
    While Cleanfeed is effective enough at what it does, there's no
    "smarts" to it and it can be a chore coming up with effective patterns
    that work but don't get in the way of legitimate posts that happen to
    contain some of the "trouble" words or phrases.

    Please elaborate and share some examples.

    In the context of detecting encoded binary attachments, I feel like that
    should be relatively easy to do.

    I've been wondering if it might be possible to use something like spamassassin with bayesian learning on a newsfeed though I haven't
    got to the point of trying to implement anything yet.

    I don't know what SpamAssassin will think of news articles.

    I wonder if it would be possible to leverage something like the milter
    interface to SpamAssassin so that you don't need to integrate and/or
    fork SpamAssassin.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Jesse Rehmer on Wed Mar 22 19:33:13 2023
    On 2023-03-22, Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
    On Mar 22, 2023 at 1:09:28 PM CDT, "Grant Taylor" <gtaylor@tnetconsulting.net>
    wrote:

    Though, I wonder if we are now in the day & age that we could create
    filters that either:

    - detect multiple strings of text with white space between them, thus
    words.
    - detect the standard encoding methods; e.g. 76 x [A-Za-z0-9+/=] for
    base64

    Diablo has this article type detection built in and allows you to filter based
    on types in newsfeed definitions. Cleanfeed and pyClean do the same for INN. it's not perfect, but pretty damn effective.

    While Cleanfeed is effective enough at what it does, there's no "smarts"
    to it and it can be a chore coming up with effective patterns that work
    but don't get in the way of legitimate posts that happen to contain some
    of the "trouble" words or phrases. I've been wondering if it might be
    possible to use something like SpamAssassin with Bayesian learning on a
    newsfeed, though I haven't got to the point of trying to implement
    anything yet.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Tom Furie on Wed Mar 22 19:53:35 2023
    On Mar 22, 2023 at 2:33:13 PM CDT, "Tom Furie" <tom@furie.org.uk> wrote:

    On 2023-03-22, Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
    On Mar 22, 2023 at 1:09:28 PM CDT, "Grant Taylor" <gtaylor@tnetconsulting.net>
    wrote:

    Though, I wonder if we are now in the day & age that we could create
    filters that either:

    - detect multiple strings of text with white space between them, thus
    words.
    - detect the standard encoding methods; e.g. 76 x [A-Za-z0-9+/=] for
    base64

    Diablo has this article type detection built in and allows you to filter based
    on types in newsfeed definitions. Cleanfeed and pyClean do the same for INN.
    It's not perfect, but pretty damn effective.

    While Cleanfeed is effective enough at what it does, there's no "smarts"
    to it and it can be a chore coming up with effective patterns that work
    but don't get in the way of legitimate posts that happen to contain some
    of the "trouble" words or phrases. I've been wondering if it might be possible to use something like spamassassin with bayesian learning on a newsfeed though I haven't got to the point of trying to implement
    anything yet.

    Cheers,
    Tom

    I agree, and I decided that Diablo's duplicate article detection is good
    enough for me in regard to spam filtering. Interestingly, all the default documentation and examples only talk about using this duplicate detection for binary articles, but I changed the feed definition to include everything and it seems about as effective as Cleanfeed/pyClean. Its binary detection seems really good, and I'm not chewing up any noticeable CPU with filtering now.

    I do use pyClean with some bad_from and bad_subject filters on my spool server for finer granularity there.

    Seems like I remember efforts in the past, perhaps not specific to INN or Diablo, but other tools to implement SpamAssassin for filtering articles, but offhand I can't recall where that conversation occurred.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 22:00:17 2023
    Hi all,

    I don't know what SpamAssassin will think of news articles.

    I wonder if it would be possible to leverage something like the
    milter interface to SpamAssassin so that you don't need to
    integrate and/or fork SpamAssassin.
    Seems like I remember efforts in the past, perhaps not specific to INN or Diablo, but other tools to implement SpamAssassin for filtering articles, but off hand can't recall where that conversation occurred.

    From: yamo' <yamo@beurdin.invalid>
    Newsgroups: news.software.nntp
    Subject: Re: Google Groups spam - INN/Cleanfeed/etc solutions?
    Date: Sun, 19 Sep 2021 10:11:24 -0000 (UTC)
    Message-ID: <si72cc$ko9$1@pi2.pasdenom.info>

    :-)

    --
    Julien ÉLIE

    « Your putting him back on his feet made him lose his head! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to Grant Taylor on Wed Mar 22 20:43:09 2023
    On 2023-03-22, Grant Taylor <gtaylor@tnetconsulting.net> wrote:
    On 3/22/23 1:33 PM, Tom Furie wrote:
    While Cleanfeed is effective enough at what it does, there's no
    "smarts" to it and it can be a chore coming up with effective patterns
    that work but don't get in the way of legitimate posts that happen to
    contain some of the "trouble" words or phrases.

    Please elaborate and share some examples.

    Here are a few that I think illustrate the "effective pattern" problem.
    Now, this sample is all Google - which is already tagged as a known spam
    source - but still they made it through. Sure, I could just block the
    sender, but that seems a bit of "blunt instrument" approach to me. And
    what happens in the potential situation where a spammer forges an
    otherwise legitimate poster's email address, etc?

    There are also the posts whereby the originals get caught by the filter,
    but the fully quoted replies, including full headers posted into the body
    of the "complaint", make it through. That's one poster I'm incredibly
    close to outright banning since he's effectively simply a reflector of
    the original spam.

    <c8eac7b9-bcf8-4c74-98a8-69cee7dfe9a3n@googlegroups.com>
    <1273d23c-0256-4317-97d4-3eeb7bcd74b2n@googlegroups.com>
    <bda61489-0096-4ff2-9a66-d1959824a1bbn@googlegroups.com>

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 21:23:48 2023
    On Mar 22, 2023 at 4:16:46 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    This in particular for the newsgroup 'news.lists.filters'. This group contains
    the references to the 'spam' messages that NoCem then deletes. I want to reset
    this newsgroup 'news.lists.filters' so that all messages are checked locally
    again and in case of spam removed.

    As for NoCeM, you can directly refeed your notices to perl-nocem without resetting anything.

    perl-nocem expects storage tokens on its standard input.
    Example:

    echo '@020162BEB132016300000000000000000000@' | perl-nocem

    As you're running tradindexed overview, I would suggest to have a look
    at the output of:

    tdx-util -g -n news.lists.filters

    It dumps the overview data of this newsgroup. The last field is a
    storage token.
    You could replay NoCeM notices with this information :)

    Very nice, this is valuable to me as well. I know I will be doing this when I am done gathering articles from all over the place. :-)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 21:34:02 2023
    On 22 Mar 2023 at 22:16:46 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hey Julien,

    Hi Eli,

    That's what I thought at first too, but this prevents all existing files in
    the spool from being downloaded again and all messages are treated as
    '-- not downloading already existing message'.

    My question is therefore how you can completely reset a newsgroup so that
    everything is downloaded again.

    Ah yes, that's a bit tricky as what you want is to remove all traces of articles in spool, overview and history.

    The proper method would be to:

    - ctlinnd rmgroup xxx
    - remove the <pathspool>/articles/.../xxx directory of the group
    - set /remember/ to 0 in expire.ctl
    - run the expireover and expire process (for instance via news.daily
    called with the same parameters as in crontab, plus "notdaily")
    - undo the change in expire.ctl (/remember/ set to 11)
    - ctlinnd newgroup xxx
    - reset the last downloaded article in pullnews.marks for this group
    - deactivate Perl and Python filters, and set the artcutoff to 0
    - run pullnews
    - reactivate the filters, and artcutoff to 10


    I think INN will happily accept being refed these articles.


    This in particular for the newsgroup 'news.lists.filters'. This group contains
    the references to the 'spam' messages that NoCem then deletes. I want to reset
    this newsgroup 'news.lists.filters' so that all messages are checked locally
    again and in case of spam removed.

    As for NoCeM, you can directly refeed your notices to perl-nocem without resetting anything.

    perl-nocem expects storage tokens on its standard input.
    Example:

    echo '@020162BEB132016300000000000000000000@' | perl-nocem

    As you're running tradindexed overview, I would suggest to have a look
    at the output of:

    tdx-util -g -n news.lists.filters

    It dumps the overview data of this newsgroup. The last field is a
    storage token.
    You could replay NoCeM notices with this information :)

    Ah cool :)

    Exactly what I needed.
    Thanks so much.

    Another question, is it possible to limit the maximum number of connections
    per authenticated user? I know this is possible for peers, but can this also
    be set up for authenticated users? Maybe a setting in readers.conf or nnrpd that I'm overlooking?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 22:08:35 2023
    On 21 Mar 2023 at 20:18:32 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    In some newsgroups I get the following error while using pullnews:

    DEBUGGING 560 Post 436: Msg: <Can't store article>

    Then pullnews quits.
    Can this be avoided as it is very annoying.

    Do you happen to have other logs in <pathlog>/news.err or news.notice?
    It would be useful to understand why innd did not manage to store the
    article provided by pullnews. It is an unusual error. Do all the
    newsgroups match an entry in storage.conf?

    Hi Julien,

    I probably found the problem.
    The errlog gives the following error:

    ==
    innd: tradspool: could not symlink /usr/local/news/spool/articles/alabama/politics/11365 to /usr/local/news/spool/articles/alt/2600/414/78: Not a directory
    ==

    /usr/local/news/spool/articles/alt/2600/414 is a file, but for some reason
    INND wants to create a directory at that path with the same name as the
    existing file.

    Any ideas how this is possible and how to fix?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 22:49:55 2023
    Hi Eli,

    Another question, is it possible to limit the maximum number of connections per authenticated user? I know this is possible for peers, but can this also be set up for authenticated users? Maybe a setting in readers.conf or nnrpd that I'm overlooking?

    Unfortunately, the answer is no. There's no native way of limiting
    users' connections.
    You may want to write a custom authentication hook (perl_auth or
    python_auth in readers.conf) that would do the job by counting how
    many connections are open by a given user, and denying access if the count
    exceeds the limit. I am not aware of existing scripts to do that :-(

    It could be worthwhile having though, as you're not the first one to ask
    (but nobody wrote or shared what he came up with).

    --
    Julien ÉLIE

    « Your putting him back on his feet made him lose his head! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Tom Furie@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 21:27:01 2023
    On 2023-03-22, Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

    From: yamo' <yamo@beurdin.invalid>
    Newsgroups: news.software.nntp
    Subject: Re: Google Groups spam - INN/Cleanfeed/etc solutions?
    Date: Sun, 19 Sep 2021 10:11:24 -0000 (UTC)
    Message-ID: <si72cc$ko9$1@pi2.pasdenom.info>

    Ooh, nice! That's going to be well worth a look into.

    Cheers,
    Tom

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 22:16:46 2023
    Hi Eli,

    If you've already made a full pass over the group with pullnews and want to
    make another full pass, I think the easiest is to modify the pullnews.marks
    counts for that group and set to 1. That should cause pullnews to start from
    the beginning.

    That's what I thought at first too, but this prevents all existing files in the spool from being downloaded again and all messages are treated as
    '-- not downloading already existing message'.

    My question is therefore how you can completely reset a newsgroup so that everything is downloaded again.

    Ah yes, that's a bit tricky as what you want is to remove all traces of articles in spool, overview and history.

    The proper method would be to (spelled out as commands after this list):

    - ctlinnd rmgroup xxx
    - remove the <pathspool>/articles/.../xxx directory of the group
    - set /remember/ to 0 in expire.ctl
    - run the expireover and expire process (for instance via news.daily
    called with the same parameters as in crontab, plus "notdaily")
    - undo the change in expire.ctl (/remember/ set to 11)
    - ctlinnd newgroup xxx
    - reset the last downloaded article in pullnews.marks for this group
    - deactivate Perl and Python filters, and set the artcutoff to 0
    - run pullnews
    - reactivate the filters, and artcutoff to 10


    I think INN will happily accept being refed these articles.
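
    Spelled out as commands, that would be roughly (a sketch; "example.group"
    and the spool path are placeholders, the expire.ctl changes are manual
    edits, and news.daily should get the same keywords as your crontab entry):

    ctlinnd rmgroup example.group
    rm -r /usr/local/news/spool/articles/example/group
    # set /remember/ to 0 in expire.ctl, then:
    news.daily expireover lowmark notdaily
    # restore /remember/ to 11 in expire.ctl
    ctlinnd newgroup example.group y
    # reset example.group's line in pullnews.marks, then run pullnews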


    This in particular for the newsgroup 'news.lists.filters'. This group contains
    the references to the 'spam' messages that NoCem then deletes. I want to reset
    this newsgroup 'news.lists.filters' so that all messages are checked locally again and in case of spam removed.

    As for NoCeM, you can directly refeed your notices to perl-nocem without resetting anything.

    perl-nocem expects storage tokens on its standard input.
    Example:

    echo '@020162BEB132016300000000000000000000@' | perl-nocem

    As you're running tradindexed overview, I would suggest having a look
    at the output of:

    tdx-util -g -n news.lists.filters

    It dumps the overview data of this newsgroup. The last field is a
    storage token.
    You could replay NoCeM notices with this information :)
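
    Put together, a whole group could be replayed in one go (a sketch; it
    assumes the storage token is the last whitespace-separated field of each
    overview line, as described above):

    tdx-util -g -n news.lists.filters | awk '{print $NF}' | perl-nocem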

    --
    Julien ÉLIE

    « Your putting him back on his feet made him lose his head! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 23:24:07 2023
    Hi Eli,

    I probably found the problem.
    The errlog gives the following error:

    ==
    innd: tradspool: could not symlink /usr/local/news/spool/articles/alabama/politics/11365 to /usr/local/news/spool/articles/alt/2600/414/78: Not a directory
    ==

    /usr/local/news/spool/articles/alt/2600/414 is a file, but for some reason INND wants to create a folder in that path with the same name as the file name.

    Any ideas how this is possible and how to fix?

    Ah, OK, I understand.
    The article I tested the other day was posted to a newsgroup named
    "alt.2600.414". It did not produce any error on my news server because
    I do not have a newsgroup named alt.2600.

    When you are using tradspool, there's a conflict between the 414th
    article in the newsgroup alt.2600 and any article in the newsgroup alt.2600.414.
    That is how the tradspool storage method works.
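
    Concretely, with the paths from your error message:

    /usr/local/news/spool/articles/alt/2600/414     <- article 414 of alt.2600 (a file)
    /usr/local/news/spool/articles/alt/2600/414/78  <- article 78 of alt.2600.414 (needs "414" to be a directory)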

    Such newsgroups should not be used. FYI, an excerpt of the naming
    convention of newsgroups:

    A <component> SHOULD NOT consist solely of digits and SHOULD NOT
    contain uppercase letters. Such <component>s MAY be used only to
    refer to existing groups that do not conform to this naming scheme,
    but MUST NOT be used otherwise.

    NOTE: All-digit <component>s conflict with one widely used storage
    scheme for articles. Mixed-case groups cause confusion between
    systems with case-sensitive matching and systems with case-
    insensitive matching of <newsgroup-name>s.



    How to fix it?
    Well, you can't with your current storage method.
    You would have to switch to another method (to set in storage.conf) like
    CNFS, timecaf or timehash. All three of these methods are able to
    store articles for such groups.
    You could keep tradspool for all the newsgroups except for the
    problematic ones if you want (you would have to explicitly list them in
    storage.conf, as in the sketch below).
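
    For instance (a sketch only; a real CNFS setup also needs buffers declared
    in cycbuff.conf, and "POOL1" is a placeholder metacycbuff name; entries
    are matched in order, so the specific one must come first):

    method cnfs {
        newsgroups: alt.2600.414
        class: 1
        options: POOL1
    }
    method tradspool {
        newsgroups: *
        class: 0
    }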

    --
    Julien ÉLIE

    « We Communists have a clear position: we have never changed, we will
    never change, and we are for change. » (Georges Marchais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Mar 22 23:35:44 2023
    Hi Jesse,

    Unfortunately, the response is no. There's no native way of limiting
    users' connections.

    It's pretty simple to run nnrpd via other utilities that will do the limiting for you, though; most UNIX/Linux systems have at least two or three tools to accomplish more or less the same thing.

    That said, it would be nice to have that ability directly in nnrpd.

    Ah yes, exactly. That's the reason why this was never implemented in
    INN. It's not seen as a priority at all, and it's also not trivial to do.

    Issue #23
    "nnrpd currently has no way of limiting connections per IP address other
    than using the custom auth hooks. In its daemon mode, it could in
    theory keep track of this and support throttling. It's probably not
    worth trying to support this when invoked via inetd, since at that point
    one could just use xinetd and its built-in support for things like this.

    When started from innd, this is a bit harder. innd has some basic rate limiting stuff, but nothing for tracking number of simultaneous
    connections over time. It may be fine to say that if you want to use
    this feature, you need to have nnrpd be invoked separately, not run from
    innd."


    So the answer is to use something like "per_source = 5" in xinetd.conf
    and start nnrpd via xinetd.
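
    A matching xinetd entry might look like this (a sketch; the nnrpd path
    and the "news" user are common INN defaults and may differ on your
    system):

    service nntp
    {
        socket_type = stream
        protocol    = tcp
        wait        = no
        user        = news
        server      = /usr/local/news/bin/nnrpd
        per_source  = 5
    }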

    --
    Julien ÉLIE

    « We Communists have a clear position: we have never changed, we will
    never change, and we are for change. » (Georges Marchais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Mar 22 22:25:03 2023
    On Mar 22, 2023 at 4:49:55 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Another question, is it possible to limit the maximum number of connections
    per authenticated user? I know this is possible for peers, but can this also
    be set up for authenticated users? Maybe a setting in readers.conf or nnrpd
    that I'm overlooking?

    Unfortunately, the response is no. There's no native way of limiting
    users' connections.
    You may want to write a custom authentication hook (perl_auth or
    python_auth in readers.conf) that would do the job by accounting how
    many connections are open by a given user, and deny access if it exceeds
    the limit. I am not aware of existing scripts to do that :-(

    It could be worthwhile having though, as you're not the first one to ask
    (but nobody wrote or shared what he came up with).

    It's pretty simple to run nnrpd via other utilities that will do the limiting for you, though; most UNIX/Linux systems have at least two or three tools to accomplish more or less the same thing.

    That said, it would be nice to have that ability directly in nnrpd.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Tom Furie on Wed Mar 22 17:53:03 2023
    On 3/22/23 1:54 PM, Tom Furie wrote:
    Oh, there's no problem for it catching binaries, that's a
    non-issue. I'm talking about methods for catching the still ever
    prevalent text spam.

    Oh. Okay. That makes more sense.

    I don't imagine it will have any problem with the bodies, but the
    headers will likely be a different matter since I doubt spamassassin
    knows anything about them. Maybe some custom rulesets to inform it
    what to look at...

    I wouldn't be surprised if SpamAssassin did know what to do with a news
    post.

    I also would be surprised if it couldn't be taught how to deal with news
    posts.

    Yes, I was thinking of interfacing that way, or feeding everything
    off to spamd.

    :-)



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From news.usenet.ovh Admin@21:1/5 to All on Thu Mar 23 00:55:58 2023
    DV <dv@reply-to.not.invalid> composed the following prose:

    Neodome Admin wrote:

    DV <dv@reply-to.not.invalid> writes:

    I do.

    Like I said, we can pretend all we want, but old Usenet is gone. No one
    cares. I'm sorry guys. No one needs those FAQs.

    I say it again: I do. You should stop repeating that *no one* needs
    them, unless you think I don't exist.

    Sure, you exist.
    But who wants obsolete FAQs? To do what?

    --
    List of free servers that distribute the "fr" hierarchy

    http://usenet.ovh/?article=faq_serveur_gratuit

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to All on Wed Mar 22 18:01:52 2023
    On 3/22/23 3:49 PM, Julien ÉLIE wrote:
    There's no native way of limiting users' connections.

    It's not per user in that you could have multiple users per IP, but I'd
    think seriously about doing this at the firewall such that each IP can
    only have a limited number of connections.
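
    A sketch of that idea on a Linux box with iptables (the limit of five
    is an arbitrary assumption):

        # Reject a sixth simultaneous NNTP connection from any single IPv4 address.
        iptables -A INPUT -p tcp --syn --dport 119 \
            -m connlimit --connlimit-above 5 -j REJECT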



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Tom Furie on Wed Mar 22 17:57:43 2023
    On 3/22/23 2:43 PM, Tom Furie wrote:
    Here are a few that I think illustrate the "effective pattern" problem.

    Thank you for the message IDs. Unfortunately Thunderbird is treating
    them as email addresses. I'll have to find a way to look them up.

    Now, this sample is all Google - which is already tagged as a known
    spam source - but still they made it through. Sure, I could just
    block the sender, but that seems a bit of "blunt instrument" approach
    to me. And what happens in the potential situation where a spammer
    forges an otherwise legitimate poster's email address, etc?

    Ya. I'm not a fan of blocking Google carte blanche like some advocate for.

    There's also the posts whereby the originals get caught by the filter,
    but the fully quoted replies including full headers posted into
    the body of the "complaint", make it through. That's one poster I'm incredibly close to outright banning since he's effectively simply
    a reflector of the original spam.

    Oh ya. I hear you on that one. I'm a single digit number of such
    examples away from banning a user like that too. I sort of suspect we
    are talking about the same user. Possibly one with a professional
    sounding name?



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to news.usenet.ovh Admin on Wed Mar 22 18:05:41 2023
    On 3/22/23 5:55 PM, news.usenet.ovh Admin wrote:
    Sure, you exist.
    But who wants obsolete FAQs? To do what?

    I believe you already have the answer to your question.

    I also (re)read things from the days of yore. Some things are worth
    reading. Some are not. You have to read them to find out which they
    are. }:-)



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Gew Ghul Suques@21:1/5 to Grant Taylor on Wed Mar 22 22:09:12 2023
    On 3/22/23 18:57, Grant Taylor wrote:
    Ya.  I'm not a fan of blocking Google carte blanche like some advocate for.

    Phuque gewghul.

    --

    Gew Ghul Suques

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Thu Mar 23 09:55:52 2023
    On 22 Mar 2023 at 23:35:44 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Ah yes, exactly. That's the reason why this was never implemented in
    INN. It's not seen as a priority at all, and it's also not trivial to do.

    Issue #23
    "nnrpd currently has no way of limiting connections per IP address other
    than using the custom auth hooks. In its daemon mode, it could in
    theory keep track of this and support throttling. It's probably not
    worth trying to support this when invoked via inetd, since at that point
    one could just use xinetd and its built-in support for things like this.

    When started from innd, this is a bit harder. innd has some basic rate limiting stuff, but nothing for tracking number of simultaneous
    connections over time. It may be fine to say that if you want to use
    this feature, you need to have nnrpd be invoked separately, not run from innd."


    So the answer is to use something like "per_source = 5" in xinetd.conf
    and start nnrpd by xinetd.

    This makes it possible to limit the number of sessions per IP, but not per authenticated user account.

    1) xinetd starts nnrpd and after this the authentication takes place.
    2) nntpd sends the login data to perl_auth, which consults the database for authorization, as well as checking whether the user has already reached his maximum connections.

    So point 2 is where the problem lies.

    Each time the authorization is successful, a 'session' record can be added to the database. The number of records determines how many sessions are running for this user.

    But as soon as a session disconnects, the record must be removed from the database. However, nnrpd does not know that the session has been disconnected. Only xinetd knows this, but it doesn't have the user data, nor can it access the database.

    Perhaps something can be done with the xinetd PID, but even then this will
    have to be passed to the perl_auth script.

    So here's the problem I can't solve.
    Do you have a suggestion?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to Eli on Thu Mar 23 10:03:15 2023
    On 23 Mar 2023 at 10:55:52 CET, "Eli" <eliistheman@gmail.com> wrote:

    2) nntpd sends the login data to perl_auth, which consults the database for

    Of course I meant nnrpd here.

    Second question:
    Starting nnrpd by xinetd on port 119 requires a second IP (since innd is already bound to 119).
    But how does this affect peers? Do they connect to the IP of innd or nnrpd?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Thu Mar 23 09:24:10 2023
    On 22 Mar 2023 at 23:24:07 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Any ideas how this is possible and how to fix?

    Ah, OK, I understand.
    The article I tested last day was posted to a newsgroup named
    "alt.2600.414". It did not produce any error on my news server because
    I do not have a newsgroup named alt.2600.

    Hi Julien,

    Removing "alt.2600.414" fixed it :D
    Thanks Julien.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Gew Ghul Suques on Thu Mar 23 10:01:05 2023
    On 3/22/23 9:09 PM, Gew Ghul Suques wrote:
    Phuque gewghul.

    If you're going to insult / slander someone, please do it properly
    instead of hiding behind a homophone.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Mar 23 20:07:03 2023
    Hi Eli,

    2) nnrpd sends the login data to perl_auth, which consults the database for authorization, as well as checking whether the user has already reached his maximum connections.

    So point 2 is where the problem lies.

    Each time the authorization is successful, a 'session' record can be added to the database. The number of records determines how many sessions are running for this user.

    But as soon as a session disconnects, the record must be removed from the database. However, nnrpd does not know that the session has been disconnected.
    Only xinetd knows this, but it doesn't have the user data, nor can it access the database.

    Perhaps something can be done with the xinetd PID, but even then this will have to be passed to the perl_auth script.

    So here's the problem I can't solve.
    Do you have a suggestion?

    That problem is probably why nobody has yet written the requested
    feature, either natively in nnrpd or in a perl_auth script...


    Starting nnrpd by xinetd on port 119 requires a second IP (since
    innd is already bound to 119). But how does this affect peers? Do
    they connect to the IP of innd or nnrpd?

    If you set up nnrpd as a daemon listening to port 119 (unencrypted) and
    another nnrpd as a daemon listening to port 563 (TLS), news admins
    usually set up innd to listen to port 433 (also called the NNSP port,
    for Network News Streaming Protocol). Then you tell your peers to feed
    you on that port.
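
    For instance, a sketch of that arrangement (the inn.conf line and the
    paths are assumptions; adjust them to your installation):

        # inn.conf: innd listens on the NNSP port for peers
        #   port: 433
        # Readers connect to standalone nnrpd daemons instead:
        /usr/local/news/bin/nnrpd -D -p 119
        /usr/local/news/bin/nnrpd -D -p 563 -S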

    --
    Julien ÉLIE

    « When I tell of my odyssey, no one will believe me! » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Mar 23 19:57:16 2023
    Hi Grant,

    Here are a few that I think illustrate the "effective pattern" problem.

    Thank you for the message IDs.  Unfortunately Thunderbird is treating
    them as email addresses.  I'll have to find a way to look them up.

    http://al.howardknight.net/

    --
    Julien ÉLIE

    « We Communists have a clear position: we have never changed, we will
    never change, and we are for change. » (Georges Marchais)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to Eli on Thu Mar 23 13:34:57 2023
    On 3/23/23 3:55 AM, Eli wrote:
    1) xinitd starts nnrpd and after this the authentication takes place.

    Does (x)inetd need to start nnrpd /directly/? Or could it start a
    script which in turn starts nnrpd as well as calling something to clean
    up sessions after nnrpd exits?

    Especially if there could be some form of information that would
    correlate the IP & port tuple with the session. I'd think it would be relatively easy to detect that the TCP connection for a given IP & port
    tuple no longer exists (either completely gone or closing / TIME_WAIT
    type thing) and subsequently remove the session from the list.
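
    A rough sketch of that wrapper idea in Perl (IPv4 only; the
    "session-cleanup" helper is imaginary and stands in for whatever
    removes the record from the session database):

        #!/usr/bin/perl
        # Hypothetical wrapper started by xinetd in place of nnrpd.
        use strict;
        use warnings;
        use Socket qw(unpack_sockaddr_in inet_ntoa);

        # The client's IP & port tuple, for correlating the session.
        my ($port, $addr) = unpack_sockaddr_in(getpeername(STDIN));
        my $client = inet_ntoa($addr) . ":$port";

        # nnrpd inherits the client socket on stdin/stdout.
        system('/usr/local/news/bin/nnrpd');

        # Once nnrpd exits, drop the session record for this tuple.
        system('/usr/local/news/bin/session-cleanup', $client);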



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Grant Taylor@21:1/5 to All on Thu Mar 23 13:31:01 2023
    On 3/23/23 12:57 PM, Julien ÉLIE wrote:
    Hi Grant,

    Hi Julien,

    http://al.howardknight.net/

    Thank you for that. That is a very neat tool. I've saved that for
    future use.

    I don't know how to go about fighting that spam.

    I have a few hints of ideas that I'd chase. But I suspect they would
    peter out and not help much.

    For a while I thought about filtering out problematic posting-accounts
    in the Injection-Info: header. But there were too many and I couldn't
    keep up.

    I feel like this is heading in the direction of statistical counts of
    various words and their improper relationships with the words that
    normally accompany them. This becomes a statistics game that I am not
    qualified to play.



    --
    Grant. . . .
    unix || die

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to All on Sat Mar 25 10:41:42 2023
    Hi,

    The messages I download via pullnews are sent to the peers. How can that be prevented?

    I can temporarily block the newsgroup in "newsfeeds", but that has its drawbacks. Can this also be done differently?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to Eli on Sat Mar 25 11:40:39 2023
    On Mar 25, 2023 at 5:41:42 AM CDT, "Eli" <eliistheman@gmail.com> wrote:

    Hi,

    The messages I download via pullnews are sent to the peers. How can that be prevented?

    I can temporarily block the newsgroup in "newsfeeds", but that has its drawbacks. Can this also be done differently?

    Without disabling newsfeeds entirely while using pullnews, an option is to have pullnews add a fake Path entry that you configure each of your newsfeeds to exclude.

    From the pullnews manpage:
    -F fakehop
    Prepend fakehop as a host to the Path header field body of articles
    fed.

    For example, if I use "pullnews -F injection-host" it will prepend "injection-host" to the Path. Then in my newsfeeds file for each feed I can exclude the "injection-host" path from each feed so it will not be fed to the peers.

    feed/usenet.blueworldhosting.com,injection-host\
    :*\
    :Ap,Tm\
    :innfeed!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to jesse.rehmer@blueworldhosting.com on Sat Mar 25 13:12:29 2023
    On 25 Mar 2023 at 12:40:39 CET, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    Without disabling newsfeeds entirely while using pullnews, an option is to have
    pullnews add a fake Path entry that you configure each of your newsfeeds to exclude.

    Hi Jesse,

    Thank you for your clear answer and example.

    Another question:
    How can I delete all messages up to a certain date in a newsgroup immediately? The messages were received via pullnews, so the date received is the same for all messages.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Tue Mar 28 11:20:30 2023
    On 17 Mar 2023 at 21:21:45 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Could you please download the latest version of pullnews and try it?

    https://raw.githubusercontent.com/InterNetNews/inn/main/frontends/pullnews.in

    Just grab that file, rename it without .in, and change the first 2 lines
    to fit what your current pullnews script has (it is the path to Perl and
    the INN::Config module).
    Then you can run that script. It will work with your version of INN.

    Hi Julien,

    I think there is a bug in this latest version.

    When pullnews starts, it writes a PID file.
    If pullnews is accidentally restarted, it will report that pullnews is already running and that's perfect.
    But after this message pullnews deletes the PID file :(

    That's very annoying as I restart pullnews from cron every 15 minutes because it occasionally stops for some unknown reason.

    Other than that it works smoothly ;)

    Eli.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Mar 30 13:21:16 2023
    Hi Eli,

    I think there is a bug in this latest version.

    When pullnews starts, it writes a PID file.
    If pullnews is accidentally restarted, it will report that pullnews is already
    running and that's perfect.
    But after this message pullnews deletes the PID file :(

    Thanks again for your bug report; it's really appreciated to make
    pullnews better.

    When you download a new version of pullnews from the Git repository, do
    not forget to manually edit the first 2 lines of the file which should
    reflect the paths to Perl and INN::Config. (You can take the same as
    the ones in other Perl scripts like mailpost.)
    It gives something like:

    #! /usr/bin/perl -w
    use lib '/usr/share/perl5'; use INN::Config;



    The issue you're facing is only triggered when you run pullnews outside
    an INN installation (which is a possible use of pullnews, as it could be
    run from a separate server).
    There are 2 branches at several places in the code. PID file handling
    was wrong in one of these branches, as you noticed.

    I believe the issue is now fixed, and you can download the updated
    pullnews script from the Git repository.

    --
    Julien ÉLIE

    « The friends of truth are those who seek it, not those who boast of
    having found it. » (Condorcet)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Mar 30 19:11:26 2023
    Hi Eli,

    Another question:
    How can I delete all messages up to a certain date in a newsgroup immediately?
    The messages were received via pullnews, so the date received is the same for all messages.

    When you say "up to a certain date", do you mean before or after?
    If it is before, then just use expire.ctl and run news.daily so as to
    expire articles.
    Verify that you do not use the "-p" option in the expireoverflags and
    flags options of the news.daily command. (By default, arrival time is
    used, and "-p" switches to the actual date in Date header fields).


    If it is after, then it is a bit more complicated...
    Your history file in <pathdb> contains lines with several timestamps.
    You can find the storage tokens of articles that arrived after March 1st
    with the following commands:

    % convdate -n '01 Mar 2023 00:00 +0000'
    1677628800

    % perl -ne 'chomp; our ($hash, $timestamps, $_) = split " ";
          my ($arrived, $expires, $posted) = split("~", $timestamps);
          print "$_\n" if $_ and $arrived >= 1677628800' history

    Pipe the result to "sm -r" to remove these articles.

    --
    Julien ÉLIE

    « The friends of truth are those who seek it, not those who boast of
    having found it. » (Condorcet)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Thu Mar 30 20:01:23 2023
    On 30 Mar 2023 at 13:21:16 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    The issue you're facing is only triggered when you run pullnews outside
    an INN installation (which is a possible use of pullnews, as it could be
    run from a separate server).
    There are 2 branches at several places in the code. PID file handling
    was wrong in one of these branches, as you noticed.

    I believe the issue is now fixed, and you can download the updated
    pullnews script from the Git repository.

    Hi Julien,

    The problem has been solved :)

    Thank you for the speedy fix. It works great now.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Thu Mar 30 20:02:41 2023
    On 30 Mar 2023 at 19:11:26 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    When you say "up to a certain date", do you mean before or after?
    If it is before, then just use expire.ctl and run news.daily so as to
    expire articles.
    Verify that you do not use the "-p" option in the expireoverflags and
    flags options of the news.daily command. (By default, arrival time is
    used, and "-p" switches to the actual date in Date header fields).


    If it is after, then it is a bit more complicated...
    Your history file in <pathdb> contains lines with several timestamps.
    You can find the storage tokens of articles that arrived after March 1st
    with the following commands:

    % convdate -n '01 Mar 2023 00:00 +0000'
    1677628800

    % perl -ne 'chomp; our ($hash, $timestamps, $_) = split " ";
          my ($arrived, $expires, $posted) = split("~", $timestamps);
          print "$_\n" if $_ and $arrived >= 1677628800' history

    Pipe the result to "sm -r" to remove these articles.

    Wow super! I'll archive this one.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to All on Thu Apr 6 11:38:25 2023
    If I am correct, as an admin it is possible to cancel a message posted locally by a user by means of the 'canlockadmin' secret in inn-secrets.conf.

    However, it is not clear to me how to cancel these messages as admin.
    Is 'gencancel' the right and only tool for this?

    Can someone point me in the right direction please.

    Thank you.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Franck@21:1/5 to All on Thu Apr 6 17:55:24 2023
    Hello,

    Once you have edited the inn-secrets.conf file (https://www.eyrie.org/~eagle/software/inn/docs/inn-secrets.conf.html)
    and the "cancels" group, each article posted on your server will have :

    - One (or more) Cancel-Locks added by the server (if the client used by
    the user does not support CL/CK). This allows the user to cancel his own articles.

    - One (or more) Cancel-Locks added by your server (if you have defined canlockadmin). This CL is used by the administrator to cancel all
    articles posted on his server.

    To do this, you actually need to use gencancel to generate the correct
    Cancel-Key (gencancel uses the admin secret) that will match one hash of
    the Cancel-Lock of the article to cancel.

    https://www.eyrie.org/~eagle/software/inn/docs/gencancel.html

    For me, gencancel is the only tool to do the job, but as I am not an INN
    expert, maybe Julien or Russ can give you more information (even if I
    think the documentation is well enough written).

    Have a nice day.
    Franck

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to Franck on Thu Apr 6 16:39:04 2023
    On 6 Apr 2023 at 17:55:24 CEST, "Franck" <franck@email.invalid> wrote:

    Hello,

    Once you have edited the inn-secrets.conf file (<https://www.eyrie.org/~eagle/software/inn/docs/inn-secrets.conf.html>)
    and the "cancels" group, each article posted on your server will have :

    ...

    For me, Gencancel is the only tool to do the job but as I am not an INN expert, maybe Julien or Russ can give you more information (even if I
    think the documentation is well enough written.

    Have nice day.
    Franck

    Thanks for the answer.

    However, for some reason it's not working as expected.

    I've set the secret for 'canlockadmin' in inn-secrets.conf.

    In readers.conf (just in case):

    auth "localhost" {
    hosts: "localhost, 127.0.0.1, ::1, stdin"
    default: "<localhost>"
    }

    access "localhost" {
    users: "<localhost>"
    newsgroups: "*"
    access: RPA
    addcanlockuser: none
    }

    The following command works as expected and returns a full cancellation
    header:
    su news -s /bin/sh -c "/usr/local/news/bin/gencancel -n '<newsgroup>' '<msg-id>'" | su news -s /bin/sh -c "/usr/local/news/bin/inews -h -P -D"

    However, using the above command without the -D gives:
    inews: warning: What server?
    inews: article will be spooled

    And a file named 'dead.article' is written into the home dir of user 'news'.

    Any ideas what goes wrong?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Apr 6 21:00:09 2023
    Hi Eli,

    For me, gencancel is the only tool to do the job

    That's right Franck :)
    Eli, is there something unclear in the gencancel documentation that
    should be improved? If that's the case, what should be written and where?


    su news -s /bin/sh -c "/usr/local/news/bin/gencancel -n '<newsgroup>' '<msg-id>'" | su news -s /bin/sh -c "/usr/local/news/bin/inews -h -P -D"

    However, using the above command without the -D gives:
    inews: warning: What server?
    inews: article will be spooled

    Any ideas what goes wrong?

    inews tries to connect to the server set in the "server" parameter in
    inn.conf. I guess this parameter is unset.
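
    For instance, a one-line sketch (the hostname is an assumption; use
    whatever host runs your news server):

        # inn.conf
        server: news.example.com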

    I'll add the name of the parameter in the inews manual page. It
    currently just mentions "inews sends the article to the local news
    server as specified in inn.conf".

    And inews is not listed in the names of the programs for which the
    "server" parameter is used in inn.conf... Also added.

    --
    Julien ÉLIE

    « Cross the river in a crowd, and the crocodile will not eat you. »
    (Malagasy proverb)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Apr 6 21:09:23 2023
    Eli, is there something unclear in the gencancel documentation that
    should be improved?  If that's the case, what should be written and where?

    FWIW, I suggest these additions to the gencancel documentation,
    surrounded here with "*", to try to make it clearer:


    If INN was built with Cancel-Lock support, gencancel will generate the
    right *admin* Cancel-Key header field to use in order to authenticate
    cancels.

    [...]

    In case you only need the *admin* Cancel-Key hashes, you can use the -k
    flag.

    [...]

    Instead of outputting a whole cancel control message, gencancel will
    just output the body of the *admin* Cancel-Key header field.

    [...]

    To only retrieve the *admin* Cancel-Key hashes associated with the given Message-ID:

    [...]

    If it all looks good, then inject it into the news system (without
    giving -D to inews):

    gencancel '<mid@news>' | inews -h -P

    *Note that inews sends the message to the server specified in the server parameter in inn.conf.*



    I hope it would have made your life easier!

    --
    Julien ÉLIE

    « When you ask people to observe silence, instead of observing it the
    way one observes a lunar eclipse, they listen to it! » (Raymond Devos)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Thu Apr 6 21:59:47 2023
    On 6 Apr 2023 at 21:00:09 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    inews tries to connect to the server set in the "server" parameter in inn.conf. I guess this parameter is unset.

    I'll add the name of the parameter in the inews manual page. It
    currently just mentions "inews sends the article to the local news
    server as specified in inn.conf".

    And inews is not listed in the names of the programs for which the
    "server" parameter is used in inn.conf... Also added.

    Hi Julien,

    Thank you for answering.
    It was the 'server' setting indeed.

    I'm getting close, but now inews is trying to connect using the
    IPv6 address of the server, instead of IPv4.
    I'll have to figure out how to make a workaround for this, especially since
    I'm using xinetd for nnrpd.

    I have created an IPv6 -> IPv4 proxy in xinetd. That works, but then INN reports that I don't have permission to cancel messages :(

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Apr 7 09:30:02 2023
    Hi Eli,

    It was the 'server' setting indeed.

    I'm getting close, but now inews is trying to connect using the
    IPv6 address of the server, instead of IPv4.

    Did you try just putting the IPv4 address of your server in the "server"
    setting (instead of its hostname)? It may work (I have not tested).


    I'll have to figure out how to make a workaround for this, especially since I'm using xinetd for nnrpd.

    In your readers.conf file, you mentioned:

    auth "localhost" {
    hosts: "localhost, 127.0.0.1, ::1, stdin"
    default: "<localhost>"
    }

    Maybe you should also add the DNS name (and IP) of your server?

    --
    Julien ÉLIE

    « When you ask people to observe silence, instead of observing it the
    way one observes a lunar eclipse, they listen to it! » (Raymond Devos)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Apr 7 13:57:35 2023
    On 22 Mar 2023 at 23:49:55 CET, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    Another question, is it possible to limit the maximum number of connections
    per authenticated user? I know this is possible for peers, but can this also
    be set up for authenticated users? Maybe a setting in readers.conf or nnrpd
    that I'm overlooking?

    Unfortunately, the response is no. There's no native way of limiting
    users' connections.
    You may want to write a custom authentication hook (perl_auth or
    python_auth in readers.conf) that would do the job by accounting how
    many connections are open by a given user, and deny access if it exceeds
    the limit. I am not aware of existing scripts to do that :-(

    It could be worthwhile having though, as you're not the first one to ask
    (but nobody wrote or shared what he came up with).

    Hi Julien,

    The nnrpd manual states:

    "As each command is received, nnrpd tries to change its "argv" array so that ps(1) will print out the command being executed."

    This will then look like this:
    nnrpd: <xxx.xxx.xxx.xxx> GROUP
    nnrpd: <xxx.xxx.xxx.xxx> XOVER

    Is it perhaps also possible to add the authenticated user to this?

    Something like:
    nnrpd: <xxx.xxx.xxx.xxx> Eli GROUP
    nnrpd: <xxx.xxx.xxx.xxx> Eli XOVER

    This would make it possible to limit the number of connections per user via a perl script.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Apr 7 13:36:04 2023
    On 7 Apr 2023 at 09:30:02 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    It was the 'server' setting indeed.

    I'm getting close, but now inews is trying to connect using the
    IPv6 address of the server, instead of IPv4.

    Did you try to just put the IPv4 address of your server in the "server" setting? (instead of its hostname)
    It will maybe work (I have not tested).

    Hello Julien,

    Yes, that was the trick.
    It works now :)

    Thank you.

    I have another question about the settings in expire.ctl.

    What settings should I use if I want to delete all posts older than 90 days
    for one specific newsgroup (e.g. linux.debian.bugs.dist)?

    As far as I can read in the documentation, this mainly concerns the middle field (the <default> value); the first and last fields (<keep> and <purge>) are only relevant for messages that have an Expires header. But if I set the <purge> field lower than the <default> field, inncheck still throws a warning. So it seems that the <keep> and <purge> fields still affect the <default> value.

    linux.debian.bugs.dist:AX:0:90:11
    inncheck returns: purge `11' younger than default `90'

    linux.debian.bugs.dist:AX:0:90:90
    seems good.

    What I would like is to have all messages older than 90 days deleted immediately, and messages with an Expires header deleted immediately after the expiration date.

    Which setting do you recommend for this in expire.ctl?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Apr 7 17:23:46 2023
    Hi Eli,

    I have another question about the settings in expire.ctl.

    What settings should I use if I want to delete all posts older than 90 days for one specific newsgroup (e.g. linux.debian.bugs.dist)?

    As far as I can read in the documentation, this mainly concerns the middle field (the <default> value); the first and last fields (<keep> and <purge>) are only relevant for messages that have an Expires header. But if I set the <purge> field lower than the <default> field, inncheck still throws a warning.
    So it seems that the <keep> and <purge> fields still affect the <default> value.

    linux.debian.bugs.dist:AX:0:90:11
    inncheck returns: purge `11' younger than default `90'

    """
    <pattern>:<flag>:<min>:<default>:<max>

    The middle field, <default>, will be used as the expiration period for
    most articles. The other two fields, <min> and <max>, only come into
    play if the article requests a particular expiration date with an
    Expires header field. Articles with an Expires header field will be
    expired at the date given in that header field, subject to the
    constraints that they will be retained at least <min> days and no
    longer than <max> days.

    One should think of the fields as a lower bound, the default, and an
    upper bound. Since most articles do not have an Expires header field,
    the second field is the most important and most commonly applied.

    It is often useful to honor the Expires header field in articles,
    especially those in moderated groups. To do this, set <min> to zero,
    <default> to whatever normal expiration you wish, and <max> to "never"
    or some large number, like 365 days for a maximum article life of a
    year.
    """


    linux.debian.bugs.dist:AX:0:90:90
    seems good.

    That is indeed what I would set for this newsgroup.


    What I would like is to have all messages older than 90 days deleted immediately, and messages with an Expires header deleted immediately after the expiration date.

    That's what the 0:90:90 setting does.

    With 0:90:11, suppose that you have an article with an Expires header
    field corresponding to 30 days: it would be deleted after 11 days, which
    is not what you were expecting. That's the reason for the warning from
    inncheck; it looks unusual to force the deletion of articles with an
    Expires header field sooner than other articles. (The date in the
    Expires header field will still be respected, subject to those bounds.)

    --
    Julien ÉLIE

    « A killfile on Usenet can get you peace and quiet. A killfile in the
    real world can get you twenty years to life. » (Nils Nieuwjaar)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Fri Apr 7 16:38:26 2023
    On 7 Apr 2023 at 17:23:46 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    With 0:90:11, suppose that you have an article with an Expires header
    field corresponding to 30 days, it would be deleted after 11 days which
    is not what you were expecting. That's the reason of the warning from inncheck; it looks unusual to force the deletion of articles with an
    Expires header field sooner than other articles. (The date in the
    Expires header field will still be respected.)

    Another stunning explanation.
    Thanks again, Julien!

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Tue Apr 11 12:12:12 2023
    Hi Eli,

    The nnrpd manual states:

    "As each command is received, nnrpd tries to change its "argv" array so that ps(1) will print out the command being executed."

    This will then look like this:
    nnrpd: <xxx.xxx.xxx.xxx> GROUP
    nnrpd: <xxx.xxx.xxx.xxx> XOVER

    Is it perhaps also possible to add the authenticated user to this?

    Something like:
    nnrpd: <xxx.xxx.xxx.xxx> Eli GROUP
    nnrpd: <xxx.xxx.xxx.xxx> Eli XOVER

    This would make it possible to limit the number of connections per user via a perl script.

    It is indeed possible to use that "feature".

    If you can rebuild INN from sources, just change the following command
    in nnrpd/nnrpd.c:

    -    setproctitle("%s %s", Client.host, av[0]);
    +    setproctitle("%s %s %s", Client.host,
    +                 PERMuser[0] != '\0' ? PERMuser : "-", av[0]);


    I then have lines like that in a ps output:

    nnrpd: accepting connections
    nnrpd: 176-143-2-105.abo.bbox.fr julien GROUP
    nnrpd: 5.14.145.68 <all> LIST


    I am unsure if this would be worth having in an official release; there
    may be privacy concerns. Maybe it should be configurable with a
    readers.conf option (like addprocesstitleuser which would enable that
    behaviour when set to true in an access group).
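
    A rough sketch of how a perl_auth hook could then enforce a per-user
    limit by counting those process titles (untested; the interface details
    follow the readers.conf documentation, and the limit of five is an
    arbitrary assumption):

        # Hypothetical perl_auth script: nnrpd calls authenticate() with
        # the connection details in %attributes.
        our %attributes;

        sub authenticate {
            my $user  = $attributes{username};
            my $limit = 5;

            # With the patch above, ps shows "nnrpd: <host> <user> <command>".
            my $count = grep { /^nnrpd: \S+ \Q$user\E / } `ps -eo args`;

            return (481, 'Too many connections') if $count >= $limit;
            # A real hook would also check $attributes{password} here.
            return (281, 'Authentication successful');
        }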

    --
    Julien ÉLIE

    « And now the ball is in the slalomers' court. »

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Tue Apr 11 17:23:12 2023
    On 11 Apr 2023 at 12:12:12 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    Hi Eli,

    The nnrpd manual states:

    "As each command is received, nnrpd tries to change its "argv" array so that >> ps(1) will print out the command being executed."

    This will then look like this:
    nnrpd: <xxx.xxx.xxx.xxx> GROUP
    nnrpd: <xxx.xxx.xxx.xxx> XOVER

    Is it perhaps also possible to add the authenticated user to this?

    Something like:
    nnrpd: <xxx.xxx.xxx.xxx> Eli GROUP
    nnrpd: <xxx.xxx.xxx.xxx> Eli XOVER

    This would make it possible to limit the number of connections per user via a
    perl script.

    It is indeed possible to use that "feature".

    If you can rebuild INN from sources, just change the following command
    in nnrpd/nnrpd.c:

    -    setproctitle("%s %s", Client.host, av[0]);
    +    setproctitle("%s %s %s", Client.host,
    +                 PERMuser[0] != '\0' ? PERMuser : "-", av[0]);

    That is great news.
    Thanks!

    I am unsure if this would be worth having in an official release; there
    may be privacy concerns. Maybe it should be configurable with a
    readers.conf option (like addprocesstitleuser which would enable that behaviour when set to true in an access group).

    I don't know if many people will use this feature, but it is nice if INN supports it. Making it configurable is a good idea.

    About pullnews:

    In pullnews I use the '-w -1000000' option to download the 1 million most recent articles per newsgroup.

    This works fine, but when pullnews is restarted (for example after a server timeout), pullnews will redownload all already downloaded articles.

    It does come with the message that the articles already exist, but when there are almost a million per newsgroup, it is not very pleasant.

    It would be nice if pullnews continued downloading where it left off; for example, resuming only when the recorded high water mark is > 0.

    Could you improve this on pullnews?
    Thanks.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Wed Apr 12 10:33:39 2023
    Hi Eli,

    I am unsure if this would be worth having in an official release; there
    may be privacy concerns. Maybe it should be configurable with a
    readers.conf option (like addprocesstitleuser which would enable that
    behaviour when set to true in an access group).

    I don't know if many people will use this feature, but it is nice if INN supports it. Making it configurable is a good idea.

    OK, I'll have a look for a future version.


    About pullnews:

    In pullnews I use the '-w -1000000' option to download the 1 million most recent articles per newsgroup.

    This works fine, but when pullnews is restarted (for example after a server timeout), pullnews will redownload all already downloaded articles.

    It does come with the message that the articles already exist, but when there are almost a million per newsgroup, it is not very pleasant.

    It would be nice if pullnews continued downloading where it left off. For example, only if the high water mark >0.

    Could you improve this on pullnews?

    Ah, I see the point.
    You may want to use "-O" right now to prevent the download of already
    existing articles.

    When pullnews automatically restarts after a timeout, -w is indeed
    applied again. After a quick look at all the options, only -w needs
    improving in that case: it should not be applied again to the newsgroup
    being processed when the timeout occurred, but should directly restart
    from where it was. I'll provide a patch (not today, however, but
    probably tomorrow; I hope that's fine for you).

    I assume you do not use '-w -1000000' for another run of pullnews
    (manual or out of cron); it would otherwise be normal for it to download
    a million articles again.

    --
    Julien ÉLIE

    « They refused a Norman's offer?!? » (Astérix)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Eli@21:1/5 to iulius@nom-de-mon-site.com.invalid on Wed Apr 12 10:09:50 2023
    On 12 Apr 2023 at 10:33:39 CEST, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    About pullnews:

    In pullnews I use the '-w -1000000' option to download the 1 million most
    recent articles per newsgroup.

    This works fine, but when pullnews is restarted (for example after a server
    timeout), pullnews will redownload all already downloaded articles.

    It does come with the message that the articles already exist, but when there
    are almost a million per newsgroup, it is not very pleasant.

    It would be nice if pullnews continued downloading where it left off. For
    example, only if the high water mark >0.

    Could you improve this on pullnews?

    I assume you do not use '-w -1000000' for another run of pullnews
    (manual or out of cron); it would otherwise be normal for it to download a million articles again.

    Yes, I do, because there are more newsgroups in the pullnews.marks file that may not have been downloaded yet.

    But I modified pullnews with the added line below (marked "+") and this seems to work fine:

    if (defined $watermark) {
        printf LOG "\tOur previous highest: %d\n", $prevHigh if not $quiet;
        $high = $watermark;
        $high = $last + $watermark if substr($watermark, 0, 1) eq '-';
    +   $high = $prevHigh if $prevHigh > 0;
        $high = 0 if $high < 0;
        $shash->{$group} = [time, $high];
    }

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Thu Apr 13 13:55:20 2023
    Hi Eli,

    But I modified pullnews with the line below and this seems to work fine:

    if (defined $watermark) {
        printf LOG "\tOur previous highest: %d\n", $prevHigh if not $quiet;
        $high = $watermark;
        $high = $last + $watermark if substr($watermark, 0, 1) eq '-';
    +   $high = $prevHigh if $prevHigh > 0;
        $high = 0 if $high < 0;
        $shash->{$group} = [time, $high];
    }

    I am unsure this change does what you expected.
    On *each* run, including the first one before any timeout occurs, $high will be forced to $prevHigh, and therefore -w won't be taken into account.
    $prevHigh is the highest article number previously downloaded (as recorded in pullnews.marks). In case it is "0" or unset in pullnews.marks, -w will work; but if pullnews.marks has for instance recorded 500 and you use "-w -100", nothing will happen ($high won't be set to 400 to force the re-downloading of the latest 100 articles).

    I would suggest the following patch instead:

    --- a/frontends/pullnews
    +++ b/frontends/pullnews
    @@ -560,6 +560,7 @@ if (not $quiet and not $quietness) {
     }

     my $connectionAttempts = 0;
    +my %groupsStarted = ();

     UPSTREAM:
     foreach my $server (@servers) {
    @@ -689,6 +690,7 @@ foreach my $server (@servers) {
     } continue {
         # Reinitialize the counter for the next server.
         $connectionAttempts = 0;
    +    %groupsStarted = ();
     }

     saveConfig();
    @@ -852,12 +854,17 @@ sub crossFeedGroup {
             printf LOG "\t%d article%s available (first %d, last %d)\n",
               $narticles, $narticles != 1 ? "s" : "", $first, $last;
         }
    -    if (defined $watermark) {
    +
    +    # Do not set several times the water mark to another value.  Just go on
    +    # downloading articles from the last retrieved one when the connection
    +    # timed out.
    +    if (defined($watermark) and !exists($groupsStarted{$group})) {
             printf LOG "\tOur previous highest: %d\n", $prevHigh if not $quiet;
             $high = $watermark;
             $h
  • From Julien ÉLIE@21:1/5 to All on Fri Apr 14 22:44:25 2023
    Hi Jesse,

    [news@spool1 ~]$ cat pullnews4.marks
    [news@spool1 ~]$

    Could you try to add an explicit error message?

    close(FILE) or die "can't close $groupFile: $!\n";

    Sure, it may be a few days before I reply back with results!

    Is the problem of an empty pullnews.marks file still happening?

    --
    Julien ÉLIE

    « Caramel is a guest of the palate that threatens the crown. » (Tristan
    Bernard)

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Meow@21:1/5 to All on Wed Jul 19 17:34:44 2023
    I really have no idea how much data these newsgroups take up.
    From 1982 to 1991 is 191 tapes, or about 10 GB, so extrapolate from that and
    you'll figure it out (back then there were no binary newsgroups).

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Meow@21:1/5 to All on Wed Jul 19 17:38:58 2023
    Depends, how much spam and low value articles you can filter out.
    20-30GB per year is comfortable.
    You can do with way less, if you have curated list of groups and good
    spam filter.
    I am starting a server later with 4 TB (that is the only spare HDD I have)
    on a Raspberry Pi, with a leaf node running leafnode off the other server
    with a smaller SSD. (I'm backfilling via olduse.net and Giganews to
    cover 1982-1991 and 1994-present.) So I have way more than enough.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Meow@21:1/5 to All on Wed Jul 19 17:40:32 2023
    Storage capacity wise, I've got 20 years of the Big8 consuming ~750GB.
    On a server with ZFS using CNFS buffers with INN, this can compress down
    to about 300GB using default ZFS compression.

    If that's binary and text, then it's a good thing I've got 4 TB.

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)