• Inject articles

    From Nigel Reed@21:1/5 to All on Sun Jun 26 03:29:38 2022
    Hi all,

    I posed the question about adding old articles to my news server and
    after receiving various suggestions and a nice conversation with Jesse
    Rehmer, I think I am going to go the route of signing up with a
    provider and sucking down their feed.

    Having never done this before, I'm after some advice.

    First, I'd like to pick and choose which groups I want to get, for a
    test, and then grab whole hierarchies such as soc.*

    Second, I know there are a few programs that'll do this. I believe INN
    comes with one, and there's suck, I think. Which would be the best one
    to use?

    Finally, I absolutely positively do not want to propagate these new
    (but really old) articles to my peers. Last thing I want is every news
    admin calling for my head on a block. What would be the best/proper way
    to ensure these articles don't get sent out once they're injected?

    Finally finally... is there anything else I haven't thought of that I
    should consider before doing this, apart from the obvious bandwidth and
    disk space?

    I appreciate the feedback and knowledge of those who've delved into
    this more than I have.

    Thanks,
    Nigel

    --
    End Of The Line BBS - Plano, TX
    telnet endofthelinebbs.com 23

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Sun Jun 26 13:51:43 2022
    Hi Nigel,

    > First, I'd like to pick and choose which groups I want to get, for a
    > test, and then grab whole hierarchies such as soc.*

    OK, you can begin with a few messages from one newsgroup to ensure
    everything works (the articles are accepted by your news server and not
    propagated to others). Then pull the whole history of that newsgroup,
    then a whole hierarchy, and then other hierarchies.


    > Second, I know there are a few programs that'll do this. I believe INN
    > comes with one, and there's suck, I think. Which would be the best one
    > to use?

    INN comes with pullnews:
    https://www.eyrie.org/~eagle/software/inn/docs/pullnews.html

    I haven't compared the speed of pullnews and suck. I believe either of
    them will do the job for you.


    > Finally, I absolutely positively do not want to propagate these new
    > (but really old) articles to my peers. Last thing I want is every news
    > admin calling for my head on a block. What would be the best/proper way
    > to ensure these articles don't get sent out once they're injected?

    Good question.
    I assume your news server is already receiving and transferring news
    with peers.

    Maybe other people will have a better suggestion. I would just use
    something like "pullnews -F pulled" to add "pulled" to the Path header
    field of the articles you're pulling. Then, for every outgoing feed
    configured in your newsfeeds file, add "pulled" as an exclusion after
    the site name:

    news.server.com/pulled:*:Tm:innfeed!
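
    As I understand newsfeeds(5), innd will not offer an article to a site if the site name, or any exclusion listed after the "/", appears as an element of the article's Path header field; that is why the "pulled" element added by pullnews -F keeps these articles away from news.server.com. The shell sketch below models only that Path check, as an illustration rather than INN's actual code. The helper names are hypothetical, and it ignores the newsgroup pattern, distributions and flags.

```shell
#!/bin/sh
# Simplified model of the Path check innd applies to a newsfeeds entry of
# the form "sitename/exclusion1,exclusion2,...". Hypothetical helper names;
# an illustration only, not INN's implementation.

# path_has_element PATH ELEMENT: succeed if ELEMENT is one of the
# "!"-separated elements of the Path header body PATH.
path_has_element() {
  case "!$1!" in
    *"!$2!"*) return 0 ;;
    *) return 1 ;;
  esac
}

# would_feed ENTRY PATH: succeed if an article with this Path would still
# be offered to the site described by ENTRY ("sitename/exclusions").
would_feed() {
  site=${1%%/*}
  case "$1" in
    */*) exclusions=${1#*/} ;;
    *) exclusions= ;;
  esac
  # The site's own name in the Path means it has already seen the article.
  path_has_element "$2" "$site" && return 1
  # Split the comma-separated exclusions and test each one.
  old_ifs=$IFS
  IFS=,
  for ex in $exclusions; do
    if path_has_element "$2" "$ex"; then
      IFS=$old_ifs
      return 1
    fi
  done
  IFS=$old_ifs
  return 0
}
```

    For example, would_feed 'news.server.com/pulled' 'myserver!pulled!provider.example!not-for-mail' fails (the article is held back), while the same entry with a Path that lacks "pulled" succeeds. "myserver" and "provider.example" are placeholders.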


    > Finally finally... is there anything else I haven't thought of that I
    > should consider before doing this, apart from the obvious bandwidth and
    > disk space?

    I assume you've read the beginning of:
    https://www.eyrie.org/~eagle/faqs/inn.html#S6.4

    notably the advice to disable the Perl and Python filter hooks and to
    configure INN not to reject articles older than 10 days (the default
    cutoff). Also make sure to configure expire.ctl correctly so that it
    does not expire the articles :-)
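
    A small pre-flight check along the lines of that FAQ entry, sketched in shell: the 10-day default is the artcutoff setting in inn.conf, and, as far as I know, setting it to 0 disables the age check. The helper below only reads an inn.conf file and reports whether artcutoff is explicitly 0 (the function names are hypothetical). Turning off the filter hooks on a running server is a separate step; ctlinnd has perl and python commands for that, shown only as comments.

```shell
#!/bin/sh
# Sketch: check whether an inn.conf file disables the article-age check.
# Assumes the usual "key: value" syntax; hypothetical helper names.
# On a live server you would also run (not done here):
#   ctlinnd perl n      # disable the Perl filter hooks
#   ctlinnd python n    # disable the Python filter hooks

# artcutoff_value FILE: print the artcutoff value set in FILE, or nothing
# if the key is absent (INN then uses its 10-day default).
artcutoff_value() {
  awk -F: '
    /^[ \t]*artcutoff[ \t]*:/ {
      v = $2
      gsub(/[ \t\r]/, "", v)
      val = v
    }
    END { if (val != "") print val }
  ' "$1"
}

# accepts_old_articles FILE: succeed only if artcutoff is explicitly 0.
accepts_old_articles() {
  [ "$(artcutoff_value "$1")" = 0 ]
}
```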

    ... and choose your overview and storage methods well :-)
    If you're using INN 2.6.x, tradindexed (overview) and CNFS (storage)
    would probably be the best. Make sure to create the right number of
    CNFS buffers so that they do not wrap and erase old articles. You can
    add new ones whenever you want.
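
    One back-of-the-envelope way to pick "the right number" of buffers, assuming you can estimate the total volume you are about to pull: divide that estimate (plus some headroom) by the size of one buffer and round up. The 20% default margin and the example figures are arbitrary placeholders, not measurements.

```shell
#!/bin/sh
# Sketch: how many CNFS buffers of a given size are needed to hold an
# estimated volume of articles with a safety margin, so they never wrap.
# Sizes are integer GiB; the margin is a percentage. Hypothetical helper.
buffers_needed() {
  total_gib=$1
  buffer_gib=$2
  margin_pct=${3:-20}
  # Apply the margin, rounding up.
  padded=$(( (total_gib * (100 + margin_pct) + 99) / 100 ))
  # Ceiling division: number of buffers of buffer_gib each.
  echo $(( (padded + buffer_gib - 1) / buffer_gib ))
}

# Placeholder estimate: 500 GiB to import, 64 GiB buffers.
buffers_needed 500 64   # prints 10
```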

    --
    Julien ÉLIE

    « Coffee is a beverage that makes you sleep when you don't drink it. »
    (Alphonse Allais)

  • From Nigel Reed@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Jun 27 00:34:52 2022
    On Sun, 26 Jun 2022 13:51:43 +0200
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:

    > INN comes with pullnews:
    > https://www.eyrie.org/~eagle/software/inn/docs/pullnews.html
    >
    > I haven't compared the speed of pullnews and suck. I believe either of
    > them will do the job for you.

    I will probably go with pullnews since it comes with INN. Seems a good
    place to start.

    > Good question.
    > I assume your news server is already receiving and transferring news
    > with peers.

    That is a correct assumption.


    > Maybe other people will have a better suggestion. I would just use
    > something like "pullnews -F pulled" to add "pulled" to the Path header
    > field of the articles you're pulling. Then, for every outgoing feed
    > configured in your newsfeeds file, add "pulled" as an exclusion after
    > the site name:
    >
    > news.server.com/pulled:*:Tm:innfeed!


    I think that is what Jesse suggested, so that's two votes for that method.

    > I assume you've read the beginning of:
    > https://www.eyrie.org/~eagle/faqs/inn.html#S6.4

    Never assume :) I will take a look.

    > notably the advice to disable the Perl and Python filter hooks and to
    > configure INN not to reject articles older than 10 days (the default
    > cutoff). Also make sure to configure expire.ctl correctly so that it
    > does not expire the articles :-)

    Ah yes, that bit I am aware of.

    *:A:never:never:never

    I do believe that should keep articles for a long time.
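
    For reference, when groupbaseexpiry is true in inn.conf (the default, as far as I know), each expire.ctl entry has the form pattern:modflag:keep:default:purge: modflag A applies the entry to all groups, and "never" for keep, default and purge means articles are never expired. The sketch below merely splits such a line into labelled fields; it does not reproduce expire's logic, and the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: label the fields of a group-based expire.ctl entry
# (pattern:modflag:keep:default:purge). Illustration only; lines with a
# different number of fields produce no output.
expire_fields() {
  printf '%s\n' "$1" | awk -F: 'NF == 5 {
    printf "pattern=%s modflag=%s keep=%s default=%s purge=%s\n", $1, $2, $3, $4, $5
  }'
}

expire_fields '*:A:never:never:never'
# prints pattern=* modflag=A keep=never default=never purge=never
```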

    > ... and choose your overview and storage methods well :-)
    > If you're using INN 2.6.x, tradindexed (overview) and CNFS (storage)
    > would probably be the best. Make sure to create the right number of
    > CNFS buffers so that they do not wrap and erase old articles. You can
    > add new ones whenever you want.

    I am using 2.6.x, with just regular disk storage; it will be easier for
    me to throw more disk space at it if I need it.

    I appreciate the suggestions; they were in line with what I was expecting.

    Thanks,
    Nigel
  • From Julien ÉLIE@21:1/5 to All on Mon Jun 27 19:47:31 2022
    Hi Nigel,

    > *:A:never:never:never
    >
    > I do believe that should keep articles for a long time.

    A loooong time :)


    >> ... and choose your overview and storage methods well :-)
    >> If you're using INN 2.6.x, tradindexed (overview) and CNFS (storage)
    >> would probably be the best. Make sure to create the right number of
    >> CNFS buffers so that they do not wrap and erase old articles. You can
    >> add new ones whenever you want.

    > I am using 2.6.x, with just regular disk storage; it will be easier for
    > me to throw more disk space at it if I need it.

    When speaking of CNFS for storage, I meant regular disk storage with
    "method cnfs" in storage.conf instead of "method tradspool" (the
    default). Tradspool stores one file per article, whereas CNFS uses large
    buffer files containing lots of articles (even millions if the buffer is
    large enough). If you never expire articles, tradspool will consume more
    and more inodes.
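
    Since tradspool keeps each article in its own file, counting the regular files under the spool directory gives a rough idea of how many inodes it is consuming, and df -i shows how many the filesystem has left. A minimal sketch, assuming the article spool is the patharticles directory from inn.conf (the path in the comment is only an example):

```shell
#!/bin/sh
# Sketch: rough inode usage of a tradspool tree (one file per article).
# count_article_files DIR: print the number of regular files below DIR.
count_article_files() {
  # tr strips the leading blanks some wc implementations print.
  find "$1" -type f | wc -l | tr -d ' '
}

# Example (assumed path; use the patharticles value from your inn.conf):
#   count_article_files /usr/local/news/spool/articles
#   df -i /usr/local/news/spool/articles   # inodes used/free on that filesystem
```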

    (timecaf and timehash are other storage methods, but they are less
    widely used than the two above.)

    --
    Julien ÉLIE

    « – You refuse to carry the pilum?
    – Well… We'd rather call in sick… » (Astérix)

  • From Jesse Rehmer@21:1/5 to All on Mon Jun 27 18:51:43 2022
    ALSO, BEWARE: if you're using a commercial provider, they are going to have *tons* of binary articles in the Big8 groups.

    While it's recommended to turn filtering off, I opted to disable basically all the checks inside pyClean except the misplaced binary check. If you don't, you'll waste terabytes of space on crap that doesn't belong.
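
    pyClean's actual misplaced-binary test isn't reproduced here. As a crude stand-in that only illustrates the idea, the sketch below flags an article file whose body (everything after the first blank line) contains a yEnc header line (=ybegin) or a uuencode header line (begin, an octal mode, then a filename). The heuristic and the function name are my own assumptions; it would miss, for example, base64 MIME attachments, and a text post quoting such a line would be a false positive.

```shell
#!/bin/sh
# Crude sketch (not pyClean's algorithm): does an article's body look like
# it carries a yEnc or uuencoded binary? Hypothetical helper name.
# looks_binary FILE: succeed if a binary marker appears after the headers.
looks_binary() {
  awk '
    body && /^=ybegin / { found = 1; exit }
    body && /^begin [0-7][0-7][0-7][0-7]? [^ ]/ { found = 1; exit }
    # The first empty line (optionally CRLF) separates headers from body.
    /^\r?$/ { body = 1 }
    END { exit !found }
  ' "$1"
}
```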

  • From Jesse Rehmer@21:1/5 to All on Mon Jun 27 18:35:56 2022
    Another tip: pulling lots of articles over a single connection is very slow, regardless of how fast the upstream server is. The nice thing about pullnews is that each instance uses a single config file, so you can create many configuration files and have multiple pullnews instances running at the same time.

    What I did was create many pullnewsXX.marks files, each initially containing only the first line (server user password), and then get a list of groups to copy and paste (the example below prints all comp.* groups from the db/active file, separated by commas):

    grep -E '^comp\.' db/active | sort | awk '{printf "%s%s",(NR>1?",":""),$1} END{print ""}'

    Then I took chunks of groups and ran each pullnews inside its own screen
    session, up to 30 at a time, with something like this:

    screen -dmS comp1 pullnews -c pullnews01.marks -F fakepathname -G comp.admin.policy,comp.ai,comp.ai.alife,comp.ai.doc-analysis.misc,comp.ai.doc-analysis.ocr

    screen -dmS comp2 pullnews -c pullnews02.marks -F fakepathname -G comp.ai.edu,comp.ai.fuzzy,comp.ai.games,comp.ai.genetic,comp.ai.nat-lang,comp.ai.neural-nets,comp.ai.philosophy,comp.ai.shells,comp.ai.vision
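
    The same workflow can be scripted. This is a hedged sketch, not Jesse's actual setup: it uses only the -c, -F and -G options shown above, the server, credentials and fake Path name are placeholders, and the chunk size of 10 is arbitrary. It writes pullnewsNN.marks files in the current directory and starts one detached screen session per chunk of groups, but only when given an argument (a hierarchy name such as comp), so it can be reviewed first.

```shell
#!/bin/sh
# Sketch of the "many pullnews instances" approach described above.
# Placeholders (assumptions): edit before use.
SERVER="news.provider.example"
NEWSUSER="username"
NEWSPASS="password"
FAKEPATH="pulled"
CHUNK=10

# chunk_groups N: read one group per line on stdin and print the groups
# joined with commas, at most N per output line.
chunk_groups() {
  awk -v n="$1" '
    { buf = (cnt ? buf "," : "") $1; cnt++ }
    cnt == n { print buf; buf = ""; cnt = 0 }
    END { if (cnt) print buf }
  '
}

# launch_pulls: for each chunk on stdin, create pullnewsNN.marks (holding
# only the "server user password" line, kept if it already exists) and
# start pullnews in a detached screen session named pullNN.
launch_pulls() {
  i=0
  while IFS= read -r groups; do
    i=$((i + 1))
    marks=$(printf 'pullnews%02d.marks' "$i")
    [ -f "$marks" ] || printf '%s %s %s\n' "$SERVER" "$NEWSUSER" "$NEWSPASS" > "$marks"
    screen -dmS "$(printf 'pull%02d' "$i")" \
      pullnews -c "$marks" -F "$FAKEPATH" -G "$groups"
  done
}

# main HIERARCHY: pull all HIERARCHY.* groups listed in db/active
# (run from the directory containing db/, as in the example above).
main() {
  grep -E "^$1\\." db/active | awk '{ print $1 }' | sort |
    chunk_groups "$CHUNK" | launch_pulls
}

if [ "$#" -gt 0 ]; then
  main "$@"
fi
```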

    Happy slurping :)

    -Jesse

  • From Nigel Reed@21:1/5 to iulius@nom-de-mon-site.com.invalid on Mon Jun 27 18:33:20 2022
    On Mon, 27 Jun 2022 19:47:31 +0200
    Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:


    > When speaking of CNFS for storage, I meant regular disk storage with
    > "method cnfs" in storage.conf instead of "method tradspool" (the
    > default). Tradspool stores one file per article, whereas CNFS uses
    > large buffer files containing lots of articles (even millions if the
    > buffer is large enough).
    > If you never expire articles, tradspool will consume more and more
    > inodes.

    Ah right, sorry. This isn't my day job ;)

    Is it possible to convert from tradspool to cnfs or do I need to start
    from scratch? I guess this might be in the faq so I'll look there
    anyway.

    I'm torn between allocating enough space for all the articles I'm about
    to drag down and over-allocating and wasting space that won't be used.


  • From Julien ÉLIE@21:1/5 to All on Tue Jun 28 19:08:51 2022
    Hi Nigel,

    > Is it possible to convert from tradspool to cnfs or do I need to start
    > from scratch? I guess this might be in the faq so I'll look there
    > anyway.

    I'm unfortunately not aware of such a conversion tool.
    I guess you would have to re-feed all your articles to another INN
    instance that stores them in CNFS.
    Or you could also keep your existing articles in tradspool and start
    using CNFS for new ones (just update storage.conf).
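
    As a rough sketch of "just update storage.conf": add a method cnfs entry above the existing method tradspool one. As I understand storage.conf(5), each new article is stored by the first entry that matches it, and for CNFS the options field names the metacycbuff (defined in cycbuff.conf) to write into. The helper below only prints such an entry for you to paste in; BIGAREA and class 1 are made-up example values.

```shell
#!/bin/sh
# Sketch: print a storage.conf entry that sends new articles to CNFS.
# cnfs_stanza METACYCBUFF [CLASS] [WILDMAT]; all values are examples only.
cnfs_stanza() {
  # Unquoted EOF: parameters expand, but no globbing happens in a here-doc.
  cat <<EOF
method cnfs {
    newsgroups: ${3:-*}
    class: ${2:-1}
    options: $1
}
EOF
}

cnfs_stanza BIGAREA
```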


    > I'm torn between allocating enough space for all the articles I'm about
    > to drag down and over-allocating and wasting space that won't be used.

    You may want to create several buffers used in sequence (mode SEQUENTIAL
    in cycbuff.conf), and then remove any buffers at the end of the sequence
    that were never touched, if you created too many of them.
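
    A sketch of that layout: cycbuff.conf describes each buffer as cycbuff:NAME:PATH:SIZE (SIZE in kilobytes) and groups them with a metacycbuff:NAME:BUFFERS:MODE line. The generator below prints such lines for COUNT equally sized buffers in SEQUENTIAL mode, plus (without running them) dd commands that create zero-filled files of matching size. The names BUF01 and BIGAREA, the directory and the sizes are assumptions to adapt. After the import, cnfsstat should show which buffers at the end of the sequence were never used.

```shell
#!/bin/sh
# Sketch: generate cycbuff.conf lines for COUNT buffers of SIZE_GIB each,
# grouped into one SEQUENTIAL metacycbuff, plus dd commands to create them.
# Buffer names (BUF01...), the directory and sizes are examples only.

# cycbuff_lines META DIR COUNT SIZE_GIB
cycbuff_lines() {
  size_kib=$(( $4 * 1024 * 1024 ))  # cycbuff.conf sizes are in kilobytes
  members=
  i=1
  while [ "$i" -le "$3" ]; do
    name=$(printf 'BUF%02d' "$i")
    echo "cycbuff:$name:$2/$(printf 'buf%02d' "$i"):$size_kib"
    members=${members:+$members,}$name
    i=$((i + 1))
  done
  echo "metacycbuff:$1:$members:SEQUENTIAL"
}

# dd_lines DIR COUNT SIZE_GIB: print (not run) commands that create each
# buffer file at exactly SIZE_GIB GiB, filled with zeros.
dd_lines() {
  i=1
  while [ "$i" -le "$2" ]; do
    printf 'dd if=/dev/zero of=%s/buf%02d bs=1024k count=%d\n' "$1" "$i" $(( $3 * 1024 ))
    i=$((i + 1))
  done
}

cycbuff_lines BIGAREA /var/spool/news/cycbuffs 3 64
```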

    --
    Julien ÉLIE

    « Non licet omnibus adire Corinthum. » ("Not everyone may go to Corinth"; a proverb of Greek origin)
