• buffindexed overview

    From User@21:1/5 to All on Wed Jan 13 18:10:32 2021
    Hi.

    ovmethod is buffindexed. How large should the buffer files be?
    INN 2.6.3

    --
    User <user@host.net>

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Julien ÉLIE@21:1/5 to All on Fri Jan 15 23:16:39 2021
    Hi,

    ovmethod is buffindexed. How large should the buffer files be?
    INN 2.6.3

    It depends on how many articles you'll store.
    The "inndf -no" command will tell you how full your buffer files are.

    I have just run a few tests with buffindexed, for your information:

    buffindexed
    -----------

    % inndf -no
    3277740 overview records stored
    37.73% overview space used

    I set up 5 buffers of 1.5 GB, so 2.83 GB of space is used.

    This corresponds to about 2,852,200 articles (each newsgroup an article is crossposted to counts as 1 overview record).

    overview rebuild in 32 min
    expireover in 41 min (it is the slowest method for expiry)
    space on disk: fixed, and equal to the total size of all buffers (though
    in this case, used space is only 2.83 GB)
    + 200 KB for group.index in <pathdb>
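
    The used-space figure follows directly from the percentage reported by inndf. A minimal sketch of that arithmetic, where the function name is mine and not part of INN:

```python
def used_overview_space_gb(buffer_count, buffer_size_gb, percent_used):
    """Space actually occupied inside the buffindexed buffers, in GB."""
    total_gb = buffer_count * buffer_size_gb
    return total_gb * percent_used / 100.0


# Figures from the test above: 5 buffers of 1.5 GB, 37.73% used.
used = used_overview_space_gb(5, 1.5, 37.73)
print(f"{used:.2f} GB used out of {5 * 1.5:.1f} GB allocated")
```

    The disk footprint stays at the allocated total whatever the percentage is; only the used part changes.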




    With about the same number of articles, on the same news server hardware
    (a slow Intel Atom CPU N2800 1.86GHz with 2GB RAM):

    tradindexed
    -----------

    (overview rebuild duration not noted, I'll measure it again)
    expireover in 5 min
    space on disk: 2.9 GB



    ovdb
    ----

    overview rebuild in 42 min
    expireover in 6 min
    space on disk: 5.8 GB



    compressed ovdb
    ---------------

    overview rebuild in 50 min
    expireover in 5 min
    space on disk: 3.4 GB (small overview data is kept uncompressed)



    ovsqlite (new storage method in INN 2.7.0)
    --------

    (overview rebuild duration not noted, I'll measure it again)
    expireover in 10 min
    space on disk: 5.1 GB



    compressed ovsqlite (new storage method in INN 2.7.0)
    -------------------

    (durations not noted, I'll measure them again)
    space on disk: 1.6 GB (all overview data is compressed)



    I hope that information will be of help.

    --
    Julien ÉLIE

    « I think it's a new feature. Don't tell anyone it was an accident. »
    (Larry Wall)

  • From Julien ÉLIE@21:1/5 to All on Fri Jan 15 23:50:34 2021
    % inndf -no
    3277740 overview records stored
    37.73% overview space used

    I set up 5 buffers of 1.5 GB, so 2.83 GB of space is used.

    This corresponds to about 2,852,200 articles (each newsgroup an article
    is crossposted to counts as 1 overview record).

    Hmm, it's worth noting that I have extra overview fields advertised,
    which almost double the size of each record.
    So, with only the mandatory overview fields, you should read my numbers
    as applying to a spool of about 5.5 million articles.

    Hope it still helps.

    --
    Julien ÉLIE

    « There is no way to satisfy those who want to know the why of
    whys. » (Leibniz)

  • From yamo'@21:1/5 to All on Sat Jan 16 10:21:54 2021
    Hi,

    Julien ÉLIE wrote on 15/01/2021 at 23:50:
    % inndf -no
    3277740 overview records stored
    37.73% overview space used

    I set up 5 buffers of 1.5 GB, so 2.83 GB of space is used.

    This corresponds to about 2,852,200 articles (each newsgroup an article
    is crossposted to counts as 1 overview record).

    Hmm, it's worth noting that I have extra overview fields advertised,
    which almost double the size of each record.
    So, with only the mandatory overview fields, you should read my numbers
    as applying to a spool of about 5.5 million articles.

    Hope it still helps.

    On my small server:


    $ inndf -no
    4294936 overview records stored
    Space used is meaningless for the tradindexed method

    --
    Stéphane

  • From Julien ÉLIE@21:1/5 to All on Sun Jan 17 11:18:48 2021
    Hi,

    ovmethod is buffindexed. How large should the buffer files be?
    INN 2.6.3

    An update to my previous message.
    I've now done the comparison using standard overview data. Here are a
    few figures to help you choose the appropriate size and
    overview storage method.

    In 2020, the volume of a full-text Usenet feed was about 18,000 articles
    / day, with peaks of 1,200 articles / hour.
    Article storage size was about 65 MB / day.

    Source:
    https://www.eternal-september.org/stats/
    https://news.aioe.org/stats/innreport-reports/
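
    To turn these daily figures into a spool estimate for a given retention period, a rough sketch could look like this. The retention periods are arbitrary examples, and the 2020 rates are simply assumed to stay constant:

```python
ARTICLES_PER_DAY = 18_000   # full-text feed volume quoted above (2020)
STORAGE_MB_PER_DAY = 65     # article storage size quoted above (2020)


def spool_estimate(retention_days):
    """Return (article count, article storage in decimal GB) for a retention period."""
    articles = ARTICLES_PER_DAY * retention_days
    storage_gb = STORAGE_MB_PER_DAY * retention_days / 1000
    return articles, storage_gb


for days in (30, 365):  # example retention periods, not recommendations
    articles, storage_gb = spool_estimate(days)
    print(f"{days:>4} days: {articles:,} articles, about {storage_gb:.1f} GB of articles")
```

    This only covers article storage; overview storage comes on top and depends on the method, as the benchmark shows.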


    As for overview storage size, if you store more header fields in
    overview data than the standard ones, the space needed to store overview
    data will be larger than the figures below. The extra fields are
    configured with the extraoverviewadvertised and extraoverviewhidden
    parameters in inn.conf.



    Benchmark with:
    - 3,278,095 overview records stored
    - corresponding to about 2,852,200 news articles (each newsgroup an
    article is crossposted to counts for 1 overview record)
    - for 620 newsgroups (I do not carry a full-text Usenet feed)
    - on a slow Intel Atom CPU N2800 1.86GHz with 2GB RAM

    With better hardware, overview rebuild and expiration will of
    course be faster.


    buffindexed
    -----------

    % inndf -no
    3278095 overview records stored
    58.31% overview space used

    I set up 2 buffers of 1.57 GB, so the fixed used space on disk is 3.15 GB.

    overview rebuild in 26 min
    expireover in 14:30 min
    used space on disk is fixed, and equal to the space allocated to buffers
    (3.15 GB in this case, though only 1.83 GB is really used)
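
    These numbers also give a rough average size per overview record with standard fields. A small sketch, assuming 1 GB means 10^9 bytes here and ignoring the fact that buffindexed stores index information in the same buffers:

```python
def bytes_per_record(allocated_gb, percent_used, records):
    """Average bytes of buffer space consumed per overview record."""
    used_bytes = allocated_gb * 1e9 * percent_used / 100.0  # assumes decimal GB
    return used_bytes / records


# 3.15 GB allocated, 58.31% used, 3,278,095 records (figures from this test).
avg = bytes_per_record(3.15, 58.31, 3_278_095)
print(f"about {avg:.0f} bytes per overview record")
```

    With these inputs the result is about 560 bytes per record; treat it as an order of magnitude, since it includes index overhead.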

    On a previous test with 5 buffers of 1.57 GB, for a total space of 7.86
    GB, overview rebuild took 32 min and expireover 41 min. The advice is
    therefore not to allocate too much spare space for buffindexed, and to add
    new buffers when needed.
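
    To apply that advice, one can estimate how many buffers are needed for an expected number of records plus some headroom. The sketch below makes several assumptions: about 560 bytes per record (what this test implies if 1 GB is taken as 10^9 bytes), 25% headroom picked arbitrarily, and buffers of 1,536,000 KB, which is my guess at what "1.57 GB" corresponds to. The directory and the OV file names are hypothetical, and the "index:filename:size" line layout with the size in KB is my reading of buffindexed.conf(5), so check it against the man page of your INN version:

```python
import math


def plan_buffers(expected_records, bytes_per_record, buffer_size_kb, headroom=0.25):
    """Number of equally sized buffers covering the expected data plus headroom."""
    needed_bytes = expected_records * bytes_per_record * (1 + headroom)
    return max(1, math.ceil(needed_bytes / (buffer_size_kb * 1024)))


def conf_lines(count, buffer_size_kb, directory="/var/spool/news/overview"):
    """Candidate buffindexed.conf lines (hypothetical path and file names)."""
    return [f"{i}:{directory}/OV{i}:{buffer_size_kb}" for i in range(count)]


count = plan_buffers(3_278_095, 560, 1_536_000)
for line in conf_lines(count, 1_536_000):
    print(line)
```

    For the record count of this benchmark, the sketch arrives at 2 buffers, which matches the setup that gave the faster rebuild and expiry times.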



    tradindexed
    -----------

    overview rebuild in 17 min
    expireover in 5:00 min
    used space on disk is 2.01 GB after rebuild, and 1.77 GB after first run
    of expireover



    ovdb
    ----

    overview rebuild in 38 min
    expireover in 4:00 min
    used space on disk is 3.19 GB



    compressed ovdb
    ---------------

    overview rebuild in 40 min
    expireover in 3:55 min
    used space on disk is 2.61 GB (overview data < 600 bytes is kept
    uncompressed)



    ovsqlite (new storage method in INN 2.7.0)
    --------

    overview rebuild in 34 min
    expireover in 6:30 min
    used space on disk is 2.67 GB



    compressed ovsqlite (new storage method in INN 2.7.0)
    -------------------

    overview rebuild in 36 min
    expireover in 5:20 min
    used space on disk is 1.12 GB (all overview data is compressed)
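
    To compare the six results at a glance, the figures can be put in a small table and ranked per criterion. This is only a sketch over the numbers of this one benchmark: for tradindexed I took the size after the first expireover run, and for buffindexed the allocated size, since that is what the disk actually holds.

```python
# (method, rebuild in minutes, expireover in minutes, disk space in GB)
RESULTS = [
    ("buffindexed",         26, 14.5,        3.15),
    ("tradindexed",         17, 5.0,         1.77),
    ("ovdb",                38, 4.0,         3.19),
    ("compressed ovdb",     40, 3 + 55 / 60, 2.61),
    ("ovsqlite",            34, 6.5,         2.67),
    ("compressed ovsqlite", 36, 5 + 20 / 60, 1.12),
]


def ranked(results, column):
    """Sort the result rows by the given column, smallest value first."""
    return sorted(results, key=lambda row: row[column])


for label, column in (("rebuild", 1), ("expireover", 2), ("disk", 3)):
    names = [row[0] for row in ranked(RESULTS, column)]
    print(f"{label:<10} best to worst: {', '.join(names)}")
```

    No method comes first on all three criteria here, so the choice depends on which of them matters most for a given server.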




    From inn.conf documentation:

    - buffindexed, which stores overview data and index information
    into preconfigured large files like CNFS. Fast at writing, the
    buffindexed overview storage method can keep up with a large feed
    more easily and never consumes additional disk space beyond that
    allocated to these buffers. The downside is that these buffers
    are hard to recover in case of corruption and somewhat slower for
    readers and the expiry process;

    - ovdb, which stores overview information into a Berkeley DB database,
    whose development pace has stalled these last years. This method
    is fast and very robust, but may require more disk space, unless
    compression is enabled. Overview data is fetched one article at a
    time, which makes this method a bit slower than ovsqlite for readers;

    - ovsqlite, which stores overview information into an SQLite database,
    known for its long-term stability and compatibility. Robust and
    faster than ovdb at reading ranges of overview data (since overview
    data is transferred in 128-kilobyte chunks between ovsqlite-server
    and nnrpd) but somewhat slower at writing, this method may require
    more disk space, unless compression is enabled;

    - tradindexed, which uses two files per newsgroup, one containing
    the overview data and one containing the index. Fast for readers,
    but slow to write to because it has to update two files for each
    incoming article. Its main advantage is to be the best tested,
    the most reliable and the method with the best recovery tools.

    --
    Julien ÉLIE

    « There is no way to satisfy those who want to know the why of
    whys. » (Leibniz)
