• Migrating tradindexed overview to ovsqlite

    From Ray Banana@21:1/5 to All on Sat Oct 28 05:05:42 2023
    Hi,

    the current flood of spam from Google Groups and the corresponding
    amount of NoCeM messages made me consider an overview method that
    does not rely on expireover to remove the overview data for cancelled
    articles. From what I have seen on a test server, ovsqlite seems to
    remove overview data immediately. I have rebuilt the ovdb of a test
    server with ~1.5 million articles in just a couple of minutes and am now
    considering a migration of my main reader server (~50 million articles)
    to ovsqlite. Does anyone have experience and an estimate of how long
    such a migration might take on ordinary SATA disks with EXT4 filesystems?
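
    The rough procedure I have in mind is the following sketch (hedged; I
    would double-check every step against ovsqlite(5) and makehistory(8)
    before touching a production server):

    ```shell
    # Hedged sketch of a tradindexed -> ovsqlite overview migration.
    # Flags and paths assume a standard INN 2.7 layout; verify before use.

    rc.news stop                  # take INN down completely first

    # In inn.conf, switch the overview method:
    #   ovmethod: ovsqlite

    # Rebuild overview only: -O writes overview data, -x leaves the
    # existing history file untouched, -l raises the batch size of
    # overview writes.
    makehistory -O -x -l 50000

    rc.news start                 # restart with the new overview method
    ```
    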


    --
    Пу́тін — хуйло́
    http://www.eternal-september.org

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to jesse.rehmer@blueworldhosting.com on Sat Oct 28 08:49:58 2023
    On Oct 28, 2023 at 3:32:52 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:
    >
    >> Hi,
    >>
    >> the current flood of spam from Google Groups and the corresponding
    >> amount of NoCeM messages made me consider an overview method that
    >> does not rely on expireover to remove the overview data for cancelled
    >> articles. From what I have seen on a test server, ovsqlite seems to
    >> remove overview data immediately. I have rebuilt the ovdb of a test
    >> server with ~1.5 million articles in just a couple of minutes and am now
    >> considering a migration of my main reader server (~50 million articles)
    >> to ovsqlite. Does anyone have experience and an estimate of how long
    >> such a migration might take on ordinary SATA disks with EXT4 filesystems?
    >
    > Not quite the same use case, but I'm rebuilding history and moving to
    > ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    > taking over a week.

    I should add more flavor: initially I tried rebuilding history+overview
    as-is (tradspool + tradindexed) and ran into a filesystem issue (I filled
    the ZFS pool too full and performance suffered greatly). That attempt ran
    from October 14th through the 24th before I ended it.

    I then added storage, rebalanced the ZFS pool, and decided to move to
    ovsqlite, since others had commented that it is faster to rebuild than
    tradindexed. That rebuild has been running since 10/23 and appears to be
    about half done based on the size of the new history file.

    I've been able to rsync and ZFS send/receive the entire data set to other
    servers four times during this period, so this slow behavior isn't a
    limitation of my system or storage. Jacking up the number of lines
    makehistory processes does not seem to make any difference for me either.

  • From Jesse Rehmer@21:1/5 to Ray Banana on Sat Oct 28 08:32:52 2023
    On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:


    > Hi,
    >
    > the current flood of spam from Google Groups and the corresponding
    > amount of NoCeM messages made me consider an overview method that
    > does not rely on expireover to remove the overview data for cancelled
    > articles. From what I have seen on a test server, ovsqlite seems to
    > remove overview data immediately. I have rebuilt the ovdb of a test
    > server with ~1.5 million articles in just a couple of minutes and am now
    > considering a migration of my main reader server (~50 million articles)
    > to ovsqlite. Does anyone have experience and an estimate of how long
    > such a migration might take on ordinary SATA disks with EXT4 filesystems?

    Not quite the same use case, but I'm rebuilding history and moving to
    ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    taking over a week.

  • From Jesse Rehmer@21:1/5 to jesse.rehmer@blueworldhosting.com on Sat Oct 28 15:55:39 2023
    On Oct 28, 2023 at 3:49:58 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > On Oct 28, 2023 at 3:32:52 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:
    >
    >> On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:
    >>
    >>> Hi,
    >>>
    >>> the current flood of spam from Google Groups and the corresponding
    >>> amount of NoCeM messages made me consider an overview method that
    >>> does not rely on expireover to remove the overview data for cancelled
    >>> articles. From what I have seen on a test server, ovsqlite seems to
    >>> remove overview data immediately. I have rebuilt the ovdb of a test
    >>> server with ~1.5 million articles in just a couple of minutes and am now
    >>> considering a migration of my main reader server (~50 million articles)
    >>> to ovsqlite. Does anyone have experience and an estimate of how long
    >>> such a migration might take on ordinary SATA disks with EXT4 filesystems?
    >>
    >> Not quite the same use case, but I'm rebuilding history and moving to
    >> ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    >> taking over a week.
    >
    > I should add more flavor: initially I tried rebuilding history+overview
    > as-is (tradspool + tradindexed) and ran into a filesystem issue (I filled
    > the ZFS pool too full and performance suffered greatly). That attempt ran
    > from October 14th through the 24th before I ended it.
    >
    > I then added storage, rebalanced the ZFS pool, and decided to move to
    > ovsqlite, since others had commented that it is faster to rebuild than
    > tradindexed. That rebuild has been running since 10/23 and appears to be
    > about half done based on the size of the new history file.
    >
    > I've been able to rsync and ZFS send/receive the entire data set to other
    > servers four times during this period, so this slow behavior isn't a
    > limitation of my system or storage. Jacking up the number of lines
    > makehistory processes does not seem to make any difference for me either.

    If you aren't rebuilding the history file, you'll have a better
    experience. I spent some time watching system calls and filesystem
    activity, and the majority of what makehistory is spending time on in my
    case is dbz-related stuff. The ovsqlite-server process is barely doing
    anything in comparison.

  • From Julien ÉLIE@21:1/5 to All on Sat Oct 28 22:38:56 2023
    Hi Jesse,

    > If you aren't rebuilding the history file, you'll have a better
    > experience. Spent some time watching system calls and filesystem
    > activity, and the majority of what makehistory is spending time on in
    > my case is dbz-related stuff.

    Just to be sure, did you run makehistory with the -s flag to provide the estimated number of articles?

    It reminds me of the issue we recently discussed in this newsgroup about
    makedbz. I then reworded INSTALL this way:

    """
    Next, you need to create an empty history database. To do this, type:

    cd <pathdb in inn.conf>
    touch history
    makedbz -i -o

    makedbz will then create a database optimized for handling about
    6,000,000 articles (or 500,000 if the slower tagged hash format is
    used). If you expect to inject more articles than that, use the "-s"
    flag to specify the number of entries to size the initial history file
    for. To pre-size it for 100,000,000 articles, type:

    makedbz -i -o -s 100000000

    This initial size does not limit the number of articles the news server
    will accept. It will just get slower when that size is exceeded, until
    the next run of news.daily which will appropriately resize it.
    """


    I'm wondering whether you might be running into that same issue with
    makehistory. I should also update its manual page to emphasize the use
    of the "-s" flag :)
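
    For a rebuild the size of yours, that would look something like this
    (the throttle is only needed on a live server, and the article count is
    of course just an example):

    ```shell
    # Pre-size the dbz index for the expected article count so makehistory
    # does not outgrow it partway through the rebuild.
    ctlinnd throttle 'history rebuild'
    makehistory -O -s 500000000
    ctlinnd go 'history rebuild'
    ```
    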

    --
    Julien ÉLIE

    « In any statistic, the inaccuracy of the number is compensated for by
    the precision of the decimals. » (Alfred Sauvy)

  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sat Oct 28 21:53:05 2023
    On Oct 28, 2023 at 3:38:56 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Jesse,
    >
    >> If you aren't rebuilding the history file, you'll have a better
    >> experience. Spent some time watching system calls and filesystem
    >> activity, and the majority of what makehistory is spending time on in
    >> my case is dbz-related stuff.
    >
    > Just to be sure, did you run makehistory with the -s flag to provide
    > the estimated number of articles?
    >
    > It reminds me of the issue we recently discussed in this newsgroup
    > about makedbz. I then reworded INSTALL this way:
    >
    > """
    > Next, you need to create an empty history database. To do this, type:
    >
    >     cd <pathdb in inn.conf>
    >     touch history
    >     makedbz -i -o
    >
    > makedbz will then create a database optimized for handling about
    > 6,000,000 articles (or 500,000 if the slower tagged hash format is
    > used). If you expect to inject more articles than that, use the "-s"
    > flag to specify the number of entries to size the initial history file
    > for. To pre-size it for 100,000,000 articles, type:
    >
    >     makedbz -i -o -s 100000000
    >
    > This initial size does not limit the number of articles the news server
    > will accept. It will just get slower when that size is exceeded, until
    > the next run of news.daily which will appropriately resize it.
    > """
    >
    > I'm wondering whether you might be running into that same issue with
    > makehistory. I should also update its manual page to emphasize the use
    > of the "-s" flag :)

    Hi Julien,

    I've had mixed results sizing the history file with makedbz or using the
    -s flag with makehistory. It seems that, appropriately sized or not, once
    the history file is around 10GB in size, performance suffers dramatically.

    In the previous case, where we were talking about sizing the history file
    prior to injecting large numbers of articles into a new server, I had a
    similar experience. Performance started off better *initially* when
    sized, but after some time it falls off. In that scenario I can start off
    injecting 10,000+ articles per second at the beginning, but it falls to
    around 2,000 per second once the history file reaches about 10GB or so.

    I see basically the same thing with makehistory. On the first run I used
    the -s flag and noticed that about a third of the way through, going by
    the previous history file's size, the I/O rate drops off. I can see this
    in my monitoring software: the I/O metrics trend downward over the entire
    process.

    On round two of the makehistory rebuild I did not use the -s flag. I
    still see the same downward trend in performance, but I saw more I/O at
    the beginning than on the first run before it slowed down.

  • From Julien ÉLIE@21:1/5 to All on Sun Oct 29 14:00:28 2023
    Hi Jesse,

    > I've had mixed results sizing the history file with makedbz or using
    > the -s flag with makehistory. It seems that, appropriately sized or
    > not, once the history file is around 10GB in size, performance suffers
    > dramatically.

    Thanks for the feedback.
    I unfortunately cannot think of any other setting to test, nor do I have
    the time to audit and try to improve the performance of dbz. I guess the
    best next step would be to implement a second storage method for the
    history file, for instance based on SQLite, but that is quite a lot of
    work too...
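
    For concreteness, a first sketch of what an SQLite-backed history table
    might look like (purely hypothetical — no such component exists in INN
    today; the columns mirror the fields of a history(5) line):

    ```shell
    # Hypothetical schema for an SQLite-backed replacement of the dbz-based
    # history file; columns follow the "hash : arrived~expires~posted : token"
    # layout described in history(5).
    sqlite3 history-test.db <<'EOF'
    CREATE TABLE history (
        hash    BLOB PRIMARY KEY,   -- hash of the message-ID, as dbz stores
        arrived INTEGER NOT NULL,   -- arrival time (seconds since epoch)
        expires INTEGER,            -- expiry time, if any
        posted  INTEGER,            -- posting time from the Date header
        token   BLOB                -- storage API token; NULL once expired
    ) WITHOUT ROWID;
    CREATE INDEX history_arrived ON history(arrived);
    EOF
    sqlite3 history-test.db "SELECT count(*) FROM history;"   # prints 0
    ```
    
    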

    --
    Julien ÉLIE

    « And that very night, that is to say three weeks later… » (Astérix)

  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sun Oct 29 23:03:42 2023
    On Oct 29, 2023 at 8:00:28 AM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Jesse,
    >
    >> I've had mixed results sizing the history file with makedbz or using
    >> the -s flag with makehistory. It seems that, appropriately sized or
    >> not, once the history file is around 10GB in size, performance suffers
    >> dramatically.
    >
    > Thanks for the feedback.
    > I unfortunately cannot think of any other setting to test, nor do I
    > have the time to audit and try to improve the performance of dbz. I
    > guess the best next step would be to implement a second storage method
    > for the history file, for instance based on SQLite, but that is quite
    > a lot of work too...

    Understood. I think makehistory will be running for at least a few more
    days; if there is anything I can do or provide, let me know. I have a
    clone (or two or three) of this dataset if you ever need a large dataset
    to play with.
