• Migrating tradindexed overview to ovsqlite

    From Ray Banana@21:1/5 to All on Sat Oct 28 05:05:42 2023
    Hi,

    the current flood of spam from Google Groups and the corresponding
    amount of NoCeM messages made me consider an overview method that
    does not rely on expireover to remove the overview data for cancelled
    articles. From what I have seen on a test server, ovsqlite seems to
    remove overview data immediately. I have rebuilt the ovdb of a test
    server with ~1.5 million articles in just a couple of minutes and am now
    considering a migration of my main reader server (~50 million articles)
    to ovsqlite. Does anyone have experience and an estimate of how long
    such a migration might take on ordinary SATA disks with EXT4 filesystems?
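
    The rough procedure I have in mind is the following sketch (hedged; I
    would double-check every step against ovsqlite(5) and makehistory(8)
    before touching a production server):

    ```shell
    # Hedged sketch of a tradindexed -> ovsqlite overview migration.
    # Flags and paths assume a standard INN 2.7 layout; verify before use.

    rc.news stop                  # take INN down completely first

    # In inn.conf, switch the overview method:
    #   ovmethod: ovsqlite

    # Rebuild overview only: -O writes overview data, -x leaves the
    # existing history file untouched, -l raises the batch size of
    # overview writes.
    makehistory -O -x -l 50000

    rc.news start                 # restart with the new overview method
    ```
    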


    --
    Пу́тін — хуйло́
    http://www.eternal-september.org

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jesse Rehmer@21:1/5 to jesse.rehmer@blueworldhosting.com on Sat Oct 28 08:49:58 2023
    On Oct 28, 2023 at 3:32:52 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:
    >
    >> Hi,
    >>
    >> the current flood of spam from Google Groups and the corresponding
    >> amount of NoCeM messages made me consider an overview method that
    >> does not rely on expireover to remove the overview data for cancelled
    >> articles. From what I have seen on a test server, ovsqlite seems to
    >> remove overview data immediately. I have rebuilt the ovdb of a test
    >> server with ~1.5 million articles in just a couple of minutes and am now
    >> considering a migration of my main reader server (~50 million articles)
    >> to ovsqlite. Does anyone have experience and an estimate of how long
    >> such a migration might take on ordinary SATA disks with EXT4 filesystems?
    >
    > Not quite the same use case, but I'm rebuilding history and moving to
    > ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    > taking over a week.

    I should add more flavor: initially I tried rebuilding history+overview
    as-is (tradspool + tradindexed) and ran into a filesystem issue (I filled
    the ZFS pool too full and performance suffered greatly). That attempt ran
    from October 14th through the 24th before I ended it.

    I then added storage, rebalanced the ZFS pool, and decided to move to
    ovsqlite, since others had commented that it is faster to rebuild than
    tradindexed. That rebuild has been running since 10/23 and appears to be
    about half done based on the size of the new history file.

    I've been able to rsync and ZFS send/receive the entire data set to other
    servers four times during this period, so this slow behavior isn't a
    limitation of my system or storage. Jacking up the number of lines
    makehistory processes does not seem to make any difference for me either.

  • From Jesse Rehmer@21:1/5 to Ray Banana on Sat Oct 28 08:32:52 2023
    On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:


    > Hi,
    >
    > the current flood of spam from Google Groups and the corresponding
    > amount of NoCeM messages made me consider an overview method that
    > does not rely on expireover to remove the overview data for cancelled
    > articles. From what I have seen on a test server, ovsqlite seems to
    > remove overview data immediately. I have rebuilt the ovdb of a test
    > server with ~1.5 million articles in just a couple of minutes and am now
    > considering a migration of my main reader server (~50 million articles)
    > to ovsqlite. Does anyone have experience and an estimate of how long
    > such a migration might take on ordinary SATA disks with EXT4 filesystems?

    Not quite the same use case, but I'm rebuilding history and moving to
    ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    taking over a week.

  • From Jesse Rehmer@21:1/5 to jesse.rehmer@blueworldhosting.com on Sat Oct 28 15:55:39 2023
    On Oct 28, 2023 at 3:49:58 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:

    > On Oct 28, 2023 at 3:32:52 AM CDT, "Jesse Rehmer" <jesse.rehmer@blueworldhosting.com> wrote:
    >
    >> On Oct 27, 2023 at 10:05:42 PM CDT, "Ray Banana" <rayban@raybanana.net> wrote:
    >>
    >>> Hi,
    >>>
    >>> the current flood of spam from Google Groups and the corresponding
    >>> amount of NoCeM messages made me consider an overview method that
    >>> does not rely on expireover to remove the overview data for cancelled
    >>> articles. From what I have seen on a test server, ovsqlite seems to
    >>> remove overview data immediately. I have rebuilt the ovdb of a test
    >>> server with ~1.5 million articles in just a couple of minutes and am now
    >>> considering a migration of my main reader server (~50 million articles)
    >>> to ovsqlite. Does anyone have experience and an estimate of how long
    >>> such a migration might take on ordinary SATA disks with EXT4 filesystems?
    >>
    >> Not quite the same use case, but I'm rebuilding history and moving to
    >> ovsqlite on 1.5TB / 500 million articles on NVMe storage, and it is
    >> taking over a week.
    >
    > I should add more flavor: initially I tried rebuilding history+overview
    > as-is (tradspool + tradindexed) and ran into a filesystem issue (I filled
    > the ZFS pool too full and performance suffered greatly). That attempt ran
    > from October 14th through the 24th before I ended it.
    >
    > I then added storage, rebalanced the ZFS pool, and decided to move to
    > ovsqlite, since others had commented that it is faster to rebuild than
    > tradindexed. That rebuild has been running since 10/23 and appears to be
    > about half done based on the size of the new history file.
    >
    > I've been able to rsync and ZFS send/receive the entire data set to other
    > servers four times during this period, so this slow behavior isn't a
    > limitation of my system or storage. Jacking up the number of lines
    > makehistory processes does not seem to make any difference for me either.

    If you aren't rebuilding the history file, you'll have a better
    experience. I spent some time watching system calls and filesystem
    activity, and the majority of what makehistory is spending time on in my
    case is dbz-related stuff. The ovsqlite-server process is barely doing
    anything in comparison.

  • From Julien ÉLIE@21:1/5 to All on Sat Oct 28 22:38:56 2023
    Hi Jesse,

    > If you aren't rebuilding the history file, you'll have a better
    > experience. Spent some time watching system calls and filesystem
    > activity, and the majority of what makehistory is spending time on in
    > my case is dbz-related stuff.

    Just to be sure, did you run makehistory with the -s flag to provide the estimated number of articles?

    It reminds me of the issue we recently discussed in this newsgroup about
    makedbz. I then reworded INSTALL this way:

    """
    Next, you need to create an empty history database. To do this, type:

    cd <pathdb in inn.conf>
    touch history
    makedbz -i -o

    makedbz will then create a database optimized for handling about
    6,000,000 articles (or 500,000 if the slower tagged hash format is
    used). If you expect to inject more articles than that, use the "-s"
    flag to specify the number of entries to size the initial history file
    for. To pre-size it for 100,000,000 articles, type:

    makedbz -i -o -s 100000000

    This initial size does not limit the number of articles the news server
    will accept. It will just get slower when that size is exceeded, until
    the next run of news.daily which will appropriately resize it.
    """


    I'm wondering whether you might be running into that same issue with
    makehistory. I should also update its manual page to emphasize the use
    of the "-s" flag :)
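
    For a rebuild the size of yours, that would look something like this
    (the throttle is only needed on a live server, and the article count is
    of course just an example):

    ```shell
    # Pre-size the dbz index for the expected article count so makehistory
    # does not outgrow it partway through the rebuild.
    ctlinnd throttle 'history rebuild'
    makehistory -O -s 500000000
    ctlinnd go 'history rebuild'
    ```
    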

    --
    Julien ÉLIE

    « In any statistic, the inaccuracy of the number is compensated for by
    the precision of the decimals. » (Alfred Sauvy)

  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sat Oct 28 21:53:05 2023
    On Oct 28, 2023 at 3:38:56 PM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Jesse,
    >
    >> If you aren't rebuilding the history file, you'll have a better
    >> experience. Spent some time watching system calls and filesystem
    >> activity, and the majority of what makehistory is spending time on in
    >> my case is dbz-related stuff.
    >
    > Just to be sure, did you run makehistory with the -s flag to provide
    > the estimated number of articles?
    >
    > It reminds me of the issue we recently discussed in this newsgroup
    > about makedbz. I then reworded INSTALL this way:
    >
    > """
    > Next, you need to create an empty history database. To do this, type:
    >
    >     cd <pathdb in inn.conf>
    >     touch history
    >     makedbz -i -o
    >
    > makedbz will then create a database optimized for handling about
    > 6,000,000 articles (or 500,000 if the slower tagged hash format is
    > used). If you expect to inject more articles than that, use the "-s"
    > flag to specify the number of entries to size the initial history file
    > for. To pre-size it for 100,000,000 articles, type:
    >
    >     makedbz -i -o -s 100000000
    >
    > This initial size does not limit the number of articles the news server
    > will accept. It will just get slower when that size is exceeded, until
    > the next run of news.daily which will appropriately resize it.
    > """
    >
    > I'm wondering whether you might be running into that same issue with
    > makehistory. I should also update its manual page to emphasize the use
    > of the "-s" flag :)

    Hi Julien,

    I've had mixed results sizing the history file with makedbz or using the
    -s flag with makehistory. It seems that, appropriately sized or not, once
    the history file is around 10GB in size, performance suffers dramatically.

    In the previous case, where we were talking about sizing the history file
    prior to injecting large numbers of articles into a new server, I had a
    similar experience. Performance started off better *initially* when
    sized, but after some time it falls off. In that scenario I can start off
    injecting 10,000+ articles per second at the beginning, but it falls to
    around 2,000 per second once the history file reaches about 10GB or so.

    I see basically the same thing with makehistory. On the first run I used
    the -s flag and noticed that about a third of the way through, going by
    the previous history file's size, the I/O rate drops off. I can see this
    in my monitoring software: the I/O metrics trend downward over the entire
    process.

    On round two of the makehistory rebuild I did not use the -s flag. I
    still see the same downward trend in performance, but I saw more I/O at
    the beginning than on the first run before it slowed down.

  • From Julien ÉLIE@21:1/5 to All on Sun Oct 29 14:00:28 2023
    Hi Jesse,

    > I've had mixed results sizing the history file with makedbz or using
    > the -s flag with makehistory. It seems that, appropriately sized or
    > not, once the history file is around 10GB in size, performance suffers
    > dramatically.

    Thanks for the feedback.
    I unfortunately cannot think of any other setting to test, nor do I have
    the time to audit and try to improve the performance of dbz. I guess the
    best next step would be to implement a second storage method for the
    history file, for instance based on SQLite, but that is quite a lot of
    work too...
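
    For concreteness, a first sketch of what an SQLite-backed history table
    might look like (purely hypothetical — no such component exists in INN
    today; the columns mirror the fields of a history(5) line):

    ```shell
    # Hypothetical schema for an SQLite-backed replacement of the dbz-based
    # history file; columns follow the "hash : arrived~expires~posted : token"
    # layout described in history(5).
    sqlite3 history-test.db <<'EOF'
    CREATE TABLE history (
        hash    BLOB PRIMARY KEY,   -- hash of the message-ID, as dbz stores
        arrived INTEGER NOT NULL,   -- arrival time (seconds since epoch)
        expires INTEGER,            -- expiry time, if any
        posted  INTEGER,            -- posting time from the Date header
        token   BLOB                -- storage API token; NULL once expired
    ) WITHOUT ROWID;
    CREATE INDEX history_arrived ON history(arrived);
    EOF
    sqlite3 history-test.db "SELECT count(*) FROM history;"   # prints 0
    ```
    
    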

    --
    Julien ÉLIE

    « And that very night, that is to say three weeks later… » (Astérix)

  • From Jesse Rehmer@21:1/5 to iulius@nom-de-mon-site.com.invalid on Sun Oct 29 23:03:42 2023
    On Oct 29, 2023 at 8:00:28 AM CDT, "Julien ÉLIE" <iulius@nom-de-mon-site.com.invalid> wrote:

    > Hi Jesse,
    >
    >> I've had mixed results sizing the history file with makedbz or using
    >> the -s flag with makehistory. It seems that, appropriately sized or
    >> not, once the history file is around 10GB in size, performance suffers
    >> dramatically.
    >
    > Thanks for the feedback.
    > I unfortunately cannot think of any other setting to test, nor do I
    > have the time to audit and try to improve the performance of dbz. I
    > guess the best next step would be to implement a second storage method
    > for the history file, for instance based on SQLite, but that is quite
    > a lot of work too...

    Understood. I think makehistory will be running for at least a few more
    days; if there is anything I can do or provide, let me know. I have a
    clone (or two or three) of this dataset if you ever need a large dataset
    to play with.
