• Database speed comparison

    From kdtop@21:1/5 to All on Sun Jul 18 18:34:05 2021
    I came across a post on Hacker News (https://news.ycombinator.com/item?id=27872575) about the time needed to insert "1 billion" rows into a relational database (SQLite), although it seems 1 billion was too ambitious and the article actually only gives results for 0.1 billion (100 M) writes.

    Here is the actual article: https://avi.im/blag/2021/fast-sqlite-inserts/

    The data stored is:
    key -- integer
    s -- 6-character string
    age -- integer // 5, 10, or 15
    active -- integer // 0 or 1

    The author used various languages, with widely ranging speed results. It looks like his best speed was achieved by preparing 50 rows and posting them all at once, achieving 100 million rows in 34.3 seconds. I didn't scrutinize his code, so I may have some details wrong here. His machine was: MacBook Pro, 2019 (2.4 GHz Quad Core i5, 8GB, 256GB SSD, Big Sur 11.1)

    I am wondering how fast YottaDB could achieve this. Perhaps this is a fool's errand, but it interests me.

    I would think this code could be used. Since the author reported using "prepared rows", I am assuming he was not generating the random values for each row each time. That would really be only a CPU test, not a database test.

    new startH set startH=$h
    new arr
    new ct set ct=0
    ;//Set up array with 100 lines of random data
    for  quit:(ct>99)  do
    . set ct=ct+1
    . new st set st=""
    . new j for j=1:1:6 set st=st_$char($r(26)+65) ;//$r(26) yields 0..25, i.e. A..Z
    . set arr(ct)=st_"^"_(($r(3)*5)+5)_"^"_$r(2) ;//M evaluates strictly left to right, so the age needs its own parentheses ;//each value is ~11 bytes, so arr should be ~11 bytes x 100 = 1,100 bytes
    ;
    ;//merge 1 million instances of 100 lines each
    set ct=0 ;//restart the counter for the merge loop
    for  quit:(ct>999999)  do
    . set ct=ct+1
    . merge ^TMP(ct)=arr
    write "Time= ",startH," --> ",$h,!

    Anyone interested in trying this on a test system? I don't have one I am willing to run this in right now.

    I think the total size would be 100 million x 11 bytes = 1,100,000,000 bytes, or ~1.1 GB, unless I have my math wrong.
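
    A quick way to sanity-check that figure, as a sketch only. The label name valbytes and the sample row are hypothetical, and the count covers the stored values alone, not the subscripts or any database block overhead, so the real file would be larger.

```mumps
valbytes(rows) ; hypothetical helper: bytes of row values only (no subscripts, no block overhead)
 new row
 set row="ABCDEF^15^1" ; longest case: 6 letters, "^", 2-digit age, "^", 1-digit flag
 quit $length(row)*rows
```

    Called as $$valbytes(100000000) from within the same routine, this gives the 1,100,000,000 bytes estimated above.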

    My gut feeling is that the limiting factor is going to be the speed at which the operating system is able to put data out to the filesystem. The difference between SQLite, with all the optimizations the author could find, and YottaDB would come down to CPU cycles. And I suspect that is not the bottleneck.

    Any thoughts?

    Kevin T

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From rtweed@21:1/5 to All on Mon Jul 19 03:45:17 2021
    Please see this: https://github.com/robtweed/global_storage/blob/master/Performance.md#basic-global-set-write-performance-test

    Read the full blog for context: https://github.com/robtweed/global_storage

    Note that performance using raw M code will be even faster: 1 million key/value pair sets per second should be obtainable from YottaDB on even relatively modest hardware.

    A key conclusion of the blog article is that such performance significantly exceeds even the better-known embedded databases that are designed and considered to be ultra-fast.

    Whether anyone in the mainstream of IT knows or cares about this is more difficult to assess. I've seen passing interest so far, but that's about as far as it goes. I guess most people don't worry about the performance of the databases they use, at least not to the extent that they'll look beyond the usual culprits?

    Still, we can only try to wake them up.

    Rob

  • From kdtop@21:1/5 to rtweed on Mon Jul 19 10:03:07 2021
    This is a valuable write-up. Thanks for making it!

    Kevin

  • From K.S. Bhaskar@21:1/5 to kdtop on Thu Jul 22 14:33:31 2021
    Since programming can be therapeutic, and I felt like therapy, I decided to play a little. See https://gitlab.com/ksbhaskar/fastinsert/-/blob/main/fastinsert.m

    Regards
    – Bhaskar

  • From rtweed@21:1/5 to All on Fri Jul 23 07:40:13 2021
    That's a wee bit fast, Bhaskar!! :-) Are you going to publish some of those figures anywhere? They pretty much blow any of the "mainstream" databases clean out of the water.

    Rob

  • From K.S. Bhaskar@21:1/5 to rtweed on Fri Jul 23 08:28:25 2021
    Thanks, Rob. I too was pleasantly surprised by the numbers, especially on the Raspberry Pi Zero. I would like to publish benchmarks somewhere, but this is not a realistic benchmark, nor one that lends itself to apples-to-apples comparisons. I'd like to find a nice key-value NoSQL benchmark and make that run on YottaDB.

    In any case, suggestions welcome. I did Tweet them.

    Regards
    – Bhaskar

  • From Akabouncue@21:1/5 to All on Fri Jul 23 19:19:36 2021
    On Friday, July 23, 2021 at 22:28:27 UTC+7, K.S. Bhaskar wrote:
    On Friday, July 23, 2021 at 10:40:14 AM UTC-4, rtweed wrote:
    That's a wee bit fast, Bhaskar!! :-) Are you going to publish some of those figures anywhere? They pretty much blow any of the "mainstream" databases clean out of the water.

    Rob
    Thanks Rob. I too was pleasantly surprised by the numbers, especially on the Raspberry Pi Zero. I would like to publish benchmarks somewhere, but this is not a realistic benchmark, and one that does not lend itself to apples-to-apples comparisons. I'd like to find a nice key-value NoSQL benchmark, and make that run on YottaDB.

    In any case, suggestions welcome. I did Tweet them.

    Regards
    – Bhaskar

  • From kdtop@21:1/5 to solo...@gmail.com on Tue Jul 27 19:17:48 2021
    Bhaskar,

    These are very impressive numbers. I'd love to see them highlighted on Hacker News. I would post them, but I think that to get traction there would need to be a write-up. Here is another post I found of someone else jumping on the speed-test bandwagon:
    https://blog.metaobject.com/2021/07/inserting-130m-sqlite-rows-per.html

    I don't have the skill or forum to write up such an evaluation of YottaDB. Any takers?

    And again, Bhaskar, thanks for working on this. I love that a Raspberry Pi holds its own in terms of speed!

    Kevin




  • From K.S. Bhaskar@21:1/5 to kdtop on Tue Jul 27 19:50:36 2021
    Kevin –

    I'm working on a YottaDB blog post about it. And I'm hoping to actually set a billion nodes (i.e., insert a billion rows) in under one minute, albeit on an x86_64 PC. I suppose I could set a billion nodes on a Raspberry Pi Zero if I had the patience!

    Regards
    – Bhaskar

  • From pahihu@21:1/5 to All on Wed Jul 28 03:59:38 2021
    K.S. Bhaskar wrote (Thursday, July 22, 2021, 23:33:33 UTC+2):
    Since programming can be therapeutic, and I felt like therapy, I decided to play a little. See https://gitlab.com/ksbhaskar/fastinsert/-/blob/main/fastinsert.m


    Hi,

    The linked M routine does not run processes in parallel:

    for i=1:1:nproc do setdata(i)

    The time reported is for the last setdata() call only, so increasing
    the number of processes decreases the reported elapsed time.

    The corrected code:

    set start=$zut,end=0
    for i=1:1:nproc do
    . set:^ctrl(i,"start")<start start=^ctrl(i,"start")
    . set:end<^ctrl(i,"end") end=^ctrl(i,"end")
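
    As a sketch of how that window could be turned into a rate: this assumes each child stores $zut (microseconds) in ^ctrl(i,"start") and ^ctrl(i,"end") as in the loop above. The label name rate and its two parameters are hypothetical, not taken from the linked routine.

```mumps
rate(nproc,nnodes) ; hypothetical helper: overall nodes/second across nproc children
 ; assumes child i stored $zut (microseconds) in ^ctrl(i,"start") and ^ctrl(i,"end")
 new i,start,end
 set start=$zut,end=0
 for i=1:1:nproc do
 . set:^ctrl(i,"start")<start start=^ctrl(i,"start")
 . set:end<^ctrl(i,"end") end=^ctrl(i,"end")
 quit:end'>start 0 ; no measurable window
 ; the wall-clock window runs from the earliest start to the latest end
 quit nnodes/((end-start)/1E6)
```

    Taking the earliest start and the latest end means that adding processes can no longer shrink the reported time.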

    Regards,
    pahihu

  • From K.S. Bhaskar@21:1/5 to pahihu on Wed Jul 28 11:47:38 2021
    Pahihu –

    Thanks for the correction. You are right, although the “starter's pistol” of the M lock release means that all child processes start at essentially the same time, so in typical cases the difference is likely to be in the millisecond range.
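
    For anyone unfamiliar with the pattern, here is a minimal sketch of a lock-based starter's pistol. It is an illustration under assumptions, not necessarily how fastinsert.m does it: the routine name pistol, the labels, and the ^ctrl("ready") and ^ctrl("done") counters are all hypothetical. It relies on the M rule that a lock on ^ctrl blocks locks on its descendants such as ^ctrl(i).

```mumps
pistol(nproc) ; hypothetical parent: release nproc children with one lock
 new i
 kill ^ctrl
 lock +^ctrl ; hold the pistol; blocks any lock on a descendant such as ^ctrl(i)
 for i=1:1:nproc job child^pistol(i)
 ; wait until every child has announced that it is about to block
 for  quit:$get(^ctrl("ready"))=nproc  hang 0.01
 lock -^ctrl ; fire: the children can now acquire their own ^ctrl(i)
 ; wait for every child to finish
 for  quit:$get(^ctrl("done"))=nproc  hang 0.01
 quit
child(i) ; hypothetical child process
 if $increment(^ctrl("ready"))
 lock +^ctrl(i) ; blocks until the parent releases ^ctrl
 set ^ctrl(i,"start")=$zut
 ; ... the timed SET commands would go here ...
 set ^ctrl(i,"end")=$zut
 lock -^ctrl(i)
 if $increment(^ctrl("done"))
 quit
```

    Saved as pistol.m, this could be started with do pistol^pistol(4). Because each child locks a different descendant node, the children do not queue behind one another once the parent lets go.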

    I will fix it in the next iteration. Thanks again.

    Regards
    – Bhaskar

  • From K.S. Bhaskar@21:1/5 to K.S. Bhaskar on Wed Jul 28 19:05:45 2021
    OK, I have a considerable amount of egg on my face. Not only was the program wrong, but the time I reported for a Raspberry Pi Zero W was actually measured on a Raspberry Pi 3. I have corrected the program and uploaded it. Here are the current numbers, which are still respectable, but not knock-your-socks-off numbers.

    On a Raspberry Pi Zero W (32-bit Debian Bullseye):

    $ yottadb -run fastinsert 1E6
    Set 1,000,000 nodes in 75.457866 seconds using 1 processes at 13,252 nodes/second
    $

    On a Raspberry Pi 3 (64-bit Debian Bullseye):

    $ yottadb -run fastinsert 1E7
    Set 10,000,000 nodes in 46.736418 seconds using 4 processes at 213,966 nodes/second
    $

    On the home-brew (not overclocked) AMD Ryzen-7 3700X (64-bit Ubuntu 21.04):

    $ yottadb -run fastinsert 1E8
    Set 100,000,000 nodes in 51.999496 seconds using 16 processes at 1,923,096 nodes/second
    $

    I have no excuses to offer. Thank you for keeping me honest, Pahihu. Now back to wiping the egg off my face!

    Regards
    – Bhaskar
