• load data infile somewhat flaky

    From DFS@21:1/5 to All on Tue Aug 17 13:11:14 2021
    Finally got a big file (3 columns x 612K rows) of messy data loaded.

    load data infile 'file.csv'
    into table
    fields terminated by ','
    enclosed by '"';

    I noticed it tells you an error occurred at a certain line, but the
    real issue was the offending data (a text field ending in \ or "\")
    5 lines later.

    On one run 5 lines didn't post, and it turns out they were the next 5
    lines after a field ending in "\". Those 5 lines got concatenated with
    the offending line, so I had one large clump of data in one row.

    Moral: watch for fields ending in \ or "\".
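
    A hedged sketch of the usual fix, not what was actually run: declare an
    empty escape character so a trailing backslash is just data. The table
    name 't', file name, and connection details are placeholders, and
    PyMySQL is just one driver that can issue the statement (the server
    must also allow LOCAL INFILE).

    import pymysql  # assumption: PyMySQL driver; any DB-API connector works

    # local_infile=True lets the client send the file to the server.
    conn = pymysql.connect(host='localhost', user='me', password='secret',
                           database='mydb', local_infile=True)
    with conn.cursor() as cur:
        # ESCAPED BY '' turns off backslash escape processing, so a field
        # ending in \ or "\" can no longer swallow the lines after it.
        cur.execute(r"""
            LOAD DATA LOCAL INFILE 'file.csv'
            INTO TABLE t
            FIELDS TERMINATED BY ','
                OPTIONALLY ENCLOSED BY '"'
                ESCAPED BY ''
            LINES TERMINATED BY '\n'
        """)
    conn.commit()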

  • From Johann Klammer@21:1/5 to DFS on Wed Aug 18 10:10:26 2021
    On 08/17/2021 07:11 PM, DFS wrote:
    > Finally got a big file (3 columns x 612K rows) of messy data loaded.
    >
    > load data infile 'file.csv'
    > into table
    > fields terminated by ','
    > enclosed by '"';
    >
    > I noticed it tells you an error occurred at a certain line, but the
    > real issue was the offending data (a text field ending in \ or "\")
    > 5 lines later.
    >
    > On one run 5 lines didn't post, and it turns out they were the next 5
    > lines after a field ending in "\". Those 5 lines got concatenated with
    > the offending line, so I had one large clump of data in one row.
    >
    > Moral: watch for fields ending in \ or "\".

    I seem to recall it also has problems with whitespace around the commas.
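
    One way to sidestep that, sketched under assumptions (file names and
    column layout are made up): pre-clean the file with Python's csv
    module, which can forgive spaces after the delimiter on input and
    rewrite every field tightly quoted.

    import csv

    # Rewrite the CSV so every field is stripped and tightly quoted;
    # skipinitialspace=True forgives spaces after the commas on input.
    with open('file.csv', newline='') as src, \
            open('clean.csv', 'w', newline='') as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        for row in csv.reader(src, skipinitialspace=True):
            writer.writerow(field.strip() for field in row)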

  • From DFS@21:1/5 to Johann Klammer on Wed Aug 18 09:43:20 2021
    On 8/18/2021 4:10 AM, Johann Klammer wrote:
    > On 08/17/2021 07:11 PM, DFS wrote:
    >> Finally got a big file (3 columns x 612K rows) of messy data loaded.
    >> [...]
    >> Moral: watch for fields ending in \ or "\".
    >
    > I seem to recall it also has problems with whitespace around the commas.


    Yes, I saw that too.

    It was a fiasco. Wasted a fair amount of my time. A couple times it
    would spend 10 minutes reading a file (Stage 1 of 1...) then tell me it
    bombed on row 1 (which was valid data).
    Near the end of my db cloning process (15 tables, 30M rows total, SQLite
    to MariaDB) I wrote a little Python program to copy the data.

    The Python code did - no joking - 2700+ inserts/second into
    MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.

    'load data infile' isn't worthless - it worked decently fast on simple
    data - but there are better ways of getting data into MariaDB.

  • From Dr Eberhard Lisse@21:1/5 to DFS on Thu Aug 19 12:33:47 2021
    I find that using tools like CSVQ helps with checking CSV data
    integrity.

    Turning off indexes and such during the load and generating them
    afterwards speeds up loading significantly.
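
    Roughly like this sketch, with 't' and the connection details as
    placeholders. Note that ALTER TABLE ... DISABLE KEYS only defers
    non-unique indexes on MyISAM/Aria tables; for InnoDB the usual trick is
    relaxing unique_checks and foreign_key_checks around the load.

    import pymysql  # assumption: placeholder connection and table name 't'

    conn = pymysql.connect(host='localhost', user='me', password='secret',
                           database='mydb', local_infile=True)
    with conn.cursor() as cur:
        cur.execute('ALTER TABLE t DISABLE KEYS')   # MyISAM/Aria: defer index upkeep
        cur.execute('SET unique_checks = 0')        # InnoDB: relax checks for bulk load
        cur.execute('SET foreign_key_checks = 0')
        cur.execute("LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE t "
                    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'")
        cur.execute('SET foreign_key_checks = 1')
        cur.execute('SET unique_checks = 1')
        cur.execute('ALTER TABLE t ENABLE KEYS')    # rebuild the indexes in one pass
    conn.commit()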

    However,

    https://www.google.com/search?q=sqlite+to+mariadb

    has 1.5 Million results :-)-O

    I have used

    https://pypi.org/project/sqlite3-to-mysql/

    in the past successfully


    el


    --
    To email me replace 'nospam' with 'el'

  • From DFS@21:1/5 to Dr Eberhard Lisse on Thu Aug 19 08:54:12 2021
    On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:

    > I find that using tools like CSVQ helps with checking CSV data
    > integrity.

    The data came out of a SQLite db and was fairly clean, but encoding
    issues meant the SQLite data had 'dupes':

    this field has a question mark ? in it
    this field has a question mark � in it
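
    The two rows only look alike; the replacement character is its own code
    point, so the strings compare (and index) as distinct. A tiny
    illustration:

    # U+FFFD is the Unicode replacement character that mis-decoded bytes
    # turn into; it is a different code point from a literal '?'.
    a = 'this field has a question mark ? in it'
    b = 'this field has a question mark \ufffd in it'
    print(a == b)                    # False: distinct strings, hence the 'dupes'
    print(ord('?'), ord('\ufffd'))   # 63 65533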


    > Turning off indexes and such during the load and generating them
    > afterwards speeds up loading significantly.
    >
    > However,
    >
    >     https://www.google.com/search?q=sqlite+to+mariadb
    >
    > has 1.5 Million results :-)-O


    www.google.com/search?q=mariadb+to+sqlite

    has 1.4M results

    SQLite has been very robust for this db of 30M rows. Easy to back up
    and administer, never crashed, fast querying... but it's very slow to
    alter big tables, and I wanted to see what all the fuss about
    MySQL/MariaDB was about.

    It (MariaDB) is a nice dbms, I must say. Especially for no cost/open
    source.

    I also looked at Oracle Express Edition, but it was limited to 10GB of
    data, and my database with indexes is at least 15GB.



    > I have used
    >
    >     https://pypi.org/project/sqlite3-to-mysql/
    >
    > in the past successfully


    Thanks.

    I'll look at it. I wrote my own source-to-dest Python code, and the
    INSERT rate into MariaDB was fast, as high as 8600 per second on some
    tables.

    Overall: 30.6M rows copied in 5770 seconds = 5304 per second (on my
    11-year-old i5-750 computer). That's still 96 minutes, so I wonder if
    there's a better way.



  • From J.O. Aho@21:1/5 to DFS on Thu Aug 19 15:23:21 2021
    On 19/08/2021 14.54, DFS wrote:
    > On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:
    >> I find that using tools like CSVQ helps with checking CSV data
    >> integrity.
    >
    > The data came out of a SQLite db and was fairly clean, but encoding
    > issues meant the SQLite data had 'dupes':
    >
    > this field has a question mark ? in it
    > this field has a question mark � in it

    As long as it's the same charset used on the SQLite side, it shouldn't
    be too much work to fix; it's a completely different thing if it's a
    mix.


    >> Turning off indexes and such during the load and generating them
    >> afterwards speeds up loading significantly.
    >>
    >> However,
    >>
    >>     https://www.google.com/search?q=sqlite+to+mariadb
    >>
    >> has 1.5 Million results :-)-O
    >
    > www.google.com/search?q=mariadb+to+sqlite
    >
    > has 1.4M results
    >
    > SQLite has been very robust for this db of 30M rows. Easy to back up
    > and administer, never crashed, fast querying... but it's very slow to
    > alter big tables, and I wanted to see what all the fuss about
    > MySQL/MariaDB was about.
    >
    > It (MariaDB) is a nice dbms, I must say. Especially for no cost/open
    > source.

    PostgreSQL is worth mentioning (I know I shouldn't say it here); it's
    also open source and has some nice features missing in MariaDB/MySQL.


    --

    //Aho

  • From Axel Schwenke@21:1/5 to DFS on Thu Aug 19 16:49:00 2021
    On 19.08.2021 14:54, DFS wrote:
    > On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:
    >
    > The data came out of a SQLite db and was fairly clean, but encoding
    > issues meant the SQLite data had 'dupes':

    That makes no sense. If the data source has a different encoding
    (charset) than the destination, it is the DBA's fault. He has to make
    sure the destination encoding allows all characters from the source.
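
    For instance (a sketch with placeholder names, not anyone's actual
    schema), moving the destination to utf8mb4 gives it room for any
    Unicode character the source can contain:

    import pymysql  # assumption: placeholder names throughout

    # Connect with a utf8mb4 session so nothing is mangled in transit, then
    # widen the destination table so it can hold every source character.
    conn = pymysql.connect(host='localhost', user='me', password='secret',
                           database='mydb', charset='utf8mb4')
    with conn.cursor() as cur:
        cur.execute('ALTER TABLE t CONVERT TO CHARACTER SET utf8mb4 '
                    'COLLATE utf8mb4_unicode_ci')
    conn.commit()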

    > On one run 5 lines didn't post, and it turns out they were the next
    > 5 lines after a field ending in "\". Those 5 lines got concatenated
    > with the offending line, so I had one large clump of data in one row.

    Invalid data in exported files is also the DBA's fault. He must use
    proper quoting if the data contains special characters like '\'.
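
    A sketch of such an export, all names assumed: Python's csv module
    doubles embedded quotes and leaves backslashes alone, which pairs
    cleanly with a LOAD DATA that uses ESCAPED BY '' on the import side.

    import csv
    import sqlite3

    # Export with proper CSV quoting so fields ending in \ or containing
    # '"' arrive intact; pairs with LOAD DATA ... ESCAPED BY '' on import.
    src = sqlite3.connect('source.db')
    with open('file.csv', 'w', newline='') as out:
        writer = csv.writer(out, quoting=csv.QUOTE_ALL)  # doublequote=True by default
        for row in src.execute('SELECT col1, col2, col3 FROM mytable'):
            writer.writerow(row)
    src.close()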

  • From The Natural Philosopher@21:1/5 to DFS on Thu Aug 19 17:00:51 2021
    On 19/08/2021 13:54, DFS wrote:
    > and my database with indexes is at least 15GB.
                          ^^^^^^^
    ...indices....

    --
    Outside of a dog, a book is a man's best friend. Inside of a dog it's
    too dark to read.

    Groucho Marx

  • From DFS@21:1/5 to The Natural Philosopher on Thu Aug 19 12:31:53 2021
    On 8/19/2021 12:00 PM, The Natural Philosopher wrote:
    > On 19/08/2021 13:54, DFS wrote:
    >> and my database with indexes is at least 15GB.
    >                      ^^^^^^^
    > ...indices....

    or indexes.

  • From Jerry Stuckle@21:1/5 to The Natural Philosopher on Fri Aug 20 13:58:51 2021
    On 8/19/2021 12:00 PM, The Natural Philosopher wrote:
    > On 19/08/2021 13:54, DFS wrote:
    >> and my database with indexes is at least 15GB.
    >                      ^^^^^^^
    > ...indices....

    Both are acceptable. https://www.dictionary.com/browse/index

    --
    ==================
    Remove the "x" from my email address
    Jerry Stuckle
    jstucklex@attglobal.net
    ==================
