Finally got a big file (3 columns x 612K rows) of messy data loaded.
load data infile 'file.csv'
into table
fields terminated by ','
enclosed by '"';
I noticed it tells you an error occurred at a certain line, but the real issue was the offending data (a text field ending in \ or "\") 5 lines later.
On one run 5 lines didn't post, and it turns out they were the next 5 lines after a field ending in "\". Those 5 lines got concatenated with the offending line, so I had one large clump of data in one row.
Moral: watch for fields ending in \ or "\".
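For data like that, one variant worth trying - assuming the trailing backslashes are literal data, and using a placeholder table name since the post omits it - is to turn off backslash escaping entirely with an empty ESCAPED BY clause:

load data infile 'file.csv'
into table mytable
fields terminated by ','
enclosed by '"'
escaped by '';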
On 08/17/2021 07:11 PM, DFS wrote:
> Finally got a big file (3 columns x 612K rows) of messy data loaded.

I seem to recall it also has problems with whitespace around the commas.
On 8/18/2021 4:10 AM, Johann Klammer wrote:
> I seem to recall it also has problems with whitespace around the commas.
Yes, I saw that too.
It was a fiasco, and it wasted a fair amount of my time. A couple of times it would spend 10 minutes reading a file (Stage 1 of 1...), then tell me it bombed on row 1 (which was valid data). Near the end of my db cloning process (15 tables, 30M rows total, SQLite to MariaDB) I wrote a little Python program to copy the data.

The Python code did - no joking - 2700+ inserts/second into MariaDB. I copied 1.5M rows (~4GB) in under 10 minutes.
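The shape of such a copy loop, roughly (a sketch, not DFS's actual program; the mysql-connector-python driver and the table/column names here are assumptions):

import sqlite3
import mysql.connector

BATCH = 1000  # rows per executemany() round trip

src = sqlite3.connect("source.db")
dst = mysql.connector.connect(host="localhost", user="user",
                              password="secret", database="target")

reader = src.execute("SELECT a, b, c FROM mytable")
writer = dst.cursor()

while True:
    rows = reader.fetchmany(BATCH)
    if not rows:
        break
    # Parameterized inserts sidestep the escaping trouble
    # (fields ending in \ or "\") that LOAD DATA INFILE hit.
    writer.executemany(
        "INSERT INTO mytable (a, b, c) VALUES (%s, %s, %s)", rows)
    dst.commit()

src.close()
dst.close()

Batching through executemany() rather than inserting row-at-a-time is what makes an insert rate like the one quoted above plausible.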
'load data infile' isn't worthless - it worked decently fast on simple
data - but there are better ways of getting data into MariaDB.
I find that using tools like CSVQ helps with checking CSV data
integrity.
Turning off indexes and such during the load and regenerating them afterwards speeds up loading significantly.
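For reference, the usual incantation (table name hypothetical; DISABLE KEYS applies to non-unique indexes on MyISAM/Aria tables, while InnoDB ignores it - there, dropping and re-adding secondary indexes plays the same role):

alter table mytable disable keys;
-- bulk load here: load data infile / mass inserts
alter table mytable enable keys;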
However,
https://www.google.com/search?q=sqlite+to+mariadb
has 1.5 Million results :-)-O
I have used
https://pypi.org/project/sqlite3-to-mysql/
in the past successfully
el
On 8/19/2021 6:33 AM, Dr Eberhard Lisse wrote:
> I find that using tools like CSVQ helps with checking CSV data
> integrity.
The data came out of a SQLite db and was fairly clean, but encoding
issues meant the SQLite data had 'dupes':
this field has a question mark ? in it
this field has a question mark � in it
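Pairs like that typically appear when the same bytes were decoded two different ways somewhere upstream - a rough Python illustration (not DFS's actual data):

raw = b"caf\xe9"                                  # 'café' in Latin-1
print(raw.decode("utf-8", errors="replace"))      # 'caf\ufffd' -> caf�
print("café".encode("ascii", errors="replace"))   # b'caf?'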
> Turning off indexes and such during the load and regenerating them
> afterwards speeds up loading significantly.
>
> However,
> https://www.google.com/search?q=sqlite+to+mariadb
> has 1.5 Million results :-)-O
www.google.com/search?q=mariadb+to+sqlite
has 1.4M results
SQLite has been very robust for this db of 30M rows. Easy to back up and administer, never crashed, fast querying... but it's very slow to alter big tables, and I wanted to see what all the fuss about MySQL/MariaDB was about.

It (MariaDB) is a nice dbms, I must say. Especially for no cost/open source.
On 19/08/2021 13:54, DFS wrote:
> and my database with indexes is at least 15GB
                       ^^^^^^^
....indices....