The ability to import data in parallel is now a feature of
the new MySQL Shell, and you can run it from the command line
instead of from within an interactive shell.
How Does It Work?
The importTable utility, new in MySQL Shell 8.0.17,
analyzes the input file, divides that file into chunks, and then
uploads the data using parallel connections. This is much
faster than the LOAD DATA statement. You can define how
fields and lines are delimited, and it works on DOS CSVs, Unix CSVs,
TSVs, and JSON as long as the JSON is in one-document-per-line mode.
You can also adjust the number of threads, the number of bytes sent per
chunk, and the maximum rate of data transfer per thread, so you
can balance the load on the network and the speed of data …
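For reference, here is a minimal sketch of the kind of single-connection LOAD DATA statement the utility is being compared to; the file, schema, and table names are made up for illustration and are not from the post.

-- Hypothetical single-connection load of a Unix-style CSV with a header row;
-- the import utility splits the same work into chunks sent over parallel connections.
LOAD DATA LOCAL INFILE '/data/users.csv'
  INTO TABLE mydb.users
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
  IGNORE 1 LINES;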
The Question
Recently, a customer asked us:
After importing a new section of user data into our Tungsten
cluster, we are seeing perpetually rising replication lag. We are
sitting at an estimated convergence time of 8.5 hours after importing
around 50 million rows, and this lag is climbing continuously. We
are currently migrating some of our users from a NoSQL database
into our Tungsten cluster. We have a procedure to write out a
bunch of CSV files after translating our old data into columns
and then we recursively send them to the write master using the
mysql client. Specifically our import SQL is doing LOAD
DATA LOCAL INFILE
and the reading in a large CSV file to
do the import. We have 20k records per CSV file and we have 12
workers which insert them in parallel.
Simple Overview
The Skinny
In cases like this, the slaves are struggling because the database cannot keep up with the apply stage …
[Read more]
Peter Zaitsev of Percona posted a tweet about an interactive
book, Select Star SQL, which 'aims to be the best place
on the internet for learning SQL'. This interactive book
uses a data set that can be downloaded, or you can work the
queries online. I showed the website to a neighbor who was
asking 'what is the best way to learn SQL'. The
neighbor wanted their own copy of the data and asked how you
turn a raw CSV file into a table.
Well, that can get complicated.
CSV
CSV files use commas to separate the fields in each record.
This is a great idea until someone plants a comma in the middle
of a field, but it is sort of a least-common-denominator way of
passing data between systems.
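The usual workaround is to quote fields that might contain commas and to tell the loader about the quoting. A small, hypothetical illustration (the table and file names here are invented, not from the post):

-- A record where the comma inside the quoted field must not split it:
--   1,"Smith, John",1985
-- Declaring the quoting keeps the whole name in one column.
LOAD DATA LOCAL INFILE '/tmp/people.csv'
  INTO TABLE people
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\n';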
But a CSV file does not directly fit into a relational
table. You could …
create table csv_date (date varchar(20));
insert into csv_date values ('2030-05-18 14:57:19');
insert into csv_date values ('2030-05-18 14:55:15');
insert into csv_date values ('2019-05-18 04:15:15');
insert into csv_date values ('2018-05-18 02:11:53');
insert into csv_date values ('2017-05-18 22:14:24');
Convert a two-digit year to a four-digit year in MySQL: Create a temporary table the same as the original table, but make the column …
The post How To Convert MySQL Two Digit Year To Four Digit Year appeared first on SQLgossip.
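The post's steps are cut off above; as general background (and not necessarily the method the post describes), MySQL applies its usual century rule when parsing a two-digit year, which you can see with STR_TO_DATE:

-- Two-digit years 00-69 are read as 2000-2069, and 70-99 as 1970-1999.
SELECT STR_TO_DATE('18-05-21', '%y-%m-%d');  -- 2018-05-21
SELECT STR_TO_DATE('85-05-21', '%y-%m-%d');  -- 1985-05-21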
Lately I’ve been compiling and generating .deb packages for several recent MySQL and MariaDB versions, and I wanted to try them in more depth, especially MySQL 8.0 and MariaDB 10.2, both of which are still in development.
Several people have already given their first impressions (or will do soon), and testing early is the best way to catch bugs and regressions, and get them fixed before the official release. In fact, as I will comment later, I ran into breaking bugs on both MySQL 8.0 and MariaDB 10.2, which I …
[Read more]
Making The Numbers Table Useful
It’s not easy to find a solution to a very simple problem in MySQL: converting a comma separated list of values into rows. Oracle database users find gobs of tutorials on using REGEXP and CONNECT BY LEVEL to make this happen. MySQL doesn’t have that. So, use the numbers table from the previous post!
The transposing is made possible by (ab)using the SUBSTRING_INDEX function. I love this function. It is right up there with GROUP_CONCAT when mixing NULL and non-null strings.
I will be using comma separated values. You can use any delimiter character you like.
Bonus: This works with empty strings, strings with only one value, and empty values (i.e. “my value,,previous is empty”). No extra code needed.
Making Magic Happen
SET @mycsv = …
[Read more]
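The post's own code is truncated above, but a minimal sketch of the general technique looks like this, assuming a helper table named numbers with a column n holding 1, 2, 3, and so on (the names here are illustrative):

SET @mycsv = 'red,green,blue';

-- One row per value: the inner SUBSTRING_INDEX keeps the first n values,
-- the outer one keeps only the last of those, and the WHERE clause counts
-- how many values the string contains (number of delimiters plus one).
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@mycsv, ',', n), ',', -1) AS value
FROM numbers
WHERE n <= CHAR_LENGTH(@mycsv) - CHAR_LENGTH(REPLACE(@mycsv, ',', '')) + 1;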
As I mentioned in my last post, where I compared the default configuration options in 5.6 and 5.7, I have been doing some testing for a particular load in several versions of MySQL. What I have been checking is different ways to load a CSV file (the same file I used for testing the compression tools) into MySQL. Seasoned MySQL DBAs and programmers probably know the answer, so you can jump over to my 5.6 versus 5.7 results. However, the first part of this post is dedicated to developers and MySQL beginners who want to know the answer to the title question, in a step-by-step fashion. I must say I also learned something, as I under- and over-estimated some of the effects of certain …
[Read more]
In my post last week, I analysed some of the most common compression tools and formats, and their compression speed and ratio. While that could give us a good idea of the performance of those tools, the analysis would be incomplete without researching the decompression. This is particularly true for database backups as, for those cases where the compression process is performed outside of the production boxes, you may not care too much about compression times. In that case, even if it is relatively slow, it will not affect the performance of your MySQL server (or whatever you are using). The decompression time, however, can be critical, as it may in many cases influence the MTTR of your whole system.
Testing …
[Read more]
There was once a big hooplah about the MySQL Storage Engine Architecture and how easy it was to just slot in some other method of storage instead of the provided ones. Over the years I’ve repeatedly mentioned how this wasn’t really …
[Read more]
If you are importing large CSV or SQL dumps into MySQL, chances are you have been looking for ways to see how far the import has gone. If you know how many rows there are in the file being imported, you can do a SELECT COUNT(*), but that would take some time for the query to finish, especially on really big imports.
Using lsof with the -o option, you can monitor the current file offset from which a process is reading. Knowing the size of the file and taking a few snapshots of the offset, you can get a rough idea of how fast the import is going. Note, though, that this is only the file-read pace, not the actual import speed, as a MySQL import can vary depending on a number of conditions, e.g. table growth, secondary indexes, etc.
Let’s say I am importing a 1.1G CSV file into a table.
[revin@forge msb_5_5_300]$ ls -al /wok/dta/samples/ft_history.csv
-rw-rw-r--. 1 revin revin 1075456654 Nov 8 23:25 /wok/dta/samples/ft_history.csv
…
[Read more]