Displaying posts with tag: insert
Fast data import trick

A few weeks ago my friend Frank de Jonge told me he had managed to cut an import into a MySQL server from more than 10 hours down to 16 minutes. According to him it came down to several factors: the field types (long fields holding really small values), the number of indexes, and the constraints on the tables. We were talking about 1 million records here. He wondered if it was possible to make it even faster.

The basics

Turns out there are many ways of importing data into a database, and it all depends on where you are getting the data from and where you want to put it. Let me give you a bit more context: you may want to get data into your database server from a legacy application that exports to CSV, or even from different servers.
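
For the CSV case, a common fast path is MySQL’s LOAD DATA INFILE. A minimal sketch, with hypothetical file and table names:

LOAD DATA LOCAL INFILE '/tmp/legacy_export.csv'
INTO TABLE imported_records
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES; -- skip the CSV header row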

If you are pulling data from a MySQL table into another MySQL table (let’s assume they are on different servers) you might as well use …

[Read more]
Speed up MySQL inserts

By default, when connected to a MySQL server, every command is executed as a distinct transaction. As a result, after each INSERT statement MySQL saves the data to the database and commits the transaction. When your application inserts hundreds or thousands of records, your database is busy saving new records all the time and you are losing speed.

The solution for this problem is very easy.
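
A minimal sketch of the usual fix (assuming it is the transaction batching the excerpt points at, with a hypothetical table): wrap the batch in one explicit transaction, so MySQL commits once at the end instead of after every single INSERT.

START TRANSACTION;
INSERT INTO log (msg) VALUES ('first');
INSERT INTO log (msg) VALUES ('second');
-- ... hundreds more inserts ...
COMMIT; -- one commit for the whole batch instead of one per INSERT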

How to simulate MySQL’s INSERT statement extensions

I have previously posted about the SQL MERGE statement, and how powerful it is, here: https://blog.jooq.org/arcane-magic-with-the-sql2003-merge-statement/ Unfortunately, not all databases support this statement. It is also often overkill when all you want to do is INSERT or UPDATE a single record, depending on whether it already exists. MySQL's …

[Read more]

1 Billion Insertions – The Wait is Over!

iiBench measures the rate at which a database can insert new rows while maintaining several secondary indexes. We ran this for 1 billion rows with TokuDB and InnoDB starting last week, right after we launched TokuDB v5.2. While TokuDB completed it in 15 hours, InnoDB took 7 days.

The results are shown below. At the end of the test, TokuDB’s insertion rate remained at 17,028 inserts/second whereas InnoDB had dropped to 1,050 inserts/second. That is a difference of over 16x. Our complete set of benchmarks for TokuDB v5.2 can be found here.

[Read more]
Database Insights from Archimedes to the Houston Rockets

Archimedes, the first DBA

According to a recent MIT Sloan Management Review study, top-performing organizations use analytics five times more than lower performers. That’s pretty astounding. And while we all know about the ocean/lake/waves/(your favorite water analogy) of Big Data we struggle with every day, information is not knowledge. So how can we get insight from data? Recent articles from O’Reilly and HBR offered some …

[Read more]
MySQL Partitioning: A Flow Chart

In Part 1 and Part 2 of this series, I presented some thoughts on partitioning, and I heard some great feedback on why people use it. Here, I present a flow chart that summarizes what I’ve learned. In summary: with TokuDB in the picture there’s almost no reason to use partitioning. Or I should say: there are almost always better (higher-performing, more robust, lower-maintenance) alternatives to partitioning.

Here goes:

  1. Spindle contention? In other words, are you partitioning in order to spread your query workload across many disks? I’ve yet to see a compelling technical case where RAIDing your disks doesn’t do this just as well, with much less setup and maintenance.
[Read more]
Improving MySQL Insert throughput

There are three simple practices that can improve general INSERT throughput. Each requires consideration of how the data is collected and of what data loss is acceptable in a disaster.

General inserting of rows can be performed as single INSERTs, for example:

INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?);
INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?);
INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?);

While this works, there are two scalability limitations. The first is the network overhead of a round trip for each SQL statement; the second is the synchronous nature of the calls: your code cannot continue until your INSERT has completed successfully.

The first improvement is to use MySQL’s multi-value capability with INSERT. That is, you can insert multiple rows with a single INSERT statement. For example:

INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?),  (?, ?, ?), (?, ?, ?);
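
With concrete (hypothetical) values, the three separate statements above become one statement and one network round trip:

INSERT INTO metrics (host, metric, value)
VALUES ('web1', 'cpu', 0.71),
       ('web2', 'cpu', 0.64),
       ('web3', 'cpu', 0.80);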
[Read more]
Why “insert … on duplicate key update” May Be Slow, by Incurring Disk Seeks

In my post on June 18th, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive: they require disk seeks on large data sets. I previously explained why it would be better to use “replace into” or “insert ignore” over normal inserts. In this post, I explain why another alternative to normal inserts, “insert … on duplicate key update”, is no better in MySQL, because the command still incurs disk seeks.
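
To make the cost concrete, here is a typical use of the statement with a hypothetical counter table: before applying the update expression, MySQL must look up whether the key already exists and read the old row, and on data sets much larger than RAM that lookup is a disk seek.

INSERT INTO counts (id, num) VALUES (1000, 1)
ON DUPLICATE KEY UPDATE num = num + 1; -- must read the existing row first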

The reason “insert ignore” and “replace into” can be made fast with TokuDB’s fractal trees is that the semantics of what to do when a duplicate key is found are simple. …

[Read more]
Making “Insert Ignore” Fast, by Avoiding Disk Seeks

In my post from three weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In my post last week, I explained how the command “replace into” can be fast with TokuDB’s fractal trees. Today, I explain how “insert ignore” can be fast, using a strategy that is very similar to what we do with “replace into”.
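
For reference, a minimal example of the statement with a hypothetical table:

INSERT IGNORE INTO users (id, name) VALUES (42, 'alice');
-- if a row with id 42 already exists, the new row is silently discarded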

The semantics …

[Read more]
Making “Replace Into” Fast, by Avoiding Disk Seeks

In this post two weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In this post, I explain how the command “replace into” can be fast with fractal trees.

The semantics of “replace into” are as follows (a minimal example follows the list):

  • if the primary (or unique) key does not exist, insert the new row
  • if the primary (or unique) key does exist, overwrite the existing row with the new row
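
A minimal example of these semantics, with a hypothetical table:

REPLACE INTO users (id, name) VALUES (42, 'bob');
-- id 42 absent: a plain insert; id 42 present: the old row is replaced by the new one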

The slow, expensive way B-trees implement these semantics is:

[Read more]