Showing entries 31 to 40 of 102
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: partitioning (reset)
Benchmarking single-row insert performance on Amazon EC2

I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes.

OK, let’s start off with the configuration details.

Configuration

First of all let me describe the EC2 instance type that I used.

EC2 Configuration

I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters.

High-Memory Quadruple Extra Large Instance
68.4 GB of memory
26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute …
[Read more]
MySQL Partitioning and its Confusing Syntax

While looking at partitioning I recently made a mistake which I guess can happen to others. Often this is due to not fully reading the documentation or scanning it too quickly and misunderstanding what’s being said.

So this post is to complain about the MySQL partitioning syntax and to warn others who may easily make the same mistake without realising.

First we probably need to ask why we are partitioning a table in the first place. The main reasons for this are I think:

  • to improve query performance
  • to reduce individual .ibd file sizes for large tables (if using innodb_file_per_table)

In my case I wanted to do both. I had a several tables which store a large number of rows (batches of data) based on an incremental batch number. One of these tables was around 40 GB and had about 500,000,000 rows in it.  When processing data in this table often all the data from a particular batch …

[Read more]
Proper handling of insert-mostly, select-recently datasets

Some kinds of large tables such as chat messages, blog entries, etc have the following characteristics.

* huge number of records, huge data and index size
* insert and select mostly
* select from only recent data
* select by secondary index (i.e. user_id)
* secondary index entries are inserted in random order

What are optimal ways to handle these tables? The below single large table does not perform well.

CREATE TABLE message (
message_id BINGINT UNSIGNED PRIMARY KEY,
user_id INT UNSIGNED,
body VARCHAR(255),
...
created DATETIME,
INDEX(user_id)
) ENGINE=InnoDB;


The cause of poor performance is a secondary index on user_id. user_id is inserted in random order. Index size grows, and sooner or later it will exceed RAM size. Once index size on user_id exceeds RAM size, inserting into message table causes massive random read disk i/o, …

[Read more]
Partitioning the WordPress Comments Table

WordPress sites can get big. Really big. When you’re looking at a site of Cheezburger, Engadget or Techcrunch proportions, you get hundreds of comments per post, on dozens of posts per day, which adds up to millions of comments per year.

In order to keep your site running in top condition, you don’t want to be running queries against tables with lots of rarely accessed rows, which is what happens with most comments – after the post drops off the front page, readership drops, so the comments are viewed much less frequently. So, what we want to do is remove these old comments from the primary comment table, but keep them handy, for when people read the archives.

Enter partitioning.

The idea of MySQL partitioning is that it splits tables up into multiple logical tablespaces, based on your criteria. Running a query on a single partition of a large table is much faster than running it across the entire table, even with …

[Read more]
Replication and sandbox talks on the road - San Francisco, Santa Clara, Orlando

In a few weeks, I will be on the road, for an intense set of presentations in the USA.

San Francisco, April 7thI will start the tour at the San Francisco MySQL User Group. On April 7, at 6pm I will talk about Advanced MySQL replication for the masses. This talk will explore topics such as bi-directional replication, multiple sources, parallel replication, seamless failover, with the help of Tungsten replicator.




Santa Clara, April 11, 12The MySQL conference starts on Monday, April 11th, with several three-hour tutorials.

[Read more]
MySQL Partitioning: A Flow Chart

In Part 1, and Part 2 of this series, I presented some thoughts on partitioning. I heard some great feedback on why people use partitioning. Here, I present a flow chart that summarizes what I’ve learned. In summary: with TokuDB in the picture there’s almost no reason to use partitioning. Or I should say, there are almost always better (higher performing, more robust, lower maintenance) alternatives to partitioning.

Here goes:

  1. Spindle contention? In other words, are you partitioning in order to spread your query work load across many disks? I’ve yet to see a compelling technical case that RAIDing your disks doesn’t do this as well, with much less setup and maintenance.
[Read more]
Partitioning, Free Lunches, & Indexing, Part 2

Review

In part one, I presented a very brief and particular view of partitioning. I covered what partitioning is, with hardly a mention of why one would use partitioning. In this post, I’ll talk about a few use cases often cited as justification for using partitions.

Lots of disks → Lots of partitioning of tables

One use case for justifying partitions is that each partition can be placed on a separate disk to avoid spindle contention. I have to say that on this one, I agree with Kevin Burton, who makes the point that if you want to distribute I/O load across several disks, you can use a RAID configuration on the disks. In this case, he says that partitioning is not worth the trouble. [NB. He makes the point that this …

[Read more]
Data Warehousing Best Practices: Comparing Oracle to MySQL pt 2

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and talking about power and hardware. This part will go over the 2nd “P”, partitioning. Learning about Oracle’s partitioning has gotten me more interested in how MySQL’s partitioning works, and I do hope that MySQL partitioning will develop to the level that Oracle partitioning does, because Oracle’s partitioning looks very nice (then again, that’s why it costs so much I guess).

Partition – …

[Read more]
Using MySQL Partitioning Instead of MERGE Tables

One common question I get is how to use partitioning instead of MERGE tables. The process I use involves using stored procedures to create and drop partitions. This article will go over the stored procedures I use; special thanks to Roland Bouman for taking a look and giving great feedback to optimize this process.

First, a simple table, not partitioned (yet):

use test;
DROP TABLE IF EXISTS my_part;
CREATE TABLE IF NOT EXISTS my_part (
  id int NOT NULL,
  creationDate datetime NOT NULL,
  PRIMARY KEY (id,creationDate)
) ENGINE=InnoDB;

In real, life there is more to the table than just id and creationDate. The most important part is that the partitioned field(s) need to be part of the primary key.

Now, add the partition definition. This can be done in the CREATE statement, but I have found that it is easier for me to think …

[Read more]
Welcome googleCL
I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by blogger. And I am uploading this post to my Blogger account using Google CL a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it in my laptop. The mere fact that you are reading this blog post shows that it works.


GoogleCL is an apparently simple application. If you install it on Mac using macports you realize how many dependencies it has and how much complexity it gives under the …

[Read more]
Showing entries 31 to 40 of 102
« 10 Newer Entries | 10 Older Entries »