Join 6100 others and follow Sean Hull on twitter @hullsean. Lots and lots of web applications need to page through information. From customer records, to the albums in your itunes collection. So as web developers and architects, it’s important that we do all this efficiently. Start by looking at how you’re fetching information from your [...]
Thanks to Tim Callaghan for speaking Tuesday night at the Effective MySQL New York meetup on Fractal Tree Indexes : Theory and Practice (MySQL and MongoDB). There was a good turnout and a full room to learn how the TokuDB storage engine from Tokutek is changing how to handle big data in MySQL.
Also interesting is how the same technology has been applied for use in MongoDB including giving MongoDB transactions; a big change for NoSQL.
Related News: …[Read more]
Next week I’ll be visiting Moscow to talk at Highload++. The conference will take place during Monday 22nd and Tuesday 23rd at the Radisson hotel. I will be giving my personal version of an indexing talk that my colleagues have given in meetups and conferences in the US.
Highload++ conference is targeted to address the issues of complex high traffic web properties. Most of these sites depend on databases to deliver their content, record the traffic and report the application activities in real time. As I learned early in my career at MySQL, the database schema and in particular the indexing strategy, are critical to achieve the highest possible performance out of the database. I’ll be reviewing the basic strategies to define the right indexes. I will also cover TokuDB’s Fractal Tree® and Cluster …[Read more]
In my three previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase, a 268x query performance increase, and a comparison of covered indexes and clustered indexes. The benchmarks show the difference that rich and efficient indexing can make to your MongoDB workload.
It’s one thing for us to benchmark MongoDB + TokuDB and another to measure real world performance. If you are looking for a way to improve the performance or scalability of your MongoDB deployment, we can help …[Read more]
In my two previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase and a 268x query performance increase. MongoDB’s covered indexes can provide some performance benefits over a regular MongoDB index, as they reduce the amount of IO required to satisfy certain queries. In essence, when all of the fields you are requesting are present in the index key, then MongoDB does not have to go back to the main storage heap to retrieve anything. My benchmark results are …[Read more]
Modern file systems are well equipped to deal with large writes. One area that remains challenging however is to efficiently write out “microdata”, such as metadata and small portions of large files, while showing good I/O utilization when the data is read back. This challenge is evident with mount options like “noatime” which disables updating file access time on reads. This kind of solution avoids the problem altogether. Another approach, delayed allocation, is meant to coalesce small writes in memory as long as possible before writing it out to disk. Filesystems like ext4 and Btrfs use delayed allocation to make a best-effort at reducing fragmentation and random I/O.
Isn’t there a way to fundamentally solve filesystem fragmentation and random I/O?
This week, I’ll be speaking at HotStorage 2012 in Boston. My talk …[Read more]
In the article about the role of a primary key, I mentioned that a secondary index in an InnoDB table consists not only of the values of its member columns, but also values of the table’s primary key are concatenated to the index. I.e. the primary key contents is part of every other index.
Assuming the following table structure:
CREATE TABLE `bets` ( `id` int(10) unsigned NOT NULL, `user_id` int(10) unsigned NOT NULL, `game_id` int(10) unsigned NOT NULL, ... PRIMARY KEY (`id`), KEY `user_id` (`user_id`) ) ENGINE=InnoDB
Here is the visualization:
If MySQL could use in queries these implicitly added values, it would maybe allow to save some space on listing the primary key columns at the end of an index explicitly. Let’s check various cases.
Row filtering…[Read more]
MySQL performance is largely defined by keys and how efficiently queries can use them. As you scale, at certain point it isn’t enough anymore to just have any indexes and still get a good performance in return. You have to really figure them out and allow your queries to do less work, as little work as possible.
The approach presented in this article can sometimes help designing such good, efficient indexes. As a consultant, I have to rely on it myself from time to time, having to optimize a query that works in a database I know nothing about.
Let’s assume there is an application, which collects user activity in various places. The application uses a poorly indexed database, so there are plenty of examples to choose from. Our example query performs a full table scan, which means it reads all rows from the table it uses. It is also among the most popular statements executed by application.
mysql> EXPLAIN …[Read more]
Indexes are tricky things. In my experience, indexes are added
whenever SQL queries are too slow. This makes sense. However,
sometimes these indexes were added without being thought through
enough (I am guilty of this). Sometimes they were thought
through, but the table itself now has so many indexes that the
optimizer doesn't know which one to choose from.
The overhead of indexes greatly effects tables that need to be written to or altered often (even indexes on NoSQL database greatly effect writes).
Over indexed tables inflate the database size dramatically which adds to people's concerns that the database becomes unwieldy.
Read more »
How important a primary key design can be for MySQL performance? The answer is: Extremely! If tables use InnoDB storage engine, that is.
It all begins with the specific way InnoDB organizes data internally. There are two major pieces of information that anyone should know:
- It physically stores rows together with and in the order of primary key values. It means that a primary key does not only uniquely identify a row, it is also part of it. Or perhaps rather, a physical row is part of table’s primary key.
- A secondary index entry does not point to the actual row position, which is how it works in MyISAM. Instead, every single index entry is concatenated with a value of the corresponding primary key. When a query reads a row through a secondary index, this added value is used in additional implicit lookup by the primary key, to locate the actual row.
What could be a “rule of the thumb” for …[Read more]