Showing entries 31 to 58

Displaying posts with tag: indexing (reset)

Thoughts and ideas for Online Schema Change

Here are a few thoughts on the current status and further possibilities for Facebook’s Online Schema Change (OSC) tool. I’ve had these thoughts for months now, pondering improvements to oak-online-alter-table, but haven’t gotten around to implementing them, nor even writing them down. Better late than never.

The tool has some limitations. Some cannot be lifted, some could. Quoting from the announcement and looking at the code, I add a few comments. I conclude with a general opinion on the tool’s abilities.

“The original table must have PK. Otherwise an error is returned.”

  [Read more...]
How often should you use OPTIMIZE TABLE? – followup

This post follows up on Baron’s How often should you use OPTIMIZE TABLE?. I had the opportunity to do some massive purging of data from large tables, and was interested to see the impact of the OPTIMIZE operation on the table’s indexes. I worked on some production data I was authorized to provide as an example.

The use case

I’ll present a single use case here. The table at hand is a compressed InnoDB table used for logs. I’ve rewritten some column names for privacy:

mysql> show create table logs \G

Create Table: CREATE TABLE `logs` (
 `name` varchar(20) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
  [Read more...]
A review of Relational Database Index Design and the Optimizers by Lahdenmaki and Leach

Relational Database Index Design and the Optimizers

Relational Database Index Design and the Optimizers. By Tapio Lahdenmaki and Mike Leach, Wiley 2005. (Here’s a link to the publisher’s site).

I picked this book up on the advice of an Oracle expert, and after one of my colleagues had read it and mentioned it to me. The focus is on how to design indexes that will produce the best performance for various types of queries. It goes into quite a bit of detail on how databases execute

  [Read more...]
Databases: Normalization or Denormalization. Which is the better technique?
This has long been debated: which approach is more performance oriented, normalized databases or denormalized databases? This article is a step on my part to figure out the right strategy, because neither of these approaches can be rejected outright. I will start off by discussing the pros and cons of both approaches. Pros and cons of a normalized database design: normalized databases fare very well under conditions where applications are write-intensive and the write-load is greater than the read-load. This is because of the following reasons: normalized tables are usually smaller and...
Table refactoring & application version upgrades, Part II

Continuing Table refactoring & application version upgrades, Part I, we now discuss code & database upgrades which require DROP operations. As before, we break apart the upgrade process into sequential steps, each involving either the application or the database, but not both.

As I’ll show, DROP operations are significantly simpler than creation operations. Interestingly, it’s the same as in life.


A column turns out to be redundant, unused. Before it is dropped from the database, we must ensure no one is using it anymore. The steps are:

  • App: V1 -> V2. Remove all references to column; make sure no queries use said column.
  • DB:
  •   [Read more...]
    Table refactoring & application version upgrades, Part I

    A developer’s major concern is: How do I do application & database upgrades with minimal downtime? How do I synchronize between a DB’s version upgrade and an application’s version upgrade?

    I will break down the discussion into types of database refactoring operations, and I will limit it to single-table refactoring. The discussion will try to understand the need for refactoring and will dictate the steps toward a successful upgrade.

    Reader prerequisites

    I will assume MySQL to be the underlying database. To take a major component out of the equation: we may need to deal with very large tables, for which an ALTER command may take long hours. I will assume familiarity with Master-Master (Active-Passive) replication, with possible use of MMM for MySQL.

      [Read more...]
    SQL: forcing single row tables integrity

    Single row tables are used in various cases. Such tables can be used for “preferences” or “settings”; for managing counters (e.g. summary tables), for general-purpose administration tasks (e.g. heartbeat table) etc.

    The problem with single row tables is that, well, they must have a single row. And the question is: how can you force them to have just one row?

    The half-baked solution

    The common solution is to create a PRIMARY KEY and always use the same value for that key. In addition, using REPLACE or INSERT ... ON DUPLICATE KEY UPDATE helps out when updating the row. For example:

    CREATE TABLE heartbeat (
     ts datetime NOT NULL

    The above table definition is taken from

      [Read more...]
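A minimal runnable sketch of the fixed-key idea above, using SQLite purely as a stand-in for MySQL: the key is always 1, INSERT OR REPLACE plays the role of MySQL’s REPLACE, and the CHECK constraint is an extra guard not present in the quoted table definition.

```python
import sqlite3

# Single-row "heartbeat" table: the PRIMARY KEY is a constant, so repeated
# writes replace the one existing row instead of accumulating rows.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE heartbeat (
        id INTEGER PRIMARY KEY CHECK (id = 1),  -- constant key forces one row
        ts TEXT NOT NULL
    )
""")

def beat(ts):
    # Always write to the same key; the row is inserted once, then replaced.
    conn.execute("INSERT OR REPLACE INTO heartbeat (id, ts) VALUES (1, ?)", (ts,))

beat("2010-01-01 00:00:00")
beat("2010-01-01 00:00:10")
count, ts = conn.execute("SELECT COUNT(*), MAX(ts) FROM heartbeat").fetchone()
print(count, ts)  # 1 2010-01-01 00:00:10
```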
    Reducing locks by narrowing primary key

    In a period of two weeks, I had two cases with the exact same symptoms.

    Database users were experiencing low responsiveness. DBAs were seeing locks occurring on seemingly normal tables. In particular, looking at Innotop, it seemed that INSERTs were causing the locks.

    In both cases, tables were InnoDB. In both cases, there was a PRIMARY KEY on the combination of all 5 columns. And in both cases, there was no clear explanation as for why the PRIMARY KEY was chosen as such.

    Choosing a proper PRIMARY KEY

    Especially with InnoDB, which uses clustered index structure, the PRIMARY KEY is of particular importance. Besides the fact that a bloated PRIMARY KEY bloats the entire clustered index and secondary keys (see:

      [Read more...]
    Monotonic functions, SQL and MySQL

    In mathematics, a monotonic function (or monotone function) is a function which preserves the given order. [Wikipedia]

    To be more precise, a function f is monotonic increasing if for every x ≤ y it holds that f(x) ≤ f(y). f is said to be strictly monotonic increasing if for every x < y it holds that f(x) < f(y).

    So, if we follow values in some order, we say that f is monotonic increasing if f’s value never decreases (it either increases or stays the same), and we say that f is strictly increasing if f’s value always changes “upwards”.
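These definitions translate directly into code; a small sketch with hypothetical sequences (Python used only to make the distinction concrete):

```python
def is_monotonic_increasing(values):
    """f is monotonic increasing if f(x) <= f(y) whenever x <= y."""
    return all(a <= b for a, b in zip(values, values[1:]))

def is_strictly_increasing(values):
    """Strictly increasing: f(x) < f(y) whenever x < y -- no repeats allowed."""
    return all(a < b for a, b in zip(values, values[1:]))

# An AUTO_INCREMENT column is strictly increasing; an insertion-time
# timestamp column is typically monotonic but not strictly so (duplicates).
assert is_monotonic_increasing([1, 2, 2, 5])
assert not is_strictly_increasing([1, 2, 2, 5])
assert is_strictly_increasing([1, 2, 3])
```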

    Monotonic functions play an important role in SQL. To discuss monotonic functions

      [Read more...]
    Beware of implicit casting

    Every so often a query yields a “bad” execution plan. Adding a missing index can often solve the problem. However, not everything can be solved with an index. I wish to highlight the issue of implicit casting, which negates the use of an index in MySQL.

    I see this happening a lot on customers’ databases, and this begs for a short introduction.

    MySQL doesn’t support indexes on functions

    Let’s assume the following table:

    CREATE TABLE `person` (
     `id` bigint(20) NOT NULL AUTO_INCREMENT,
     `first_name` varchar(32) CHARACTER SET utf8 DEFAULT NULL,
     `last_name` varchar(32) CHARACTER SET utf8 DEFAULT NULL,
     `driver_license_registration` bigint(20) DEFAULT NULL,
     PRIMARY KEY (`id`),
     KEY `last_name` (`last_name`),
     KEY `driver_license_registration` (`driver_license_registration`)
      [Read more...]
    Useful temporal functions & queries

    Here’s a compilation of some common and useful time & date calculations and equations. Some, though very simple, are often misunderstood, leading to inefficient or incorrect implementations.

    There are many ways to solve such problems. I’ll present my favorites.

    Querying for time difference

    Given two timestamps: ts1 (older) and ts2 (newer), how much time has passed between them?

    One can use TIMEDIFF() & DATEDIFF(), or compare two UNIX_TIMESTAMP() values. My personal favorite is to use TIMESTAMPDIFF(). Reason being that I’m usually interested in a specific metric, like the number of hours which have passed, or the number of days, disregarding the smaller minute/second resolution. Which allows one to:

      [Read more...]
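The TIMESTAMPDIFF() approach described above can be mirrored outside SQL as a sanity check; a sketch with hypothetical timestamps (whole hours elapsed, smaller units truncated, matching TIMESTAMPDIFF(HOUR, ...) for ts1 ≤ ts2):

```python
from datetime import datetime

def hours_between(ts1, ts2):
    # Whole hours elapsed between two 'YYYY-MM-DD HH:MM:SS' timestamps,
    # truncating the minute/second remainder -- in the spirit of
    # MySQL's TIMESTAMPDIFF(HOUR, ts1, ts2).
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(ts2, fmt) - datetime.strptime(ts1, fmt)
    return int(delta.total_seconds() // 3600)

print(hours_between("2010-01-01 10:00:00", "2010-01-02 09:30:00"))  # 23
```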
    How (not) to find unused indexes

    I've seen a few people link to an INFORMATION_SCHEMA query that finds indexes with low cardinality, in an effort to determine which indexes should be removed.  This method is flawed - here's the first reason why:

    CREATE TABLE `sales` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `customer_id` int(11) DEFAULT NULL,
      `status` enum('archived','active') DEFAULT NULL,
      PRIMARY KEY (`id`),
      KEY `status` (`status`)

    mysql> SELECT count(*), STATUS FROM sales GROUP BY STATUS;
    +----------+---------+
    | count(*) | STATUS  |

      [Read more...]
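To illustrate why the low-cardinality check above is flawed: with a skewed distribution (hypothetical numbers below), an index of cardinality 2 can still be highly selective for the rare value.

```python
from collections import Counter

# Hypothetical skew: 'archived' dominates, 'active' is rare. A cardinality-2
# index on status looks "useless" to the flawed check, yet it is very
# selective for the query "WHERE status = 'active'".
statuses = ["archived"] * 99_000 + ["active"] * 1_000
counts = Counter(statuses)

cardinality = len(counts)                        # 2 -- flagged as "low"
selectivity_active = counts["active"] / len(statuses)
print(cardinality, selectivity_active)  # 2 0.01
```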
    Comparison Between Solr And Sphinx Search Servers (Solr Vs Sphinx – Fight!)

    In the past few weeks I've been implementing advanced search at Plaxo, working quite closely with Solr enterprise search server. Today, I saw this relatively detailed comparison between Solr and its main competitor Sphinx (full credit goes to StackOverflow user mausch who had been using Solr for the past 2 years). For those still confused, Solr and Sphinx are similar to MySQL FULLTEXT search, or for those even more confused, think Google (yeah, this is a bit of a stretch, I know).


    • Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
    • Both
      [Read more...]
    Better Primary Keys, a Benefit to TokuDB’s Auto Increment Semantics

    In our last post, Bradley described how auto increment works in TokuDB. In this post, I explain one of our implementation’s big benefits, the ability to combine better primary keys with clustered primary keys.

    In working with customers, the following scenario has come up frequently. The user has data that is streamed into the table, in order of time. The table will have a primary key that is an auto increment field, ‘id’, and then have an index on the field ‘time’. The queries the user does are all on some range of time (e.g. select sum(clicks) from foo where time > date ‘2008-12-19’ and time

    For storage engines with clustered primary keys (such as TokuDB and InnoDB), having such a schema hurts query

      [Read more...]
    MySQL 5.1 Grammar Changes to Support Clustering Indexes

    This post is for storage engine developers who may be interested in implementing multiple clustering keys.

    After blogging about TokuDB’s multiple clustering indexes feature, Baron Schwartz suggested we contribute the patch to allow other storage engines to implement the feature. We filed a feature request with MySQL to support this, along with a proposed patch. The patch, along with known issues, can be found here.

    What the patch contains:

    This patch has the changes necessary to introduce the new grammar for clustering indexes, and to tell the storage engine what indexes are defined as clustering. With

      [Read more...]
    How clustering indexes sometimes help UPDATE and DELETE performance

    I recently posted a blog entry on clustering indexes, which are good for speeding up queries.  Eric Day brought up the concern that clustering indexes might degrade update performance. This is often true, since any update will require updating the clustering index as well.

    However, there are some cases in TokuDB for MySQL, where the opposite is true: clustering indexes can drastically improve the performance of updates and deletions.  Consider the following analysis and example.

    Updates and deletions generally have two steps. First, a query is done to find the necessary rows (the where clause in the statement), and then the rows are modified: they are deleted or updated, as the case may be. So the total time to do a deletion

      [Read more...]
    Long Index Keys

    In this post we’ll describe a query that accrued significant performance advantages from using a relatively long index key.  (This posting is by Zardosht and Bradley.)

    We ran across this query recently when interacting with a customer (who gave us permission to post this sanitized version of the story):

    SELECT name,
           Count(e2) AS CountOfe2
     FROM (SELECT distinct name, e2
            FROM (SELECT atable.NAME AS name,
                         pd1.NAME AS e2
                   FROM atable INNER JOIN atable AS pd1
                   ON  (atable.id = pd1.id)
                   AND (atable.off = pd1.off)
                   AND (atable.len = pd1.len)) ent
     WHERE ((ent.name<>ent.e2))) outside
     GROUP BY outside.name order by CountOfe2 desc;

    With a table defined as

    CREATE TABLE `atable` (
       `id` varchar(25) DEFAULT NULL,
       `off` bigint(20)

      [Read more...]
    Clustering indexes vs. Covering indexes

    Yesterday, I (Zardosht) posted an entry introducing clustering indexes.  Here, I elaborate on three differences between a clustering index and a covering index:

  • Clustering indexes allow indexes that would otherwise bounce up against the limits on the maximum length and maximum number of columns in a MySQL index.

  • Clustering indexes simplify syntax, making them easier and more intuitive to use.

  • Clustering indexes have smaller key sizes, leading to better performance.

    Expanding MySQL’s Limits

    MySQL allows at most 16 columns in an index and at most 3072 bytes per index. For tables that have more than 16 columns or a row size greater than 3072 bytes, one cannot create a

      [Read more...]
    SQL: finding a user’s country/region based on IP

    I’ve encountered the same problem twice for different customers, so I guess it’s worth a discussion.

    A common task for web applications is to find out the country/region of a user, based on her IP address, as can be detected in the HTTP request. Depending on the country of origin, the website can translate dates for different time zones, can change locale settings, and, perhaps most commonly, show advertisements in her native language.

    To start with, there’s a table which lists the IP ranges per country/region. Let’s assume we’re only dealing with IPv4:

    CREATE TABLE regions_ip_range (
      regions_ip_range_id INT UNSIGNED AUTO_INCREMENT,
      country VARCHAR(64) CHARSET utf8,
      region VARCHAR(64) CHARSET utf8,
      start_ip INT UNSIGNED,
      end_ip INT UNSIGNED,
      PRIMARY KEY(regions_ip_range_id),
      [Read more...]
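The lookup such a table supports can be sketched outside SQL as well; a sketch with hypothetical, non-overlapping ranges sorted by start_ip, and an INET_ATON-style dotted-quad-to-integer conversion done in Python:

```python
import bisect

def ip_to_int(dotted):
    # Same mapping as MySQL's INET_ATON(): big-endian unsigned 32-bit integer.
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# Hypothetical rows of regions_ip_range: (start_ip, end_ip, country).
ranges = [
    (ip_to_int("10.0.0.0"),    ip_to_int("10.0.0.255"),     "CountryA"),
    (ip_to_int("192.168.0.0"), ip_to_int("192.168.255.255"), "CountryB"),
]
starts = [r[0] for r in ranges]

def lookup(dotted):
    # Find the last range starting at or before the IP, then verify end_ip.
    ip = ip_to_int(dotted)
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and ip <= ranges[i][1]:
        return ranges[i][2]
    return None

print(lookup("192.168.1.7"))  # CountryB
print(lookup("8.8.8.8"))      # None
```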
    The depth of an index: primer

    InnoDB and MyISAM use B+ trees and B-trees, respectively, for indexes (InnoDB also maintains an internal adaptive hash index).

    In both these structures, the depth of the index is an important factor. When looking for an indexed row, a search is made on the index, from root to leaves.

    Assuming the index is not in memory, the depth of the index represents the minimal cost (in I/O operations) for an index-based lookup. Of course, most of the time we expect large portions of the indexes to be cached in memory. Even so, the depth of the index is an important factor. The deeper the index is, the worse it performs: there are simply more lookups on index nodes.

    What affects the depth of an index?

    There are quite a few structural issues, but it boils down to two important factors:

  • The number of rows in the table: obviously, more rows lead to a larger index, and larger indexes grow in depth.
  •   [Read more...]
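The factors above combine into a back-of-the-envelope depth estimate; a sketch in which the fanout figures are hypothetical (real fanout depends on page size and key width, which is why a bloated key deepens the tree):

```python
def btree_depth(rows, fanout):
    # Rough B+-tree depth estimate: each level multiplies the number of
    # reachable entries by the fanout (keys per node), so we grow levels
    # until the tree can hold all rows.
    depth, capacity = 1, fanout
    while capacity < rows:
        depth += 1
        capacity *= fanout
    return depth

# A wider PRIMARY KEY means fewer keys per page (smaller fanout),
# hence a deeper tree for the same row count:
print(btree_depth(1_000_000, 100))  # 3
print(btree_depth(1_000_000, 10))   # 6
```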
    Useful database analysis queries with INFORMATION_SCHEMA

    A set of useful queries on INFORMATION_SCHEMA follows. These queries can be used when approaching a new database, to learn about some of its properties, or they can be regularly used on an existing schema, so as to verify its integrity.

    I will present queries for:

    • Checking on database engines and size
    • Locating duplicate and redundant indexes
    • Checking on character sets for columns and tables, looking for variances
    • Checking on processes and long queries (only with MySQL 5.1)


    The following query returns the total size per engine per database. For example, it is common that in a given database, all tables are InnoDB. But once in a while, and even though default-storage-engine is set to InnoDB, someone creates a MyISAM table. This may break transactional behavior, or may cause a mysqldump --single-transaction to be ineffective.

      [Read more...]
    Two storage engines; different plans, Part II
    In Part I of this article, we have seen how the internal structure of the storage engine’s index can affect an execution plan. We’ve seen that some plans are inherent to the way engines are implemented. We wish to present a second scenario in which execution plans vary for different storage engines. Again, we will consider [...]
    Two storage engines; different plans, Part I
    A question that keeps popping up is: “Can an execution plan change for different storage engines?” The answer is “Yes”. I will present two such cases, where the MySQL optimizer will choose different execution plans, based on our choice of storage engine. We will consider MyISAM and InnoDB, the two most popular engines. The two differ in many respects, and [...]
    Two basic indexing tips ...
    Here are two basic tips for proper indexing ...

    Don't mess with datatypes: too often people define an attribute as one datatype in one table and as another datatype in other tables. This actually prevents index usage in joins (forget about FKs for this time ;)). See an example here. You could declare a function-based index as a workaround, but why don't we all try to make it right?

    Put
    Progress report on High Performance MySQL, Second Edition

    It's been a while since I've written about progress on the book. I actually stopped working on it as much at the beginning of the month, because on October 31st I managed to finish a first draft of the last big chapter! Now I'm back to full-time work at my employer, and I'm working on the book in the evenings and weekends only. Read on for details of what I've been working on and what's next in the pipeline.

    High Performance MySQL, Second Edition: Schema Optimization and Indexing

    I've been trying to circle back and clean up things I left for later in several chapters of High Performance MySQL, second edition. This includes a lot of material in chapter 4, Schema Optimization and Indexing. At some point I'll write more about the process of writing this book, and what we've done well and what we've learned to do better, but for right now I wanted to complete the picture of what material we have on schema, index, and query optimization. The last two chapters I've written about (Query Performance Optimization and Advanced

      [Read more...]
    Sphinx - Open Source SQL Full Text Search Engine

    I came across Sphinx today via the MySQL Performance Blog (which has some good entries you might want to check out). It is an Open Source Full Text SQL Search Engine. It can be installed as a storage engine type on MySQL, and from what I hear can beat the pants off of MySQL's built-in full text search in some cases.

    From the web site:

    Generally, it's a standalone search engine, meant to provide fast, size-efficient and relevant fulltext search functions to other applications. Sphinx was specially designed to integrate well with SQL databases and scripting languages. Currently
      [Read more...]
    MySQL FULLTEXT Indexing and Searching

    MySQL has supported FULLTEXT indexes since version 3.23.23. VARCHAR and TEXT columns that have been indexed with FULLTEXT can be searched with special SQL syntax that performs the full text search in MySQL.

    To get started you need to define the FULLTEXT index on some columns. Like other indexes, FULLTEXT indexes can contain multiple columns. Here's how you might add a FULLTEXT index to some table columns:

    ALTER TABLE news ADD FULLTEXT(headline, story);

    Once you have a FULLTEXT index, you can search it using MATCH ... AGAINST. For example:

    SELECT headline, story FROM news
    WHERE MATCH (headline,story) AGAINST ('Hurricane');

    The result of this query is automatically sorted by relevancy.



      [Read more...]

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.