I use the Aspersa tools a lot, and I find myself going to the website just to download one of them all the time. I love that I can download Maatkit with a simple wget maatkit.org/get/tool command, so I made bit.ly shortcuts for all of the current Aspersa tools. Here’s the full list, with my favorite at the top and my least favorite (but nonetheless very useful) at the bottom:
I recently worked on a customer case where, at seemingly random times, inserts would fail with InnoDB error 139. This is a rather simple problem, but due to its nature, it may only affect you after your system has been running in production for a while.
Suppose you have the following table structure:
CREATE TABLE example (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  fname TEXT NOT NULL,
  fcomment TEXT,
  ftitle TEXT NOT NULL,
  fsubtitle TEXT NOT NULL,
  fcontent TEXT NOT NULL,
  fheader TEXT,
  ffooter TEXT,
  fdisclaimer TEXT,
  fcopyright TEXT,
  fstylesheet TEXT,
  fterms TEXT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
Now you insert some test data into it:
mysql> INSERT INTO example
-> VALUES (
-> 'First example',
-> 'First comment',
-> 'First title',
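For context: error 139 is InnoDB's "row size too large" error. A back-of-the-envelope sketch of why a table like the one above eventually trips it, assuming the classic Antelope/COMPACT row format (which stores a 768-byte prefix of each long TEXT column in the row itself) and the roughly 8126-byte row limit for the default 16KB page:

```python
# Rough arithmetic behind InnoDB error 139 for the example table,
# assuming COMPACT/Antelope row format: each sufficiently long TEXT
# column keeps a 768-byte prefix stored in the row itself.
TEXT_PREFIX = 768   # bytes stored in-row per long TEXT/BLOB column
ROW_LIMIT = 8126    # approx. max row size: about half a 16KB page
text_columns = 11   # fname, fcomment, ftitle, ..., fterms

worst_case = text_columns * TEXT_PREFIX
print(worst_case)               # 8448
print(worst_case > ROW_LIMIT)   # True -> inserts with long values fail
```

This is why the failure looks random: short values stay under the limit, and only rows where enough of the TEXT columns are long overflow it.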
I don’t use many tools in my consulting practice, but for the ones I do, I try to know them as well as I can. I’ve been using mk-query-digest for almost as long as it has existed, but it continues to surprise me in ways I couldn’t have imagined. This time I’d like to share a quick tip on how mk-query-digest lets you slice your data in a completely different way than it would by default.
Disclaimer: this only works when persistent connections or connection pools aren’t used, and it is only accurate when a single MySQL connection is used during the execution of a request.
If you are seeking to reduce the load on the database server and [as a result] improve response times for some random user requests, you are usually[Read more...]
Data access is expensive for your application. It often requires CPU, network and disk access, all of which can take a lot of time. Using less computing resources, particularly in the cloud, results in decreased overall operational costs, so caches provide real value by avoiding using those resources. You need an efficient and reliable cache in order to achieve the desired result. Your end users also care about response times because this affects their work productivity or their enjoyment of your service. This post describes some of the most common cache methods for MySQL.
When the query cache is enabled, MySQL examines each query to see if the contents have been stored in the query cache. If the results have been cached they are[Read more...]
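The query cache is only one of those methods; application-side caching follows the same look-aside idea. A minimal sketch of the cache-aside pattern in plain Python (the names and the `run_query` callable are hypothetical, not from the post):

```python
# Minimal cache-aside sketch: check the cache first, fall back to the
# database on a miss, and store the result for subsequent readers.
cache = {}

def cached_query(sql, run_query):
    """run_query is a hypothetical callable that executes sql against MySQL."""
    if sql not in cache:              # miss: do the expensive work once
        cache[sql] = run_query(sql)
    return cache[sql]                 # hit: served from memory

calls = []
def fake_db(sql):
    calls.append(sql)                 # record how often the "database" is hit
    return [("row1",)]

cached_query("SELECT 1", fake_db)
cached_query("SELECT 1", fake_db)
print(len(calls))  # 1 -> the second call was a cache hit
```

Like the query cache, this only pays off if reads repeat and invalidation is handled; a real deployment would typically put memcached or similar behind the same interface.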
Have you ever seen BIG weird numbers in an mk-query-digest report that just seem wrong? I have! Here’s one report I got today:
...
# Attribute    total   min     max     avg     95%     stddev  median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time    5088s   1us     171s    2ms     467us   104ms   28us
# Lock time    76s     0       3s      26us    69us    3ms     0
# Rows sent    9.80M   0       1.05M   3.50    0.99    642.32  0
# Rows examine 5.59G   0       82.56M  2.00k   0.99    97.41k  0
# Rows affecte 457.30k 0       2.62k   0.16    0.99    1.68    0
# Rows read    2.16G   0       82.56M  788.53  21.45   82.91k  0.99
# Bytes sent   2.14T   0       4.00G   781.27k 3.52k   47.84M  84.10
#[Read more...]
If you know me, then you probably have heard of Flexviews. If not, then it might not be familiar to you. I’m giving a talk on it at the MySQL 2011 CE, and I figured I should blog about it before then. For those unfamiliar, Flexviews enables you to create and maintain incrementally refreshable materialized views.
You might be asking yourself “what is an incrementally refreshable materialized view?”. If so, then keep reading. This is the first in a multi-part series describing Flexviews.
You can find part 2 of the series here:
UPDATE 2011-03-15 12:50 Pacific – This post may already be out of date, am testing with a fresh snapshot of the source and will update accordingly. (Thanks, Andy)
Several weeks ago at Percona Live 2011, I presented on HandlerSocket. Due to time constraints, I had to omit some more in-depth discussions in favor of presenting a broad overview. One of those discussions was about how, exactly, HandlerSocket does not play nicely with auto increment columns. So I wanted to take the time here to show how they behave together versus how we might expect them to.
Let’s take, for example, a standard setup using a standard SQL_MODE (not using anything like[Read more...]
Recent Tips and News on Java EE 6 & GlassFish:
• How to configure GlassFish 3 + ColdFusion 9 and IIS7 (Benjamin Wong)
• Can stateful Java EE 6 Apps scale? (Adam Bien)
• (Markus Eisele @ JavaLobby)
• JSF and Java EE Newscast - Episode 01 (Kito Mann)
• Sending email to your SSL/TLS enabled SMTP server
MySQL keeps many different files; some contain real data, some contain metadata. Which ones are important? Which can you throw away?
This is my attempt to create a quick reference to all the files used by MySQL: what's in them, what you can do if they are missing, and what you can do with them.
When I was working for Dell doing Linux support, my first words to a customer were “DO YOU HAVE A COMPLETE AND VERIFIED BACKUP?” Make one now, before you think about doing anything I suggest here.
You should always try to manage your data through a MySQL client. If things have gone very bad, this may not be possible: MySQL may not start. If your file system gets corrupted, you may have missing files. Sometimes people create other files in the MySQL directory (a bad idea). This should help you understand what is safe to remove.
Before you try to work with one of[Read more...]
This post started with a simple question: “Does the function NOW() get executed every time a row is examined?” According to the manual, “Functions that return the current date or time are each evaluated only once per query …. multiple references to a function such as NOW() … produce the same result. …. (However,) as of MySQL 5.0.12, SYSDATE() returns the time at which it executes.”
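The distinction is easy to model: NOW() is pinned to statement start, while SYSDATE() reads the clock on every call. A small Python simulation of the two evaluation semantics (not MySQL code, just an illustration of the behavior the manual describes):

```python
import time

# Simulate the documented semantics: NOW() is fixed at statement start,
# SYSDATE() reads the clock each time it is called.
statement_start = time.time()

def now():
    return statement_start      # same value for every row in the statement

def sysdate():
    return time.time()          # re-evaluated on every call

first = sysdate()
time.sleep(0.01)                # pretend a slow row scan happens here
assert now() == now()           # NOW() never moves within the statement
assert sysdate() > first        # SYSDATE() keeps advancing
```

This is also why SYSDATE() is unsafe for statement-based replication: the slave re-executes the statement at a different time and gets a different value.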
Inspired by Baron's earlier post, here is one I hear quite frequently:
"If you enable innodb_file_per_table, each table is its own .ibd file. You can then relocate the heavily hit tables to a different location and create symlinks to the original location."
There are a few things wrong with this advice:
mysql -B -N -e 'show databases'
-B forces batch mode (no box of dashes around the data), while -N gets the result without the [Read more...]
Before I start the story about the data recovery case I worked on yesterday, here's a quick tip: having a database backup does not mean you can restore from it. Always verify your backup can be used to restore the database! If not automatically, do it manually, at least once a month. No, seriously: in most of the recovery cases I have worked on, customers did have some sort of backup, but it just wasn't working, wasn't complete, or what not. Someone had set it up and never bothered to check whether it still worked after a while.
Anyway, this post is not really about backups, but rather about a few interesting things I learned during the last recovery case.
First, some facts about the system and how data was lost:
I have a 5G mysqldump which takes 30 minutes to restore from backup. That means that when the database reaches 50G, it should take 30x10=5 hours to restore. Right? Wrong.
Mysqldump recovery time is not linear. Bigger tables, or tables with more indexes, will always take more time to restore.
If I restore from a raw backup (LVM snapshot, xtrabackup, innodb hot backup), it is very easy to model how much longer recovery time will take:
Backup is 80G
Copy is at 70MB/s.
10G is already complete.
= ((80-10) * 1024)/70/60 = ~17 minutes
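The arithmetic above is just remaining data divided by copy rate. A quick sketch of that estimate as a reusable function (the function name is mine, not from the post):

```python
# ETA model for restoring a raw backup:
# (remaining GB converted to MB) / copy rate / seconds-per-minute
def restore_eta_minutes(total_gb, done_gb, rate_mb_per_s):
    remaining_mb = (total_gb - done_gb) * 1024
    return remaining_mb / rate_mb_per_s / 60

# The example from the post: 80G backup, 10G done, copying at 70MB/s.
print(round(restore_eta_minutes(80, 10, 70)))  # 17
```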
I can tell progress with mysqldump by monitoring the rate at which SHOW GLOBAL STATUS LIKE 'Handler_write'; increases and comparing it to my knowledge of roughly how many rows are in each table. But progress != a magic number like "17[Read more...]
The title is SEO bait – you can’t do it. We’ve seen a few recurring patterns that try to achieve something similar, and I thought I would share my favorite two:
Option #1: Use a table to insert into, and grab the insert_id:
CREATE TABLE option1 (
  id int not null primary key auto_increment
) engine=innodb;

# each insert does one operation to get the value:
INSERT INTO option1 VALUES (NULL); # $connection->insert_id();
Option #2: Use a table with just one row:
CREATE TABLE option2 (
  id int not null primary key
) engine=innodb;
INSERT INTO option2 VALUES (1); # start from 1

# each insert does two operations to get the value:
UPDATE option2 SET id=@id:=id+1;
SELECT @id;
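Option #2 is, in essence, a single shared counter protected by a lock: the UPDATE takes a row lock, increments, and the session variable carries the value out. A hedged sketch of those semantics in plain Python (the class is my illustration, with a thread lock playing the role of InnoDB's row lock):

```python
import threading

# Option #2 in miniature: one "row" holding the current value,
# incremented under a lock so concurrent clients never get duplicates.
class Sequence:
    def __init__(self, start=1):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        with self._lock:        # plays the role of InnoDB's row lock
            self._value += 1    # UPDATE option2 SET id=@id:=id+1
            return self._value  # SELECT @id

seq = Sequence()
print(seq.next(), seq.next())  # 2 3
```

The lock is also the drawback: every client serializes on that one row, which is exactly the contention concern with this pattern.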
So which is better? I don’t think it’s that easy to tell at a[Read more...]
Recent Tips and News on Java EE 6 & GlassFish:
I occasionally get into light arguments (ahem, healthy discussions) with students about whether or not to use foreign key constraints on InnoDB tables. My standard response has always been: “It depends on how much of a tradeoff you are willing to make for performance. In some situations the cost can be considerable.”
… that’s when they expect me to “come up with some real proof” to show them. I do not disagree with their logic, or with proof being on their list of demands. I support the use of data to make decisions.
The problem is that MySQL has (traditionally) been lacking the instrumentation required to make these decisions easy. This is getting better – here is an example we recently added to our[Read more...]
A few days ago I was working on a case where we needed to modify a lot of data before pushing it to Sphinx. MySQL did not have a function to do the job, so I thought I'd write a MySQL stored function and we'd be good to go. It worked! But not so well, really: building the index, which had been taking 10 minutes, was now taking 16 minutes. Then we added another MySQL function for a different set of attributes, and indexing time went from 16 minutes to 26 minutes. I knew using a UDF would be faster, but I had no idea by how much. Have you ever wondered?
So what were the modifications we needed? It was a couple of very simple things – (1) two varchar columns needed leading[Read more...]
In my business (weather) we use lots of map-based (geo) information. Almost every table has a latitude and longitude. Working with this kind of data can be exciting and frustrating. This should give you a quick start with GIS in MySQL.
“A geographic information system (GIS), or geographical information system, is any system that captures, stores, analyzes, manages, and presents data that are linked to location. In the simplest terms, GIS is the merging of cartography, statistical analysis, and database technology. GIS systems are used in cartography, remote sensing, land surveying,[Read more...]
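Most latitude/longitude work eventually needs a distance calculation. A sketch of the great-circle (haversine) formula that those columns feed into, in plain Python rather than MySQL's spatial functions (the coordinates below are approximate and only for illustration):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Roughly 1187 km from New York (JFK) to Chicago (ORD), approximate coordinates.
print(round(haversine_km(40.64, -73.78, 41.98, -87.90)))
```

In SQL you would typically precompute or index bounding boxes and only run the exact formula on the candidates, since a full-table trigonometric scan is expensive.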
If you read Percona's whitepaper on Goal-Driven Performance Optimization, you will notice that we define performance using the combination of three separate terms. You really want to read the paper, but let me summarize it here:
Setting and meeting your response time goal should always be your primary focus, but the closer throughput is to capacity the worse response time can be. It's[Read more...]
MySQL seems happy to convert types for you. Developers are rushed to complete their projects, and if the function works they just move on. But what is the cost of mixing your types? Does it matter if you are running across a million rows or more? Let's find out.
Here is what the programmers see.
mysql> select 1+1;
+-----+
| 1+1 |
+-----+
|   2 |
+-----+
1 row in set (0.00 sec)

mysql> select "1"+"1";
+---------+
| "1"+"1" |
+---------+
|       2 |
+---------+
1 row in set (0.00 sec)
What if we do a thousand simple loops? How long does the looping itself take?
The BENCHMARK() function executes an expression repeatedly, which makes it handy for timing this kind of thing.
I recently had a run-in with a very popular PHP ecommerce package which makes me want to voice a recurring mistake I see in how many web applications are architected.
What is that mistake?
The ecommerce package I was working with depended on caching. Out of the box it couldn't serve 10 pages/second unless I enabled some features which were designed to be "optional" (but clearly they weren't).
I think with great tools like memcached it is easy to get carried away and use it as the mallet for every performance problem, but in many cases it should not be your first choice. Here is why:
While many people are familiar with the MySQL EXPLAIN command, fewer people are familiar with "extended EXPLAIN", which was added in MySQL 4.1.
EXPLAIN EXTENDED can show you what the MySQL optimizer does to your query. You might not know this, but MySQL can dramatically change your query before it actually executes it. This process is called query rewriting, and it is a part of any good SQL optimizer. EXPLAIN EXTENDED adds a warning message to the EXPLAIN output which displays additional information, including the rewritten query.
To take a look at EXPLAIN EXTENDED, I'll start with three empty tables. It is important to note that the tables are empty because the MySQL optimizer treats empty tables (and incidentally, tables with only one row) differently than tables which contain more than one row. More about that in a[Read more...]
Today, I was looking for a quick way to see HTTP response codes of a bunch of urls. Naturally, I turned to the curl command, which I would usually use like this:
curl -IL "URL"
This command would send a HEAD request (-I), follow through all redirects (-L), and display some useful information in the end. Most of the time it's ideal:
curl -IL "http://www.google.com"

HTTP/1.1 200 OK
Date: Fri, 11 Jun 2010 03:58:55 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
Server: gws
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
However, the server I was curling didn't support HEAD requests explicitly. Additionally, I was really only interested in HTTP status codes and not in the rest of the output. This means I would have to change my strategy and[Read more...]
Here are some tips that have been recently published on Java EE 6 & GlassFish:[Read more...]
MySQL supports two different algorithms for views: the MERGE algorithm and the TEMPTABLE algorithm. These two algorithms differ greatly. A view which uses the MERGE algorithm can merge filter conditions into the view query itself. This has significant performance advantages over TEMPTABLE views. A view which uses the TEMPTABLE algorithm will have to compute the entire contents of the view before any filter is applied. Computing the entire contents requires a temporary table, and many more rows may be accessed than would be the case had the filter been merged into the view query.
As long as a view avoids aggregation, DISTINCT, GROUP BY, ORDER BY, etc, then it can use the MERGE algorithm. Unfortunately, this means that views of any significant complexity will almost always use the TEMPTABLE algorithm.
This blog post[Read more...]
If your MySQL server has hundreds of clients (applications) and tens of thousands of queries per second, the MySQL default network settings may NOT be right for you. Network performance is not often a significant factor in MySQL performance. That said, there are things to consider.
If you are building new applications, make these changes now. Developer expectations are hard to change. My example below will break your application if developers open a database connection and then spend ten[Read more...]
MySQL 5.1 introduces row-based binary logging. In fact, the default binary logging format in GA versions of MySQL 5.1 is STATEMENT (early 5.1 releases briefly defaulted to MIXED). The binlog_format variable can still be changed per session, which means it is possible that some of your binary log entries will be written in a row-based fashion instead of the actual statement which changed the data, even when the global setting on the master is to write binary logs in statement mode. The row-based format does offer advantages, particularly if triggers or stored procedures are used, or if nondeterministic functions like RAND() are used in DML statements.
A statement-based replication slave can get out of sync with the master fairly easily, especially if data is changed on the slave. It is possible for a statement to execute successfully on a slave even if the data is not 100% in[Read more...]