I started this as a response to Keith Murphy’s post at http://www.paragon-cs.com/wordpress/?p=54, but it
got long, so it deserves its own post. The basic context is figuring out how to avoid ending up with duplicate data if a large INSERT statement fails before finishing.
Firstly, the surefire way to make sure there are no duplicates if
you have a unique (or primary) key is to use INSERT IGNORE
INTO.
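As a quick sketch (the widgets table here is invented purely for illustration), INSERT IGNORE skips any row that would violate the unique key instead of failing, so re-running the load can't create duplicates:

CREATE TABLE widgets (
  id   INT NOT NULL,
  name VARCHAR(50),
  PRIMARY KEY (id)
);

-- Safe to re-run after a partial failure: rows whose id already exists
-- are silently skipped instead of raising a duplicate-key error.
INSERT IGNORE INTO widgets (id, name)
VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');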
Secondly, I just experimented with adding an index to an InnoDB
table that had 1 million rows, and here’s what I got (please
note, this is one experience only, the plural of “anecdote” is
*not* “data”; also I did this in this particular order, so there
may have been caching taking place): More »
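(For context, the statement being timed is just an index add on the existing table; the table and column names below are hypothetical, and the actual timings are in the full post. On the built-in InnoDB of that era, this rebuilds the whole table, which is why timing it on a million rows is interesting.)

ALTER TABLE big_table ADD INDEX idx_customer (customer_id);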
So, O’Reilly’s ONLamp.com has published the “Top 10 MySQL Best
Practices” at http://www.onlamp.com/pub/a/onlamp/2002/07/11/MySQLtips.html.
Sadly, I find most “best practice” lists do not explain the “why” well enough for people to make their own decisions.
For instance, #3 is “Protect the MySQL installation directory
from access by other users." I was intrigued by what they would consider the "installation" directory. Reading the tip, it turns out they actually mean the data directory. They say nothing of the log directory, nor that InnoDB data files may be in different places than the standard MyISAM data directories. More »
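To make the point concrete, these standard system variables show that the data files, the InnoDB files, and the logs can each live in a different place, so protecting a single "installation" directory may not be enough:

SHOW VARIABLES LIKE 'datadir';
SHOW VARIABLES LIKE 'innodb_data_home_dir';
SHOW VARIABLES LIKE 'innodb_data_file_path';
SHOW VARIABLES LIKE 'innodb_log_group_home_dir';
SHOW VARIABLES LIKE 'log_error';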
I’ve recently done a quick scalability test with MySQL 5.0 and 5.1 to check the new auto-inc patch with InnoDB and to see how MySQL 5.1 scales with InnoDB:
New in MySQL 5.1: innodb_autoinc_lock_mode = 1
(“consecutive” lock mode)
With this lock mode, “simple inserts” (only) use a new locking model where a light-weight mutex is used during the allocation of auto-increment values, and no AUTO-INC table-level lock is used, unless an AUTO-INC lock is held by another transaction. If another transaction does hold an AUTO-INC lock, a “simple insert” waits for the AUTO-INC lock, as if it too were a “bulk insert.”
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html#innodb-auto-increment-configurable
This fixes the …
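For reference, innodb_autoinc_lock_mode is read-only at runtime, so it has to be set at server startup (e.g. under [mysqld] in my.cnf); from SQL you can only inspect the current value:

SHOW VARIABLES LIKE 'innodb_autoinc_lock_mode';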
aka… “when good queries go bad!”
So, today the developers were debugging why a script was running much longer than expected. They were inserting text into the database, and got to the point where they realized that doubling the amount of text doubled the time the queries took.
You see, they were issuing similar text inserts over and over, instead of using connection pooling and/or batching them.
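A minimal sketch of the difference (the notes table and its body column are made up for illustration):

-- One statement per row: a network round trip and a parse for every row.
INSERT INTO notes (body) VALUES ('first block of text');
INSERT INTO notes (body) VALUES ('second block of text');

-- Batched: many rows in a single statement, one round trip.
INSERT INTO notes (body) VALUES
  ('first block of text'),
  ('second block of text'),
  ('third block of text');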
Apparently the other DBA explained that it was a limitation of
MySQL, but either the developers didn’t convey what they were
doing well, or the DBA didn’t think to mention batching. More »
I'm working on alternative strategies to make the use and maintenance of a multi-terabyte data warehouse implementation tolerably fast. For example, it's clear that a reporting query on a 275-million row table is not going to be fun by anyone's definition, but that for most purposes, it can be pre-processed to various aggregated tables of significantly smaller sizes.
However, what is not obvious is what would be the best strategy for creating those tables. I'm working with MySQL 5.0 and Business Objects' Data Integrator XI, so I have a couple of options.
I can just CREATE TABLE ... SELECT ... (sketched below) to see how things work out. This approach is simple to try, but essentially unmaintainable; no good.
I can define the process as a BODI data flow. This is good in many respects, as it creates a documented flow of how the aggregates are updated, is fairly easy to hook up to the workflows which pull in new data from source systems, and …
[Read more]
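To make the first option above concrete (the fact table, columns, and aggregate name are all invented for illustration), it boils down to a single statement such as:

CREATE TABLE sales_daily_agg
SELECT sale_date,
       store_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS sale_count
FROM   sales_fact
GROUP  BY sale_date, store_id;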
If you're using MySQL with the MyISAM engine for a high-writes
application, delay_key_write is usually very good.
Let me explain further: delay_key_write is a table option which causes the database NOT to flush the MyISAM key file after every write. This is a really good thing: if you're doing another write very soon anyway, flushing after each one just wastes I/O time.
This doesn't sound like a good idea, right? Doesn't it mean that if the power fails (or mysql crashes, or something), you'll be left with a broken index file? No, it's still a very good idea:
- delay_key_write does NOT appear to affect the MyISAM data file - that will still be flushed according to the normal policy.
- If you had delay_key_write off, then a power failure or crash during the index write would cause the same level of corruption.
- Broken MyISAM index files need to be rebuilt, regardless of how little …
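For reference, the option can be set per table (the hits table name below is hypothetical), and there is also a server-wide delay_key_write variable:

-- Per-table: turn on delayed key writes for one MyISAM table.
ALTER TABLE hits DELAY_KEY_WRITE = 1;

-- Server-wide: ON honours the per-table option, OFF ignores it,
-- and ALL delays key writes for every MyISAM table.
SHOW VARIABLES LIKE 'delay_key_write';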
As a quick follow up to the last post on malloc() speed comparisons, I decided to extend the size out a bit to cover malloc()ing up to 500M.
Time for 128k:  0.035259
Time for 256k:  0.009718
Time for 1M:    0.478129
Time for 5M:    0.968945
Time for 10M:   0.965172
Time for 50M:   0.674316
Time for 500M:  1.018901
As you can see, once you make the jump up to mmap() (>256k), the cost is fairly well constant (give or take fluctuations). So it’s not that huge memory buffers are terrible, just that there is a cost difference between the smaller and larger buffer sizes that may or may not matter in your case.
[Read more]
The recent read_buffer issue got me curious, so I hacked up a quick (ugly) test program to see if I could show the different speeds of malloc()ing different buffer sizes. Here’s the test code:
[C]
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define LOOP 10000

int main(void) {
  int x[4] = { 128, 256, 1024, 5*1024 };
  int f = 0;
  for (f = 0; f < 4; f++) {
    int val = x[f];
    struct timeval before;
    struct timeval after;
    float udiff;
    long diff;
    gettimeofday(&before, 0);
    int loop = 0;
    /* Inner loop body reconstructed from context: allocate and free
       a buffer of val kilobytes, LOOP times. */
    for (loop = 0; loop < LOOP; loop++) {
      void *p = malloc(val * 1024);
      free(p);
    }
    gettimeofday(&after, 0);
    if (before.tv_usec > after.tv_usec) {
      /* Microsecond field wrapped; borrow one second. */
      udiff = (float)(after.tv_usec + 1000000 - before.tv_usec) / 1000000.0;
      diff = after.tv_sec - 1 - before.tv_sec;
    } else {
      udiff = (float)(after.tv_usec - before.tv_usec) / 1000000.0;
      diff = after.tv_sec - before.tv_sec;
    }
    printf("Time for %dk:\t%f\n", val, ((float)diff + udiff));
  }
  return 0;
}
[Read more]
I had some fun yesterday with some odd performance problems. So I did a run with oprofile and got this:
561612  25.0417  /lib64/tls/libc-2.3.4.so                            memset
429457  19.1491  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   clear_page
214268   9.5540  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   do_page_fault
144293   6.4339  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   do_no_page
 94410   4.2097  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   buffered_rmqueue
 64998   2.8982  /lib64/tls/libc-2.3.4.so                            memcpy
 59565   2.6559  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   __down_read_trylock
 59369   2.6472  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   handle_mm_fault
 47312   2.1096  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   free_hot_cold_page
 39161   1.7462  /usr/lib/debug/lib/modules/2.6.9-34.ELsmp/vmlinux   release_pages
 39140   1.7452  …
[Read more]
I (belatedly) noticed a meme running on Planet MySQL regarding wishlist items for the company. I think it started with Jay Pipes and Mårten Mickos, but has since moved on to users. In particular, I'd endorse most of Jeremy Cole's and Ronald Bradford's wishes myself as well.
But let me jump on the bandwagon and offer my view of the things that would most help us run and develop our services.
1. Online table changes. Ronald mentioned this as well, but I have to emphasize this more: for all the good that InnoDB did in terms of …
[Read more]