Fairly often I hear customers say that they plan a table that
accumulates millions of rows per day, and they want to keep
around, say, the last 30 days worth of data. (For the sake of
examples, I'm going to make it the last 3 days.) So this is a
kind of round-robin table, with rolling addition of new data and
removal of the oldest data.
With a high volume of data, this sounds like a table partitioned
on day boundaries (in MySQL 5.1). See Sarah's blog and her links
for a quick ramp-up on time-based table partitioning
(http://everythingmysql.ning.com/profiles/blogs/partitioning-by-dates-the).
One great benefit of table partitioning is that you can drop a
partition to lose millions of rows in one quick statement, much
faster than deleting millions of rows. Sort of like a partial
TRUNCATE TABLE.
First create the table with 4 partitions, and then, once a day,
drop the oldest partition and add another partition to store the …
How To Set Up Apache2 With mod_fcgid And PHP5 On Ubuntu 9.04
This tutorial describes how you can install Apache2 with mod_fcgid and PHP5 on Ubuntu 9.04. mod_fcgid is a compatible alternative to the older mod_fastcgi. It lets you execute PHP scripts with the permissions of their owners instead of the Apache user.
Some notes from my recent installation of MySQL 5.4.2-beta on a PPC G5 box running OSX 10.5.8. Hopefully these notes will save someone a bit of time when they begin the “I want to make my mac a development box” process. I’m not going to cover installing/configuring Apache and PHP since the web is full of those articles already. So, first things first, I downloaded the pkg file – not the tarball – from the MySQL downloads site and installed the following files in the following order:
- mysql-5.4.2-beta-osx10.5-powerpc-64bit.pkg
- MySQLStartupItem.pkg
- MySQL.prefPane
CNF File Settings
Here are some interesting settings that you may want to be aware of with 5.4 and OSX
- Do not set innodb_flush_method=fdatasync – it’s not a valid option, and the default after install is actually blank.
- Do not set “nice = …
Has anyone done any more work on recompiling InnoDB with 4k pages and benchmarking under SSD?
We’re building out a new DB that uses very small records (around 32-64 bytes) so reading a whole 16k for this record should have a performance difference.
I haven’t seen any benchmarks on 16k random read IOPS on the Intel SSD but my hunch is that there will be a 20-30% penalty here.
Though even if it was a 4x penalty that would still be about 9k transactions per second which is pretty good.
On a personal note I just bought a new Mac Book Pro which will be upgraded to the Intel X-25M MLC SSD.
Needless to say I’m very excited!
[Read more]I deployed a mysqld binary with support for show table_statistics and quickly realized I need to improve it by replacing the column Rows_changed with the columns Rows_updated, Rows_deleted and Rows_inserted. I also need to support this as a table in the information_schema.
With the current code, I was able to find a performance problem. I setup a server as a slave that did nothing but replication and then compared the list of the top-10 tables by Rows_read and Rows_changed. One table had 55% of the rows read but only 1% of the rows changed. From there I found the SQL for that table in the relay log and found the inefficient SQL. Hopefully we can fix it.
The problem was an update statement that queried many more rows than it updated. I wonder if there is an easier way …
[Read more]It has been more than a year since my self-imposed hiatus from serious MySQL development started and I think it is about time that I get back into the saddle. I have a handful of working prototypes but I should get the code out there, back into the community.I learned a bunch of stuff during the past year at Google but in the end, working on JavaScript, HTML/CSS and Google proprietary languages
A couple of days ago, Baron Schwartz posted some simple load and select benchmarking of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures where MonetDB beat Infobright in most queries.
Used to the parallel IEE loader, I was surprised by the apparent slow loading speed of Baron's benchmark and decided to try and replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely …
[Read more]
Over the past few years I have found that "pager" inside of MySQL
is a really useful tool. I have come up with a few simple, but
extremely effective, ways to use it.
-- Example 1
Lets say you have a MySQL server that is really busy with
extremely long queries. You run a “show full processlist” and
everything going on streams before your eyes, new lines and all.
A more readable way to see what is going on is…
Mysql> pager less –S
By running the above statement the full processlist will NOT
appear in the traditional, word wrapped format. The output will
show in a very readable, right extended, format. All you need to
do now figure out what is going on.
Here is a helpful hint. Set a tee file for MySQL client
output.
Mysql> \\\\T /location/to/file.txt
Then, make sure pager is set to “less –S”. From there you can get
out of the …
Accidentally me and Baron played with InfoBright (see http://www.mysqlperformanceblog.com/2009/09/29/quick-comparison-of-myisam-infobright-and-monetdb/) this week. And following Baron's example I also run the same load against MonetDB. Reading comments to Baron's post I tied to load the same data to LucidDB, but I was not successful in this.
I tried to analyze a bigger dataset and I took public available
data
http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
about USA domestic flights with information about flight length
and delays.
The data is available from 1988 to 2009 in chunks per month, so I downloaded 252 files (for 1988-2008 years) with …
[Read more]Does a query with FORCE INDEX reduce the overhead from query optimization? For me it did not when using sysbench. I configured sysbench to only do primary key lookup queries and ran it with the client and mysqld on the same server. Query optimization overhead is significant for a memcache-like workload (primary key lookups on cached data). I will publish more data on that in another post.
For QPS versus concurrent sessions the results are here
I modified sysbench to use a force index(PRIMARY) hint in …
[Read more]