How often do we think about our HTTP session implementation? Do you know how your current session-related code will behave when the number of sessions in your database grows to millions (or even hundreds of millions) of records? This is one of those things we rarely think about. But if you do, you’ll notice that 99% of your session-related operations are read-only and 99% of your session writes are unnecessary. Almost all the records in your sessions table hold the same information: a session_id and a serialized empty session in the data field.
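The observation that most session writes are unnecessary can be illustrated with a small Ruby sketch. This is not the plugin’s code: `LazySessionStore`, its methods, and the Hash standing in for the sessions table are hypothetical. The idea sketched here is to remember the serialized data each session had when it was loaded, write only when that data has changed, and never insert a row for a brand-new session that is still empty.

```ruby
# Hypothetical sketch, not the plugin's actual implementation.
# A Hash stands in for the sessions table: session_id => serialized data.
class LazySessionStore
  attr_reader :writes

  def initialize(table = {})
    @table = table
    @loaded = {}   # session_id => serialized data as read at load time
    @writes = 0    # number of writes that actually hit the "table"
  end

  # Read a session; unknown ids yield an empty Hash.
  def load(session_id)
    raw = @table[session_id]
    @loaded[session_id] = raw
    raw ? Marshal.load(raw) : {}
  end

  # Write only if the serialized data differs from what was loaded,
  # and never insert a new row for a session that is still empty.
  def save(session_id, data)
    raw = Marshal.dump(data)
    original = @loaded[session_id]
    return false if original.nil? && data.empty?  # new and empty: skip
    return false if raw == original               # unchanged: skip
    @table[session_id] = raw
    @loaded[session_id] = raw
    @writes += 1
    true
  end
end
```

For example, loading an unknown session and saving it untouched writes nothing; only the first save after `s[:user_id] = 42` reaches the table, and saving the same data again is skipped.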
Looking at this situation, we created a really simple (and, at the same time, really useful for large Rails projects) plugin that replaces the ActiveRecord-based session store and makes sessions much more efficient. Below you can find some information about the implementation details and decisions we’ve made in this plugin, but if you just want to try it, then …
We got a chance to speak with Mårten Mickos, CEO of MySQL AB, about Sun's planned acquisition of MySQL, and we asked him some questions that the community at large might have.
I've completed a beta implementation of my take on the replication pre-cache tool... Sorry, nothing to download yet; I have to get it through an internal committee at Yahoo before I can release it (and you can imagine things are kind of crazy here). I wrote it myself because:
- I had it mostly done before I found out there were other versions out there
- I have to maintain it inside of Yahoo anyway
- I wanted to learn Ruby :)
It's just over 250 lines of Ruby (my new favorite language), so it's fairly compact. It doesn't use the Ruby MySQL library; instead it makes IO.popen calls to the mysql command-line client. I did this for two reasons:
- I haven't figured out the "right" way to deploy Ruby gems at Yahoo yet (it's complicated).
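Since the tool talks to MySQL through IO.popen and the mysql command-line client rather than a Ruby library, here is a minimal sketch of that approach. It is not the author's code; the method names `mysql_query` and `parse_batch_output` are hypothetical. It assumes the client is on the PATH and that credentials come from the usual option files (for example ~/.my.cnf). `--batch` makes the client print tab-separated rows and `--skip-column-names` drops the header row; batch mode also escapes tabs, newlines and backslashes inside values, which this sketch does not undo.

```ruby
# Hypothetical sketch: run a query through the mysql CLI and parse the output.
# Assumes `mysql` is on the PATH and credentials come from ~/.my.cnf.

# Split tab-separated batch output into an array of rows (arrays of strings).
# The -1 limit keeps trailing empty columns.
def parse_batch_output(text)
  text.each_line.map { |line| line.chomp.split("\t", -1) }
end

# Run +sql+ with the mysql client in batch mode and return the parsed rows.
def mysql_query(sql, extra_args = [])
  cmd = ["mysql", "--batch", "--skip-column-names", *extra_args, "-e", sql]
  output = IO.popen(cmd, &:read)   # array form avoids going through a shell
  raise "mysql exited with status #{$?.exitstatus}" unless $?.success?
  parse_batch_output(output)
end
```

Against a reachable server, `mysql_query("SELECT 1")` would return `[["1"]]`. Keeping the parsing separate from the popen call makes it easy to test without a database.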
Thanks to everyone who attended the performance coding webinar today. I think there were about 250 people online, which is a great turnout! Sorry for dropping off about fifteen minutes into the webinar — that'll teach me to use Skype for presenting!
There were a bunch of questions from attendees, and I have tried to quote them and answer them to the best of my knowledge below. They are in no particular order, and where possible, I refer to the slide I think the question referred to. Happy tuning and coding!
Miscellaneous Questions
Will MySQL Support Functional Indexes Soon?
This comes up every time! Not sure why... perhaps there are lots of Oracle and DB2 users who also use MySQL and want to migrate to MySQL without changing their schemas. In any case, the short answer is no, they won't. There is a Worklog task that describes the status of this feature …
So far, I’ve analyzed point and range queries. Now it’s time to talk about insertions and deletions. We’ll call the combination updates. Updates come in two flavors, and today we’ll cover both.
Depending on the exact settings of your database, updates give varying amounts of feedback. For example, when a key is deleted, all rows with that key are deleted (assuming the database allows duplicate keys). The normal behavior is to return the number of rows deleted. The normal behavior when deleting a key that has no corresponding rows in the database is to return an error message. On insertion, one can either allow duplicates or not. In the latter case, the storage engine returns an error message if a duplicate insertion is attempted.
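To make these kinds of feedback concrete, here is a toy in-memory index in Ruby that follows the conventions just described: delete returns the number of rows removed and raises an error when the key has no rows, and insert raises on a duplicate key unless duplicates are allowed. `ToyIndex`, `KeyNotFound` and `DuplicateKey` are hypothetical names; the sketch only shows what each update has to report back and does not model how a real storage engine stores data. Note that in this toy every one of these answers depends on first checking whether the key is present.

```ruby
# Hypothetical toy index illustrating update feedback, not a real engine.
class KeyNotFound < StandardError; end
class DuplicateKey < StandardError; end

class ToyIndex
  def initialize(allow_duplicates:)
    @allow_duplicates = allow_duplicates
    @rows = Hash.new { |h, k| h[k] = [] }  # key => list of row values
  end

  # Insert a row; with duplicates disallowed, an existing key is an error.
  def insert(key, value)
    if !@allow_duplicates && @rows.key?(key)
      raise DuplicateKey, "duplicate key #{key.inspect}"
    end
    @rows[key] << value
    nil
  end

  # Delete all rows with +key+ and return how many were removed;
  # deleting a key with no rows is an error.
  def delete(key)
    raise KeyNotFound, "no rows for #{key.inspect}" unless @rows.key?(key)
    @rows.delete(key).size
  end
end
```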
We’ll see that the details of error messages have a profound influence on the lower-bound arguments I’ve been making (and we’ll see a bit …
Sometimes you need to have the general query log on, and even though it causes more disk I/O than you may want, it’s good for troubleshooting. This log can, and probably will, fill up your disks rather quickly. Then there’s the slow query log: setting log_slow_queries and log_queries_not_using_indexes will write out the queries that take longer than long_query_time to execute, as well as any query not using an index.
Since MySQL applies the expire_logs_days value only to the binary log (log_bin) and not to these logs, we need another solution. There are probably a bunch of custom scripts out there that do this, but (big surprise) we have one as well. It was originally written by Jim Wood; then I got my hands on it and made some changes, which are listed in the head of the script. This little guy will rotate the logs out to another directory and gzip them. …
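The script itself isn’t reproduced here, so the following is only a Ruby sketch of the same idea, not Jim Wood’s script; `rotate_log` and its parameters are hypothetical. It moves the log into an archive directory with a timestamp suffix, calls a flush hook so that mysqld reopens a fresh log file (the MySQL manual describes renaming the general query log and then running `mysqladmin flush-logs` for this purpose), and then gzips the moved file. The archive directory is assumed to be on the same filesystem, so the move is a rename and anything mysqld writes before the flush still lands in the moved file.

```ruby
require "fileutils"
require "zlib"

# Hypothetical sketch of a log rotator, not the script from the post.
# +flush+ should make mysqld reopen its logs, e.g. by running
# `mysqladmin flush-logs`; archive_dir should be on the same filesystem.
def rotate_log(log_path, archive_dir,
               stamp: Time.now.strftime("%Y%m%d%H%M%S"),
               flush: -> { system("mysqladmin", "flush-logs") })
  FileUtils.mkdir_p(archive_dir)
  moved = File.join(archive_dir, "#{File.basename(log_path)}.#{stamp}")
  FileUtils.mv(log_path, moved)  # mysqld keeps writing here until the flush
  raise "log flush failed" unless flush.call  # server reopens a new log file
  gz_path = "#{moved}.gz"
  Zlib::GzipWriter.open(gz_path) do |gz|
    File.open(moved, "rb") do |f|
      # Copy in 64 KB chunks; read returns nil at end of file.
      while (chunk = f.read(64 * 1024))
        gz.write(chunk)
      end
    end
  end
  File.delete(moved)  # keep only the compressed copy
  gz_path
end
```

Run from cron, something like `rotate_log("/var/log/mysql/slow.log", "/var/log/mysql/archive")` (paths are examples) would leave one timestamped .gz file per run.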
Yeah, I'm givin' another one of those webinars today at 1pm EST/10am PST on Performance Coding for MySQL. The webinars are FREE, they're FUN, and I have new slides...mostly!
The slide deck I am using today is a cut-down version of the performance coding session I gave at CodeMash a couple weeks ago. The full slide decks are available in OpenOffice Impress and PDF format below. Enjoy.
This is one of the new slides, which contains a picture of the pygmy marmoset, the world's …
Following up on yesterday’s post on the MySQL slow query log, I’ve been doing quite a bit of analysis of our slow query log entries.
To make sure we catch everything, we’ve also turned on log_queries_not_using_indexes. This gives us a couple of problems:
- Partly because we’re at an early stage of deployment, some tables are so small that even though the correct indexes are defined, they’re not used.
- Certain tables are updated frequently, with lots of inserts and deletes. Most of the time, these tables contain relatively few rows, which again causes the optimizer to ignore the defined indexes and the queries to be written to the slow query log.
In other words, we’re getting a boatload of false positives: a quick analysis reveals that more than 100k of the 144k queries picked up so far by the slow query log are …
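A quick pass like this can be scripted. The Ruby sketch below is a hypothetical illustration, not the analysis described in the post: it reads slow query log text, pulls `Query_time` and `Rows_examined` from each `# Query_time:` header line (whole-second or fractional values), and counts entries that ran faster than long_query_time while examining only a few rows, which are likely to have been logged only because of log_queries_not_using_indexes. The `triage_slow_log` name, the 2-second long_query_time and the 1,000-row cutoff are assumptions to adjust.

```ruby
# Hypothetical slow-log triage sketch; thresholds are assumptions.
HEADER = /^# Query_time: ([\d.]+)\s+Lock_time: [\d.]+\s+Rows_sent: \d+\s+Rows_examined: (\d+)/

# Returns [total_entries, likely_false_positives].
def triage_slow_log(text, long_query_time: 2.0, small_scan: 1_000)
  total = 0
  false_positives = 0
  text.each_line do |line|
    m = HEADER.match(line)
    next unless m  # skip SQL text, # Time:, # User@Host: and other lines
    total += 1
    query_time = m[1].to_f
    rows_examined = m[2].to_i
    # Fast and cheap: probably logged only for not using an index.
    false_positives += 1 if query_time < long_query_time && rows_examined < small_scan
  end
  [total, false_positives]
end
```

Feeding it the contents of a slow log, for example `triage_slow_log(File.read("slow.log"))`, gives a rough ratio of noise to genuinely slow queries.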