GNU Parallel and Block Size(s)

I’ve been a fan of GNU Parallel for a while but until recently have only used it occasionally. That’s a shame, because it’s often the simplest solution for quickly solving embarrassingly parallel problems.

My recent usage of it has centered around database export/import operations where I have a file that contains a list of primary keys and need to fetch the matching rows from some number of tables and do something with the data. The database servers are sufficiently powerful that I can run N copies of my script to get the job done far faster (where N is a value like 10 or 20).

A typical usage might look like this:

cat ids.txt | parallel -j24 --max-lines=1000 --pipe "bin/munge-data.pl --db live >> {#}.out"

However, I recently found myself scratching my head because parallel was only running 3 jobs rather than the 24 I had specified. …
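
As the title suggests, the answer comes down to block sizes. Here's a rough sketch of the kind of adjustment involved (the 100k figure is purely illustrative, not the exact value from my setup): with --pipe, GNU Parallel splits its input into blocks and hands one block to each job, so a small input file simply doesn't produce enough blocks to keep 24 workers busy.

# With --pipe, GNU Parallel dispatches one block of input per job (--block,
# default 1M). A few MB of IDs means only a few blocks, hence only a few jobs,
# no matter what -j says. Lowering the block size lets all 24 slots get data:
cat ids.txt | parallel -j24 --block 100k --pipe "bin/munge-data.pl --db live >> {#}.out"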

[Read more]
Handling Database Failover at Craigslist

There has been some interesting discussion on-line recently about how to handle database (meaning MySQL, but really it applies to other systems too) failover. The discussion that I’ve followed so far, in order, is:

[Read more]
Slides for: Fusion-io and MySQL 5.5 at Craigslist

The slides from the talk I gave at Percona Live San Francisco yesterday are now available on slideshare (hopefully the embed works here too):

[Embedded slides: Fusion-io and MySQL 5.5 at Craigslist, via SlideShare]

Overall, I enjoyed the conference–not just meeting up with folks I don’t see often enough, but also getting a renewed sense of how active the MySQL ecosystem really is.

When an example falls in your lap

As I recently noted, I’m giving a short talk at Percona Live about our experience with Fusion-io for MySQL at Craigslist. As is often the case, I agreed to give the talk before giving much thought to exactly what I’d say. But lately I’ve started to sweat a little at the prospect of having to think up a compelling and understandable presentation.

Thankfully, due to a cache misconfiguration this week, we ended up taking a number of steps that will not only help us deal with future growth but also, as a side effect, let us directly quantify some of the benefits of Fusion-io in our big MySQL boxes. For whatever reason, the bulk of the presentation basically fell into my lap today.

Now I just have to put it all together.

I won’t go so far as to claim that this is an argument for procrastination, but it sure is nice when …

[Read more]
Speaking at Percona Live in San Francisco

On Wednesday, February 16th, I’ll be attending Percona Live in San Francisco to hear about what’s new in the MySQL ecosystem and talk about our adoption of Fusion-io storage for some of our systems at Craigslist. Not only do we have a busy web site, but the data itself has posed some unique challenges over the last few years.

Part of getting a handle on that was upgrading to faster storage and moving from years-old MySQL 5.0.xx to more modern releases. I’ll also provide a bit of background on our plans to continue scaling and growing in the coming years.

If you’re in the area and interested in some of the cutting-edge work that’s been going into production as part of major MySQL/XtraDB deployments, check out the conference. It’s …

[Read more]
Always Test with Real Data

As I previously noted, I’m in the midst of converting some data (roughly 2 billion records) into documents that will live in a MongoDB cluster. And any time you move data into a new data store, you have to be mindful of any limitations or bottlenecks you might encounter (since all systems have had to make compromises of one sort or another).

In MySQL one of the biggest compromises we make is deciding what indexes really need to be created. It’s great to have data all indexed when you’re searching it, but not so great when you’re adding and deleting many rows.

In MongoDB, the thing that gets me is the document size limit. Currently an object stored in MongoDB cannot be larger than 4MB (though that’s likely to be raised soon). Now, you can build your own MongoDB binaries and tweak that parameter, but I’ve been …
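
As a quick illustration of why testing with real data matters here (the database and collection names below are made up for the example, not our actual schema), a spot check of converted documents against the BSON size limit from the mongo shell looks roughly like this:

# Hypothetical spot check: sample converted documents and flag anything
# approaching the 4MB limit before committing to a bulk load.
mongo mydb --quiet --eval '
  db.converted_docs.find().limit(1000).forEach(function(doc) {
      var bytes = Object.bsonsize(doc);
      if (bytes > 3.5 * 1024 * 1024) print(doc._id + ": " + bytes + " bytes");
  })'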

[Read more]
MySQL 5.5.4-m3 in Production

Back in April I wrote that MySQL 5.5.4 is Very Exciting and couldn’t wait to start running it in production. Now, several months later, we’re using 5.5.4-m3 on all the slaves in what is arguably our most visible (and one of our busiest) user-facing clusters. Along the way we deployed some new hardware (Fusion-io), but not as a complete replacement: some boxes are Fusion-io, some local RAID, and some SAN. We have too many eggs for any one basket.

We also converted tables to the Barracuda format in InnoDB, dropped an index or two, converted some important columns to BIGINT UNSIGNED, and enabled 2:1 compression for the table that has big chunks of text in it. Aside from a few false starts with the Barracuda conversion and compression, things went pretty well. Coming from 5.0 (skipping 5.1 entirely) we had some my.cnf work to do to …
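
For anyone who hasn’t done this conversion, a minimal sketch of the moving parts on 5.5 (table and column names are invented; this shows the shape of the change, not our actual DDL): Barracuda and compression require innodb_file_per_table and innodb_file_format=Barracuda, after which the table gets rebuilt with a compressed row format.

# my.cnf prerequisites for Barracuda + compressed tables:
#   innodb_file_per_table = 1
#   innodb_file_format    = Barracuda
mysql somedb <<'SQL'
ALTER TABLE postings
  MODIFY COLUMN id BIGINT UNSIGNED NOT NULL,
  ROW_FORMAT=COMPRESSED, KEY_BLOCK_SIZE=8;  -- 8K from 16K pages, i.e. a 2:1 target
SQL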

[Read more]
MongoDB Early Impressions

I’ve been doing some prototyping work to see how suitable MongoDB is for replacing a small (in number, not size) cluster of MySQL servers. The motivation for looking at MongoDB in this role is that we need a flexible and reliable document store that can handle sharding, a small but predictable write volume (1.5 – 2.0 million new documents daily), light indexing, and map/reduce operations for heavier batch queries. Queries to fetch individual documents aren’t that common–let’s say 100/sec in aggregate at peak times.

What I’ve done so far is to create a set of Perl libraries that abstract away the data I need to store and provide a “backend” interface to which I can plug in a number of modules for talking to different data stores (including some “dummy” ones for testing and debugging). This has helped to clarify some …

[Read more]
I Want a New Data Store

While there is a dizzying array of technologies that have the “NoSQL” label applied to them, I’m looking for one to replace a MySQL cluster. This particular cluster has roughly a billion records in it, uses a few TB of disk space, and is growing all the time. It is currently a MySQL master with several slaves and handles a fairly light query volume. The reasons we’d like to change it are:

  • ALTER TABLE takes an unreasonably long time, so we can’t add or remove indexes or columns. Changes take over a month.
  • The data footprint is large enough that it requires RAID and we seem to be really good at breaking RAID controllers.
  • Queries are not very efficient, due partly to the underlying data organization, and due partly to the sheer amount of it compared to the available RAM.
  • The data really isn’t relational anymore, so a document store is more appropriate.  It’s just that when it was set …
[Read more]
Recent Sphinx Updates

If you use the Sphinx search engine and have been watching the development branch (0.9.10) and wondering when to upgrade, I'm here to tell you that "now" is a great time. As of r2037, the last major issue I regularly saw has been fixed. The other big bug was fixed in r2031.

Late last week I began testing those fixes in a "burn-in" test I've developed that makes liberal use of indextool --check. Instead of seeing index corruption within an hour, I saw none. After 3 days of no failures, I deployed it to a subset of our search back-end servers. Yesterday we deployed it to half of the remaining servers.
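
The burn-in test itself isn’t shown in this excerpt, but the basic idea (the config path and index names below are placeholders) is a loop that keeps rebuilding and merging indexes, running indextool --check on every pass and stopping the moment anything looks corrupt:

# Hypothetical burn-in loop: rebuild, merge, verify; stop on the first failure.
while true; do
    indexer   --config /etc/sphinx/sphinx.conf --rotate delta              || break
    indexer   --config /etc/sphinx/sphinx.conf --rotate --merge main delta || break
    indextool --config /etc/sphinx/sphinx.conf --check main                || break
done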

So far, so good!

I should note that all our index corruption was merge …

[Read more]