Ages ago I created a Twitter account to get some free app. I figured that since nobody was following me I didn't have to feel like a guilty spammer. Probably out of some odd obsession with honesty I did use my proper name, and sooner or later people started following me despite my having put out only a single spam message. So on a very few occasions since then I tried out tweeting (it still feels weird using that word); obviously I have never again used it to get a free app by spamming. Anyway, I have now decided that from now on I will use Twitter for small blurbs about technical stuff, thereby sparing my Facebook friends such gibberish. In turn, my developer friends on Facebook who do not care about what I have to say about Frisbee, DJing or politics can start removing me from Facebook. Actually I might just do this myself, because it's FUCKING ANNOYING that so many people multi-spam their status …
I had an interesting conversation with Sheeri yesterday. She had pointed out that today was Ada Lovelace Day, a day devoted to highlighting and thanking the many women in the Information Technology industry for their contributions. She suggested that if I wanted to blog about it she would find that appropriate, given what we’ve achieved here at Pythian.
First, I consider that a huge compliment. And then, a distant second, I told Sheeri – no I don’t think I’ll blog about it, that’s not my thing.
This is the IM conversation that came out of that email exchange
when Sheeri and I connected about an hour later. You may or may
not find it interesting, but ultimately I thought it was
interesting enough to share.
tl;dr: Happy Ada Lovelace Day.
expanded version:
Paul Vallee:
hey sheeri!
Sheeri K. Cabral: …
I guess that it’s time for the 3rd annual “Ravelry Runs On” roundup. The last two were in March 2008 and March 2009.
This year, our traffic increased by 50% to 5,000,000 page views and 15 million Rails requests per day. We made very few changes to our architecture in 2009 but we did add a new master database server after our working set of data outgrew our memory and IO capacity.
This summary is more detailed than the last two and I’ve broken it up into rough sections.
Physical Network
We own our own servers and colocate them in a datacenter here in Boston. The datacenter provides us with a cooled half cabinet, redundant power, and a blend of premium (Internap, Savvis) bandwidth. We do the rest.
I use …
Here’s what Pythian is cooking up for MySQL Conference this year.
Monday, April 12
8:30am: Get out of bed lazy bones and head to Ballroom B
… because you’re going to want to attend Sheeri K. Cabral’s tutorial in two parts:
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1)
Unlock all the information the MySQL server can give you! MySQL
has many status variables that show how well your environment
utilizes its resources. There are many system variables that can
be set and changed to tune the server.
Add to your personal schedule at …
I was reading a post by Dathan Vance Pattishall titled "Cassandra is my NoSQL solution but..". In the
post, Dathan explains that he uses Cassandra to store clicks
because it can write a lot faster than MySQL. However, he runs
into problems with the read speed when he needs to get a range of
data back from Cassandra. This is the number one problem I have
with NoSQL solutions.
SQL is really good at retrieving a set of data based on a key or
range of keys, whereas NoSQL products are really good at writing
things and retrieving one item from storage. When looking at
redoing our architecture a few years ago to be more scalable, I
had to consider these two issues. For what it is worth, the NoSQL
market was not nearly as mature as it is now. So, my choices were
much more limited. In the end, we decided to stick with MySQL. It
turns out …
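To make the single-item versus range distinction concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not how MySQL or Cassandra are implemented: the `clicks` data and both function names are hypothetical. A plain hash-style store has no key order, so a range query has to inspect every key, while an ordered index can binary-search both ends of the range and read one contiguous slice.

```python
import bisect

# Hypothetical click store: key = click id, value = page (made-up data).
clicks = {17: "home", 3: "login", 42: "cart", 8: "search", 23: "profile"}


def range_from_hash(store, lo, hi):
    """Key-value style: keys are unordered, so every key is inspected."""
    return sorted(k for k in store if lo <= k <= hi)


# Ordered index over the same keys, built once up front.
ordered_keys = sorted(clicks)


def range_from_index(keys, lo, hi):
    """Ordered-index style: binary-search both ends, then read one slice."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return keys[start:end]


print(range_from_hash(clicks, 5, 30))        # [8, 17, 23]
print(range_from_index(ordered_keys, 5, 30)) # [8, 17, 23]
```

Both calls return the same keys; the difference is that the first touches all stored keys while the second touches only the matching slice plus a logarithmic number of probes.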
In the past few months, I have tested many NoSQL solutions:
Redis, MongoDB, and HBase. Yet Cassandra is the column store DB I
picked because of its speed (on writes), its reliability, and a
built-in feature set that makes it multi-datacenter aware. One
other personal plus for Cassandra is that it is written in Java. I
like reading and writing Java more than C++, although in the end it
does not really matter to me personally.
Let us talk about the reason why I am introducing Cassandra into
my infrastructure and some of its drawbacks I have noticed so
far.
Why it is being introduced:
We have a feature where we record every single click for 50
million Monthly Active Users (in real time), and storing this in
MySQL is just a waste of semi-good hardware for data that is only
looked at for the past 24 hours. Over the course of some time
(a couple of months) more than 3 billion rows accumulated, which
translated into a 3.5 TB distributed …
We don’t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up.
A slave’s IO_Thread reads from the master and puts the events into the relay log; the slave’s SQL_Thread reads from the relay log and executes the query. If/when replication “breaks”, unless it’s connection related it tends to be during execution of a query. In that case the IO_Thread will keep running (receiving master events and storing in the relay log). Beyond some point, that doesn’t make sense.
The reason for having two separate replication threads (introduced in MySQL 4.0) is that long-running queries don’t delay receiving more data. That’s good. But receiving data is generally pretty fast, so as long as that basic issue is handled, it’s not necessary (for performance) to have the IO_Thread run ahead that far. …
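The interplay between the two threads and a relay log cap can be sketched with a small Python toy model. This is an illustration under stated assumptions, not the server's actual behaviour: the function name and parameters are hypothetical, events are freed one at a time (the real server purges whole relay log files), and the option discussed is assumed to be MySQL's `relay_log_space_limit`, which the excerpt does not name.

```python
def relay_log_bytes(event_sizes, sql_stops_after, space_limit=0):
    """Toy model of relay log growth on a slave.

    event_sizes     -- sizes in bytes of events arriving from the master
    sql_stops_after -- number of events the SQL thread executes before
                       replication "breaks" on a query
    space_limit     -- cap on relay log bytes; 0 means unlimited
    Returns the bytes left sitting in the relay log.
    """
    held = 0
    for i, size in enumerate(event_sizes):
        # IO thread: stop fetching if this event would exceed the cap.
        if space_limit and held + size > space_limit:
            break
        held += size
        # SQL thread: while still healthy, it executes and frees the event.
        if i < sql_stops_after:
            held -= size
    return held


events = [100] * 10  # ten hypothetical 100-byte events

print(relay_log_bytes(events, sql_stops_after=3))                   # 700
print(relay_log_bytes(events, sql_stops_after=3, space_limit=250))  # 200
```

Without a limit, the IO thread keeps storing everything the master sends after the SQL thread has stopped; with a limit, it stops once the cap would be exceeded.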
As soon as we get a couple of FusionIO cards, there is the question of how to join them into a single space for the database. FusionIO does not provide any mirroring/striping solutions and relies entirely on OS tools for that.
So for Linux we have software RAID and LVM. I tried to follow up
on my post
How many fsync / sec FusionIO can handle, and
check what overhead we can expect when using additional layers over
the FusionIO card.
The card I used is a Fusion-io ioDrive Duo 320GB;
physically it is two cards on a single board, and it is visible as
two cards to the OS.
For some reason I was not able to set up LVM on the cards, so I finished tests only for software RAID0 and RAID1.
I used an XFS filesystem mounted with the "-o nobarrier" option, and I ran the test I used in the previous post on the following configurations:
- Single …
Did that ever happen to you in production?
[percona@sandbox msb_5_0_87]$ ./use
ERROR 1040 (00000): Too many connections
Just happened to one of our customers. Want to know what we did?
For demo purposes I'll use a sandbox here (so ./use is actually executing the mysql CLI). Oh, and mind that this is not a general-purpose best practice, but rather a break-and-enter hack for when the server is flooded. So, when this happens in production, the problem is: how do you quickly regain access to the MySQL server to see what all the sessions are doing, and how do you do that without restarting the application? Here's the trick:
[percona@sandbox msb_5_0_87]$ gdb -p $(cat data/mysql_sandbox5087.pid) \
…
I recently had a long conversation with Joe Stump, CTO of SimpleGeo, about location, geodata, and the NoSQL movement. Stump, who was formerly lead architect at Digg, had a lot to say. Highlights are posted below. You can find a transcript of the full interview here.
Competition in the geodata industry:
I personally haven't seen anybody that has come out and said, "We're actively indexing millions of points of data. We're also offering storage and we're giving tools to leverage that." I've seen a lot of fragmentation. Where SimpleGeo fits is, I really think, at the crossroads or the nexus of a lot of people that are trying to figure out this space. So …