Download Tungsten Enterprise v. 1.2.3 (Release date March 16, 2010)
This is a maintenance release that adds the following new features plus a number of important fixes:
Completely updated documentation, including expanded description on cluster concepts, management procedures, and connectivity options
New features for SaaS vendors include transparent session consistency for
A little two-part quiz. If you get the first one without peeking, you're worth your pay as a DBA. If you get the second one without peeking, you may tell your boss that some random guy on the Internet says you deserve a raise.
Start with a text file, 'test.txt', with these three lines:
1
1
2
Set up the test in MySQL:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id int primary key);
LOAD DATA INFILE 'test.txt' INTO TABLE t1;
This gives "ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'", which is expected.
What's in the table?
It depends. If the engine is MyISAM, then you'll have one row: the first '1' from the file was inserted, everything else was skipped. If the engine is InnoDB, you'll have no rows, because the transaction would roll back. So either 1 row or 0 rows.
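To see both outcomes side by side, the same test can be run with the engine named explicitly. This is only a sketch: it assumes 'test.txt' sits where the server can read it (for example the database's data directory, or a location allowed by secure_file_priv), and the row counts in the comments are the ones stated above, not output captured from a particular server version.

```sql
-- MyISAM: no transactions, so the row loaded before the error stays.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=MyISAM;
LOAD DATA INFILE 'test.txt' INTO TABLE t1;  -- fails with ERROR 1062
SELECT COUNT(*) FROM t1;                    -- 1 row, per the text above

-- InnoDB: the failed statement is rolled back as a whole.
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
LOAD DATA INFILE 'test.txt' INTO TABLE t1;  -- fails with ERROR 1062
SELECT COUNT(*) FROM t1;                    -- 0 rows, per the text above
```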
…
[Read more]
Brian wrote recently Where did all of the MySQL Developers Go?, while over in Drizzle land they have been accepted for the Google Summer of Code along with many other open source projects. MySQL is, from my observation, a noticeable absentee.
Historically, the lack of opportunity for community contributions to be accepted and implemented in, say, under 5 years has really hurt MySQL. There is plenty of history here so that’s not worth repeating. The current landscape of patches, forks and custom MySQL binaries for storage engine providers has provided a boom of innovation that sadly is now lost from the core MySQL product.
In Drizzle, community contribution is actively sought and a good portion of committed code is not from the core Drizzle …
[Read more]
Ages ago I created a Twitter account to get some free app. I figured that since nobody was following me, I didn't have to feel like a guilty spammer. Probably out of some odd obsession with honesty, I did use my proper name, and sooner or later people started following me despite my having put out only a single spam message. On a very few occasions since then I have tried out tweeting (it still feels weird using that word); obviously I have never again used it to get a free app by spamming. Anyway, I have now decided that for small blurbs about technical stuff I will from now on use Twitter, thereby sparing my Facebook friends from such gibberish. In turn, my developer friends on Facebook who do not care about what I have to say about Frisbee, DJing or politics can start removing me from Facebook. Actually I might just do this myself, because it's FUCKING ANNOYING that so many people multi spam their status …
[Read more]
I had an interesting conversation with Sheeri yesterday. She had pointed out that today was Ada Lovelace Day, a day devoted to highlighting and thanking the many women in the Information Technology industry for their contributions. She suggested that if I wanted to blog about it she would find that appropriate, given what we’ve achieved here at Pythian.
First, I consider that a huge compliment. And then, a distant second, I told Sheeri – no I don’t think I’ll blog about it, that’s not my thing.
This is the IM conversation that came out of that email exchange
when Sheeri and I connected about an hour later. You may or may
not find it interesting, but ultimately I thought it was
interesting enough to share.
tl;dr: Happy Ada Lovelace Day.
expanded version:
Paul Vallee:
hey sheeri!
Sheeri K. Cabral: …
I guess that it’s time for the 3rd annual “Ravelry Runs On” roundup. The last two were in March 2008 and March 2009.
This year, our traffic increased by 50% to 5,000,000 page views and 15 million Rails requests per day. We made very few changes to our architecture in 2009 but we did add a new master database server after our working set of data outgrew our memory and IO capacity.
This summary is more detailed than the last two and I’ve broken it up into rough sections.
Physical Network
We own our own servers and colocate them in a datacenter here in Boston. The datacenter provides us with a cooled half cabinet, redundant power, and a blend of premium (Internap, Savvis) bandwidth. We do the rest.
I use …
[Read more]
Here’s what Pythian is cooking up for MySQL Conference this year.
Monday, April 12
8:30am: Get out of bed lazy bones and head to Ballroom B
… because you’re going to want to attend Sheeri K. Cabral’s tutorial in two parts:
MySQL Configuration Options and Files: Basic MySQL Variables (Part 1)
Unlock all the information the MySQL server can give you! MySQL
has many status variables that show how well your environment
utilizes its resources. There are many system variables that can
be set and changed to tune the server.
Read more.
Add to your personal schedule at …
I was reading a post by Dathan Vance Pattishall titled "Cassandra is my NoSQL solution but..". In the
post, Dathan explains that he uses Cassandra to store clicks
because it can write a lot faster than MySQL. However, he runs
into problems with the read speed when he needs to get a range of
data back from Cassandra. This is the number one problem I have
with NoSQL solutions.
SQL is really good at retrieving a set of data based on a key or
range of keys. Whereas NoSQL products are really good at writing
things and retrieving one item from storage. When looking at
redoing our architecture a few years ago to be more scalable, I
had to consider these two issues. For what it is worth, the NoSQL
market was not nearly as mature as it is now. So, my choices were
much more limited. In the end, we decided to stick with MySQL. It
turns out …
In the past few months, I have tested many NoSQL solutions, including
Redis, MongoDB, and HBase. Cassandra is the column store DB I
picked because of its speed (on writes), its reliability, and a built-in
feature set that makes it multi-datacenter aware. The one other
personal reward for Cassandra is it is written in Java. I like
reading and writing in Java more than C++ although it really does
not matter for me personally in the end.
Let us talk about the reason why I am introducing Cassandra into
my infrastructure and some of its drawbacks I have noticed so
far.
Why it is being introduced:
We have a feature where we record every single click for 50
million Monthly Active Users (in real time), and storing this in
MySQL is just a waste of semi-good hardware for data that is only
looked at for the past 24 hours. Over the course of some time
(couple of months) more than 3 billion rows accumulated, which
translated into a 3.5 TB distributed …
We don’t often see this option configured (default: unlimited) but it might be a good idea to set it. What it does is limit the amount of disk space the combined relay logs are allowed to take up.
A slave’s IO_Thread reads from the master and puts the events into the relay log; the slave’s SQL_Thread reads from the relay log and executes the query. If/when replication “breaks”, unless it’s connection related it tends to be during execution of a query. In that case the IO_Thread will keep running (receiving master events and storing in the relay log). Beyond some point, that doesn’t make sense.
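As a rough illustration, the current limit and the state of the two threads can be checked from a client session on the slave. The excerpt does not name the option; this sketch assumes it is relay_log_space_limit, and the 4G value in the comment is purely illustrative, not a recommendation.

```sql
-- Assumption: the option under discussion is relay_log_space_limit.
-- A value of 0 means the combined relay logs are not limited.
SHOW GLOBAL VARIABLES LIKE 'relay_log_space_limit';

-- Slave_IO_Running and Slave_SQL_Running show whether each thread is
-- running; Relay_Log_Space is the combined size of the relay logs in bytes.
-- (In the mysql command-line client, ending with \G instead of ; prints
-- one column per line.)
SHOW SLAVE STATUS;

-- The limit is a startup option rather than something changed at runtime,
-- so it would go in the [mysqld] section of my.cnf, for example:
--   relay_log_space_limit = 4G
```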
The reason for having two separate replication threads (introduced in MySQL 4.0) is that long-running queries don’t delay receiving more data. That’s good. But receiving data is generally pretty fast, so as long as that basic issue is handled, it’s not necessary (for performance) to have the IO_Thread run ahead that far. …
[Read more]