Showing entries 1 to 9
Displaying posts with tag: san (reset)
Luxbet, MariaDB and Melbourne Cup

Yesterday was Melbourne Cup day in Australia – the biggest annual horse race event in the country, and in the state of Victoria it’s even a public holiday.

Open Query does work for Luxbet (part of Tabcorp), and Melbourne Cup day is by far their biggest day of the year in terms of traffic. It’s not just a big spike, there’s orders of magnitude difference so you can really say that the rest of the year is downright quiet (in relative terms). So, a very interesting load pattern.

Since last year Luxbet has upgraded from stock MySQL to MariaDB, and with our input made some other infrastructure modifications including moving to a pure solid state storage (FusionIO) solution as a SAN just won’t deliver the resilience and performance required. This may seem odd, but remember that a) …

[Read more]
Storage caching options in Linux 3.9 kernel

dm-cache is (albeit still classified “experimental”) is in the just released Linux 3.9 kernel. It deals with generic block devices and uses the device mapper framework. While there have been a few other similar tools flying around, since this one has been adopted into the kernel it looks like this will be the one that you’ll be seeing the most in to the future. It saves sysadmins the hassle of compiling extra stuff for a system.

A typical use is for an SSD to cache a HDD. Similar to a battery backed RAID controller, the objective is to insulate the application from latency caused by the mechanical device, the most laggy part of which is seek time (measured in milliseconds). Giventhe  relatively high storage capacity of an SSD (in the hundreds of GBs), this allows you to mostly disregard the mechanical latency for writes and that’s very useful for database systems such as MariaDB.

That covers writes (for the moment), but …

[Read more]
HDlatency – now with quick option

I’ve done a minor update to the hdlatency tool (get it from Launchpad), it now has a –quick option to have it only do its tests with 16KB blocks rather than a whole range of sizes. This is much quicker, and 16KB is the InnoDB page size so it’s the most relevant for MySQL/MariaDB deployments.

However, I didn’t just remove the other stuff, because it can be very helpful in tracking down problems and putting misconceptions to rest. On SANs (and local RAID of course) you have things like block sizes and stripe sizes, and opinions on what might be faster. Interestingly, the real world doesn’t always agree with the opinions.

We Mark Callaghan correctly pointed out when I first published it, hdlatency does not provide anything new in terms of functionality, the db IO tests of sysbench cover it all. A key advantage of hdlatency is that it doesn’t have any …

[Read more]
RE: A bit on SANs and system dependencies

This is a reply on A bit on SANs and system dependencies by Eric Bergen.

Lets first start by making a difference between entry level, midrange and high-end SAN's.

Entry level:
This is basically a bunch of disks with a network connection. The Oracle/Sun/StorageTek 2540 is an example for this category. This storage is aimed at lowcost shared storage.

Midrange:
This kind of storage generally features replication and more disks than entry level. HP EVA is what comes to mind for this category. This storage is aimed at a good price/performance.

High-End:
This is mainframe style storage which is made usable for open systems. Hitachi Data Systems VSP is a good example of this category. This category is for extreme performance and reliability. …

[Read more]
Why you can’t rely on a replica for disaster recovery

A couple of weeks ago one of my colleagues and I worked on a data corruption case that reminded me that sometimes people make unsafe assumptions without knowing it. This one involved SAN snapshotting that was unsafe.

In a nutshell, the client used SAN block-level replication to maintain a standby/failover MySQL system, and there was a failover that didn't work; both the primary and fallback machine had identically corrupted data files. After running fsck on the replica, the InnoDB data files were entirely deleted.

When we arrived on the scene, there was a data directory with an 800+ GB data file, which we determined had been restored from a SAN snapshot. Accessing this file caused a number of errors, including warnings about accessing data outside of the partition boundaries. We were eventually able to coax the filesystem into truncating the data file back to a size that didn't contain invalid pointers and could be read without …

[Read more]
Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …

[Read more]
Quest for Resilience: Multi-DC Masters

This is a Request for Input. Dual MySQL masters with MMM in a single datacentre are in common use, and other setups like DRBD and of course VM/SAN based failover solutions are conceptually straightforward also. Thus, achieving various forms of resilience within a single data-centre is doable and not costly.

Doing the same across multiple (let’s for simplicity sake limit it to two) datacentres is another matter. MySQL replication works well across longer links, and it can use MySQL’s in-built SSL or tools like stunnel. Of course it needs to be kept an eye on, as usual, but since it’s asynchronous the latency between the datacentres is not a big issue (apart from the fact that the second server gets up-to-date a little bit later).

But as those who have tried will know, having a client (application server) connection to a MySQL instance in a remote data-centre is a whole other matter, latency becomes a big issue and is generally …

[Read more]
On Oracle (and MySQL), Enterprise, Suitability and Sense

50 things to know before migrating Oracle to MySQL by Baron Schwartz is an interesting read, it points out clearly that MySQL is not Oracle. However, Oracle is not the benchmark by which all others are to be judged. So what do we compare with, or actually, why do we compare at all?

Hmm, so we take three steps back, and get a much better view... Marten Mickos (MySQL CEO from 2001 until the Sun acquisition in 2008) said it all along "MySQL does not compete with Oracle". I don't think people actually appreciated what he was saying, or even believed that he meant precisely what he said. They might have thought "oh that's just positioning and protesting too much to make the opposite point". But he wasn't, it was the clear plain truth and it still is today (and so it should remain, I think).

[Read more]
Can I have your horror-stories, please? (SANs and VMs)

Please make it descriptive, graphic, and if anything burnt or exploded I'd love to have pictures.
Include an approximate timeline of when things happened and when it was all working again (if ever).
Thanks!

This somewhat relates to the earlier post A SAN is a single point-of-failure, too. Somehow people get into scenarios where highly virtualised environments with SANs get things like replication and everything, but it all runs on the same hardware and SAN backend. So if this admittedly very nice hardware fails (and it will!), the degree of "we're stuffed" is particularly high. The reliance in terms of business processes is possibly a key factor there, rather than purely technical issues.

Anyway, if you have good stories of (distributed?) SAN and VM infra failure, please step up and tell all. It'll help prevent similar issues for …

[Read more]
Showing entries 1 to 9