As I reported via Twitter late last week, we encountered an issue that delayed some of our mail delivery by about a day and a half. I'll explain what happened, because I believe in openness on these matters and because the experience holds lessons for others.
Our mail server doesn't interact directly with the outside world; it's shielded by two relays that handle both the inbound MX and the outbound queue. This setup works remarkably well at limiting exposure to spam and other malicious activity. As previously discussed, it's difficult to make mail server infrastructure more resilient without a considerable investment of time, effort, and hardware. Because of the way the common mail delivery and IMAP tools are built, running two or more of each in a semi-active setup quickly becomes complex. Complexity is itself a risk, so it has to…
I'll be presenting a free one-hour webinar on preventing downtime in production MySQL servers, in conjunction with ODTUG. It is scheduled for Thursday, November 10, 2011, from 3:00 to 4:00 PM EST, and registration is free.
Here’s an abstract of what you’ll learn:
Everyone wants to prevent database downtime by being proactive, but how effective are common measures such as inspecting logs and analyzing SQL? To be truly proactive, one must prevent problems, which requires studying and understanding the causes of downtime. We analyzed a selection of emergency issues we have solved to better understand which types of problems actually occur in production environments. The results are somewhat surprising and are detailed in this talk. Most incidents…
I'll be presenting at Oracle Open World on the causes of downtime in MySQL and how to prevent it. This research-based session presents an easy-to-digest post-mortem of hundreds of emergency issues filed by Percona customers. The real causes and types of downtime surprised me quite a bit, and the preventive measures run counter to a lot of conventional wisdom. As a preview: make monitoring how full your disks are a top priority! On the other hand, although every monitoring tool in existence shows the binary log cache hit rate, not a single emergency in Percona's history has ever been attributed to it.
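The disk-fullness advice is easy to act on. As a minimal sketch (not the method presented in the talk), the Python snippet below uses the standard library's `shutil.disk_usage` to flag filesystems that are above a threshold. The 90% threshold, the function names, and the paths checked are hypothetical choices for illustration.

```python
import shutil

# Hypothetical alert threshold: flag a filesystem once it is 90% full.
DEFAULT_THRESHOLD = 0.90


def disk_usage_fraction(path: str) -> float:
    """Return the used fraction (0.0-1.0) of the filesystem containing `path`.

    Computed as used / (used + free), roughly as `df` reports Use%, where
    `free` is the space available to unprivileged users.
    """
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        # Pseudo-filesystems can report zero size; treat them as empty.
        return 0.0
    return usage.used / denominator


def check_disks(paths, threshold=DEFAULT_THRESHOLD):
    """Return (path, fraction) pairs for filesystems at or above `threshold`."""
    alerts = []
    for path in paths:
        fraction = disk_usage_fraction(path)
        if fraction >= threshold:
            alerts.append((path, fraction))
    return alerts


if __name__ == "__main__":
    # Hypothetical paths: point these at your MySQL datadir, binlog dir and tmpdir.
    for path, fraction in check_disks(["."]):
        print(f"WARNING: {path} is {fraction:.0%} full")
```

In practice you would run a check like this from cron or an existing monitoring system and alert well before a volume fills up, since a full disk can stall MySQL writes.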
The agenda at OOW is mind-bogglingly huge (see Dave…
Disclaimer: the information in this post is the author’s personal opinion and is not the opinion or policy of his employer.
In spring 2010 we decided that, even though SoftLayer's server provisioning system is really great and gets us a new server within a few hours, that is sometimes still too long. We wanted to be able to scale up faster when needed. This was especially critical because we were working hard on bringing Facebook integration to our site, a project that could have dramatically changed the capacity requirements of our application server cloud.
What buzzword comes to mind when we talk about scaling up really fast, within minutes rather than hours or days? Exactly: cloud computing! So, after some initial testing and playing around…