|Showing entries 1 to 23|
A fascinating post-mortem on high-profile network failures:
This post is meant as a reference point, to illustrate that, according to a wide range of accounts, partitions occur in many real-world environments. Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. Network outages can suddenly arise in systems that are stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems, sometimes for days on end. Partitions deserve serious consideration.
I have a fairly lightly loaded MySQL server with a few tables that are updated every five minutes. Other than these updates, there are very few queries run against the database. The data is queried just a few times per month. Every so often, one of the more complicated queries will result in the process getting hung in the "copying to tmp table" state. To be honest, the queries that get hung aren't even that complicated. Usually there are one or two joins, a
GROUP BY, and an
So far, when a process gets stuck in this state, I find that killing and restarting the process does not clear up the problem. I've changed my recovery method to killing the process, issuing a
I don't know why SELinux problems seem so frustrating. The problem almost certainly is related to the fact that there is frequently no error message. This is exactly the problem I ran into while turning up a new Apache web server on Red Hat Enterprise Linux 6 (RHEL6) with SELinux enabled.
The problem is that SELinux prevents Apache from making network connections by default. This is defined by the SELinux boolean
httpd_can_network_connect_db. In order to change this value, issue the following
I recently came across a dev VM running MySQL 5.0.77 (an old release, 28 January 2009) that didn’t have InnoDB available.
skip-innodb wasn’t set,
SHOW VARIABLES LIKE '%innodb%' looked as expected, but with one exception: the value of
I confirmed this with
(root@localhost) [(none)]> show engines;
+------------+----------+----------------------------------------------------------------+
| Engine     | Support  | Comment                                                        |
+------------+----------+----------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance[Read more...]
CFEngine is both the oldest and the newest of the popular tools for automating site administration. Mark Burgess invented it as a free software project in 1993, and years later, as deployments in the field outgrew its original design, he gave it a complete rethink and developed the powerful concept of promise theory to make it modular and maintainable. In this guise as version 3, CFEngine stands along with two other pieces of free software, Puppet and Chef, as key parts of enterprise computing. Along the way, Burgess also started a commercial venture, CFEngine AS, that maintains both the open source and proprietary versions of CFEngine.
Diego Zamboni has recently taken the position of Senior Security Advisor at CFEngine AS and is writing[Read more...]
snmptrapd to catch SNMP traps and then put them into a MySQL database using a pretty generic trap handler. This gives me the opportunity to generate such useful information as:
> SELECT HOUR(time) AS "hour", COUNT(*) AS "bounces"
    FROM snmptraps
   WHERE hostname = 'XXX'
     AND time > '2011-07-25'
     AND trap_oid = 'SNMPv2-MIB::snmpTrapOID.0 IF-MIB::linkDown'
   GROUP BY HOUR(time) ;
+------+---------+
| hour | bounces |
+------+---------+
|    0 |     729 |
|    1 |     936 |
|    2 |     841 |
|    3 |     810 |
|    4 |     547 |
|    5 |     316 |
|    6 |     224 |
|    7 |     144 |
|    8 |     481 |
|    9 |     584 |
|   10 |       1 |
+------+---------+
11 rows in set (0.05 sec)
>
This tells us that interfaces on host 'XXX' were bouncing a lot until 10 AM and then stopped.
Clearing the Windows DNS Server cache from the command line is an easy task.
C:\> dnscmd . /clearcache
The . indicates the local DNS server. You can use this command to clear the cache of remote DNS servers by replacing the . with the hostname or IP address of the remote server.
If you're using a non-standard MySQL data directory on your Red Hat Enterprise Linux (RHEL) server, you may have seen an error like /usr/libexec/mysqld: Can't change dir to '/mysql_data/' (Errcode: 13). The key to fixing this problem is to ensure the new MySQL data directory has the proper SELinux security context. In my case:
# chcon -R system_u:object_r:mysqld_db_t /mysql_data/
After that, mysqld should start up fine.
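One caveat worth noting: a label set with chcon can be lost if the filesystem is relabeled. A minimal sketch of a more persistent variant follows, assuming the semanage and restorecon tools are installed on the server. The helper name selinux_label_cmds is hypothetical, and it only prints the commands so they can be reviewed before being run as root:

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that record a persistent
# SELinux file-context rule for a custom MySQL data directory.
selinux_label_cmds() {
    dir="${1%/}"    # strip one trailing slash: /mysql_data/ -> /mysql_data
    # Record the rule in the local policy so it survives a relabel.
    printf 'semanage fcontext -a -t mysqld_db_t "%s(/.*)?"\n' "$dir"
    # Apply the recorded rule to the files already on disk.
    printf 'restorecon -Rv %s\n' "$dir"
}

# Show the commands for the data directory used in this post.
selinux_label_cmds /mysql_data/
```

Running the two printed commands as root should leave the directory labeled mysqld_db_t, the same type used in the chcon command above, with the rule stored in the local policy rather than only on the files.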
We ran across the following error on a MySQL slave server recently:
mysql> SHOW SLAVE STATUS \G
<snip>
Last_Error: Query caused different errors on master and slave. Error on master: 'Deadlock found when trying to get lock; try restarting transaction' (1213), Error on slave: 'no error' (0). Default database: '<database_name>'. Query: '<query>'
<snip>
In this case, an insert failed on the master. When this happens you have three options:
In the[Read more...]
I’m really proud to announce the release of the version 1.0 of mysql-snmp.
mysql-snmp is a mix between the excellent MySQL Cacti Templates and a Net-SNMP agent. The idea is that combining the power of the MySQL Cacti Templates with any SNMP-based monitoring would unleash a powerful MySQL monitoring system. Of course, this project's favorite monitoring system is OpenNMS.
mysql-snmp is shipped with the necessary OpenNMS configuration files, but any other SNMP monitoring software can be used (provided[Read more...]
At Days of Wonder we are huge fans of MySQL (http://www.mysql.com/) (and, for about a year now, of the various Open Query, Percona, Google, and other community patches), to the point that we’re using MySQL for just about everything in production.
But since we moved to 5.0 three years ago, our production databases, which hold our website and online game systems, have had a unique issue: the mysqld process uses more and more RAM, up to the point where the kernel OOM killer decides to kill the process.
You’d certainly think we are complete morons because we didn’t do anything in the last 3 years to fix the issue[Read more...]
Peter Lieverdink (also known as cafuego on IRC/identi.ca, engineer on OurDelta builds and for Open Query) has co-authored a book that has been available since Monday. The title is Pro Linux System Administration, published by Apress.
These days some people don’t want to bother with system administration, and either hire or outsource. Others want to find out more and do things themselves (home and small office use), and that’s the intended audience for this book.
Thanks to Days of Wonder, the company I work for, I’m proud to release as Free Software (GPL):
At Days of Wonder, we have been using MySQL for almost everything since the beginning of the company. We were initially monitoring all our infrastructure with mon and Cricket, including our MySQL servers. Nine months ago I migrated the monitoring infrastructure to OpenNMS, and at the same time we lost the Cricket MySQL monitoring (which was done with direct SQL SHOW STATUS LIKE commands).
I had to find another way, and since OpenNMS excels at SNMP, it was natural to monitor MySQL through[Read more...]
A couple of days ago, I was helping some USG admins who were facing an interesting issue. Interesting for me, but I don’t think they’d share my views on this, as their servers were melting down under the database load.
But first let me explain the issue.
The thing is that when a client checks in to get its configuration, the puppetmaster compiles its configuration to a digestible[Read more...]
For a long time, people (including me) have complained that storeconfigs is a real resource hog. Unfortunately for us, this option is so cool and useful.
Storeconfigs is a puppetmasterd option that stores each node's actual configuration in a database. It does this by comparing the result of the last compilation against what is actually in the database, resource by resource, then parameter by parameter, and so on.
The actual implementation is based on Rails’ Active Record, which is a great way to abstract the gory details of the database, and prototype code[Read more...]
We are monitoring a few things with the JDBC Stored Procedure Poller, which is really great for monitoring complex business operations without writing remote or GP scripts.
Unfortunately the migration to OpenNMS 1.6.1 led me to discover that the JDBC Stored Procedure poller was not working anymore, crashing with a NullPointerException in the MySQL JDBC Driver while trying to fetch the output parameter.
In fact it turned out I was plain wrong. I was using a MySQL PROCEDURE:
DELIMITER // CREATE PROCEDURE `check_for_something`() READS SQL DATA[Read more...]
I've updated MySQL Query Profiler, which I consider the most important tool I've written. It's now included as part of the MySQL Toolkit project on SourceForge.
I've just released MySQL Duplicate Key Checker on SourceForge. This is a complete rewrite of a tool I initially released under a slightly different name. It is now much more powerful and friendlier to use, especially for scripting, and has many more options.
MySQL Table Checksum is a tool to efficiently verify the contents of any MySQL table in any storage engine. You can use it to compare tables across many servers at once. The output is friendly and easy to use, both by eyeball and in UNIX command-line scripts. The provided MySQL Checksum Filter helps you winnow output so you only see tables that have problems.
Last week I read two books on Nagios. I found one easy to use and the other difficult.