Being a MySQL DBA is more than just being able to understand the
explain command to optimize a query. The
best optimized queries in the world are worthless with the
database is down.
Some of the big responsibilities with being a MySQL DBA include
monitoring, backup and recovery. This is a high level overview of
those topics.
Monitoring:
A single monitor is just not good enough. If you are running a
Nagios server
with the MySQL plugins that is a great start.
This should not be your only solution. The first thing that
should be added is another remote Nagios server to
monitor the 1st master Nagios monitor. After all if the monitor
goes down how do you know?
When running a revenue generating MySQL Server it is easy to
suggest using the MySQL Enterprise Monitor. I have used the
MySQL Enterprise Monitor as an additional resource for
monitoring. It helps you spot trends as well as offer suggestions for
tuning.
Monitor your logs. In order to be able to recover, you need to
understand what happened. Monitoring log is more than just the
error log. You need to watch the slow_query_log, error_log as
well as OS logs. Knowing how to gather everything from network
stats to memory or raid status is key.
Monitor your reports. Put in place events || etls || checks to
validate reporting numbers. Every application is different so
this is a custom solution per application but be able to get
valid notifications when key reporting numbers fall under
thresholds. This could be an early warning sign to other
issues.
Monitor your backup logs and sizes. If you fail to watch the
sizes of your backups as well as the logged output then I wish
you the best of luck. Something like an upgrade could change your
backup procedure. While you consider it to still be running just
fine, maybe they error out now? A simple cron job that executes a
script to record file sizes and checks for errors can put the
output into a the database. Then another script can be used for
reporting on that file size output. You will then have all the
trends of file sizes to make positive thresholds.
Take out the human factor. If all the monitoring
notifications go to one person, then that is a single point of
failure itself. Make sure fine tune notifications to remove false
positives and alert more than one person. Text messages to smart
phones are not enough. People are allowed to sleep and it is very
easy to sleep through a txt message. I enjoy the old school pager
because it simply works. If you can take advantage of asterisks then use
it. A phone call wakes people up more than just a txt
message.
This quick overview highlights that monitoring is more than just
a single entity. It is a vast process that should never be taken
for granted.
Backups:
MySQL backups are the process that everyone knows is a good idea.
The problem is how many people actually do it effectively.
Meaning that they work, tested, and do not harm a live revenue
generating site.
mysqldump is an effective tool for
backups when it is not a live system. Stopping a slave and doing
a mysqldump then restarting a slave is a very safe method for
your system. The problem is that to many applications or entities
either take to long to do a dump and rarely tested when it is
done.
mysqldbexport is another option over mysqldump
because it allows different output options.
Replication is not a backup solution. It is a step in the right
direction for higher availability and you can use replication for
rolling historical data with time delayed replication but you still need to
have the data as a backup to be safe. If using MySQL dump or the
enterprise options, yes you can use them off a slave easier just
replication is not a backup solution.
Take advantage of the mysqlbinlog and use it to backup your binlogs. The binlogs are the life
of your MySQL server, take care of them.
The best option in my opinion is MySQL Enterprise Backup. Nothing against Percona XtraBackup but since I already encourage
the use of enterprise monitor I typically would support and
stay with enterprise backup since I have access to it.
Regardless of your choice, the point is simple, take advantage of
on-line backups. What happens when you have a serious replication
issue and your slaves are useless? Do you really want to do a
mysqldump off the master? They also offer incremental and
compression to help with your backup needs.
Recovery:
This is often the piece of the process that tested in theory but
less in reality. It is often not realistic for some companies to
be able to restore their backups often. The hardware required to
hold the database is not cheap and can be used for other things
than just backup restores. You do need to test your backups
though. Maybe it is just once a quarter and you can afford to
test a slave with a restored backup. Maybe you have a
virtual environment and you can test a backup into a VM just to
see if it works? Execute some reporting on this database to
verify numbers. Take advantage of the different database checksum options available in the
community.
Whatever it is something is better than nothing.
This again is just a overview of the concepts about monitoring,
backup and recovery. Please use the links provided to find out
more information.
More community and MySQL resources and options:
- MySQL Enterprise Backup
- MySQL Enterprise Monitor
- Ronald's MySQL Monitoring 101
- Ronald's MySQL Monitoring 101 Graph results
- Logwatch
- Backing up binary log files with mysqlbinlog