If you read my blog from time to time, you know I am a huge
advocate of automating tasks. If I have to do something more than
once, I might as well automate the task.
Currently at my new gig, we are using OpenNMS. OpenNMS is a combo
of Nagios and Ganglia talking over SNMP, with Java goodness. They
recently released a REST API that is less than adequately
documented, making integration rather difficult. But I am jumping
ahead; let's talk about the problem.
Backing up an InnoDB slave is rather easy with Percona's
xtrabackup program. It's free, it works, and it offers better
features than its paid counterpart. Additionally, it's open source.
In my environment, my algorithm is as follows:
Mount a huge disk over NFS, which gets snapshotted.
At the start of the day (00:00:00 UTC), do a full backup.
The command used: innobackupex --user=root --password='***'
$backupDir --parallel=8 --slave-info
At the top of every hour, do an incremental backup if a full
backup does not exist.
The problem is that to do an incremental backup I use the
following command:
my $INNOBACKUP="innobackupex --user=root --password='***' $incrementalDir --parallel=8 --incremental --slave-info --safe-slave-backup --incremental-basedir=";
$INNOBACKUP .= $lastIncremental;
--incremental says do an incremental backup
--slave-info says dump the slave info
--safe-slave-backup says stop the slave and start it back up when
the backup finishes *THE PROBLEM*
--incremental-basedir is the last successful incremental
directory
The problem is I would get alerted every time the incremental
runs, forcing me to acknowledge the alert.
This is annoying, so let's fix it with automation. (If the slave
is off it's okay; I am backing up the DR.)
So the new algorithm is:
Mount
Set downtime over the OpenNMS REST API
Backup
Remove downtime over the OpenNMS REST API
Unmount
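The five steps above can be sketched as a small wrapper. The mount point, the downtime-script name, and the injectable `run` callable are my assumptions for illustration, not the author's exact wrapper; the point is the try/finally shape.

```python
import subprocess

def run_with_downtime(backup_fn, downtime_hours=2, run=subprocess.check_call):
    run(["mount", "/mnt/backup"])                               # 1. mount
    run(["./schedule_downtime.py", "-t", str(downtime_hours)])  # 2. set downtime via REST
    try:
        backup_fn()                                             # 3. backup
    finally:
        run(["./schedule_downtime.py", "--delete"])             # 4. remove downtime
        run(["umount", "/mnt/backup"])                          # 5. unmount
```

The try/finally guarantees the downtime is removed and the disk unmounted even if the backup step blows up, so a failed backup doesn't leave alerting silenced forever.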
Setting downtime for OpenNMS is hard since the documentation is
not helpful. The good news is I was able to find some code online
to see how things worked. OpenNMS is an open-source product with
code viewable in FishEye. Any time you want to figure out how an
API works, look at the test code of said API and read through its
twists and turns; I did just that by reading this:
http://fisheye.opennms.org/browse/opennms/opennms-webapp/src/test/java/org/opennms/web/rest/ScheduledOutagesRestServiceTest.java?hb=true
But nothing I found told me how to authenticate to use the REST
API. So, searching around and looking at the code shipped with
OpenNMS, I found a Perl script called provision.pl in
/opt/opennms/bin; it uses COOKIES!
Good thing I am good with Perl, so I went ahead and tried some
simple fetches, until I realized that getting LWP, HTTP, and
various other Perl modules (including ISBN::Data) was near
impossible. Making an RPM for each one is just a huge waste of
time, and forcing CPAN installs across N boxes sucks, not to
mention it's just plain WRONG.
So I looked at my options: Python is installed on CentOS by
default and has built-in HTTP libraries like urllib2, httplib,
and cookielib. Perfect.
Below is the script:
#!/usr/bin/python
#
# @author Dathan Vance Pattishall
# OpenNMS Schedule downtime script
#
import urllib, urllib2, cookielib, pprint, os, sys
import elementtree.ElementTree as ET
import time
from datetime import date, tzinfo, timedelta, datetime
from optparse import OptionParser
usage = "usage: %prog [options]"
parser = OptionParser(usage=usage)
parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=False, help="make lots of noise [default]")
parser.add_option("-t", "--downtime", dest="downtime", help="length of downtime")
parser.add_option("-c", "--contains", dest="contains", help="Schedule downtime for all nodes that contain this string")
parser.add_option("-l", "--like", dest="like", help="get servers with a wild card e.g. shard%-dr")
parser.add_option("-d", "--delete", action="store_true", dest="delete", default=False, help="delete and outage")
parser.add_option("-p", "--package", dest="package", default="SFO", help="which nms package")
(options, args) = parser.parse_args()
base_url = 'https://enteropennmshosthere/opennms/'
auth_url = base_url + 'j_spring_security_check'
nodes_url = base_url + 'rest/nodes/'
sched_outage_url = base_url + 'rest/sched-outages/'
username = 'outagerole'
password = 'add_pass_here'
#
# get the hostname
#
hostname = os.environ['HOSTNAME']
host_abbr = os.environ['HOSTNAME'].split('.')[0]
if not host_abbr :
    print("Environment is not setup correctly")
    sys.exit(1)
#
# set up the cookie jar
#
cj = cookielib.CookieJar()
#
# build_opener returns an OpenerDirector: http://docs.python.org/library/urllib2.html?highlight=urllib2#urllib2.OpenerDirector
#
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
#
# now the cookie will work with urlopen as a global for all following requests
#
urllib2.install_opener(opener)
#
# log-in and set the cookie
#
login_data = urllib.urlencode({'j_username' : username, 'j_password' : password, 'Login': 'Login'})
opener.open(auth_url, login_data)
#
# get the nodes
#
if options.contains :
    url_data = { 'label' : options.contains, 'limit' : 0, 'comparator' : 'contains' }
    url_data = urllib.urlencode(url_data)
    resp = opener.open(nodes_url + '?' + url_data)
    outagename = options.contains + "-contains-script-outage"
elif options.like :
    url_data = { 'label' : options.like + '%', 'limit' : 0, 'comparator' : 'ilike' }
    url_data = urllib.urlencode(url_data)
    resp = opener.open(nodes_url + '?' + url_data)
    outagename = options.like.replace('%', 'WC') + "-ilike-script-outage"
else :
    #
    # get the hostname info to see if the host is in opennms
    #
    url_data = { 'label' : hostname, 'limit' : 0 }
    url_data = urllib.urlencode(url_data)
    resp = opener.open(nodes_url + '?' + url_data)
    outagename = host_abbr + "-script-outage"
#
# tree is an Element
# http://docs.python.org/library/xml.etree.elementtree.html?highlight=elementtree#element-objects
#
tree = ET.XML(resp.read())
#
# build a label name to id map
#
name_to_id_map = {}
for node in tree.getiterator('node') :
    if node.get('label') :
        name_to_id_map[node.get('label')] = node.get('id')
if not name_to_id_map and options.contains:
    print("This HOST [%s] is not in nms" % options.contains)
    sys.exit(1)
#
# delete an outage - really outage name is all that is needed
#
if options.delete :
    try :
        print("Deleting Outage: %s" % outagename)
        req3 = urllib2.Request(sched_outage_url + outagename)
        req3.add_header('Content-Type', 'application/xml')
        req3.add_header('Content-Length', '0')
        req3.get_method = lambda: 'DELETE'
        r = urllib2.urlopen(req3)
    except urllib2.HTTPError :
        print("This Outage: %s was already deleted" % outagename)
    sys.exit(0)
#
# schedule downtime
#
start = datetime.today()
downtime = 1 # units of hours
if options.downtime :
    downtime = int(options.downtime) # units of hours
print("Scheduling downtime for %d hour(s), Outage Name %s" % (downtime, outagename))
end = start + timedelta(hours=downtime)
end = end.strftime('%d-%b-%Y %H:%M:%S')
start = start.strftime('%d-%b-%Y %H:%M:%S')
print("Start of the downtime: %s" % start)
print("End of the downtime: %s" % end)
#
# build the request
#
req = urllib2.Request(sched_outage_url)
req.add_header('Content-Type', 'application/xml')
xml_str = "<outage name='" + outagename + "' type='specific'>"
xml_str += "<time begins='" + str(start) + "' ends='" + str(end) + "' />"
for nodename in name_to_id_map :
    xml_str += "<node id='" + name_to_id_map[nodename] + "' />"
xml_str += "</outage>"
#
# send the XML
#
req.add_data(xml_str)
r = urllib2.urlopen(req)
#
# tell notifd to attach to the downtime
#
req2 = urllib2.Request(sched_outage_url + outagename + '/notifd')
req2.add_header('Content-Type', 'application/xml')
req2.add_header('Content-Length', '0')
req2.get_method = lambda: 'PUT'
r = urllib2.urlopen(req2)
#
# tell pollerd to attach to the downtime
#
req3 = urllib2.Request(sched_outage_url + outagename +'/pollerd/' + options.package)
req3.add_header('Content-Type', 'application/xml')
req3.add_header('Content-Length', '0')
req3.get_method = lambda: 'PUT'
r = urllib2.urlopen(req3)
sys.exit(0)
This is a quick and dirty script which I will eventually turn
into a class to control OpenNMS from the command line. In
summary, it logs in with the specified user, gets a cookie, and
issues commands pulled from reading Java code (good thing I can
code in Java as well). My main problem was searching to find that
there are unpublished filters like comparator => contains, and
discovering that the POST structure for sched-outages was not
key/value params but XML!!
public void testSetOutage() throws Exception {
String url = "/sched-outages";
String outage = "<?xml version=\"1.0\"?>" +
"<outage name='test-outage' type='specific'>" +
"<time day='friday' begins='13:20:00' ends='15:30:00' />" +
"<time begins='17-Feb-2012 19:20:00' ends='18-Feb-2012 22:30:00' />" +
"<node id='11' />" +
"</outage>";
sendPost(url, outage);
}
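For what it's worth, the same payload can also be built with ElementTree instead of string concatenation, which sidesteps the quoting mistakes that string-built XML invites. A sketch matching the structure of the test XML above (the function name is mine):

```python
import xml.etree.ElementTree as ET

# Builds the same <outage> payload as the Java test above; ElementTree
# handles attribute quoting and escaping automatically.
def outage_xml(name, begins, ends, node_ids):
    outage = ET.Element("outage", {"name": name, "type": "specific"})
    ET.SubElement(outage, "time", {"begins": begins, "ends": ends})
    for node_id in node_ids:
        ET.SubElement(outage, "node", {"id": str(node_id)})
    return ET.tostring(outage)
```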
In summary, everything works well. Backups are working, and I am happy not to be sending pages. Eventually, when I get around to it, I'll upload this script to git.