Showing entries 171 to 180 of 279
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: monitoring (reset)
On the threshold

When you set up a monitoring system for SQL Server, you often use thresholds to determine when an instance is healthy. You might say that you want to be alerted when CPU use is over 90% or when there’s only 10% of disk space left. The trouble with these thresholds is that they often throw false positives: they send you an alert when nothing is really wrong. Simple thresholds often have to be tuned to the individual instance, since a server with 10 TB still has 1 TB of space left at 90% disk use.
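As a rough illustration of the 10 TB case, here is a minimal Python sketch that only alerts when both the percentage threshold and an absolute free-space floor are breached. The function name `disk_alert`, its parameters, and the 100 GiB floor are assumptions made up for this example, not something taken from a particular monitoring product, and this is still a simple threshold rather than the adaptive approach discussed next.

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def disk_alert(total_bytes, free_bytes, max_used_percent=90, min_free_bytes=100 * GIB):
    """Return True only if the disk is both proportionally full and short on absolute space.

    max_used_percent and min_free_bytes are illustrative defaults (assumptions).
    """
    used_bytes = total_bytes - free_bytes
    # Integer arithmetic keeps the comparison exact at the 90% boundary.
    percent_breached = used_bytes * 100 >= max_used_percent * total_bytes
    floor_breached = free_bytes < min_free_bytes
    return percent_breached and floor_breached


if __name__ == "__main__":
    # 10 TiB volume with 1 TiB free: 90% used, but plenty of space left.
    print(disk_alert(10 * TIB, 1 * TIB))
    # 100 GiB volume with 10 GiB free: 90% used and little space left.
    print(disk_alert(100 * GIB, 10 * GIB))
```

The percentage rule alone would fire for both volumes. Adding the floor silences the large one, but it is one more number that has to be tuned per instance.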

Baron Schwartz blogged about this issue in an article and he’s been creating software that monitors MySQL beyond simple thresholds, after stating that they do not work in most cases. He makes a good …

[Read more]
Adaptive Fault Detection food fight

I was a guest on the Food Fight Show last week, along with a bevy of really smart people asking and answering tough questions on fault detection. We didn’t talk a lot about MySQL, but given that VividCortex is focusing on MySQL initially, pretty much all of my experience with zero-threshold, zero-configuration fault detection is MySQL-based.

It’s a fun conversation with a lot of insights into the industry, what’s wrong with current monitoring tools, and where monitoring is going. Also, it’s sold out now, but Monitorama is a conference you might be interested in if you’re doing monitoring (and who isn’t?)

MySQL Cluster: Troubleshooting Error 157 / 4009 Cluster Failure


Suddenly your application starts throwing "error 157" and performance degrades or is non-existent. It is easy to panic and try all sorts of actions to get past the problem. We have seen several users doing:

  • rolling restart
  • stop cluster / start cluster

because they also see this in the error logs:

120828 13:15:11 [Warning] NDB: Could not acquire global schema lock (4009)Cluster Failure


That is not really a nice error message. To begin with, it is a WARNING when something is obviously wrong. IMHO, it should be CRITICAL. Secondly, the message ‘Cluster Failure’ is misleading. The cluster may not really have failed, so there is no point trying to restart it before we know more.


So what does error 157 mean and what can we do about it?


By using perror we can get a hint as to what it means:


$ …

[Read more]
Manage your MySQL & MariaDB databases - the simple way

New quick-start guide for MySQL DBAs: SkySQL™ Enterprise Monitor makes managing your MySQL & MariaDB databases that much easier

We’ve just published a new Quick Start Guide to SkySQL™ Enterprise Monitor for all MySQL & MariaDB DBAs out there who are looking for ways to manage their databases more easily.

read more

Devops in Munich

Devopsdays Mountainview sold out in a short 3 hours, but there are other events that will breathe devops this summer.
DrupalCon in Munich will be one of them.

Some of you might have noticed that I'm co-chairing the devops track for DrupalCon Munich.
The CFP is open until the 11th of this month and we are still actively looking for speakers.

We're trying to bridge the gap between Drupal developers and the people who put their code into production, at scale.
We also want to enhance knowledge of the infrastructure components Drupal developers depend on.

We're looking for talks on culture (both success stories and failures) and on automation,
specifically from people talking about Drupal deployments, e.g. using tools like Capistrano, Chef, or Puppet.
We want to hear where Continuous Integration fits in your deployment, and whether you do Continuous Delivery of a Drupal environment.
And how do you …

[Read more]
how to determine the runtime and start time of a Linux process

Yesterday, I needed to determine the runtime of a Linux process for a monitoring script.

Because the format of the ps command's start_time may change if the process was not started in the same year, I decided to take the necessary information from the /proc/<PID>/stat file.

In this file, the process start time since boot is the twenty-second field, expressed in jiffies, the unit of the system timer (one jiffy is one tick of the system timer).

To convert jiffies to seconds, I just have to divide the number of jiffies by the frequency (hertz) of the system timer, which is defined in the Linux kernel header file include/asm-generic/param.h. The frequency may differ between Linux kernel versions and hardware platforms! On my Linux systems the frequency is 100 Hz.
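The same calculation can be sketched in Python. This is a minimal sketch under stated assumptions, not the shell line from the original post: the helper names `parse_starttime_ticks` and `process_runtime_seconds` are made up for illustration, the tick rate is read with `os.sysconf("SC_CLK_TCK")` instead of hard-coding 100, and the current uptime is taken from /proc/uptime.

```python
import os


def parse_starttime_ticks(stat_line):
    """Extract field 22 (start time in clock ticks since boot) from a /proc/<PID>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm[0] is field 3 (state), so field 22 sits at index 19.
    return int(after_comm[19])


def process_runtime_seconds(pid):
    """Return how long the process has been running, in seconds (Linux only)."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")  # typically 100
    with open(f"/proc/{pid}/stat") as stat_file:
        start_ticks = parse_starttime_ticks(stat_file.read())
    with open("/proc/uptime") as uptime_file:
        uptime_seconds = float(uptime_file.read().split()[0])
    # Runtime = time since boot now, minus time since boot at process start.
    return uptime_seconds - start_ticks / ticks_per_second


if __name__ == "__main__":
    print(f"runtime of this process: {process_runtime_seconds(os.getpid()):.2f} s")
```

Subtracting the runtime from the current wall-clock time gives the absolute start time of the process.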

In a shell script the following line will …

[Read more]
MySQL replication monitoring on Ubuntu 10.04 with Nagios and NRPE

If you're using MySQL replication, then you're probably counting on it for some fairly important need. Monitoring via something like Nagios is generally considered a best practice. This article assumes you've already got your Nagios server set up and your intention is to add an Ubuntu 10.04 NRPE client. This article also assumes the Ubuntu 10.04 NRPE client is your MySQL replication master, not the slave. The OS of the slave does not matter.

Getting the Nagios NRPE client set up on Ubuntu 10.04

At first it wasn't clear which packages to install. I was initially misled by the naming of the nrpe package, but I found the correct packages to be:

sudo apt-get install nagios-nrpe-server nagios-plugins

The NRPE configuration is stored in /etc/nagios/nrpe.cfg, while the plugins are installed in /usr/lib/nagios/plugins/ (or lib64). The installation of this package …

[Read more]
Monitoring your monitoring tools (MONyog inside)!

Regardless of the monitoring tool you use to monitor your databases, it is a good idea to monitor the tool itself.
No, it's not a joke! What benefit do you get from a monitoring tool that is no longer connected to your servers, if nobody alerts you about it?

I chose to talk about MONyog here, but this applies to all existing monitoring tools.
I just want to share the message. The tool does not matter, so do it!

So, let me explain how to check whether you have fresh data with MONyog.
With MONyog it's easy because it's an agentless monitoring tool.

There are two ways to check that:

Per server general info:

For each server, you can …

[Read more]
Performance monitoring with nmon

In this tutorial I will describe how to use nmon (Nigel’s performance Monitor) to monitor performance data in interactive mode or in capture mode.

nmon can display or capture the following performance data:

  • CPU utilization
  • Memory use
  • Kernel statistics and run queue
  • Disks I/O rates, transfers, and read/write ratios
  • File systems size and free space
  • Disk adapters
  • Network I/O rates, transfers, and read/write ratios
  • Paging space and paging rates
  • Machine details, CPU and OS specification
  • Top processors
  • User defined disk groups
  • Asynchronous I/O – AIX only
  • Workload Manager – AIX only
  • ESS and other disk subsystem – AIX only
  • Dynamic LPAR changes …
[Read more]
My MySQL SNMP Agent

Back in February I wrote an article titled A Small Fix For mysql-agent. Since then we have made a few more fixes to the agent and included a Bytes Behind Master (or BBM) chart. For those who can't wait to get their hands on the code, here's the current version: MySQL SNMP agent RPM. For those who'd like to learn about its capabilities and issues, keep reading.

What to Expect From this Version
The article I quoted above pretty much describes the main differences with the original project, but we went further with the changes while still relying on Masterzen's code for the data collection piece.

The first big change is that we transformed Masterzen's code into a Perl module, …

[Read more]