Showing entries 1 to 4
Displaying posts with tag: visualizations
Memory Leak (and Growth) Flame Graphs


Your application's memory usage is steadily growing, and you are racing against time to fix it. This could be either memory growth due to a misconfiguration, or a memory leak due to a software bug. For some applications, performance can begin to degrade as garbage collection works harder, consuming CPU. If an application grows too large, performance can drop off a cliff due to paging (swapping), or the application may be killed by the system (OOM killer). You want to take a quick look before either occurs, in case it’s an easy fix. But how?

Debugging growth issues involves checking the application config and memory usage, either from application or system tools. Memory leaks are much …

What the Mean Really Means

When analyzing response time, or latency, you need much more information than an average provides. The average, commonly the arithmetic mean, shows the index of central tendency. But, as I found in earlier posts, the tendency is often not central, but may be skewed by outliers, or split by multiple modes. How often these factors occur was determined quantitatively, using tests and a survey of hundreds of production servers and different types of latency: over 95% had six-sigma outliers, and at least 20% had multiple modes. While these numerical results are useful, nothing beats a visualization, such as a histogram, …
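As a rough illustration of how a single outlier can pull the mean away from typical latency, here is a minimal Python sketch. The sample data and the `summarize` helper are hypothetical, and "six-sigma" is assumed here to mean that the maximum lies more than six standard deviations above the mean; the original posts may define it differently.

```python
import statistics

def summarize(latencies_ms):
    """Return the mean, median, standard deviation, and how many
    standard deviations the maximum sits above the mean."""
    mean = statistics.fmean(latencies_ms)
    stdev = statistics.pstdev(latencies_ms)  # population standard deviation
    median = statistics.median(latencies_ms)
    # Assumption: outlier distance is measured as (max - mean) / stdev.
    max_sigma = (max(latencies_ms) - mean) / stdev if stdev > 0 else 0.0
    return {"mean": mean, "median": median, "stdev": stdev, "max_sigma": max_sigma}

# Hypothetical sample: 99 requests at 10 ms and one 1000 ms outlier.
sample = [10.0] * 99 + [1000.0]
stats = summarize(sample)
print(stats)
```

In this made-up sample the mean is roughly double the median, even though 99 of the 100 requests took 10 ms, and the slowest request sits well beyond six standard deviations from the mean.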

Modes and Modality

It is a truth universally acknowledged that the average is the index of central tendency. But what if the tendency isn’t central?

I’ve worked many performance issues where the latency or response time was multimodal, and higher-latency modes turned out to be the cause of the problem. Their existence isn’t shown by the average – the arithmetic mean; it could only be seen by examining the distribution as a histogram, density plot, heat map, or frequency trail. Once you know that more than one mode is present, it’s often straightforward to determine what causes the slower mode, by seeing what parameters of …
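To make the point about hidden modes concrete, the following Python sketch buckets latencies into a histogram and counts local peaks. It is a deliberately naive illustration, not the method used in the post: the `histogram` and `count_peaks` helpers, the bucket width, and the sample data are all hypothetical, and real data would usually need smoothing before peaks can be counted reliably.

```python
from collections import Counter

def histogram(values, bucket_width):
    """Group values into fixed-width buckets; return (bucket_start, count)
    pairs for every bucket from the lowest to the highest, including empty ones."""
    counts = Counter(int(v // bucket_width) for v in values)
    lo, hi = min(counts), max(counts)
    return [(b * bucket_width, counts.get(b, 0)) for b in range(lo, hi + 1)]

def count_peaks(hist, min_count=1):
    """Naively count local maxima in the bucket counts."""
    counts = [0] + [c for _, c in hist] + [0]  # pad so the edges can be peaks
    peaks = 0
    for i in range(1, len(counts) - 1):
        if (counts[i] >= min_count
                and counts[i] > counts[i - 1]
                and counts[i] >= counts[i + 1]):
            peaks += 1
    return peaks

# Hypothetical latencies (ms): a fast mode near 2 ms and a slow mode near 20 ms.
sample = [1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 2.0, 19.0, 20.0, 20.0, 21.0, 20.0]
hist = histogram(sample, 5)
mean = sum(sample) / len(sample)

for start, count in hist:
    print(f"{start:>3} ms | {'#' * count}")
print(f"mean = {mean:.1f} ms, peaks = {count_peaks(hist)}")
```

The mean of this sample lands between the two modes, in a range where no request actually fell, which is why the distribution has to be examined directly.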

Detecting Outliers

In computer performance, we’re especially concerned about latency outliers: very slow database queries, application requests, disk I/O, etc. The term “outlier” is subjective: there is no rigid mathematical definition. From [Grubbs 69]:

An outlying observation, or “outlier,” is one that appears to deviate markedly from other members of the sample in which it occurs.

Outliers are commonly detected by comparing the maximum value in a data set to a custom threshold, such as 50 or 100 ms for disk I/O. This requires the metric to be well understood beforehand, as is usually the case for application latency and other key metrics. However, we are also often faced with a large number of unfamiliar metrics, where we don’t know the thresholds in advance.
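The threshold approach described above can be sketched in a few lines of Python. The `exceeds_threshold` helper and the sample values are hypothetical; the 50 ms figure is the disk I/O example threshold mentioned in the text.

```python
def exceeds_threshold(latencies_ms, threshold_ms):
    """Compare the maximum latency in the data set to a custom threshold.
    Returns (flagged, worst), where worst is the maximum observed value."""
    worst = max(latencies_ms)
    return worst > threshold_ms, worst

# Hypothetical disk I/O latencies in milliseconds.
disk_io_ms = [0.8, 1.2, 3.5, 72.0]
flagged, worst = exceeds_threshold(disk_io_ms, 50.0)
print(f"outlier present: {flagged} (max = {worst} ms)")
```

This only works when a sensible threshold is already known for the metric, which is the limitation the paragraph above points out.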

There are a number of …
