In the previous installment of this series, I showed how coarse sampling can completely misrepresent the source data. In the following example, the 1-second metrics from Aternity APM (blue) accurately show that CPU load is spiking every minute for about 15 seconds, yet when that very same data is sampled every 60 seconds (red), it doesn’t match the source behavior at all:
This is a serious problem for those who rely on their performance monitoring data to detect and analyze issues. If their data is wrong, they’ll be blind to certain issues or chase ones which don’t really exist. Let’s dive into why this happens.
Monitoring’s biggest enemy: aliasing
The reason behind discrepancies between the source behavior and the resulting monitoring data can be found in the field of signal processing. Wikipedia describes aliasing as follows:
In signal processing and related disciplines, aliasing is an effect that causes different signals to become indistinguishable (or aliases of one another) when sampled. It also refers to the distortion or artifact that results when the signal reconstructed from samples is different from the original continuous signal.
Aliasing can occur in signals sampled in time, for instance digital audio, and is referred to as temporal aliasing. Aliasing can also occur in spatially sampled signals, for instance moiré patterns in digital images. Aliasing in spatially sampled signals is called spatial aliasing.
This is a pervasive problem in any field which captures analog data digitally, and those fields follow widely accepted principles and practices to avoid or minimize these issues. A key one is the Nyquist-Shannon sampling theorem:
In the field of digital signal processing, the sampling theorem is a fundamental bridge between continuous-time signals (often called “analog signals”) and discrete-time signals (often called “digital signals”). It establishes a sufficient condition for a sample rate that permits a discrete sequence of samples to capture all the information from a continuous-time signal of finite bandwidth.
This leads us to arguably the most important concept, the Nyquist Rate:
The minimum rate at which a signal can be sampled without introducing errors, which is twice the highest frequency present in the signal.
The simple summary of all of that: if you want to accurately capture an event which occurs (for example) every 10 seconds, you need to sample at least every 5 seconds. Sampling more coarsely can introduce errors and distortions, called aliasing, into the resulting signal.
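To make this concrete, here’s a minimal Python sketch of the scenario from the chart at the top of this article. The load values and timings are invented for illustration: a synthetic CPU that spikes for 15 seconds out of every minute, read at 1-second and 60-second intervals. With an unlucky phase, the coarse samples never land on a spike at all.

```python
def cpu_load(t):
    """Synthetic CPU load (%): spikes to 90 for the first 15 s of every
    minute and idles near 10 otherwise -- a stand-in for real metrics."""
    return 90.0 if (t % 60) < 15 else 10.0

def sample(signal, interval, duration, offset=0.0):
    """Read `signal` every `interval` seconds for `duration` seconds."""
    return [signal(offset + i * interval) for i in range(int(duration / interval))]

fine = sample(cpu_load, 1, 300)                # 1-second granularity
coarse = sample(cpu_load, 60, 300, offset=20)  # 60-second granularity, unlucky phase

print(max(fine))    # 90.0 -- every spike is visible
print(set(coarse))  # {10.0} -- every reading lands between spikes
```

The coarse series isn’t just less detailed; it contains no evidence that the spikes ever happened.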
Aliasing vs. sampling rate
To show the effects of aliasing I have a 60-second period sine wave sampled at various rates. The Nyquist rate for this signal mandates sampling at least every 30 seconds to prevent aliasing. The source signal (blue) is accurately reflected in the 30-second sampled result (red):
When the sampling rate is lowered to 45 seconds, the sampled result (red) starts to differ from the source:
At 59 seconds the sampled result is significantly different:
At 60 seconds, the sampled result would lead you to believe there’s no signal at all:
At 90 seconds the sampled result completely misrepresents the source, suggesting the frequency is 1/3 of what it actually is:
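The apparent period in each of these cases can be predicted with the standard frequency-folding formula from signal processing. This is a simplified sketch: it assumes a pure sine and ignores amplitude and phase effects, which also matter near the Nyquist limit.

```python
def apparent_period(true_period, sampling_interval):
    """Apparent period (s) of the aliased signal when a sine with period
    `true_period` seconds is sampled every `sampling_interval` seconds."""
    f = 1.0 / true_period           # true frequency (Hz)
    fs = 1.0 / sampling_interval    # sampling frequency (Hz)
    f_alias = abs(f - fs * round(f / fs))  # fold into [0, fs/2]
    return float('inf') if f_alias == 0 else 1.0 / f_alias

for interval in (30, 45, 59, 60, 90):
    print(interval, apparent_period(60, interval))
```

Sampling at exactly the signal’s own period folds it to zero frequency, which is why the 60-second case produces a flat line: an infinite apparent period.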
Here’s a side-by-side comparison of the same source data when sampled at 16 different rates (click image to enlarge):
Take a moment to really reflect on the previous image: the same signal is represented 16 different ways, and 15 of them are totally wrong. They’re not “less accurate”; they’re completely incorrect.
If your performance monitoring data suffers from aliasing, it may hide the existence of an issue, rule out a true root cause, or lead you to chase a false one. That means wasted effort, lingering issues, and negative impact to the business.
Aliasing is everywhere
As I previously stated, this is a pervasive problem in any field which captures analog data digitally. Photography, publishing, video, audio and scientific imaging are but a few fields which are fundamentally aware of the dangers of aliasing. A major tenet in those fields is fidelity: accurately reproducing their respective content, so they go to great lengths to avoid aliasing. Here are some examples to show how misleading it can be.
Aliasing example: audio
This series is titled The “Sound” of Performance Data because audio is one of the media where aliasing is most noticeable. When audio is downsampled improperly, high frequencies turn into lower ones and the audio sounds extremely distorted. It’s a great proxy for what happens to performance data when it’s improperly sampled. Listen to this example, where aliasing increases as the sampling rate decreases:
Song: Code Monkey by Jonathan Coulton
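The “high frequencies turning into lower ones” effect is easy to reproduce numerically. In this sketch (the sample rates and tone frequency are chosen purely for illustration), a 3100 Hz tone naively decimated from 8000 Hz to 2000 Hz, with no low-pass filter applied first, comes out sample-for-sample identical to an inverted 900 Hz tone:

```python
import math

RATE = 8000   # original sample rate (Hz)
TONE = 3100   # tone frequency (Hz), above the post-decimation Nyquist limit of 1000 Hz

# A pure 3100 Hz tone sampled at 8000 Hz...
original = [math.sin(2 * math.pi * TONE * n / RATE) for n in range(8000)]

# ...downsampled by simply dropping samples (keep every 4th), with no
# anti-aliasing filter. The new rate is 2000 Hz.
decimated = original[::4]

# 3100 Hz folds around the new Nyquist frequency: 2 * 2000 - 3100 = 900 Hz.
alias = [-math.sin(2 * math.pi * 900 * n / 2000) for n in range(len(decimated))]

# The decimated high tone and the low aliased tone are indistinguishable.
assert all(abs(a - b) < 1e-8 for a, b in zip(decimated, alias))
```

This is why proper downsampling always low-pass filters first: once the fold happens, no later processing can tell the real 900 Hz content from the aliased 3100 Hz content.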
Aliasing example: video
Watch the following video and imagine you had to write an article about it solely based on what you saw (in other words not trying to guess as to what’s probably happening):
You’d likely write something like this:
This is an example of the wagon-wheel effect, a common form of aliasing in video recordings where rotating objects may appear stationary, rotate more slowly, or even rotate in reverse. In the helicopter video the frame rate was an exact divisor of the rotation rate of the blades, causing them to appear to stand still. This is effectively the same as the earlier example where the 60-second sine wave was sampled every 60 seconds and appeared to be flat: (click icon to enlarge)
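The arithmetic behind the frozen rotor is simple enough to sketch. The numbers below are illustrative, not taken from the video: a rotor spinning at a whole multiple of the frame rate lands at the same angle in every frame, while one spinning slightly slower appears to creep backwards.

```python
def blade_angles(revs_per_sec, fps, frames):
    """Angle (degrees, mod 360) at which the camera catches a blade
    in each of `frames` consecutive frames."""
    return [(360.0 * revs_per_sec * i / fps) % 360.0 for i in range(frames)]

print(blade_angles(24, 24, 4))  # [0.0, 0.0, 0.0, 0.0] -- appears frozen
print(blade_angles(23, 24, 4))  # [0.0, 345.0, 330.0, 315.0] -- appears to spin backwards
```

Real rotors have several blades, so in practice the effect appears whenever the blade-passage rate, not the rotation rate itself, is a multiple of the frame rate.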
Aliasing example: photography
A common issue in digital photography is subjects with fine repeating patterns, like this striped shirt:
When this is resized to 25%, a psychedelic pattern emerges:
This type of aliasing is known as moiré and is effectively the same as the earlier example where the 60-second sine wave was sampled every 59 seconds and resulted in a totally incorrect pattern: (click icon to enlarge)
To combat this, “anti-aliasing” is typically applied when scaling images to reduce the moiré, but it can result in the details being lost altogether:
The shirt now appears to be a solid color. Imagine your disappointment if you ordered this shirt online based on this image, but received a striped shirt instead! This is clearly a serious issue for online retailers, so they go to significant lengths to ensure their imagery is faithful to the product and doesn’t mislead the consumer.
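A one-dimensional sketch shows both failure modes from the shirt example. The pixel values and periods here are invented for illustration: taking every Nth pixel with no filtering either manufactures a false coarse pattern (moiré) or, when the stride exactly matches the stripe period, collapses the stripes into a solid color.

```python
def stripe(x, period=4):
    """1-D stripe pattern: 2 dark pixels then 2 light pixels, repeating."""
    return 1 if (x % period) < period // 2 else 0

row = [stripe(x) for x in range(400)]

# "Resize to 25%" by naively keeping every 4th pixel (no anti-aliasing):
solid = row[::4]   # stride == stripe period: every kept pixel is identical
moire = row[::5]   # stride nearly matches: a false pattern at the wrong scale

print(set(solid))  # {1} -- the stripes vanish into a solid color
print(moire[:8])   # [1, 1, 0, 0, 1, 1, 0, 0] -- same shape, 5x too coarse
```

Proper image scalers average neighborhoods of pixels (a low-pass filter) before decimating, which avoids the moiré at the cost of blurring fine detail, exactly the trade-off shown in the shirt images above.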
Aliasing in other fields
Some quick online searching revealed that just about every scientific field is intimately aware of the dangers of aliasing and has practices and techniques to deal with it. Weather radar has to deal with “Purple Haze,” Doppler ultrasound can show flows that appear to move in the reverse direction, MRI has to battle “wrap-around,” and mass spectrometry has similar challenges.
Aliasing in performance management
It surprises and saddens me that the concepts of Aliasing and Nyquist Rate are largely absent from the performance management industry. I keep running into cases where troubleshooters are blind to issues or chase phantom problems because their monitoring tools are presenting completely inaccurate (aliased) data due to insufficient sampling.
I’m spoiled… I’ve had the luxury of Aternity APM’s 1-second granular metrics at my disposal for the past 15 years, and non-sampled transactional data for almost half of that. Being able to compare how other tools with 1-minute or even 15-second granularity represent application behaviors very differently (and sometimes extremely inaccurately) from Aternity APM’s high-definition data has given me a unique perspective on how pervasive aliasing is in the performance management space, and has trained me to recognize when data can and can’t be trusted.
In the next installment I’ll discuss how these concepts specifically apply to performance management practices, such as determining what data granularity is needed for certain use cases, and how to train yourself to spot cases where your data may be misleading you.