This is the first of two articles on sampling queries effectively. The second part is here.
Sampling is hard. This is the title of a talk I gave at a meetup in Boston a few weeks back. But what’s so hard about sampling anyway?
To begin with, let’s clarify what I mean by sampling. The term is a bit ambiguous because it could apply to a few different things one does with time-series data. In this context, I’ll be talking about capturing individual events from a large, diverse set of events (queries).
Here’s a picture of a simple stream of events over time.
Notice that they are not all the same: some of them are higher or lower than others. This is a simple illustration of some variability in the stream.
The way VividCortex generates query insight is by computing metrics about the …