A few weeks ago, I wrote a blog post explaining how sketch sampling methods can be employed to achieve fast 30-day data visibility for monitoring users. The problem we faced was that with that standard of retention, we’ve frequently seen systems that involve nearly a million query samples in a 30-day window, meaning that special solutions are needed in order to avoid overloading users’ browsers.
The solution we’ve found lies in a hash ordering that's proven to be both surprisingly simple and efficient. In this Part 2 post, I’ll look at why it works so well.
Ordering with a Hash
VividCortex’s query samples are stored in MySQL tables. The simplified schema definition looks like this: