Last time I talked about point queries. The conclusion was that big databases and point queries don’t mix. It’s OK to do them from time to time, but that’s not how you’re going to use your database, unless you have a lot of time. Today I’d like to talk about range queries, which seem much more useful for analyzing big databases, say in a business intelligence setting.
Recall that the focus is at the storage-engine level (à la MySQL), and on a database stored on a single disk. The disk we are using for illustration is the 1TB Hitachi Deskstar 7K1000, which has a seek time of 14 ms and a transfer rate of around 69 MB/s [See tomshardware.com]. Now imagine filling the disk with random key-value pairs, with an 8-byte key and an 8-byte value in each pair (16 bytes per pair). That comes to 62.5 billion pairs.
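The numbers above support a quick back-of-the-envelope model. The sketch below is illustrative only and rests on a few assumptions: 1 TB is taken as 10^12 bytes (decimal, as drive vendors count), each pair is an 8-byte key plus an 8-byte value, and the quoted seek time and transfer rate are treated as constants across the whole disk. The "one seek per pair" figure is a hypothetical worst case for comparison, not a measurement; all function and constant names are made up for this example.

```python
# Back-of-the-envelope model of the example disk.
# Assumptions (not measurements): 1 TB = 10**12 bytes, 16-byte pairs
# (8-byte key + 8-byte value), constant seek time and transfer rate.

DISK_BYTES = 10**12          # 1 TB Hitachi Deskstar 7K1000
SEEK_S = 0.014               # 14 ms per random seek
TRANSFER_BPS = 69 * 10**6    # ~69 MB/s sequential transfer
PAIR_BYTES = 8 + 8           # 8-byte key + 8-byte value


def pair_count(disk_bytes: int, pair_bytes: int) -> int:
    """Number of fixed-size pairs that fit on the disk."""
    return disk_bytes // pair_bytes


def sequential_scan_seconds(disk_bytes: int, rate_bps: float) -> float:
    """Time to stream every byte once at the sequential transfer rate."""
    return disk_bytes / rate_bps


def one_seek_per_pair_seconds(pairs: int, seek_s: float) -> float:
    """Hypothetical worst case: every pair costs one random seek."""
    return pairs * seek_s


if __name__ == "__main__":
    n = pair_count(DISK_BYTES, PAIR_BYTES)
    scan = sequential_scan_seconds(DISK_BYTES, TRANSFER_BPS)
    seeks = one_seek_per_pair_seconds(n, SEEK_S)
    print(f"pairs: {n:,}")
    print(f"sequential scan: {scan / 3600:.1f} hours")
    print(f"one seek per pair: {seeks / (365 * 24 * 3600):.1f} years")
```

Under these assumptions, streaming the whole disk sequentially takes roughly 4 hours, while paying one seek per pair would take on the order of decades. That gap between seek-bound and bandwidth-bound access is what makes the layout of data on disk matter so much for range queries.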
Range Queries
Suppose the above data is stored in a B-tree, and that you’d like to iterate over all the data in order by key. Further suppose that the …