One thing I don't see discussed much in distributed systems research is the advantage that batching and streaming can have for scalability.
Batching
Let's cover batching first. Say you have 1000 objects you need to fetch from the database, and let's say the work on the database side is instantaneous (which is seldom the case). If you fetch all 1000 items one at a time, it will kill your performance. Each operation takes about 1-2 ms of round-trip time, which isn't very long for an individual fetch, but it all adds up. If this were a page load on behalf of an HTTP client, it would load in the 1-2 second range, which is pathetic.
If you could somehow batch these up into one operation, you'd pay for one round trip instead of 1000, which in this idealized case is roughly a 1000x performance boost. Not bad. This isn't a theoretical situation, by the way. Memcached has a getMulti method for just this reason. Unfortunately, there is no putMulti …
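To make the round-trip arithmetic concrete, here is a minimal sketch in Python. It does not talk to a real cache: `FakeCache`, its `get` and `get_multi` methods, and the 1 ms `ROUND_TRIP_SECONDS` constant are all hypothetical stand-ins, chosen to mirror the low end of the 1-2 ms per-fetch estimate. It only counts round trips and multiplies them by the assumed latency.

```python
# Assumed cost of one network round trip (~1 ms, the low end of the estimate).
ROUND_TRIP_SECONDS = 0.001


class FakeCache:
    """Hypothetical in-memory stand-in for a remote cache; not a real client."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0  # every call to the "server" costs one round trip

    def get(self, key):
        # One key per request: one round trip each time.
        self.round_trips += 1
        return self._data.get(key)

    def get_multi(self, keys):
        # Many keys in a single request: still just one round trip.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def fetch_one_at_a_time(cache, keys):
    return {k: cache.get(k) for k in keys}


def fetch_batched(cache, keys):
    return cache.get_multi(keys)


keys = [f"obj:{i}" for i in range(1000)]
data = {k: i for i, k in enumerate(keys)}

slow = FakeCache(data)
fast = FakeCache(data)

slow_result = fetch_one_at_a_time(slow, keys)
fast_result = fetch_batched(fast, keys)

# Estimated time spent purely on round trips, ignoring server-side work.
slow_latency = slow.round_trips * ROUND_TRIP_SECONDS
fast_latency = fast.round_trips * ROUND_TRIP_SECONDS

print(f"one at a time: {slow.round_trips} round trips, ~{slow_latency:.3f} s")
print(f"batched:       {fast.round_trips} round trip,  ~{fast_latency:.3f} s")
```

Both strategies return the same 1000 objects; only the number of round trips differs. In a real system the batched call would also pay for a larger response and some server-side work, so the actual speedup would likely be smaller than the idealized ratio.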