For large number of online applications once you implemented proper sharding you can consider your scaling problems solved - by getting more and more hardware you can grow. As I recently wrote it however does not mean it is the most optimal way by itself to do things.
The "classical" sharding involves partitioning by user_id,site_id or somethat similar. This allows to spread data more or less evenly across the boxes and use any number of boxes. However this may be not the most optimal approach by itself because not all data belonging to same user is equal.
Consider Blog or Forum as example - most likely few last posts will get majority of hits while things written year ago are accessed with much less frequency. You can often level off this significantly for reads by using caching (if things are accessed frequently …
[Read more]