One of the most common reasons for the performance and scalability problems I encounter is simply failing to do the math. These are typically bad ones, because they often lead to implementing architectures which are not up to the job they are intended to do.
Let me start with an example to make it clear. Let's say you're doing some reports from your Apache log files - how many distinct visitors hit the page and things like that. You picked full logs because they offer great flexibility - you can run any ad hoc queries and drill down as much as you like. Initially the site was young and traffic was small, and with 10,000 page views a day and only a few days of history the queries were instant, which gave you confidence this approach would work.
As time passes and you get 1,000,000 events per day and look to do reporting on up to a whole year's worth of data, you find things not working any more, with response times for individual queries taking half an …
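
To put rough numbers on this example, here is a minimal back-of-envelope sketch. The event rates come from the example above; everything marked ASSUMPTION (taking "a few days" as 7 days, and a scan rate of 1,000,000 rows per second) is an illustrative guess, not a measurement, so plug in your own figures.

```python
# Back-of-envelope sizing for the log-reporting example.
# Values marked ASSUMPTION are illustrative only, not measurements.

def rows_scanned(events_per_day: int, days_of_history: int) -> int:
    """Rows a query over the full history has to read."""
    return events_per_day * days_of_history


def scan_seconds(rows: int, rows_per_second: int) -> float:
    """Time for one full scan at a given (assumed) scan rate."""
    return rows / rows_per_second


EARLY_DAYS = 7                  # ASSUMPTION: "a few days" taken as one week
ASSUMED_SCAN_RATE = 1_000_000   # ASSUMPTION: rows scanned per second

early = rows_scanned(10_000, EARLY_DAYS)   # 70,000 rows
later = rows_scanned(1_000_000, 365)       # 365,000,000 rows
growth = later / early                     # how many times more data

print(f"early rows: {early:,}")
print(f"later rows: {later:,}")
print(f"growth factor: {growth:,.0f}x")
print(f"early scan: {scan_seconds(early, ASSUMED_SCAN_RATE):.2f} s")
print(f"later scan: {scan_seconds(later, ASSUMED_SCAN_RATE):.0f} s")
```

The point is not the exact scan rate but the ratio: the same query now reads thousands of times more rows, so a response that was instant on a week of data cannot stay instant on a year of it.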