We recently examined a customer’s system to try to speed up an ETL (extract, transform, load) process that loads a big data set into a sort of data mart or data warehouse (DW). What we typically do is ask the customer to run the process in question while we watch what happens. In this case, the (very large, powerful) database server was almost completely idle, with virtually no I/O activity or CPU usage. So we looked at the server where the ETL process was running. It was running at 25% CPU usage and writing some files to disk, but it was not waiting on I/O.
What’s going on here? Where’s the bottleneck? The process is slow, and neither machine is really doing much work. Why?
Maybe you guessed the network. Nope, not the network either. There was plenty of spare network capacity.
If I told you the ETL machine was using exactly 25% of its CPU capacity, would you guess that …