This is a follow-up to my previous post, Apache Spark with Air ontime performance data.
To recap an interesting point from that post: when using 48 cores on the server, the result was worse than with 12 cores. I wanted to understand why that was, so I started digging. My primary suspicion was that Java (I never trust Java) was not good at dealing with 100GB of memory.
There are a few links pointing to the potential issues with a huge heap:
http://stackoverflow.com/questions/214362/java-very-large-heap-sizes
…