Let’s have a quick look at some clustering examples in the new 3.0 engine:
This example runs all steps in the transformation in a
clustered mode. That means that there are 4 slave
transformations that run in parallel with each other.
The interesting part is that first of all the “Fixed Input”
step is running in parallel, each copy reading a certain part of
a file.
The second thing to mention about it is that we now allow you
to run multiple copies of a single step on a cluster. In this
example, we run 3 copies of a step per slave transformation. In
total there are 12 copies of the sort step in action in
parallel.
IMPORTANT: this is a test-transformation, a real world sorting exercise would also include a “Sorted Merge” step to keep the data sorted. I was too lazy to redo the screenshots though
The clustered transformations now also support logging to …
[Read more]