Dear Kettle fan,
Since our code is open, we have to be honest: in the past, the
performance of Kettle was less than stellar in the “Text File”
department. It’s true that we did offer some workarounds with
respect to database loading, but there are cases when people
don’t want to touch any database at all. Let’s take a closer look
at that specific problem…
Reading and writing text files…
Let’s take a look at this delimited (CSV) file (28MB zipped). Unzipped, the
file is around 89MB in size.
Suppose you read this file using version 2.5.1 (soon to be out) with a single “Text File Input” step. On my machine, that process uses most of the available CPU power and takes around 57 seconds to complete. That works out to roughly 1M rows/minute, or 60M rows/hour.
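If you want to reproduce this kind of measurement outside of Kettle, a rough baseline can be sketched in a few lines of Python. This is not how the “Text File Input” step works internally; it only times a plain read of a delimited file and converts the result to rows per minute. The function names, the `;` delimiter and the UTF-8 encoding are assumptions made for the illustration, not properties of the file discussed above.

```python
import csv
import time
from typing import Tuple


def rows_per_minute(rows: int, seconds: float) -> float:
    """Convert a row count and an elapsed time into rows per minute."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return rows * 60.0 / seconds


def time_delimited_read(path: str, delimiter: str = ";") -> Tuple[int, float]:
    """Read every row of a delimited file; return (row_count, elapsed_seconds).

    The delimiter and encoding are assumptions; adjust them to your file.
    """
    start = time.perf_counter()
    rows = 0
    with open(path, newline="", encoding="utf-8") as handle:
        for _ in csv.reader(handle, delimiter=delimiter):
            rows += 1
    return rows, time.perf_counter() - start
```

Calling `time_delimited_read("your_file.csv")` (a hypothetical path) and passing both results to `rows_per_minute` gives a number you can set next to the Kettle figure, keeping in mind that a bare read does less work per row than a full transformation step.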
When we analyze what’s eating the CPU …