We’ve been running some tests with medium-sized data sets
lately. We extracted around half a year of data (514M rows)
from a warehouse where we’re doing a database partitioning and
clustering test.
Below is an example where we copy more than 500M rows from one database to
another one that is partitioned (MS SQL Server to MySQL
5.1). This is done using the following
transformation. Instead of using just one partitioned
writer, we used 3 to speed up the process (this lowers
latency).
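To give a rough idea of what running several writers side by side means, here is a minimal Python sketch. It is not the transformation itself and not how Pentaho Data Integration implements it: the modulo partitioning rule, the `id` field, and the in-memory lists standing in for table partitions are all assumptions made for illustration. Only the count of three writers comes from the test described above.

```python
import queue
import threading

NUM_WRITERS = 3   # three writer copies, as in the test above
SENTINEL = None   # marks the end of the row stream for a writer


def partition_for(row_id, num_partitions):
    """Hypothetical partitioning rule: row id modulo the partition count."""
    return row_id % num_partitions


def writer(inbox, sink):
    """Drain one queue into its own sink (a list standing in for a partition)."""
    while True:
        row = inbox.get()
        if row is SENTINEL:
            break
        sink.append(row)


def copy_rows(rows, num_writers=NUM_WRITERS):
    """Route each row to one of several writers that run in parallel."""
    inboxes = [queue.Queue(maxsize=1000) for _ in range(num_writers)]
    sinks = [[] for _ in range(num_writers)]
    threads = [
        threading.Thread(target=writer, args=(inboxes[i], sinks[i]))
        for i in range(num_writers)
    ]
    for t in threads:
        t.start()
    # The reader side: hand every row to the writer that owns its partition.
    for row in rows:
        inboxes[partition_for(row["id"], num_writers)].put(row)
    # Tell every writer that no more rows are coming, then wait for them.
    for q in inboxes:
        q.put(SENTINEL)
    for t in threads:
        t.join()
    return sinks
```

The bounded queues mean the reader blocks whenever a writer falls behind, so in this sketch the slowest target sets the pace for the whole copy.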
Copying 500M rows is just as easy as copying a thousand; it just takes a little longer…
It would have completed the task a lot faster if we hadn’t been copying to a single table on DB4 at the same time (yep, again 500M rows). This slowed the transformation down to the maximum speed of DB4. That being said, if you still had any doubt about Pentaho Data Integration being able to copy large volumes of data, …