Back in December, I did a detailed analysis of getting data into
Vertica from MySQL using Tungsten Replicator, all within the
Kodiak MemCloud.
I got some good numbers towards the end – 1.9 million rows/minute
into Vertica. I did this using a standard replicator deployment,
plus some tweaks to the Vertica environment. In particular:
- An integer hash as the partition key for both the staging and base
tables
- Some tweaks to the queries to ensure that we used the
partitions in the most efficient manner
- Optimized the batching within the applier to hit the right
numbers for the transaction counts
That last one is a bit of a cheat because in a real-world
situation it's much harder to identify those transaction sizes
and row counts, but for testing, we're trying to get the best
performance!
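To make the first and last of those tweaks a little more concrete, here is a minimal Python sketch of the general idea: rows are routed to a partition by an integer hash of their key, and each partition's rows are then cut into fixed-size batches so that every apply touches only one partition. This is an illustration under stated assumptions, not how Tungsten Replicator or Vertica implement partitioning or batching internally. The row format, the simple modulo hash, and the names `partition_for`, `batch_by_partition`, `num_partitions` and `batch_size` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical row format: (integer primary key, payload).
# This is NOT Tungsten Replicator's internal representation.
Row = Tuple[int, str]


def partition_for(key: int, num_partitions: int) -> int:
    """Map an integer key to a partition number using a simple modulo hash (an assumption)."""
    return key % num_partitions


def batch_by_partition(
    rows: Iterable[Row], num_partitions: int, batch_size: int
) -> List[Tuple[int, List[Row]]]:
    """Group rows by partition, then split each group into batches of at most batch_size rows."""
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[partition_for(row[0], num_partitions)].append(row)

    batches: List[Tuple[int, List[Row]]] = []
    for part in sorted(groups):
        group = groups[part]
        # Each batch stays within a single partition.
        for start in range(0, len(group), batch_size):
            batches.append((part, group[start:start + batch_size]))
    return batches


if __name__ == "__main__":
    rows = [(i, f"row-{i}") for i in range(10)]
    for part, batch in batch_by_partition(rows, num_partitions=3, batch_size=2):
        print(part, [key for key, _ in batch])
```

Here `batch_size` plays the role of the hand-tuned batching described above: picking a good value depends on knowing the transaction sizes and row counts in advance, which is exactly what is hard to do outside a test setup.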
Next, I wanted to set up some bare-metal and AWS
servers that were of an …