I'm working on alternative strategies to make the use and
maintenance of a multi-terabyte data warehouse implementation
tolerably fast. For example, it's clear that a reporting query on
a 275-million-row table is not going to be fun by anyone's
definition, but that for most purposes, it can be pre-processed
into various aggregated tables of significantly smaller size.
However, what is not obvious is what would be the best strategy
for creating those tables. I'm working with MySQL 5.0 and
Business Objects' Data Integrator XI, so I have a couple of
options.
I can just CREATE TABLE ... SELECT ... to see how things work
out. This approach is simple to try, but essentially
unmaintainable; no good.
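For illustration, here is a minimal sketch of what that approach looks like, rolling a detail-level fact table up to one row per store per day; the table and column names (sales_fact, sales_daily_agg, store_id, sale_time, amount) are made up for the example and are not from my actual schema:

    -- Build a daily aggregate from the detail-level fact table (MySQL 5.0 syntax).
    CREATE TABLE sales_daily_agg
    SELECT store_id,
           DATE(sale_time) AS sale_date,
           SUM(amount)     AS total_amount,
           COUNT(*)        AS sale_count
    FROM   sales_fact
    GROUP  BY store_id, DATE(sale_time);

Refreshing such a table means dropping and re-creating it, or bolting on an INSERT ... SELECT for incremental loads, and that ad-hoc rebuild logic is exactly where the maintainability problem shows up.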
I can define the process as a BODI data flow. This is good in
many respects, as it creates a documented flow of how the
aggregates are updated, is fairly easy to hook up to the
workflows which pull in new data from source systems, and …