Showing entries 11 to 14
« 10 Newer Entries
Displaying posts with tag: BI
Loading a dimension table with SCD1 and SCD2 attributes

Jos, my co-author for the "Building Pentaho Solutions" book, just pointed me to a recent article by Jeff Prenevost entitled "The Problem with History". Jeff's topic, loading a hybrid Type 1 / Type 2 slowly changing dimension table, is related to data warehousing but may be of interest outside of that context as well.

As it turns out, the particular problem described by Jeff is non-trivial, but can be solved quite elegantly in a single SQL statement. This may be a compelling alternative to the multi-step, …

[Read more]
Maturity of Open Source ETL / Why do people fail to do their homework?

I just read this post on Matt Casters' blog. Here, Matt describes why Element61's Jan Claes is dead wrong in the way he assesses the maturity of open source ETL tools.

Well, I've just read Jan Claes' article in the "research and insights" area of the Element61 website, and frankly, it is pretty easy to see how unsubstantiated it is. Some may be tempted to classify the article as …

[Read more]
An Infobright Review

With open source software I can install reasonably complete software and try it with my data. This way I get to see how it works in a realistic setting without having to rely on benchmarks and hoping they are a good match for my environment. And I get to do this without having to deal with commercial software sales people.

So I was glad to hear that Infobright had gone open source, as I have been wanting to test a column-based database for a while. I was even happier that it was a MySQL-based engine, as I would already know many of the commands. I decided to run some of the same tests I had run when comparing InnoDB and MyISAM for reporting (http://dbscience.blogspot.com/2008/08/innodb-suitability-for-reporting.html). InnoDB performed better than MyISAM in my reporting tests, so I’m going to compare Infobright to InnoDB.

The …

[Read more]
InnoDB's Suitability for Reporting

I started using Oracle, an MVCC database, to develop reporting (data warehousing, BI, take your pick) systems years ago. I’ve come to appreciate the scalability improvements that MVCC provides, particularly for pseudo real-time reporting applications, the ones where loads are occurring at the same time as report generation. So when people say InnoDB, partly due to MVCC, isn’t as good as MyISAM for reporting, I had to look into this in more detail.

What I found is that InnoDB is a good engine for reporting. In some ways, such as performance, it is at times better than MyISAM, and one of the downsides, a larger disk requirement, can be mitigated. The trick is for the primary key to be the one predominant access path. In this example, the InnoDB clustered index is purchaseDate, and another column, such as orderId, is added to make it unique. This has a number of advantages. In my experience, …

[Read more]
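
The primary-key layout described in this teaser can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's actual schema: the table name `sale` and the `amount` column are hypothetical (only purchaseDate and orderId appear in the post), and SQLite's `WITHOUT ROWID` tables are used as a runnable stand-in for InnoDB's clustered index, since both store rows in primary-key order. In MySQL the corresponding DDL would presumably declare `PRIMARY KEY (purchaseDate, orderId)` on an `ENGINE=InnoDB` table.

```python
import sqlite3

# Hypothetical table; only purchaseDate and orderId come from the post.
# WITHOUT ROWID makes SQLite store rows in primary-key order, which is
# used here as a rough analogue of an InnoDB clustered index.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sale (
        purchaseDate TEXT    NOT NULL,
        orderId      INTEGER NOT NULL,
        amount       REAL    NOT NULL,
        PRIMARY KEY (purchaseDate, orderId)  -- orderId makes the key unique
    ) WITHOUT ROWID
    """
)
rows = [
    ("2008-08-01", 1, 10.0),
    ("2008-08-01", 2, 20.0),
    ("2008-08-02", 3, 30.0),
    ("2008-09-01", 4, 40.0),
]
conn.executemany("INSERT INTO sale VALUES (?, ?, ?)", rows)

# A typical reporting query: a date range over the predominant access path.
query = """
    SELECT SUM(amount) FROM sale
    WHERE purchaseDate >= '2008-08-01' AND purchaseDate < '2008-09-01'
"""
august_total = conn.execute(query).fetchone()[0]

# The fourth column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query))
print(august_total)
print(plan)
```

Because the date leads the key, a date-range report reads one contiguous slice of the table rather than jumping through a secondary index, which is the kind of access pattern the post is pointing at.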