Showing entries 1 to 9
Displaying posts with tag: dw
Data Warehousing Best Practices: Comparing Oracle to MySQL pt 2

At Kscope this year, I attended a half-day, in-depth session titled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

See part 1 for the introduction and the discussion of power and hardware. This part covers the second “P”, partitioning. Learning about Oracle’s partitioning has made me more interested in how MySQL’s partitioning works, and I hope MySQL partitioning will develop to the level of Oracle’s, because Oracle’s partitioning looks very nice (then again, I guess that’s why it costs so much).

Partition – …

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half-day, in-depth session titled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …

Intro to OLAP

This is the first of a series of posts about business intelligence tools, particularly OLAP (or online analytical processing) tools using MySQL and other free open source software. OLAP tools are a part of the larger topic of business intelligence, a topic that has not had a lot of coverage on MPB. Because of this, I am going to start out talking about these topics in general, rather than getting right to gritty details of their performance.

I plan on covering the following topics:

  1. Introduction to OLAP and business intelligence. (this post)
  2. Identifying the differences between a data warehouse and a data mart.
  3. Introduction to MDX queries and the kind of SQL which a ROLAP tool must generate to answer those queries.
  4. Performance challenges …

Log Buffer #182, a Carnival of the Vanities for DBAs

This is the 182nd edition of Log Buffer, the weekly review of database blogs. Make sure to read the whole edition so you do not miss where to submit your SQL limerick!

This week started out with me posting about International Women’s Day, and has me personally attending Confoo (Montreal), which is an excellent conference I hope to return to next year. I learned a lot from Confoo, especially the session I attended on blending NoSQL and SQL.

This week was also the Hotsos Symposium. …

Air traffic queries in MyISAM and Tokutek (TokuDB)

This is the next post in the series:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB
Air traffic queries in InfiniDB: early alpha

Let me explain the reason for choosing these engines. After the initial three posts I am often asked, "What is the baseline? Can we compare results with standard MySQL engines?" So here comes MyISAM, to serve as a base point for seeing how much better the column-oriented analytic engines are here.

However, take into account that for MyISAM we need to choose proper indexes to execute queries …

Air traffic queries in InfiniDB: early alpha

Since Calpont announced the availability of InfiniDB, I couldn't miss the chance to compare it with previously tested databases in the same environment.
See my previous posts on this topic:
Analyzing air traffic performance with InfoBright and MonetDB
Air traffic queries in LucidDB

I could not run all queries against InfiniDB, and I hit some hiccups during my experiment, so it was a less smooth experience than with the other databases.

So let's go by the same steps:

Load data

InfiniDB supports MySQL's LOAD DATA statement and its own colxml / cpimport utilities. As …

Scalable Star Schema Benchmark (SSB) Join Metrics


We ran a quick scalability test of Calpont join behavior using a Star Schema Benchmark data set at a scale factor of 1000. The Star Schema Benchmark transforms TPC-H / DBT-3 data into a more standardized data warehouse star schema data model, and the 1000 scale factor includes 6 billion rows in the primary fact table. Information on the Star Schema Benchmark (SSB) can be found at http://www.cs.umb.edu/~xuedchen/research/publications/DataWarehousePerformanceDissertationProposal.pdf .

-----------------------------------------------------------------------------------------------------------------------------------------
-- Note that these queries are run without any tuning or indices created for these joins or filters.
-- Basically, this is just 1) Create tables (without index or …

Reporting redefined - How the Kickfire MySQL appliance simplifies data marts and analytics for the mass market.

The Kickfire appliance is designed for business intelligence and analytical workloads, as opposed to OLTP (online transaction processing) environments. Most of the focus in the MySQL area right now revolves around increasing performance for OLTP-type workloads, which makes sense, as this is the traditional workload that MySQL has been used for. In contrast, Kickfire focuses squarely on analytic environments, delivering high-performance execution of analytical and reporting queries.

A MySQL server with fast processors, fast disks (or SSD) and a lot of memory will answer many OLTP queries easily. Kickfire will outperform such a server for typical analytical queries such as aggregation over a large number of rows.

A typical OLTP query might ask “What is the shipping address for this invoice?”. Contrast this with a typical analytical query, which asks “How much of this item did we sell in all of …
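
To make the contrast concrete, here is a minimal sketch of the two query shapes. It is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a MySQL server, and the table names, column names, sample rows and the per-region grouping are all hypothetical, not taken from Kickfire or from the post.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per invoice, one row per item sold.
conn.executescript("""
CREATE TABLE invoices (
    invoice_id       INTEGER PRIMARY KEY,
    shipping_address TEXT
);
CREATE TABLE invoice_items (
    invoice_id INTEGER,
    item       TEXT,
    region     TEXT,
    quantity   INTEGER
);
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "12 Elm St"), (2, "9 Oak Ave")],
)
conn.executemany(
    "INSERT INTO invoice_items VALUES (?, ?, ?, ?)",
    [(1, "widget", "east", 3), (1, "gadget", "east", 1), (2, "widget", "west", 5)],
)

# OLTP-style query: fetch a single row by primary key.
oltp_row = conn.execute(
    "SELECT shipping_address FROM invoices WHERE invoice_id = ?",
    (2,),
).fetchone()

# Analytical-style query: scan every matching row and aggregate.
# Grouping by region is an assumption made for this example only.
analytic_rows = conn.execute(
    "SELECT region, SUM(quantity) FROM invoice_items "
    "WHERE item = ? GROUP BY region ORDER BY region",
    ("widget",),
).fetchall()

print(oltp_row)
print(analytic_rows)
```

The first query touches one row through the primary key, while the second has to read every row for the item before it can return a handful of totals, which is the kind of aggregation over many rows described above.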

Real Time Data Warehousing Presentation and Video

At the March Boston MySQL User Group meeting, Jacob Nikom of MIT’s Lincoln Laboratory presented “Optimizing Concurrent Storage and Retrieval Operations for Real-Time Surveillance Applications.” In the middle of the talk, Jacob said he sometimes calls what he did in this application “real-time data warehousing”, which was so accurate that I decided to give that title to this blog post.

The slides can be downloaded in PDF format (1.3 MB) at http://www.technocation.org/files/doc/Concurrent_database_performance_02.pdf. The 54-minute video can be downloaded (644 MB) at http://technocation.org/node/693/download or streamed directly in your browser at http://technocation.org/node/693/play. …
