Showing entries 11 to 20 of 43
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: data warehousing
Some scaling observations on Infobright

A couple of days ago, Baron Schwartz posted some simple load and select benchmarks of MyISAM, Infobright and MonetDB, which Vadim Tkachenko followed up with a more realistic dataset and interesting figures in which MonetDB beat Infobright on most queries.

Being used to the parallel IEE loader, I was surprised by the apparently slow loading speed in Baron's benchmark and decided to try to replicate it. I installed Infobright 3.2 on my laptop (see, this is very unscientific) and wrote a simple Perl script to generate and load an arbitrarily large data set resembling Baron's description. I'm not going to post my exact numbers, because this installation is severely …

[Read more]
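The benchmark replication above mentions a simple Perl script that generates an arbitrarily large data set for loading. A minimal Python sketch of the generation step follows. It assumes a hypothetical four-column table (id, day, category, amount), not Baron's actual schema, and writes tab-separated, newline-terminated lines, which are the default field and line terminators of MySQL's LOAD DATA INFILE. Infobright's own loader may expect a different format, so treat the delimiters as an assumption to check against its documentation.

```python
import csv
import io
import random


def generate_rows(n_rows, seed=42):
    """Yield n_rows synthetic rows for a hypothetical (id, day, category, amount) table."""
    rng = random.Random(seed)  # fixed seed so every engine loads identical data
    for row_id in range(1, n_rows + 1):
        day = f"2009-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
        category = rng.choice(["a", "b", "c", "d"])
        amount = round(rng.uniform(0, 1000), 2)
        yield (row_id, day, category, amount)


def write_tsv(rows, out):
    """Write rows as tab-separated, newline-terminated lines; return the row count.

    Tab/newline matches MySQL's LOAD DATA INFILE defaults. Rows are streamed,
    so memory use stays flat however large n_rows is.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


# Small in-memory demo; for a real run, open a file with newline="" instead.
buf = io.StringIO()
written = write_tsv(generate_rows(5), buf)
print(written, "rows written")
```

For a real run, the resulting file could be loaded with a statement such as `LOAD DATA INFILE '/tmp/fact.tsv' INTO TABLE fact;`, where both the path and the table name are placeholders. Seeding the generator keeps runs reproducible, so different engines are compared on identical data.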
A peek under the hood in Infobright 3.2 storage engine

I've been meaning to post some real-world data on the performance of the Infobright 3.2 release, which shipped a few weeks ago after an extended release candidate period. We're only now preparing our upgrades, so I don't have any performance notes on significant data sets or complicated queries to post quite yet.

To make up for that, I decided to address a particular annoyance of mine in the community edition: first, because it hadn't been addressed in the 3.2 release (and really, I'm hoping this work will get it included in 3.2.1), and second, simply because the engine being open source means I can. I feel being OSS is one of Infobright's biggest strengths, in addition to its being a pretty amazing performer for such a simple, undemanding package, and not making use of that would be a shame. Read on …

[Read more]
Scalable Star Schema Benchmark (SSB) Join Metrics


We ran a quick scalability test of Calpont join behavior using a Star Schema Benchmark data set at a scale factor of 1000. The Star Schema Benchmark transforms TPC-H / DBT-3 data into a more standardized data warehouse star schema data model, and at scale factor 1000 the primary fact table holds 6 billion rows. Information on the Star Schema Benchmark (SSB) can be found at http://www.cs.umb.edu/~xuedchen/research/publications/DataWarehousePerformanceDissertationProposal.pdf .
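To make the star schema shape concrete, here is a toy-sized sketch using Python's built-in sqlite3 module: one fact table (lineorder) joined to one date dimension, filtered in the style of SSB query Q1.1. Column names follow SSB conventions; the dimension is named dwdate here (an assumption) to avoid clashing with the DATE type name, the rows are made up, and SQLite stands in for the Calpont engine purely for illustration. As in the untuned test, no secondary indexes are created.

```python
import sqlite3


def build_toy_star(conn):
    """Create a tiny fact table and one date dimension; the data is illustrative only."""
    conn.executescript("""
        CREATE TABLE dwdate (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
        CREATE TABLE lineorder (
            lo_orderkey      INTEGER,
            lo_orderdate     INTEGER REFERENCES dwdate(d_datekey),
            lo_quantity      INTEGER,
            lo_extendedprice INTEGER,
            lo_discount      INTEGER
        );
    """)
    conn.executemany("INSERT INTO dwdate VALUES (?, ?)",
                     [(19930101, 1993), (19940101, 1994)])
    conn.executemany("INSERT INTO lineorder VALUES (?, ?, ?, ?, ?)", [
        (1, 19930101, 10, 1000, 2),  # passes all filters
        (2, 19930101, 30, 2000, 2),  # quantity too large
        (3, 19940101, 10, 3000, 2),  # wrong year
        (4, 19930101, 5, 500, 5),    # discount out of range
        (5, 19930101, 24, 400, 1),   # passes all filters
    ])


# Fact-to-dimension join plus filters on both sides, shaped like SSB Q1.1.
Q1_1_STYLE = """
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM lineorder JOIN dwdate ON lo_orderdate = d_datekey
    WHERE d_year = 1993
      AND lo_discount BETWEEN 1 AND 3
      AND lo_quantity < 25
"""

conn = sqlite3.connect(":memory:")
build_toy_star(conn)
revenue = conn.execute(Q1_1_STYLE).fetchone()[0]
print("revenue:", revenue)
```

Only rows 1 and 5 pass all three filters, so the query returns 1000 × 2 + 400 × 1 = 2400. At scale factor 1000 the same join shape runs against billions of fact rows, which is what makes join scalability the interesting measurement.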

-----------------------------------------------------------------------------------------------------------------------------------------
-- Note that these queries are run without any tuning or indices created for these joins or filters.
-- Basically, this is just 1) Create tables (without index or …

[Read more]
Free Kimball Group Data Warehousing Educational Webinar

We’re sponsoring an important webinar series along with Sun/MySQL, starting this week on June 25th: The Kimball Group Data Warehousing Educational Webinar Series. This webinar series will introduce the audience to data warehousing concepts and best practices, covering the history and evolution of data warehousing, providing an overview of dimensional modeling, and reviewing the full life cycle of designing and implementing a data warehouse. Part 1, on June 25th at 1:00 PM PDT, is on Data Warehousing Fundamentals.

There are two key reasons why we think this webinar series is important:

  • First, we believe this webinar further advances data warehousing in the MySQL world. There is a whole new generation of database developers in the MySQL community who are at various stages of understanding data warehousing – …
[Read more]
451 CAOS Links 2009.06.09

Vyatta raises series C funding. Greenplum launches data cloud initiative. Fedora 11. And more.

Follow 451 CAOS Links live @caostheory

# Vyatta raised $10m in a series C round, led by Citrix.

# Carlo Daffara published Horses, carriages and cars, an assessment of shifting OSS business models and a proposal for the optimal model.

# Greenplum delivered version 3.3 of its analytical database and launched its Enterprise Data Cloud initiative.

# Daniel Abadi asked whether betting on the MySQL mass market for data warehousing is a good idea.

# Roberto Galoppini …

[Read more]
What we're looking for in a data integration tool

As our data warehousing process grows and the workflows get more complex, we've revisited the question of what tools to use. Out of curiosity, I looked at basing such a process on Hadoop/Hive for scalability reasons, but the lack of mature tools and the efficiency sacrifices that would entail mean we're better off using something else until a distributed processing platform is the only thing that can get the job done. I'm also curious about the transition to continuous integration, a model I noticed showing up a couple of years ago and now getting some air under its wings as CEP, IBM's …

[Read more]
Three domains of data

My MySQL Conference presentation on Tuesday discussed my practical findings on how Infobright's technology works in developing a MySQL-based data warehouse. I also touched on the higher-level question of how to select a technology for different kinds of data-related problem areas, and this article expands on that discussion.

As several other speakers at the conference pointed out, the balance of CPU, memory and storage has changed significantly in the last 10 years. Two important per-thread throughput factors have flattened out: CPU cycles per second are in fact dropping as power and cooling have become limiting factors, and the number of IO operations per device has only been increasing linearly, though Flash technologies have leapt ahead on the latter front. However, two other factors are continuing to grow on …

[Read more]
Kickfire Launches MySQL Appliance for Data Warehousing Mass Market

The Kickfire MySQL Appliance is officially launched!

We announced it today, along with a new customer and strategic partnerships with ten leading service companies, including Percona, the MySQL performance experts.

Look for more news next week from Kickfire as we head into the MySQL conference. Kickfire will also give a keynote on the first day of the conference and will make a surprise announcement! Stay tuned …

Real Time Data Warehousing Presentation and Video

At the March Boston MySQL User Group meeting, Jacob Nikom of MIT’s Lincoln Laboratory presented “Optimizing Concurrent Storage and Retrieval Operations for Real-Time Surveillance Applications.” In the middle of the talk, Jacob said he sometimes calls what he did in this application “real-time data warehousing”, a label so accurate that I decided to give it to this blog post.

The slides can be downloaded in PDF format (1.3 MB) at http://www.technocation.org/files/doc/Concurrent_database_performance_02.pdf. The 54-minute video can be downloaded (644 MB) at http://technocation.org/node/693/download or streamed directly in your browser at http://technocation.org/node/693/play. …

[Read more]