Showing entries 31 to 40 of 42
« 10 Newer Entries | 2 Older Entries »
Displaying posts with tag: data warehousing (reset)
A New Hardware-Based Approach to Data Warehousing

My name is Ravi Krishnamurthy, and I am the Chief Software Architect here at Kickfire. I'll be blogging about our thoughts on database technologies for data warehousing. More specifically, I'll be talking about current challenges, directions going forward, simplifications that could enable wider market deployment, and other ideas.

Data Warehouse (DW) queries are known to be more complex, more demanding, and longer-running than OLTP queries. Some of the distinctive features of DW queries that produce these characteristics are:

1) Table scan: Most OLTP queries are point queries updating or inserting a few rows of transactional data. Most DW queries, on the other hand, are reporting or business intelligence (BI) queries that typically touch large numbers of rows, often computed by sequential table scans over large data sets.

2) Many/complex joins: Multiple tables with many joins in the …
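The contrast in item (1) can be sketched with a toy example (the `sales` schema below is invented for illustration): an OLTP point query touches one row through the primary key, while a DW reporting query scans and aggregates the whole table.

```python
import sqlite3

# Hypothetical schema to contrast the two query shapes (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("east", 10.0), ("west", 20.0), ("east", 5.0)])

# OLTP-style point query: touches one row via the primary key.
point = conn.execute("SELECT amount FROM sales WHERE id = 2").fetchone()

# DW-style reporting query: scans and aggregates every row.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(point)   # (20.0,)
print(report)  # [('east', 15.0), ('west', 20.0)]
```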

[Read more]
MySQL and Kickfire Break Records (Again)

Following on from the announcement at the MySQL Conference, where Sun and Kickfire jointly announced data warehousing benchmark records, we have just announced new TPC-H benchmark records. Specifically, the Kickfire Database Appliance 2400 is the best price/performance offering at 300GB, breaking the $1-per-QphH barrier for the first time and coming in at 89 cents per QphH (Queries per hour on the TPC-H benchmark). The 2400 is also the highest-performance non-clustered offering at 300GB.

I’m not going to further dwell on the numbers in this post other than to quickly point out another aspect of this achievement that Justin noted in his blog related to the energy savings the Kickfire …

[Read more]
Tools to generate large synthetic data sets for testing?

I need to generate large (1TB-3TB) synthetic MySQL datasets for testing, with a number of requirements:

a) custom output formatting (SQL, CSV, fixed-length rows, etc.)
b) referential integrity support (i.e., child tables should reference PK values, no orphans, etc.)
c) able to generate multiple tables in parallel
d) preferably able to operate without a GUI and/or manual intervention
e) uses a well defined templating construct for data generation
f) preferably open source

Does anyone out there know of a product that meets at least most of these requirements?

I found a PHP-based data generation script that is extensible in its output formatting, so it should do everything I need it to do.
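To make the requirements concrete, here is a minimal sketch of the kind of template-driven generator described above (all names invented; only requirements (a), (b), and (e) are touched): field generators are plain callables, and child rows draw foreign keys only from already-generated parent PKs, so no orphans are produced.

```python
import csv
import io
import random

# Requirement (e): a template is just a mapping of column name -> generator callable.
def generate(template, n, rng):
    return [{col: gen(rng) for col, gen in template.items()} for _ in range(n)]

rng = random.Random(42)  # seeded so test data is reproducible

parents = generate({"id": (lambda r, c=iter(range(1, 10**6)): next(c)),
                    "name": lambda r: r.choice(["acme", "globex"])}, 3, rng)
parent_ids = [p["id"] for p in parents]

# Requirement (b): child FKs are sampled from existing parent PKs only.
children = generate({"id": (lambda r, c=iter(range(1, 10**6)): next(c)),
                     "parent_id": lambda r: r.choice(parent_ids)}, 5, rng)

# Requirement (a): pluggable output formatting -- CSV here; SQL would be similar.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "parent_id"])
writer.writeheader()
writer.writerows(children)

assert all(c["parent_id"] in parent_ids for c in children)  # no orphans
```

A real tool would also need parallel table generation (c) and headless operation (d); this sketch only shows the templating and integrity core.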

My First Data Warehouse

I finally finished my first data warehouse! And it only took me 3 days!

Well, to be fair, the data warehouse design was already planned and it wasn't really that big anyway, but I am still happy about it.

I was asked on Monday to build a data warehouse for my company's headquarters in Germany. I work in Beijing, so it's like... very slow to connect to there. They gave me the database design, some SQL statements to generate a few dimensions, and "rough" business rules for the data.

Now, I haven't done anything like this before, but I really wanted to try. So I did it my way.

My way is to use a lot of views with long SQL statements instead of cursors or stored procedures. I like it this way because I feel like I can see the data and catch problems instead of programming blindly to …
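A toy illustration of that view-based style (the schema and business rules here are invented, and SQLite stands in for MySQL): the cleanup logic lives in a view's SELECT rather than in a cursor loop, so the intermediate result stays inspectable with an ordinary query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, country TEXT, amount REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)",
                 [(1, " de ", 100.0), (2, "DE", 50.0), (3, "cn", None)])

# Business rules expressed declaratively in a view instead of procedural code:
# normalize country codes and replace missing amounts with zero.
conn.execute("""
    CREATE VIEW clean_orders AS
    SELECT id,
           UPPER(TRIM(country)) AS country,
           COALESCE(amount, 0.0) AS amount
    FROM raw_orders
""")

# "Seeing the data": the view can be queried at any point to catch problems.
rows = conn.execute("SELECT country, SUM(amount) FROM clean_orders "
                    "GROUP BY country ORDER BY country").fetchall()
print(rows)  # [('CN', 0.0), ('DE', 150.0)]
```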

[Read more]
Kickfire Launch

Today, we officially launched Kickfire. As part of our announcement we published, together with Sun Microsystems, record-breaking TPC-H benchmark numbers (data warehousing industry benchmarks) as well as a series of significant partnerships in the Open Source world.

There has been a lot of work here over the last two years to get us to this point and I am very proud of the team for getting us to where we are today. Two years ago we just had a vision; today that vision became reality – one substantiated by independent industry benchmarks.

For those of you unfamiliar with these benchmarks let me give you a brief overview to explain why we …

[Read more]
Kickfire: relational algebra in a chip

I spent the day Thursday with some of Kickfire’s engineers at their headquarters. In this article, I’d like to go over a little of the system’s architecture and some other details.

Everything in quotation marks in this article is a quote. (I don’t use quotes when I’m glossing over a technical point — at least, not in this article.)

Even though I saw one of Kickfire’s engineers running queries on the system, they didn’t let me actually take the keyboard and type into it myself. So everything I’m writing here is still second-hand knowledge. It’s an unreleased product that’s in very rapid development, so this is understandable.

Kickfire’s TPC-H benchmarks are now published, so you can see the results of what I’ve been seeing them work on. They …

[Read more]
Kickfire: stream-processing SQL queries

Some of you have noticed Kickfire, a new sponsor at this year’s MySQL Conference and Expo. Like Keith Murphy, I have been involved with them for a while now. This article explains the basics of how their technology is different from the current state of the art in complex queries on large amounts of data.

Kickfire is developing a MySQL appliance that combines a pluggable storage engine (for MySQL 5.1) with a new kind of chip. On the surface, the storage engine is not that revolutionary: it is a column-store engine with data compression and some other techniques to reduce disk I/O, which is kind of par for the course in data warehousing today. The chip is the really exciting part of the technology.
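A minimal sketch of why column stores compress so well (this is a generic run-length encoding illustration, not a description of Kickfire's actual compression): a sorted or low-cardinality column collapses into a few (value, run-length) pairs, so a scan reads far fewer bytes than a row store would, and some aggregates can run on the compressed form directly.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]

region_column = ["east"] * 4 + ["west"] * 3 + ["east"] * 2
encoded = rle_encode(region_column)
print(encoded)  # [('east', 4), ('west', 3), ('east', 2)]

# An aggregate computed without decompressing the column.
count_east = sum(n for v, n in encoded if v == "east")
print(count_east)  # 6
```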

The simplest description of their chip is that it …

[Read more]
MySQL Archiver can now archive each row to a different table

One of the enhancements I added to MySQL Archiver in the recent release was listed innocently in the changelog as "Destination plugins can now rewrite the INSERT statement." Not very exciting or informative, huh? Keep reading.

High Performance MySQL, Second Edition: Backup and Recovery

Progress on High Performance MySQL, Second Edition is coming along nicely. You have probably noticed the lack of epic multi-part articles on this blog lately -- that's because I'm spending most of my spare time on the book. At this point, we have significant work done on some of the hardest chapters, like Schema Optimization and Query Optimization. I've been deep in the guts of those hard optimization chapters for a while now, so I decided to venture into lighter territory: Backup and Recovery, which is one of the few chapters we planned to "revise and expand" from the first edition, rather than completely writing from scratch. I'd love to hear your thoughts and wishes -- click through to the full article for more details on the chapter and how it's shaping up.

Archive strategies for OLTP servers, Part 3

In the first two articles in this series, I discussed archiving basics, relationships and dependencies, and specific archiving techniques for online transaction processing (OLTP) database servers. This article covers how to move the data from the OLTP source to the archive destination, what the archive destination might look like, and how to un-archive data. If you can un-archive easily and reliably, a whole new world of possibilities opens up.
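The core move-then-reverse pattern can be sketched as follows (table names and the `status` rule are invented, and SQLite stands in for a real server): rows are copied to the archive destination with an INSERT ... SELECT, deleted from the source, and un-archived by running the same pair of statements in the opposite direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders VALUES (1, 'closed'), (2, 'open'), (3, 'closed');
""")

# Archive: copy qualifying rows to the destination, then delete from the source.
conn.execute("INSERT INTO orders_archive SELECT * FROM orders WHERE status = 'closed'")
conn.execute("DELETE FROM orders WHERE status = 'closed'")

live = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0]
print(live, archived)  # 1 2

# Un-archive one row by reversing the same pattern.
conn.execute("INSERT INTO orders SELECT * FROM orders_archive WHERE id = 1")
conn.execute("DELETE FROM orders_archive WHERE id = 1")
```

In production the copy and delete would run inside a transaction (or be verified row by row) so a failure between the two statements cannot lose or duplicate data.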
