Planet MySQL Planet MySQL: Meta Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 9

Displaying posts with tag: PDI (reset)

Data Modeling
+2 Vote Up -0Vote Down

Dear data integration fans,

I’m a big fan of “appropriate” data modeling prior to doing any data integration work.  For a number of folks out there that means the creation of an Enterprise Data Warehouse model in classical Bill Inmon style.  Others prefer to use modern modeling techniques like Data Vault, created by Dan Linstedt.  However, the largest group data warehouse architects use a technique called dimensional modeling championed by Ralph Kimball.

Using a modeling technique is very important since it brings structure to your data warehouse.  The techniques used, when applied correctly of-course, are

  [Read more...]
Real-time streaming data aggregation
+0 Vote Up -0Vote Down

Dear Kettle users,

Most of you usually use a data integration engine to process data in a batch-oriented way.  Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, hourly workloads.  Sometimes folks run micro-batches of work every minute or so.  However, it’s lesser known that our beloved transformation engine can also be used to stream data indefinitely (never ending) from a source to a target.  This sort of data integration is sometimes referred to as being “streaming“, “real-time“, “near real-time“, “continuous” and so on.  Typical examples of situations where you have a never-ending supply of data that needs to be processed the instance it becomes available are JMS (Java Message Service), RDBMS log sniffing, on-line fraud

  [Read more...]
Memory tuning fast paced ETL
+3 Vote Up -0Vote Down

Dear Kettle friends,

on occasion we need to support environments where not only a lot of data needs to be processed but also in frequent batches.  For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds.

In this setting we want to use clustering to use “commodity” computing resources in parallel.  In this blog post I’ll detail how the general architecture would look like and how to tune memory usage in this environment.

Clustering was first created around the end of 2006.  Back then it looked like this.

The master

This is the most important part of our cluster.  It takes care of administrating network configuration and topology.  It also keeps track of the state of dynamically added slave servers.

The master is started

  [Read more...]
Kettle vs Oracle REF CURSOR
+2 Vote Up -2Vote Down

Dear Kettle fans,

PDI-200 has been out there for a while now.  Jens created the feature request a little over 3 years ago.  I guess the main thing blocking this issue was not as much a technical problem but more of a licensing and dependency one (Oracle JDBC dependency and distribution license).

However, now that we have the User Defined Java Class step we can work around those pesky problems. That is because the Java code in there only gets compiled and executed at runtime so it’s perfectly fine to create any sort of dependency in there you like.

The following transformation reads a set of rows from a stored procedure as described on this web page.

In short, our UDJC step executes the following code:

begin ? := sp_get_stocks(?);

  [Read more...]
Pentaho Kettle Solutions Overview
+2 Vote Up -1Vote Down

Dear Kettle friends,

As mentioned in my previous blog post, copies of our new book Pentaho Kettle Solutions are finally shipping.  Roland, Jos and myself worked really hard on it and, as you can probably imagine, we were really happy when we finally got the physical version of our book in our hands.

So let’s take a look at what’s in this book, what the concept behind it was and give you an overview of the content…

The concept

Given the fact that Maria’s book, called

  [Read more...]
Book Review : Pentaho 3.2 Data Integration
+1 Vote Up -2Vote Down

Dear Kettle fans,

A few weeks ago, when I was stuck in the US after the MySQL User Conference, a new book was published by Packt Publishing.

That all by itself is something that is not too remarkable.  However, this time it’s a book about my brainchild Kettle. That makes this book very special to me. The full title is Pentaho 3.2 Data Integration : Beginner’s Guide (Amazon, Packt).  The title all by itself explains the purpose of this book: give the reader a quick-start when it comes to Pentaho Data Integration (Kettle).

The author María Carina

  [Read more...]
Slides from my MySQL UC 2010 presentation
+1 Vote Up -0Vote Down

As requested by a few fans out there, here are the slides of my presentation:

Pentaho Data Integration 4.0 and MySQL.pdf

I had a great time at the conference, met a lot of nice folks, friends, customers, partners and colleagues. After the conference I was unable to get back home like so many of you because of the Paul Simon singing Eyjafjallajökul volcano in Iceland.

So I ended up flying over to Orlando for a week of brutal PDI 4.0 RC1 hacking with the rest of the l33t super Pentaho development team.  However, after 2+ weeks from home, even a severe storm over Philadelphia couldn’t prevent me from getting home eventually.

Until next time,
Matt

Mapping to a database table
+0 Vote Up -0Vote Down

For some reason, the creation of a mapping to a database table poses a problem for certain people.

This is how it’s done in PDI 3.2.0 or later in the “Table Output” step:

Ogg video available over here

Until next time,
Matt

Kettle at the MySQL UC 2009
+0 Vote Up -0Vote Down

Hello Kettle fans,

Like Roland I got confirmation earlier this week that I could present my talk on “MySQL and Pentaho Data Integration in a cloud computing setting”, at the next MySQL user conference (http://www.mysql.com/news-and-events/users-conference/).

I’m very excited about the work we’ve done on the subject and it’s going to be great talking about it in April.

See you there!
Matt

Showing entries 1 to 9

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.