Planet MySQL
Showing entries 1 to 10 of 25

Displaying posts with tag: pentaho data integration

Proposals for Codebits.EU

Codebits is an annual 3-day conference about software and, well, code. It's organized by SAPO, and this year's edition is to be held from November 10 through 12 at the Pavilhão Atlântico, Sala Tejo in Lisbon, Portugal.

I've never attended SAPO Codebits before, but I heard good things about it from Datacharmer Giuseppe Maxia. The interesting thing about the way …

Real-time streaming data aggregation

Dear Kettle users,

Most of you usually use a data integration engine to process data in a batch-oriented way.  Pentaho Data Integration (Kettle) is typically deployed to run monthly, nightly, or hourly workloads.  Sometimes folks run micro-batches of work every minute or so.  However, it’s less well known that our beloved transformation engine can also be used to stream data indefinitely (never ending) from a source to a target.  This sort of data integration is sometimes referred to as “streaming”, “real-time”, “near real-time”, “continuous” and so on.  Typical examples of situations where you have a …

Memory tuning fast paced ETL

Dear Kettle friends,

On occasion we need to support environments where not only does a lot of data need to be processed, but it also arrives in frequent batches.  For example, a new data file with hundreds of thousands of rows arrives in a folder every few seconds.

In this setting we want to use clustering to put “commodity” computing resources to work in parallel.  In this blog post I’ll detail what the general architecture would look like and how to tune memory usage in this environment.

Clustering was first created around the end of 2006.  Back then it looked like this.

Dynamic de-normalization of attributes stored in key-value pair tables

Dear Kettlers,

A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.  The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution.  And if you have to, do some denormalization.
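
To make the denormalization idea a bit more concrete, here is a minimal sketch in plain Python (not Kettle).  It assumes a hypothetical key/value layout of (entity id, attribute name, attribute value) rows; the function name, column names and sample data are made up for illustration and are not taken from the post.

```python
from collections import defaultdict


def denormalize(pairs):
    """Pivot (entity_id, key, value) rows into one flat dict per entity."""
    rows = defaultdict(dict)
    for entity_id, key, value in pairs:
        rows[entity_id][key] = value

    # Collect every attribute name seen, so all output rows share the same columns.
    columns = sorted({key for _, key, _ in pairs})

    # Attributes an entity does not have are filled with None.
    return [
        {"id": entity_id, **{col: attrs.get(col) for col in columns}}
        for entity_id, attrs in sorted(rows.items())
    ]


# Hypothetical sample data: three key/value rows describing two entities.
pairs = [
    (1, "city", "Metropolis"),
    (1, "eyes", "green"),
    (2, "city", "Gotham"),
]
result = denormalize(pairs)
for row in result:
    print(row)
```

The set of columns is discovered from the data rather than hard-coded, which is the "dynamic" part: a new attribute name in the key/value table simply becomes a new column in the output.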

However, there are occasions where you need to query a source system and get some reports going on it.  Let’s take a look at an example:

mysql> select * from person;
+----+-------+----------+
| id | name  | lastname |
+----+-------+----------+
|  1 | Lex   | Luthor …
Data Cleaner 2

Dear Kettle friends,

Some time ago while I visited the nice folks from Human Inference in Arnhem, I ran into Kasper Sørensen, the lead developer of DataCleaner.

DataCleaner is an open source data quality tool released (like Kettle) under the LGPL license.  It is essentially to blame for the lack of a profiling tool inside of Kettle.  That is because having DataCleaner available to our users was enough to push the …

Reading from MongoDB

Hi Folks,

Now that we’re blogging again I thought I might as well continue to do so.

Today we’re reading data from MongoDB with Pentaho Data Integration.  We haven’t had a lot of requests for MongoDB support so there is no step to read from it yet.  However, it is surprisingly simple to do with the “User Defined Java Class” step.

For the following sample to work you need to be on a recent 4.2.0-M1 build.  Get it from here.

Then download …

Parse nasty XLS with dynamic ETL

Dear Kettle friends,

Last year, right after the summer, in version 4.1 of Pentaho Data Integration, we introduced the notion of dynamically inserted ETL metadata (YouTube video here).  Since then we have received a lot of positive feedback on this functionality, which encouraged me to extend it to a few more steps. Already with support for “CSV Input” and “Select Values” we could do a lot of dynamic things.  However, we can clearly do a lot better by extending our initiative to a few more steps: “Microsoft Excel Input” (which can also read ODS by the way), “Row Normalizer” and “Row …

Back to blogging....

It has been a while since I posted on my blog - in fact, I believe this is the first time ever that more than one month passed between posts since I started blogging. There are a couple of reasons for the lag:

Book Review : Pentaho 3.2 Data Integration

Dear Kettle fans,

A few weeks ago, when I was stuck in the US after the MySQL User Conference, a new book was published by Packt Publishing.

That all by itself is not too remarkable.  However, this time it’s a book about my brainchild Kettle. That makes this book very special to me. The full title is Pentaho 3.2 Data Integration: Beginner’s Guide (Amazon, Packt). …

Slides from my MySQL UC 2010 presentation

As requested by a few fans out there, here are the slides of my presentation:

Pentaho Data Integration 4.0 and MySQL.pdf

I had a great time at the conference, met a lot of nice folks, friends, customers, partners and colleagues. After the conference I was unable to get back home like so many of you because of the Paul Simon singing Eyjafjallajökul


Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.