Showing entries 61 to 70 of 87
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: Data Integration
Step performance graphs

One of the things I’ve been working on lately in Kettle / Pentaho Data Integration is making the performance monitoring more transparent.

We don’t just need an API to get the step performance data out; we also need a simple way to visualize that data, something like this:

The next steps will be to also allow this data to be spooled off to a database somewhere and to be accessed remotely using Carte.
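The kind of data such an API might expose can be sketched as a simple per-step snapshot. This is just an illustration; the class and field names below are made up for this sketch and are not Kettle’s actual API:

```python
from dataclasses import dataclass

# Hypothetical per-step performance snapshot; the names here are
# illustrative only, not Kettle's real performance-monitoring classes.
@dataclass
class StepSnapshot:
    step_name: str
    lines_read: int
    lines_written: int
    interval_ms: int  # length of the sampling interval

    def rows_per_second(self) -> float:
        # Throughput of this step over the sampling interval.
        return self.lines_written * 1000.0 / self.interval_ms

snap = StepSnapshot("Table Output", 5000, 5000, 1000)
print(snap.rows_per_second())  # 5000.0 rows/s over a 1-second interval
```

A series of such snapshots, one per interval, is exactly what a throughput graph like the one above would plot, and what could later be spooled off to a database.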

Until next time,


Ohloh top 10

People sometimes ask me if I still do a lot of development.

Well, Ohloh keeps track of that these days and it seems that between September and November 2007 I was the 7th most active contributor:

Ohloh tracks 90655 developers in 8985 projects including Firefox, Apache HTTP server, Subversion, MySQL, PHP, Open Office, the Linux kernel, Ubuntu and many more.  As such, I’m kinda proud of that 7th spot.

If version 3 of Pentaho Data Integration has any bugs left when it launches, it won’t be because I was on vacation.

Until next time,


4.3 million rows per second

Earlier today I was building a test case in which I wanted to put a lot of Unicode data into a database table. The problem, of course, is that I don’t have a lot of data, just a small Excel input file.

So I made a Cartesian product with a couple of empty row generators:

It was interesting to see how fast the second join step was generating rows:

Yes, you are reading that correctly: 717 million rows processed in 165 seconds = 4.3 million rows per second.
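The arithmetic checks out; a quick sanity check on the numbers from the run above:

```python
# Row count and elapsed time as reported for the second join step.
rows = 717_000_000
seconds = 165

throughput = rows / seconds
print(f"{throughput / 1_000_000:.1f} million rows/s")  # 4.3
```

That Cartesian joins multiply row counts is also why a couple of empty row generators are enough: crossing even modest inputs quickly yields hundreds of millions of rows.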

For those of you who would like to try this on your own machine, here is an exclusive present for the readers of this blog in the form of a 3.0.0-RC2 preview of 2007/10/12 (88MB zip file). We’ve been fixing bugs like crazy, so it’s pretty stable for us, but it’s still a few weeks until we release RC2. Don’t do anything crazy with this drop! This is purely a …

[Read more]
Pentaho reference case

Thought I’d mention that a new case study featuring Pentaho and Kettle showed up over at Database Trends and Applications. The paper is titled “Loma Linda University Health Care Deploys Pentaho BI” (PDF).

To quote:

With commercial products you don’t know if you are getting what you want, but with open source you can create proofs-of-concept. And the TCO is so much lower.

Until next time!


Kettle 3 RC1

Dear Kettle fans,

Again, we leave a very busy period behind us (to start another :-)) with the announcement of the first release candidate for version 3.0.0.

Here is a link to the binary zip file and here is the source code.

What has changed since version 3.0.0-M2?

  • A new debugger (see also my blog entry on the subject)
  • Remote execution of jobs. (see also this wiki page)
  • Toolbar New Job/Trans change
  • Faster variable insertion through CTRL-SPACE
  • JavaScript enhancements for 3.0: (see also …
[Read more]
Help OpenMRS!!!

My friend and colleague Julian Hyde of Mondrian fame just blogged about this: help out the OpenMRS project, please!

The folks behind OpenMRS are helping to improve the health-care systems in developing countries. In particular, they are fighting AIDS with this software. OpenMRS has certainly shown itself to be up to the task at hand: it is currently tracking the medical conditions of over a million people in 12 countries.

Because of the exponential growth in users, this project is in urgent need of BI manpower. Julian and I have both agreed to help out with strategic advice for the BI part of OpenMRS.

If you want to be part of the team, if you know a bit about the …

[Read more]
Back to basics

A few days ago someone commented that Pentaho Data Integration (Kettle) was a bit too hard to use. The person on the chat was someone who was trying to load a text file into a database table, and he was having a hard time doing just that.

So let’s go back to basics in this blog post and load a delimited text file into a MySQL table.

If you want to see how it’s done, click on this link to watch a real-time (non-edited) Flash movie. It’s an 11MB download and about 2-3 minutes long.
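Conceptually, the transformation in the movie does what this minimal Python sketch does by hand: read delimited rows and insert them one by one. I’m using sqlite3 here as a stand-in for MySQL, and the file contents, table, and columns are made up for the example:

```python
import csv
import sqlite3

# Stand-in for a MySQL connection; the idea is the same:
# parse delimited input, then run parameterized inserts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")

# A tiny semicolon-delimited "file", inlined for the example.
rows = csv.reader(["1;Alice", "2;Bob"], delimiter=";")
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
conn.commit()

print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
```

In Kettle you get the same result by wiring a “Text file input” step to a “Table output” step, without writing any of this code yourself.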

Until next time!


Kettle 3 Milestone 2 is available

UPDATE: for all you people that missed the news and come here directly, we have an RC1 now too.

Dear Kettle fans,

After a long period of bug-squashing and other frantic coding activities, we are happy to give you Kettle’s second milestone of version 3.0.0. (77MB zip file)

What has changed since M1?

  • New icons!! This is the first release to include a new set of icons and as such a fresh new look.
  • A new Mondrian Input step to read from Pentaho Analyses using MDX.
  • A new Regular Expression evaluation step
  • Access Input (don’t ask!)
  • Fixed / improved repository support
  • Improved database dialect handling (SQL Server .. problem and forcing identifiers to lower/uppercase)
[Read more]
Making the case for Kettle

Dear data integration fans,

Once in a while, there are discussions on various blogs (usually with me smack in the middle of them) debating the differences between code generation and model-based execution, how this impacts the way we approach databases, the open nature of it all, etc.

With this blog entry I want to push the notion that Pentaho Data Integration (Kettle) didn’t just evolve by chance into what it is today: a streaming, metadata-driven, model-based engine. I made some careful design choices early on…

Open as possible

The goal of Kettle from the beginning was to be as open as possible. My definition of “as open as possible” included:

[Read more]
Digging Mondrian

On Friday I committed code to 3.0 trunk to allow people to execute an MDX query on a Mondrian server and get the result back in a tabular format. This particular code to “flatten” an OLAP cube into rows was written by Julian Hyde, the lead developer and founder of Mondrian OLAP a.k.a. Pentaho Analyses.

If you run the Pentaho demo on your box and then look at the Analyses sample, you would see something like this:

Suppose you wanted to get this exact data to work with, create analytical data, exports, … Well, now you have the option of doing it in Kettle:

What you do is create a database connection to the database that Mondrian reads from and hand it the location of the Mondrian schema (catalog). You then click on the MDX button in JPivot and copy the MDX query into the “Mondrian Input” step in Kettle. That’s all it takes.
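The “flattening” idea itself is simple: each cell of the cube becomes one row, keyed by the coordinates of its members. A toy illustration of the concept (not Mondrian’s actual code; the member names and sales figures are made up for the example):

```python
# A tiny "cube": cells keyed by member coordinates (product, year).
# Flattening turns each cell into one tabular row, which is the form
# in which an MDX result is handed to the rest of a transformation.
cube = {
    ("Drink", "1997"): 24597,
    ("Food", "1997"): 191940,
}

rows = [(product, year, sales) for (product, year), sales in cube.items()]
for r in sorted(rows):
    print(r)
```

Once the result is in this row form, every downstream Kettle step (sorting, joining, exporting) can treat it like any other tabular input.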

You then can preview …

[Read more]