Planet MySQL

Displaying posts with tag: Data Integration

Jan

2008

Posted by Matt Casters on Wed 16 Jan 2008 16:50 UTC
Tags:

Data Integration

One of the things I’ve been working on lately in Kettle / Pentaho Data Integration is making performance monitoring more transparent.

We don’t just need an API to get the step performance data out; we also need to visualize this data in a simple way, something like this:

The next steps will be to also allow this data to be spooled off to a database somewhere and to be accessed remotely using Carte.

Until next time,

Matt

Nov

2007

Ohloh top 10

Posted by Matt Casters on Fri 02 Nov 2007 20:31 UTC
Tags:

Open Source, Data Integration

People sometimes ask me if I still do a lot of development.

Well, Ohloh keeps track of that these days and it seems that between September and November 2007 I was the 7th most active contributor:

Ohloh tracks 90655 developers in 8985 projects including Firefox, Apache HTTP server, Subversion, MySQL, PHP, Open Office, the Linux kernel, Ubuntu and many more. As such, I’m kinda proud of that 7th spot.

If version 3 of Pentaho Data Integration has any bugs left when it launches, it won’t be because I was on vacation.

Until next time,

Matt

Oct

2007

4.3 million rows per second

Posted by Matt Casters on Fri 12 Oct 2007 14:54 UTC
Tags:

Data Integration

Earlier today I was building a test case in which I wanted to put a lot of Unicode data into a database table. The problem, of course, is that I don’t have a lot of data, just a small Excel input file.

So I made a Cartesian product with a couple of empty row generators:

It was interesting to see how fast the second join step was generating rows:

Yes, you are reading that correctly: 717 million rows processed in 165 seconds = 4.3 million rows per second.
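The row multiplication and the throughput figure above can be checked with a small sketch. It assumes nothing about Kettle internals: `itertools.product` merely stands in for the join steps, and the sample values and generator sizes are made up for illustration.

```python
from itertools import product

# Hypothetical stand-ins: a tiny "Excel" input and two empty row generators.
input_rows = ["α", "β", "γ"]   # a few Unicode sample values (assumption)
generator_a = range(100)       # first row generator, size is an assumption
generator_b = range(100)       # second row generator, size is an assumption

# A Cartesian product lazily yields len(input) * len(a) * len(b) rows.
rows = product(input_rows, generator_a, generator_b)
total = sum(1 for _ in rows)
print(total)

# The throughput quoted in the post: 717 million rows in 165 seconds.
rows_processed = 717_000_000
seconds = 165
rows_per_second = rows_processed / seconds
print(f"{rows_per_second / 1e6:.1f} million rows per second")
```

A handful of input rows grows multiplicatively with each generator that is joined in, which is why a small input file is enough to produce hundreds of millions of rows.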

For those of you who would love to try this on your own machine, here is an exclusive present for the readers of this blog: a 3.0.0-RC2 preview build of 2007/10/12 (88MB zip file). We’ve been fixing bugs like crazy so it’s pretty stable for us, but it’s still a few weeks until we release RC2. Don’t do anything crazy with this drop! This is purely a …

[Read more]

Oct

2007

Pentaho reference case

Posted by Matt Casters on Thu 04 Oct 2007 15:22 UTC
Tags:

Databases, Data Integration, metadata

Thought I’d mention that a new case study featuring Pentaho and Kettle showed up over at Database Trends and Applications. The paper is called “Loma Linda University Health Care Deploys Pentaho BI” (PDF).

To quote:

With commercial products you don’t know if you are getting what you want, but with open source you can create proofs-of-concept. And the TCO is so much lower.

Until next time!

Matt

Oct

2007

Kettle 3 RC1

Posted by Matt Casters on Tue 02 Oct 2007 19:26 UTC
Tags:

Data Integration

Dear Kettle fans,

Again, we leave a very busy period behind us (to start another :-)) with the announcement of the first release candidate for version 3.0.0.

Here is a link to the binary zip file and here is the source code.

What has changed since version 3.0.0-M2?

A new debugger (see also my blog entry on the subject)
Remote execution of jobs (see also this wiki page)
Toolbar New Job/Trans change
Faster variable insertion through CTRL-SPACE
JavaScript enhancements for 3.0: (see also …

[Read more]

Sep

2007

Help OpenMRS!!!

Posted by Matt Casters on Thu 27 Sep 2007 18:10 UTC
Tags:

Data Integration

My friend and colleague Julian Hyde of Mondrian fame just blogged about this: help out the OpenMRS project, please!

The folks behind OpenMRS are helping to improve the health-care systems in developing countries. More specifically, they are fighting AIDS with this software. OpenMRS has certainly shown itself to be up to the task at hand: it is currently tracking the medical conditions of over a million people in 12 countries.

Because of the exponential growth of users, this project is in urgent need of BI manpower. Julian and I have both agreed to help out with strategic advice for the BI part of OpenMRS.

If you want to be part of the team, if you know a bit about the …

[Read more]

Sep

2007

Back to basics

Posted by Matt Casters on Fri 07 Sep 2007 15:49 UTC
Tags:

Databases, Data Integration

A few days ago someone commented that Pentaho Data Integration (Kettle) was a bit too hard to use. The person in the chat was trying to load a text file into a database table and was having a hard time doing just that.

So let’s go back to basics in this blog post and load a delimited text file into a MySQL table.

If you want to see how it’s done, click on this link to watch a real-time (unedited) Flash movie. It’s an 11MB download and is about 2-3 minutes long.
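For readers who prefer text to video, here is a rough sketch of the same idea outside of Kettle: read a delimited file, take the column names from the header line, and insert the rows with a parameterized statement. This is not how Kettle does it internally. The sample data, the `customers` table and the `load_delimited` helper are all hypothetical, and `sqlite3` is swapped in for MySQL only so the example runs without a server.

```python
import csv
import io
import sqlite3

# Hypothetical delimited input; in practice this would be a file on disk.
delimited_text = "id;name;city\n1;Ann;Ghent\n2;Bob;Leuven\n"

def load_delimited(text, connection, table="customers", delimiter=";"):
    """Read delimited text and insert every data row into `table`."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # the first line holds the column names
    columns = ", ".join(header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    # Parameterized insert: values are bound, never spliced into the SQL.
    cursor = connection.executemany(
        f"INSERT INTO {table} ({columns}) VALUES ({placeholders})", reader
    )
    connection.commit()
    return cursor.rowcount

# sqlite3 is used only so the sketch runs as-is; it is not MySQL.
conn = sqlite3.connect(":memory:")
loaded = load_delimited(delimited_text, conn)
print(loaded)
print(conn.execute("SELECT name, city FROM customers ORDER BY id").fetchall())
```

Against a real MySQL server the connection would come from a MySQL driver instead, the columns would need explicit types, and the placeholder style would typically be `%s` rather than `?`.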

Until next time!

Matt

Sep

2007

Kettle 3 Milestone 2 is available

Posted by Matt Casters on Wed 05 Sep 2007 23:53 UTC
Tags:

Data Integration

UPDATE: for all of you who missed the news and came here directly, we have an RC1 now too.

Dear Kettle fans,

After a long period of bug-squashing and other frantic coding activities, we are happy to give you Kettle’s second milestone of version 3.0.0. (77MB zip file)

What has changed since M1?

New icons!! This is the first release to include a new set of icons and as such a fresh new look.
A new Mondrian Input step to read from Pentaho Analyses using MDX.
A new Regular Expression evaluation step
Access Input (don’t ask!)
Fixed / improved repository support
Improved database dialect handling (SQL Server .. problem and forcing identifiers to lower/uppercase)

[Read more]

Sep

2007

Making the case for Kettle

Posted by Matt Casters on Tue 04 Sep 2007 20:39 UTC
Tags:

Databases, Data Integration

Dear data integration fans,

Once in a while, there are discussions on various blogs (usually with me smack in the middle of them) debating the differences between code generation and model-based execution, how this impacts the way we approach databases, the open nature of it all, etc.

With this blog entry I want to push the notion that Pentaho Data Integration (Kettle) didn’t just evolve by chance into the state it is in today: a streaming, metadata-driven, model-based engine. I made some careful design choices early on…
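To make “model-based execution” a bit more concrete, here is a toy sketch, not Kettle’s actual engine. The transformation is plain metadata, and a small interpreter wires the steps together and streams rows through them without generating any source code. The step types and the `run` function are hypothetical, and chained Python generators are an assumption chosen to keep the example short.

```python
# A transformation described purely as metadata (hypothetical step types).
transformation = [
    {"type": "generate", "count": 5},
    {"type": "filter", "field": "n", "min": 2},
    {"type": "add_constant", "field": "source", "value": "demo"},
]

def generate(rows, meta):
    # Source step: ignores its input and emits `count` rows.
    for n in range(meta["count"]):
        yield {"n": n}

def filter_rows(rows, meta):
    # Passes on only the rows whose field is at least `min`.
    for row in rows:
        if row[meta["field"]] >= meta["min"]:
            yield row

def add_constant(rows, meta):
    # Adds one constant field to every row passing through.
    for row in rows:
        yield {**row, meta["field"]: meta["value"]}

STEP_TYPES = {
    "generate": generate,
    "filter": filter_rows,
    "add_constant": add_constant,
}

def run(transformation):
    """Interpret the metadata directly; no source code is generated."""
    stream = iter(())
    for meta in transformation:
        stream = STEP_TYPES[meta["type"]](stream, meta)
    return stream

result = list(run(transformation))
print(result)
```

Changing the transformation means editing the metadata only; the interpreter stays the same, which is the contrast with a code generator that would emit a new program for every change.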

Open as possible

The goal of Kettle from the beginning was to be as open as possible. My definition of “as open as possible” included:

open source with an LGPL license (see this JBoss link [PDF] for a nice explanation)

[Read more]

Aug

2007

Digging Mondrian

Posted by Matt Casters on Mon 27 Aug 2007 20:51 UTC
Tags:

Data Integration

On Friday I committed code to 3.0 trunk to allow people to execute an MDX query on a Mondrian server and get the result back in a tabular format. This particular code to “flatten” an OLAP cube into rows was written by Julian Hyde, the lead developer and founder of Mondrian OLAP a.k.a. Pentaho analyses.

If you run the Pentaho demo on your box and then look at the Analyses sample, you could see something like this:

Suppose you wanted to get this exact data to work with, create analytical data, exports, … Well, now you have the option of doing it in Kettle:

What you do is create a database connection to the database that Mondrian reads from and hand it the location of the Mondrian schema (catalog). You then click on the MDX button in JPivot and copy the MDX query to the “Mondrian Input” step in Kettle. That’s all it takes.
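As an illustration of what “flattening” a result into rows means, here is a minimal sketch. It is not Julian’s code and does not use the Mondrian API: the axes, member names and numbers are invented, and `flatten` is a hypothetical helper that only handles a simple two-axis result.

```python
# Hypothetical result of an MDX query with measures on the columns axis
# and members on the rows axis; names and numbers are made up.
column_headers = ["Unit Sales", "Store Sales"]
row_headers = [("Drink",), ("Food",), ("Non-Consumable",)]
cells = [
    [100.0, 250.5],
    [400.0, 900.0],
    [50.0, 120.25],
]

def flatten(row_headers, column_headers, cells, row_dimension_names):
    """Turn a two-axis result into a list of flat records (dicts)."""
    records = []
    for members, values in zip(row_headers, cells):
        # One output row per position on the rows axis.
        record = dict(zip(row_dimension_names, members))
        # One output field per position on the columns axis.
        record.update(zip(column_headers, values))
        records.append(record)
    return records

table = flatten(row_headers, column_headers, cells, ["Product Family"])
for record in table:
    print(record)
```

Each record has one field per row-axis dimension and one per column-axis member, which is the tabular shape a downstream step can work with.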

You then can preview …

[Read more]
