Displaying posts with tag: Data Integration (Kettle)
PDI Loading into LucidDB

By far, the most popular way for PDI users to load data into LucidDB is to use the PDI Streaming Loader. The streaming loader is a native PDI step that:

  • Enables high-performance loading directly over the network, without the need for intermediate I/O and shipping of data files.
  • Lets users choose more interesting (from a DW perspective) load types into tables. In particular, in addition to simple INSERTs it allows MERGE (aka UPSERT) and also UPDATE, all in the same bulk loader (see the sketch after this list).
  • Enables the metadata for the load to be managed, scheduled, and run in PDI.
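
To make the load types concrete: a MERGE (UPSERT) updates a row when the key already exists and inserts it otherwise, which is what the step does for you in bulk. Purely as an illustration, here is a minimal JDBC sketch of that semantics; the JDBC URL, schema, table, and column names are all made up for the example, and the streaming loader does the equivalent work inside PDI without you writing any SQL.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Illustration only: what an UPSERT-style load boils down to in SQL terms.
    // The URL, schema, table, and column names are hypothetical.
    public class MergeSketch {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:luciddb:http://localhost:8034", "sa", "");
            Statement stmt = conn.createStatement();
            // Update the dimension row if the key exists, insert it otherwise.
            stmt.executeUpdate(
                "MERGE INTO warehouse.dim_customer t "
                + "USING staging.customer_load s ON t.customer_id = s.customer_id "
                + "WHEN MATCHED THEN UPDATE SET name = s.name, city = s.city "
                + "WHEN NOT MATCHED THEN INSERT (customer_id, name, city) "
                + "VALUES (s.customer_id, s.name, s.city)");
            conn.close();
        }
    }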

However, we’ve had some known issues. In fact, until PDI 4.2 GA and LucidDB 0.9.4 GA it’s pretty problematic unless you run through the process of patching LucidDB outlined on this page: …

Encrypt PDI passwords

PDI has a basic obfuscation method for making it difficult for casual observers to lift passwords for DB connections. I have customers who maintain different versions of a “shared.xml” file, each pointing at different physical database connections (think development, QA/testing, and production).

In order to generate the different shared.xml files, a user usually has to open up PDI, create the connections, save them, and then sometimes copy and paste the sections needed to create their “dev” version of shared.xml or their “production” version of shared.xml (per Matt Casters’ comment below, there is a utility that allows users to do this outside of Spoon). Many times this is just to generate the password, as they can hand edit the other pieces (hostname, schema, etc.).
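
For reference, the obfuscated value that PDI writes into the password element can also be produced programmatically. This is just a hedged sketch of that idea, assuming Kettle’s org.pentaho.di.core.encryption.Encr helper as it exists in the PDI 4.x codebase (check against your version); it is not the utility described below.

    import org.pentaho.di.core.encryption.Encr;

    // Sketch: produce the obfuscated password string PDI stores in shared.xml,
    // so a dev/QA/prod copy can be hand edited without opening Spoon.
    // Assumes kettle-core is on the classpath; verify class/method names against
    // your PDI release (newer releases may require environment initialization first).
    public class EncryptPassword {
        public static void main(String[] args) {
            String clearText = args[0];
            // Returns the value prefixed with "Encrypted ", the form shared.xml uses.
            System.out.println(Encr.encryptPasswordIfNotUsingVariables(clearText));
        }
    }

Run it with the clear-text password as its only argument and paste the output into the target shared.xml.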

I just committed a quick little PDI …

Self Service Data Export using Pentaho

Every BI installation has power users who just want “data dumps.” They may need the dumps for a variety of reasons:

  • You’ve built crappy reports. They can’t get the information they need in *YOUR* reports.
  • They need to feed the data into another system. They want to select all customers who bought product X in time period Y to send them a recall notice. They need a dump of email / mailing addresses to send them the notice.
  • They are addicted to Excel; they feel like a super hero whizzing through the data making fancy graphs and doing a few of their own ratios/calculations.
  • They want to munge the numbers. They will export it to Excel, throw out the data that makes them look bad, and then present it to their boss with shiny positive results.

I had a customer who needed something to “feed the data to another system.” Their original approach was to write a Pentaho Report that formatted …

Using Kettle for EII

Pentaho Data Integration (aka Kettle) can be used for ETL, but it can also be used in EII scenarios. For instance, you have a report that can be run from a customer service application that allows the customer service agent to see the current issues/calls up to the minute (CRM database) but also gives a strategic snapshot of the customer from the customer profitability and value data mart (data warehouse). You’d like to look at this on the same report, with data coming from two different systems with different operating systems and databases.

Kettle can make short work of this using the integration Pentaho provides and the ability to SLURP data from an ETL transform into a report without the need to persist to some temporary or staging table. The thing that Pentaho has NOT made short work of is being able to use the visual report authoring tools (Report Designer and Report Design Wizard) to use a Kettle transform as a …
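
To give a feel for the SLURP side, here is a rough sketch of pulling rows straight out of a running transformation with the PDI Java API rather than staging them in a table. It assumes the PDI 4.x classes (KettleEnvironment, TransMeta, Trans, RowAdapter); the transformation file and step name are made up for the example, and this is not the Pentaho reporting integration itself, just the underlying idea.

    import java.util.ArrayList;
    import java.util.List;

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.core.row.RowMetaInterface;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;
    import org.pentaho.di.trans.step.RowAdapter;

    // Sketch: run a transformation and collect the rows written by one step
    // in memory, so a report (or anything else) can consume them without a
    // temporary or staging table. File and step names are hypothetical.
    public class SlurpRows {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();

            TransMeta transMeta = new TransMeta("customer_snapshot.ktr");
            Trans trans = new Trans(transMeta);
            trans.prepareExecution(null);

            final List<Object[]> rows = new ArrayList<Object[]>();
            trans.findRunThread("Output rows").addRowListener(new RowAdapter() {
                @Override
                public void rowWrittenEvent(RowMetaInterface rowMeta, Object[] row) {
                    rows.add(row); // every row the step writes lands here
                }
            });

            trans.startThreads();
            trans.waitUntilFinished();
            System.out.println("Collected " + rows.size() + " rows in memory");
        }
    }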

Kettle’s secret in-memory database

Kettle’s secret in-memory database is

  1. Not actually secret
  2. Not actually Kettle’s

There. I said it, and I feel much better.
In most circumstances, Kettle is used in conjunction with a database. You are typically doing something with a database: INSERTs, UPDATEs, DELETEs, UPSERTs, DIMENSION UPDATEs, etc. While I do know of some people who are using Kettle without a database (think log munching and summarization), a database is something that a Kettle developer almost always has at their disposal.

Sometimes there isn’t a database. Sometimes you don’t want the slowdown of persistence in a database. Sometimes you just want Kettle to have an in-memory blackboard across transformations. Sometimes you want to ship an example to a customer using database operations but don’t want to fuss with a database install, dump files, etc.

Kettle ships with a Hypersonic driver, and therefore, …
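
A minimal sketch of the idea, using the HSQLDB (Hypersonic) driver over plain JDBC, with a made-up database and table name: a mem: database lives only inside the JVM, which is exactly what makes it usable as a lightweight blackboard without any install, dump files, or persistence overhead.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Sketch: a Hypersonic (HSQLDB) in-memory database over plain JDBC.
    // The mem: URL keeps everything in the JVM's memory -- nothing is
    // installed and nothing survives the process, which is the point.
    public class InMemoryBlackboard {
        public static void main(String[] args) throws Exception {
            Class.forName("org.hsqldb.jdbcDriver");
            Connection conn = DriverManager.getConnection(
                "jdbc:hsqldb:mem:blackboard", "sa", "");

            Statement stmt = conn.createStatement();
            stmt.executeUpdate("CREATE TABLE staging (id INTEGER, name VARCHAR(64))");
            stmt.executeUpdate("INSERT INTO staging VALUES (1, 'example row')");

            ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM staging");
            rs.next();
            System.out.println("rows in memory: " + rs.getInt(1));

            conn.close(); // the data disappears when the JVM exits
        }
    }

In Kettle the same thing falls out of pointing a Hypersonic database connection at a mem: URL, so several transformations run in the same JVM (for example, within one job) can read and write the same in-memory tables.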
