Showing entries 21 to 30 of 87
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: Data Integration (reset)
What is Big Data?

Image by Aranda\Lasch via Flickr

One of my favorite terms at the moment is “Big Data”.  While all terms are by nature subjective, in this post I will try and explain what Big Data means to me.

So what is Big Data? Big Data is the “modern scale” at which we are defining our data usage challenges.  Big Data begins at the point where we need to seriously start thinking about the technologies used to drive our information needs.

While Big Data as a term seems to refer to volume, this isn’t the case.  Many existing technologies have little problem physically handling large volumes (TB or PB) of data.  Instead, the Big Data challenges result from the combination of volume and our usage demands on that data.  And those …

[Read more]
Re-Introducing UDJC

Dear Kettle fans,

Daniel & I had a lot of fun in Orlando last week. Among other things we worked on the User Defined Java Class (UDJC) step.  If you have a bit of Java experience, this step allows you to quickly write your own plugin in a step. This step is available in recent builds of Pentaho Data Integration (Kettle) version 4.

Now, how does this work?  Well, let’s take Roland Bouman’s example: the calculation of the date of Easter.  In this blog post, Roland explains how to calculate Easter in MySQL and Kettle using JavaScript.  OK, so what if you want this calculation to be really fast in Kettle?  Well, then you can turn to pure Java to do the job…
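To give a feel for what "pure Java" means here, below is a minimal sketch of an Easter date calculation. It is not the code from the UDJC step or from Roland's post; it assumes the widely published anonymous Gregorian algorithm (also known as Meeus/Jones/Butcher), and the class name `EasterCalculator` and method name `easterSunday` are made up for illustration. Wiring it into a UDJC step is left out on purpose.

```java
import java.time.LocalDate;

// Hypothetical helper, not part of Kettle: computes the date of Easter Sunday
// in the Gregorian calendar using the anonymous Gregorian algorithm.
final class EasterCalculator {

    private EasterCalculator() {
        // static utility, no instances
    }

    static LocalDate easterSunday(int year) {
        int a = year % 19;                      // position in the 19-year lunar cycle
        int b = year / 100;                     // century
        int c = year % 100;                     // year within the century
        int d = b / 4;
        int e = b % 4;
        int f = (b + 8) / 25;
        int g = (b - f + 1) / 3;
        int h = (19 * a + b - d - g + 15) % 30; // offset related to the full moon
        int i = c / 4;
        int k = c % 4;
        int l = (32 + 2 * e + 2 * i - h - k) % 7; // days to the following Sunday
        int m = (a + 11 * h + 22 * l) / 451;
        int month = (h + l - 7 * m + 114) / 31;     // 3 = March, 4 = April
        int day = (h + l - 7 * m + 114) % 31 + 1;
        return LocalDate.of(year, month, day);
    }
}
```

Calling `EasterCalculator.easterSunday(2010)` gives 4 April 2010. Everything is plain integer arithmetic with no lookups or string handling, which is why this kind of calculation is a good fit for compiled Java when speed matters.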

[Read more]
Back from Blogging Hiatus - Update 3

Image by Nathan Lanier via Flickr

<< Back from Blogging Hiatus - Update 2

Ingres

No specific announcements from Ingres other than I think the VectorWise stuff is progressing well.

To me Ingres is a bit of a dark horse.  They are open source and doing reasonable revenues.  And they are active in the enterprise market (something MySQL hasn’t really achieved).  But they remain largely off the radar in commentary surrounding the DBMS industry.

My personal pick is this will start to change …

[Read more]
Back from Hiatus - Summary Update 2

Back from Hiatus - Summary Update 1

GoodData

GoodData has launched and is providing a cloud-based analytics platform for integration with online apps.  They are starting with an initial focus on Salesforce data, but are working hard on expanding the list of ISVs who choose to provide their customers analytics via GoodData.

GoodData was started by “good guy” Czech serial entrepreneur Roman Stanek (NetBeans) and has just raised funds from Andreessen Horowitz and appointed Tim O’Reilly to the board.  GoodData is interesting because it is simple, accessible and available on demand.  Still early days, but I think Roman is on to another winner here.  Certainly …

[Read more]
Back from Hiatus - Summary Update 1

Here is a summary of the key discussions I have had over the last month.  Keep in mind, I’m no analyst.  This is largely opinion based on various conversations I have had with the relevant companies (for analyst insight see Curt Monash).

KickFire

I think Kickfire has been doing it a little tough lately.  The difficulties of a startup launching a hardware appliance (and the associated logistics), combined with being too focused on the MySQL customer base, have impacted the growth of this interesting startup.  But they aren’t taking it lying down: they have adjusted the strategy and added a new appliance to the range.  Kickfire now seems to have a stronger focus on the enterprise

[Read more]
Is the RDBMS doomed (yada yada yada) ?

Image by Snooch2TheNooch via Flickr

I was speaking with Michael Stonebraker this morning.  I mentioned that lately many have been referencing comments he has made over the last couple of years.  And I also mentioned that many had interpreted them as implying the RDBMS is “doomed”.  Mike has been saying the same thing for years, but the current NoSQL movement seems to have picked up on this and is highlighting that one of the RDBMS's own pioneers is predicting its …

[Read more]
VectorWise


I was fortunate enough to speak with Marcin Zukowski earlier about VectorWise.  If you missed it, VectorWise came out of stealth mode a day or two ago.  They have announced a partnership with Ingres and essentially are claiming impressive analytic RDBMS performance gains on conventional hardware.

To start with, a key message that I think needs to be communicated here is that this is not a product announcement.  Ingres and VectorWise have announced a partnership in which they of course plan to build products together, but today those products are still in the works.

VectorWise is a spin out of CWI based on research that was undertaken by Marcin and others, research …

[Read more]
The NoSQL community needs to engage the DBA’s

The NoSQL movement has been gaining some steam lately, with discussion forums and mailing lists popping up all around the web.  Despite having a career that has been centered on the RDBMS, I have made no secret that I think we have gone too far with our RDBMS-for-everything mindset.  I think we need to add a few more tools back into our data toolbox.

Today, 99.5% of new data-centric developments started will use an RDBMS by default.  Maybe 0.5% will consider using something as obtuse as a NoSQL platform.  From experience I know the majority of people discussing NoSQL platforms today are web developers.  In fact there is almost a sense of trying to keep this under the radar of DBAs.  If we don’t talk to the DBAs about this stuff then they won’t bother us with all that …

[Read more]
HamsterDB

This post was a bit of a test to see if I could write a serious post about a database platform called Hamster.  I think I just made it :)

With all the noise over key/value stores recently, we should keep in mind that this technology isn’t exactly new.  It is being applied to new problems, but many of the foundations have been around for decades.  Probably the oldest of them all, Berkeley DB came into existence during the mid-’80s and now has over 200 million deployments (according to the Oracle web site).

HamsterDB, while not having the same pedigree as Berkeley DB, has been steadily worked on by Christoph Rupp for the last 5 years.  I spoke to Christoph yesterday about his release of a new edition of …

[Read more]
HadoopDB discussion with Daniel Abadi


I spoke to Daniel Abadi this morning about his HadoopDB announcement that came out a couple of days back.  I am sure this has been a busy time for Daniel and his team over at Yale, as HadoopDB has been getting a lot of interest, which I am sure will continue to build.

Some notes from our discussion:

  • HadoopDB is primarily focused on high scalability and the required availability at scale.  Daniel questions the ability of current MPP systems to truly scale past 100 nodes, whereas Hadoop has real examples on 3000+ nodes.
  • HadoopDB, like many MPP analytical database platforms, uses shared-nothing relational databases as processing units. HadoopDB uses Postgres.  Unlike other MPP databases, HadoopDB uses Hadoop as the …
[Read more]