Previous 30 Newer Entries Showing entries 31 to 60 of 86 Next 26 Older Entries

Displaying posts with tag: Data Integration

Forrester's EDM Wave

Forrester put out its Enterprise Data Management Q2 2009 report a few days ago. You can buy it from Forrester, but it also now seems to be available for free from Microsoft here.  I don’t actively seek out these reports as they usually just reinforce common knowledge (this one was no exception); however, as it turned up, I managed to find some time on the weekend for a quick read-through.

Few surprises in this report, but some key mentions are:

  • DBMS market expected to grow 8% annually
  • IBM, Microsoft & Oracle own 88% of the DBMS market (by revenue)
  • Current market estimated at $27 billion, $32 billion by 2013
  • IBM,
  [Read more...]
Groovy Baby, Yeah

(yeah, this company is going to have to get used to the Austin Powers references.)

Groovy Corp put out a press release last night that starts the official launch of their SQL Switch relational database platform.

I have been speaking with Groovy for a few months, and while the press release is a bit noisy there is actually some interesting stuff in it.

First, an overview

  • They are an in memory RDBMS
  • They have worked with Intel to architect from the ground up for large multi processor concurrency
  • Initially they are launching as a multi-core appliance
  • They claim

  [Read more...]
The TPC Debate (yawn)

Recently, the arguments for and against the TPC benchmarks have been debated on a number of sites, with the conversations occasionally descending into abuse thrown in both directions.

From a purely technical perspective, the TPC benchmarks make little sense and are probably not relevant to 99% of organizations looking to implement a database technology.  But as a tool for generating visibility, debate and improved public awareness of a vendor’s technology they still have an impact.

This is marketing, pure and simple.  Having a great TPC result is akin to an author having a great review on Amazon.  Doesn’t mean it is relevant for you but if faced with a stack of titles you haven’t yet read you’ll probably look more closely at the ones you’ve heard

  [Read more...]
Positioning your Database Start Up for Data Warehousing

BI/Data Warehousing is an easier market to enter for new database platform vendors.  This is for a few reasons.  Firstly, most BI deployments are custom built projects for each organization.  This means the ability to pick and choose various layers of the stack is much greater. 

Secondly, BI/DW project success/failure metrics are often tied to database platform driven properties – performance, scalability, load times etc.  The ability to stray outside any existing database platform “standards” to choose a platform that better meets key

  [Read more...]
Positioning your Database Start Up for Enterprise OLTP

It is important to realize that there is less diversity in the enterprise OLTP market than at any point in the last 20 years.  Essentially this market has been boiled down to Oracle, SQL Server & DB2 (with few isolated exceptions).   Most new deployments are typically using one of the first two options.  The lack of diversity has created a stalemate or chicken &

  [Read more...]
How to Position your Database Start Up

I have been speaking with a lot of new database vendors over the last 12 months and this has prompted me to revisit a post I wrote mid last year.  The basic premise of this post is that your strategy, and the group of people you’re selling to, largely depends on the market sector you are focusing on (Enterprise OLTP, BI/DW, Cloud & Web 2.0).

A database platform by itself is a largely pointless piece of software.  The only way value is produced from a database platform is through the applications that interact with it.  Therefore the only way to be a successful database platform is by making others successful and motivated to use your platform.

Ok, so as a database platform vendor, how do you enter this market then?  Well, there are a few strategies.  Due to the length of this article I have broken it up into Enterprise OLTP, Enterprise Data Warehousing and Cloud & Web 2.0.

Amusing Database Videos

Oh my. This is just immensely funny & sad at the same time - Amusing Database Videos http://www.bigdatabaselist.com/wiki/Amusing_Database_Videos

Mapping to a database table

For some reason, the creation of a mapping to a database table poses a problem for certain people.

This is how it’s done in PDI 3.2.0 or later in the “Table Output” step:

Ogg video available over here

Until next time,
Matt

The problem with the RDBMS (Part 3) – Let's Get Real

  • Introduction
  • The Problem with the Relational Database (Part 1) – The Deployment Model
  [Read more...]
    Graph Databases and the Future of Large-Scale Knowledge Management

    Todd Hoff has posted a link to a Los Alamos National Lab presentation on Graph Databases.  In it they revisit the classic RDBMS vs Graph database debate.

    The Relational Database hasn’t maintained its dominance out of dumb luck.  Instead the RDBMS has consistently outperformed while providing the most general use capability of all the variety of platforms that have been

      [Read more...]
    The Argument For & Against Map/Reduce

    The last 24 months have seen the introduction of Map/Reduce functionality into the data processing arena in various forms.  Map/Reduce is a framework for developing scalable data processing functionality, and was popularized by Google (see this earlier post).
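For readers new to the model, here is a minimal single-process sketch of the Map/Reduce idea, using word counting as the example. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration and are not taken from Hadoop or any database product; a real framework would run the map and reduce calls in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit one (word, 1) pair for every word in the document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The end"])
print(counts["the"])
```

Because each map call only looks at one document and each reduce call only looks at one key, both phases can be spread over as many workers as are available, which is where the scalability comes from.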

    Pure players like Hadoop are starting to find their own niche, helped by organizations such as Cloudera.  However, there have been a number of arguments for and against Map/Reduce functionality inside the database.

    These arguments are now really moot.  Customers have recognized value in Map/Reduce, prompting some (b)leading edge database vendors to

      [Read more...]
    Top 10 interesting companies in Data Management

    A bit of fun for a Sunday.  Below is the list of my top 10 interesting companies in Data Management right now.  Interesting to me means doing new stuff and being somewhat disruptive, or have a “watch and see” quality about them.  Note this is about companies not data management applications. 

    While I find a bunch of other data management applications interesting (PNUTS, Cassandra, Redis etc) these aren’t really encapsulated in a company with a go to market strategy.

    10gen – They are making interesting noises, but I’m not sure about delivery yet
    Amazon – SimpleDB is neat, but not a grown up data platform yet
    Aster Data – Doing funky things with

      [Read more...]
    Google Goodies and Lego

    Dear Kettle friends,

    Will Gorman and Mike D’Amour, Senior Developers at Pentaho, are presenting Pentaho’s Google integration work at the Google I/O Developer Conference (in the Sandbox area, to be specific).  Pentaho announced as much yesterday.

    Here are a few of the integration points:

    • Google maps dashboard (available in the Pentaho BI server you can download)
    • A new Google Docs step was created for Pentaho Data Integration Enterprise Edition
    • Running (AVI, 30MB) the Pentaho BI server on
      [Read more...]
    PDI cloud : massive performance roundup

    Dear Kettle fans,

    As expected there was a lot of interest in cloud computing at the MySQL conference last week.  It felt really good to be able to pass the Bayon Technologies white paper around to friends, contacts and analysts.  It’s one thing to demonstrate a certain scalability on your blog, it’s another entirely to have a smart man like Nicholas Goodman do the math.

    Sorting massive amounts of rows is a hard problem to take on.  Making it scale on low-cost EC2

      [Read more...]
    Next week : MySQL UC

    Dear Kettle & MySQL fans!

    I’m really looking forward to going to the MySQL User Conference next week, not just because I’m speaking in 2 sessions again, but perhaps also because these are “interesting” times for MySQL and Sun Microsystems.  Pivotal times it would seem.

    Here are the 2 sessions I’m going to do:

    • Cloud Computing with MySQL and Kettle : I’m particularly happy that MySQL accepted this session: it will demonstrate how easy it has become to do cloud computing exercises with tools like MySQL and Kettle.
      [Read more...]
    Resource exporter

    Dear Kettle fans,

    One of the things that’s been on my TODO list for a while was the creation of a resource exporter.

    Resource exporter?

    It’s called “Resource exporter” and not “Job exporter” or “Transformation exporter” because it is intended to export more than just a single job or transformation.  It exports all linked resources of a job or transformation.

    This means that if you have a job that has 5 transformation job entries, you will be exporting 6 resources (1 job and 5 transformations).  If those transformations use 3 sub-transformations (mappings) you will in total export 9 resources.
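The counting above can be sketched as a walk over the links between resources. This is only an illustration of the idea, not Kettle's actual exporter code: the `links` dictionary, the resource names and the `collect_resources` function are all hypothetical.

```python
def collect_resources(name, links, seen=None):
    """Return the set of resources reachable from `name`, including itself."""
    if seen is None:
        seen = set()
    if name in seen:
        # Already collected, for example a mapping shared by two transformations.
        return seen
    seen.add(name)
    for child in links.get(name, []):
        collect_resources(child, links, seen)
    return seen


# Hypothetical layout: 1 job, 5 transformations, 3 sub-transformations (mappings).
links = {
    "job": ["t1", "t2", "t3", "t4", "t5"],
    "t1": ["m1"],
    "t2": ["m2", "m3"],
    "t3": ["m1"],  # shared mapping, only counted once
}

resources = collect_resources("job", links)
print(len(resources))
```

Tracking the `seen` set means a resource that is linked from several places ends up in the package only once.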

    The whole idea behind this exercise is to be able to create a package (for example to send to someone) that has all needed resources contained in a

      [Read more...]
    Pentaho Partner Summit ‘09

    Dear reader,

    In a little over 3 weeks, April 2nd and 3rd, we’re organizing a Pentaho Partner Summit at the Quadrus Conference Center in Menlo Park near San Francisco.

    If you are (as the invitation describes) an “Executive, luminary, current or prospective partner from around the world” and you come over, you’ll meet me, Julian Hyde and perhaps a couple of other architects as well.  That is in addition to a host of other interesting people like Zack Urlocker (MySQL) and of course Richard Daley, our CEO. We’ll be doing a couple of lengthy sessions on Kettle and Mondrian among other things.

    See you

      [Read more...]
    Is the Relational Database Doomed?

    Recently, a lot of new non-relational databases have cropped up both inside and outside the cloud. One key message this sends is, "if you want vast, on-demand scalability, you need a non-relational database".

    If that is true, then is this a sign that the once mighty relational database finally has a chink in its armor? Is this a sign that relational databases have had their day and will decline over time? In this post, we'll look at the current trend of moving away from relational databases in certain situations and what this means for the future of the relational database.[more]

    Kickfire: Data Analytics for the Masses

    You may not realize it, but the data analytics market is buzzing. There are new vendors emerging, new products popping up, new deals being done, and several new strategies being pursued. Vendors are predominantly chasing big data, with battle lines being drawn by solution providers that cater to data sets of roughly 100 TB to 10 PB. The battle was inevitable because the world is producing data at a phenomenal rate, and we have an increasing need to analyze them within shorter time frames. In this post we analyze one of these vendors, Kickfire.

    Yet while the big names in town are capturing the headlines, in reality only a small percentage of businesses today need to be able to analyze petabytes of data. Today, the rest of us are more likely to deal with

      [Read more...]
    Top 10 Data Management Issues for 2009

    So it’s that time of year again when everyone puts out their predictions for the year ahead.  I think predictions are a bit of a waste of time: to be interesting they have to be big, but a year really isn’t all that long, so actual changes over the course of 2009 are likely to be just small progressions.  So instead I have been thinking about the top issues that we face heading into 2009, and here is my Top 10 list of issues in Data Management.  In this post I avoid offering solutions to these issues; while I have several ideas, they can be the subject of subsequent posts.

    10 - Limits on Scalability

    While scalability is on my list, it is at number 10 because, contrary to popular belief, scalability is only an issue for a very small number of data-based applications.  Almost all data-based


      [Read more...]
    Kettle at the MySQL UC 2009

    Hello Kettle fans,

    Like Roland I got confirmation earlier this week that I could present my talk on “MySQL and Pentaho Data Integration in a cloud computing setting”, at the next MySQL user conference (http://www.mysql.com/news-and-events/users-conference/).

    I’m very excited about the work we’ve done on the subject and it’s going to be great talking about it in April.

    See you there!
    Matt

    Kettle workshop at KHM

    Good news Kettle fans!

    Our community is bound to become a bit larger as a whole group of students (38) at the Katholieke Hogeschool Mechelen (Bachelor level) will receive a one-day workshop on Pentaho Data Integration (Kettle).  This workshop will take place in early November, most likely the 4th.

    It’s interesting to see that during that day we’ll be able to go through most of the work involved in reading and staging the data, data cleansing and a few slowly changing dimensions with a fact table.  On top of that we’ll explain how to use Pentaho Data Integration in that setting.  When time permits we’ll show how to set up a metadata model on top of that data to create reports on it.  On top of that the students will get an idea about what exactly

      [Read more...]
    Dead wrong

    Belgian consultancy company Element 61 has just posted an opinion piece disguised as a review of open source ETL.

    What a load of utter nonsense.  Try reading this:

    Instead of using SQL statements to transform data, an Open Source ETL tool gives the developer a standard set of functions, error handling rules and database connections. The integration of all these different components is done by the Open Source ETL tool provider. The straightforward transformations can be implemented very quickly, without the hassle of writing queries, connecting to data sources or writing your own error handling process. When there are complex transformations to make, Open Source ETL tools will often not offer out-of-the-box solutions.

    Well Mr Jan Claes, we’re perfectly

      [Read more...]
    T-Dose 2008

    Roland Bouman and I will be doing a presentation together at T-Dose on October 25th:

    Building Open Source BI solutions with Pentaho and MySQL

    It’s a free conference, feel free to join us there for a chat and/or a drink!

    Until then,
    Matt

    Getting started with Kettle

    For those people starting with Kettle (Pentaho Data Integration) we created a Getting Started page on our Wiki.

    Since I realized that for some people simple and easy can never be simple and easy enough, I created 8 mini Flash demos:

      [Read more...]
    Pentaho changes

    I’m back at my favorite spot at the Orlando airport:

    This week has gone by so fast it’s kinda scary.  I got dragged into one meeting after another design session after another knowledge transfer opportunity for 5 days in a row.  After our long working days, the discussions and talks just continued over dinner and beers.

    It was great to meet everyone and as always we had a good time around the office and at the Ale House.  I even managed to stay sober this time around.  Well at least most of the time.

    As always, the thing that struck me the most was how fast Pentaho changes.  It’s almost like visiting a different company every time I drop in.  Since I don’t see the day-to-day changes around the office, the difference between the first time I visited (15 people) and now (70+) is striking.  The office

      [Read more...]
    Parallel CSV reader

    I almost forgot I wrote the code a while back. Someone asked me about it yesterday, so I dusted the parallel CSV reader code off this morning and here are the results:

    This test basically reads a generated file with 10M customer records, sized 919169988 bytes, in 18.3 seconds (about 50MB/s).  Obviously, my poor laptop disk can’t deliver at that speed, so these test results were obtained by utilizing the excellent Linux caching system.

    In any case, the caching system simulates a faster disk subsystem.

    On my computer, the system doesn’t really scale linearly (especially in this case, as the OS uses up some CPU power too), but the speedup from 25.8 to 18.3 seconds is noticeable (about 30% faster).
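The figures quoted above can be checked with a few lines of arithmetic. This sketch assumes that 25.8 seconds is the run without parallel reading and 18.3 seconds the run with it, and that MB means 1,000,000 bytes; the variable names are chosen here for illustration.

```python
FILE_SIZE_BYTES = 919_169_988   # size of the generated customer file
BASELINE_SECONDS = 25.8         # assumed: run without parallel reading
PARALLEL_SECONDS = 18.3         # run with the parallel CSV reader

# Throughput of the parallel run in decimal megabytes per second.
throughput_mb_s = FILE_SIZE_BYTES / PARALLEL_SECONDS / 1_000_000

# Share of the baseline run time that was saved.
time_saved_pct = (BASELINE_SECONDS - PARALLEL_SECONDS) / BASELINE_SECONDS * 100

# Classic speedup factor: baseline time divided by parallel time.
speedup = BASELINE_SECONDS / PARALLEL_SECONDS

print(f"{throughput_mb_s:.1f} MB/s, {time_saved_pct:.0f}% less time, {speedup:.2f}x speedup")
```

Under those assumptions the run time drops by roughly 29%, which matches the "about 30%" quoted, while the same numbers expressed as a speedup factor come out at roughly 1.4x.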

    The interesting thing is that if

      [Read more...]
    IRC ##pentaho

    Of course there are the crazies, but usually we have a good time over on ##pentaho IRC.

    Yesterday we had our very first community event when Doug “Spanky” Moran hosted a dial-in to talk about what was up in the community.

    Today, I learned about the blog of channel regular andresF.

    Internet Relay Chat is old technology that has existed for quite a while now, but to me it doesn’t lose its appeal. Over at FOSDEM I learned that there are companies like MySQL that have private channels to communicate “non-intrusively” with colleagues. “Maybe some developer can help me with this stupid problem. I’ll just drop a question on the channel.” It’s a good idea, we should consider it for Pentaho too.

    Until next time,

    Matt

    Rolling back transactions

    Pentaho Data Integration (Kettle) never was a real transactional database engine, and never pretended to be one. It was designed to handle large data volumes and slam a commit in between every couple of thousand rows to prevent the databases from choking on the logging problem.

    However, more and more people are using Kettle transformations in a transactional way. They want to have the option to roll back any change that happened to a database during the execution of a transformation in case anything goes wrong.

    Well, we have been working on that in the past, but never quite got it right… until today actually. As part of bug report 724 I lifted the decision to commit or roll back all databases to the transformation level.
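To make the idea concrete, here is a minimal sketch of deciding commit or rollback for all databases at one level, once every step has run. It uses Python's standard `sqlite3` module with in-memory databases purely for illustration; `run_transformation`, the step functions and the table layout are hypothetical and are not Kettle's implementation. It is also not a true two-phase commit: a failure during the final commit loop could still leave the databases out of sync.

```python
import sqlite3


def run_transformation(connections, steps):
    """Run every step, then commit all connections or roll all of them back."""
    try:
        for step in steps:
            step(connections)
    except Exception:
        # Any failure undoes the pending work on every database involved.
        for conn in connections.values():
            conn.rollback()
        return False
    for conn in connections.values():
        conn.commit()
    return True


def make_db():
    """Create an in-memory database with one committed, empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (v INTEGER)")
    conn.commit()
    return conn


def count(conn):
    return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]


def write_source(conns):
    conns["source"].execute("INSERT INTO t VALUES (1)")


def write_target(conns):
    conns["target"].execute("INSERT INTO t VALUES (2)")


def failing_step(conns):
    raise RuntimeError("simulated step failure")


connections = {"source": make_db(), "target": make_db()}

failed = run_transformation(connections, [write_source, write_target, failing_step])
rows_after_failure = [count(c) for c in connections.values()]

succeeded = run_transformation(connections, [write_source, write_target])
rows_after_success = [count(c) for c in connections.values()]

print(failed, rows_after_failure, succeeded, rows_after_success)
```

The point of the design is that no individual step commits on its own; the decision is taken once, for all connections, after the last step has finished or failed.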

    Take for example a look at this

      [Read more...]
    Step performance graphs

    One of the things I’ve been working on lately in Kettle / Pentaho Data Integration is the transparency of the performance monitoring.

    We don’t just need an API to get the step performance data out, but we also need to visualize this data in a simple way, something like this:

    The next steps will be to also allow this data to be spooled off to a database somewhere and to be accessed remotely using Carte.

    Until next time,

    Matt


    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.