My article on how to make the real-time movement of information from traditional transactional stores into Hadoop a reality has been published over at TDWI: [Read more]
June 16, 2014 By Severalnines
MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be completed in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).
In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform them, and then load them into HDFS. We will also show you how to schedule this job to be executed every 5 minutes.
We have an application …[Read more]
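The post itself builds the pipeline in Talend, but the underlying extract-transform-load flow can be sketched in a few lines of Python. The sketch below is an assumption-laden illustration, not the Severalnines setup: it uses pymongo and the hdfs command-line client instead of Talend, and every name (database, collection, fields, HDFS paths) is a hypothetical placeholder.

# A minimal sketch of the extract -> transform -> load flow described above,
# using pymongo and the hdfs CLI instead of Talend. Database, collection,
# field names and HDFS paths are hypothetical placeholders.
import json
import subprocess
from datetime import datetime, timedelta

from pymongo import MongoClient


def export_clickstream(window_minutes=5):
    coll = MongoClient("mongodb://localhost:27017")["analytics"]["clickstream"]

    # Extract: only documents that arrived since the last run (incremental load).
    since = datetime.utcnow() - timedelta(minutes=window_minutes)
    cursor = coll.find({"ts": {"$gte": since}})

    local_file = "/tmp/clickstream.json"
    with open(local_file, "w") as out:
        for doc in cursor:
            # Transform: drop the MongoDB _id and keep only the fields we need.
            out.write(json.dumps({
                "ts": doc["ts"].isoformat(),
                "url": doc.get("url"),
                "referrer": doc.get("referrer"),
                "session": doc.get("session_id"),
            }) + "\n")

    # Load: push the file into HDFS, one file per run, named after the timestamp.
    target = "/data/clickstream/%s.json" % datetime.utcnow().strftime("%Y%m%d%H%M")
    subprocess.check_call(["hdfs", "dfs", "-put", local_file, target])


if __name__ == "__main__":
    export_clickstream()

Running it every 5 minutes is then a one-line cron entry such as */5 * * * * python /opt/etl/clickstream_to_hdfs.py, which plays the same role as the scheduled Talend job described in the post.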
I’m pleased to say that Continuent will be at the Hadoop Summit in San Jose next week (3-5 June). Sadly I will not be attending as I’m taking an exam next week, but my colleagues Robert Hodges, Eero Teerikorpi and Petri Versunen will be there to answer any questions you have about Continuent products, and, of course, Hadoop replication support built into Tungsten Replicator 3.0.
If you are at the conference, please go along and say hi to the team. And, as always, if there are any questions please let them or me know.[Read more]
Getting data into Hadoop is not difficult, but it is complex if you want to load 'live' or semi-live data into your Hadoop cluster from your Oracle and MySQL databases. There are plenty of solutions available, from manually dumping and loading to the good and bad sides of using a tool like Sqoop. Neither is easy, and both are prone to the problems of lag between the moment you perform the dump and
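To make the trade-off concrete, here is a rough sketch of the "manually dumping and loading" approach the excerpt refers to: a periodic full export from MySQL pushed into HDFS. This is illustrative only; the connection details, table name and HDFS path are invented, and rows committed after the dump starts stay invisible in Hadoop until the next run, which is exactly the lag problem being described.

# A sketch of the manual dump-and-load approach: a periodic full export from
# MySQL into HDFS. All connection details, table and path names are
# hypothetical. Anything committed after the SELECT starts is not visible in
# Hadoop until the next run, which is the lag problem discussed above.
import csv
import subprocess

import mysql.connector  # pip install mysql-connector-python


def dump_and_load():
    conn = mysql.connector.connect(
        host="db1.example.com", user="etl", password="secret", database="sales")
    cursor = conn.cursor()
    cursor.execute("SELECT id, customer_id, amount, updated_at FROM orders")

    local_file = "/tmp/orders.csv"
    with open(local_file, "w", newline="") as out:
        writer = csv.writer(out)
        for row in cursor:
            writer.writerow(row)
    conn.close()

    # Overwrite the previous copy in HDFS with the fresh full dump.
    subprocess.check_call(
        ["hdfs", "dfs", "-put", "-f", local_file, "/data/orders/orders.csv"])


if __name__ == "__main__":
    dump_and_load()

Sqoop automates much the same pattern (for example, sqoop import --connect jdbc:mysql://db1.example.com/sales --table orders --target-dir /data/orders, again with made-up names), but either way there is a window between dumps during which Hadoop lags behind the source database.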
An article about moving data into Hadoop in real-time has just been published over at DBTA, written by me and my CEO Robert Hodges.
In the article I talk about one of the major issues for all people deploying databases in the modern heterogeneous world: how do we move and migrate data effectively between entirely different database systems in a way that is efficient and usable? How do you get the data you need into the database you need it in? If your source is a transactional database, how does that data get moved into Hadoop in a way that makes it usable for querying with Hive, Impala or HBase?
You can read the full article here: Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop
So I’ve submitted my talks for the Tech14 UK Oracle User Group conference, which is in Liverpool this year. I’m not going to give away the topics, but you can imagine they are going to be about data translation and movement and how to get your various databases talking to each other.
I can also say, after having seen other submissions for talks this year (as I’m helping to judge), that the conference is shaping up to be very interesting. There’s a good spread of different topics this year, but I know from having talked to the organisers that they are looking for more submissions in the areas of Operating Systems, Engineered Systems and Development (mobile and cloud).
If you’ve got a paper, presentation, or idea for one that you think would be useful, …[Read more]
Background: If you did not read my first blog post about why I am sharing my thoughts on the benchmarks published by Mark Callaghan on Small Datum you may want to skim through it now for a little context: “Thoughts on Small Datum – Part 1”
Last time, in “Thoughts on Small Datum – Part 2” I shared my cliff notes and a graph on Mark Callaghan’s (@markcallaghan) March 11th insertion rate benchmarks using flash storage media. In those tests he compares MySQL outfitted with the …[Read more]
The title of this post should really be, “Maybe He Should Try Taking a Walk in Your Shoes.”
The “he” I’m referring to is economist and author Tim Harford. The “you” is the people who use NewSQL and NoSQL approaches to mine big data with database platforms like MySQL and MongoDB (or, preferably, our high-performance distributions of them, TokuDB and TokuMX).
Why should Mr. Harford take that walk? Well, he recently penned an article on big data in …[Read more]
On March 11th, Mark, a former Google and now Facebook database guru, published an insertion rate benchmark comparing MySQL outfitted with the InnoDB storage engine against two NoSQL alternatives: basic MongoDB and …[Read more]
A little background…
When I ventured into sales and marketing (I’m an engineer by education) I learned I would often have to interpret and simply summarize the business value that is sometimes hidden in benchmarks. Simply put, the people who approve the purchase of products like TokuDB® and TokuMX™ appreciate the executive summary.
Therefore, I plan to publish a multipart series here on TokuView where I will share my simple summaries and thoughts on business value for the benchmarks Mark Callaghan (@markcallaghan), a former Google and now Facebook database guru, is publishing on his blog, Small Datum.
I’m going to start with his first benchmark post and work my way forward to …[Read more]