Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Until now, mechanisms for moving data between MySQL and Hadoop have been rather limited. Continuent Tungsten Replicator provides enterprise-quality replication from MySQL to Hadoop under a GPL V2 license. Continuent Tungsten handles MySQL transaction types including INSERT/UPDATE/DELETE operations and can
February 27, 2014 By Severalnines
Stockholm MongoDB User Group Meetup: “MongoDB and Hadoop” Monday, March 3, 2014 starting @ 5:00 PM
Join us next Monday as we host the Stockholm MongoDB User Group Meetup in Kista, also known as the Wireless Valley.
Our very own Vinay Joosery will be speaking about how best to automate the management & deployment of database clusters, specifically MongoDB clusters, though the same principles apply to MySQL, MariaDB and Percona XtraDB based clusters. Henrik Ingo of MongoDB will be talking about Analytics with MongoDB & Hadoop. And Jim Dowling, a Senior Researcher at the Swedish Institute of Computer Science, will talk about a Hadoop PaaS platform.
So whether you’re from the MySQL or NoSQL world, there’ll be plenty of good content here to walk away with in addition to …
I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X.
Unfortunately I had to cancel. A few days after writing the
article I came down with an eye problem that is fixed but
prevents me from flying anywhere for a few weeks. That's a pity
as I was definitely looking forward to attending the conference
and explaining how Tungsten replicates transactions from MySQL
into HDFS.
Meanwhile, we are still moving at full steam with Hadoop-related
work at Continuent, which is the basis for the next major
replication release, Tungsten Replicator 3.0.0. Binary builds and
documentation will go up in a few days. There will also be many
more public talks about Hadoop support, starting in April at
Over the last few years there has been an increasing interest in
immutable data management. This is a big change from the
traditional update-in-place approach many database systems
use today, where new values delete old values, which are then
lost. With immutable data you record everything, generally using
methods that append data from successive transactions rather than
replacing them. In some DBMS types you can access the older
values, while in others the system transparently uses the old
values to solve useful problems like implementing eventual
Baron Schwartz recently pointed out that it can be hard to get
decent transaction processing performance based on append-only
methods like append-only B-trees. This is not a very
strong argument against immutable data per se. …
Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (what we would probably call a NoSQL database today), into a more structured Oracle database.
Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data so that it can be analysed, something I covered in more detail here:
It's my pleasure to be presenting at SCaLE
12X on the subject of real-time data loading from MySQL to Hadoop.
This is the first public talk on work at Continuent that
enables Tungsten Replicator to move transactions from
MySQL to HDFS (Hadoop Distributed File System). I will
explain how replication to Hadoop works, how to set it up, and
offer a few words on constructing views of MySQL data using tools
like Hive.
As usual with our replication work, everything we are doing on
Hadoop replication is open source. Builds and documentation will
be publicly available …
Oracle is the most powerful DBMS in the world. However, Oracle's expensive and complex replication makes it difficult to build highly available applications or move data in real-time to data warehouses and popular databases like MySQL. In this webinar-on-demand you will learn how Continuent Tungsten solves problems with Oracle replication at a fraction of the cost of other solutions and with less
The third and final article in my series on migrating data to and from Hadoop and SQL databases is now available:
Big data is a term that has been in regular use for almost a decade, and it, along with technologies like NoSQL, is seen as a replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data, you probably don’t want to wait while exporting the data. Here, we’re going to look at some methods and tools that enable a …
I have spent the better part of the last month at Big Data conferences trying to look behind the $2.5 million in marketing smoke to find out what is really going to show up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data TechCon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time, but the vendors are playing musical chairs.
But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Hortonworks have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together a Udacity course that concisely teaches Hadoop technology, and Cloudera deserves a round of applause for this …
In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.
MapReduce is really the key to understanding Hadoop’s parallel processing functionality, as it enables data in various formats (XML, text, binary, log, SQL, etc.) to be divided up and mapped out to many computer nodes and then recombined to produce a final data set.
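To make the divide, map, and recombine idea concrete, here is a minimal single-process sketch of the pattern in plain Python. It is not Hadoop code and does not use any Hadoop API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are hypothetical, and a real cluster would run the map and reduce steps on many nodes in parallel rather than in one loop.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values for each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the results."""
    mapped = []
    for chunk in chunks:  # on a cluster, each chunk would go to a different node
        mapped.extend(map_phase(chunk))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Hypothetical input already split into three chunks.
    chunks = ["MySQL to Hadoop", "Hadoop and MongoDB", "hadoop"]
    print(word_count(chunks))
```

Each chunk is processed independently in the map step, which is what allows the work to be spread across machines; only the shuffle and reduce steps need to bring results for the same key together.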