June 16, 2014
MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes, and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).
In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then [Read more...]
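The split-and-combine idea behind Hadoop's parallel processing can be sketched in a few lines of plain Python. This is only an illustration, not Hadoop or Talend code: the `url` field, the sample documents, and the function names are assumptions made up for the example.

```python
from collections import Counter
from functools import reduce


def map_chunk(docs):
    """Map step: count page hits within one chunk of clickstream documents."""
    return Counter(doc["url"] for doc in docs)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_hits(docs, num_chunks=2):
    """Split docs into chunks, count each chunk independently, then merge."""
    chunks = [docs[i::num_chunks] for i in range(num_chunks)]
    # On a real cluster each chunk would be processed on a different node;
    # here the chunks are simply processed one after another.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical clickstream documents, standing in for a MongoDB collection.
clicks = [
    {"url": "/home"},
    {"url": "/pricing"},
    {"url": "/home"},
    {"url": "/blog"},
]
hits = count_hits(clicks)
print(hits.most_common())
```

Because each chunk is counted independently, the map step has no shared state, which is what lets a system like Hadoop spread the work over many nodes.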
The Hadoop Summit, a leading Apache Hadoop industry conference, has grown significantly over the years, and throughout the day, theCUBE, led by hosts John Furrier and Jeff Kelly, featured the best of thought leaders, use cases, data scientists, data analysts, and developers at the event. Watch yesterday's interview with Robert Hodges (CEO, Continuent) on real-time data loading from Oracle and
In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala.
This time I’ve tried InfiniDB for Hadoop (open-source version) on modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InfiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the MySQL server to execute queries. This allows for an easy “migration” of existing analytical tools. The results are quite interesting and promising.
The InfiniDB documentation is not very clear on step-by-step instructions so I’ve created this [Read more...]
Oracle's expensive and complex replication makes it difficult to build cost-effective applications that move data in real-time to data warehouses (Oracle, Hadoop, Vertica) and popular databases like MySQL. Fortunately, Continuent Tungsten offers a solution. In this virtual course, you will learn how Continuent Tungsten solves problems with Oracle replication at a fraction of the cost of other
Getting data into Hadoop is not difficult, but it is complex if you want to load 'live' or semi-live data into your Hadoop cluster from your Oracle and MySQL databases. There are plenty of solutions available, from manually dumping and loading to the good and bad sides of using a tool like Sqoop. Neither is easy, and both are prone to the problems of lag between the moment you perform the dump and
May 16, 2014
We won’t bore you with buzzwords like volume, velocity and variety. This post is for MySQL users who want to get their hands dirty with Hadoop, so roll up your sleeves and prepare for work. Why would you ever want to move MySQL data into Hadoop? One good reason is archival and analytics. You might not want to delete old data, but rather move it into Hadoop and make it available for further analysis at a later stage.
In this post, we are going to deploy a Hadoop Cluster and export data in bulk from a Galera Cluster using Apache Sqoop. Sqoop is a well-proven approach for bulk data loading from a relational [Read more...]
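As a rough sketch of what a bulk import with Sqoop looks like, the Python snippet below assembles a `sqoop import` command line without running it. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`) are standard Sqoop import options, but the host name, database, table, user, and HDFS paths are hypothetical placeholders, and the helper function is not part of Sqoop.

```python
import shlex


def build_sqoop_import(host, database, table, username,
                       password_file, target_dir, mappers=4):
    """Assemble a `sqoop import` command as a list of arguments."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",
        "--username", username,
        "--password-file", password_file,  # file holding the DB password
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory for the output
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
cmd = build_sqoop_import(
    host="galera-node1",
    database="shop",
    table="orders",
    username="sqoop",
    password_file="/user/sqoop/.password",
    target_dir="/user/sqoop/orders",
)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list of arguments makes it easy to pass to `subprocess.run` on a machine where Sqoop is installed, without shell quoting issues.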
SAN JOSE, CA– May 6, 2014 – Continuent, Inc., a leading provider of open source database clustering and replication solutions, today announced that their recently announced Tungsten Replicator 3.0 solution has been certified by Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™. Continuent Tungsten Replicator 3.0 enables organizations to quickly and easily
Don't miss your opportunity to learn about Continuent Tungsten via our free "Setup & Operate" webcast series. These free webcasts include live presentations and interactive Q&A.
Webcast Overviews
Setup & Operate Tungsten Replicator
May 15th, 10:00 am PDT
Tungsten Replicator is an innovative and reliable tool that can solve your most complex replication problems. We will introduce Replicator
ICTexpo Helsinki 2014 offers two effective days full of innovations, inspiration and information. It is the biggest professional IT show in the Nordics, with a wide range of solutions to help you take your business to the next level. Continuent will be exhibiting in Red Hat Village [booth 5f31], which gathers the most significant enterprise level companies from the Open Source ecosystem in Finland
Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format) and run reports on top of that. For the examples below I will use the “ontime flight performance” data from my previous post (Increasing MySQL performance with parallel query execution). I’ve used the [Read more...]
Continuent would like to better understand the relationships and data flows that exist between the different database systems you are using, so that we can better understand your replication and data integration needs. In particular, we'd like to know about any heterogeneous data exchanges, including manual dump/load and automated processes, and whether non-database sources, such as Twitter and Facebook,
Lean, mean MySQL and hulking Hadoop clusters may seem like an odd couple, but tying them together is now priority #1 for many MySQL users. This keynote talk on the first day of this year's Percona Live MySQL Conference & Expo 2014 explores the data management trends spurring integration, how the MySQL community is stepping up, and where the integration may go in the future. Robert Hodges, CEO at
Hadoop is an increasingly popular means of analyzing transaction data from one or more MySQL servers. Up until now, mechanisms for moving data between MySQL and Hadoop have been rather limited. The new Continuent Tungsten Replicator 3.0 provides enterprise-quality replication from MySQL to Hadoop. Tungsten Replicator 3.0 is 100% open source, released under a GPL V2 license, and
Keynotes and Sessions:
Keynote: Getting Serious about MySQL and Hadoop at Continuent - Robert Hodges (CEO, Continuent)
Hadoop for MySQL People - Chris Schneider (Database Architect, Groupon.com)
From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication - MC Brown (Director of Documentation, Continuent), Linas Virbalas (Senior Software Engineer, Continuent)
Virtually Available MySQL, or How to
Continuent, a leading provider of database clustering and replication software, has five (5) new positions open:
Senior Database Availability and Clustering Engineer
Senior Database Replication Engineer
Data Replication Sales Engineer
Clustering and Replication Test Development Engineer
If you want to get in on the ground floor of a growing company in a challenging field
Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Up until now, mechanisms for moving data between MySQL and Hadoop have been rather limited. Continuent Tungsten Replicator provides enterprise-quality replication from MySQL to Hadoop under a GPL V2 license. Continuent Tungsten handles MySQL transaction types including INSERT/UPDATE/DELETE operations and can
February 27, 2014
Stockholm MongoDB User Group Meetup: “MongoDB and Hadoop”
Monday, March 3, 2014 starting @ 5:00 PM
Join us next Monday as we host the Stockholm MongoDB User Group Meetup in Kista, also known as the Wireless Valley.
Our very own Vinay Joosery will be speaking about how best to automate the management & deployment of database clusters, specifically MongoDB clusters, though the same principles apply to MySQL, MariaDB and Percona XtraDB based clusters. Henrik Ingo of MongoDB will be talking about Analytics with MongoDB & Hadoop. And Jim Dowling, a Senior Researcher at the Swedish Institute of Computer Science, will talk [Read more...]