Showing entries 1 to 30 of 139

Displaying posts with tag: hadoop

Hadoop BoF Session at OSCON

I have a BoF session at OSCON next week:

Migrating Data from MySQL and Oracle into Hadoop

The session is at 7pm Tuesday night – look for rooms D135 and/or D137/138.

Correction: We are now in E144 on Tuesday, with the Hadoop get-together first at 7pm and the Data Migration session to follow at 8pm.

I’m actually going to be joined by Gwen Shapira from Cloudera, who has a BoF session on Hadoop next door at the same time, along with Eric Herman from Booking.com. We’ll use the opportunity to talk all things Hadoop, but particularly the ingestion of data from MySQL and other databases into the Hadoop datastore.

As always, it’d be great to meet anybody interested in Hadoop at the BoF; please come along and

  [Read more...]
Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute

My article on how to make the real-time processing of information from traditional transactional stores into Hadoop a reality has been published over at TDWI:

Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute.


Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics
June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split across multiple nodes and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).
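As a rough illustration of the split-and-merge idea behind that parallelism, the Python sketch below counts clicks per URL by dividing a list of clickstream records into chunks, counting each chunk independently (as separate nodes would), and then merging the partial counts. The record shape (a `url` key) and the function names are hypothetical, and everything runs sequentially in one process; this is not Hadoop, MongoDB or Talend code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def split_into_chunks(records: List[dict], n_chunks: int) -> List[List[dict]]:
    """Divide the input into roughly equal slices, one per (simulated) node."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_phase(chunk: Iterable[dict]) -> Counter:
    """Each simulated node counts clicks per URL in its own slice only."""
    return Counter(rec["url"] for rec in chunk)


def reduce_phase(partials: Iterable[Counter]) -> Dict[str, int]:
    """Merge the per-node partial counts into a single total."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return dict(total)


def clicks_per_url(records: List[dict], n_chunks: int = 4) -> Dict[str, int]:
    chunks = split_into_chunks(records, n_chunks)
    return reduce_phase(map_phase(c) for c in chunks)


if __name__ == "__main__":
    # Hypothetical clickstream documents, shaped loosely like MongoDB records.
    clicks = [{"url": "/home"}, {"url": "/cart"}, {"url": "/home"},
              {"url": "/checkout"}, {"url": "/home"}, {"url": "/cart"}]
    print(clicks_per_url(clicks, n_chunks=3))
```

Because each chunk is counted without looking at any other chunk, the map step could run on different machines; only the small partial counts need to be brought together at the end.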

 

In this blog post, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then

  [Read more...]
theCube @ Hadoop Summit 2014 - Robert Hodges (Continuent) with John Furrier and Jeff Kelly on real-time data loading from Oracle and MySQL into Hadoop.
The Hadoop Summit, a leading Apache Hadoop industry conference, has grown significantly over the years. Throughout the day, theCUBE, led by hosts John Furrier and Jeff Kelly, featured the best thought leaders, use cases, data scientists, data analysts and developers at the event. Watch yesterday's interview with Robert Hodges (CEO, Continuent) on real-time data loading from Oracle and
Using InfiniDB MySQL server with Hadoop cluster for data analytics

In my previous post about Hadoop and Impala I benchmarked performance of analytical queries in Impala.

This time I’ve tried InfiniDB for Hadoop (open-source version) on modern hardware with an 8-node Hadoop cluster. One of the main advantages (at least for me) of InfiniDB for Hadoop is that it stores the data inside the Hadoop cluster but uses the MySQL server to execute queries. This allows for an easy “migration” of existing analytical tools. The results are quite interesting and promising.

Quick How-To

The InfiniDB documentation is not very clear on step-by-step instructions, so I’ve created this

  [Read more...]
Webinar-on-Demand: Set Up & Operate Open Source Oracle Replication
Oracle's expensive and complex replication makes it difficult to build cost-effective applications that move data in real-time to data warehouses (Oracle, Hadoop, Vertica) and popular databases like MySQL. Fortunately, Continuent Tungsten offers a solution. In this virtual course, you will learn how Continuent Tungsten solves problems with Oracle replication at a fraction of the cost of other
Continuent at Hadoop Summit

I’m pleased to say that Continuent will be at the Hadoop Summit in San Jose next week (3-5 June). Sadly I will not be attending as I’m taking an exam next week, but my colleagues Robert Hodges, Eero Teerikorpi and Petri Versunen will be there to answer any questions you have about Continuent products, and, of course, the Hadoop replication support built into Tungsten Replicator 3.0.

If you are at the conference, please go along and say hi to the team. And, as always, if there are any questions please let them or me know.


Webinar-on-demand: Set up & operate real-time data loading into Hadoop
Getting data into Hadoop is not difficult, but it is complex if you want to load 'live' or semi-live data into your Hadoop cluster from your Oracle and MySQL databases. There are plenty of solutions available, from manually dumping and loading to using a tool like Sqoop, each with its good and bad sides. Neither is easy, and both are prone to the problems of lag between the moment you perform the dump and
Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop

An article about moving data into Hadoop in real-time has just been published over at DBTA, written by me and my CEO Robert Hodges.

In the article I talk about one of the major issues for everyone deploying databases in the modern heterogeneous world: how do we move and migrate data effectively between entirely different database systems in a way that is efficient and usable? How do you get the data you need into the database you need it in? If your source is a transactional database, how does that data get moved into Hadoop in a way that makes it usable for queries in Hive, Impala or HBase?

You can read the full article here: Real-Time Data Movement: The Key to

  [Read more...]
Archival and Analytics - Importing MySQL data into Hadoop Cluster using Sqoop
May 16, 2014 By Severalnines

We won’t bore you with buzzwords like volume, velocity and variety. This post is for MySQL users who want to get their hands dirty with Hadoop, so roll up your sleeves and prepare for work. Why would you ever want to move MySQL data into Hadoop? One good reason is archival and analytics. You might not want to delete old data, but rather move it into Hadoop and make it available for further analysis at a later stage. 

 

In this post, we are going to deploy a Hadoop Cluster and export data in bulk from a Galera Cluster using Apache Sqoop. Sqoop is a well-proven approach for bulk data loading from a relational

  [Read more...]
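For readers who want a feel for what a Sqoop bulk import looks like before reading the full post, the Python sketch below assembles the argument list for a Sqoop 1 `import` of one MySQL table into an HDFS directory. The host, database, table, user and HDFS path are hypothetical placeholders, the function name is made up for illustration, and the flags shown (`--connect`, `--username`, `-P`, `--table`, `--target-dir`, `--num-mappers`, `--split-by`) should be checked against the Sqoop version you actually run.

```python
import shlex
from typing import List, Optional


def build_sqoop_import(host: str, database: str, table: str, user: str,
                       target_dir: str, mappers: int = 4,
                       split_by: Optional[str] = None) -> List[str]:
    """Return an argv list for a Sqoop 1 bulk import of one MySQL table into HDFS."""
    if mappers < 1:
        raise ValueError("mappers must be >= 1")
    argv = [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}/{database}",  # JDBC URL of the source MySQL/Galera node
        "--username", user,
        "-P",                            # prompt for the password instead of passing it on the command line
        "--table", table,
        "--target-dir", target_dir,      # HDFS directory that receives the imported files
        "--num-mappers", str(mappers),   # number of parallel map tasks doing the copy
    ]
    if split_by:
        # Column used to partition rows across mappers; needed when the table
        # has no primary key and more than one mapper is used.
        argv += ["--split-by", split_by]
    return argv


if __name__ == "__main__":
    cmd = build_sqoop_import("galera1.example.com", "shop", "orders", "sqoop_user",
                             "/user/hadoop/archive/orders", mappers=4, split_by="id")
    print(shlex.join(cmd))
    # To actually run it (requires Sqoop and the MySQL JDBC driver to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later handed to `subprocess.run`.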
Cross your Fingers for Tech14, see you at OSCON

So I’ve submitted my talks for the Tech14 UK Oracle User Group conference, which is in Liverpool this year. I’m not going to give away the topics, but you can imagine they are going to be about data translation and movement and how to get your various databases talking together.

I can also say, after having seen other submissions for talks this year (as I’m helping to judge), that the conference is shaping up to be very interesting. There’s a good spread of different topics this year, but I know from having talked to the organisers that they are looking for more submissions in the areas of Operating Systems, Engineered Systems and

  [Read more...]
Continuent Delivers Real-Time Data to Cloudera | Business Wire
SAN JOSE, CA – May 6, 2014 – Continuent, Inc., a leading provider of open source database clustering and replication solutions, today announced that its recently introduced Tungsten Replicator 3.0 solution has been certified by Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™. Continuent Tungsten Replicator 3.0 enables organizations to quickly and easily 
Setup & operate Tungsten webinar series
Don't miss your opportunity to learn about Continuent Tungsten via our free "Setup & Operate" webcast series. These free webcasts include live presentations and interactive Q&A.

Webcast Overviews

Setup & Operate Tungsten Replicator
May 15th, 10:00 am PDT
Tungsten Replicator is an innovative and reliable tool that can solve your most complex replication problems. We will introduce Replicator
See you at ICTexpo Helsinki 2014
ICTexpo Helsinki 2014 offers two days full of innovations, inspiration and information: it is the biggest professional IT show in the Nordics, with a wide range of solutions to help you take your business to the next level. Continuent will be exhibiting in Red Hat Village [booth 5f31], which gathers the most significant enterprise level companies from the Open Source ecosystem in Finland
Using Apache Hadoop and Impala together with MySQL for data analysis

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format) and run reports on top of it. For the examples below I will use the “ontime flight performance” data from my previous post (Increasing MySQL performance with parallel query execution). I’ve used the

  [Read more...]
"Minute-to-win-it" Blue Studio by Beats - Heterogeneous Replication Survey
Continuent would like to better understand the relationships and data flows that exist between the different database systems you are using, so that we can better understand your replication and data integration needs. In particular, we'd like to know about any heterogeneous data exchanges, including manual dump/load and automated processes, and whether non-database sources, such as Twitter and Facebook,
Reflections on return to MySQL Community and Ecosystem
After a four-year hiatus, my participation in last week’s Percona Live MySQL Users Conference marked my official return to the MySQL Community and Ecosystem. As with earlier editions, this year’s “UC” was very well attended, with a healthy mix of familiar faces and new blood, all coming together to discuss, present and explore the boundaries of the most popular and widely used open source database on the planet. There were many good, informative keynotes, technical sessions and BoFs, and the exhibit hall was packed for most of its opening hours with people interested in what the MySQL ecosystem is up to. I also found it very refreshing that Oracle was among the most active in presenting useful technical content around their current and future MySQL open source product releases. All in [Read more...]
PerconaLive Keynote: Getting Serious about MySQL and Hadoop at Continuent
Lean, mean MySQL and hulking Hadoop clusters may seem like an odd couple, but tying them together is now priority #1 for many MySQL users. This keynote talk on the first day of this year's Percona Live MySQL Conference & Expo 2014 explores the data management trends spurring integration, how the MySQL community is stepping up, and where the integration may go in the future. Robert Hodges, CEO at
Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest ways to get started with Hadoop.

What I’m even more pleased about is that we can now announce that Tungsten Replicator 3.0 is certified for use on the new Cloudera Enterprise 5 platform. That means we’re sure that you can replicate your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties on the Hadoop

  [Read more...]
Continuent Replication to Hadoop – Now in Stereo!

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, Hortonworks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, I’m pleased to say that it’s working there too. I’ve had to adapt Robert’s original script to work with the different versions of the underlying Hadoop tools and systems to make it compatible. The actual performance and process are unchanged; you just use a different JS-based batchloader script to work with different tools.

Robert has also been simplifying some of the core functionality, such as configuring some fixed pre-determined

  [Read more...]
Webinar-on-Demand: Real-Time Data Loading from MySQL to Hadoop
Hadoop is an increasingly popular means of analyzing transaction data from a single MySQL server or multiple MySQL servers. Up until now, mechanisms for moving data between MySQL and Hadoop have been rather limited. The new Continuent Tungsten Replicator 3.0 provides enterprise-quality replication from MySQL to Hadoop. Tungsten Replicator 3.0 is 100% open source, released under a GPL V2 license, and
Don't miss these Tungsten talks at Percona Live MySQL Conference & Expo
Keynotes and Sessions:

  • Keynote: Getting Serious about MySQL and Hadoop at Continuent
    Robert Hodges (CEO, Continuent)
  • Hadoop for MySQL People
    Chris Schneider (Database Architect, Groupon.com)
  • From Dolphins to Elephants: Real-Time MySQL to Hadoop Replication
    MC Brown (Director of Documentation, Continuent), Linas Virbalas (Senior Software Engineer, Continuent)
  • Virtually Available MySQL, or How to
We're hiring!
Continuent, a leading provider of database clustering and replication software, has five (5) new positions open:

  • Build/Test Engineer
  • Senior Database Availability and Clustering Engineer
  • Senior Database Replication Engineer
  • Data Replication Sales Engineer
  • Clustering and Replication Test Development Engineer

If you want to get in on the ground floor of a growing company in a challenging field
Real-Time Data Loading from MySQL to Hadoop using Tungsten Replicator 3.0 Webinar

To follow up and describe some of the methods and techniques behind replicating into Hadoop from MySQL in real-time, and how this can be combined into your data workflow, Continuent are running a webinar, presented by me, that will go over the details and provide a demo of the data replication process.

Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0

Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Up until now mechanisms for moving data between MySQL and Hadoop have been rather limited. The new Continuent Tungsten Replicator 3.0 provides enterprise-quality replication from MySQL to Hadoop. Tungsten Replicator 3.0 is 100% open source, released under a GPL V2 license, and available for download at

  [Read more...]
MySQL to Hadoop Step-By-Step

We had a great webinar on Thursday about replicating from MySQL to Hadoop (watch the whole thing). It went well, but one of the questions at the end was ‘is there an easy way to test?’

Sadly, we can’t give out convenient ready-to-run downloads of these things because of licensing and other complexities, so I want to make it as simple and straightforward as possible by giving you the directions to complete it yourself. I’m going to point to the Continuent Documentation every now and then so that this post is not too crowded, but we should get through it pretty easily.

Major Decisions

For this to work: 

  • We’ll set up two VMs, one the master
  [Read more...]
Real-time data loading from MySQL to Hadoop
Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Up until now mechanisms for moving data between MySQL and Hadoop have been rather limited. Continuent Tungsten Replicator provides enterprise-quality replication from MySQL to Hadoop under a GPL V2 license.  Continuent Tungsten handles MySQL transaction types including INSERT/UPDATE/DELETE operations and can
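Carrying INSERT, UPDATE and DELETE operations into an append-only store such as HDFS means the current table contents have to be rebuilt from a log of changes. One generic way to do that is to keep, for each primary key, only the change with the highest sequence number, and to drop keys whose latest change is a delete. The Python sketch below assumes each change record carries a sequence number, an opcode (`I`, `U` or `D`), a key and a row image; these field names are illustrative assumptions, not Tungsten Replicator's actual staging format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Change:
    seqno: int            # position in the source transaction log (assumed to increase monotonically)
    opcode: str           # "I" insert, "U" update, "D" delete
    key: int              # primary key of the affected row
    row: Optional[dict]   # full row image for I/U, None for D


def materialize(changes: Iterable[Change]) -> Dict[int, dict]:
    """Rebuild current table contents from an append-only change log in any order."""
    latest: Dict[int, Change] = {}
    for ch in changes:
        if ch.opcode not in ("I", "U", "D"):
            raise ValueError(f"unknown opcode {ch.opcode!r}")
        prev = latest.get(ch.key)
        if prev is None or ch.seqno > prev.seqno:
            latest[ch.key] = ch  # keep only the newest change per key
    # A key whose newest change is a delete no longer exists in the source table.
    return {k: dict(ch.row) for k, ch in latest.items() if ch.opcode != "D"}


if __name__ == "__main__":
    log = [
        Change(3, "U", 10, {"id": 10, "qty": 5}),
        Change(1, "I", 10, {"id": 10, "qty": 1}),
        Change(2, "I", 11, {"id": 11, "qty": 2}),
        Change(4, "D", 11, None),
    ]
    print(materialize(log))
```

Because the highest sequence number wins regardless of arrival order, the log does not need to be sorted first; on a Hadoop cluster the same keep-the-latest rule could be expressed as a Hive query over the staged change data.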
MongoDB and Hadoop - Stockholm MongoDB User Group Meetup - Monday, March 3, 2014
February 27, 2014 By Severalnines

 

Stockholm MongoDB User Group Meetup: “MongoDB and Hadoop”

Monday, March 3, 2014 starting @ 5:00 PM

 

Join us next Monday as we host the Stockholm MongoDB User Group Meetup in Kista, also known as the Wireless Valley.

 

Our very own Vinay Joosery will be speaking about how best to automate the management & deployment of database clusters, specifically MongoDB clusters, though the same principles apply to MySQL, MariaDB and Percona XtraDB based clusters. Henrik Ingo of MongoDB will be talking about Analytics with MongoDB & Hadoop. And Jim Dowling, a Senior Researcher at the Swedish Institute of Computer Science, will talk

  [Read more...]
No Hadoop Fun for Me at SCaLE 12X :(
I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X. Unfortunately, I had to cancel. A few days after writing the article I came down with an eye problem that has since been treated but prevents me from flying anywhere for a few weeks. That's a pity, as I was definitely looking forward to attending the conference and explaining how Tungsten replicates transactions from MySQL into HDFS.

Meanwhile, we are still moving at full steam with Hadoop-related work at Continuent, which is the basis for the next major replication release, Tungsten Replicator 3.0.0. Binary builds and documentation will go up in a few days. There will also be many more public talks about Hadoop support, starting in

  [Read more...]
Why Aren't All Data Immutable?
Over the last few years there has been increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values overwrite old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing it. In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.
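To make the contrast concrete, here is a minimal Python sketch of an append-only key-value store: every write appends a new version tagged with a transaction number instead of overwriting the previous value, so both the current value and older values stay readable. The class and method names are hypothetical and are not taken from any particular DBMS.

```python
from bisect import bisect_right
from typing import Any, Dict, List, Optional, Tuple


class AppendOnlyStore:
    """Key-value store that never overwrites: each put appends a new version."""

    def __init__(self) -> None:
        # key -> list of (transaction number, value), oldest first
        self._versions: Dict[str, List[Tuple[int, Any]]] = {}
        self._txn = 0  # monotonically increasing transaction counter

    def put(self, key: str, value: Any) -> int:
        self._txn += 1
        self._versions.setdefault(key, []).append((self._txn, value))
        return self._txn

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value, or the value visible at transaction `as_of`."""
        history = self._versions.get(key, [])
        if as_of is None:
            return history[-1][1] if history else None
        txns = [t for t, _ in history]     # already sorted, since puts only append
        idx = bisect_right(txns, as_of)    # number of versions with txn <= as_of
        return history[idx - 1][1] if idx else None

    def history(self, key: str) -> List[Tuple[int, Any]]:
        return list(self._versions.get(key, []))


if __name__ == "__main__":
    store = AppendOnlyStore()
    t1 = store.put("balance", 100)
    store.put("balance", 80)
    print(store.get("balance"), store.get("balance", as_of=t1))
```

An update-in-place store would be a plain `dict` whose `put` overwrites the entry, so the earlier value of 100 would be gone; the price of keeping it here is that storage grows with every write and reads have to pick the right version.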

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees.  This is not a very strong argument

  [Read more...]
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would now call a NoSQL database), into a more structured Oracle database.

Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data so that it can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html


  [Read more...]

Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.