Planet MySQL

Displaying posts with tag: big data (reset)

Feb

2014

Posted by Dave Stokes on Tue 25 Feb 2014 19:30 UTC
Tags:

Replication, MySQL Cluster, big data

How do you get data from a MySQL Cluster into Hadoop? Easy, replicate from the cluster to a stand alone MySQL instance and from there use the MySQL Hadoop Applier to HDFS.

This question came from a long time MySQL user who has jumped into the Big Data world.

Feb

2014

No Hadoop Fun for Me at SCaLE 12X :(

Posted by Robert Hodges on Thu 20 Feb 2014 20:25 UTC
Tags:

Replication, hadoop, tungsten, big data, mariadb, MySQL

I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X. Unfortunately I had to cancel. A few days after writing the article I came down with an eye problem that is fixed but prevents me from flying anywhere for a few weeks. That's a pity as I was definitely looking forward to attending the conference and explaining how Tungsten replicates transactions from MySQL into HDFS.

Meanwhile, we are still moving at full steam with Hadoop-related work at Continuent, which is the basis for the next major replication release, Tungsten Replicator 3.0.0. Binary builds and documentation will go up in a few days. There will also be many more public talks about Hadoop support, starting in April at …

[Read more]

Feb

2014

Why Aren't All Data Immutable?

Posted by Robert Hodges on Tue 18 Feb 2014 04:00 UTC
Tags:

postgresql, Oracle, IT Industry, hadoop, SaaS, big data, vertica, MySQL

Over the last few years there has been an increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values delete old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing them. In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees. This is not a very strong argument against immutable data per se. …

[Read more]

Feb

2014

Getting Data into Hadoop in real-time

Posted by MC Brown on Wed 12 Feb 2014 13:21 UTC
Tags:

Databases, hadoop, data mining, continuent, big data, MySQL, Coalface

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL) into a more structured Oracle.

Today I spend some of my time working in Big Data, more often than not, migrating information from existing data stores into Big Data so that they can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html
…

[Read more]

Feb

2014

Fun with MySQL and Hadoop at SCaLE 12X

Posted by Robert Hodges on Sat 08 Feb 2014 01:07 UTC
Tags:

Open Source, Replication, hadoop, tungsten, big data, mariadb, MySQL

It's my pleasure to be presenting at SCaLE 12X on the subject of real-time data loading from MySQL to Hadoop. This is the first public talk on work at Continuent that enables Tungsten Replicator to move transactions from MySQL to HDFS (Hadoop Distributed File System). I will explain how replication to Hadoop works, how to set it up, and offer a few words on constructing views of MySQL data using tools like Hive.

As usual with replication everything we are doing on Hadoop replication is open source. Builds and documentation will be publicly available …

[Read more]

Jan

2014

Amazon’s Big Data Suite – Part 2

Posted by Hovhannes Avoyan on Mon 06 Jan 2014 10:53 UTC
Tags:

News, big data, Amazon Web Services, Industry Info, what is big data, amazon aws

In Part 1 we started our study of Amazon Services and looked at Amazon EC2. In this part, we will look at other Amazon services like EMR, DynamoDB and RDS.

1. Amazon Elastic Map Reduce

Amazon EMR is a web service which makes cloud computing very easy. Amazon’s EMR cluster comes preconfigured with Hadoop, which as mentioned earlier is a data processing and storage framework. This preconfiguration makes it very easy to start analysing your data in no time. Amazon EMR has applications in machine learning, financial analysis, bioinformatics etc.

Just like EC2, you can launch any number of EMR instances as you need and you will only be charged for the computing power you have used. EMR is preconfigured …

[Read more]

Dec

2013

SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange

Posted by MC Brown on Mon 23 Dec 2013 10:32 UTC
Tags:

Articles, hadoop, big data, MySQL, ibmdeveloperworks

The third, and final article in my series on migrating data to and from Hadoop and SQL databases is now available:

Big data is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data, you probably don’t want to wait while exporting the data. Here, we’re going to look at some methods and tools that enable a …

[Read more]

Nov

2013

Log Buffer #346, A Carnival of the Vanities for DBAs

Posted by The Pythian Group on Fri 15 Nov 2013 14:39 UTC
Tags:

Oracle, Log Buffer, SQL Server, big data, NoSQL, MySQL

Economist says that Physics suggest that storms will get worse as the planet warms. Typhoon Haiyan in Philippines, bush-fires in Australia, floods in China, and extreme unpredictable weather across the planet is a sober reminder. Good news is that technology and awareness is rising, and so is the data. Database technologies are playing their part to intelligently store that data and enabling the stakeholders to analyze and get meaningful results to predict and counter the extreme conditions. This Log Buffer Edition appreciates these efforts.

Big Data:

Big Data Tools that You Need to Know About – Hadoop & NoSQL.

Dave Stokes is …

[Read more]

Nov

2013

Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 2

Posted by Dave Stokes on Thu 14 Nov 2013 19:09 UTC
Tags:

hadoop, big data, MySQL

I have spent the better part of the last month at Big Data conferences trying to see behind the $2.5 million in marketing smoke to see what is really going to be showing up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data Techon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time but the vendors are playing musical chairs.

But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Horton Works have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together Udacity that concisely teaches Hadoop technology and Cloudera deserves a round of applause for this …

[Read more]

Nov

2013

Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2

Posted by Hovhannes Avoyan on Wed 13 Nov 2013 11:08 UTC
Tags:

News, hadoop, big data, NoSQL, Apache Hadoop, Industry Info

In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

MapReduce is really the key to understanding Hadoop’s parallel processing functionality as it enables data in various formats (XML, text, binary, log, SQL, ect) to be divided up and mapped out to many computers nodes and then recombined back to produce a final data set.

…

[Read more]

Top Authors

Oracle MySQL Blogs

Vendor Blogs

MySQL Links