Earlier this month I blogged about our new Hadoop applier, and this week I published the docs for it (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts.
How do you get data from a MySQL Cluster into Hadoop? Easy: replicate from the cluster to a standalone MySQL instance, and from there use the MySQL Hadoop Applier to push the data into HDFS.
This question came from a long-time MySQL user who has jumped into the Big Data world.
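A minimal sketch of that chain, with placeholder host names and credentials (none of these values come from the original post):

```bash
# 1. Make the standalone MySQL server a replica of one of the cluster's SQL
#    nodes (add MASTER_LOG_FILE/MASTER_LOG_POS for the real starting position).
mysql -h standalone-host -u root -p <<'SQL'
CHANGE MASTER TO
    MASTER_HOST='ndb-sql-node-1',
    MASTER_USER='repl',
    MASTER_PASSWORD='secret';
START SLAVE;
SQL

# 2. The MySQL Hadoop Applier reads this server's binary log and streams the
#    changes into HDFS, so enable log-bin (ROW format) and log-slave-updates
#    on the standalone instance before pointing the applier at it.
```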
Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would now call NoSQL), into a more structured Oracle database.
Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data platforms so that it can be analysed, something I covered in more detail here: [Read more...]
In Part 1 we started our study of Amazon Services and looked at Amazon EC2. In this part, we will look at other Amazon services like EMR, DynamoDB and RDS.
Amazon EMR (Elastic MapReduce) is a web service that makes it easy to run big data workloads in the cloud. An EMR cluster comes preconfigured with Hadoop, which, as mentioned earlier, is a data processing and storage framework. This preconfiguration means you can start analysing your data in no time. Amazon EMR has applications in machine learning, financial analysis, bioinformatics, and more.
Just like EC2, you can launch as many EMR instances as you need, and you will only be charged for the computing power you have used. [Read more...]
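As an illustration, launching and tearing down a small cluster with the current AWS CLI looks roughly like this; the cluster name, instance type and count, and release label are placeholders, and older EMR releases used --ami-version instead of --release-label:

```bash
# Spin up a three-node EMR cluster preconfigured with Hadoop and Hive.
aws emr create-cluster \
    --name "demo-cluster" \
    --release-label emr-4.0.0 \
    --applications Name=Hadoop Name=Hive \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --use-default-roles

# Terminate the cluster once the analysis is done so the billing stops.
aws emr terminate-clusters --cluster-ids j-XXXXXXXXXXXX
```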
The third and final article in my series on migrating data to and from Hadoop and SQL databases is now available:
Big data is a term that has been used regularly now for almost a decade, and it, along with technologies like NoSQL, is seen as a replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, and then process or parse it. But if you want to analyze big data,
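Apache Sqoop is a typical example of the kind of automated transfer tool meant here; a rough sketch of moving a table in each direction, with made-up connection details and paths:

```bash
# Pull a MySQL table into HDFS using parallel map tasks.
sqoop import \
    --connect jdbc:mysql://dbhost/sales \
    --username analyst -P \
    --table orders \
    --target-dir /user/demo/orders \
    --num-mappers 4

# Push processed results back out of HDFS into a MySQL table.
sqoop export \
    --connect jdbc:mysql://dbhost/sales \
    --username analyst -P \
    --table order_summaries \
    --export-dir /user/demo/summaries
```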
The Economist says that physics suggests storms will get worse as the planet warms. Typhoon Haiyan in the Philippines, bushfires in Australia, floods in China, and extreme, unpredictable weather across the planet are a sober reminder. The good news is that technology and awareness are rising, and so is the data. Database technologies are playing their part, intelligently storing that data and enabling stakeholders to analyze it and draw meaningful results to predict and counter extreme conditions. This Log Buffer Edition appreciates these efforts.
Big Data Tools that You Need to Know About – Hadoop & NoSQL.[Read more...]
I have spent the better part of the last month at Big Data conferences, trying to peer past the $2.5 million in marketing smoke and see what is really going to show up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data TechCon will probably be gone by this time next year, so picking a vendor right now is a little iffy. Hadoop's ecosystem is flourishing and will surely be around for some time, but the vendors are playing musical chairs.
But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Hortonworks have done you a big favor by providing wonderful tutorials that are worth your time. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together[Read more...]
In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.
MapReduce is really the key to understanding Hadoop's parallel processing: it enables data in various formats (XML, text, binary, log, SQL, etc.) to be split up, mapped out to many compute nodes, and then recombined to produce a final data set.
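As a simplified illustration, here is a word count expressed as a Hadoop Streaming job built from plain Unix tools; the input and output paths and the streaming jar location are placeholders and vary between distributions:

```bash
# The mapper splits each line into one word per line; the shuffle/sort phase
# groups identical words together across nodes; the reducer counts each run.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input  /user/demo/logs \
    -output /user/demo/wordcount \
    -mapper 'tr -s " " "\n"' \
    -reducer 'uniq -c'
```

The same map, shuffle, and reduce phases apply whether the mapper and reducer are shell one-liners, Java classes, or scripts in another language.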
In Part 3 of this series we found out that Big Data is a huge revenue generator for businesses, expected to drive $232 billion in spending through 2016. In this installment we'll continue to explore why Big Data must become a critical part of any business strategy in the 21st century.
First, the Bad News
Okay, so we get the notion of what Big Data is and why it's important for business. So what can be done about it? The first step is to recognize your business strategy and needs, as well as the limitations of your current infrastructure. Traditional organizational data warehouses are based on structured, well-organized data sets. Think Oracle,[Read more...]
You love MySQL for its ease of deployment – but are you worried about how your application will perform when it starts to scale?
SPEAKER: Gerry Narvaja, Tokutek
DATE: Wednesday, November 6th
TIME: 1pm ET
Join this interactive webinar with Gerry Narvaja of Tokutek as he walks through the potential pitfalls of using MySQL for Big Data applications, explains how you can avoid unnecessary tolls on time and resources, and shares tips on getting the most out of your MySQL applications with open source TokuDB.
Attend this webinar to learn how to:
The second article in a series covering Big Data and SQL interaction is available now:
“Big data” is a term that has been used regularly now for almost a decade, and it, along with technologies like NoSQL, is seen as a replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. Here in Part 2, we will concentrate on how to use HBase and Hive for exchanging data with your SQL data stores. From the outside the two systems seem largely similar, but they have very different goals and aims. Let's start by looking at how the two systems differ and how we can take advantage of that in our big data requirements.[Read more...]
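To give a flavor of the Hive side of that exchange, here is a minimal sketch that exposes a CSV export from a relational database as a queryable table; the table name, columns, and HDFS path are invented for illustration:

```bash
# Define an external Hive table over files already sitting in HDFS, then run
# an ordinary SQL-style aggregate over them.
hive -e "
CREATE EXTERNAL TABLE customers (
    id INT,
    name STRING,
    region STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/demo/customers';

SELECT region, COUNT(*) FROM customers GROUP BY region;
"
```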
Ask ten DBAs for a definition of ‘Big Data’ and you will get more than ten replies. And the majority of those replies will lead you to Hadoop. Hadoop has been the most prominent of the big data frameworks in the open source world. Over 80% of the Hadoop instances in the world are fed their data from MySQL¹. But Hadoop is made up of many parts, some confusing and many that do not play nicely with each other. It is analogous to being given a pile of automotive parts from different models and trying to come up with a car at the end of the day. So what do you do if you want to copy some of your relational data into Hadoop and want to avoid the equivalent of scraped knuckles? The answer is Bigtop, and what follows is a way to get a one-node, does-it-all system running so you can experiment with Hadoop, Map/Reduce, Hive, and all [Read more...]
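For what it is worth, a single-node Bigtop setup on a Debian or Ubuntu box looks roughly like the following; the repository configuration is omitted and package and service names can differ between Bigtop releases, so treat it as a sketch rather than a recipe:

```bash
# Install the pseudo-distributed Hadoop configuration plus Hive from the
# Bigtop package repository.
sudo apt-get install hadoop-conf-pseudo hive

# Format HDFS once, then start the daemons for a "cluster of one".
sudo -u hdfs hdfs namenode -format
for svc in hadoop-hdfs-namenode hadoop-hdfs-datanode \
           hadoop-yarn-resourcemanager hadoop-yarn-nodemanager; do
    sudo service "$svc" start
done
```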
I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution and we can nowadays easily integrate it with MySQL. I will start with a brief introduction of Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show 2 major Big Data scenarios:
Those who deal with big data probably know about Disco, a distributed computing framework aimed at providing a MapReduce platform for big data processing in Python applications. We are proud to say that we are one of the largest users of Disco in the Netherlands. As the owner of multiple high-traffic portals with lots of […]
Read the original article at The Needle in Big Data Noise
Join 5,500 others and follow Sean Hull on Twitter @hullsean. Also take a look at: I hacked Disqus Digests to discover new blogs. Who the heck is Bayes? Thomas Bayes was a scientist & thinker, Fellow of the Royal Society, and back in 1763 author of “An Essay toward Solving a Problem in the Doctrine [...]
For more articles like these, go to Sean Hull's Scalable Startups.
With this version, the source code is now freely available under the GPL v2 license. For more details, see our blog here. Open source pioneer Mozilla has been using TokuDB to manage its MySQL-driven Datazilla data cluster, an open-source system for managing and visualizing performance data.
Date: May 2nd
Time: 2 PM EST / 11 AM PST
In the past TokuDB has been free for evaluation; the new TokuDB Community Edition extends free use to deployed environments. With this release Tokutek is also planning to make available a TokuDB Enterprise Edition, which includes technical support,[Read more...]
We want to thank everyone for naming Tokutek the Corporate Contributor of the Year 2013 for our ongoing contributions to the MySQL community.
The MySQL Community Awards are given annually to the people and companies that support the MySQL ecosystem. The MySQL Community Award for Corporate Contributor of the Year recognizes a company or other organization or entity that has made valuable contributions to the MySQL ecosystem either in terms of open source code, knowledge,[Read more...]