Showing entries 51 to 60 of 164
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: hadoop
Real-time data loading from MySQL to Hadoop

Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Until now, mechanisms for moving data between MySQL and Hadoop have been rather limited. Continuent Tungsten Replicator provides enterprise-quality replication from MySQL to Hadoop under a GPL V2 license. Continuent Tungsten handles MySQL transaction types including INSERT/UPDATE/DELETE operations and can

MongoDB and Hadoop - Stockholm MongoDB User Group Meetup - Monday, March 3, 2014

February 27, 2014 By Severalnines

Stockholm MongoDB User Group Meetup: “MongoDB and Hadoop” Monday, March 3, 2014 starting @ 5:00 PM

Join us next Monday as we host the Stockholm MongoDB User Group Meetup in Kista, also known as the Wireless Valley.

Our very own Vinay Joosery will be speaking about how best to automate the management & deployment of database clusters, specifically MongoDB clusters, though the same principles apply to MySQL, MariaDB and Percona XtraDB based clusters. Henrik Ingo of MongoDB will be talking about Analytics with MongoDB & Hadoop. And Jim Dowling, a Senior Researcher at the Swedish Institute of Computer Science, will talk about a Hadoop PaaS platform.

So whether you’re from the MySQL or NoSQL world, there’ll be plenty of good content here to walk away with in addition to …

[Read more]
No Hadoop Fun for Me at SCaLE 12X :(

I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X. Unfortunately I had to cancel. A few days after writing the article I came down with an eye problem that is fixed but prevents me from flying anywhere for a few weeks. That's a pity as I was definitely looking forward to attending the conference and explaining how Tungsten replicates transactions from MySQL into HDFS.

Meanwhile, we are still moving at full steam with Hadoop-related work at Continuent, which is the basis for the next major replication release, Tungsten Replicator 3.0.0. Binary builds and documentation will go up in a few days. There will also be many more public talks about Hadoop support, starting in April at …

[Read more]
Why Aren't All Data Immutable?

Over the last few years there has been an increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values delete old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing them.  In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.
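To make the contrast concrete, here is a minimal sketch in Python. It is not taken from the article or from any particular DBMS: the names `balances`, `log`, `put`, and `get` are hypothetical, and a real append-only store would add indexing, durability, and compaction.

```python
# Update-in-place: the new value replaces the old one, which is then lost.
balances = {"alice": 100}
balances["alice"] = 80  # the value 100 can no longer be recovered

# Append-only: every transaction appends a new version and nothing is replaced.
log = []  # list of (txn_id, key, value) tuples, in commit order


def put(txn_id, key, value):
    """Record a new version of key without touching earlier versions."""
    log.append((txn_id, key, value))


def get(key, as_of=None):
    """Return the latest value of key, or its value as of transaction as_of."""
    result = None
    for txn_id, k, value in log:
        if k == key and (as_of is None or txn_id <= as_of):
            result = value
    return result


put(1, "alice", 100)
put(2, "alice", 80)

current = get("alice")            # newest version
original = get("alice", as_of=1)  # older version is still readable
```

The linear scan in `get` also hints at the cost of the approach: reads have to find the right version among everything that was ever written, which is the kind of overhead a real system has to engineer around.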

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees.  This is not a very strong argument against immutable data per se. …

[Read more]
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would now call a NoSQL database), into a more structured Oracle database.

Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data so that it can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html

[Read more]
Fun with MySQL and Hadoop at SCaLE 12X

It's my pleasure to be presenting at SCaLE 12X on the subject of real-time data loading from MySQL to Hadoop.  This is the first public talk on work at Continuent that enables Tungsten Replicator to move transactions from MySQL to HDFS (Hadoop Distributed File System).  I will explain how replication to Hadoop works, how to set it up, and offer a few words on constructing views of MySQL data using tools like Hive.

As usual with replication, everything we are doing on Hadoop replication is open source. Builds and documentation will be publicly available …

[Read more]
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics

Oracle is the most powerful DBMS in the world. However, Oracle's expensive and complex replication makes it difficult to build highly available applications or move data in real-time to data warehouses and popular databases like MySQL. In this webinar-on-demand you will learn how Continuent Tungsten solves problems with Oracle replication at a fraction of the cost of other solutions and with less

SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange

The third and final article in my series on migrating data to and from Hadoop and SQL databases is now available:

Big data is a term that has been used regularly now for almost a decade, and it, along with technologies like NoSQL, is seen as the replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data, you probably don’t want to wait while exporting the data. Here, we’re going to look at some methods and tools that enable a …

[Read more]
Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 2

I have spent the better part of the last month at Big Data conferences trying to look behind the $2.5 million in marketing smoke to see what is really going to be showing up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data TechCon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time, but the vendors are playing musical chairs.

But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Hortonworks have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together a Udacity course that concisely teaches Hadoop technology, and Cloudera deserves a round of applause for this …

[Read more]
Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2

 

In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

MapReduce is really the key to understanding Hadoop’s parallel processing functionality, as it enables data in various formats (XML, text, binary, log, SQL, etc.) to be divided up and mapped out to many computer nodes and then recombined to produce a final data set.
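The divide, map, and recombine flow can be sketched in a few lines of plain Python. This is only an illustration of the pattern, run in a single process: it does not use Hadoop's API, and the function names `map_phase`, `shuffle`, and `reduce_phase` as well as the sample input are assumptions made for the example. In a real cluster each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Shuffle step: group the values from all chunks by key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}


# The input is divided into chunks; here each list item stands for one node's share.
chunks = ["hadoop stores data", "hadoop processes data in parallel"]

mapped = [map_phase(chunk) for chunk in chunks]
counts = reduce_phase(shuffle(mapped))
```

Because each call to `map_phase` depends only on its own chunk, the map step can run on many machines at once; only the shuffle and reduce steps need to bring the intermediate results back together.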

 

 

[Read more]