Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 30 of 168 Next 30 Older Entries

Displaying posts with tag: big data (reset)

Hadoop BoF Session at OSCON
+0 Vote Up -0Vote Down

I have a BoF session next week at OSCON next week:

Migrating Data from MySQL and Oracle into Hadoop

The session is at 7pm Tuesday night – look for rooms D135 and/or D137/138.

Correction: We are now in  E144 on Tuesday with the Hadoop get together first at 7pm, and the Data Migration to follow at 8pm.

I’m actually going to be joined by Gwen Shapira from Cloudera, who has a BoF session on Hadoop next door at the same time, along with Eric Herman from Booking.com. We’ll use the opportunity to talk all things Hadoop, but particularly the ingestion of data from MySQL and other databases into the Hadoop datastore.

As always, it’d be great to meet anybody interested in Hadoop at the BoF, please come along and

  [Read more...]
Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute
+0 Vote Up -0Vote Down

My article on how to make the real-time processing of information from traditional transactional stores into Hadoop a reality has been published over at TDWI:

Making Real-Time Analytics a Reality — TDWI -The Data Warehousing Institute.


Big Data Integration & ETL - Moving Live Clickstream Data from MongoDB to Hadoop for Analytics
+1 Vote Up -0Vote Down
June 16, 2014 By Severalnines

MongoDB is great at storing clickstream data, but using it to analyze millions of documents can be challenging. Hadoop provides a way of processing and analyzing data at large scale. Since it is a parallel system, workloads can be split on multiple nodes and computations on large datasets can be done in relatively short timeframes. MongoDB data can be moved into Hadoop using ETL tools like Talend or Pentaho Data Integration (Kettle).

 

In this blog, we’ll show you how to integrate your MongoDB and Hadoop datastores using Talend. We have a MongoDB database collecting clickstream data from several websites. We’ll create a job in Talend to extract the documents from MongoDB, transform and then

  [Read more...]
Continuent at Hadoop Summit
+1 Vote Up -0Vote Down

I’m pleased to say that Continuent will be at the Hadoop Summit in San Jose next week (3-5 June). Sadly I will not be attending as I’m taking an exam next week, but my colleagues Robert Hodges, Eero Teerikorpi and Petri Versunen will be there to answer any questions you have about Continuent products, and, of course, Hadoop replication support built into Tungsten Replicator 3.0.

If you are at the conference, please go along and say hi to the team. And, as always, if there are any questions please let them or me know.


Webinar-on-demand: Set up & operate real-time data loading into Hadoop
+1 Vote Up -0Vote Down
Getting data into Hadoop is not difficult, but it is complex if you want to load 'live' or semi-live data into your Hadoop cluster from your Oracle and MySQL databases. There are plenty of solutions available, from manually dumping and loading to the good and bad sides of using a tool like Sqoop. Neither are easy and both prone to the problems of lag between the moment you perform the dump and
Real-Time Data Movement: The Key to Enabling Live Analytics With Hadoop
+0 Vote Up -0Vote Down

An article about moving data into Hadoop in real-time has just been published over at DBTA, written by me and my CEO Robert Hodges.

In the article I talk about one of the major issues for all people deploying databases in the modern heterogenous world – how do we move and migrate data effectively between entirely different database systems in a way that is efficient and usable. How do you get the data you need to the database you need it in. If your source is a transactional database, how does that data get moved into Hadoop in a way that makes the data usable to be queried by Hive, Impala or HBase?

You can read the full article here: Real-Time Data Movement: The Key to

  [Read more...]
Cross your Fingers for Tech14, see you at OSCON
+0 Vote Up -0Vote Down

So I’ve submitted my talks for the Tech14 UK Oracle User Group conference which is in Liverpool this year. I’m not going to give away the topics, but you can imagine they are going to be about data translation and movement and how to get your various databases talking together.

I can also say, after having seen other submissions for talks this year (as I’m helping to judge), that the conference is shaping up to be very interesting. There’s a good spread of different topics this year, but I know from having talked to the organisers that they are looking for more submissions in the areas of Operating Systems, Engineered Systems and

  [Read more...]
Thoughts on Small Datum – Part 3
+0 Vote Up -0Vote Down

Background: If you did not read my first blog post about why I am sharing my thoughts on the benchmarks published by Mark Callaghan on Small Datum you may want to skim through it now for a little context: Thoughts on Small Datum – Part 1”

~~~~~~~~~~~~~~~~~~~~~~~~

Last time, in Thoughts on Small Datum – Part 2 I shared my cliff notes and a graph on Mark Callaghan’s (@markcallaghan) March 11th insertion rate benchmarks using flash storage media. In those tests he compares MySQL (http://www.mysql.com/) outfitted with the

  [Read more...]
Maybe You Should Try Taking a Walk in My Shoes
+0 Vote Up -0Vote Down

The title of this post should really be, “Maybe He Should Try Taking a Walk in Your Shoes.”

The he I’m referring to is economist and author, Tim Harford. The you is the people who use NewSQL and NoSQL approaches to mine big data with database platforms like MySQL (http://www.mysql.com" target="_blank) and MongoDB (or, preferably, our high-performance distributions of them, TokuDB and TokuMX).

Why should Mr. Harford take that walk? Well, he recently

  [Read more...]
Thoughts on Small Datum – Part 2
+0 Vote Up -0Vote Down

If you did not read my first blog post about Mark Callaghan’s (@markcallaghan) benchmarks as documented in his blog, Small Datum, you may want to skim through it now for a little context.

——————-

On March 11th, Mark, a former Google and now Facebook database guru, published an insertion rate benchmark comparing MySQL (http://www.mysql.com) outfitted with the InnoDB storage engine with two NoSQL alternatives — basic MongoDB and TokuMX (the Tokutek high-performance

  [Read more...]
Thoughts on Small Datum – Part 1
+0 Vote Up -0Vote Down

A little background…

When I ventured into sales and marketing (I’m an engineer by education) I learned I would often have to interpret and simply summarize the business value that is sometimes hidden in benchmarks. Simply put, the people who approve the purchase of products like TokuDB® and TokuMX™ appreciate the executive summary.

Therefore, I plan to publish a multipart series here on TokuView where I will share my simple summaries and thoughts on business value for the benchmarks Mark Callaghan (@markcallaghan), a former Google and now Facebook database guru, is publishing on his blog, Small Datum.

I’m going to start with his first benchmark post and work my way forward to

  [Read more...]
Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified
+0 Vote Up -0Vote Down

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest way to get started with Hadoop.

What I’m even more pleased about is the fact that we are proud to announce that Tungsten Replicator 3.0 is certified for use on the new Cloudera Enterprise 5 platform. That means that we’re sure that replicating your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties on the Hadoop

  [Read more...]
Continuent Replication to Hadoop – Now in Stereo!
+0 Vote Up -0Vote Down

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, and I’m pleased to say that it’s working there too. I’ve had to adapt Robert’s original script to work with the different versions of the underlying Hadoop tools and systems to make it compatible. The actual performance and process is unchanged; you just use a different JS-based batchloader script to work with different tools.

Robert has also been simplifying some of the core functionality, such as configuring some fixed pre-determined

  [Read more...]
Real-Time Data Loading from MySQL to Hadoop using Tungsten Replicator 3.0 Webinar
+0 Vote Up -0Vote Down

To follow-up and describe some of the methods and techniques behind replicating into Hadoop from MySQL in real-time, and how this can be combined into your data workflow, Continuent are running a webinar with me presenting that will go over the details and provide a demo of the data replication process.

Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0

Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Up until now mechanisms for moving data between MySQL and Hadoop have been rather limited. The new Continuent Tungsten Replicator 3.0 provides enterprise-quality replication from MySQL to Hadoop. Tungsten Replicator 3.0 is 100% open source, released under a GPL V2 license, and available for download at

  [Read more...]
March 20 Webinar: How to Scale MySQL for Big Data Applications
+0 Vote Up -0Vote Down

You may think that you have to buy, install, and get up to speed on a new database if you want to work with large amounts of data, but you can do more than you think with the MySQL you already have.
Register Now!

SPEAKER: Jon Tobin, Tokutek
DATE: Thursday, March 20th
TIME: 1pm ET

Without having to change your application or do special tuning you can increase performance and save significant time and money when you need to scale.

Join Tokutek’s Jon Tobin as he demonstrates how to use MySQL or MariaDB in Big Data applications by simply upgrading the storage engine with TokuDB, and how to effectively evaluate TokuDB for increased performance, compression and agility.

During this webinar you will learn:

  • How to dramatically increase performance without having to rewrite



  [Read more...]
Big Data: Three questions to Aerospike.
+0 Vote Up -0Vote Down

“Many tools now exist to run database software without installing software. From vagrant boxes, to one click cloud install, to a cloud service that doesn’t require any installation, developer ease of use has always been a path to storage platform success.”–Brian Bulkowski.

The fifth interview in the “Big Data: three questions to “ series of interviews, is with Brian Bulkowski, Aerospike co-founder and CTO.

RVZ

Q1. What is your current product offering?

Brian Bulkowski: Aerospike is the first in-memory NoSQL database optimized for flash or solid state drives (SSDs).
In-memory for speed and NoSQL for scale. Our

  [Read more...]
Real-Time Replication from MySQL to Cassandra
+1 Vote Up -0Vote Down

Earlier this month I blogged about our new Hadoop applier, I published the docs for that this week (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts.

The main part of that functionality that performs the actual applier for Hadoop is based around a JavaScript applier engine – there will eventually be docs for that as part of the Batch Applier content (

  [Read more...]
MySQL Cluster to Hadoop
Employee +1 Vote Up -0Vote Down

How do you get data from a MySQL Cluster into Hadoop? Easy, replicate from the cluster to a stand alone MySQL instance and from there use the MySQL Hadoop Applier to HDFS.

This question came from a long time MySQL user who has jumped into the Big Data world.


No Hadoop Fun for Me at SCaLE 12X :(
+0 Vote Up -0Vote Down
I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X. Unfortunately I had to cancel. A few days after writing the article I came down with an eye problem that is fixed but prevents me from flying anywhere for a few weeks. That's a pity as I was definitely looking forward to attending the conference and explaining how Tungsten replicates transactions from MySQL into HDFS.

Meanwhile, we are still moving at full steam with Hadoop-related work at Continuent, which is the basis for the next major replication release, Tungsten Replicator 3.0.0. Binary builds and documentation will go up in a few days. There will also be many more public talks about Hadoop support, starting in

  [Read more...]
Why Aren't All Data Immutable?
+0 Vote Up -0Vote Down
Over the last few years there has been an increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values delete old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing them.  In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees.  This is not a very strong argument

  [Read more...]
Getting Data into Hadoop in real-time
+0 Vote Up -0Vote Down

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL) into a more structured Oracle.

Today I spend some of my time working in Big Data, more often than not, migrating information from existing data stores into Big Data so that they can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html


  [Read more...]
Fun with MySQL and Hadoop at SCaLE 12X
+0 Vote Up -0Vote Down
It's my pleasure to be presenting at SCaLE 12X on the subject of real-time data loading from MySQL to Hadoop.  This is the first public talk on work at Continuent that enables Tungsten Replicator to move transactions from MySQL to HDFS (Hadoop Distributed File System).  I will explain how replication to Hadoop works, how to set it up, and offer a few words on constructing views of MySQL data using tools like Hive.

As usual with replication everything we are doing on Hadoop replication is open source.  Builds and documentation will be

  [Read more...]
Amazon’s Big Data Suite – Part 2
+0 Vote Up -0Vote Down

 

In Part 1 we started our study of Amazon Services and looked at Amazon EC2. In this part, we will look at other Amazon services like EMR, DynamoDB and RDS.

 

  • 1.      Amazon Elastic Map Reduce
  • Amazon EMR is a web service which makes cloud computing very easy. Amazon’s EMR cluster comes preconfigured with Hadoop, which as mentioned earlier is a data processing and storage framework. This preconfiguration makes it very easy to start analysing your data in no time. Amazon EMR has applications in machine learning, financial analysis, bioinformatics etc.

     

    Just like EC2, you can launch any number of EMR instances as you need and you will only be charged for the computing power you have used.

      [Read more...]
    SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange
    +1 Vote Up -0Vote Down

    The third, and final article in my series on migrating data to and from Hadoop and SQL databases is now available:

    Big data is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data,

      [Read more...]
    Log Buffer #346, A Carnival of the Vanities for DBAs
    +0 Vote Up -0Vote Down

    Economist says that Physics suggest that storms will get worse as the planet warms. Typhoon Haiyan in Philippines, bush-fires in Australia, floods in China, and extreme unpredictable weather across the planet is a sober reminder. Good news is that technology and awareness is rising, and so is the data. Database technologies are playing their part to intelligently store that data and enabling the stakeholders to analyze and get meaningful results to predict and counter the extreme conditions. This Log Buffer Edition appreciates these efforts.

    Big Data:

    Big Data Tools that You Need to Know About – Hadoop & NoSQL.

      [Read more...]
    Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 2
    Employee +2 Vote Up -0Vote Down

    I have spent the better part of the last month at Big Data conferences trying to see behind the $2.5 million in marketing smoke to see what is really going to be showing up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data Techon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time but the vendors are playing musical chairs.

    But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Horton Works have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together

      [Read more...]
    Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2
    +0 Vote Up -0Vote Down

     

    In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

     

    MapReduce is really the key to understanding Hadoop’s parallel processing functionality as it enables data in various formats (XML, text, binary, log, SQL, ect) to be divided up and mapped out to many computers nodes and then recombined back to produce a final data set.

     

      [Read more...]
    Big Data – What is it and Why is it Important – Part 4
    +0 Vote Up -0Vote Down

     

    In Part 3 of this series we found out that Big Data is a huge revenue generator for business that is expected to drive $232 billion in spending through 2016. In this installment we’ll continue to explore why Big Data must become a critical part of any business strategy in the 21st century.

     

    First, the Bad News

     

    Okay, so we get the notion of what Big Data is and why it’s important for business. So what can be done about it? The first point is to recognize your business strategy and needs as well as the limitations in your current infrastructure. Traditional organizational data warehouses are based on structured, well-organized data sets. Think Oracle,

      [Read more...]
    November 6 Webinar: 5 Pitfalls to Avoid with MySQL and Big Data
    +0 Vote Up -0Vote Down

    You love MySQL for its ease of deployment – but are you worried about how your application will perform when it starts to scale?

    SPEAKER: Gerry Narvaja, Tokutek
    DATE: Wednesday, November 6th
    TIME: 1pm ET
    Register Now!

    Join this interactive webinar with Gerry Narvaja of Tokutek as he walks through the potential pitfalls when using MySQL for Big Data applications, how you can avoid unnecessary tolls on time and resources and tips on how to get the most out of your MySQL applications with open source TokuDB.

    Attend this webinar to learn how to:

    • dramatically increase performance without having to rewrite code
    • reduce the total cost of your servers and



      [Read more...]
    SQL to Hadoop and back again, Part 2: Leveraging HBase and Hive
    +0 Vote Up -0Vote Down

    The second article in a series covering Big Data and SQL interaction is available now:

    “Big data” is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. Here in Part 2, we will concentrate on how to use HBase and Hive for exchanging data with your SQL data stores. From the outside, the two systems seem to be largely similar, but the systems have very different goals and aims. Let\’s start by looking at how the two systems differ and how we can take advantage of that in our big data requirements.

      [Read more...]
    Showing entries 1 to 30 of 168 Next 30 Older Entries

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.