Home |  MySQL Buzz |  FAQ |  Feeds |  Submit your blog feed |  Feedback |  Archive |  Aggregate feed RSS 2.0 English Deutsch Español Français Italiano 日本語 Русский Português 中文
Showing entries 1 to 30 of 157 Next 30 Older Entries

Displaying posts with tag: big data (reset)

Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified
+0 Vote Up -0Vote Down

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest way to get started with Hadoop.

What I’m even more pleased about is the fact that we are proud to announce that Tungsten Replicator 3.0 is certified for use on the new Cloudera Enterprise 5 platform. That means that we’re sure that replicating your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties on the Hadoop

  [Read more...]
Continuent Replication to Hadoop – Now in Stereo!
+0 Vote Up -0Vote Down

Hopefully by now you have already seen that we are working on Hadoop replication. I’m happy to say that it is going really well. I’ve managed to push a few terabytes of data and different data sets through into Hadoop on Cloudera, HortonWorks, and Amazon’s Elastic MapReduce (EMR). For those who have been following my long association with the IBM InfoSphere BigInsights Hadoop product, and I’m pleased to say that it’s working there too. I’ve had to adapt Robert’s original script to work with the different versions of the underlying Hadoop tools and systems to make it compatible. The actual performance and process is unchanged; you just use a different JS-based batchloader script to work with different tools.

Robert has also been simplifying some of the core functionality, such as configuring some fixed pre-determined

  [Read more...]
Real-Time Data Loading from MySQL to Hadoop using Tungsten Replicator 3.0 Webinar
+0 Vote Up -0Vote Down

To follow-up and describe some of the methods and techniques behind replicating into Hadoop from MySQL in real-time, and how this can be combined into your data workflow, Continuent are running a webinar with me presenting that will go over the details and provide a demo of the data replication process.

Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0

Hadoop is an increasingly popular means of analyzing transaction data from MySQL. Up until now mechanisms for moving data between MySQL and Hadoop have been rather limited. The new Continuent Tungsten Replicator 3.0 provides enterprise-quality replication from MySQL to Hadoop. Tungsten Replicator 3.0 is 100% open source, released under a GPL V2 license, and available for download at

  [Read more...]
March 20 Webinar: How to Scale MySQL for Big Data Applications
+0 Vote Up -0Vote Down

You may think that you have to buy, install, and get up to speed on a new database if you want to work with large amounts of data, but you can do more than you think with the MySQL you already have.
Register Now!

SPEAKER: Jon Tobin, Tokutek
DATE: Thursday, March 20th
TIME: 1pm ET

Without having to change your application or do special tuning you can increase performance and save significant time and money when you need to scale.

Join Tokutek’s Jon Tobin as he demonstrates how to use MySQL or MariaDB in Big Data applications by simply upgrading the storage engine with TokuDB, and how to effectively evaluate TokuDB for increased performance, compression and agility.

During this webinar you will learn:

  • How to dramatically increase performance without having to rewrite



  [Read more...]
Big Data: Three questions to Aerospike.
+0 Vote Up -0Vote Down

“Many tools now exist to run database software without installing software. From vagrant boxes, to one click cloud install, to a cloud service that doesn’t require any installation, developer ease of use has always been a path to storage platform success.”–Brian Bulkowski.

The fifth interview in the “Big Data: three questions to “ series of interviews, is with Brian Bulkowski, Aerospike co-founder and CTO.

RVZ

Q1. What is your current product offering?

Brian Bulkowski: Aerospike is the first in-memory NoSQL database optimized for flash or solid state drives (SSDs).
In-memory for speed and NoSQL for scale. Our

  [Read more...]
Real-Time Replication from MySQL to Cassandra
+1 Vote Up -0Vote Down

Earlier this month I blogged about our new Hadoop applier, I published the docs for that this week (http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html) as part of the Tungsten Replicator 3.0 documentation (http://docs.continuent.com/tungsten-replicator-3.0/index.html). It contains some additional interesting nuggets that will appear in future blog posts.

The main part of that functionality that performs the actual applier for Hadoop is based around a JavaScript applier engine – there will eventually be docs for that as part of the Batch Applier content (

  [Read more...]
MySQL Cluster to Hadoop
Employee +1 Vote Up -0Vote Down

How do you get data from a MySQL Cluster into Hadoop? Easy, replicate from the cluster to a stand alone MySQL instance and from there use the MySQL Hadoop Applier to HDFS.

This question came from a long time MySQL user who has jumped into the Big Data world.


No Hadoop Fun for Me at SCaLE 12X :(
+0 Vote Up -0Vote Down
I blogged a couple of weeks ago about my upcoming MySQL/Hadoop talk at SCaLE 12X. Unfortunately I had to cancel. A few days after writing the article I came down with an eye problem that is fixed but prevents me from flying anywhere for a few weeks. That's a pity as I was definitely looking forward to attending the conference and explaining how Tungsten replicates transactions from MySQL into HDFS.

Meanwhile, we are still moving at full steam with Hadoop-related work at Continuent, which is the basis for the next major replication release, Tungsten Replicator 3.0.0. Binary builds and documentation will go up in a few days. There will also be many more public talks about Hadoop support, starting in

  [Read more...]
Why Aren't All Data Immutable?
+0 Vote Up -0Vote Down
Over the last few years there has been an increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values delete old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing them.  In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees.  This is not a very strong argument

  [Read more...]
Getting Data into Hadoop in real-time
+0 Vote Up -0Vote Down

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL) into a more structured Oracle.

Today I spend some of my time working in Big Data, more often than not, migrating information from existing data stores into Big Data so that they can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html


  [Read more...]
Fun with MySQL and Hadoop at SCaLE 12X
+0 Vote Up -0Vote Down
It's my pleasure to be presenting at SCaLE 12X on the subject of real-time data loading from MySQL to Hadoop.  This is the first public talk on work at Continuent that enables Tungsten Replicator to move transactions from MySQL to HDFS (Hadoop Distributed File System).  I will explain how replication to Hadoop works, how to set it up, and offer a few words on constructing views of MySQL data using tools like Hive.

As usual with replication everything we are doing on Hadoop replication is open source.  Builds and documentation will be

  [Read more...]
Amazon’s Big Data Suite – Part 2
+0 Vote Up -0Vote Down

 

In Part 1 we started our study of Amazon Services and looked at Amazon EC2. In this part, we will look at other Amazon services like EMR, DynamoDB and RDS.

 

  • 1.      Amazon Elastic Map Reduce
  • Amazon EMR is a web service which makes cloud computing very easy. Amazon’s EMR cluster comes preconfigured with Hadoop, which as mentioned earlier is a data processing and storage framework. This preconfiguration makes it very easy to start analysing your data in no time. Amazon EMR has applications in machine learning, financial analysis, bioinformatics etc.

     

    Just like EC2, you can launch any number of EMR instances as you need and you will only be charged for the computing power you have used.

      [Read more...]
    SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange
    +1 Vote Up -0Vote Down

    The third, and final article in my series on migrating data to and from Hadoop and SQL databases is now available:

    Big data is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data,

      [Read more...]
    Log Buffer #346, A Carnival of the Vanities for DBAs
    +0 Vote Up -0Vote Down

    Economist says that Physics suggest that storms will get worse as the planet warms. Typhoon Haiyan in Philippines, bush-fires in Australia, floods in China, and extreme unpredictable weather across the planet is a sober reminder. Good news is that technology and awareness is rising, and so is the data. Database technologies are playing their part to intelligently store that data and enabling the stakeholders to analyze and get meaningful results to predict and counter the extreme conditions. This Log Buffer Edition appreciates these efforts.

    Big Data:

    Big Data Tools that You Need to Know About – Hadoop & NoSQL.

      [Read more...]
    Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 2
    Employee +2 Vote Up -0Vote Down

    I have spent the better part of the last month at Big Data conferences trying to see behind the $2.5 million in marketing smoke to see what is really going to be showing up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data Techon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time but the vendors are playing musical chairs.

    But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Horton Works have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together

      [Read more...]
    Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2
    +0 Vote Up -0Vote Down

     

    In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

     

    MapReduce is really the key to understanding Hadoop’s parallel processing functionality as it enables data in various formats (XML, text, binary, log, SQL, ect) to be divided up and mapped out to many computers nodes and then recombined back to produce a final data set.

     

      [Read more...]
    Big Data – What is it and Why is it Important – Part 4
    +0 Vote Up -0Vote Down

     

    In Part 3 of this series we found out that Big Data is a huge revenue generator for business that is expected to drive $232 billion in spending through 2016. In this installment we’ll continue to explore why Big Data must become a critical part of any business strategy in the 21st century.

     

    First, the Bad News

     

    Okay, so we get the notion of what Big Data is and why it’s important for business. So what can be done about it? The first point is to recognize your business strategy and needs as well as the limitations in your current infrastructure. Traditional organizational data warehouses are based on structured, well-organized data sets. Think Oracle,

      [Read more...]
    November 6 Webinar: 5 Pitfalls to Avoid with MySQL and Big Data
    +0 Vote Up -0Vote Down

    You love MySQL for its ease of deployment – but are you worried about how your application will perform when it starts to scale?

    SPEAKER: Gerry Narvaja, Tokutek
    DATE: Wednesday, November 6th
    TIME: 1pm ET
    Register Now!

    Join this interactive webinar with Gerry Narvaja of Tokutek as he walks through the potential pitfalls when using MySQL for Big Data applications, how you can avoid unnecessary tolls on time and resources and tips on how to get the most out of your MySQL applications with open source TokuDB.

    Attend this webinar to learn how to:

    • dramatically increase performance without having to rewrite code
    • reduce the total cost of your servers and



      [Read more...]
    SQL to Hadoop and back again, Part 2: Leveraging HBase and Hive
    +0 Vote Up -0Vote Down

    The second article in a series covering Big Data and SQL interaction is available now:

    “Big data” is a term that has been used regularly now for almost a decade, and it — along with technologies like NoSQL — are seen as the replacements for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. Here in Part 2, we will concentrate on how to use HBase and Hive for exchanging data with your SQL data stores. From the outside, the two systems seem to be largely similar, but the systems have very different goals and aims. Let\’s start by looking at how the two systems differ and how we can take advantage of that in our big data requirements.

      [Read more...]
    Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett.
    +0 Vote Up -0Vote Down
    “The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the “Big Data” craze by spend hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes [...]
    Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 1
    Employee +1 Vote Up -0Vote Down

    Ask ten DBAs for a definition of ‘Big Data’ and you well get more than ten replies. And the majority of those replies will lead you to Hadoop. Hadoop has been the most prominent of the big data frameworks in the open source world. Over 80% of the Hadoop instances in the world are feed their data from MySQL1. But Hadoop is made up of many parts, some confusing and many that do not play nicely with each other. It is analogous to being given a pile of automotive parts from different models and tyring to come up with a car at the end of the day. So what if you do if you are wanting to copy some of your relational data into Hadoop and want to avoid the equivilent of scraped knuckles? The answer is Bigtop and what follows is a way to get a one node does all system running so you can experiement with Hadoop, Map/Reduce, Hive, and all

      [Read more...]
    Big Data.. So what? Part 2
    +0 Vote Up -2Vote Down
    Sorry for this delay in providing part 2 of this series, but stuff happened that had really high priority, and in addition I was on vacation. But now I'm back in business!

    So, last time I left you with some open thought on why Big Data can be useful, but that we also need new analysis tools as well as new ways of visualizing data for this to be truly useful. As for analysis, lets have a look at text, which should be simple enough, right? And sometimes it is simple. One useful analysis tool that is often overlooked is Google. Let's give it a shot, just for fun: if I think of two fierce competitors, somehow, that we can compare, say Oracle and MySQL.. Oracle is much older, both as a technology and as a company and in addition owns the MySQL brand these days. But on the other hand, the Web is where MySQL has it's sweet spot. Just Googling for MySQL and Oracle shows

      [Read more...]
    Big Data with MySQL and Hadoop at MySQL Connect 2013
    +1 Vote Up -0Vote Down

    I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University at Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution and we can nowadays easily integrate it with MySQL. I will start with a brief introduction of Apache Hadoop and its components (HFDS, Map/Reduce, Hive, HBase/HCatalog, Flume, Scoop, etc). Next I will show 2 major Big Data scenarios:

    • From file to Hadoop to MySQL. This is an example of “ELT” process: Extract data from external source; Load data into Hadoop; Transform
      [Read more...]
    Big Data.. So what? Part 1
    +0 Vote Up -0Vote Down
    This is the first blog post in a series where I hope to raise a bit above the technical stuff and instead focus on how we can put Big Data to effective use. I ran a SkySQL Webinar on the subject recently that you might also want to watch, and a recording is available here:http://bit.ly/17TTQnJ

    Yes, so what? Why do you need or want all that data? All data you need from your customers you have in your Data Warehouse, and all data you need on the market you are in, you can get from some analyst? Right?

    Well, yes, that is one source of data, but there is more to it than that. The deal with Data is that once you have enough of it, you can start to see things you haven't seen before. Trend analysis is only relevant when you have enough data, and the more you have, the more accurate it gets.Big Data is



      [Read more...]
    Big Data from Space: the “Herschel” telescope.
    +0 Vote Up -0Vote Down
    ” One of the biggest challenges with any project of such a long duration is coping with change. There are many aspects to coping with change, including changes in requirements, changes in technology, vendor stability, changes in staffing and so on”–Jon Brumfitt. On May 14, 2009, the European Space Agency launched an Arianne 5 rocket [...]
    Big data processing with Disco
    +0 Vote Up -0Vote Down

    Those who deal with big data probably know about Disco – a distributed computing framework aimed to provide a MapReduce platform for big data processing Python applications. We are proud to say that we are one of the largest users of Disco in the Netherlands. As an owner of multiple high-traffic portals with lots of […]

    The post Big data processing with Disco appeared first on Spil Games Engineering.

    On Oracle NoSQL Database –Interview with Dave Segleau.
    +0 Vote Up -0Vote Down
    “We went down the path of building Oracle NoSQL database because of explicit request from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home grown sharding implementations and very much wanted an out of box technology that can replicate the robustness of what they had built “out of [...]
    A new big data structure for streaming counters - bit length encoding
    +1 Vote Up -0Vote Down
    One of the challenges of big data is that it is, well, big. Computers are optimized for math on 64 bits or less. Any bigger, and extra steps have to be taken to work with the data which is very expensive. This is why a BIGINT is 64 bits.  In MySQL DECIMAL can store more than 64 bits of data using fixed precision.  Large numbers can use FLOAT or DECIMAL but those data types are lossy.

    DECIMAL is an expensive encoding. Fixed precision math is expensive and you eventually run out of precision at which point you can't store any more data, right?

    What happens when you want to store a counter that is bigger than the maximum DECIMAL?  FLOAT is lossy.  What if you need an /exact/ count of a very big number without using very much space?

    I've developed an encoding method that allows you to store very large counters in a very small amount of space. It takes





      [Read more...]
    On PostgreSQL. Interview with Tom Kincaid.
    +0 Vote Up -1Vote Down
    “Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it´s not possible to create a product with [...]
    The Needle in Big Data Noise
    +0 Vote Up -0Vote Down

    Read the original article at The Needle in Big Data Noise

    Join 5500 others and follow Sean Hull on twitter @hullsean. Also take a look at: I hacked Disqus Digests to discover new blogs Who the heck is Bayes Thomas Bayes was a scientist & thinker, Fellow of the Royal Society, and back in 1763 author of “An Essay toward Solving a Problem in the Doctrine [...]

    For more articles like these go to Sean Hull's Scalable Startups

    Related posts:
  • Big Data – What is it and why is it important?
  • NYC Tech Firms Are Hiring – Map
  •   [Read more...]
    Showing entries 1 to 30 of 157 Next 30 Older Entries

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates   Legal Policies | Your Privacy Rights | Terms of Use

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.