Previous 30 Newer Entries Showing entries 31 to 60 of 141 Next 30 Older Entries

Displaying posts with tag: hadoop

Why Aren't All Data Immutable?
Over the last few years there has been an increasing interest in immutable data management. This is a big change from the traditional update-in-place approach many database systems use today, where new values delete old values, which are then lost. With immutable data you record everything, generally using methods that append data from successive transactions rather than replacing them.  In some DBMS types you can access the older values, while in others the system transparently uses the old values to solve useful problems like implementing eventual consistency.
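As a rough illustration of the append-only idea described above, the following minimal Python sketch keeps every write in a log and can answer reads as of any earlier version. The class name AppendOnlyStore and its methods are hypothetical, invented for this example; they are not taken from any DBMS mentioned in the post.

```python
from typing import Any, List, Optional, Tuple


class AppendOnlyStore:
    """Toy immutable key-value store: writes are appended, never overwritten."""

    def __init__(self) -> None:
        # Each entry is (version, key, value); entries are never modified.
        self._log: List[Tuple[int, str, Any]] = []

    def put(self, key: str, value: Any) -> int:
        """Append a new value for key and return its version number."""
        version = len(self._log) + 1
        self._log.append((version, key, value))
        return version

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the latest value of key, or the value visible at version as_of."""
        for version, k, value in reversed(self._log):
            if k == key and (as_of is None or version <= as_of):
                return value
        raise KeyError(key)

    def history(self, key: str) -> List[Tuple[int, Any]]:
        """Return every (version, value) ever recorded for key, oldest first."""
        return [(version, value) for version, k, value in self._log if k == key]


store = AppendOnlyStore()
first = store.put("balance", 100)
store.put("balance", 80)
latest = store.get("balance")                 # newest value
earlier = store.get("balance", as_of=first)   # value before the second write
```

An update-in-place store would keep only the newest value; here the older value stays readable, at the cost of a log that grows with every write.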

Baron Schwartz recently pointed out that it can be hard to get decent transaction processing performance based on append-only methods like append-only B-trees.  This is not a very strong argument

  [Read more...]
Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, and more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL database today), into a more structured Oracle database.

Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data platforms so that it can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html


  [Read more...]
Fun with MySQL and Hadoop at SCaLE 12X
It's my pleasure to be presenting at SCaLE 12X on the subject of real-time data loading from MySQL to Hadoop.  This is the first public talk on work at Continuent that enables Tungsten Replicator to move transactions from MySQL to HDFS (Hadoop Distributed File System).  I will explain how replication to Hadoop works, how to set it up, and offer a few words on constructing views of MySQL data using tools like Hive.

As usual with replication, everything we are doing on Hadoop replication is open source.  Builds and documentation will be

  [Read more...]
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Oracle is the most powerful DBMS in the world. However, Oracle's expensive and complex replication makes it difficult to build highly available applications or move data in real-time to data warehouses and popular databases like MySQL. In this webinar-on-demand you will learn how Continuent Tungsten solves problems with Oracle replication at a fraction of the cost of other solutions and with less
SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange

The third and final article in my series on migrating data to and from Hadoop and SQL databases is now available:

Big data is a term that has been used regularly now for almost a decade, and it, along with technologies like NoSQL, is seen as the replacement for the long-successful RDBMS solutions that use SQL. Today, DB2®, Oracle, Microsoft® SQL Server, MySQL, and PostgreSQL dominate the SQL space and still make up a considerable proportion of the overall market. In this final article of the series, we will look at more automated solutions for migrating data to and from Hadoop. In the previous articles, we concentrated on methods that take exports or otherwise formatted and extracted data from your SQL source, load that into Hadoop in some way, then process or parse it. But if you want to analyze big data,

  [Read more...]
Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 2

I have spent the better part of the last month at Big Data conferences trying to look behind the $2.5 million in marketing smoke to see what is really going to be showing up on the to-do list of DBAs. The first bit of news is that half the vendors at shows like Strata or Big Data TechCon will probably be gone by this time next year. So picking a vendor right now is a little iffy. Hadoop’s ecosystem is flourishing and will surely be around for some time, but the vendors are playing musical chairs.

But we are Open Source and we do not need vendors! Well, yes and no. The good folks at Cloudera and Hortonworks have done you a big favor by providing wonderful tutorials that are worth your time to see. Recently two former MySQL-ers, Sarah Sproehnle and Ian Wrigley, have put together

  [Read more...]
Big Data Tools that You Need to Know About – Hadoop & NoSQL – Part 2

In the previous article we introduced Hadoop as the most popular Big Data toolset on the market today. We had just started talking about MapReduce as the major framework that makes Hadoop distinctive. So let’s continue the discussion where we left off.

MapReduce is really the key to understanding Hadoop’s parallel processing functionality, as it enables data in various formats (XML, text, binary, log, SQL, etc.) to be divided up and mapped out to many computer nodes and then recombined to produce a final data set.
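The divide, map, and recombine flow described above can be sketched in a few lines of plain Python. This is only a single-process illustration of the idea, not Hadoop's actual API: the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical, and each input string stands in for a chunk of data that a real cluster would hand to a separate node.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all chunks."""
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big", "data tools"])
```

In a real cluster the map calls run in parallel on different nodes and the shuffle moves data across the network; the sketch keeps everything in one process so the three stages are easy to see.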

 

  [Read more...]
New MySQL features, related technologies at Percona Live London

The upcoming Percona Live London conference, November 11-12, features quite a number of talks about the latest MySQL features and related technologies. There will be lots of talks about the new MySQL 5.6 features:

  [Read more...]
SQL to Hadoop and back again, Part 1: Basic data interchange techniques

I’ve got a new article, the first of a new three-part series on moving data between SQL and Hadoop, covering both exporting to Hadoop and importing processed content back into an SQL store.

In this first one, we look at the basic mechanics and considerations before you start the migration of data, such as the data format, content, and export techniques.

Read: SQL to Hadoop and back again, Part 1: Basic data interchange techniques


Data Analytics at NBCUniversal. Interview with Matthew Eric Bassett.
“The most valuable thing I’ve learned in this role is that judicious use of a little bit of knowledge can go a long way. I’ve seen colleagues and other companies get caught up in the “Big Data” craze by spending hundreds of thousands of pounds sterling on a Hadoop cluster that sees a few megabytes [...]
Percona Live London 2013: an insider’s view of the schedule

With the close of call for papers earlier this month, the Percona Live London conference committee was in full swing this past week reviewing all of the many submissions for November’s Percona Live London MySQL Conference.

The submissions are far ranging and cover some really interesting topics, making the lineup for Percona Live London really strong! What the committee looks for in a submission is how much “value” a talk will bring to the

  [Read more...]
MySQL webinar: ‘Introduction to open source column stores’

Join me Wednesday, September 18 at 10 a.m. PDT for an hour-long webinar where I will introduce the basic concepts behind column store technology. The webinar’s title is: “Introduction to open source column stores.”

What will be discussed?

This webinar will talk about Infobright, LucidDB, MonetDB, Hadoop (Impala) and other column stores

  • I will compare features between major column stores (both open and closed source).
  • Some benchmarks will be used to demonstrate the basic
  [Read more...]
Copying MySQL Data to Hadoop with Minimal Loss of Blood Part 1

Ask ten DBAs for a definition of ‘Big Data’ and you will get more than ten replies. And the majority of those replies will lead you to Hadoop. Hadoop has been the most prominent of the big data frameworks in the open source world. Over 80% of the Hadoop instances in the world are fed their data from MySQL [1]. But Hadoop is made up of many parts, some confusing and many that do not play nicely with each other. It is analogous to being given a pile of automotive parts from different models and trying to come up with a car at the end of the day. So what do you do if you want to copy some of your relational data into Hadoop and want to avoid the equivalent of scraped knuckles? The answer is Bigtop, and what follows is a way to get a one-node-does-all system running so you can experiment with Hadoop, Map/Reduce, Hive, and all

  [Read more...]
Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution and we can nowadays easily integrate it with MySQL. I will start with a brief introduction of Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show 2 major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of an “ELT” process: Extract data from an external source; Load data into Hadoop; Transform
  [Read more...]
MySQL and Hadoop integration

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or Sysadmins who need to start using Apache Hadoop and want to integrate those 2 solutions. In this post I will cover some basic information about Hadoop, focusing on Hive as well as MySQL and Hadoop/Hive integration.

First of all, if you have been dealing with MySQL or any other relational database for most of your professional life (like I have), Hadoop may look different. Very different. In many ways, Hadoop is the opposite of a relational database. Unlike the database where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. And yes, this may be shocking,

  [Read more...]
On Oracle NoSQL Database – Interview with Dave Segleau.
“We went down the path of building Oracle NoSQL database because of explicit request from some of our largest Oracle Berkeley DB installations that wanted to move away from maintaining home grown sharding implementations and very much wanted an out of box technology that can replicate the robustness of what they had built “out of [...]
What technologies are you running alongside MySQL?

In many environments MySQL is not the only technology used to store in-process data.

Quite frequently, especially with large-scale or complicated applications, we use MySQL alongside other technologies for certain tasks such as reporting and caching, or as the main data store for portions of the application.

What technologies for data storage and processing do you use alongside MySQL in your environment? Please feel free to elaborate in the comments about your use case and experiences!


The post

  [Read more...]
On PostgreSQL. Interview with Tom Kincaid.
“Application designers need to start by thinking about what level of data integrity they need, rather than what they want, and then design their technology stack around that reality. Everyone would like a database that guarantees perfect availability, perfect consistency, instantaneous response times, and infinite throughput, but it’s not possible to create a product with [...]
MySQL Applier For Hadoop: Implementation

This is a follow-up post describing the implementation details of Hadoop Applier and the steps to configure and install it. Hadoop Applier integrates MySQL with Hadoop by providing real-time replication of INSERTs to HDFS, so the data can be consumed by the data stores working on top of Hadoop. You can learn more about the design rationale and prerequisites in the previous post.

Design and Implementation:

Hadoop Applier replicates rows inserted into a table in MySQL to the Hadoop Distributed File System (HDFS). It uses an API provided by libhdfs, a C library to manipulate files in HDFS.

The library comes pre-compiled with Hadoop distributions. It
  [Read more...]
The Data Day, A few days: April 22-26 2013

Pivotal launches. SkySQL and Monty Program merge. And much, much more

Our report on the changes in the MySQL ecosystem is now available for 451 clients and non-clients alike at bit.ly/451mysql

— Matt Aslett (@maslett) April 25, 2013

For 451 Research clients: VMware expands Serengeti’s horizons with updated Hadoop virtualization project bit.ly/17muQFI

— Matt Aslett (@maslett) April 26, 2013

For 451 Research clients: SkySQL, Monty Program merge to support MariaDB following formation of MariaDB Foundation bit.ly/10dsdjf

  [Read more...]
Biggest MySQL related news in the last 24 hours, Day 2

Continuing on from yesterday, the biggest news that I’ve noted in the past 24 hours:

  • The commitment from Oracle’s MySQL team to release a new GA about once every 24 months, with a Developer Milestone Release (DMR), with “GA quality” every 4-6 months. Tomas Ulin announced MySQL 5.7 DMR1 (milestone 11) [download, release notes, manual]. He also announced MySQL Cluster 7.3 DMR2 [download,
  [Read more...]
    MySQL Applier For Hadoop: Real time data export from MySQL to HDFS

    MySQL replication enables data to be replicated from one MySQL database server (the master) to one or more MySQL database servers (the slaves). However, imagine the number of use cases that could be served if the slave (to which data is replicated) were not restricted to being a MySQL server, but could be any other database server or platform, with replication events applied in real time!
    This is what the new Hadoop Applier empowers you to do.
    An example of such a slave could be a data warehouse system such as Apache Hive, which uses HDFS as a data store. If you have a Hive metastore associated with HDFS (Hadoop Distributed File System), the Hadoop Applier can populate Hive


      [Read more...]
    Announcing the MySQL Applier for Apache Hadoop

    Enabling Real-Time MySQL to HDFS Integration

    Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies allowing real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by new generations of resource management with Apache YARN.

    To support this growing emphasis on real-time operations, we are releasing a new

      [Read more...]
    Deploying Cloudera Impala on EC2 with Example Live Demo

    A little while ago I blogged about (and open sourced) an Impala-powered soccer visualization demo, designed to demonstrate just how responsive Impala queries can be. Since not everyone has the time or resources to run the project themselves, we’ve decided to host it ourselves on an EC2 instance. You can try the visualization; we’ve also opened up the Impala web interface, where you can see query profiles and performance numbers, and Hue (username and password are both ‘test’), where you can run your own queries on the dataset.

    Deploying  [Read more...]

    The Data Day, Two days: February 11/12 2013

    ClearStory sheds light on data analysis service. Illuminating ‘dark data’. More.

    For 451 clients: ClearStory bags $9m in series A funding, sheds light on its data analysis service bit.ly/Y6v8sV By Krishna Roy

    — Matt Aslett (@maslett) February 12, 2013

    For 451 clients: Global IDs makes ‘big data’ MDM play via cloud and Hadoop, touts profitable growth bit.ly/Y6v6kL By Krishna Roy

    — Matt Aslett (@maslett) February 12, 2013

    ScaleBase releases version 2.0 of its MySQL database scalability software bit.ly/WGtEtN

      [Read more...]
    MySQL-State of the Union. Interview with Tomas Ulin.
    “With MySQL 5.6, developers can now commingle the “best of both worlds” with fast key-value look up operations and complex SQL queries to meet user and application specific requirements” –Tomas Ulin. On February 5, 2013, Oracle announced the general availability of MySQL 5.6. I have interviewed Tomas Ulin, Vice President for the MySQL Engineering team [...]
    The Data Day, Two days: February 7/8 2013

    Teradata results. Funding for DataXu. The chemistry of data. And more.

    For 451 Research clients: Oracle launches major update to MySQL open source database bit.ly/TSONAt

    — Matt Aslett (@maslett) February 8, 2013

    For 451 clients: Analyzing the chemistry of data bit.ly/TSOV2R By @451wendy Treating sensitive data like dangerous chemicals

    — Matt Aslett (@maslett) February 8, 2013

    Teradata: Q4 net income $112m on revenue up 10% to $740m, FY net income $419m on revenue up 13% to $2.7bn. bit.ly/14FNS8L

      [Read more...]
    Data Science vs. Data Analytics
    As this topic came up a few times this week for discussion at various places, I thought of composing a post on “Data Scientist vs. Data Analytics Engineer”; even though[...]
    On Big Data, Analytics and Hadoop. Interview with Daniel Abadi.
    “Some people even think that “Hadoop” and “Big Data” are synonymous (though this is an over-characterization). Unfortunately, Hadoop was designed based on a paper by Google in 2004 which was focused on use cases involving unstructured data (e.g. extracting words and phrases from Webpages in order to create Google’s Web index). Since it was not [...]
    Distributed Clustering Services
    Apart from my consulting as part of ScaleIn, I also invest to bootstrap companies with really disruptive ideas; and in the process met a few database-specific companies who are already[...]

    Planet MySQL © 1995, 2014, Oracle Corporation and/or its affiliates

    Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.