Planet MySQL

Displaying posts with tag: Data Science

Oct 2015

Using Apache Spark and MySQL for Data Analysis

Posted by Alexander Rubin of MySQL Performance Blog on Wed 07 Oct 2015 22:49 UTC
Tags: hadoop, MySQL, Data Science, Apache Spark, data analysis

What is Spark

Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provide performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

Contrary to popular belief, Spark does not require all data to fit into memory; it uses caching to speed up operations …
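To make the caching point concrete, here is a toy, single-process Python sketch; it is not Spark code and not the author's example. It models a dataset split into partitions plus a bounded in-memory cache that evicts the least-recently-used partition, roughly how the Spark RDD programming guide describes dropping old cached partitions. Partitions that do not fit are simply re-read, so queries still return correct results, just with more "disk" reads. `PartitionCache`, `load_partition`, and the capacity values are hypothetical names chosen for illustration.

```python
from collections import OrderedDict


class PartitionCache:
    """Toy model (hypothetical, not a Spark API) of a bounded partition cache.

    Partitions that do not fit are re-read on demand, so the whole dataset
    never has to be resident in memory at once.
    """

    def __init__(self, load_partition, capacity):
        self.load_partition = load_partition  # hypothetical "read from disk" function
        self.capacity = capacity              # max number of partitions kept in memory
        self.cache = OrderedDict()
        self.loads = 0                        # how many times we went to "disk"

    def get(self, pid):
        if pid in self.cache:
            self.cache.move_to_end(pid)       # mark as most recently used
            return self.cache[pid]
        self.loads += 1
        data = self.load_partition(pid)
        self.cache[pid] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used partition
        return data


def total(cache, num_partitions):
    """A 'query' that scans every partition and sums its values."""
    return sum(sum(cache.get(p)) for p in range(num_partitions))


def load_partition(pid):
    # Hypothetical partition reader: partition pid holds 10 consecutive ints.
    return list(range(pid * 10, pid * 10 + 10))


fits = PartitionCache(load_partition, capacity=4)
print(total(fits, 4), total(fits, 4), fits.loads)    # 780 780 4

tight = PartitionCache(load_partition, capacity=2)
print(total(tight, 4), total(tight, 4), tight.loads)  # 780 780 8
```

With room for all four partitions, the second query is served entirely from memory. With room for only two, a cyclic scan misses on every access under LRU, yet the answer is unchanged: caching affects speed, not correctness.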


Apr 2014

Using Apache Hadoop and Impala together with MySQL for data analysis

Posted by Alexander Rubin of MySQL Performance Blog on Mon 21 Apr 2014 13:43 UTC
Tags: scalability, hadoop, Hive, MySQL, Performance, Impala, Data Science

Apache Hadoop is commonly used for data analysis. It is fast for data loads and scalable. In a previous post I showed how to integrate MySQL with Hadoop. In this post I will show how to export a table from MySQL to Hadoop, load the data into Cloudera Impala (columnar format), and run reports on top of it. For the examples below I will use the “ontime flight performance” data from my previous post (Increasing MySQL performance with parallel query execution). I used Cloudera Manager v4 to install Apache Hadoop and Impala. For this test …
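Why does a columnar format suit reporting? A minimal Python sketch, assuming nothing about Impala's actual storage engine: rows exported from MySQL are pivoted into one list per column, and a report then reads only the two columns it needs instead of every field of every row. The column names, sample rows, and helper functions below are hypothetical and only loosely styled after flight data.

```python
from collections import defaultdict


def rows_to_columns(rows, columns):
    """Pivot row-oriented records (as exported from MySQL) into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def avg_delay_by_carrier(table):
    """A report that touches only the 'carrier' and 'dep_delay' columns."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for carrier, delay in zip(table["carrier"], table["dep_delay"]):
        sums[carrier] += delay
        counts[carrier] += 1
    return {c: sums[c] / counts[c] for c in sorted(sums)}


# Hypothetical sample rows; 'year' and 'origin' are never read by the report.
columns = ["year", "carrier", "origin", "dep_delay"]
rows = [
    (2013, "AA", "JFK", 10),
    (2013, "UA", "SFO", 0),
    (2013, "AA", "LAX", 20),
    (2013, "UA", "ORD", 6),
]
table = rows_to_columns(rows, columns)
print(avg_delay_by_carrier(table))  # {'AA': 15.0, 'UA': 3.0}
```

In a real columnar engine each column is also stored and compressed separately on disk, so skipping unused columns saves I/O, not just CPU; this sketch only shows the access pattern.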


Jul 2013

MySQL and Hadoop integration

Posted by Alexander Rubin of MySQL Performance Blog on Thu 11 Jul 2013 10:00 UTC
Tags: hadoop, sqoop, Hive, Insight for DBAs, MySQL, Apache Hadoop, Data Science, no sql

Dolphin and Elephant: an Introduction

This post is intended for MySQL DBAs or sysadmins who need to start using Apache Hadoop and want to integrate the two. In this post I will cover some basic information about Hadoop, focusing on Hive, as well as MySQL and Hadoop/Hive integration.

First of all, if you have dealt with MySQL or another relational database for most of your professional life (as I have), Hadoop may look different. Very different. In many ways, Hadoop is the opposite of a relational database. Unlike a database, where we have a set of tables and indexes, Hadoop works with a set of text files. And… there are no indexes at all. Yes, this may be shocking, but all scans are sequential (full “table” scans in MySQL terms).
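The "no indexes, every scan is sequential" point can be illustrated with a small Python sketch. It assumes a file of delimited text lines stands in for one of the files behind a Hive table (tab-delimited here purely for readability; it is not claimed as Hive's default). Answering even a simple filter means reading and parsing every record. `full_scan_count` and the sample data are hypothetical.

```python
import io


def full_scan_count(lines, column, value, sep="\t"):
    """Count records whose given column equals value by reading every line.

    There is no index to consult, so every record is read and parsed:
    the equivalent of a full table scan in MySQL terms.
    Returns (matching_records, records_scanned).
    """
    count = 0
    scanned = 0
    for line in lines:
        scanned += 1
        fields = line.rstrip("\n").split(sep)
        if fields[column] == value:
            count += 1
    return count, scanned


# Hypothetical data: year, carrier, origin.
data = io.StringIO("2013\tAA\tJFK\n2013\tUA\tSFO\n2013\tAA\tLAX\n")
print(full_scan_count(data, 1, "AA"))  # (2, 3)
```

Note that `records_scanned` always equals the total number of records, however selective the filter is; an indexed lookup in MySQL would instead touch only the matching rows.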

So, when does Hadoop make sense?

First, Hadoop is great if you need to …


Dec 2012

Data Science vs. Data Analytics

Posted by Venu Anuganti on Mon 10 Dec 2012 19:33 UTC
Tags: database, analytics, hadoop, data warehouse, bigdata, MySQL, Data Analytics, Data Science, DataAnalytics, DataScience, Difference between data science and data analytics, How to hire data scientist, Role of data analytics, Role of Data Scientist, What is Data science

As this topic came up for discussion a few times this week in various places, I thought I would write a post on “Data Scientist vs. Data Analytics Engineer”; even though[...]

Nov 2012

Typical “Big” Data Architecture

Posted by Venu Anuganti on Fri 30 Nov 2012 22:15 UTC
Tags: postgresql, sql, database, scalability, ETL, hadoop, data warehouse, MapReduce, hbase, reporting, cloudera, NoSQL, vertica, Hive, bigdata, MySQL, SAS, Big Data Architecture, Big Data Warehouse, Data Architecture, Impala, NoSQL and BigData, Data Analytics, Data Science, kognitio, druid

Here is a typical “Big” data architecture that covers most of the components involved in the data pipeline. More or less, we have the same architecture in production in a number of places[...]
