Planet MySQL

Displaying posts with tag: Apache Spark

Apr

2020

MySQL table to JSON with 10 lines of Spark

Posted by Kasra Madadipouya on Tue 21 Apr 2020 16:55 UTC
Tags: Programming, big data, json, MySQL, spark, Apache Spark, MySQL to JSON

Apache Spark is the de facto framework of the big data world. Any serious organization that’s dealing with big data uses Spark almost exclusively. It has some caveats, though. For starters, it’s hard to use, and it’s very confusing to get started with, even for those with a solid …

The post MySQL table to JSON with 10 lines of Spark appeared first on Geeky Hacker.
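
As a rough illustration of the idea in the title (not the code from the linked post), the sketch below reads one MySQL table through Spark's JDBC data source and writes it out as JSON. The helper names `mysql_jdbc_options` and `export_table_to_json`, the connection details, and the choice of the MySQL Connector/J 8.x driver class are all assumptions made for this example. The export function expects an already created `SparkSession` with the Connector/J jar on its classpath.

```python
from typing import Dict


def mysql_jdbc_options(host: str, database: str, table: str,
                       user: str, password: str, port: int = 3306) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source.

    Assumes MySQL Connector/J 8.x, whose driver class is com.mysql.cj.jdbc.Driver.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def export_table_to_json(spark, options: Dict[str, str], output_dir: str) -> None:
    """Read one MySQL table over JDBC and write it as JSON files (one object per line).

    `spark` is an existing pyspark.sql.SparkSession supplied by the caller.
    """
    df = spark.read.format("jdbc").options(**options).load()
    df.write.mode("overwrite").json(output_dir)


# Hypothetical connection details, for illustration only.
opts = mysql_jdbc_options("localhost", "shop", "customers", "reader", "secret")
print(opts["url"])
```

With a running session, a call such as `export_table_to_json(spark, opts, "/tmp/customers_json")` would produce a directory of part files rather than a single JSON document, since Spark writes one file per partition.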

Oct

2018

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Posted by Uber Engineering on Tue 30 Oct 2018 15:00 UTC
Tags: Hardware, Architecture, data center, big data, cassandra, redis, capacity planning, MySQL, Apache Hadoop, Apache Spark, Uber, Uber Engineering, Cluster Management, Peloton, Unified Resource Scheduler, Workload Cluster

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management …

The post Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads appeared first on Uber Engineering Blog.

Oct

2018

Uber’s Big Data Platform: 100+ Petabytes with Minute Latency

Posted by Uber Engineering on Wed 17 Oct 2018 16:00 UTC
Tags: Apache, engineering, storage, Architecture, hadoop, data warehouse, big data, json, MySQL, Data Modeling, latency, Apache Hadoop, Docker, Apache Spark, Uber Data, PostgresSQL, hoodie, Apache Parquet, Hudi, Uber Eng

Uber is committed to delivering safer and more reliable transportation across our global markets. To accomplish this, Uber relies heavily on making data-driven decisions at every level, from forecasting rider demand during high traffic events to identifying and addressing bottlenecks…

The post Uber’s Big Data Platform: 100+ Petabytes with Minute Latency appeared first on Uber Engineering Blog.

Jan

2018

Sneak Peek of the Percona Live 2018 Open Source Database Conference Breakout Sessions!

Posted by MySQL Performance Blog on Mon 15 Jan 2018 17:50 UTC
Tags: conference, Santa Clara, cloud, aws, mariadb, mongodb, Amazon RDS, percona live, MySQL, Amazon Aurora, Apache Spark, open source database, Percona Live 2018, Early Bird

Take a look at the sneak peek of the breakout sessions for the Percona Live 2018 Open Source Database Conference, taking place April 23-25, 2018 at the Santa Clara Convention Center in Santa Clara, California. Early Bird registration discounts are available until February 4, 2018, and sponsorship opportunities are still available.

Conference breakout sessions will feature a range of in-depth talks related to each of the key areas. Breakout session examples include:

Database Security as a Function: Scaling to Your Organization’s Needs – Laine Campbell, …


Jun

2017

On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov

Posted by Roberto V. Zicari on Fri 30 Jun 2017 13:40 UTC
Tags: Uncategorized, sql, memcached, data warehousing, analytics, hadoop, mysq, Gridgain, SaaS, big data, vertica, redis, internet of things, machine learning, Tableau, Apache Ignite, Nikita Ivanov, proxysql, Apache Spark, vitess, ClickHouse, Apache Ignite In-Memory SQL Grid, Apache Kafka, ETL processes, in-memory computing, in-memory data grids, Spark Streaming

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov, CTO of GridGain.
The main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform on big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as …


Mar

2017

Column Store Database Benchmarks: MariaDB ColumnStore vs. Clickhouse vs. Apache Spark

Posted by Alexander Rubin of MySQL Performance Blog on Fri 17 Mar 2017 18:12 UTC
Tags: benchmark, column store, big data, MySQL, Apache Spark, Column Store Database, ClickHouse, MariaDB ColumnStore

This blog post shares some column store database benchmark results and compares the query performance of MariaDB ColumnStore v. 1.0.7 (based on InfiniDB), ClickHouse, and Apache Spark.

I’ve already written about ClickHouse (Column Store database).

The purpose of the benchmark is to see how these three solutions work on a single big server, with many CPU cores and large amounts of RAM. Both systems are massively parallel (MPP) database systems, so they should use many cores for SELECT queries.

For the benchmarks, I chose …


Aug

2016

How Apache Spark makes your slow MySQL queries 10x faster (or more)

Posted by Alexander Rubin of MySQL Performance Blog on Wed 17 Aug 2016 15:26 UTC
Tags: MySQL, Apache Spark

In this blog post, we’ll discuss how to improve the performance of slow MySQL queries using Apache Spark.

Introduction

In my previous blog post, I wrote about using Apache Spark with MySQL for data analysis and showed how to transform and analyze a large volume of data (text files) with Apache Spark. Vadim also performed a benchmark comparing performance of MySQL and Spark with Parquet columnar format (using Air traffic performance data). That works great, but what if we don’t want to move our data from MySQL to another storage (i.e., …

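
One way Spark can query data that stays in MySQL is a partitioned JDBC read, where Spark splits a table scan into several range queries on a numeric column and runs them in parallel. The sketch below only builds the option map for such a read. Whether the linked post uses exactly this approach is not visible in the excerpt, and the helper name `partitioned_read_options`, the table, the column, and the bounds are hypothetical.

```python
from typing import Dict


def partitioned_read_options(url: str, table: str, column: str,
                             lower: int, upper: int,
                             num_partitions: int) -> Dict[str, str]:
    """Build Spark JDBC options for a parallel, range-partitioned table read.

    Spark uses lowerBound/upperBound only to compute the partition stride;
    rows outside the bounds are still read by the first and last partitions.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,  # must be a numeric, date, or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


# Hypothetical table and bounds, for illustration only.
read_opts = partitioned_read_options(
    "jdbc:mysql://localhost:3306/ontime", "flights", "id", 1, 1_000_000, 8
)
print(read_opts["numPartitions"])
```

Given an existing `SparkSession` named `spark`, plus credentials added to the map, `spark.read.format("jdbc").options(**read_opts).load()` would then open up to eight concurrent connections to MySQL, so the partition count should be chosen with the database's connection limits in mind.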

Jan

2016

Making Apache Spark Four Times Faster

Posted by MySQL Performance Blog on Fri 15 Jan 2016 22:52 UTC
Tags: MySQL, Apache Spark

This is a follow-up to my previous post, Apache Spark with Air ontime performance data.

To recap an interesting point in that post: when using 48 cores with the server, the result was worse than with 12 cores. I wanted to understand why that was, so I started digging. My primary suspicion was that Java (I never trust Java) was not good at dealing with 100GB of memory.

There are a few links pointing to the potential issues with a huge heap:

http://stackoverflow.com/questions/214362/java-very-large-heap-sizes
…


Jan

2016

Apache Spark with Air ontime performance data

Posted by MySQL Performance Blog on Fri 08 Jan 2016 01:28 UTC
Tags: Benchmarks, MySQL, Apache Spark

There is a growing interest in Apache Spark, so I wanted to play with it (especially after Alexander Rubin’s Using Apache Spark post).

To start, I used the recently released Apache Spark 1.6.0 for this experiment, and I will play with the “Airlines On-Time Performance” database from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time. You can find the scripts I used here: https://github.com/Percona-Lab/ontime-airline-performance. The uncompressed dataset …


Oct

2015

Using Apache Spark and MySQL for Data Analysis

Posted by Alexander Rubin of MySQL Performance Blog on Wed 07 Oct 2015 22:49 UTC
Tags: hadoop, MySQL, Data Science, Apache Spark, data analysis

What is Spark

Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provides performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

Contrary to popular belief, Spark does not require all data to fit into memory but will use caching to speed up the operations …

