Showing entries 1 to 6
Displaying posts with tag: parallel query
MySQL 8.0.14: A Road to Parallel Query Execution is Wide Open!

For a very long time – ever since multiple CPU cores became commonly available – I dreamed about MySQL having the ability to execute queries in parallel. This feature was lacking from MySQL, and I wrote a lot of posts on how to emulate parallel queries in MySQL using different methods: from a simple parallel bash script, to Apache Spark, to ClickHouse together with MySQL. I have watched parallelism coming to PostgreSQL, to new databases like TiDB, to …

[Read more]
Using Parallel Query with Amazon Aurora for MySQL

Parallel query execution is my favorite non-existent feature in MySQL. In all versions of MySQL – at least at the time of writing – a single query runs in one thread, effectively utilizing only one CPU core. Multiple queries running at the same time use different threads and can utilize more than one CPU core.

On multi-core machines – which is the majority of hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (e.g. SSDs) we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version that supports parallelism for SELECT queries (utilizing the read capacity of the storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL …

[Read more]
Now available in swanhart-tools: NATIVE asynchronous query execution for any MySQL client!

There is often a need to run queries in the background in MySQL. This is generally accomplished using a message queue (like Gearman), by using extensions to a client (PHP has such extensions), or by using MariaDB's native async query interface (C client only, I think).

While such solutions work well, they don't work for, say, a Go client, or for the mysql command-line client itself.

I present "async", part of the Swanhart Toolkit (http://github.com/greenlion/swanhart-tools). Async is a solution for asynchronous queries based on stored procedures and events.

It consists of:


  • A queue table to hold the SQL to run, the state of execution, error messages, etc
  • A settings table that controls the number of parallel threads to use for executing queries
  • A stored routine …
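
To make the two tables named above concrete, here is a minimal Python sketch of the same idea. It is not the toolkit's implementation (which, per the post, is built from stored procedures and events); the names QueuedQuery, run_async, thread_count and the state values are all hypothetical, and execute is a stand-in for whatever actually runs the SQL.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class QueuedQuery:
    """One row of a hypothetical queue table."""
    sql: str
    state: str = "WAITING"          # WAITING -> RUNNING -> COMPLETED or ERROR
    error: Optional[str] = None     # error message, if execution failed


def run_async(rows: List[QueuedQuery],
              execute: Callable[[str], None],
              settings: Dict[str, int]) -> List[QueuedQuery]:
    """Drain the queue using settings['thread_count'] worker threads."""
    pending: "queue.Queue[QueuedQuery]" = queue.Queue()
    for row in rows:
        pending.put(row)

    def worker() -> None:
        while True:
            try:
                row = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to run
            row.state = "RUNNING"
            try:
                execute(row.sql)
                row.state = "COMPLETED"
            except Exception as exc:  # record the failure on the row
                row.state = "ERROR"
                row.error = str(exc)

    threads = [threading.Thread(target=worker)
               for _ in range(settings["thread_count"])]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rows
```

The settings dictionary plays the role of the settings table: changing thread_count changes how many queued statements run at once, without touching the queue itself.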
[Read more]
Big Data: InfiniDB vs Spider: What else ?

Many of my recent engagements have centered on strategy for implementing real-time Big Data analytics: computing the hardware cost of extending a single table collection with MariaDB and the Parallel Query feature found in the Spider storage engine, to offload columnar MPP storage like InfiniDB or Vertica.

As of today, Parallel Query is only available in the MariaDB Spider releases supported by Spiral Arms. Parallel query with Spider is most efficient on GROUP BY and COUNT queries that use a single Spider table. In such cases the Spider engine executes query pushdown, also known as map-reduce.
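
The pushdown pattern for a GROUP BY with COUNT can be illustrated with a small Python sketch. This is only a model of the idea, not Spider's code: each "backend" is an in-memory list of rows, and the function names backend_group_count and pushed_down_group_count are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List


def backend_group_count(rows: List[dict], key: str) -> Counter:
    """'Map' step: the GROUP BY / COUNT(*) one backend runs on its own shard."""
    return Counter(row[key] for row in rows)


def pushed_down_group_count(shards: List[List[dict]], key: str) -> Dict[Hashable, int]:
    """Run the grouped count on every shard concurrently, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda rows: backend_group_count(rows, key), shards))

    # 'Reduce' step: partial counts for the same group are simply added.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


shards = [
    [{"region": "EU"}, {"region": "US"}],   # rows held by backend 1
    [{"region": "EU"}],                     # rows held by backend 2
]
print(pushed_down_group_count(shards, "region"))
```

Only the small per-group partial results travel back to the coordinating node, which is why this kind of query benefits most: the merge step is cheap compared with shipping every row.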

Spider gets multiple levels of parallel execution for a single partitioned table.

First level is per backend server:
To tell Spider to scan different backends concurrently, set spider_sts_bg_mode=1

Other level is per …

[Read more]
Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node. Shard-Query can add parallelism to queries which use partitioned tables. Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add parallelism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartitioning) on 5.6, but it is limited to RANGE/LIST partitioning methods on earlier versions.
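
As a rough illustration of why the partition hint matters, the Python sketch below turns one aggregate query into one statement per partition using the MySQL 5.6 `PARTITION (...)` selection syntax. This is not Shard-Query's actual rewriting logic; the helper per_partition_queries and the partition names p0..p2 are hypothetical, and lineorder is used only as an example table name.

```python
from typing import List, Optional


def per_partition_queries(select_list: str,
                          table: str,
                          partitions: List[str],
                          where: Optional[str] = None) -> List[str]:
    """Build one SELECT per partition so each can be run on its own connection."""
    queries = []
    for name in partitions:
        sql = f"SELECT {select_list} FROM {table} PARTITION ({name})"
        if where:
            sql += f" WHERE {where}"
        queries.append(sql)
    return queries


# Hypothetical partition names for the example table.
for sql in per_partition_queries("COUNT(*)", "lineorder", ["p0", "p1", "p2"]):
    print(sql)
```

Each generated statement touches exactly one partition, so the statements can run on separate connections at the same time; for COUNT(*) the final answer is the sum of the per-partition results.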

The output from Shard-Query is from the command-line client, but you can use MySQL Proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and …

[Read more]