Showing entries 1 to 8
Displaying posts with tag: parallel query (reset)
NDB Parallel Query, part 5

In this part we are going to analyze a bit more complex query than before.
This query is a 6-way join.

The query is:
SELECT
        supp_nation,
        cust_nation,
        l_year,
        SUM(volume) AS revenue
FROM
        (
                SELECT
                        n1.n_name AS supp_nation,
                        n2.n_name AS cust_nation,
                        extract(year FROM l_shipdate) as l_year,
                        l_extendedprice * (1 - l_discount) AS …

[Read more]
NDB Parallel Query, part 4

In this part we will discuss how NDB batch handling works. Query execution of
complex SQL queries means that more rows can be delivered than the receiver is
capable of receiving. This means that we need to create a data flow from the
producer where the data resides and the query executor in the MySQL Server.

The MySQL Server uses a record where the storage engine have to copy the result
row into the record. This means that the storage of batches of rows is taken
care of by the storage engine.

When NDB performs a range scan it will decide on the possible parallelism before
the scan is started. The NDB API have to allocate enough memory to ensure that
we have memory prepared to receive the rows as they arrive in a flow of result
rows from the data nodes. It is possible to set batch size of hundreds and even
thousands of rows for a query.

The …

[Read more]
NDB Parallel Query, part 3

In the previous part we showed how NDB will parallelise a simple
2-way join query from TPC-H. In this part we will describe how
the pushdown of joins to a storage engine works in the MySQL Server.

First a quick introduction to how a SQL engine handles a query.
The query normally goes through 5 different phases:
1) Receive query on the client connection
2) Query parsing
3) Query optimisation
4) Query execution
5) Send result of query on client connection

The result of 1) is a text string that contains the SQL query to
execute. In this simplistic view of the SQL engine we will ignore
any such things as prepared statements and other things making the
model more complex.

The text string is parsed by 2) into a data structure that represents
the query in objects that match concepts in the SQL engine.

Query …

[Read more]
MySQL 8.0.14: A Road to Parallel Query Execution is Wide Open!

For a very long time – since when multiple CPU cores were commonly available – I dreamed about MySQL having the ability to execute queries in parallel. This feature was lacking from MySQL, and I wrote a lots of posts on how to emulate parallel queries in MySQL using different methods: from simple parallel bash script to using Apache Spark to using ClickHouse together with MySQL. I have watched parallelism coming to PostgreSQL, to new databases like TiDB, to …

[Read more]
Using Parallel Query with Amazon Aurora for MySQL

Parallel query execution is my favorite, non-existent, feature in MySQL. In all versions of MySQL – at least at the time of writing – when you run a single query it will run in one thread, effectively utilizing one CPU core only. Multiple queries run at the same time will be using different threads and will utilize more than one CPU core.

On multi-core machines – which is the majority of the hardware nowadays – and in the cloud, we have multiple cores available for use. With faster disks (i.e. SSD) we can’t utilize the full potential of IOPS with just one thread.

AWS Aurora (based on MySQL 5.6) now has a version which will support parallelism for SELECT queries (utilizing the read capacity of storage nodes underneath the Aurora cluster). In this article, we will look at how this can improve the reporting/analytical query performance in MySQL. I will compare AWS Aurora with MySQL …

[Read more]
Now available in swanhart-tools: NATIVE asynchronous query execution for any MySQL client!

There is often a need to run queries in the background in MySQL. This generally is accomplished using a message queue (like gearman), or by using extensions to a client (PHP has such extensions) or by using MariaDB's native async query interface (C client only I think).

While such solutions work well, they don't work for say a GO client, or for the mysql command line client itself.

I present "async"; part of the Swanhart Toolkit (http://github.com/greenlion/swanhart-tools). Async is a stored procedure and event based solution for asynchronous queries.

It consists of:


  • A queue table to hold the SQL to run, the state of execution, error messages, etc
  • A settings table that controls the number of parallel threads to use for executing queries
  • A stored routine …
[Read more]
Big Data: InfiniDB vs Spider: What else ?

Many of my recent engagements have been all around strategy to implement Real Time Big Data Analytics: Computing hardware cost of extending a single table collection with MariaDB and Parallel Query found in the Spider storage engine to offload columnar MPP storage like InfiniDB or Vertica.

As of today Parallel Query is only available from releases of MariaDB Spider supported by spiral arms. The more efficient way to use parallel query with Spider can be done on group by, and count queries that use a single spider table. In such case Spider Engine will execute query push down AKA map reduce.

Spider gets multiple levels of parallel execution for a single partitioned tables.

First level is per backend server:
The way to actually tell spider to scan different backends in concurrency is to set  spider_sts_bg_mode=1

Other level is per …

[Read more]
Parallel Query for MySQL with Shard-Query

While Shard-Query can work over multiple nodes, this blog post focuses on using Shard-Query with a single node.  Shard-Query can add parallelism to queries which use partitioned tables.  Very large tables can often be partitioned fairly easily. Shard-Query can leverage partitioning to add paralellism, because each partition can be queried independently. Because MySQL 5.6 supports the partition hint, Shard-Query can add parallelism to any partitioning method (even subpartioning) on 5.6 but it is limited to RANGE/LIST partitioning methods on early versions.

The output from Shard-Query is from the commandline client, but you can use MySQL proxy to communicate with Shard-Query too.

In the examples I am going to use the schema from the Star Schema Benchmark.  I generated data for scale factor 10, which means about 6GB of data in the largest table. I am going to show a few different queries, and …

[Read more]
Showing entries 1 to 8