In this part we are going to analyze a bit more complex query
than before.
This query is a 6-way join.
The query is:
SELECT
supp_nation,
cust_nation,
l_year,
SUM(volume) AS revenue
FROM
(
SELECT
n1.n_name AS supp_nation,
n2.n_name AS cust_nation,
extract(year FROM l_shipdate) as
l_year,
l_extendedprice * (1 - l_discount) AS …
In the previous part we showed how NDB will parallelise a
simple
2-way join query from TPC-H. In this part we will describe
how
the pushdown of joins to a storage engine works in the MySQL
Server.
First a quick introduction to how a SQL engine handles a
query.
The query normally goes through 5 different phases:
1) Receive query on the client connection
2) Query parsing
3) Query optimisation
4) Query execution
5) Send result of query on client connection
The result of 1) is a text string that contains the SQL query
to
execute. In this simplistic view of the SQL engine we will
ignore
any such things as prepared statements and other things making
the
model more complex.
The text string is parsed by 2) into a data structure that
represents
the query in objects that match concepts in the SQL engine.
Query …
Summary: An alternate approach, offered in response to our
original post, provides excellent improvements
for smaller databases, but clustered indexes offer better
performance as database size increases. (This posting is by
Dave.)
Jay Pipes suggested an alternate approach to improving MySQL
performance of Query 17 on a TPC-H-like database.
-
- Add the index (l_partkey, l_quantity) to the lineitem
table.
- Re-write the query as:
select sum(li.l_extendedprice) / 7.0 as avg_yearly from lineitem li inner join part p on li.l_partkey = p.p_partkey inner join ( select l_partkey, 0.2 * avg(l_quantity) as quantity from lineitem group by l_partkey ) as quantities on …
Executive Summary: A query like TPC-H Query 17 can be sped up by
large factors by using straight_joins and clustering indexes. (This entry posted
by Dave.)
In a previous post, we wrote about queries like TPC-H query 2,
and the use of straight_join to improve performance.
This week, we consider Query 17, described by the TPC-H
documentation as
“The Small-Quantity-Order Revenue Query considers parts of a
given brand and with a given container type and determines the
average lineitem quantity of such parts ordered for all orders
(past and pending) in the 7-year database. What would the
average yearly gross (undiscounted) loss in revenue if orders for
these parts with a quantity of this average were no longer
taken?”
Our initial run on Q17 (same hardware as before) timed out …
[Read more]Following on from the announcement at the MySQL conference where Sun and Kickfire jointly announced data warehousing benchmark records, we have just announced new TPC-H benchmark records. Specifically, the Kickfire Database Appliance 2400 is the highest price/performance offering at 300GB, again breaking the $1 barrier for the first time coming in at 89 cents per QphH (Queries per hour on the TPC-H benchmark). The 2400 is also the highest performance (non-clustered) offering at 300GB.
I’m not going to further dwell on the numbers in this post other than to quickly point out another aspect of this achievement that Justin noted in his blog related to the energy savings the Kickfire …
[Read more]Today, we officially launched Kickfire. As part of our announcement we published, together with Sun Microsystems, record-breaking TPC-H benchmark numbers (data warehousing industry benchmarks) as well as a series of significant partnerships in the Open Source world.
There has been a lot of work here over the last two years to get us to this point and I am very proud of the team for getting us to where we are today. Two years ago we just had a vision; today that vision became reality – one substantiated by independent industry benchmarks.
For those of you unfamiliar with these benchmarks let me give you a brief overview to explain why we …
[Read more]