In my opinion, one of the best things that happened to Planet MySQL
lately, is Explain Extended, a blog by Alex Bolenok (also
known as Quassnoi on Stackoverflow).

I never had the pleasure of meeting Alex in person, but his
articles are always interesting and of high quality, and the SQL
wizardry he pulls off is downright inspiring. I really feel
humbled by the creativity of some of his solutions and his
apparent experience with multiple RDBMS products.

Alex' most recent post is about aggregation, and
finding a top 3 based on the aggregate:

In …

**1**to

**6**

**Percentile**(reset)

I just read SQL: Ranking without self join, in which
Shlomi
Noach shares a nice MySQL-specific trick based on user-defined variables to compute rankings.

Shlomi's trick reminds me somewhat of the trick I came across
little over a year ago to caclulate percentiles. At that time, several
people pointed out to me too that using user-defined variables in
this way can be unreliable.The problem with user-defined
variablesSo what is the problem exaclty? Well, whenever a query
assigns to a variable, and that same variable is read in another
part of the query, you're on thin ice. That's because the …

After stepping off of the `GROUP_CONCAT()`

solution
for calculating quantiles I figured it would be
nice to find a better way to calculate the median too.

Solution

I previously wrote on how to calculate the median using
`GROUP_CONCAT()`

, but I think that this is a better
way:

[Read more]

SELECT AVG(length) AS median -- take the average of left and right median

, MIN(length) AS left_median --

, MAX(length) AS right_median --

, @l AS left_median_position --

, @r AS right_median_position --

FROM (

SELECT @n, length -- @n is just here to …

Are you getting fed up yet with my posts on calculating percentiles? Well, I'm sorry but
I'm not quite finished.

Here's a simple, fast method to calculate the specified number of
quantiles:

[Read more]

-- set the number of quantiles, for exmple:

-- quartiles: 4

-- deciles: 10

-- percentiles: 100

SET @quantiles:=4; -- select quartiles

-- calculate all quantiles

--

--

SELECT amount AS metric

, @n DIV (@c DIV @quantiles) AS quantile

, @n AS N

FROM sakila.payment

CROSS JOIN (

SELECT @n:=0 -- rownumber

, @c:=COUNT(*) …

My previous post on calculating percentiles with
MySQL generated some comments and good discussion. In particular,
I got some very interesting comments from Vladimir.

Basically, Vladimir was doubtful whether the
`GROUP_CONCAT()`

solution would be optimal in
comparison to a `JOIN`

. His proposal is to solve it
like this:

SELECT SUM(g1.r) sr

, g2.length l

, SUM(g1.r)/(SELECT COUNT(*) FROM film) p

FROM (SELECT COUNT(*) r, length FROM film GROUP BY length) g1

JOIN (SELECT COUNT(*) r, length FROM film GROUP BY length) g2

ON g1.length < g2.length

GROUP BY g2.length

HAVING p > 0.9

ORDER BY p

LIMIT 1

First, this query sets up two identical subqueries …

Yesterday, I was on the freenode ##pentaho irc channel when
Andres
Chaves asked me how to calculate the *N*th percentile in MySQL. He saw a
solution somewhere using subqueries, but wasn't too happy about
it.

A while ago I wrote about calulating the median in MySQL, and it turns
out the *N*th percentile can be calculated using a
similar, single-pass approach, not relying on subqueries, UDFs,
or user-defined variables.

The percentile....

So, what is a percentile exactly? Here's what the wikipedia
says:

A percentile is the value of a variable below which a certain
percent of observations fall. So the 20th percentile is the value
(or score) below which 20 …

**1**to

**6**