Ranking
Finding the position of a value in a larger set is a common operation. In SQL, ranking is done by RANK() OVER
in conjunction with an ORDER BY
specification.
Examples
- Find the rank of each student
SELECT ID,
RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
FROM student_grades;
/* student_grades(ID, GPA) is a relation
* giving the GPA of each student */
Here, an extra ORDER BY
clause can return the results in sorted order.
SELECT ID,
RANK() OVER (ORDER BY (GPA) DESC) AS s_rank
FROM student_grades
ORDER BY s_rank ASC;
A basic issue with ranking is how to deal with the case of mutiple tuples that are the same values on the ordering attribute(s).
-
Naturally, the
RANK()
function leaves gaps.Meaning that if the highest GPA is shared by two students, both would get rank
1
, and the next rank would be3
. -
There is a
DENSE_RANK()
function which does not leave gaps.
Ranking with partitions
Ranking can be done within partitions of data, using PARTITION BY
.
Example
- Find the rank of students within each department
SELECT ID,
dept_name,
RANK() OVER
(
PARTITION BY dept_name
ORDER BY GPA DESC
) AS dept_rank
FROM dept_grades
ORDER BY dept_name, dept_rank;
-
Multiple rank expressions can be used within a single
SELECT
statement. -
When ranking (possibly with partitioning) occurs along with a
GROUP BY
clause, theGROUP BY
clause is applied first, and partitioning and ranking are done on the results ofGROUP BY
, allowing aggregate values to be used for ranking.
Other ranking related features
-
PERCENT_RANK
: Gives the rank of the tuple as a fraction -
CUME_DIST
: Cumulative distribution -
ROW_NUMBER
: Sorts the rows and gives each row a unique number corresponding to its position- Non-deterministic in presence of duplicates
-
NULLS FIRST
,NULLS LAST
-
NTILE(n)
: Takes the tuples in each partition in the specified order, and divides them inton
buckets with equal numbers of tuples
Example
- Find for each student the quartile they belong to
SELECT ID,
NTILE(4) OVER (ORDER BY GPA DESC) AS quartile
FROM student_grades;
Windowing
Window queries compute an aggregate function over ranges of tuples. They are used to smooth out random variables.
Unlike partitions, windows may overlap, in which case a tuple may contribute to more than one window.
SQL provides a windowing feature to support such queries.
ROWS n_1 PRECEDING AND n_2 FOLLOWING
Example
- Compute the sum for each three days window.
SELECT date,
SUM(value) OVER
(
ORDER BY date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
)
FROM sales;
Other windowing related features
UNBOUNDED
: The number of preceding / following rows are unboundedCURRENT ROW
: Specifies the current row- e.g.,
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- e.g.,
RANGE BETWEEN ...
: Cover all tuples with a particular value rather than covering a specific number of tuples
Windowing with partitions
SQL supports windowing within partitions.
Example
- Find total balance of each account after each transaction on the account
SELECT account_number,
date_time,
SUM(value) OVER
(
PARTITION BY account_number
ORDER BY date_time
ROWS UNBOUNDED PRECEDING
) AS balance
FROM transaction
ORDER BY account_number, date_time;