/  Technology   /  MySQL: How to Write a Query That Returns the Top Records in a Group
MySQL: How to Write a Query That Returns the Top Records in a Group

MySQL: How to Write a Query That Returns the Top Records in a Group

 

Here you can write a query that returns the top n records from a group or category. Over time,I’ve needed this type of query occasionally, and i always end up either iterating through a result set in code or doing an overly complex multi-query union.In retrospect, both methods are highly inefficient and stupid.

In my quest to learn new database query techniques, I discovered the rank() function in MYSQL 8.0. This common conundrum can be solved with an equally simple technique in MYSQL 5.7 and earlier

Here’s an example.

The Sample Table

To begin , let’s build a sample database table. This is a simple orders table that snaps different customers over four months. It provides an ordered GUID as the primary key and contains an order number, customer number, customer name, order date and order amount.

We will use this same table in both of our MySQL version examples:

CREATE TABLE orders(
id BINARY(16),
order_number INT,
customer_number INT,
customer_name VARCHAR(90),
order_date DATE,
order_amount DECIMAL(13,2),
PRIMARY KEY (`id`)
);

INSERT INTO orders VALUES 
(UNHEX(‘11E92BDEA738CEB7B78E0242AC110002’), 100, 5001, ‘Wayne Enterprises’, ‘2018–11–14’, 100.00),
(UNHEX(‘11E92BDEA73910BBB78E0242AC110002’), 101, 6002, ‘Star Labs’, ‘2018–11–15’, 200.00),
(UNHEX(‘11E92BDEA7395C95B78E0242AC110002’), 102, 7003, ‘Daily Planet’, ‘2018–11–15’, 150.00),
(UNHEX(‘11E92BDEA739A057B78E0242AC110002’), 103, 5001, ‘Wayne Enterprises’, ‘2018–11–21’, 110.00),
(UNHEX(‘11E92BDEA739F892B78E0242AC110002’), 104, 6002, ‘Star Labs’, ‘2018–11–22’, 175.00),
(UNHEX(‘11E92BE00BADD97CB78E0242AC110002’), 105, 6002, ‘Star Labs’, ‘2018–11–23’, 117.00),
(UNHEX(‘11E92BE00BAE15ACB78E0242AC110002’), 106, 7003, ‘Daily Planet’, ‘2018–11–24’, 255.00),
(UNHEX(‘11E92BE00BAE59FEB78E0242AC110002’), 107, 5001, ‘Wayne Enterprises’, ‘2018–12–07’, 321.00),
(UNHEX(‘11E92BE00BAE9D7EB78E0242AC110002’), 108, 6002, ‘Star Labs’, ‘2018–12–14’, 55.00),
(UNHEX(‘11E92BE00BAED1A4B78E0242AC110002’), 109, 7003, ‘Daily Planet’, ‘2018–12–15’, 127.00),
(UNHEX(‘11E92BE021E2DF22B78E0242AC110002’), 110, 6002, ‘Star Labs’, ‘2018–12–15’, 133.00),
(UNHEX(‘11E92BE021E31638B78E0242AC110002’), 111, 5001, ‘Wayne Enterprises’, ‘2018–12–17’, 145.00),
(UNHEX(‘11E92BE021E35474B78E0242AC110002’), 112, 7003, ‘Daily Planet’, ‘2018–12–21’, 111.00),
(UNHEX(‘11E92BE021E39950B78E0242AC110002’), 113, 6002, ‘Star Labs’, ‘2018–12–31’, 321.00),
(UNHEX(‘11E92BE021E3CEC5B78E0242AC110002’), 114, 6002, ‘Star Labs’, ‘2019–01–03’, 223.00),
(UNHEX(‘11E92BE035EF4BE5B78E0242AC110002’), 115, 6002, ‘Star Labs’, ‘2019–01–05’, 179.00),
(UNHEX(‘11E92BE035EF970DB78E0242AC110002’), 116, 5001, ‘Wayne Enterprises’, ‘2019–01–14’, 180.00),
(UNHEX(‘11E92BE035EFD540B78E0242AC110002’), 117, 7003, ‘Daily Planet’, ‘2019–01–21’, 162.00),
(UNHEX(‘11E92BE035F01B8AB78E0242AC110002’), 118, 5001, ‘Wayne Enterprises’, ‘2019–02–02’, 133.00),
(UNHEX(‘11E92BE035F05EF0B78E0242AC110002’), 119, 7003, ‘Daily Planet’, ‘2019–02–05’, 55.00),
(UNHEX(‘11E92BE0480B3CBAB78E0242AC110002’), 120, 5001, ‘Wayne Enterprises’, ‘2019–02–08’, 25.00),
(UNHEX(‘11E92BE25A9A3D6DB78E0242AC110002’), 121, 6002, ‘Star Labs’, ‘2019–02–08’, 222.00);

MySQL 5.7 Example

The rank() function is pretty cool, but it is not available prior to MySQL 8.0. Therefore we will need to write a creative nested query to rank our records and provide the results.

We are going to start by writing a query that ranks all of the records in our table in order of year, month and order amount in descending sequence (so that the largest orders get the lowest scores).

SELECT order_number, customer_number, customer_name, order_date,
YEAR(order_date) AS order_year, 
MONTH(order_date) AS order_month, 
order_amount, 
@order_rank := IF(@current_month = MONTH(order_date),
@order_rank + 1, 1) AS order_rank,
@current_month := MONTH(order_date) 
FROM orders
ORDER BY order_year, order_month, order_amount DESC;

In our example SELECT statement we are getting all the fields from the table along with getting the YEAR from the order date as well as the MONTH. Since our goal is to rank the orders by month I am creating a temporary MySQL variable called @current_month that will keep track of each and every month. On every change of month we reset the @order_rank variable to one, otherwise we increment by one.

Note: Using the := operand allows us to generate a new variable on the fly without requiring the SET command.

Note: Keep in mind this SELECT statement query will rank all of the records in your table. Normally you’d need to have a WHERE clause query that limits the size of the result set. Perhaps by customer or date range.

The query above manufacture a result set that looks like this:

m2

You can see that the orders are sorted by year or month and then by order amount in come down sequence. The new order_rank column is comprise that ranks every order in 1–2–3 sequence by month.

Now we can include this query as a subquery to a SELECT query that only pulls the top 3 orders out of every group. 

That final query looks like this:

SELECT customer_number, customer_name, order_number, order_date, order_amount 
FROM 
(SELECT order_number, customer_number, customer_name, order_date,
YEAR(order_date) AS order_year, 
MONTH(order_date) AS order_month,
order_amount, 
@order_rank := IF(@current_month = MONTH(order_date), 
@order_rank + 1, 1) AS order_rank,
@current_month := MONTH(order_date) 
FROM orders
ORDER BY order_year, order_month, order_amount DESC) ranked_orders 
WHERE order_rank <= 3;

Using our ranking query as a subquery, we only need to pull out the fields we need for reporting. A WHERE clause is added to pull only records with a rank of 3 or less. Below is our final result set:

m3

According to the results, we got the top 3 orders every month.

The MySQL 8.0 Example

In MySQL 8.0, the rank() function adds some additional functionality for ranking records in a result set. The rank() function partitions a result set by a value you specify, then assigns a rank to each row within each partition. Tie records are given the same rank, and subsequent new numbers are given a rank of one plus the number of ranked records before them.

The ranking query we’re using with this new feature looks like this:

SELECT order_number, customer_number, customer_name, order_date,
YEAR(order_date) AS order_year, 
MONTH(order_date) AS order_month, 
order_amount,
RANK() OVER (
PARTITION BY YEAR(order_date), MONTH(order_date)
ORDER BY YEAR(order_date), MONTH(order_date), order_amount DESC) order_value_rank
FROM orders;

m4

In this example, the two greatest orders in December had $321.00 in total. The rank() function returns these two records as 1 and the subsequent record gets a rank of 3, etc. 

This ranking query is used as a subquery for our final query:

WITH ranked_orders AS (
SELECT order_number, customer_number, customer_name, order_date,
YEAR(order_date) AS order_year, 
MONTH(order_date) AS order_month, 
order_amount,
RANK() OVER (
PARTITION BY YEAR(order_date), MONTH(order_date)
ORDER BY YEAR(order_date), 
MONTH(order_date), order_amount DESC) order_rank
FROM orders
)
SELECT customer_number, customer_name, order_number, order_date,
order_amount 
FROM ranked_orders
WHERE order_rank <= 3;

The final query is very similar to our MySQL 5.7 example, but uses some MySQL 8.0 goodness (like the availability of the WITH statement) along with more ranking capabilities that you can research in the MySQL 8.0 documentation. The final results to this query are identical to our MySQL 5.7 results in the example above.

 

Leave a comment