DB2 SQL OLAP : RANK and DENSE_RANK
In any exam how are participants are ranked ?
- Participant(s) with highest marks got the 1st rank.
- Participant(s) with 2nd highest marks got the 2nd rank and so on.
Here is a sample table with participant names with marks
- Rank function work in almost same manner as we define ranks in any exam.
- There is a minor difference in DENSE_RANK() and RANK() functions.
- Same as ROW_NUMBER() OVER() function is required.
- But ORDER BY parameter is required in OVER() function.
- Black OVER() will not work
- If you “PARTITION BY” it must be followed by “ORDER BY”.
- “ORDER BY” defines how to want to give the ranks.
- This work exactly as the exam ranking login defined in the starting of this article.
- Based on “ORDER BY” top most row and all other matching rows get rank 1.
- 2nd and all matching rows get rank 2 and so on.
- There will be not gaps in rank sequence. Means no sequence number will be missing between 1 and highest rank sequence.
select a.*, DENSE_RANK() OVER(ORDER BY MARKS DESC) as DENSE_RANK from RESULT a
- Both rows with 900 marks are ranked 1.
- Row with 800 marks is ranked 2.
- All 3 rows with 600 marks are ranked 4.
OVER(PARTITION BY… ORDER BY…)
- “PARTITION BY” works same as in ROW_NUMBER() function.
- Create a virtual partition in rows and each partition has its own set of sequence number starting with 1.
select a.*, DENSE_RANK() OVER(PARTITION BY MARKS ORDER BY NAME) as DENSE_RANK from RESULT a
- Partitions are created by MARKS so each set of equal marks get its own set of rank sequence.
- After that ranks are assigned based on NAME (ORDER BY NAME).
Here is how RANK() is different from DENSE_RANK().
- If you ask all participants to make a queue by their marks only.(Highest mark in front and lowest in back).
- Queue will look like some this
|Position in queue||1||2||3||4||5||6||7||8||9||10|
Here is how RANK() works
- Same marks always get the same rank. So
- Top 900 occupy position 1 and 2. But you can not tell which one will go for position 1 and which one for position 2.
- So both gets RANK 1
- Both 900 got RANK 1 but they occupied first 2 positions
- So 800 will get the RANK 3. This will create a gap as there is no one with RANK 2.
- 700 will get the RANK 4.
- Three 600 with occupy position 5,6 and 7 and again you can not tell which one gets which position.
- So assign all three same RANK 5.
- Now 500 will get RANK 8 creating gaps for RANK 6 and RANK 7.
- and SO ON…
This example with RANK() and DENSE_RANK() will help you understand how RANK() works in comparison of DENSE_RANK()
select a.*, RANK() OVER(ORDER BY MARKS DESC) as RANK, DENSE_RANK() OVER(ORDER BY MARKS DESC) as DENSE_RANK from RESULT a
- There is not gaps in DENSE_RANK with all values between 1 to 7.
- Maximum rank can be less than total number of rows.
- There are gaps es in RANK with missing ranks 2,6 and 7.
- Maximum rank is always equal to number of rows(if PARITION BY is not used).
Its not like that gaps will occur in every case of RANK(). Based on values used for “ORDER BY” gaps may or may not occur.
select a.*, RANK() OVER(ORDER BY MARKS DESC, name) as RANK from RESULT a
- No gaps
RANK() OVER(PARTITION BY… ORDER BY…) work in same concept as in DENSE_RANK() OVER(PARTITION BY… ORDER BY…) as above.