Search This Blog

Monday, September 27, 2010

Ranking functions

Ranking functions:

1.RANK
2.DENSE_RANK
3.ROW_NUMBER
4.NTILE
5.WIDTH_BUCKET
6.CUME_DIST
7.PERCENT_RANK
8.FIRST / LAST (KEEP)


1.RANK: Starts with 1, if duplicate values, same rank is assigned. Gaps appear in the ranks.

2.DENSE_RANK: Starts with 1, if duplicate values,same rank is assigned. No gap in ranks.

3.ROW_NUMBER: Starts with 1 & unique, if duplicate values,numbers are arbitrarily assigned.

The PARTITION BY clause used in ranking functions is used to divide a result set into pieces so that rankings can be applied within each subset.

Handling NULLS:

All ranking functions allow to specify where in the ranking order NULL values should appear.

Append NULLS FIRST / NULLS LAST after the ORDER BY clause of the function.

USAGE: Top/Bottom-N queries



FIRST / LAST: (KEEP)

"Find the regions with the best and worst total sales last year."

Example:

MIN(region_id)
    KEEP (DENSE_RANK FIRST ORDER BY SUM(tot_sales) DESC) best_region,
MIN(region_id)
    KEEP (DENSE_RANK LAST ORDER BY SUM(tot_sales) DESC) worst_region

NTILE:

Group the records into buckets ( equiheight buckets ).

Example:

 NTILE(4) OVER (ORDER BY SUM(tot_sales) DESC) will divide it into four buckets.


WIDTH_BUCKET:(equiwidth buckets)

WIDTH_BUCKET function groups rows of the result set into buckets.

WIDTH_BUCKET function attempts to create equiwidth buckets, meaning that the range of values is evenly distributed across the buckets.

WIDTH_BUCKET can operate on numeric or date types.

WIDTH_BUCKET uses the values of the second, third, and fourth parameters to generate N buckets.

If the expression yields values that fall outside the range specified by the second and third parameters, the WIDTH_BUCKET function will generate two additional buckets, numbered 0 and N+1, into which the outliers are placed.

Example:
 WIDTH_BUCKET(SUM(tot_sales), 1, 3000000, 3) sales_buckets

Based on these parameters, the WIDTH_BUCKET function generates three buckets;
the first bucket starts at 1, and the third bucket has an upper range of 3,000,000.
Since there are three buckets, the ranges for each bucket will be
1 to 1,000,000,
1,000,0001 to 2,000,000, and
2,000,0001 to 3,000,000.
Any values less than 1 or greater than 3,000,000 will be placed to 0th and 4th buckets.

NOTE:

Keep in mind that the WIDTH_BUCKET function does not remove rows from the result set that do not lie within the specified range; rather, they are placed into special buckets that your query can either utilize or ignore as needed.


CUME_DIST and PERCENT_RANK :

CUME_DIST( )  : (Cumulative Distribution)
   Ratio of the number of rows that have a lesser or equal ranking to the total number of rows in the partition.

PERCENT_RANK( ) :
   Ratio of a row's ranking to the number of rows in the partition.

   PERCENT_RANK=((Rank of the row -1) / (Total rows -1))

Example:

CUME_DIST(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_cume_dist
PERCENT_RANK(  ) OVER (ORDER BY SUM(tot_sales) DESC) sales_percent_rank

No comments:

Post a Comment