Posts Tagged ‘CUME_DIST()’

CUME_DIST is a very nice & helpful analytical function introduced in SQL SERVER 2012. In this article, we will discuss its syntax, purpose, return type, simple examples and real world examples also.


    OVER ( [ partition_by_clause ] order_by_clause )

The purpose of this function is to calculate the cumulative distribution of value in a group of values.

Return Type
The return type is float(53) and the values are always between 0 and 1.

Example 1 :  Simple CUME_DIST ()

Lets take gold rates as an example to check their cumulative distribution in one week.

Create table [Daily_Gold_Rate]
[S.No] int,
[Date] datetime,
[Carat] int,
[Gold Rate] numeric(18,2)

Insert into [Daily_Gold_Rate] values(1,'2012-12-03',18,155.00)
Insert into [Daily_Gold_Rate] values(4,'2012-12-04',18,153.00)
Insert into [Daily_Gold_Rate] values(7,'2012-12-05',18,152.00)
Insert into [Daily_Gold_Rate] values(10,'2012-12-06',18,154.50)
Insert into [Daily_Gold_Rate] values(13,'2012-12-07',18,154.50)


Row_Number() OVER (ORDER BY [Carat],[Gold Rate]) as [Row Number]
,[Gold Rate]
,CUME_DIST () OVER (ORDER BY [Carat],[Gold Rate]) AS [CUME_DIST]
from [Daily_Gold_Rate]

Explanation :

If you look at Column E and Column H, Column E is calculated by SQL Server and Column H is calculated manually to understand how it works. To calculate CUME_DIST manually, you need two values.

1. Row Number based on the values (Meaning less than or equal to value) (Column F)
2. Total Number of Records (Column G)

Column G is simple; you need to calculate the total number of records.
Column F: You simply need to get the row number based on the values. You can observe that it is simple row number till row number 2 but in row number 3 & 4 we found the same Gold rate, so it picked up 4 (due to less than or equal to value criteria) as row number.

Finally, you need to divide Column F by Column G to get the CUME_DIST manually. The same functionality CUME_DIST does automatically in SQL.

Example 2 : CUME_DIST () with Partition By Clause

Lets insert other Gold rates to proceed with this example.

Insert into [Daily_Gold_Rate] values(2,'2012-12-03',22,190.50)
Insert into [Daily_Gold_Rate] values(3,'2012-12-03',24,202.23)

Insert into [Daily_Gold_Rate] values(5,'2012-12-04',22,191.00)
Insert into [Daily_Gold_Rate] values(6,'2012-12-04',24,202.25)

Insert into [Daily_Gold_Rate] values(8,'2012-12-05',22,190.00)
Insert into [Daily_Gold_Rate] values(9,'2012-12-05',24,203.25)

Insert into [Daily_Gold_Rate] values(11,'2012-12-06',22,189.50)
Insert into [Daily_Gold_Rate] values(12,'2012-12-06',24,201.50)

Insert into [Daily_Gold_Rate] values(14,'2012-12-07',22,189.00)
Insert into [Daily_Gold_Rate] values(15,'2012-12-07',24,201.00)

Row_Number() OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
as [Row Number]
,[Gold Rate]
,CUME_DIST () OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
from [Daily_Gold_Rate]

Reference :MSDN

Read Full Post »