Feeds:
Posts
Comments

Archive for the ‘Analytic Function’ Category

In this article, we will discuss another important analytical function introduced in SQL SERVER 2012 namely PERCENTILE_CONT. Lets discuss PERCENTILE_CONT syntax, purpose, return type with simple examples.

Syntax

PERCENTILE_CONT ( numeric_literal )
    WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
    OVER ( [ <partition_by_clause> ] )

Purpose
The purpose of this function is to calculate the percentile based on a continuous distribution of the field. In other words you can say that it also calculates the median but just to base it on the median, is not correct.

Return Type
The return type is float(53) and the value of percentile should be between 0 and 1.

Formula
Given below is the formula that will help you to understand how it works.
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

Where
RN = Row number
CRN = Ceiling of RN
FRN = Floor of RN

Lets implement this formula in the examples :
Example 1 : When PERCENTILE_CONT is 0.5
Lets take gold rate as an example to check their percentile based on a continuous distribution of the gold rate in one week.

Create table [Daily_Gold_Rate]
(
[S.No] int,
[Date] datetime,
[Carat] int,
[Gold Rate] numeric(18,2)
)

Insert into [Daily_Gold_Rate] values(1,'2013-01-03',18,155.00)
Insert into [Daily_Gold_Rate] values(2,'2013-01-03',22,190.50)
Insert into [Daily_Gold_Rate] values(3,'2013-01-03',24,202.23)

Insert into [Daily_Gold_Rate] values(4,'2013-01-04',18,153.00)
Insert into [Daily_Gold_Rate] values(5,'2013-01-04',22,191.00)
Insert into [Daily_Gold_Rate] values(6,'2013-01-04',24,202.25)

Insert into [Daily_Gold_Rate] values(7,'2013-01-05',18,150.00)
Insert into [Daily_Gold_Rate] values(8,'2013-01-05',22,190.00)
Insert into [Daily_Gold_Rate] values(9,'2013-01-05',24,203.25)

Insert into [Daily_Gold_Rate] values(10,'2013-01-06',18,158.00)
Insert into [Daily_Gold_Rate] values(11,'2013-01-06',22,189.50)
Insert into [Daily_Gold_Rate] values(12,'2013-01-06',24,201.50)

Select
[Date]
,[Carat]
,[Gold Rate]
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [Gold rate])
OVER (PARTITION BY [Carat]) AS MedianCont
from [Daily_Gold_Rate]

percent_cont1.1

Lets calculate the formula in excel and view the results.
if Percentile is 0.5 then RN is not equal to CRN and is not equal to FRN. Therefore the formula will be
(CRN – RN) * (value of expression for row at FRN) +
(RN – FRN) * (value of expression for row at CRN)

percent_cont_excel1.1

Example 2 : When PERCENTILE_CONT is 1

Select
[Date]
,[Carat]
,[Gold Rate]
,PERCENTILE_CONT(1) WITHIN GROUP (ORDER BY [Gold rate])
OVER (PARTITION BY [Carat]) AS MedianCont
from [Daily_Gold_Rate]

percent_cont1.3-1

Lets calculate the formula in excel and view the results.
if Percentile is 1 then RN=CRN=FRN and the formula will be
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)

percent_cont_excel1.2

Reference :MSDN

Read Full Post »

In my previous article I discussed about CUME_DIST. In this artcile we will discuss another important analytical function introduced in SQL SERVER 2012 and that is similar to CUME_DIST namely PERCENT_RANK. Lets discuss PERCENT_RANK syntax, purpose, return type, simple examples.

Syntax

PERCENT_RANK( )
    OVER ( [ partition_by_clause ] order_by_clause )

Purpose
The purpose of this function is to calculate the relative rank of a row within a group of rows. PERCENT_RANK of any set of First row value will be 0. PERCENT_RANK includes NULL values but they are treated as the lowest possible values.

Return Type
The return type is float(53) and the values are always between 0 and 1.

Example 1 :  Simple PERCENT_RANK()

Lets take gold rates as an example to check their relative rank in one week.

Create table [Daily_Gold_Rate]
(
(
[S.No] int,
[Date] datetime,
[Carat] int,
[Gold Rate] numeric(18,2)
)
Insert into [Daily_Gold_Rate] values(1,'2013-01-03',18,155.00)
Insert into [Daily_Gold_Rate] values(4,'2013-01-04',18,153.00)
Insert into [Daily_Gold_Rate] values(7,'2013-01-05',18,152.00)
Insert into [Daily_Gold_Rate] values(10,'2013-01-06',18,154.50)
Insert into [Daily_Gold_Rate] values(13,'2013-01-07',18,154.50)

GO

Select
Row_Number() OVER (ORDER BY [Carat],[Gold Rate]) as [Row Number]
,[Date]
,[Carat]
,[Gold Rate]
,PERCENT_RANK () OVER (ORDER BY [Carat],[Gold Rate]) AS [PERCENT_RANK]
from [Daily_Gold_Rate]

Explanation :


If you look at Column E and Column H, Column E is calculated by SQL Server and Column H is calculated manually to understand how it works. To calculate PERCENT_RANK() manually, you need two values.

1. Row Number based on the values (meaning less than or equal to value) (Column F). But first value will always be 0.
2. Total Number of Records – 1 (Column G)

Column G: It is simple; you need to calculate the total number of records and reduce 1 from it.
Column F: You simply need to get the row number based on the values. But first row number will be 0 as well, so you need start counting it from row number 2. You can observe that it is simple row number till row number 2 but in row number 3 & 4 we found the same Gold rate, so it picked up 2 (due to less than or equal to value criteria) as row number.

Finally, you need to divide Column F by Column G to get the PERCENT_RANK() manually. The same functionality PERCENT_RANK() does automatically in SQL.

Example 2 : PERCENT_RANK() with Partition By Clause

Lets insert other Gold rates to proceed with this example.

Insert into [Daily_Gold_Rate] values(2,'2013-01-03',22,190.50)
Insert into [Daily_Gold_Rate] values(3,'2013-01-03',24,202.23)

Insert into [Daily_Gold_Rate] values(5,'2013-01-04',22,191.00)
Insert into [Daily_Gold_Rate] values(6,'2013-01-04',24,202.25)

Insert into [Daily_Gold_Rate] values(8,'2013-01-05',22,190.00)
Insert into [Daily_Gold_Rate] values(9,'2013-01-05',24,203.25)

Insert into [Daily_Gold_Rate] values(11,'2013-01-06',22,189.50)
Insert into [Daily_Gold_Rate] values(12,'2013-01-06',24,201.50)

Insert into [Daily_Gold_Rate] values(14,'2013-01-07',22,189.00)
Insert into [Daily_Gold_Rate] values(15,'2013-01-07',24,201.00)

Select
Row_Number() OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
as [Row Number]
,[Date]
,[Carat]
,[Gold Rate]
,PERCENT_RANK () OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
AS [PERCENT_RANK]
from [Daily_Gold_Rate]

Reference :MSDN

Read Full Post »

CUME_DIST is a very nice & helpful analytical function introduced in SQL SERVER 2012. In this article, we will discuss its syntax, purpose, return type, simple examples and real world examples also.

syntax

CUME_DIST( )
    OVER ( [ partition_by_clause ] order_by_clause )

purpose
The purpose of this function is to calculate the cumulative distribution of value in a group of values.

Return Type
The return type is float(53) and the values are always between 0 and 1.

Example 1 :  Simple CUME_DIST ()

Lets take gold rates as an example to check their cumulative distribution in one week.

Create table [Daily_Gold_Rate]
(
(
[S.No] int,
[Date] datetime,
[Carat] int,
[Gold Rate] numeric(18,2)
)

Insert into [Daily_Gold_Rate] values(1,'2012-12-03',18,155.00)
Insert into [Daily_Gold_Rate] values(4,'2012-12-04',18,153.00)
Insert into [Daily_Gold_Rate] values(7,'2012-12-05',18,152.00)
Insert into [Daily_Gold_Rate] values(10,'2012-12-06',18,154.50)
Insert into [Daily_Gold_Rate] values(13,'2012-12-07',18,154.50)

GO

Select
Row_Number() OVER (ORDER BY [Carat],[Gold Rate]) as [Row Number]
,[Date]
,[Carat]
,[Gold Rate]
,CUME_DIST () OVER (ORDER BY [Carat],[Gold Rate]) AS [CUME_DIST]
from [Daily_Gold_Rate]

Explanation :


If you look at Column E and Column H, Column E is calculated by SQL Server and Column H is calculated manually to understand how it works. To calculate CUME_DIST manually, you need two values.

1. Row Number based on the values (Meaning less than or equal to value) (Column F)
2. Total Number of Records (Column G)

Column G is simple; you need to calculate the total number of records.
Column F: You simply need to get the row number based on the values. You can observe that it is simple row number till row number 2 but in row number 3 & 4 we found the same Gold rate, so it picked up 4 (due to less than or equal to value criteria) as row number.

Finally, you need to divide Column F by Column G to get the CUME_DIST manually. The same functionality CUME_DIST does automatically in SQL.

Example 2 : CUME_DIST () with Partition By Clause

Lets insert other Gold rates to proceed with this example.

Insert into [Daily_Gold_Rate] values(2,'2012-12-03',22,190.50)
Insert into [Daily_Gold_Rate] values(3,'2012-12-03',24,202.23)

Insert into [Daily_Gold_Rate] values(5,'2012-12-04',22,191.00)
Insert into [Daily_Gold_Rate] values(6,'2012-12-04',24,202.25)

Insert into [Daily_Gold_Rate] values(8,'2012-12-05',22,190.00)
Insert into [Daily_Gold_Rate] values(9,'2012-12-05',24,203.25)

Insert into [Daily_Gold_Rate] values(11,'2012-12-06',22,189.50)
Insert into [Daily_Gold_Rate] values(12,'2012-12-06',24,201.50)

Insert into [Daily_Gold_Rate] values(14,'2012-12-07',22,189.00)
Insert into [Daily_Gold_Rate] values(15,'2012-12-07',24,201.00)

Select
Row_Number() OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
as [Row Number]
,[Date]
,[Carat]
,[Gold Rate]
,CUME_DIST () OVER (Partition by [Carat] ORDER BY [Carat],[Gold Rate])
AS [CUME_DIST]
from [Daily_Gold_Rate]

Reference :MSDN

Read Full Post »

In my previous article, I discussed about LAG. Today, we will discuss another analytical function namely“LEAD” in SQL SERVER 2012. This function performs exactly opposite functionality of LAG. In simple words we can say that if we need to pick up any value from the next row(s) of the current row, we can use this function and get it, instead of using self joins.

Let me explain its syntax, parameters, purpose and examples in detail.
Syntax :

    LEAD ( scalar_expression [ ,offset ] , [ default ] ) 
    OVER ( [ partition_by_clause ] order_by_clause )

Parameters :

@scalar_expression :The value to be returned based on the offset. (Mandatory)
@offset : The number of rows ahead of the rows from where we need to pick the value. Default value is 1 (Optional)
@default : If @scalar_expression is NULL, then you can define any default value here to be returned (Optional)

Purpose :
The purpose of this function is to get the next row in front of the current row in the same result without using any self joins. And once you get the next row parallel to current, you can easily compare their values and perform any analytical task.

Lets create an example to explain “LEAD” function in SQL SERVER 2012.

USE tempdb
GO
Create Table [Test_Table]
(
[S.No] int,
[Letters] varchar(50)
)
GO
Insert into [Test_Table] values (1,'LETTER A')
Insert into [Test_Table] values (2,'LETTER B')
Insert into [Test_Table] values (3,'LETTER C')
Insert into [Test_Table] values (5,'LETTER D')
Insert into [Test_Table] values (6,'LETTER E')
Insert into [Test_Table] values (7,'LETTER F')
Insert into [Test_Table] values (8,'LETTER G')
Insert into [Test_Table] values (9,'LETTER H')
Insert into [Test_Table] values (10,'LETTER I')

Let me explain this with simple examples.

Example-1 : LEAD- WITH DEFAULT VALUES

Select [S.No],[Letters]
,LEAD([Letters]) Over (Order By [Letters]) as [Next Value]
from  [Test_Table]

Example-2 : LEAD – REPLACE DEFAULT VALUE NULL WITH EMPTY STRING
In the above example, you can view that if LEAD could not find the value it gives you NULL. In order to avoid NULL, you can pass the third parameter and it will be replaced by NULL.
In the example given below, I replaced NULL with empty spaces (”).

Select [S.No],[Letters]
,LEAD([Letters],1,'') Over (Order By [Letters]) as [Next Value]
from  [Test_Table]

Example-3 : LEAD – CHANGE OFFSET VALUE FROM DEFAULT TO 2
In the above examples, you can view that LEAD function picks one row subsequent value than current row.
In this example, we will set the offset value to 2 to pick up two rows’ next value than current.

Select [S.No],[Letters]
,LEAD([Letters],2,'') Over (Order By [Letters]) as [Next Value]
from  [Test_Table]

Example-4 : LEAD – REAL WORLD – FIND Promotions of Employees
In the earlier version of SQL SERVER, if you need to find current designation with the promotion of any employee in the same result set , you need to use the self join to achieve it. But in SQL SERVER 2012, you can use the LEAD function to achieve it.

Lets create an example to explain this.

Use tempdb
Create table [tbl_Promotion]
(
[S.No] int,
[Date] datetime,
[Employee Name] varchar(50),
[Designation]  varchar(50)
)
 
Insert into [tbl_Promotion] values(1,'2009-08-20','Imran','Assistant Manager')
Insert into [tbl_Promotion] values(2,'2011-11-21','Imran','Manager')
 
Insert into [tbl_Promotion] values(3,'2010-09-05','Bob','Technical Manager')
Insert into [tbl_Promotion] values(4,'2012-10-06','Bob','Technical Director')
 
Insert into [tbl_Promotion] values(5,'2012-01-10','Robert','Junior Developer')
Insert into [tbl_Promotion] values(6,'2012-06-11','Robert','Developer')
 
GO
Select * from [tbl_Promotion]
 
GO
 
Select [S.No],[Date],[Employee Name],[Designation]
,LEAD ([Designation],1,0) Over (Partition By [Employee Name] Order By [S.No]) as [Next Promotion]
from [tbl_Promotion]
Go
Drop Table [tbl_Promotion]

Reference :MSDN

Read Full Post »

Today, we will discuss one of the important analytical functions namely “LAG” in SQL SERVER 2012. In simple words we can say that if we need to pick up any value from the previous rows in the current row, we can use this function and get it, instead of using self joins.

Lets discuss each and every aspect of this function. Also we will discuss the need/importance of this function in SQL server.

By using this function, we can reduce the number of codes because in the previous versions of SQL SERVER, to do the same, we had to use self joins to achieve it.

Let me explain its syntax, parameters, purpose and examples in detail.
Syntax :

      LAG (scalar_expression [,offset] [,default])
    OVER ( [ partition_by_clause ] order_by_clause )

Parameters :

@scalar_expression :The value to returned based on the offset. (Mandatory)
@offset : The number of rows back from the rows from where we need to pick the value. Default value is 1 (Optional)
@default : If @scalar_expression is NULL, then you can define any default value here to be returned.(Optional)

Purpose :
The purpose of this function is to get the previous row next to current row in the same result without using any self joins. And once you will get the previous row parallel to current, you can easily compare their values and perform any analytical task.

Lets create an example to explain “LAG” function in SQL SERVER 2012.

USE tempdb 
GO
Create Table [Test_Table]
(
[S.No] int,
[Letters] varchar(50)
)
GO
Insert into [Test_Table] values (1,'LETTER A')
Insert into [Test_Table] values (2,'LETTER B')
Insert into [Test_Table] values (3,'LETTER C')
Insert into [Test_Table] values (5,'LETTER D')
Insert into [Test_Table] values (6,'LETTER E')
Insert into [Test_Table] values (7,'LETTER F')
Insert into [Test_Table] values (8,'LETTER G')
Insert into [Test_Table] values (9,'LETTER H')
Insert into [Test_Table] values (10,'LETTER I')

Let me explain this with simple examples.

Example-1 : LAG – WITH DEFAULT VALUES

Select [S.No],[Letters]
,LAG([Letters]) Over (Order By [Letters]) as [Previous Value]
from  [Test_Table]

Example-2 : LAG – REPLACE DEFAULT VALUE NULL WITH EMPTY STRING
In the above example, you can view that if LAG could not find the value it gives you NULL. In order to avoid NULL, you can pass the third parameter and it will be replaced by NULL.
Given below example, I replaced NULL with empty spaces (”).

Select [S.No],[Letters]
,LAG([Letters],1,'') Over (Order By [Letters]) as [Previous Value]
from  [Test_Table]

Example-3 : LAG – CHANGE OFFSET VALUE FROM DEFAULT OF 2
In the above examples, you can view that LAG function picks one row previous value than current row.
In this example, we will set the offset value to 2 to pick up two rows previous value than current.

Select [S.No],[Letters]
,LAG([Letters],2,'') Over (Order By [Letters]) as [Previous Value]
from  [Test_Table]

Example-4 : LAG – REAL WORLD – FIND PREVIOUS DAY EXCHANGE RATE
In the earlier version of SQL SERVER, if you need to find previous day exchange rate, you need to use the self join to achieve it. But in SQL SERVER 2012, you can use the LAG function to achieve it.

Lets create an example to explain this.

Use tempdb
GO
Create table [Daily_Currency_Rate]
(
[S.No] int,
[Date] datetime,
[Currency] varchar(50),
[Exchange Rate] numeric(18,4)
)

Insert into [Daily_Currency_Rate] values(1,'2012-11-20','Indian Rupee (INR)',14.86)
Insert into [Daily_Currency_Rate] values(2,'2012-11-21','Indian Rupee (INR)',14.87)

Insert into [Daily_Currency_Rate] values(3,'2012-11-20','Pakistani Rupee (PKR)',25.84)
Insert into [Daily_Currency_Rate] values(4,'2012-11-21','Pakistani Rupee (PKR)',25.83)

Insert into [Daily_Currency_Rate] values(5,'2012-11-20','Sri Lankan Rupee (LKR)',35.38)
Insert into [Daily_Currency_Rate] values(6,'2012-11-21','Sri Lankan Rupee (LKR)',35.44)

GO
Select * from [Daily_Currency_Rate]

GO

Select [S.No],[Date],[Currency],[Exchange Rate]
,LAG([Exchange Rate],1,0) Over (Partition By [Currency] Order By [Currency]) as [Yesterday Rate]
from [Daily_Currency_Rate]

GO
Drop Table [Daily_Currency_Rate]

Reference :MSDN

Read Full Post »

In my previous Post, I discussed about FIRST_VALUE.Today, we will discuss another important analytical function namely “LAST_VALUE”. We can also say this is the reverse of FIRST_VALUE analytical function.

Lets discuss this function syntax, purpose and examples in detail.

Syntax :

 LAST_VALUE ( [scalar_expression ) OVER ( [ partition_by_clause ] order_by_clause rows_range_clause )

Purpose :
It returns the Last value from a list in a specified order. Also we can define partition & rows range parameter in this function, but these are optional parameters. By default Row range is between unbounded preceding and current row or range unbounded preceding.
It is better to define “row range” parameter in the LAST_Value function. Later in this article, we will also discuss the importance of “row range” parameter in LAST_VALUE.

Lets create a student table and insert few records in it to demonstrate LAST_VALUE.

Create Table [Student]
(
[Student ID] int,
[Student Name] varchar(50),
[Subject] varchar(50),
[Marks] int
)
GO

Insert into [Student] values (3,'Derek','English',100)
Insert into [Student] values (3,'Derek','Math',60)
Insert into [Student] values (3,'Derek','Science',90)

Insert into [Student] values (1,'Bob','English',80)
Insert into [Student] values (1,'Bob','Math',75)
Insert into [Student] values (1,'Bob','Science',60)

Insert into [Student] values (2,'Sandra','English',70)
Insert into [Student] values (2,'Sandra','Math',80)
Insert into [Student] values (2,'Sandra','Science',40)
Select * from [Student] Order By [Student ID]

Example-1 :
In this example, LAST_VALUE returns the Last Name of the student in the class (In alphabetical order) . And the expected result is “Sandra”.

 Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,LAST_Value([Student Name]) Over (Order By [Student Name])
as [Last Value]
from dbo.[Student]
--Result

But Unfortunately, this is not our expected result, It is the same row value in Last Value &  student name column.

Here “row range ” optional parameters of LAST_VALUE comes to solve this problem. Now, we need to change the “row range” value from default to CURRENT ROW AND UNBOUNDED FOLLOWING.

Lets make another example with row range CURRENT ROW AND UNBOUNDED FOLLOWING.

 Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,LAST_Value([Student Name]) Over (Order By [Student Name]
ROWS BETWEEN  CURRENT ROW AND UNBOUNDED FOLLOWING) as [Last Value]
from dbo.[Student]
--Result

Now, we got our expected result by defining “row range”.

Example-2 :
In this example, LAST_VALUE returns the student name having least score in the class. Expected result is : “Sandra”. Because she is having least score (40) in science in the entire class.

Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,LAST_Value([Student Name]) Over (Order By [Marks] Desc
ROWS BETWEEN  CURRENT ROW AND UNBOUNDED FOLLOWING) AS [LAST Value]
from dbo.[Student]
--Result

Example-3 :
In this example, LAST_VALUE returns the student name having least score in each subject in the class. Expected result is : Sandra in “English & Science” & Derek in “Maths”.

Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,LAST_Value([Student Name]) Over (Partition By [Subject]
Order By [Marks] Desc
ROWS BETWEEN  CURRENT ROW AND UNBOUNDED FOLLOWING) AS [LAST Value]
from dbo.[Student]
--Result

Reference : MSDN

Read Full Post »

In SQL server 2012, Microsoft has introduced some new analytical functions and one of the most important functions in it is FIRST_VALUE. As It name implies, it does the same.

Lets discuss this function syntax, purpose and examples in detail.

Syntax :

    FIRST_VALUE ( [scalar_expression )
    OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )

Purpose :
It returns the first value from a list in a specified order (By default ascending order). Also we can do the partition in it but it is optional.

Lets create a student table and insert few records in it to demonstrate FIRST_VALUE.

Create Table [Student]
(
[Student ID] int,
[Student Name] varchar(50),
[Subject] varchar(50),
[Marks] int
)
GO

Insert into [Student] values (3,'Derek','English',100)
Insert into [Student] values (3,'Derek','Math',60)
Insert into [Student] values (3,'Derek','Science',90)

Insert into [Student] values (1,'Bob','English',80)
Insert into [Student] values (1,'Bob','Math',75)
Insert into [Student] values (1,'Bob','Science',60)
 
Insert into [Student] values (2,'Sandra','English',70)
Insert into [Student] values (2,'Sandra','Math',80)
Insert into [Student] values (2,'Sandra','Science',40)
Select * from [Student] Order By [Student ID]

Example-1 :
In this example, FIRST_VALUE returns the First Name of the student in the class (In ascending order) .

Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,FIRST_Value([Student Name]) Over (Order By [Student Name]) as [First Value]
from dbo.[Student]
--Result

Example-2 :
In this example, FIRST_VALUE returns the student name having high score in the class.

Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,FIRST_Value([Student Name]) Over (Order By [Marks] Desc) as [First Value]
from dbo.[Student]
--Result

Example-3 :
In this example, FIRST_VALUE returns the student name having high score in each subject in the class.

Select
 [Student ID]
,[Student Name]
,[Subject]
,[Marks]
,FIRST_Value([Student Name]) Over (Partition By [Subject] Order By [Marks] Desc) as [First Value]
from dbo.[Student]
--Result

Reference : MSDN

Read Full Post »