I have been asked this question how to remove the duplicate words in a sentence, when my team and I were busy in messaging legacy application data and we had to migrate it to SQL Server 2012. Also, we had to check and clean the data if there was any duplicate word in a sentence. So I started scripting and checking multiple options to develop this solution including loop & XML, but I usually prefer XML. So I finally developed the solution using XML.
Before proceeding with the solution, I would like to create an example to demonstrate the solution.
--DROP TABLE tbl_Sample --GO CREATE TABLE tbl_Sample ( [ID] INT IDENTITY(1,1), [Sentence] VARCHAR(MAX) ) GO INSERT INTO tbl_Sample VALUES ('This is the the test test script from from raresql.com') GO INSERT INTO tbl_Sample VALUES ('This should should remove duplicates') GO
The script of this solution is given below. and can be downloadable from here.
--DROP FUNCTION dbo.[UDF_Remove_Duplicate_Entry] --GO CREATE FUNCTION dbo.[UDF_Remove_Duplicate_Entry] ( @Duplicate_Word VARCHAR(MAX) ) RETURNS VARCHAR(MAX) BEGIN DECLARE @Xml XML DECLARE @Removed_Duplicate_Word VARCHAR(MAX) SET @Xml = CAST(('<A>'+REPLACE(@Duplicate_Word,' ','</A><A>')+'</A>') AS XML) ;WITH CTE AS ( SELECT ROW_NUMBER() OVER(ORDER BY A) AS [Sno], A.value('.', 'varchar(max)') AS [Column] FROM @Xml.nodes('A') AS FN(A) ) SELECT @Removed_Duplicate_Word =(SELECT Stuff((SELECT '' + ' ' + '' + A.[Column] FROM CTE A LEFT JOIN CTE B ON A.[Sno]+1=B.[Sno] WHERE (A.[Column]<>B.[Column] Or B.[Sno] is NULL) FOR XML PATH('') ),1,1,'')) RETURN @Removed_Duplicate_Word END GO SELECT [ID] ,[Sentence] As [Before Duplicate removal] ,dbo.[UDF_Remove_Duplicate_Entry]([Sentence]) As [After Duplicate removal] FROM tbl_Sample GO --OUTPUT
Let me know if you come across this scenario and its solution.
Leave a Reply