Feeds:
Posts
Comments

Posts Tagged ‘Strip HTML’

Today, I came across a situation where I needed to parse HTML tags and get plain text from it, and we do not have a built-in function in SQL SERVER to do it. So, I searched the solution over the internet but most of the solutions are designed from the loop.
So, I thought of doing it without loop and with the help of XQuery.
Given below is the user defined function to remove all HTML tags (“< >”) from any HTML string and return plain text.

CREATE FUNCTION dbo.[UDF_Parse_HTML_From_String]
(
 @HTML_STRING VARCHAR(MAX) -- Variable for string
)
RETURNS VARCHAR(MAX)
BEGIN

	DECLARE @STRING VARCHAR(MAX)
	Declare @Xml AS XML
	SET @Xml = CAST(('<A>'+ REPLACE(REPLACE(REPLACE(REPLACE(@HTML_STRING
        ,'<','@*'),'>','!'),'@','</A><A>'),'!','</A><A>') +'</A>') AS XML)

	;WITH CTE AS (SELECT A.value('.', 'VARCHAR(MAX)') [A] 
         FROM @Xml.nodes('A') AS FN(A) WHERE CHARINDEX('*',
         A.value('.', 'VARCHAR(MAX)'))=0 
         AND ISNULL(A.value('.', 'varchar(max)'),'')<>'')

	SELECT @STRING=STUFF((SELECT ' ' + [A] FROM CTE FOR XML PATH('')),1,1,'')
	RETURN @STRING
END
GO
SELECT dbo.[UDF_Parse_HTML_From_String] ('<b>This is raresql.com</b><h2>HTML Parse User Defined Function</h2><a href="http://raresql.com"></a>') as [Text]
--OUTPUT

Advertisements

Read Full Post »