Today, I came across a situation where I needed to parse HTML tags and get plain text from it, and we do not have a built-in function in SQL SERVER to do it. So, I searched the solution over the internet but most of the solutions are designed from the loop.
So, I thought of doing it without loop and with the help of XQuery.
Given below is the user defined function to remove all HTML tags (“< >”) from any HTML string and return plain text.
CREATE FUNCTION dbo.[UDF_Parse_HTML_From_String] ( @HTML_STRING VARCHAR(MAX) -- Variable for string ) RETURNS VARCHAR(MAX) BEGIN DECLARE @STRING VARCHAR(MAX) Declare @Xml AS XML SET @Xml = CAST(('<A>'+ REPLACE(REPLACE(REPLACE(REPLACE(@HTML_STRING ,'<','@*'),'>','!'),'@','</A><A>'),'!','</A><A>') +'</A>') AS XML) ;WITH CTE AS (SELECT A.value('.', 'VARCHAR(MAX)') [A] FROM @Xml.nodes('A') AS FN(A) WHERE CHARINDEX('*', A.value('.', 'VARCHAR(MAX)'))=0 AND ISNULL(A.value('.', 'varchar(max)'),'')<>'') SELECT @STRING=STUFF((SELECT ' ' + [A] FROM CTE FOR XML PATH('')),1,1,'') RETURN @STRING END GO SELECT dbo.[UDF_Parse_HTML_From_String] ('<b>This is raresql.com</b><h2>HTML Parse User Defined Function</h2><a href="http://raresql.com"></a>') as [Text] --OUTPUT
Leave a Reply