Today, I came across a situation where I needed to parse HTML tags and get plain text from it, and we do not have a built-in function in SQL SERVER to do it. So, I searched the solution over the internet but most of the solutions are designed from the loop.
So, I thought of doing it without loop and with the help of XQuery.
Given below is the user defined function to remove all HTML tags (“< >”) from any HTML string and return plain text.
CREATE FUNCTION dbo.[UDF_Parse_HTML_From_String]
(
@HTML_STRING VARCHAR(MAX) -- Variable for string
)
RETURNS VARCHAR(MAX)
BEGIN
DECLARE @STRING VARCHAR(MAX)
Declare @Xml AS XML
SET @Xml = CAST(('<A>'+ REPLACE(REPLACE(REPLACE(REPLACE(@HTML_STRING
,'<','@*'),'>','!'),'@','</A><A>'),'!','</A><A>') +'</A>') AS XML)
;WITH CTE AS (SELECT A.value('.', 'VARCHAR(MAX)') [A]
FROM @Xml.nodes('A') AS FN(A) WHERE CHARINDEX('*',
A.value('.', 'VARCHAR(MAX)'))=0
AND ISNULL(A.value('.', 'varchar(max)'),'')<>'')
SELECT @STRING=STUFF((SELECT ' ' + [A] FROM CTE FOR XML PATH('')),1,1,'')
RETURN @STRING
END
GO
SELECT dbo.[UDF_Parse_HTML_From_String] ('<b>This is raresql.com</b><h2>HTML Parse User Defined Function</h2><a href="http://raresql.com"></a>') as [Text]
--OUTPUT

Leave a comment