I need to count the words from one of the fields in the table and its data type is varchar (max). This is a very simple task, but the problem is, varchar field has unnecessary White Spaces, Tabs, Carriage Returns and Line Feeds as well. I searched it over the internet and found a solution but it works when it is a simple sentence without any hurdles or otherwise with loop. So, I thought of developing this solution without loop.
First of all, I found which character control has what char value. Given below are the details :
- Tab char(9)
- Line feed char(10)
- Carriage return char(13)
- White space char(32)
Solution : (Word count for complex paragraphs having unnecessary White Spaces, Tabs, Carriage Returns and Line Feeds)
CREATE FUNCTION dbo.[UDF_Word_Count_From_String]
(
@STRING VARCHAR(MAX) -- Variable for string
)
RETURNS VARCHAR(MAX)
BEGIN
DECLARE @RETURNSTRING AS VARCHAR(MAX)
Set @STRING=REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@STRING
,CHAR(9),CHAR(32))
,CHAR(13),CHAR(32))
,CHAR(10),CHAR(32))
,CHAR(32),'øù')
,'ùø','')
,'øù',CHAR(32))
SELECT @RETURNSTRING =LEN(@String) - LEN(REPLACE(@String,' ', '')) + 1
RETURN @RETURNSTRING
END
GO
Select dbo.[UDF_Word_Count_From_String]('How many
word do you
have in this sentence?') as [Word Count]
--OUTPUT
Word Count
——————–
9
(1 row(s) affected)
Let me know if you know a better solution.
I like your solution, especially for spaces merging into one. Otherwise I would use regular expressions.
This is a little cumbersome but it handles the whitespace issue nicely, its fast and inline, no udf.
DECLARE @Term VARCHAR(100) = ‘ this is pretty fast ‘
SELECT @Term, LEN(REPLACE(REPLACE(REPLACE(‘ ‘+@Term,’ ‘,’ ‘+CHAR(1)) ,CHAR(1)+’ ‘,”),CHAR(1),”)) – LEN(REPLACE(REPLACE(REPLACE(REPLACE(‘ ‘+@Term,’ ‘,’ ‘+CHAR(1)) ,CHAR(1)+’ ‘,”),CHAR(1),”),’ ‘,”)) [Word Count]
[…] How to remove extra spaces from string value is a common issue and we usually come across this issue while massaging the data. I used this solution as a part of a solution in one of my earlier solutions. […]