I need to count the words from one of the fields in the table and its data type is varchar (max). This is a very simple task, but the problem is, varchar field has unnecessary White Spaces, Tabs, Carriage Returns and Line Feeds as well. I searched it over the internet and found a solution but it works when it is a simple sentence without any hurdles or otherwise with loop. So, I thought of developing this solution without loop.
First of all, I found which character control has what char value. Given below are the details :
- Tab char(9)
- Line feed char(10)
- Carriage return char(13)
- White space char(32)
Solution : (Word count for complex paragraphs having unnecessary White Spaces, Tabs, Carriage Returns and Line Feeds)
CREATE FUNCTION dbo.[UDF_Word_Count_From_String] ( @STRING VARCHAR(MAX) -- Variable for string ) RETURNS VARCHAR(MAX) BEGIN DECLARE @RETURNSTRING AS VARCHAR(MAX) Set @STRING=REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@STRING ,CHAR(9),CHAR(32)) ,CHAR(13),CHAR(32)) ,CHAR(10),CHAR(32)) ,CHAR(32),'øù') ,'ùø','') ,'øù',CHAR(32)) SELECT @RETURNSTRING =LEN(@String) - LEN(REPLACE(@String,' ', '')) + 1 RETURN @RETURNSTRING END GO Select dbo.[UDF_Word_Count_From_String]('How many word do you have in this sentence?') as [Word Count] --OUTPUT
Word Count
——————–
9
(1 row(s) affected)
Let me know if you know a better solution.
I like your solution, especially for spaces merging into one. Otherwise I would use regular expressions.
This is a little cumbersome but it handles the whitespace issue nicely, its fast and inline, no udf.
DECLARE @Term VARCHAR(100) = ‘ this is pretty fast ‘
SELECT @Term, LEN(REPLACE(REPLACE(REPLACE(‘ ‘+@Term,’ ‘,’ ‘+CHAR(1)) ,CHAR(1)+’ ‘,”),CHAR(1),”)) – LEN(REPLACE(REPLACE(REPLACE(REPLACE(‘ ‘+@Term,’ ‘,’ ‘+CHAR(1)) ,CHAR(1)+’ ‘,”),CHAR(1),”),’ ‘,”)) [Word Count]
[…] How to remove extra spaces from string value is a common issue and we usually come across this issue while massaging the data. I used this solution as a part of a solution in one of my earlier solutions. […]