Solution for challenge #5
This is my solution for challenge #5:
DECLARE @filter TABLE (id INT IDENTITY, data nvarchar(32))
insert into @filter (data) values ('sql,performance')
insert into @filter (data) values ('profiler')
insert into @filter (data) values ('table,performance,view')
DECLARE @blog TABLE (id INT IDENTITY, name nvarchar(20), tags nvarchar(20))
insert into @blog (name,tags) values ('post 1','sql,profiler,table')
insert into @blog (name,tags) values ('post 2','sql,performance')
insert into @blog (name,tags) values ('post 3','profilter')
insert into @blog (name,tags) values ('post 4','view,table')
;with cte
as( select n=1,
id,
tag=case charindex(',',tags)
when 0 then tags
else substring(tags,1,charindex(',',tags,1)-1) end ,
txt=substring(tags,charindex(',',tags,2)+1,len(tags))
from @blog
union all
select n=n+1,
id,
tag=case charindex(',',txt)
when 0 then txt
else substring(txt,1,charindex(',',txt,1)-1) end ,
txt=substring(txt,charindex(',',txt,2)+1,len(txt))
from cte
where txt!=tag
)
select
f.id,
f.data,
b.[name],
b.tags,
RELEVANCE=count(*)
from
cte join
@blog b on b.id=cte.id join
@filter f on charindex(','+cte.tag+',' , ','+f.data+',')>0
group by
f.id,
f.data,
b.[name],
b.tags
order by
id,
count(*)desc,
b.[name]
To solve this challenge I normalized the @blog table and then checked how many times each tag in the normalized table exists in the [data] field in the @filter table.
By normalizing @blog table I mean bringing it to the form like
id tag
-- ---
1 sql
1 profiler
1 table
2 sql
2 performance
3 profilter
4 view
4 table
To achive this we need to split delimited strings in the [tags] field for each record in the @blog table. This task can be done in a number of ways but I decided to use common table expression (CTE). CTEs are available in SQL Server 2005 and up, so it's not going to work in SQL Server 2000. I used a recursive form of CTE. It allowed me to solve the challenge in one query without using a loop or a cursor.
Recursive CTE consists of 2 parts integrated by UNION ALL operator: anchor member and recursive member.
To show how it works I added recursions counter n to my original code and limited it to 2 as in the following:
;with cte
as(
select n=1,
id,
tag=case charindex(',',tags)
when 0 then tags
else substring(tags,1,charindex(',',tags,1)-1)end ,
txt=substring(tags,charindex(',',tags,2)+1,len(tags))
from @blog
union all
select n=n+1,
id,
tag=case charindex(',',txt)
when 0 then txt
else substring(txt,1,charindex(',',txt,1)-1) end ,
txt=substring(txt,charindex(',',txt,2)+1,len(txt))
from cte
where n<2
)
select * from cte
order by n,id
This code generates the result set below:
n id tag txt
-- -- ---------- --------------
1 1 sql profiler,table
1 2 sql performance
1 3 profilter profilter
1 4 view table
2 1 profiler table
2 2 performance performance
2 3 profilter profilter
2 4 table table
In each iteration 3 things happen:
- We check the delimited string for a delimiter and if it exists retrieve the leftmost item in that string
- Create a new delimited string of tags without the retrieved item and the following delimiter
- Accumulate results in CTE
The only difference is that anchor member queries @blog table and recursive member queries CTE itself.
It's important to tell the CTE when to stop the recursion.
In our case we can stop when the number of iterations exceeds the maximum number of delimiters in tags field values (2 commas in the string ‘table,performance,view’ in the given test data).
On other hand, as we can see in the output above, when the splitting completed the values in the tag and txt fields are the same.
So in my solution I decided to use it in a where clause of the recursive member in CTE as a stop point of recursion:
where txt != tag
It also reduces the number of rows to be processed in each iteration.
Then we join cte result to the @filter table using
join @filter f on charindex(','+cte.tag+',' , ','+f.data+',')>0
Here we take only those records from the @filter table where we have the [tag] value from the cte presented as an item in the delimited string in the [data] field.
I surrounded field values with the delimiters to avoid partial matches, when [tag] value is just a part of the item.
The pitfall would be to use just charindex(cte.tag ,f.data)>0 ,which might cause relevance miscounts.
The potential problem is not obvious from the test data that we were provided with.
Let's look at an example when we have "TAB" value as a tag and "profiler,TABle", "daTABase,profiler", "performance,TAB,profiler" values in the [data] field.
We do not want to match the tag "TAB" with the strings "profiler,TABle" or daTABase,profiler". Only the records with the string where "tab" is an item like "x,TAB,xx" or "TAB,xx" or "x,TAB" should be included ("performance,TAB,profiler" in my example.)
The other way would be to normalize both @blog and @filter table and use regular joins as in the Rui Carvalho ’s solution.
And the rest of the code is just a join to @blog table to get [name] and [tags] and count of the relevance.
You May Also Like These Related Posts
If you like this article,
Subscribe in a reader or Subscribe by Email. Show your support by sharing this article with your friends through the services given below.