We have seen a number of posts on various string operations using XQuery functions and FOR XML operator. This post explains how to match a delimited string against another delimited string.
I wanted to share a piece of code that I wrote to help a member of our User Group. I found it to be very interesting and thought of sharing with all of you. One of the reasons why I wanted to share this example is because it uses an XML approach to solve the given problem.
Let us get into the details of the problem. Here is the data from a table that we are going to query.
id Colors
----------- --------------------
1 Red
2 Red,Green
3 Green, Yellow
4 Yellow,Red,White
5 Green, White
6 White
We need to write a stored procedure that receives a comma delimited string that contains one or more color names. For example, if we receive "Red,Yellow", we need to return every row that has "Red" or "Yellow" in the "Colors" column. (All rows except 5 and 6 in this example)
I suppose there are a number of ways to write this query. One of the options is to split the incoming parameter using your favorite method (and I decided to use XML). Once we have split the parameter, we can find the appropriate values using PATINDEX().
Here is the code that performs this.
SET NOCOUNT ON;
DECLARE @t TABLE (id INT, Colors VARCHAR(20))
INSERT INTO @t (id, Colors) SELECT 1, 'Red'
INSERT INTO @t (id, Colors) SELECT 2, 'Red,Green'
INSERT INTO @t (id, Colors) SELECT 3, 'Green, Yellow'
INSERT INTO @t (id, Colors) SELECT 4, 'Yellow,Red,White'
INSERT INTO @t (id, Colors) SELECT 5, 'Green, White'
INSERT INTO @t (id, Colors) SELECT 6, 'White'
DECLARE @Filter VARCHAR(100)
SET @Filter = 'Red,Yellow'
DECLARE @xmlFilter XML
SELECT @xmlFilter = CAST('' + REPLACE(@Filter, ',', '') + '' AS XML)
SELECT DISTINCT
id,
Colors
FROM @t t
CROSS JOIN (
SELECT
x.i.value('.', 'VARCHAR(10)') AS filter
FROM @XmlFilter.nodes('//i') x(i)
) b
WHERE PATINDEX('%' + b.filter + '%', Colors) > 0
/*
id Colors
----------- --------------------
3 Green, Yellow
1 Red
2 Red,Green
4 Yellow,Red,White
*/
I have not analyzed the performance factors. The CROSS JOIN might be a little expensive. PATINDEX() is found to be performing well, in many of the tests I did in the past. In fact, I found it to be performing better than LIKE, in many cases.
Let us see how this query works.
The first part of the query converts the comma delimited string to an XML document.
SELECT @xmlFilter = CAST('' + REPLACE(@Filter, ',', '') + '' AS XML)
/*
Red
Yellow
*/
Then the inner query retrieves the values from the XML document.
SELECT
x.i.value('.', 'VARCHAR(10)') AS filter
FROM @XmlFilter.nodes('//i') x(i)
/*
filter
----------
Red
Yellow
*/
The CROSS JOIN generates the following.
id Colors
----------- --------------------
1 Red
1 Red
2 Red,Green
2 Red,Green
3 Green, Yellow
3 Green, Yellow
4 Yellow,Red,White
4 Yellow,Red,White
5 Green, White
5 Green, White
6 White
6 White
The WHERE filter then identifies the rows that has one of the values present in the XML result.
id Colors
----------- --------------------
1 Red
2 Red,Green
3 Green, Yellow
4 Yellow,Red,White
4 Yellow,Red,White
Finally, the DISTINCT clause removes duplicate rows generated by the CROSS JOIN and we get the final result.
id Colors
----------- --------------------
3 Green, Yellow
1 Red
2 Red,Green
4 Yellow,Red,White