It seems like the most frequently-asked question on the MSDN T-SQL Forum is how to put together a column that contains a list of some kind.
For example, take a look at the Northwind database. There’s a Categories table and a Products table. We’d like to produce a list of all Categories and, for each Category, produce a column containing a comma-delimited (and sorted) list of ProductID’s that are discontinued.
The standard answer that we give to these kinds of questions (providing that the original poster of the question is using SQL2005 or beyond) is something like this:
select CategoryName
,DiscontinuedIDs=
stuff(
(
select ','+convert(varchar,ProductID)
from Products
where CategoryID=Categories.CategoryID
and Discontinued=1
order by ProductID
for xml path('')
)
,1,1,'')
from Categories
order by CategoryName
/*
CategoryName DiscontinuedIDs
--------------- ---------------
Beverages 24
Condiments 5
Confections NULL
Dairy Products NULL
Grains/Cereals 42
Meat/Poultry 9,17,29,53
Produce 28
Seafood NULL
*/
Most times the poster will come back and say, “Er… uh… Thanks.” But occasionally someone will say, “Huh? What’s with all that XML stuff? Could you explain what the heck that complicated mess o’ code is doing please?”
I know when I first started studying T-SQL and started frequenting the forums and saw this kind of answer, I felt kind of intimidated and figured that this stuff was more complex than I had thought.
But it all becomes clear if you go through how this works step by step.
For illustration purposes, let’s create a temporary table #Stooges and populate it:
create table #Stooges (ID int, Name varchar(10))
insert #Stooges select 1,'Moe'
union all select 2,'Larry'
union all select 3,'Curly'
Now if we do a normal query on this table, we get rows and columns:
select * from #Stooges
/*
ID Name
--- ------
1 Moe
2 Larry
3 Curly
*/
But if we add a FOR XML clause to the query, we get an XML representation of the data:
select * from #Stooges for xml path
/*
XML_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------
<row><ID>1</ID><Name>Moe</Name></row>
<row><ID>2</ID><Name>Larry</Name></row>
<row><ID>3</ID><Name>Curly</Name></row>
*/
In reality, it did not return 3 rows of data… I just presented it that way for readability. In fact, it didn’t produce any rows or columns… it actually produced an XML stream. Any query that has the FOR XML clause (without the ,TYPE directive) produces an XML stream. The data is streamed out in (2033-character) chunks as the query is executed, as you can see below, where I query the 120,000-record SalesOrderDetail table in AdventureWorks. The first part of the XML stream is plopped into the SSMS Results window almost immediately, and the query is still grinding away to finish up the rest of the stream.

(If you want to read more about how the XML streams are created, read this blog entry from Eugene Kogan of Microsoft).
In case you’re curious about where the mysterious F52E2B61-18A1-11d1-B105-00805F49916B identifier name comes from, see the snapshot of the Registry Editor below.

It’s the GUID associated with the IXMLDocument Interface that’s part of the Type Library associated with the MSXML DLL files installed into the Windows System Directory. (If that last sentence sounded like a lot of gibberish to you, I can relate… Don’t worry… It’s not important… As far as I’m concerned, we can pretend that the identifier name came from little magic elves sprinkling pixie dust on pretty unicorns… It makes about as much sense and is easier to understand and relate to).
Anyway, back to the subject at hand…
If you use FOR XML in a scalar subquery it will produce an NVARCHAR(MAX) column, unless you use the ,TYPE directive, which will produce a true XML column. You can see this below, where I do FOR XML subqueries with and without the ,TYPE directive and I put the results INTO a temporary table. If we look at that temporary table’s structure (via INFORMATION_SCHEMA.COLUMNS), we can see the datatypes that got created:
select ForXmlPath=(select *
from #Stooges
for xml path)
,ForXmlPathType=(select *
from #Stooges
for xml path,type)
into #ThrowMeAway
select Table_Name
,Column_Name
,Data_Type
,Character_Maximum_Length
from tempdb.information_schema.columns
where Table_Name like '#ThrowMeAway%'
/*
Table_Name Column_Name Data_Type Character_Maximum_Length
------------------------------- -------------- --------- ------------------------
#ThrowMeAway__...__000000000018 ForXmlPath nvarchar -1
#ThrowMeAway__...__000000000018 ForXmlPathType xml -1
*/
So, with this knowledge, we can perform any kind of string functions on a FOR XML (non ,TYPE) subquery that we like.
Note that when we do a FOR XML PATH without specifying the actual path, SQL Server assumes a path of ‘row’. (Again, for clarification, the result below is a single row, single column string value… I only show it as 3 lines for readability):
select Info=(select Name
from #Stooges
for xml path)
/*
Info
-----------------------------
<row><Name>Moe</Name></row>
<row><Name>Larry</Name></row>
<row><Name>Curly</Name></row>
*/
If we specify a path of ‘Stooge’ and change the name of the column to StoogeName, we get the following (again, showing the string column as 3 lines for readability):
select Info=(select StoogeName=Name
from #Stooges
for xml path('Stooge'))
/*
Info
-----------------------------------------------
<Stooge><StoogeName>Moe</StoogeName></Stooge>
<Stooge><StoogeName>Larry</StoogeName></Stooge>
<Stooge><StoogeName>Curly</StoogeName></Stooge>
*/
Now, note that we can choose to have no name (and therefore no tag in the XML) by either making the column into an expression or by giving the column a deliberate name of [*]:
select Info=(select ''+Name
from #Stooges
for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/
select Info=(select [*]=Name
from #Stooges
for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/
We can also choose to specify an empty PATH:
select Info=(select [*]=Name
from #Stooges
for xml path(''))
/*
Info
-------------
MoeLarryCurly
*/
Now you can probably see where this is going. Let’s put commas in front of those names:
select Info=(select ','+Name
from #Stooges
for xml path(''))
/*
Info
----------------
,Moe,Larry,Curly
*/
And now we can use the STUFF function to substitute that first comma with an empty string:
select Info=stuff(
(
select ','+Name
from #Stooges
for xml path('')
)
,1,1,'')
/*
Info
---------------
Moe,Larry,Curly
*/
This is great! But WAIT! We’re not done.
Let’s say some smart-aleck (probably from California) with a somewhat esoteric name of X&Y><Z&T decides to join the gang:
insert #Stooges select 4,'X&Y><Z&T'
What happens now?
select Info=stuff(
(
select ','+Name
from #Stooges
for xml path('')
)
,1,1,'')
/*
Info
--------------------------------------
Moe,Larry,Curly,X&Y><Z&T
*/
Hmmm… That’s the problem with doing these kinds of tricks by creating XML output… certain characters (like ampersand, less-than, and greater-than) are encoded in a special way, as you can see below:
select Info=(select 'X&Y' for xml path('test'))
/*
Info
--------------------
<test>X&Y</test>
*/
The FOR XML clause encodes those special characters as it constructs the XML representation of the data. In order for us to UN-encode those characters, we have to work backwards… in other words, extract the data back OUT of the XML representation of the data. We do this through the .value() method. But since this method only works on a true XML datatype, we have to introduce the ,TYPE directive into our query to force the subquery to produce an XML datatype:
select Info=(select 'X&Y' for xml path('test'),type).value('.','varchar(10)')
/*
Info
----
X&Y
*/
The .value() method takes two parameters. The first parameter describes the specific “piece” of the XML data that we want to extract. In our case, we just passed it '.', which indicates that we want to extract all the data from the XML as a whole. And, as you noticed, it only pulled out the data without the tags (). The second parameter indicates what kind of datatype we want to convert it to as we extract it.
So, incorporating the ,TYPE directive and the .value() method into our #Stooges list, we finally get the expected result:
select Info=stuff(
(
select ','+Name
from #Stooges
for xml path(''),type
).value('.','varchar(max)')
,1,1,'')
/*
Info
------------------------
Moe,Larry,Curly,X&Y><Z&T
*/
If you know for sure that the list you’re creating does NOT have any of these special characters, like the query at the beginning of this article which only constructs lists of numbers, then you don’t need to incorporate the ,TYPE directive and .value() method. It will speed things up a bit if you leave those out.
Now we can put together lists of practically anything. For example, below is a query that pulls out all the orders for the CustomerID of OCEAN from the Northwind database, and shows a list of all the products in those orders (sorted by ProductID), with the quantity and price paid.
select OrderID
,ProductList=
stuff(
(
select '; '+convert(varchar,p.ProductID)
+'-'+p.ProductName
+' ('+convert(varchar,d.Quantity)
+'@$'+convert(varchar,d.UnitPrice)+')'
from [Order Details] d
join Products p on d.ProductId=p.ProductID
where d.OrderID=Orders.OrderID
order by p.ProductID
for xml path(''),type
).value('.','nvarchar(max)')
,1,2,'')
from Orders
where CustomerID='OCEAN'
order by OrderID
/*
OrderID ProductList
------- -------------------------------------------------------------------------
10409 14-Tofu (12@$18.60); 21-Sir Rodney's Scones (12@$8.00)
10531 59-Raclette Courdavault (2@$55.00)
10898 13-Konbu (5@$6.00)
10958 5-Chef Anton's Gumbo Mix (20@$21.35); 7-Uncle Bob's Organic Dried Pear...
10986 11-Queso Cabrales (30@$21.00); 20-Sir Rodney's Marmalade (15@$81.00); ...
*/
You can create lists within lists also, as in the example below, which lists all Customers and, for each Customer, lists all Categories (in order of the CategoryName) of Products that he purchased, and, for each Category, lists all ProductIDs (in numerical order) of that Category that were purchased:
with CustsCatgsProds as
(
select o.CustomerID
,p.CategoryID
,p.ProductID
from Orders o
join [Order Details] d on o.OrderID=d.OrderID
join Products p on d.ProductID=p.ProductID
join Categories c on p.CategoryID=c.CategoryID
group by o.CustomerID
,p.CategoryID
,p.ProductID
)
select CustomerID
,CategorizedProductList=
stuff(
(
select '; '+CategoryName+':'
+stuff(
(
select ','+convert(varchar,ProductID)
from CustsCatgsProds
where CustomerID=Customers.CustomerID
and CategoryID=CCP.CategoryID
order by ProductID
for xml path('')
)
,1,1,'')
from (select distinct CategoryID
from CustsCatgsProds
where CustomerID=Customers.CustomerID) CCP
join Categories on CCP.CategoryID=Categories.CategoryID
order by CategoryName
for xml path(''),type
).value('.','nvarchar(max)')
,1,2,'')
from Customers
order by CustomerID
/*
CustomerID CategorizedProductList
---------- ----------------------------------------------------------------
ALFKI Beverages:39,76; Condiments:3,6,63,77; Dairy Products:59,71; ...
ANATR Beverages:70; Confections:19; Dairy Products:11,32,60,69,72; ...
ANTON Beverages:2,34,43,75; Condiments:66; Confections:26,48; Dairy...
...
WHITC Beverages:2,34,35,38,75; Condiments:4,8,44,61,63,77; Confecti...
WILMK Beverages:1,24,43,76; Condiments:8,61,65; Confections:16,19,2...
WOLZA Beverages:1,2,24,35,75; Condiments:61; Confections:62,68; Dai...
*/
If you’ve read some of my past blog posts, you know I’m a big promoter of the CROSS APPLY operator to improve readability of a query. For example, the very first query in this article could be re-written as follows:
select CategoryName
,DiscontinuedIDs
from Categories
cross apply (select DiscIDString=(select ','+convert(varchar,ProductID)
from Products
where CategoryID=Categories.CategoryID
and Discontinued=1
order by ProductID
for xml path(''))) F1
cross apply (select DiscontinuedIDs=stuff(DiscIDString,1,1,'')) F2
order by CategoryName
/*
CategoryName DiscontinuedIDs
--------------- ---------------
Beverages 24
Condiments 5
Confections NULL
Dairy Products NULL
Grains/Cereals 42
Meat/Poultry 9,17,29,53
Produce 28
Seafood NULL
*/
If you do use the CROSS APPLY approach, be very careful in checking the performance and the plan of the query. Adam Haines wrote a terrific blog post that demonstrated that the CROSS APPLY approach to constructing lists can yield terrible results under certain circumstances because of the way that SQL Server performs the query. You can usually re-formulate the query to eliminate that poor performance.
I hope this article was helpful in explaining how the FOR XML PATH method works in creating lists. In my next blog post, I'll show how we can tear those lists apart (or shred them) back into rows.