Getting Started with Adobe After Effects - Part 6: Motion Blur


Upload Image Close it
Select File

Browse by Tags · View All
xml 4
BRH 4
nodes() 3
#XML 2
#TSQL 2
XQuery 2
values() 2
for xml path 2
value() 1
text() 1

Archive · View All
May 2010 4

Brad_Schulz's Blog

SQL Server XML - Making a List (and Checking It Twice)

May 18 2010 4:00AM by Brad Schulz   

It seems like the most frequently-asked question on the MSDN T-SQL Forum is how to put together a column that contains a list of some kind.

For example, take a look at the Northwind database. There’s a Categories table and a Products table. We’d like to produce a list of all Categories and, for each Category, produce a column containing a comma-delimited (and sorted) list of ProductID’s that are discontinued.

The standard answer that we give to these kinds of questions (providing that the original poster of the question is using SQL2005 or beyond) is something like this:

select CategoryName
      ,DiscontinuedIDs=
         stuff(
           (
             select ','+convert(varchar,ProductID)
             from Products
             where CategoryID=Categories.CategoryID
               and Discontinued=1
             order by ProductID
             for xml path('')
           )
           ,1,1,'')
from Categories
order by CategoryName
/*
CategoryName    DiscontinuedIDs
--------------- ---------------
Beverages       24
Condiments      5
Confections     NULL
Dairy Products  NULL
Grains/Cereals  42
Meat/Poultry    9,17,29,53
Produce         28
Seafood         NULL
*/

Most times the poster will come back and say, “Er… uh… Thanks.” But occasionally someone will say, “Huh? What’s with all that XML stuff? Could you explain what the heck that complicated mess o’ code is doing please?”

I know when I first started studying T-SQL and started frequenting the forums and saw this kind of answer, I felt kind of intimidated and figured that this stuff was more complex than I had thought.

But it all becomes clear if you go through how this works step by step.

For illustration purposes, let’s create a temporary table #Stooges and populate it:

create table #Stooges (ID int, Name varchar(10))
insert #Stooges select 1,'Moe' 
      union all select 2,'Larry' 
      union all select 3,'Curly'

Now if we do a normal query on this table, we get rows and columns:

select * from #Stooges
/*
 ID Name
--- ------
  1 Moe
  2 Larry
  3 Curly
*/

But if we add a FOR XML clause to the query, we get an XML representation of the data:

select * from #Stooges for xml path
/*
XML_F52E2B61-18A1-11d1-B105-00805F49916B
----------------------------------------
<row><ID>1</ID><Name>Moe</Name></row>
<row><ID>2</ID><Name>Larry</Name></row>
<row><ID>3</ID><Name>Curly</Name></row>
*/

In reality, it did not return 3 rows of data… I just presented it that way for readability. In fact, it didn’t produce any rows or columns… it actually produced an XML stream. Any query that has the FOR XML clause (without the ,TYPE directive) produces an XML stream. The data is streamed out in (2033-character) chunks as the query is executed, as you can see below, where I query the 120,000-record SalesOrderDetail table in AdventureWorks. The first part of the XML stream is plopped into the SSMS Results window almost immediately, and the query is still grinding away to finish up the rest of the stream.

xmlstream

(If you want to read more about how the XML streams are created, read this blog entry from Eugene Kogan of Microsoft).

In case you’re curious about where the mysterious F52E2B61-18A1-11d1-B105-00805F49916B identifier name comes from, see the snapshot of the Registry Editor below.

ixmldocument

It’s the GUID associated with the IXMLDocument Interface that’s part of the Type Library associated with the MSXML DLL files installed into the Windows System Directory. (If that last sentence sounded like a lot of gibberish to you, I can relate… Don’t worry… It’s not important… As far as I’m concerned, we can pretend that the identifier name came from little magic elves sprinkling pixie dust on pretty unicorns… It makes about as much sense and is easier to understand and relate to).

Anyway, back to the subject at hand…

If you use FOR XML in a scalar subquery it will produce an NVARCHAR(MAX) column, unless you use the ,TYPE directive, which will produce a true XML column. You can see this below, where I do FOR XML subqueries with and without the ,TYPE directive and I put the results INTO a temporary table. If we look at that temporary table’s structure (via INFORMATION_SCHEMA.COLUMNS), we can see the datatypes that got created:

select ForXmlPath=(select * 
                   from #Stooges 
                   for xml path)
      ,ForXmlPathType=(select * 
                       from #Stooges 
                       for xml path,type)
into #ThrowMeAway
  
select Table_Name
      ,Column_Name
      ,Data_Type
      ,Character_Maximum_Length
from tempdb.information_schema.columns
where Table_Name like '#ThrowMeAway%'
/*
Table_Name                      Column_Name    Data_Type Character_Maximum_Length
------------------------------- -------------- --------- ------------------------
#ThrowMeAway__...__000000000018 ForXmlPath     nvarchar                        -1
#ThrowMeAway__...__000000000018 ForXmlPathType xml                             -1
*/

So, with this knowledge, we can perform any kind of string functions on a FOR XML (non ,TYPE) subquery that we like.

Note that when we do a FOR XML PATH without specifying the actual path, SQL Server assumes a path of ‘row’. (Again, for clarification, the result below is a single row, single column string value… I only show it as 3 lines for readability):

select Info=(select Name 
             from #Stooges 
             for xml path)
/*
Info
-----------------------------
<row><Name>Moe</Name></row>
<row><Name>Larry</Name></row>
<row><Name>Curly</Name></row>
*/

If we specify a path of ‘Stooge’ and change the name of the column to StoogeName, we get the following (again, showing the string column as 3 lines for readability):

select Info=(select StoogeName=Name 
             from #Stooges 
             for xml path('Stooge'))
/*
Info
-----------------------------------------------
<Stooge><StoogeName>Moe</StoogeName></Stooge>
<Stooge><StoogeName>Larry</StoogeName></Stooge>
<Stooge><StoogeName>Curly</StoogeName></Stooge>
*/

Now, note that we can choose to have no name (and therefore no tag in the XML) by either making the column into an expression or by giving the column a deliberate name of [*]:

select Info=(select ''+Name
             from #Stooges 
             for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/

select Info=(select [*]=Name
             from #Stooges 
             for xml path('Stooge'))
/*
Info
----------------------------------------------------------------
<Stooge>Moe</Stooge><Stooge>Larry</Stooge><Stooge>Curly</Stooge>
*/

We can also choose to specify an empty PATH:

select Info=(select [*]=Name
             from #Stooges 
             for xml path(''))
/*
Info
-------------
MoeLarryCurly
*/

Now you can probably see where this is going. Let’s put commas in front of those names:

select Info=(select ','+Name
             from #Stooges 
             for xml path(''))
/*
Info
----------------
,Moe,Larry,Curly
*/

And now we can use the STUFF function to substitute that first comma with an empty string:

select Info=stuff(
              (
                select ','+Name
                from #Stooges 
                for xml path('')
              )
              ,1,1,'')
/*
Info
---------------
Moe,Larry,Curly
*/

This is great! But WAIT! We’re not done.

Let’s say some smart-aleck (probably from California) with a somewhat esoteric name of X&Y><Z&T decides to join the gang:

insert #Stooges select 4,'X&Y><Z&T'

What happens now?

select Info=stuff(
              (
                select ','+Name
                from #Stooges 
                for xml path('')
              )
              ,1,1,'')
/*
Info
--------------------------------------
Moe,Larry,Curly,X&amp;Y&gt;&lt;Z&amp;T
*/

Hmmm… That’s the problem with doing these kinds of tricks by creating XML output… certain characters (like ampersand, less-than, and greater-than) are encoded in a special way, as you can see below:

select Info=(select 'X&Y' for xml path('test'))
/*
Info
--------------------
<test>X&amp;Y</test>
*/

The FOR XML clause encodes those special characters as it constructs the XML representation of the data. In order for us to UN-encode those characters, we have to work backwards… in other words, extract the data back OUT of the XML representation of the data. We do this through the .value() method. But since this method only works on a true XML datatype, we have to introduce the ,TYPE directive into our query to force the subquery to produce an XML datatype:

select Info=(select 'X&Y' for xml path('test'),type).value('.','varchar(10)')
/*
Info
----
X&Y
*/

The .value() method takes two parameters. The first parameter describes the specific “piece” of the XML data that we want to extract. In our case, we just passed it '.', which indicates that we want to extract all the data from the XML as a whole. And, as you noticed, it only pulled out the data without the tags (). The second parameter indicates what kind of datatype we want to convert it to as we extract it.

So, incorporating the ,TYPE directive and the .value() method into our #Stooges list, we finally get the expected result:

select Info=stuff(
              (
                select ','+Name
                from #Stooges 
                for xml path(''),type
              ).value('.','varchar(max)')
              ,1,1,'')
/*
Info
------------------------
Moe,Larry,Curly,X&Y><Z&T
*/

If you know for sure that the list you’re creating does NOT have any of these special characters, like the query at the beginning of this article which only constructs lists of numbers, then you don’t need to incorporate the ,TYPE directive and .value() method. It will speed things up a bit if you leave those out.

Now we can put together lists of practically anything. For example, below is a query that pulls out all the orders for the CustomerID of OCEAN from the Northwind database, and shows a list of all the products in those orders (sorted by ProductID), with the quantity and price paid.

select OrderID
      ,ProductList=
         stuff(
           (
             select '; '+convert(varchar,p.ProductID)
                   +'-'+p.ProductName
                   +' ('+convert(varchar,d.Quantity)
                   +'@$'+convert(varchar,d.UnitPrice)+')'
             from [Order Details] d
             join Products p on d.ProductId=p.ProductID
             where d.OrderID=Orders.OrderID
             order by p.ProductID
             for xml path(''),type
           ).value('.','nvarchar(max)')
           ,1,2,'')
from Orders
where CustomerID='OCEAN'
order by OrderID
/*
OrderID ProductList
------- -------------------------------------------------------------------------
  10409 14-Tofu (12@$18.60); 21-Sir Rodney's Scones (12@$8.00)
  10531 59-Raclette Courdavault (2@$55.00)
  10898 13-Konbu (5@$6.00)
  10958 5-Chef Anton's Gumbo Mix (20@$21.35); 7-Uncle Bob's Organic Dried Pear...
  10986 11-Queso Cabrales (30@$21.00); 20-Sir Rodney's Marmalade (15@$81.00); ...
*/

You can create lists within lists also, as in the example below, which lists all Customers and, for each Customer, lists all Categories (in order of the CategoryName) of Products that he purchased, and, for each Category, lists all ProductIDs (in numerical order) of that Category that were purchased:

with CustsCatgsProds as
(
  select o.CustomerID
        ,p.CategoryID
        ,p.ProductID
  from Orders o
  join [Order Details] d on o.OrderID=d.OrderID
  join Products p on d.ProductID=p.ProductID
  join Categories c on p.CategoryID=c.CategoryID
  group by o.CustomerID
          ,p.CategoryID
          ,p.ProductID
)
select CustomerID
      ,CategorizedProductList=
         stuff(
           (
             select '; '+CategoryName+':'
                   +stuff(
                      (
                        select ','+convert(varchar,ProductID)
                        from CustsCatgsProds
                        where CustomerID=Customers.CustomerID
                          and CategoryID=CCP.CategoryID
                        order by ProductID
                        for xml path('')
                      )
                      ,1,1,'')
             from (select distinct CategoryID
                   from CustsCatgsProds
                   where CustomerID=Customers.CustomerID) CCP
             join Categories on CCP.CategoryID=Categories.CategoryID
             order by CategoryName
             for xml path(''),type
           ).value('.','nvarchar(max)')
           ,1,2,'')
from Customers
order by CustomerID
/*
CustomerID CategorizedProductList
---------- ----------------------------------------------------------------
ALFKI      Beverages:39,76; Condiments:3,6,63,77; Dairy Products:59,71; ...
ANATR      Beverages:70; Confections:19; Dairy Products:11,32,60,69,72; ...
ANTON      Beverages:2,34,43,75; Condiments:66; Confections:26,48; Dairy...
...
WHITC      Beverages:2,34,35,38,75; Condiments:4,8,44,61,63,77; Confecti...
WILMK      Beverages:1,24,43,76; Condiments:8,61,65; Confections:16,19,2...
WOLZA      Beverages:1,2,24,35,75; Condiments:61; Confections:62,68; Dai...
*/

If you’ve read some of my past blog posts, you know I’m a big promoter of the CROSS APPLY operator to improve readability of a query. For example, the very first query in this article could be re-written as follows:

select CategoryName
      ,DiscontinuedIDs
from Categories
cross apply (select DiscIDString=(select ','+convert(varchar,ProductID)
                                  from Products
                                  where CategoryID=Categories.CategoryID
                                    and Discontinued=1
                                  order by ProductID
                                  for xml path(''))) F1
cross apply (select DiscontinuedIDs=stuff(DiscIDString,1,1,'')) F2
order by CategoryName
/*
CategoryName    DiscontinuedIDs
--------------- ---------------
Beverages       24
Condiments      5
Confections     NULL
Dairy Products  NULL
Grains/Cereals  42
Meat/Poultry    9,17,29,53
Produce         28
Seafood         NULL
*/

If you do use the CROSS APPLY approach, be very careful in checking the performance and the plan of the query. Adam Haines wrote a terrific blog post that demonstrated that the CROSS APPLY approach to constructing lists can yield terrible results under certain circumstances because of the way that SQL Server performs the query. You can usually re-formulate the query to eliminate that poor performance.

I hope this article was helpful in explaining how the FOR XML PATH method works in creating lists. In my next blog post, I'll show how we can tear those lists apart (or shred them) back into rows.

Tags: BRH, xml, for xml path, for xml, FOR_XML,


Brad Schulz
107 · 2% · 483
0
Liked
 
0
Lifesaver
 
0
Refreshed
 
0
Learned
 
0
Incorrect



Submit

Your Comment


Sign Up or Login to post a comment.

    Copyright © Rivera Informatic Private Ltd Contact us      Privacy Policy      Terms of use      Report Abuse      Advertising      [ZULU1097]