After reading XQuery Lab 51, my friend and database expert Brad Schulz send me a note showing another way of achieving the same results. The approach he suggested was much cleaner than my version.
My code used the following expression in the WHERE clause to filter the records where @about and @resource attributes match.
y.value('@r:resource[1]','VARCHAR(100)') =
z.value('@about','VARCHAR(100)')
Brad suggested a different approach where the entire query is moved into a CROSS APPLY and the outer query can select columns returned by the CROSS APPLY operator and apply filters on them.
Here is the version of the query that Brad wrote.
DECLARE @x XML
SELECT @x = '
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
xmlns:d="http://purl.org/dc/elements/1. 0/"
xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top/World/Afrikaans/Besigheid">
<catid>724829</catid>
<link r:resource="http://www.videos-sa.com" />
<link r:resource="http://besigheidcenturion.co.za/bc/" />
</Topic>
<ExternalPage about="http://www.videos-sa.com">
<d:Title>Kobus Petzer Videoproduksies</d:Title>
<d:Description>Vervaardiging van...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
</RDF>'
;WITH XMLNAMESPACES(
'http://www.w3.org/TR/RDF/' as r,
'http://purl.org/dc/elements/1. 0/' as d,
default 'http://dmoz.org/rdf/'
)
SELECT
[Resource],
About,
Title,
CatID
FROM @x.nodes('RDF/Topic') a(x)
CROSS APPLY x.nodes('link') b(y)
CROSS APPLY @x.nodes('RDF/ExternalPage') c(z)
CROSS APPLY (
SELECT
y.value('@r:resource[1]','VARCHAR(100)') AS [Resource],
z.value('@about','VARCHAR(100)') AS About,
z.value('d:Title[1]','VARCHAR(100)') AS Title,
x.value('catid[1]','VARCHAR(100)') AS CatID
) F
WHERE [Resource] = About
/*
Resource About Title CatID
------------------ ---------------- ------------- ------
http://www.videos- http://www.video Kobus Petzer 724829
http://besigheidce http://besigheid Besigheid Cen 724829
http://besigheidce http://besigheid Besigheid Cen 724829
*/
View All Labs: XQuery Labs - A Collection of XQuery Sample Scripts and Tutorials
I heard this question recently on one of the forums. The situation is as follows:
- Save an Excel 2003 document in a FILESTREAM column
- Read the data from the FILESTREAM column and try to open in Excel – It works as expected
- Save an Excel 2007 document in a FILESTREAM column
- Read the data from the FILESTREAM column and try to open in Excel – Excel thinks that the file is corrupted and will prompt for repair.
To SQL Server, both the documents are blob data and there is no chance of discrimination between the two. I suggested the OP to re-check the code that reads information from the FILESTREAM column and double check that the bytes are correctly retrieved from the FILESTREAM column.
He replied that the problem is reported to Microsoft Support and they acknowledged the problem. Further their investigation found that the same problem happens with Word 2007 and the support engineers are looking into the problem.
That sounded very strange. Why does SQL Server behave differently to two different BLOB data? I tried to create a repro. Here is what I did.
Step 1: Created an Excel 2007 document
Created the sample document and saved to a local folder.
Step 2: Created a FILESTREAM enabled Database
-- Drop Northpole Database
IF DB_ID('NorthPole') IS NOT NULL
DROP DATABASE NorthPole
GO
-- Recreate the database
CREATE DATABASE NorthPole
ON
PRIMARY (
NAME = NorthPoleDB,
FILENAME = 'C:\Data\NorthPoleDB.mdf'
), FILEGROUP NorthPoleFS CONTAINS FILESTREAM(
NAME = NorthPoleFS,
FILENAME = 'C:\FS\NorthPoleFS')
LOG ON (
NAME = NorthPoleLOG,
FILENAME = 'C:\Data\NorthPoleLOG.ldf')
GO
Step 3: Created a TABLE with FILESTREAM column
Use Northpole
GO
CREATE TABLE [dbo].[Documents](
[DocumentID] UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE,
[DocumentName] VARCHAR(512),
[DocumentType] VARCHAR(50),
[DocumentData] VARBINARY(MAX) FILESTREAM NULL
)
Step 4: Loaded the document into the FILESTREAM column
-- Declare a variable to store the image data
DECLARE @data AS VARBINARY(MAX)
-- Load the image data
SELECT @data = CAST(bulkcolumn AS VARBINARY(MAX))
FROM OPENROWSET(
BULK
'C:\Jacob\...\filestreamexcel.xlsx',
SINGLE_BLOB ) AS x
-- Insert the data to the table
INSERT INTO Documents ([DocumentID], [DocumentName], [DocumentType], [DocumentData])
SELECT NEWID(), 'FilestreamExcel.xlsx','excel', @data
Step 5: Retrieved the document from the FILESTREAM column and saved to a local disk
'Create a connection to the database
Dim ConStr As String
ConStr = "Data Source=JACOB-LAPTOP\SQL2008R2NOV;Initial Catalog=NorthPole" & _
";Integrated Security=True"
Dim con As New SqlConnection(ConStr)
con.Open()
'Retrieve the FilePath() of the image file
Dim sqlCommand As New SqlCommand()
sqlCommand.Connection = con
sqlCommand.CommandText = "SELECT DocumentData.PathName() AS PathName " + _
"FROM Documents WHERE DocumentName = 'FilestreamExcel.xlsx'"
Dim filePath As String = sqlCommand.ExecuteScalar()
'Obtain a Transaction Context
Dim transaction As SqlTransaction = con.BeginTransaction("DocumentTran")
sqlCommand.Transaction = transaction
sqlCommand.CommandText = "SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()"
Dim txContext As Byte() = sqlCommand.ExecuteScalar()
'Open and read file using SqlFileStream Class
Dim sqlFileStream As New SqlFileStream(filePath, txContext, FileAccess.Read)
Dim buffer As Byte() = New Byte(sqlFileStream.Length) {}
'sqlFilesTream.Length = 16896
sqlFileStream.Read(buffer, 0, buffer.Length)
'buffer.Length = 16897
Dim fStream As New FileStream("C:\..\filestreamexcel2.xlsx", FileMode.CreateNew)
Dim bw As New BinaryWriter(fStream)
bw.Write(buffer)
bw.Close()
fStream.Close()
'Cleanup
sqlFileStream.Close()
sqlCommand.Transaction.Commit()
con.Close()
Step 6: Open the document in Excel - Excel prompts for repair
Alright, so I am able to reproduce the problem.
The only reason for this to happen is that one or more bytes in the original document is modified when the document is stored or when it is retrieved. To narrow down the problem, I decided to check if the problem occurs at the time of storing the document.
To verify this, I did a ‘not-recommended’ approach. I went to the FILESTREAM data container (NTFS folder) and located the file. I copied the file to another folder and renamed to “.xlsx”. I then tried to open the file in Excel 2007 and it opened normally. It indicates that SQL Server did not modify the content of the file when storing the document into the FILESTREAM store.
The next step is to see what happens with Excel 2003. I repeated the same steps with an Excel 2003 file. When I retrieved the Excel 2003 document from the FILESTREAM store and opened it in Excel, it opened normally. Surprising!
At this point, it was certain that the document gets modified when reading from the FILESTREAM store. But the question is why is it happening only to the Excel 2007 document and not to the 2003 version. I was also curious to know what exactly gets modified. I tried to do a binary comparison of both the documents.
As you can see from the hex comparison results, there is an extra byte (with 0) getting added to the blob data when the document is retrieved from the FILESTREAM data store. That is why Excel 2007 complains and asks you to repair the document. Makes sense!
So, why is it not happening to Excel 2003? I did a similar comparison on the Excel 2003 document too. Surprisingly, I found that the same is happening to Excel 2003 too.
So we observed the following
- An extra byte is getting added to the blob data when it is retrieved.
- It happens to both Excel 2007 and Excel 2003 documents, which means that it is happening to all blob data.
- Excel 2007 complains but Excel 2003 does not.
So the question now is why is this extra byte getting added. I decided to check the length of the FILESTREAM data.
At first, I checked the length of the FILESTREAM data from TSQL.
select datalength(documentdata) as Length
from documents
/*
Length
--------------------
16896
*/
So the length of the document is 16896 bytes. I then decided to debug the code and found the following.
'Open and read file using SqlFileStream Class
Dim sqlFileStream As New SqlFileStream(filePath, txContext, FileAccess.Read)
Dim buffer As Byte() = New Byte(sqlFileStream.Length) {}
'sqlFilesTream.Length = 16896
sqlFileStream.Read(buffer, 0, buffer.Length)
'buffer.Length = 16897
So it looks like I found the culprit. It was I. It could be a problem that most people face or it could be my lack of .NET expertise. People reading this with more .NET experience will know it better.
With my limited .NET knowledge, here is the workaround I did.
'Open and read file using SqlFileStream Class
Dim sqlFileStream As New SqlFileStream(filePath, txContext, FileAccess.Read)
Dim buffer As Byte() = New Byte(sqlFileStream.Length - 1) {}
sqlFileStream.Read(buffer, 0, buffer.Length)
So the workaround is to create a byte array with one byte less than the length of FILESTREAM. That fixed the problem in my repro code.
I am eager to know if a similar problem has happened to anyone else and the fixe/workaround applied to get it fixed.
One of my friends approached me recently with a request to write a TSQL query that reads information from an RDF document. Here is the sample XML document he wanted to process.
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
xmlns:d="http://purl.org/dc/elements/1. 0/"
xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top/World/Afrikaans/Besigheid">
<catid>724829</catid>
<link r:resource="http://www.videos-sa.com" />
<link r:resource="http://besigheidcenturion.co.za/bc/" />
</Topic>
<ExternalPage about="http://www.videos-sa.com">
<d:Title>Kobus Petzer Videoproduksies</d:Title>
<d:Description>Vervaardiging van...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
</RDF>
The challenge is to read the resource names from the Link element and join it with the about attribute in the ExternalPage element. Here is the output required from the above XML document.
/*
Resource About Title CatID
------------------ ---------------- ------------- ------
http://www.videos- http://www.video Kobus Petzer 724829
http://besigheidce http://besigheid Besigheid Cen 724829
http://besigheidce http://besigheid Besigheid Cen 724829
*/
While writing the query to get the above result is pretty much straight forward, many people find it difficult because of the namespace declarations present in the XML document.
Here is the TSQL code that queries the above XML document and produces the required output.
DECLARE @x XML
SELECT @x = '
<RDF xmlns:r="http://www.w3.org/TR/RDF/"
xmlns:d="http://purl.org/dc/elements/1. 0/"
xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top/World/Afrikaans/Besigheid">
<catid>724829</catid>
<link r:resource="http://www.videos-sa.com" />
<link r:resource="http://besigheidcenturion.co.za/bc/" />
</Topic>
<ExternalPage about="http://www.videos-sa.com">
<d:Title>Kobus Petzer Videoproduksies</d:Title>
<d:Description>Vervaardiging van...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
<ExternalPage about="http://besigheidcenturion.co.za/bc/">
<d:Title>Besigheid Centurion</d:Title>
<d:Description>Sakeportaal vir ...</d:Description>
<topic>Top/World/Afrikaans/Besigheid</topic>
</ExternalPage>
</RDF>'
;WITH XMLNAMESPACES(
'http://www.w3.org/TR/RDF/' as r,
'http://purl.org/dc/elements/1. 0/' as d,
default 'http://dmoz.org/rdf/'
)
SELECT
y.value('@r:resource[1]','VARCHAR(100)') AS Resource,
z.value('@about','VARCHAR(100)') AS About,
z.value('d:Title[1]','VARCHAR(100)') AS Title,
x.value('catid[1]','VARCHAR(100)') AS CatID
FROM @x.nodes('RDF/Topic') a(x)
CROSS APPLY x.nodes('link') b(y)
CROSS APPLY @x.nodes('RDF/ExternalPage') c(z)
WHERE y.value('@r:resource[1]','VARCHAR(100)') =
z.value('@about','VARCHAR(100)')
/*
Resource About Title CatID
------------------ ---------------- ------------- ------
http://www.videos- http://www.video Kobus Petzer 724829
http://besigheidce http://besigheid Besigheid Cen 724829
http://besigheidce http://besigheid Besigheid Cen 724829
*/
View All Labs: XQuery Labs - A Collection of XQuery Sample Scripts and Tutorials
I was asked this question in one of the SQL Server forums recently. The OP wanted to generate an XML document with CDATA sections and NAMESPACE declarations.
SQL Server 2005 introduced WITH XMLNAMESPACES() which is very handy to generate namespace declarations in the output XML document generated by FOR XML. FOR XML EXPLICIT can be used to generate CDATA sections in the XML output. However, what is interesting is that you cannot use WITH XMLNAMESPACES and FOR XML EXPLICIT together.
Here is a FOR XML EXPLICIT query that generate CDATA sections as well as XML NAMESPACE declarations in the output XML document.
SELECT
1 AS Tag,
NULL AS Parent,
'Jacob' AS 'Person!1!Name!element',
'Hello world' AS 'Person!1!Greeting!CDATA',
'http://beyondrelational.com/xml.aspx'
AS 'Person!1!xmlns:x'
FOR XML EXPLICIT
Here is the output
<Person xmlns:x="http://beyondrelational.com/xml.aspx">
<Name>Jacob</Name>
<Greeting><![CDATA[Hello world]]></Greeting>
</Person>