TSQL Challenge 7 – Solution by Barry Young
Barry Young provides us a nice solution using the functions MsForeach that we already talked about but also with a very interesting inline-decompressing feature.
R. Barry Young has been a Principal Database and Development Consultant for Proactive Performance Solutions, a software consulting company in Northern Delaware for the last 15 years. Prior to that he was a performance and capacity planning Consultant for Digital Equipment Corporation.
Databases that he has used include SQL Server, Oracle, Sybase, VAX RDB and VAX DBMS. Lanuages that has programed in include SQL, VB.net, C#, VB, C++, VBA, C, Pascal, Fortan, Cobol, Bliss, TPU, DCL, Teco, Javascript, DHTML+CSS, Modula, Macro-32, Macro-11, etc. Industries that he has worked for or consulted in include Insurance, Finance, Aerospace, Chemical, Software, Pharmacuetical, Health & Medical, Defense, Manufacturing and Government.
Barry has been programming for 35 years, profesionally for 30 years, and as a professional consultant for over 25 years. He has a B.S. in Theoretical mathematics. He is a frequent poster at SQLServerCentral.com and has had several articles published there as well. He is also active as a presenter in his local SQL Server & .Net SIGs.
You can found of him on his blog site is MovingSQL.com, and you can usually found him at SQLServerCentral.com.
you can find below the solution of Barry with a full explanation.
So how did I come to develop what is possibly the ugliest, most inelegant and incomprehensible kludge of SQL code ever seen by human eyes?
Well, I initially came up with two different approaches to this TSQL challenge, one based on sp_spaceused and the undocumented procedures sp_MSForEachDB and sp_MSForEachTable to re-execute it for every table in every non-system database, and the other using nested aggregated dynamic SQL to access the the allocation DMV's directly. The sp_MSForEachDB approach as shown in uncompacted form proved to be significantly shorter:
-- Temp Table to hold SP_SPACEUSED output for each table in a
-- single database.
CREATE TABLE #T
(
[TABLE] NVARCHAR(128),
ROWS CHAR(11),
V VARCHAR(128),
SIZE VARCHAR(128),
I VARCHAR(128),
U VARCHAR(128)
)
-- Temp Table to hold table space info accumulated and labeled
-- for each database.
CREATE TABLE #TT
(
BASE NVARCHAR(128),
[TABLE] NVARCHAR(128),
ROWS CHAR(11),
V VARCHAR(128),
SIZE VARCHAR(128),
I VARCHAR(128),
U VARCHAR(128)
)
-- Execute commands for each database:
EXEC SP_MSFOREACHDB '
USE [^] -- Switch to the next database
IF DB_ID()>4 -- Skip System databases
BEGIN
-- For Each Table in this database:
EXEC SP_MSFOREACHTABLE
-- Execute SP_SPACEUSED on the table and Insert the output into #T
''INSERT INTO #T EXEC SP_SPACEUSED ''''?'''''',
-- Filter out Temp tables and table variables in TempDB which
-- cannot be accessed this way (turns out I didnt need this,
-- but I missed it).
@WHEREAND=''AND O.NAME NOT LIKE ''''#[0-9]%'''''',
-- Save local data into #TT adding DB name label.
-- (in fact I only need two tables so that I can add the
-- DB name labels)
@POSTCOMMAND=''INSERT INTO #TT SELECT DB_NAME(),* FROM #T
DELETE FROM #T''
END
', '^'
-- Change the DB substitution symbol, because '?' is used for
-- the table names
-- Display the top 5 tables in the whole Server Instance:
SELECT TOP 5 BASE ,
[TABLE] ,
SIZE ,
ROWS
FROM #TT
-- Order by size descending:
ORDER BY CAST(LEFT(SIZE ,LEN(SIZE )-3) AS INT) DESC
-- CleanUp
DROP TABLE #TT
DROP TABLE #T
Unfortunately, it was still huge. Even after taking out all of the comments and replacing all of the carriage returns and line feeds and all whitespace with single spaces, it was still 634 characters long. I was pretty sure that this was not nearly short enough to win, so I had to find some other way to make it shorter.
It occurred to me that there was a lot of redundancy in the script that it would probably compress pretty well. However, obviously I could not implement a ZIP algorithm to compress it. Not because it couldn't be done in T-SQL, in fact it is quite doable, but because the corresponding decompression code that would need to be included would be much larger than the original uncompressed script. However, I thought that a simple many-to-one character substitution compression might give me enough compression, if the decompression code that I would have ti embed could be made small enough.
The obvious approach was to just use REPLACE to substitute the longest and most frequently occurring strings for bytes in the character set that I was not otherwise using. Unfortunately this resulted in a decompression script that consisted of about 30 lines like this:
SELECT @S=REPLACE(@s,'`','CATCH ')
SELECT @S=REPLACE(@s,'~','TRY ')
SELECT @S=REPLACE(@s,'>','END ')
SELECT @S=REPLACE(@s,'<','BEGIN ')
SELECT @S=REPLACE(@s,';','), ')
SELECT @S=REPLACE(@s,'!','TABLE')
SELECT @S=REPLACE(@s,'$','var¥')
SELECT @S=REPLACE(@s,'¥','char(')
This meant that each replacement was going to add an additional 34 characters plus a the length of a copy of the string being replaced, resulting in a total characters saved = (N*L) -N*1 -34 -L. Where "L" is the length of the substring being replaced, "N" is the number of occurrences of that substring, so "(N*L)" is the total number of characters removed, "-N*1" is the number of single characters that replace them, "-34" is the base overhead per replacement and "-L" is the copy of the original string that remains in the decompressing REPLACE statement. Simplifying this gives Total_Savings = (N-1)*(L-1) - (34+1).
This was a very large cost of replacement, requiring at least 2 occurrences of a substring 37 or more characters long to result in a total savings that was positive, or 37 occurrences of a 2 character substring or anything in between. A cost this high was not going to result in in many substitution opportunities nor much overall compression. I determined that I could save an enormous amount by re-compressing the decompression statements themselves, replacing the string "') SELECT @S=REPLACE(@s,'" and "','" each with a single character. This would be very complicated because I would have to nest it and actually have two embedded decompression routines along with all of the quote-doubling that goes into nesting dynamic SQL, but it would change the cost equation to (N-1)*(L-1) - (2+1), which would mean that cases as small as 2 occurrences of a 5 character string or 5 occurrences of a 2 character string could be replaced with a net gain in total length.
That was pretty good, but the additional cost and complexity of two decompression routines and having to double-quotes twice was a concern. I felt that there must be a simpler way to both decrease the replacement cost and simplify the decompression.Obviously I could decompress the string by doing the REPLACE in a loop, but then where could I efficiently store and retrieve the REPLACE parameters? Finally it occurred to me that since the substitution strings where always a single character I could prefix them onto the beginning of the compressed command string and have the decompresser remove them as it used them. I could do the same thing with the replacement strings but since they were variable length I would need an unused character as a separator ("~"). Thus if the starting command string was this:
"CREATE TABLE #T( ... CREATE TABLE #TT( ...
DROP TABLE #T DROP TABLE #TT"
substituting "$" for the repeated string " TABLE #T" would look like this:
"$ TABLE #T~CREATE$( ... CREATE$T( ... DROP$ DROP$T"
resulting in a total savings of 21 characters. This has further reduced our base cost from 2 to 1, making the savings formula now (N-1)*(L-1) - (1+1) meaning that occurrences as small as four two-character strings or two four-character strings would be compressible. plus the decompressor was relatively simple, resulting in the following SQL:
declare @s nvarchar(999)
Set @s='Œ''~‚,ROWS~€ŒŒ~}and~|le~{char(1~\e ~Q Se|ct ~Jexec
sp_~< drop $+~;creat\$+~: From+~7SIZ\~6€INSERT into+~/JMSFor
Each~+ #t~&€Œ~$TAB|~"[$] n!‚ {1),v !,7!,i !,u !) ~!var{28)~;
(";t(BAS\n!,"/DB ŒUS\[^]IF DB_ID()>4 BEGIN /$ 6 Jspaceused
&Œ?&&,@where}=€} o.nam\not lik\&Œ#[0-9]%&&,@postcomm}=6tQdb_
name(),*: De|te:€ END Œ, Œ^ŒQTOP 5 BAS\,[$] ,7‚:t order by
cast(|ft(7,|n(7)-3) as int) desc<' ASCII(@s)< WHILE>99
SELECT @s=REPLACE
(
Right( @s, Len(@s) - (Charindex('~', @s)) ),
Left( @s, 1 ),
substring( @s, 2, Charindex('~', @s)-2 )
)
EXEC(@s)
(Editor note: the @s string was originally in one row but for formatting reasons it was exploded in multiple lines, same remark for the next code)
Of course, I had to compact this as well (remove line breaks, whitespace, etc.), with the final command string being:
declare @s nvarchar(999) Set @s='Œ''~‚,ROWS~€ŒŒ~}and~|le~{char(1~\e ~Q Se|ct ~Jexec sp_~< drop $+~;creat\$+~: From+~7SIZ\~6€INSERT into+~/JMSForEach~+ #t~&€Œ~$TAB|~"[$] n!‚ {1),v !,7!,i !,u !) ~!var{28)~;(";t(BAS\n!,"/DB ŒUS\[^]IF DB_ID()>4 BEGIN /$ 6 Jspaceused &Œ?&&,@where}=€} o.nam\not lik\&Œ#[0-9]%&&,@postcomm}=6tQdb_name(),*: De|te:€ END Œ, Œ^ŒQTOP 5 BAS\,[$] ,7‚:t order by cast(|ft(7,|n(7)-3) as int) desc<' ASCII(@s)< WHILE>99SELECT @s=REPLACE(Right(@s,Len(@s)-(Charindex('~',@s))),left(@s,1),substring(@s,2,Charindex('~',@s)-2)) EXEC(@s)
With a compressed length of 384 characters, I had shortened the original string by almost 50%, however, adding in the additional external commands, including the decompresser brought the total length back up to 550 characters. I had managed to achieve a final reduction of over 80 characters, not nearly as much as I had hoped, but still significant and after a week of trying, discarding and retrying dozens of different approaches and compression attempts, I was happy to settle for it.
Well, this has gotten pretty long and I have skipped over a lot of stuff (such as the compression routines, a whole story in themselves), so if anyone has any questions about this, I would request that you post them as a new thread at SQLServerCentral.com
As Barry said compression routines are whole story in themselves and I hope we will have time to talk about them in a near future!
Again congratulations to Barry to share with us this really interesting solution!
Get notified when a new challenge is available or the evaluation result published either by subscribing to the
RSS feed or subscribing to the Email Notification.