beyondrelational.com



September 2009 - Posts

TSQL Challenge #14 - Identify the longest sequence of characters in a string

This challenge is all about identifying sequences of characters in a string using a SET based query. The task is to identify the longest sequence of characters within a given string.

Here is the sample data for this challenge

Data
------------------------------------
9992EDC6-D117-4DEE-B410-4E5FAE46AE97
0BFC936B-BD9A-4C6A-AFB2-CF3F1752F8B1
4A73E7EB-7777-4A04-9258-F1E75097977C
5AAF477C-274D-400D-9067-035968F33B19
725DA718-30D0-44A9-B36A-89F27CDFEEDE
8083ED5A-D3B9-4694-BB04-F0B09C588888

Here is the expected output

Data                                 Char Pos  Len 
------------------------------------ ---- ---- ----
8083ED5A-D3B9-4694-BB04-F0B09C588888 B    20   2 
8083ED5A-D3B9-4694-BB04-F0B09C588888 8    32   5 
4A73E7EB-7777-4A04-9258-F1E75097977C 7    10   4
4A73E7EB-7777-4A04-9258-F1E75097977C 7    34   2
9992EDC6-D117-4DEE-B410-4E5FAE46AE97 9    1    3
9992EDC6-D117-4DEE-B410-4E5FAE46AE97 1    11   2
9992EDC6-D117-4DEE-B410-4E5FAE46AE97 E    17   2
5AAF477C-274D-400D-9067-035968F33B19 A    2    2
5AAF477C-274D-400D-9067-035968F33B19 7    6    2
5AAF477C-274D-400D-9067-035968F33B19 0    16   2
5AAF477C-274D-400D-9067-035968F33B19 3    32   2
725DA718-30D0-44A9-B36A-89F27CDFEEDE 4    15   2
725DA718-30D0-44A9-B36A-89F27CDFEEDE E    33   2

Column "char" shows the character having longest sequence within the string. "pos" shows the starting position of the sequence and the "len" column shows the length of the sequence.

The output should be ordered as given in the 'expected output' listing. The data having the longest sequence should come on top followed by the next longest sequence (descending order). Within each data, the rows should be ordered by the position at which the sequence starts.

Sample Data

Use the following script to generate the sample data needed for this challenge

DECLARE @t TABLE (Data VARCHAR(40) )

INSERT @t (Data) SELECT '9992EDC6-D117-4DEE-B410-4E5FAE46AE97'
INSERT @t (Data) SELECT '0BFC936B-BD9A-4C6A-AFB2-CF3F1752F8B1'
INSERT @t (Data) SELECT '4A73E7EB-7777-4A04-9258-F1E75097977C'
INSERT @t (Data) SELECT '5AAF477C-274D-400D-9067-035968F33B19'
INSERT @t (Data) SELECT '725DA718-30D0-44A9-B36A-89F27CDFEEDE'
INSERT @t (Data) SELECT '8083ED5A-D3B9-4694-BB04-F0B09C588888'

SELECT * FROM @t

Notes

  1. Read the Submission Guidelines and make sure that your solution follows them.
  2. The solution should work on SQL Server 2005, 2008 or later versions
  3. Last date to submit your entries: Monday 12 October 2009 Midnight GMT
  4. Use this forum for any questions related to TSQL Challenge #14

About the Authors.


TSQL Challenge 10 – Solution by Alejandro Mesa (Hunchback)

. Hunchback (or Alejandro Mesa) is is a Database developer at Bank of America.He has been working with SQL Server for almost 10 years by now, OLTP and OLAP. Lately he has been more dedicated to dimensional modeling, ETL, OLAP, and reporting services, but his passion is T-SQL. Alejandro is also a SQL MVP.

 

Alejandro used PIVOT and CROSS APPLY statement to order vertically and horizontally data. Theses features are available since SQL Server 2005 and enable to change a table-valued expression into another table. It's perfect to order horizontally and vertically data.

The idea, as you will see, is not far than the others in the concept but it’s quite different in the way to achieve it.

That’s maybe the last time Alejandro wins a TSQL Challenge as he is part of the team now. I would like to congratulate him for this challenge but also say again a big welcome!

How did Alejandro work on this challenge?

Here is the approach taken by Alejandro on this challenge as he explains us:

UNPIVOT the columns using a derived table, which contain as many rows as columns to UNPIVOT. I used this method because the columns are nullable, so if you want to preserve those rows (columns with null value) then you can not use PIVOT operator.

Notice that I used the function row_number to assign a subrogate key to each row in order to identify each set of columns after the UNPIVOT.Then I assigned a row number to each row partitioned by the subrogate key and ordered by the value of the column.

Finally, I used Pivot the result of the ranking to get the final table.

Here is the solution by Alejandro:

WITH unpvt 
AS 
( 
	SELECT 
		pk, 
		[value], 
		ROW_NUMBER() OVER(
		PARTITION BY pk ORDER BY [value]) AS rn 
	FROM 
	( 
		SELECT 
			pk, 
			CASE n.c1 
				WHEN 1 THEN t.c1 
				WHEN 2 THEN t.c2 
				WHEN 3 THEN t.c3 
			END AS [value] 
		FROM 
		(
			SELECT 	c1, c2, c3, 
					ROW_NUMBER() OVER(
					ORDER BY (SELECT 0)) AS pk 
			FROM @t) AS t 
			CROSS JOIN 
			(
				SELECT 1 AS c1 UNION ALL 
				SELECT 2 UNION ALL 
				SELECT 3) AS n 
		) AS r 
)
SELECT 	DISTINCT 
		[1] AS c1, 
		[2] AS c2, 
		[3] AS c3 
FROM 	unpvt 
PIVOT 
	( 
		MIN([value]) 
		FOR rn IN ([1], [2], [3]) 
	) AS pvt 
ORDER BY 
		c1, c2, c3;

 

What can we learn from this?

As I said before, this solution is not so far than the Rob’s one, but it shows us an interesting way of unpivoting without unpivot operator. It’s should be an old fashion way if it don’t have the ROW_NUMBER. Let’s analyze it’s procedure:

  1. from the original table, tag the rows with an id. This is to get a reference to the original row if needed for future computation. This may be optional.
  2. create a virtual table with the same number of rows than the columns you want to unpivot.
  3. cross join the 2 tables.
  4. select only the corresponding value for each row each column by a magical case when.

let’s try it:

DECLARE @T TABLE (c1 CHAR(1), c2 CHAR(1))
INSERT INTO @T (c1, c2) values ('A','B')
INSERT INTO @T (c1, c2) values ('X','Y')

-- ORIGINAL WITH ID
SELECT	c1,c2,
		ROW_NUMBER() OVER(
		ORDER BY (SELECT 0)) AS Id
FROM	@T
-- MULTIPLIED
SELECT	*
FROM	(
		SELECT	c1,c2,
				ROW_NUMBER() OVER(
				ORDER BY (SELECT 0)) AS Id
		FROM	@T )
		AS SOURCE,(
			SELECT 'Col1' AS v1 UNION ALL 
			SELECT 'Col2' 
		) AS ROWS
-- UNPIVOTED
SELECT	SOURCE.Id,
		VAL=
			CASE ROWS.v1
				WHEN 'Col1' THEN SOURCE.c1
				WHEN 'Col2' THEN SOURCE.c2
			END
		
FROM	(
		SELECT	c1,c2,
				ROW_NUMBER() OVER(
				ORDER BY (SELECT 0)) AS Id
		FROM	@T )
		AS SOURCE,(
			SELECT 'Col1' AS v1 UNION ALL 
			SELECT 'Col2' 
		) AS ROWS	

This will give us the sequence:

image

 

As we already said in this column, the CASE statement is very useful when you need to make some kind of multiple aggregation inline or column selection.

A Final word

Congratulations again for this solution and it’s very nice to have you now in team working with us in the challenges.

Stay with for the next step with TSQL Challenge #14!

 

About the Authors.


TSQL Challenge 10 – Solution by Divya Agrawal

 

Divya has an original solution based on XML data type methods and PIVOT operator to perform the required sort operation for this challenge. XML Data type is a great feature added in SQL Server 2005.  While XML methods can be used to solve a number of common TSQL problems, you should be very careful about the performance. There are times when XML solutions are found to be better and there are times when they are not advisable. You should probably decide this based on a case-to-case basis.

How did Divya work on this challenge?

Here is the approach Divya took to solve this challenge, in her own words.

Well to start with, it was a complex challenge. Let me show you the input table to the problem.

The first thing comes to our mind would be to sort it horizontally as vertical sorting is a facility provided by SQL Server. So, that’s all the problem is all about. I thought first I will combine the rows 1 to 6 into a single column separated by XML tags. Also I wanted to add a column id ordered on column ‘c1’ to keep a track of the order of the row. This can be done by the following query:

WITH temp AS
( 
 SELECT ROW_NUMBER() OVER (ORDER BY c1) AS id,
 	c1,c2,c3,
 	CAST('<x>'+c1+'</x><x>'+c2+'</x><x>'+c3+'</x>' 
	AS XML) AS xmlcol
 FROM @t

)SELECT * FROM temp;

 

The output of the query above would give you an extra column ‘xmlcol’

Next step would be to make the ‘xmlcol’ values in vertical order, so that we can sort them. Along with the vertical sequence we need to remember on which column we have ordered a single row that has been made vertical. That we can identify by using PARTITION BY on ‘id’ column and ordered by the value in xml column.

WITH temp AS
( 
 SELECT ROW_NUMBER() OVER (ORDER BY c1) AS id,
 	c1,c2,c3,
 	CAST('<x>'+c1+'</x><x>'+c2+'</x><x>'+c3+'</x>' 
	AS XML) AS xmlcol
 FROM @t
)
SELECT 	T.id,T1.value,
	ROW_NUMBER() OVER (PARTITION by T.id ORDER BY value) AS col 
FROM temp T
CROSS APPLY
(
 	SELECT	fdata.D.value('.','varchar(50)') AS value  
	FROM T.xmlcol.nodes('X') AS fdata(D) 
)T1

After executing the query above would give you the output as: 

You might be wondering why the column ‘col’ is taken. It is taken for rotating back all the vertically transferred rows into horizontal format. We had originally three columns ‘c1’,’c2’ and ‘c3’. These column ‘col’ would behave as those three columns col value 1 is ‘c1’, value 2 is ‘c2’ and value 3 is ‘c3’. Let us now have a look at the complete query:

WITH temp AS
( 
 SELECT	ROW_NUMBER() OVER (ORDER BY c1) AS id,
 	c1,c2,c3,
 	CAST('<x>'+c1+'</x><x>'+c2+'</x><x>'+c3+'</x>' 
	AS XML) AS xmlcol
 FROM @t
)
SELECT   T.id,T1.value,
	ROW_NUMBER() OVER (
		PARTITION by T.id ORDER BY value) AS col 
FROM temp T
CROSS APPLY
(
 	SELECT	fdata.D.value('.','varchar(50)') AS value  
	FROM T.xmlcol.nodes('X') AS fdata(D) 
)T1

The rows can be rotated back to horizontal picture can be done with the use of PIVOT. And then finally do the vertical sorting using ORDER BY on all the columns.

The final output you will get is:

Use this forum to discuss this solution.

What can we learn from this?

Divya used a combination of XML Data type methods and PIVOT operator to produce the required output. XML Data type method nodes() can be used to access each node of an XML document in a SET based manner. The value() method is used to retrieve values from the XML nodes.

A final word

We would like to congratulate Divya for this writing an interesting solution. We hope to see another interesting solution from you, for the next challenge.

About the Authors.

Posted: 09-24-2009 9:34 AM by David Barbarin | with 5 comment(s)
Filed under: ,

TSQL Challenge 10 – Solution by Rob Farley

For this challenge #10, Rob comes with a very nice solution, he was also part of the winners in the TSQL Challenge #9 in July.

Owner/Principal of LobsterPot Solutions, is a SQL MVP and MCT based in Adelaide, Australia where he runs the local SQL Server User Group. He consults and trains across Australia, helping people improve the way they use their databases. He has also helped create exams for Microsoft Learning and is a regular presenter at conferences around Australia. His company can be found at http://www.lobsterpot.com.au and his blog at http://msmvps.com/blogs/robfarley

 

The Rob's solution is based on PIVOT and UNPIVOT operators These feature is available since SQL Server 2005 and enable to change a table-valued expression into another table. It's perfect to order horizontally and vertically data.

 

How did Rob work on this challenge?

Here is the approach taken by Rob on this challenge as he explains us:

The problem here focuses on having to sort data across columns, not just in by rows. Thing is - SQL Server doesn't allow for sorting across columns very well. This is where the problem becomes an interesting one. I figured that since I can easily order rows, why not unpivot the data, so that I could then order it and pivot it back into the required shape.

Using ROW_NUMBER(), I can make sure that I can order the values that had been in columns, using the PARTITION BY clause to make sure that I'm keeping track of which row the values had originally come from.

I use CTEs throughout the problem, as I find them generally more effective than temporary tables or table variables.

Here is the final solution provided by Rob with the comments inline:

--Let's start by unpivotting the data we have.
WITH 
unpivotted AS (
	SELECT *
	FROM (
	/*Rownum makes sure we don't lose track of which row is which in
	our original data.I'm not actually interested in which order 
	ROW_NUMBER() gets applied, I just want to make sure that each 
	row is uniquely identified. Therefore, 	ORDER BY (SELECT 1) is
	used - just arbitrarily assigning row numbers.*/
		SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum, * 
		FROM @t
	) t UNPIVOT (vals FOR col IN (c1,c2,c3)) p
)
, 
/*Now we can provide an order for the columns, 
	to do the horizontal order*/
ordered_cols AS (
	SELECT 	rownum, vals, 
			ROW_NUMBER() OVER (
				PARTITION BY rownum ORDER BY vals) AS colnum
	FROM unpivotted
)
/*Finally we can pivot them back again, and use 
	DISTINCT to de-dup the data*/
SELECT DISTINCT [1] AS [C1], [2] AS [C2], [3] as [C3]
FROM ordered_cols
  PIVOT
  (MAX(vals) FOR colnum IN ([1],[2],[3])) p
ORDER BY [1], [2], [3]
; 

What can we learn from this?

We have here a nice example of the use of PIVOT/UNPIVOT functions and how to combine them to make a complex operation.

The key point here was to consider the initial table as an aggregated data table already pivoted. The idea is simple, if you want to make some ordering in your columns (obviously if they carry the same kind of data), put them in rows! That’s why the first step is to unpivot the original table.

Just try this simple sample to compare:

--ORIGINAL
SELECT c1='2',c2='1',c3='3' UNION 
SELECT '3','2','1' UNION
SELECT 'Z','X','Y'
--UNPIVOTED
;WITH 
REPIVOT AS (
	SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rownum, * 
	FROM (
		SELECT c1='2',c2='1',c3='3' UNION 
		SELECT '3','2','1' UNION
		SELECT 'Z','X','Y'
	) ORIGINAL UNPIVOT (vals FOR col IN (c1,c2,c3)) p
)
SELECT * FROM REPIVOT

Which will give us:

image

Then you are able to make your specific computation over rows and columns (an ordering in our case) and then make a normal pivot to put again the data in the right way.

These functions where introduced in Sql Server 2005 and even if they are not well used by everyone they are really useful. And as usual with integrated functions the code is easier to read and the performances better than an home mode script to make this pivot/unpivot.

A final word

So, congratulations again Rob for this challenge and thank you for this explanation! We hope to see you soon in a next challenge.

We will continue in the next days with the solutions from Divya and Alejandro and don’t miss the next challenge by the end of the week!

About the Authors.

Posted: 09-23-2009 8:00 AM by David Barbarin | with no comments
Filed under: ,

TSQL Challenge 10 Winners

Thanks to all

First, we would like to thank all challengers for their participation. Every solution has its points of interest, and we always spend a lot of time analyzing all the solutions to find the more accurate, different views and so on to find the more valuable content to present to you for a type of problem.

For this challenge, to select the 3 winners was particularly difficult because many solutions are in the same mind. We had to work more accurately to define our metric's evaluation. We hope to share that with you in a near future in order to be completely transparent and to get your feedback.

The Challenge

This challenge is about sorting the data horizontally and vertically. The major difficulty solving this is that to sort data horizontally. Here is the source data for the challenge:

C1   C2   C3
---- ---- ----
2    1    3
3    2    1
Z    X    Y
B    C    D
Y    Z    X
B    C    A

By using usual sorting  we realize that it will be difficult to obtain this solution :

C1   C2   C3
---- ---- ----
1    2    3
A    B    C
B    C    D
X    Y    Z

Key points

Usually we have to order the data by using the order by statement  but with horizontal sorting we have to use a very different approach. There are many  solutions but all are not good.

You used 3 solutions mostly :

  • based on comparison operator (=, <=, >=),
  • based on union operator or
  • based on new SQL Server 2005's operators PIVOT and UNPIVOT

One of our metric's evaluation consisted of to select the most scalable and efficient solution.

 

The Winners

First of all, we want to give great congratulations to the winners!

This challenge was very difficult to analyze because the most solutions are very good. The winners show very scalable and efficient  or original queries and we want to congratulate them again for that. We want also thanks all other challengers for their participation because they show us also different point of views and sometimes they are not really far in terms of performance from the winners set.

We will see in the next days how Rob, Divya and Hunchback solve this challenge:

Owner/Principal of LobsterPot Solutions, is a SQL MVP and MCT based in Adelaide, Australia where he runs the local SQL Server User Group. He consults and trains across Australia, helping people improve the way they use their databases. He has also helped create exams for Microsoft Learning and is a regular presenter at conferences around Australia. His company can be found at http://www.lobsterpot.com.au and his blog at http://msmvps.com/blogs/robfarley
 
 

Divya is working as SQL Server Developer and is very keen in sharing and gathering knowledge on SQL Server.

She is also a multi-times winner of TSQL Challenges.

 

 

Hunchback (or Alejandro Mesa) is is a Database developer at Bank of America.He has been working with SQL Server for almost 10 years by now, OLTP and OLAP. Lately he has been more dedicated to dimensional modeling, ETL, OLAP, and reporting services, but his passion is T-SQL. Alesandro is also a SQL MVP.

 

 

And don’t forget to check for the next challenge #14 which will be online by the end of this week!

About the Authors.


Copyright © Beyondrelational.com