TSQL Challenges

TSQL Challenges intend to help you to test and enhance SET based querying skills using TSQL.





July 2009 - Posts

TSQL Challenge 9 – Solution by Syed Mehroz Alam

For the TSQL Challenge #9, Syed Mehroz Alam is again part of the winners. You can see its previous solutions for the TSQL Challenge #7 and TSQL Challenge #8.

syed Syed Mehroz Alam, living in Karachi, Pakistan, is primarily a developer focusing Microsoft technologies. He has completed his bachelors as a Computer Systems Engineer in 2006 and is currently pursuing a Masters degree in Computer Science. Despite developing rich internet applications, he loves to work with SQL Business Intelligence platform that enhances his TSQL expertise. He is fond of logical challenges and has won several speed programming competitions which are listed here.

Syed writes articles at CodeProject, and expresses his experiences with .NET and SQL server at his blog. When he has time, he contributes to MSDN and Silverlight forums. He loves to play football (Soccer) and computer games, especially first person shooters and RPGs.

How did Syed work on the challenge?

Here is the comments by Syed on this challenge:

An important part of the challenge was to write a scalable query capable of handling milllions of rows. This requirement kicked off the following simple answer to the question involving min/max subqueries:

;With Result
as
(
Select
(Select IsNull(Max(C.id)+1, (Select Min(id) from @tc9)) 
 from @tc9 C
 where C.id <= A.id 
   and (C.SendState<>A.SendState or C.AckState<>A.AckState)
 ) as MinID,
(Select IsNull(Min(C.id)-1, (Select Max(id) from @tc9)) 
 from @tc9 C
 where C.id >= A.id 
   and (C.SendState<>A.SendState or C.AckState<>A.AckState)
 ) as MaxID,
 A.SendState, A.AckState
From @tc9 A
)

Select distinct C.MinID, C.MaxID, C.SendState, C.AckState
from Result C
order by C.MinID

The above solution tries to find Min/Max IDs for every row using a subquery and hence will result in hopeless speed when executed on very large tables. We need to find another more efficient way to solve the problem.

Luckily, T-SQL 2005 presents us with ranking functions that are very helpful in various scenarios. Consider assigning a row number per Send and Ack states using the following query.

;With RowNoPerStateCombination
As
(
    Select
        *,
        --assign a row number grouped per SendState/AckState combination
        ROW_NUMBER() OVER(PARTITION BY SendState,AckState ORDER BY ID) AS RowID
    From @tc9
)

Select * from RowNoPerStateCombination
order by ID

The output is

challenge9-rowid

Notice a great pattern here: if we try to subtract this generated RowID from the primary key(ID), it is going to give us a unique result for every consecutive send/ack state combination. This is depicted using the following screenshot from Excel:

challenge9-pattern

Since we are able to produce a unique result (say it GroupID) for every consecutive send/ack state combinations, we can just pick the min and max values per GroupID per Send/Ack State combination. Here’s what I am talking about:

;With MessageGroups
as
(
    Select
        *,
        -- assign a unique group number for each 
        -- consequtive state combination
        ID - ROW_NUMBER() 
        OVER(PARTITION BY SendState,AckState 
             ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, MAX(ID) as LastIdInclusive, 
      SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

And we get the required output:

challenge9-output

Finally, here’s my complete solution:

--populate sample data
DECLARE @tc9 TABLE(
    ID INT IDENTITY(1,1),
    CreationDate DATETIME,
    Content NVARCHAR(10),
    SendState BIT,
    AckState BIT
)

INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1

--SELECT * FROM @tc9

--solution
Declare @startTime datetime
Declare @endTime datetime

set @startTime = GetDate()-20.8
set @endTime = GetDate()

;With MessageGroups
as
(
    Select
        *,
        -- assign a unique group number for each 
        -- consequtive state combination
        ID - ROW_NUMBER() 
        OVER(PARTITION BY SendState,AckState 
             ORDER BY ID) AS GroupID
    From @tc9
    Where CreationDate between @startTime and @endTime
)

Select MIN(ID) as FirstIdInclusive, 
       MAX(ID) as LastIdInclusive, 
       SendState, AckState
From MessageGroups
Group by GroupID, SendState, AckState
Order by MIN(ID)

So, congratulations again Syed and thank you for this great piece of explanations!

What can we learn from this?

It is very amazing to see that Syed produces finally a query very similar the one of Rob. I think I had already talked enough about it on the previous, so what can we add more to this?

One interesting thing we can note from the explanations of Syed, is that he call its technique a pattern. And that’s what it is, a technique you can re-apply to your own needs.

TSQL development is full of these patterns and that’s why we love running these challenges. This one major target of the TSQL Challenges. We try to present each time a challenge mainly focused on problems we found on our day to day sql life or by users we helped on forums. Beyond that we always think on how it should be valuable and what new techniques we can learn. That’s finally what all the expert challengers give to us, a transformation of the problems to patterns we can reuse.

One other interesting thing we can extract from all of that is that the use of ranking functions. Or should I say, the use beyond the use of the ranking functions.

As you may know, these ranking function were introduced in Sql Server 2005:

These features are very useful since they were introduced and most of the times they changed our (TSql)life. But more than their common usage what we need to learn is techniques like the one we saw here that can really improve our queries.

Creating patterns on an intelligent combination of these ranking function is finally a great challenge!

 

A Final word

So thanks again Syed for sharing this solution with us !

If you have any questions, feel free to write on the dedicated forum thread: http://beyondrelational.com/groups/tsqlchallenge/forum/t/136.aspx

Again, if you find a new interesting solution for this challenge even after the end of the challenge we will be very pleased to it and write about it!

 

About the Authors

Rui Carvalho is the director for TSQLChallenges. He is a senior developer on Sql Server and .Net mainly experienced in web applications. He work as consultant for a Microsoft experts company called Winwise in France. Rui worked in the past as a full time Sql developer specialized in sql optimizations and reporting for CRM applications and now  mainly focus his time on Sql Server developments, .Net core technologies and Asp.net MVC. Architecture and software design are also part of his job.
He runs two blogs, one in French and one in English.

Introducing new “TSQL Challenge” Team Members

I take this opportunity to thank the TSQL Challenge Community for your continuous support and encouragement towards ‘TSQL Challenges’ we are doing for the last several months. The TSQL Challenge Team is further strengthened by the participation of some of the most renowned TSQL Experts from the Industry. Though these people are well known in the SQL Server community and most of you know them, I would like to take the pleasure of writing a brief introduction for each of them.

Peter Larsson, SQL Server MVP

Peter is a Professional database developer since 2001. Peter started to use Microsoft SQL Server in 1993, and have been true since then. Peter is mostly working with performance troubleshooting and database architecting. See complete profile.

Mangal Pardeshi, SQL Server MVP

Mangal is an E&TC Engineer from Pune University. He started his career as an ERP Technical Consultant in 2007. Currently he works as a BI Developer.    His core areas of expertise are Data Warehousing and Business Intelligence.

See complete profile.

Alejandro Messa, SQL Server MVP

Alejandro is a Dabase developer at Bank of America.He has been working with SQL Server for almost 10 years by now, OLTP and OLAP. Lately he has been more dedicated to dimensional modeling, ETL, OLAP, and reporting services, but his passion is T-SQL.

 

See complete profile.

Adam Haines

Adam Haines is a MCP Database Administrator and Developer, who has been working with database technology since 2004. Adam has experience in database administration, performance tuning and optimization techniques, working in a clustered environment, SSIS development, Analysis Services, SSRS development, and Web and Windows development. See complete profile.

The Mission

The entire “TSQL Challenge” team will focus on fulfilling our mission; “helping people to enhance their SET based query writing skills”. We will come up with more and more interesting TSQL Challenges that encourages you to look for alternate logics and inspires you to think outside the regular thought process.

The Team

“TSQL Challenges” is a community effort and is a show run by volunteers. Here are those individuals who are proud members of the ‘TSQL Challenges’ Team.

 Jacob Sebastian

Rui Carvalho

 

Antoine Gémis

Ashish Gilhotra

Akhilkumar Patel

Vijay Krishna Birju

Barbarin David

Anil Nair

Khyati Patel

Comments and Feedback

We will be very happy to hear your comments and feedback on ‘TSQL Challenges’. Please feel free to contact us with your comments. If you have got the idea for an interesting challenge, we will be glad to publish your challenge as well.

TSQL Challenge 9 – Solution by Rob Farley

For the TSQL Challenge #9, Rob was one of the first to provide a very smart and simple solution.

Owner/Principal of LobsterPot Solutions, is a SQL MVP and MCT based in Adelaide, Australia where he runs the local SQL Server User Group. He consults and trains across Australia, helping people improve the way they use their databases. He has also helped create exams for Microsoft Learning and is a regular presenter at conferences around Australia. His company can be found at http://www.lobsterpot.com.au and his blog at http://msmvps.com/blogs/robfarley

How did Rob work on this challenge?

Here is the approach taken by Rob on this challenge as he explains us:

From the sample data, let’s consider what happens if we query the data for only those entries with SendState = 1 and AckState = 1:

SELECT *
FROM @tc9
WHERE SendState = 1 AND AckState = 1
ORDER BY ID;

Result :

ID CreationDate            Content    SendState AckState
-- ----------------------- ---------- --------- --------
3  2009-07-26 00:10:24.540 Msg #3     1         1
4  2009-07-26 02:34:24.540 Msg #4     1         1
5  2009-07-26 04:58:24.540 Msg #5     1         1
10 2009-07-26 16:58:24.540 Msg #10    1         1

We see that our ID column has values 3,4,5,10. If we're looking for the start and end of each range, we can compare this ID value with a ROW_NUMBER() field :

SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rownum, *
FROM @tc9
WHERE SendState = 1 AND AckState = 1
ORDER BY ID;

Result:

rownum ID CreationDate            Content    SendState AckState
------ -- ----------------------- ---------- --------- --------
1      3  2009-07-26 00:10:24.540 Msg #3     1         1
2      4  2009-07-26 02:34:24.540 Msg #4     1         1
3      5  2009-07-26 04:58:24.540 Msg #5     1         1
4      10 2009-07-26 16:58:24.540 Msg #10    1         1

Our rownum field is a perfect sequence, no gaps. But every time there  is a gap in the values in our ID field, the difference between it and our rownum field grows:

SELECT ROW_NUMBER() OVER (ORDER BY ID) AS rownum, 
ID - ROW_NUMBER() OVER (ORDER BY ID) AS Diff, *
FROM @tc9
WHERE SendState = 1 AND AckState = 1
ORDER BY ID;

(Editor Note: for publishing reasons and to let the results table fit correctly the width of the page, we removed the CreationDate column which is not significant here) 

rownum Diff ID  Content    SendState AckState
------ ---- --  ---------- --------- --------
1      2    3   Msg #3     1         1
2      2    4   Msg #4     1         1
3      2    5   Msg #5     1         1
4      6    10  Msg #10    1         1

We could then GROUP BY this difference, to get each run of consecutive values in our ID field, looking at the MIN and MAX in each run:

WITH numbered AS (
SELECT	ROW_NUMBER() 
	OVER (ORDER BY ID) AS rownum, 
	ID - ROW_NUMBER() 
	OVER (ORDER BY ID) AS Diff, 
	*
FROM @tc9
WHERE SendState = 1 AND AckState = 1
)
SELECT	MIN(ID) AS FirstIDInclusive, 
	MAX(ID) AS LastIDInclusive
FROM numbered
GROUP BY Diff;
FirstIDInclusive LastIDInclusive
---------------- ---------------
3                5
10               10

This is great, but we want to be able to do this for each different combination of SendState and AckState.

Well, no problem - just partition the ROW_NUMBER() function, and include  SendState and AckState in the GROUP BY clause.

In this final version, I've ignored the rownum field, and renamed the Diff field as GapSize. I've also aliased the fields as in the question:

WITH numbered AS
(
SELECT	ID - ROW_NUMBER() OVER (
	PARTITION BY SendState, AckState 
	ORDER BY ID) AS GapSize,
			*
FROM @tc9
)
SELECT	MIN(ID) AS FirstIdInclusive, 
		MAX(ID) AS LastIdInclusive, 
		SendState, 
		AckState AS AcknowledgeState
FROM numbered
GROUP BY GapSize, SendState, AckState
ORDER BY FirstIdInclusive;

Which produces the final requested result:

FirstIdInclusive LastIdInclusive SendState AcknowledgeState
---------------- --------------- --------- ----------------
1                2               0         0
3                5               1         1
6                9               1         0
10               10              1         1

 

So, congratulations again Rob for this challenge and thank you for this explanation!

What can we learn from this?

Take the time for reflection

First of all, I really want to focus on the approach. This is a really generic topic not specific to this challenge but it is important to take some time to write some lines about it. Each solution we analyze is different and seeing the approach and the reflection sequences taken by these experts is always a very good learning point.

As for common TSQL needs, it is important to decoupling the the transformation of the data. Most of the time, our instinct guides us to start writing THE query that gives the final solution from scratch. Then, even if we manage to produce the correct data at this first attempt, it is very rarely the best solution or simply a good solution.

So, as Rob did for this challenge, we can think of taking these reasonable steps:

  • Take a sample from the original data : if you need to query tables with millions of rows, don’t spend time waiting for results while testing your queries
  • Try to find the most accurate target to start your queries: between the source data and the final report we can see that Send and Ack states are the values that defines the group sets and FirstId and LastId are only values from these sets.
  • Write small queries with only one transformation at one time : see the result of each feature you introduce on your query. It will easier to debug and step by step it will guide you to expected result.
  • Go back and retry if you arrive to a very complicated solution that needs to make obscure TSQL to make it work at the end. Error is human, and refactoring code is always a good sign of sanity.
  • If you are working on a tricky point, it should be interesting to get the subqueries that helps produce the final result in comments. It should help other people which needs to work on your queries even if you are not any more on the project.

Make it simple

The very interesting point of this solution is the simplicity and the nature of its elements. In fact it produces the best results even with a lot of rows. Why? Because of its simple structure:

  • Only one query with one CTE
  • No joins
  • Simple aggregates

CTEs and Aggregates have a cost but when you use them exactly in the right way they are created for, it should give you optimal results.

ID and Gap Size

Pay attention to this solution that it works because the ID of each row is an identity with sequential +1 values. If you have a similar case with no sequential values you need first to add an RowNumber column on all values to a sequential column to work with.

You should write:

(
	ROW_NUMBER() OVER
	(ORDER BY ID) - 
	ROW_NUMBER() OVER 
	(PARTITION BY SendState, AckState ORDER BY ID) 
)
AS GapSize

Insteat of only:

(
	ID - 
	ROW_NUMBER() OVER 
	(PARTITION BY SendState, AckState ORDER BY ID) 
)
AS GapSize

In final words the final and most important part of this solution is the GapSize! This calculation is the key point to identify our data islands in the original grouping sets.

So, to resume this solution:

  1. group by discriminator fields
  2. order each items inside each group with row_number as A
  3. (optionally get a global row_number to get a sequential id) as B
  4. B-A defines the data island  by each discriminator.

A Final word

So thanks again Rob for sharing with us this solution!

If you have any questions, feel free to write on the dedicated forum thread: http://beyondrelational.com/groups/tsqlchallenge/forum/t/136.aspx

Don’t forget to check the new TSQL Challenge #11 which is still open for a few days

 

About the Author

Rui Carvalho is the director for TSQLChallenges. He is a senior developper on Sql Server and .Net mainly experienced in web applications. He work as consultant for a Microsoft experts company called Winwise in france. Rui worked in the past as a full time Sql developper specialised in sql optimisations and reporting for CRM applications and now  mainly focus his time on .Net core technologies and Asp.net MVC. Architecture and software design are also part of his job.

He runs two blogs, one in French and one in English.

TSQL Challenge 9 Winners

Thanks to all

That's a repetitive point, but I would like to thank all challengers for their participation. Every solution has its points of interest, and we always spend a lot of time analyzing all the solutions to find the more accurate, different views and so on to find the more valuable content to present to you for a type of problem.

We plan also in a near future to  provide to the community the full set of solutions for every challenge a separate downloadable project (every solution will be anonymized obviously and with the acceptance of every challenger). If you have any idea about the best way to keep and present these solutions, don’t hesitate to send us a mail!

The Challenge

This challenge is about searching for ordered data islands in a flat table. The difficulty solving this is that it is that it is really different than usual aggregation grouping. When you aggregate, you put in the same box all values for a same field whatever their position on the table.

Here is the source data for the challenge:

ID          CreationDate            Content    SendState AckState
----------- ----------------------- ---------- --------- --------
1           2009-05-27 22:15:43.647 Msg #1     0         0
2           2009-05-28 00:39:43.647 Msg #2     0         0
3           2009-05-28 03:03:43.647 Msg #3     1         1
4           2009-05-28 05:27:43.647 Msg #4     1         1
5           2009-05-28 07:51:43.647 Msg #5     1         1
6           2009-05-28 10:15:43.647 Msg #6     1         0
7           2009-05-28 12:39:43.647 Msg #7     1         0
8           2009-05-28 15:03:43.647 Msg #8     1         0
9           2009-05-28 17:27:43.647 Msg #9     1         0
10          2009-05-28 19:51:43.647 Msg #10    1         1

Using usual aggregation, if we want to get first and last ID for each blocks based on SendState and AckState, we can only obtain something like this:

FirstIdInclusive LastIdInclusive SendState AcknoledgeState
---------------- --------------- --------- ---------------
1                2               0         0
6                9               1         0
3                10              1         1

But the challenge here is to obtain sequential series for each different couple of (SendState,AckState), like this:

FirstIdInclusive LastIdInclusive SendState AcknoledgeState
---------------- --------------- --------- ---------------
1                2               0         0
3                5               1         1
6                9               1         0
10               10              1         1

Key points

Solving this challenge involves solving two hurdles, one about finding the solution and the other about getting the best performance.

Obviously the first key point is to produce a query that works, but it has to works also in larger tables than the one provided in the example.

But, as explained in the original challenge post, one major target of this challenge is to make the best solution in terms of performance. Solving this challenge with a lot of complex sub-queries and self-joins, decoupling all, can be a solution but will it work with a table of one million rows?

By the nature of SQL it-self this kind of queries give very disparate results (from 1 to 1000 times difference) depending on how you manage it. The solutions we selected perform perfectly this performance point, and, as you will see they are also very simple and short. As usual with TSQL queries, writing the most short and accurate query is often the best way to get performance.

The Winners

First of all, I want to give great congratulations to the winners!

This challenge was one that receives the most different kind of solutions since the beginning and it was very interesting to analyze the differences between all of them. The winners show very smart queries and I want to congratulate them again for that. I want also thanks all other challengers for their participation because they show us also different point of views and sometimes they are not really far in terms of performance from the winners set.

We will see in the next days how Rob, Syed and David solve this challenge:

  • Rob Farley
    rob_sideOwner/Principal of LobsterPot Solutions, is a SQL MVP and MCT based in Adelaide, Australia where he runs the local SQL Server User Group. He consults and trains across Australia, helping people improve the way they use their databases. He has also helped create exams for Microsoft Learning and is a regular presenter at conferences around Australia. His company can be found at http://www.lobsterpot.com.au and his blog at http://msmvps.com/blogs/robfarley

  • Syed Mehroz Alam
    Syed
    Syed Mehroz Alam, living in Karachi, Pakistan, is primarily a developer focusing Microsoft technologies. He has completed his bachelors as a Computer Systems Engineer in 2006 and is currently pursuing a Masters degree in Computer Science. Despite developing rich internet applications, he loves to work with SQL Business Intelligence platform that enhances his TSQL expertise. He is fond of logical challenges and has won several speed programming competitions which are listed here.
    Syed writes articles at CodeProject, and expresses his experiences with .NET and SQL server at his blog. When he has time, he contributes to MSDN and Silverlight forums. He loves to play football (Soccer) and computer games, especially first person shooters and RPGs.

  • David Barbarin
    david David, living in Lyon, France, is SQL Server DBA working as expert to improve existing systems performance. He participates actively to french Sql Server dedicated sites and forums and has a lot of certification regarding Microsoft technologies (Microsoft MCDBA, MCTS SQL Server 2005 , MCITP SQL Server 2005).
    A special note to David, as he is now part of our volunteers team 

If you have any generic questions about this challenge, the discussion is always opened on the dedicated thread of the forum: http://beyondrelational.com/groups/tsqlchallenge/forum/t/136.aspx.

TSQL Challenges - All In One Overview
TSQL Challenges are growing every day and we have now a lot of valuable information produced by our challengers. It is not always easy to have a global idea of what we done in the past and that's the target of this overview page. It is also the opportunity...
TSQL Challenge 11 – Calculate the lowest price of an Item by applying the best combination of discount coupons

We hope you enjoyed the previous challenge and it is time to look at the next challenge. We are very excited to introduce TSQL Challenge #11, which we think all of you will enjoy solving. Just like we did with the previous challenge, there will be a FREE training webcast for those people who find it hard to solve. If you would like to attend the training webcast, please register at http://tsqlchallenge11.eventbrite.com/.

If you have comments, questions, challenge ideas or problems that you think should be interesting to discuss, don’t hesitate to post them on the dedicated forum threads or write to us.

The Context

The context of this challenge is about dealing with combinations and how to extract valuable data. For this challenge we will work with two data tables.

The products with their prices:

ID NAME    PRICE
-- ------- ---------
1  PROD 1  100,00
2  PROD 2  220,00
3  PROD 3  15,00
4  PROD 4  70,00
5  PROD 5  150,00

And the discount coupons:

ID NAME         VALUE  IS_PERCENT
-- -----------  ------ ----------
1  CP 1 : -15$  15     0
2  CP 2 : -5$   5      0
3  CP 3 : -10%  10     1
4  CP 4 : -12$  12     0

On the IS_PERCENT column, you should understand, if 0 value is considered as real money value, if 1 then it is a percent value of the price.

The Challenge

For this shopping application, customers could use one to two coupons for the same product but the discount price can not be less than 70% of the original price and the total amount of the discount can not exceed 30$.

It is important to note that coupons are applied in a cumulative way. The second coupon is applied on the result of the original price + first coupon.

With these conditions, the boss ask for a report that shows for each product the minimum price that should be paid for a product using any combination of the discount coupons. For this report you need also to show:

  • Original price (PRICE)
  • Discount price (DISC_PRICE)
  • Total amount of the discount (TOT_DISC)
  • Total rate of the discount (RATE)
  • An info field with names of coupons applied (COUPON_NAMES)

Regarding the original data tables, here is the final report you should produce:

ID NAME    PRICE    DISC_PRICE  TOT_DISC  RATE    COUPON_NAMES
-- ------  -------- ----------- --------- ------- -------------------------
1  PROD 1  100.00$  73.00$      27.00$    27.00%  CP 4 : -12$ + CP 1 : -15$
2  PROD 2  220.00$  193.00$     27.00$    12.27%  CP 4 : -12$ + CP 1 : -15$
3  PROD 3  15.00$   13.50$      1.50$     10.00%  CP 3 : -10%
4  PROD 4  70.00$   49.50$      20.50$    29.28%  CP 1 : -15$ + CP 3 : -10%
5  PROD 5  150.00$  120.00$     30.00$    20.00%  CP 3 : -10% + CP 1 : -15$

Sample Data

As usual, you should use exactly the sample data as it is provided in your solution script:

DECLARE @T TABLE (ID INT IDENTITY, NAME NVARCHAR(20),PRICE MONEY)
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 1',100) 
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 2',220) 
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 3',15) 
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 4',70) 
INSERT INTO @T (NAME,PRICE) VALUES ('PROD 5',150) 


DECLARE @C TABLE (ID INT IDENTITY, NAME NVARCHAR(20), VALUE INT, IS_PERCENT BIT) 
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 1 : -15$',15,0) 
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 2 : -5$',5,0) 
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 3 : -10%',10,1) 
INSERT INTO @C (NAME,VALUE,IS_PERCENT) VALUES ('CP 4 : -12$',12,0)

Notes

  1. As usual, write a single query that produces the expected output.
  2. You should target any version of Sql Server
  3. Send your entries to tc@beyondrelational.com
  4. Do not include your solutions in the body of the email. Send them as an attachment in the email. Name it “firstname_lastname.sql”.
  5. Add ‘TSQL Challenge #11’ in the subject line of the email.
  6. Last date to submit your entries: 27 July 2009
  7. Use this forum for any questions related to TSQL Challenge #11
Posted: 07-15-2009 6:55 PM by Rui Carvalho | with 12 comment(s)
Filed under: ,
TSQL Challenge 8 – Solution by Leonid Koyfman

Yesterday, we saw the solution from Syed Mehroz Alam and discovered how Syed used subqueries. Another winner of TSQL Challenge 8 is Leonid Koyfman.
First of all, I would like to congratulate Leonid who is a faifthfull challenger and a many times winner.
Leonid not only writes regularly excellent solutions, but also sends to us some great new ideas for TSQL Challenges learning community.
Congratulations Leonid, for being an outstanding challenger and thanks for your support!


Leonid is a SQL Server Expert, he runs a blog to share technologies and techniques.

How did Leonid puzzle out this challenge ?

Here is the story, as narrated by Leonid:

Leonid: From the given data sample I made an assumption that there is no employees with the same name there. (It would be more appropriate to have an EmpID as a parameter.)

In the beginning I used recursive CTE to make a list of employees with their corresponding lists of parents and hierarchy level indicators.
In anchor part of CTE I started with managers

WHERE ReportsTo is null 

and in recursive part pulled the rest of employees.

To build a list of Employee parents I concatenated EmpName using '|' as a delimiter.

Using Empid for this purpose (as Mattthieu Hodin did in his solution) would be safer.
I optimistically assumed that there are no last names with "|" in them :-)

Adding delimiter in the beginning and at the end of the list makes furhter searches against it a bit simplier.

Otherwise, instead of

WHERE EmpNameList like'%|'+@manager+'|%'

we would need to use

WHERE '|'+EmpNameList+'|' like'%|'+@manager+'|%'.

I didn't want to use a subquery in each employee row to calculate @manager level: it's the same value for each row.

To calculate it just once and then apply to each row I used instead

   CROSS JOIN

            (     SELECT      [level]

                  FROM        org_chart

                  WHERE       EmpName =@manager

            ) manager

Calculating identation to format output I used

(oc.[level]-manager.[level])

to always display hierarchy left aligned, disregarding to the subordination level of the @manager in the organization chart.

And then I just repeated space character and concatenated it with EmpName in the output:

Hierarchy=REPLICATE(char(32),Ident)+EmpName

I prefer REPLICATE(char(32),Ident) over  SPACE(Ident) because REPLICATE function is more generic.

It can repeat not just space but any character(s).


Thanks Leonid for this great explanation. 
Here is Leonid’s complete solution:

DECLARE @Employees TABLE (
	EmpID INT, 
	EmpName VARCHAR(20), 
	ReportsTo INT
)
INSERT INTO @Employees(
	EmpID, 
	EmpName, 
	ReportsTo
)
   
SELECT 1, 'Jacob'	,	NULL UNION ALL    
SELECT 2, 'Rui'		,	NULL UNION ALL    
SELECT 3, 'Jacobson',	NULL UNION ALL    
SELECT 4, 'Jess'	,	1	 UNION ALL    
SELECT 5, 'Steve'	,	1	 UNION ALL    
SELECT 6, 'Bob'		,	1	 UNION ALL    
SELECT 7, 'Smith'	,	2	 UNION ALL    
SELECT 8, 'Bobbey'	,	2	 UNION ALL    
SELECT 9, 'Steffi'	,	3	 UNION ALL    
SELECT 10,'Bracha'	,	3	 UNION ALL    
SELECT 11,'John'	,	5	 UNION ALL    
SELECT 12,'Michael'	,	6	 UNION ALL    
SELECT 13,'Paul'	,	6	 UNION ALL    
SELECT 14,'Lana'	,	7	 UNION ALL    
SELECT 15,'Johnson'	,	7	 UNION ALL    
SELECT 16,'Mic'		,	8	 UNION ALL    
SELECT 17,'Stev'	,	8	 UNION ALL    
SELECT 18,'Paulson'	,	9	 UNION ALL    
SELECT 19,'Jessica'	,	10

DECLARE @manager varchar(20)
SET @manager='Jacob'

;WITH org_chart 
AS ( 
	SELECT 
		Empid,
		EmpName,
		EmpNameList=cast('|'+EmpName+'|' as varchar(max)) ,
		ReportsTo ,
		[level]=0 
	FROM @Employees 
	WHERE ReportsTo is null
UNION ALL 
	SELECT  
		e.Empid,
		e.EmpName,
		EmpNameList=CAST(EmpNameList+e.EmpName+'|'as varchar(max)) ,
		e.ReportsTo , 
		[level]=[level]+1 
	FROM @Employees e join org_chart oc on e.ReportsTo=oc.Empid
)
SELECT 
	Hierarchy=REPLICATE(char(32),Ident)+EmpName
FROM(
	SELECT 
		Ident=(oc.[level]-manager.[level])*4,
		EmpName,
		EmpNameList
	FROM 
		org_chart oc
		CROSS JOIN 
		(	SELECT	[level] 
			FROM	org_chart 
			WHERE	EmpName =@manager
		)manager
	WHERE EmpNameList like'%|'+@manager+'|%'
)x
ORDER BY EmpNameList


This is the last post of the “winners of TSQL Challenge 8” serie.
So, let’s try to piece together what we have learn so far, and see if there is more to find out from Leonid’s solution.

What can we learn from this TSQL Challenge ?

We learned that Common Table Expressions, or CTEs, introduced in SQL Server 2005,  are great for solving many kind of problems.
CTEs are readable and understandable.
CTEs, and recursive CTE's are great too for SET based querying instead of RBAR (“Row By Agonizing Row”).
It fits well with TSQL Challenges goal: help you to test and enhance SET based querying skills using TSQL.

We’ve already pointed out some possible issues: 

And discussed about using CHARINDEX() or LIKE (see Matthieu Hodin’s solution)

Leonid is talking about:

JOIN to calculate manager’s level.

  • Matthieu joins the CTE and the employee table: simple and direct way. That would be the one i’ll use in production.
  • Syed preferred to use subqueries over the employee table: nice trick for small resultsets :)
  • Leonid did a self join over the CTE: elegant, as usual :-)
    Note thant using an employee table of 10000 rows, a resultset of 9983 lines,
    using INNER JOIN instead of CROSS JOIN was twice faster on my machine!

REPLICATE() or SPACE() ?

As Leonid said, REPLICATE() is more generic than SPACE(). It’s seems logical that a simpler fonction could outperforms a more complex one.
Note that SPACE() cannot handle more than 8000 characters. One point to Leonid for scalability!  

So, let’s do some trial learning.

For this solution, there is no performance difference between SPACE() and REPLICATE() using an employee table populated with 3000 lines, a primary key and clustered index on EmpId.

Using a table t_1 populated with 1M rows containing just an ID int column and a clustered index.

SELECT REPLICATE(CHAR(32),1000) FROM t_1m 
SELECT SPACE(1000) FROM t_1m

Average of two measures CPU Reads Writes Duration
REPLICATE() 7400 5938   101965
SPACE() 6800 5938   90472

Due to measures imprecision, it seems similar.

 

Now, trying to do some dynamic allocations.

SELECT REPLICATE(CHAR(32),id % 1000) FROM t_1m 
SELECT SPACE(id % 1000) FROM t_1m

Average of two measures CPU Reads Writes Duration
REPLICATE() 11709 5938   72220
SPACE() 4877 5938   72311

Durations are the same, but SPACE() CPU is near from 40% of REPLICATE() one.
No conclusion to draw because of measures imprecision and because the test is perhaps not reliable.
Does somebody have more information about this ?

Leonid, thanks again for sharing your experience and let us learn from you.
I hope that things learned here will be useful for solving real world problems.

If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx

Thank you, see you soon for the next TSQL Challenge!

TSQL Challenge 8 - Solution by Syed Mehroz Alam

Yesterday, we saw the solution for TSQL Challenge #8 by Matthieu Hodin. Another winner of TSQL Challenge 8 is Syed Mehroz Alam.
First of all, I would like to congratulate Syed. Yes Syed, as you said: it’s a pleasure  to learn that the hard work has paid off!


Syed already won the TSQL Challenge 7. He is supporting our community at his blog.

But let’s present Syed.

Syed

 

 

 

Syed Mehroz Alam, living in Karachi, Pakistan, is primarily a developer focusing on Microsoft technologies. He has completed his bachelors as a Computer Systems Engineer in 2006 and is currently pursuing a Masters degree in Computer Science. Despite developing rich Internet applications, he loves to work with SQL Business Intelligence platform that enhances his TSQL expertise. He is fond of logical challenges and has won several speed programming competitions which are listed here.

Mehroz writes articles at CodeProject, and expresses his experiences with .NET and SQL server at his blog. When he has time, he contributes to MSDN and Silverlight forums. He loves to play football (Soccer) and computer games, especially first person shooters and RPGs.

How did Syed manage this TSQL Challenge?

This is the story of this challenge solving, told by Syed.

The big challenge was the condition that the CTE should not contain any filter for a specific manager.
Here is some code for populating test data.

--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
  SELECT 1, 'Jacob', NULL UNION ALL
  SELECT 2, 'Rui', NULL UNION ALL
  SELECT 3, 'Jacobson', NULL UNION ALL
  SELECT 4, 'Jess', 1 UNION ALL
  SELECT 5, 'Steve', 1 UNION ALL
  SELECT 6, 'Bob', 1 UNION ALL
  SELECT 7, 'Smith', 2 UNION ALL
  SELECT 8, 'Bobbey', 2 UNION ALL
  SELECT 9, 'Steffi', 3 UNION ALL
  SELECT 10, 'Bracha', 3 UNION ALL
  SELECT 11, 'John', 5 UNION ALL
  SELECT 12, 'Michael', 6 UNION ALL
  SELECT 13, 'Paul', 6 UNION ALL
  SELECT 14, 'Lana', 7 UNION ALL
  SELECT 15, 'Johnson', 7 UNION ALL
  SELECT 16, 'Mic', 8 UNION ALL
  SELECT 17, 'Stev', 8 UNION ALL
  SELECT 18, 'Paulson', 9 UNION ALL
  SELECT 19, 'Jessica', 10

I started by creating a recursive CTE for the whole table like this:

;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
  Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
    From @Employees E
    Where E.ReportsTo is null
  Union all
  Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
    from @Employees E
    inner join Hierarchy H on H.EmpID=E.ReportsTo
)

Select Space(Level*4) + H.EmpName
  from Hierarchy H
  order by H.FullyQualifiedName

tsql8-1

Notice that I am constructing a FullyQualifiedName value for each row. This value consists of full path from root to the current person. This approach helped me in filtering the records in the final select statement where I just needed to look for a '.' + ManagerName + '.' filter. Here is a select query that extracts all subordinates for a particular manager.

Select Space(Level*4) + H.EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

The only issue that remains is that when the query is run for a non-parent employee, we get extra spaces in the beginning. E.g. when we run the query for “Paul”, we get an output like:

tsql8-2

I solved it by inserted a sub-query in the space() expression like this:

Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

Instead of two subqueries, I tried to use a join but the performance using sub-queries was better so I submitted the subquery solution. Here’s the join version I am referring, it looks very nice and clean:

Select Space((H.Level-H2.Level)*4) + H.EmpName
  from Hierarchy H
  inner join Hierarchy H2 on CHARINDEX('.'+H2.EmpName+'.', H.FullyQualifiedName) > 0
  where H2.EmpName=@manager
  order by H.FullyQualifiedName
 

In the last, here is the complete solution:

--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
  SELECT 1, 'Jacob', NULL UNION ALL
  SELECT 2, 'Rui', NULL UNION ALL
  SELECT 3, 'Jacobson', NULL UNION ALL
  SELECT 4, 'Jess', 1 UNION ALL
  SELECT 5, 'Steve', 1 UNION ALL
  SELECT 6, 'Bob', 1 UNION ALL
  SELECT 7, 'Smith', 2 UNION ALL
  SELECT 8, 'Bobbey', 2 UNION ALL
  SELECT 9, 'Steffi', 3 UNION ALL
  SELECT 10, 'Bracha', 3 UNION ALL
  SELECT 11, 'John', 5 UNION ALL
  SELECT 12, 'Michael', 6 UNION ALL
  SELECT 13, 'Paul', 6 UNION ALL
  SELECT 14, 'Lana', 7 UNION ALL
  SELECT 15, 'Johnson', 7 UNION ALL
  SELECT 16, 'Mic', 8 UNION ALL
  SELECT 17, 'Stev', 8 UNION ALL
  SELECT 18, 'Paulson', 9 UNION ALL
  SELECT 19, 'Jessica', 10

--Solution starts here
DECLARE @manager VARCHAR(20)
SELECT @manager = 'Jacob'

--CTE
;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
  Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
    From @Employees E
    Where E.ReportsTo is null
  Union all
  Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
    from @Employees E
    inner join Hierarchy H on H.EmpID=E.ReportsTo
)

--Result
Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
  from Hierarchy H
  where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
  order by H.FullyQualifiedName

You can read Syed’s solution here at his blog.

Syed already pointed out two key points of this challenge:

  • The CTE itself and how to build the path, that Syed nicely named FullyQualifiedName, perhaps in reference to networks FQDN :-)
  • Do no forget to remove extra spaces in the beginning.

So, what’s to learn from this solution ?

For this TSQL Challenge, we required a CTE, resulting in very similar solutions. Challengers had to find a way to make their solutions shaped.
We all like the words readibility, performance, scalability, understandable. All solutions were readable and understandable, thanks to all challengers!
Scalability is out in this cas. So let’s optimize!

Optimization

Syed already explained that he was not happy with the JOIN for removing extra spaces in the beginning.

Here are some explanations of the evaluation process and a comparison of two methods.
As we did not either provide a DDL script to create the table, neither populating data, we did 3 evaluations:

  • First, with some test data using the variable table that was provided with this challenge.
  • Secondly, with an employee table populated with 1000 lines, a primary key and clustered index on EmpId, with and without an index on EmpName.
  • Lastly with an employee table populated with 3000 lines, a primary key and clustered index on EmpId, with and without an index on EmpName.

Here is a comparison of Syed’s solution, between JOIN and subqueries.

  CPU Reads Writes Duration
JOIN 21294 808381 11158 26749
SubQueries 17454 813791 11169 23871

More I/O, but faster.

 

Also, Syed did not mention it, but he chose to use CHARINDEX() instead of LIKE.
It seems that  CHARINDEX() is more efficient than LIKE in this case: separators are before and after the searched string.
LIKE uses the index only for S-ARGS, when the wild card is at the end (EmpName LIKE ‘SearchedString%’).

Choose carefully recursive CTEs anchor(s)

Challengers who are not comfortable with CTEs could be interested in CTEs related TSQL Challenges.

When using recursive CTEs, it crucial to get only the needed anchor(s), as Syed did (FROM @Employees WHERE ReportsTo IS NULL).
No need to do comparison here, perfomance issue IS HUGE!!

Do no trust data

As Matthieu told us. Do not rely on data. It’s acceptable to use empName column for path, but you should ensure that there will be no separators issue.

 

Syed, thanks again for sharing your experience and let us learn from you.

If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx

Thanks, see you soon for the next TSQL Challenge!

TSQL Challenge 8 – Solution By Matthieu Hodin

For this TSQL Challenge #8, Matthieu provides us a nice, readable and fast solution.


image

 

 

 

Matthieu Hodin is Project Manager at Wygwam, developper since he  got an HP48GX in hands (who remember?), and web developper since 2002.
He is interested in "data" and "interface".
Activity preferred at job : put MVP's at work and challenge them to find solutions (for customers and not only for these SQL challenges ;p) ) !

Matthieu already enjoyed racing TSQL Challenge 6 with Aurelien Verla .

Here is the solution of Matthieu with his explanations

DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
    SELECT 1, 'Jacob', NULL UNION ALL
    SELECT 2, 'Rui', NULL UNION ALL
    SELECT 3, 'Jacobson', NULL UNION ALL
    SELECT 4, 'Jess', 1 UNION ALL
    SELECT 5, 'Steve', 1 UNION ALL
    SELECT 6, 'Bob', 1 UNION ALL
    SELECT 7, 'Smith', 2 UNION ALL
    SELECT 8, 'Bobbey', 2 UNION ALL
    SELECT 9, 'Steffi', 3 UNION ALL
    SELECT 10, 'Bracha', 3 UNION ALL
    SELECT 11, 'John', 5 UNION ALL
    SELECT 12, 'Michael', 6 UNION ALL
    SELECT 13, 'Paul', 6 UNION ALL
    SELECT 14, 'Lana', 7 UNION ALL
    SELECT 15, 'Johnson', 7 UNION ALL
    SELECT 16, 'Mic', 8 UNION ALL
    SELECT 17, 'Stev', 8 UNION ALL
    SELECT 18, 'Paulson', 9 UNION ALL
    SELECT 19, 'Jessica', 10

DECLARE @manager VARCHAR(20)
SELECT @manager = 'Smith';

WITH CTE(Id,Name,p,parentsId,level)
AS(
	select *
		,cast('('+cast(empId as varchar)+')'as varchar)
		,0
	from @employees where ReportsTo is null
	UNION ALL
	select EmpId,EmpName,ReportsTo,cast(parentsId+'('+convert(varchar,empid)+')'as varchar),level+1
	from @employees
	inner join CTE on ReportsTo=Id
	where EmpId<>Id
)

select space((level-(select level from cte where name=@manager))*4)+name
from CTE 
inner join @employees on CharIndex('('+cast(EmpId as varchar)+')',ParentsId)>0
where EmpName=@manager
order by parentsId

 

Matthieu: When I saw this challenge for the first time, it seemed quite easy compared to other ones, but in fact, the challenge was for me the "local level" of the employee.

The first part of the query is a basic CTE. I’ve choose to use the EmpId in order to create the "EmpIdParentPath", as a developper "must not rely on data", it is not suitable (for me) to concat EmpName as the EmpName could contain your separator... and i've never see an int or int32 with brackets in it...

The second reason why i choose to concat EmpId is that the resulting string should be shorter ...and shorter might be faster (? may be...)
I used CAST() & CONVERT() to varchar and not to varchar(max) as in this challenge the parent path may not be longer than 8000 characters.

At this point I have a CTE with all employees, then i joined it on the employee table for each line where the EmpId is in the parentPath.
I choose to use a charindex ... maybe because in javascript i use str.indexOf('x') and it's quite similar !
In order to finish the first part of the query i've simply to put a WHERE with the parameter.

The last part is to re-create the level. As each line in the CTE has the local level info, i've only to substract the level info from the manager ( i'm not proud to reuse the CTE...), multiply it by 4, then SPACE this number.

I'm sure now that ms must have create this SPACE function only to solve this SQL Challenge as i rarely used it !

What can we learn from this ?

Obviously, some web developers are die hard fans of SQL Server. Maybe because they like lightning fast webpages.
It seems that coding with JavaScript helps to write TSQL code. In any case, it does not hurt, and it's quite good to be open minded :-)

Do not rely on data

exploits_of_a_mom

Matthieu used EmpId to avoid any possible separator related issue with EmpName.
What would have happened if Matthieu had chosen to use the EmpName, with “|” as separator, rather than the EmpId,  and if “Jacobson” was “Jacob|son” ?

Small is beautiful

Matthieu has assumed that the path length is a maximum of 8000 characters. Sounds reasonable.
Let’s do some experimentations with an employee table populated with 3000 lines, a primary key and clustered index on EmpId and an index on EmpName, using varchar and varchar(max).
The query returned 2983 lines.

  CPU Reads Writes Duration
VARCHAR 21013 146652   24585
VARCHAR(MAX) 28704 1229002 9424 33667

CHARINDEX()  or LIKE ?

Matthieu used CHARINDEX, not LIKE. It seems that  CHARINDEX() is more efficient than LIKE in this case: separators are before and after the searched string.
LIKE uses the index only for S-ARGS, when the wild card is at the end (EmpName LIKE ‘SearchedString%’).

 

Matthieu, thanks again for sharing your experience and for your great explanations. Congratulations for being part of winners!

If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx

Have fun!