TSQL Challenges

TSQL Challenges intend to help you to test and enhance SET based querying skills using TSQL.





TSQL Challenge 8 – Solution by Leonid Koyfman

Yesterday, we saw the solution from Syed Mehroz Alam and discovered how Syed used subqueries. Another winner of TSQL Challenge 8 is Leonid Koyfman.
First of all, I would like to congratulate Leonid who is a faifthfull challenger and a many times winner.
Leonid not only writes regularly excellent solutions, but also sends to us some great new ideas for TSQL Challenges learning community.
Congratulations Leonid, for being an outstanding challenger and thanks for your support!


Leonid is a SQL Server Expert, he runs a blog to share technologies and techniques.

How did Leonid puzzle out this challenge ?

Here is the story, as narrated by Leonid:

Leonid: From the given data sample I made an assumption that there is no employees with the same name there. (It would be more appropriate to have an EmpID as a parameter.)

In the beginning I used recursive CTE to make a list of employees with their corresponding lists of parents and hierarchy level indicators.
In anchor part of CTE I started with managers

WHERE ReportsTo is null 

and in recursive part pulled the rest of employees.

To build a list of Employee parents I concatenated EmpName using '|' as a delimiter.

Using Empid for this purpose (as Mattthieu Hodin did in his solution) would be safer.
I optimistically assumed that there are no last names with "|" in them :-)

Adding delimiter in the beginning and at the end of the list makes furhter searches against it a bit simplier.

Otherwise, instead of

WHERE EmpNameList like'%|'+@manager+'|%'

we would need to use

WHERE '|'+EmpNameList+'|' like'%|'+@manager+'|%'.

I didn't want to use a subquery in each employee row to calculate @manager level: it's the same value for each row.

To calculate it just once and then apply to each row I used instead

   CROSS JOIN

            (     SELECT      [level]

                  FROM        org_chart

                  WHERE       EmpName =@manager

            ) manager

Calculating identation to format output I used

(oc.[level]-manager.[level])

to always display hierarchy left aligned, disregarding to the subordination level of the @manager in the organization chart.

And then I just repeated space character and concatenated it with EmpName in the output:

Hierarchy=REPLICATE(char(32),Ident)+EmpName

I prefer REPLICATE(char(32),Ident) over  SPACE(Ident) because REPLICATE function is more generic.

It can repeat not just space but any character(s).


Thanks Leonid for this great explanation. 
Here is Leonid’s complete solution:

DECLARE @Employees TABLE (
	EmpID INT, 
	EmpName VARCHAR(20), 
	ReportsTo INT
)
INSERT INTO @Employees(
	EmpID, 
	EmpName, 
	ReportsTo
)
   
SELECT 1, 'Jacob'	,	NULL UNION ALL    
SELECT 2, 'Rui'		,	NULL UNION ALL    
SELECT 3, 'Jacobson',	NULL UNION ALL    
SELECT 4, 'Jess'	,	1	 UNION ALL    
SELECT 5, 'Steve'	,	1	 UNION ALL    
SELECT 6, 'Bob'		,	1	 UNION ALL    
SELECT 7, 'Smith'	,	2	 UNION ALL    
SELECT 8, 'Bobbey'	,	2	 UNION ALL    
SELECT 9, 'Steffi'	,	3	 UNION ALL    
SELECT 10,'Bracha'	,	3	 UNION ALL    
SELECT 11,'John'	,	5	 UNION ALL    
SELECT 12,'Michael'	,	6	 UNION ALL    
SELECT 13,'Paul'	,	6	 UNION ALL    
SELECT 14,'Lana'	,	7	 UNION ALL    
SELECT 15,'Johnson'	,	7	 UNION ALL    
SELECT 16,'Mic'		,	8	 UNION ALL    
SELECT 17,'Stev'	,	8	 UNION ALL    
SELECT 18,'Paulson'	,	9	 UNION ALL    
SELECT 19,'Jessica'	,	10

DECLARE @manager varchar(20)
SET @manager='Jacob'

;WITH org_chart 
AS ( 
	SELECT 
		Empid,
		EmpName,
		EmpNameList=cast('|'+EmpName+'|' as varchar(max)) ,
		ReportsTo ,
		[level]=0 
	FROM @Employees 
	WHERE ReportsTo is null
UNION ALL 
	SELECT  
		e.Empid,
		e.EmpName,
		EmpNameList=CAST(EmpNameList+e.EmpName+'|'as varchar(max)) ,
		e.ReportsTo , 
		[level]=[level]+1 
	FROM @Employees e join org_chart oc on e.ReportsTo=oc.Empid
)
SELECT 
	Hierarchy=REPLICATE(char(32),Ident)+EmpName
FROM(
	SELECT 
		Ident=(oc.[level]-manager.[level])*4,
		EmpName,
		EmpNameList
	FROM 
		org_chart oc
		CROSS JOIN 
		(	SELECT	[level] 
			FROM	org_chart 
			WHERE	EmpName =@manager
		)manager
	WHERE EmpNameList like'%|'+@manager+'|%'
)x
ORDER BY EmpNameList


This is the last post of the “winners of TSQL Challenge 8” serie.
So, let’s try to piece together what we have learn so far, and see if there is more to find out from Leonid’s solution.

What can we learn from this TSQL Challenge ?

We learned that Common Table Expressions, or CTEs, introduced in SQL Server 2005,  are great for solving many kind of problems.
CTEs are readable and understandable.
CTEs, and recursive CTE's are great too for SET based querying instead of RBAR (“Row By Agonizing Row”).
It fits well with TSQL Challenges goal: help you to test and enhance SET based querying skills using TSQL.

We’ve already pointed out some possible issues: 

And discussed about using CHARINDEX() or LIKE (see Matthieu Hodin’s solution)

Leonid is talking about:

JOIN to calculate manager’s level.

  • Matthieu joins the CTE and the employee table: simple and direct way. That would be the one i’ll use in production.
  • Syed preferred to use subqueries over the employee table: nice trick for small resultsets :)
  • Leonid did a self join over the CTE: elegant, as usual :-)
    Note thant using an employee table of 10000 rows, a resultset of 9983 lines,
    using INNER JOIN instead of CROSS JOIN was twice faster on my machine!

REPLICATE() or SPACE() ?

As Leonid said, REPLICATE() is more generic than SPACE(). It’s seems logical that a simpler fonction could outperforms a more complex one.
Note that SPACE() cannot handle more than 8000 characters. One point to Leonid for scalability!  

So, let’s do some trial learning.

For this solution, there is no performance difference between SPACE() and REPLICATE() using an employee table populated with 3000 lines, a primary key and clustered index on EmpId.

Using a table t_1 populated with 1M rows containing just an ID int column and a clustered index.

SELECT REPLICATE(CHAR(32),1000) FROM t_1m 
SELECT SPACE(1000) FROM t_1m

Average of two measures CPU Reads Writes Duration
REPLICATE() 7400 5938   101965
SPACE() 6800 5938   90472

Due to measures imprecision, it seems similar.

 

Now, trying to do some dynamic allocations.

SELECT REPLICATE(CHAR(32),id % 1000) FROM t_1m 
SELECT SPACE(id % 1000) FROM t_1m

Average of two measures CPU Reads Writes Duration
REPLICATE() 11709 5938   72220
SPACE() 4877 5938   72311

Durations are the same, but SPACE() CPU is near from 40% of REPLICATE() one.
No conclusion to draw because of measures imprecision and because the test is perhaps not reliable.
Does somebody have more information about this ?

Leonid, thanks again for sharing your experience and let us learn from you.
I hope that things learned here will be useful for solving real world problems.

If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx

Thank you, see you soon for the next TSQL Challenge!