TSQL Challenge 8 – Solution by Leonid Koyfman
Yesterday, we saw the solution from Syed Mehroz Alam and discovered how Syed used subqueries. Another winner of TSQL Challenge 8 is Leonid Koyfman.
First of all, I would like to congratulate Leonid who is a faifthfull challenger and a many times winner.
Leonid not only writes regularly excellent solutions, but also sends to us some great new ideas for TSQL Challenges learning community.
Congratulations Leonid, for being an outstanding challenger and thanks for your support!
Leonid is a SQL Server Expert, he runs a blog to share technologies and techniques.
How did Leonid puzzle out this challenge ?
Here is the story, as narrated by Leonid:
Leonid: From the given data sample I made an assumption that there is no employees with the same name there. (It would be more appropriate to have an EmpID as a parameter.)
In the beginning I used recursive CTE to make a list of employees with their corresponding lists of parents and hierarchy level indicators.
In anchor part of CTE I started with managers
WHERE ReportsTo is null
and in recursive part pulled the rest of employees.
To build a list of Employee parents I concatenated EmpName using '|' as a delimiter.
Using Empid for this purpose (as Mattthieu Hodin did in his solution) would be safer.
I optimistically assumed that there are no last names with "|" in them :-)
Adding delimiter in the beginning and at the end of the list makes furhter searches against it a bit simplier.
Otherwise, instead of
WHERE EmpNameList like'%|'+@manager+'|%'
we would need to use
WHERE '|'+EmpNameList+'|' like'%|'+@manager+'|%'.
I didn't want to use a subquery in each employee row to calculate @manager level: it's the same value for each row.
To calculate it just once and then apply to each row I used instead
CROSS JOIN
( SELECT [level]
FROM org_chart
WHERE EmpName =@manager
) manager
Calculating identation to format output I used
(oc.[level]-manager.[level])
to always display hierarchy left aligned, disregarding to the subordination level of the @manager in the organization chart.
And then I just repeated space character and concatenated it with EmpName in the output:
Hierarchy=REPLICATE(char(32),Ident)+EmpName
I prefer REPLICATE(char(32),Ident) over SPACE(Ident) because REPLICATE function is more generic.
It can repeat not just space but any character(s).
Thanks Leonid for this great explanation.
Here is Leonid’s complete solution:
DECLARE @Employees TABLE (
EmpID INT,
EmpName VARCHAR(20),
ReportsTo INT
)
INSERT INTO @Employees(
EmpID,
EmpName,
ReportsTo
)
SELECT 1, 'Jacob' , NULL UNION ALL
SELECT 2, 'Rui' , NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess' , 1 UNION ALL
SELECT 5, 'Steve' , 1 UNION ALL
SELECT 6, 'Bob' , 1 UNION ALL
SELECT 7, 'Smith' , 2 UNION ALL
SELECT 8, 'Bobbey' , 2 UNION ALL
SELECT 9, 'Steffi' , 3 UNION ALL
SELECT 10,'Bracha' , 3 UNION ALL
SELECT 11,'John' , 5 UNION ALL
SELECT 12,'Michael' , 6 UNION ALL
SELECT 13,'Paul' , 6 UNION ALL
SELECT 14,'Lana' , 7 UNION ALL
SELECT 15,'Johnson' , 7 UNION ALL
SELECT 16,'Mic' , 8 UNION ALL
SELECT 17,'Stev' , 8 UNION ALL
SELECT 18,'Paulson' , 9 UNION ALL
SELECT 19,'Jessica' , 10
DECLARE @manager varchar(20)
SET @manager='Jacob'
;WITH org_chart
AS (
SELECT
Empid,
EmpName,
EmpNameList=cast('|'+EmpName+'|' as varchar(max)) ,
ReportsTo ,
[level]=0
FROM @Employees
WHERE ReportsTo is null
UNION ALL
SELECT
e.Empid,
e.EmpName,
EmpNameList=CAST(EmpNameList+e.EmpName+'|'as varchar(max)) ,
e.ReportsTo ,
[level]=[level]+1
FROM @Employees e join org_chart oc on e.ReportsTo=oc.Empid
)
SELECT
Hierarchy=REPLICATE(char(32),Ident)+EmpName
FROM(
SELECT
Ident=(oc.[level]-manager.[level])*4,
EmpName,
EmpNameList
FROM
org_chart oc
CROSS JOIN
( SELECT [level]
FROM org_chart
WHERE EmpName =@manager
)manager
WHERE EmpNameList like'%|'+@manager+'|%'
)x
ORDER BY EmpNameList
This is the last post of the “winners of TSQL Challenge 8” serie.
So, let’s try to piece together what we have learn so far, and see if there is more to find out from Leonid’s solution.
What can we learn from this TSQL Challenge ?
We learned that Common Table Expressions, or CTEs, introduced in SQL Server 2005, are great for solving many kind of problems.
CTEs are readable and understandable.
CTEs, and recursive CTE's are great too for SET based querying instead of RBAR (“Row By Agonizing Row”).
It fits well with TSQL Challenges goal: help you to test and enhance SET based querying skills using TSQL.
We’ve already pointed out some possible issues:
And discussed about using CHARINDEX() or LIKE (see Matthieu Hodin’s solution)
Leonid is talking about:
JOIN to calculate manager’s level.
- Matthieu joins the CTE and the employee table: simple and direct way. That would be the one i’ll use in production.
- Syed preferred to use subqueries over the employee table: nice trick for small resultsets :)
- Leonid did a self join over the CTE: elegant, as usual :-)
Note thant using an employee table of 10000 rows, a resultset of 9983 lines,
using INNER JOIN instead of CROSS JOIN was twice faster on my machine!
REPLICATE() or SPACE() ?
As Leonid said, REPLICATE() is more generic than SPACE(). It’s seems logical that a simpler fonction could outperforms a more complex one.
Note that SPACE() cannot handle more than 8000 characters. One point to Leonid for scalability!
So, let’s do some trial learning.
For this solution, there is no performance difference between SPACE() and REPLICATE() using an employee table populated with 3000 lines, a primary key and clustered index on EmpId.
Using a table t_1 populated with 1M rows containing just an ID int column and a clustered index.
SELECT REPLICATE(CHAR(32),1000) FROM t_1m
SELECT SPACE(1000) FROM t_1m
| Average of two measures |
CPU |
Reads |
Writes |
Duration |
| REPLICATE() |
7400 |
5938 |
|
101965 |
| SPACE() |
6800 |
5938 |
|
90472 |
Due to measures imprecision, it seems similar.
Now, trying to do some dynamic allocations.
SELECT REPLICATE(CHAR(32),id % 1000) FROM t_1m
SELECT SPACE(id % 1000) FROM t_1m
| Average of two measures |
CPU |
Reads |
Writes |
Duration |
| REPLICATE() |
11709 |
5938 |
|
72220 |
| SPACE() |
4877 |
5938 |
|
72311 |
Durations are the same, but SPACE() CPU is near from 40% of REPLICATE() one.
No conclusion to draw because of measures imprecision and because the test is perhaps not reliable.
Does somebody have more information about this ?
Leonid, thanks again for sharing your experience and let us learn from you.
I hope that things learned here will be useful for solving real world problems.
If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx
Thank you, see you soon for the next TSQL Challenge!