Yesterday, we saw the solution for TSQL Challenge #8 by Matthieu Hodin. Another winner of TSQL Challenge 8 is Syed Mehroz Alam.
First of all, I would like to congratulate Syed. Yes Syed, as you said: it’s a pleasure to learn that the hard work has paid off!
Syed already won the TSQL Challenge 7. He is supporting our community at his blog.
But let’s present Syed.
Syed Mehroz Alam, living in Karachi, Pakistan, is primarily a developer focusing on Microsoft technologies. He has completed his bachelors as a Computer Systems Engineer in 2006 and is currently pursuing a Masters degree in Computer Science. Despite developing rich Internet applications, he loves to work with SQL Business Intelligence platform that enhances his TSQL expertise. He is fond of logical challenges and has won several speed programming competitions which are listed here.
Mehroz writes articles at CodeProject, and expresses his experiences with .NET and SQL server at his blog. When he has time, he contributes to MSDN and Silverlight forums. He loves to play football (Soccer) and computer games, especially first person shooters and RPGs.
How did Syed manage this TSQL Challenge?
This is the story of this challenge solving, told by Syed.
The big challenge was the condition that the CTE should not contain any filter for a specific manager.
Here is some code for populating test data.
--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
SELECT 1, 'Jacob', NULL UNION ALL
SELECT 2, 'Rui', NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess', 1 UNION ALL
SELECT 5, 'Steve', 1 UNION ALL
SELECT 6, 'Bob', 1 UNION ALL
SELECT 7, 'Smith', 2 UNION ALL
SELECT 8, 'Bobbey', 2 UNION ALL
SELECT 9, 'Steffi', 3 UNION ALL
SELECT 10, 'Bracha', 3 UNION ALL
SELECT 11, 'John', 5 UNION ALL
SELECT 12, 'Michael', 6 UNION ALL
SELECT 13, 'Paul', 6 UNION ALL
SELECT 14, 'Lana', 7 UNION ALL
SELECT 15, 'Johnson', 7 UNION ALL
SELECT 16, 'Mic', 8 UNION ALL
SELECT 17, 'Stev', 8 UNION ALL
SELECT 18, 'Paulson', 9 UNION ALL
SELECT 19, 'Jessica', 10
I started by creating a recursive CTE for the whole table like this:
;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
From @Employees E
Where E.ReportsTo is null
Union all
Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
from @Employees E
inner join Hierarchy H on H.EmpID=E.ReportsTo
)
Select Space(Level*4) + H.EmpName
from Hierarchy H
order by H.FullyQualifiedName
Notice that I am constructing a FullyQualifiedName value for each row. This value consists of full path from root to the current person. This approach helped me in filtering the records in the final select statement where I just needed to look for a '.' + ManagerName + '.' filter. Here is a select query that extracts all subordinates for a particular manager.
Select Space(Level*4) + H.EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
The only issue that remains is that when the query is run for a non-parent employee, we get extra spaces in the beginning. E.g. when we run the query for “Paul”, we get an output like:
I solved it by inserted a sub-query in the space() expression like this:
Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
Instead of two subqueries, I tried to use a join but the performance using sub-queries was better so I submitted the subquery solution. Here’s the join version I am referring, it looks very nice and clean:
Select Space((H.Level-H2.Level)*4) + H.EmpName
from Hierarchy H
inner join Hierarchy H2 on CHARINDEX('.'+H2.EmpName+'.', H.FullyQualifiedName) > 0
where H2.EmpName=@manager
order by H.FullyQualifiedName
In the last, here is the complete solution:
--Populate test data
DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
SELECT 1, 'Jacob', NULL UNION ALL
SELECT 2, 'Rui', NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess', 1 UNION ALL
SELECT 5, 'Steve', 1 UNION ALL
SELECT 6, 'Bob', 1 UNION ALL
SELECT 7, 'Smith', 2 UNION ALL
SELECT 8, 'Bobbey', 2 UNION ALL
SELECT 9, 'Steffi', 3 UNION ALL
SELECT 10, 'Bracha', 3 UNION ALL
SELECT 11, 'John', 5 UNION ALL
SELECT 12, 'Michael', 6 UNION ALL
SELECT 13, 'Paul', 6 UNION ALL
SELECT 14, 'Lana', 7 UNION ALL
SELECT 15, 'Johnson', 7 UNION ALL
SELECT 16, 'Mic', 8 UNION ALL
SELECT 17, 'Stev', 8 UNION ALL
SELECT 18, 'Paulson', 9 UNION ALL
SELECT 19, 'Jessica', 10
--Solution starts here
DECLARE @manager VARCHAR(20)
SELECT @manager = 'Jacob'
--CTE
;With Hierarchy(EmpName, EmpID, Level, FullyQualifiedName)
As
(
Select E.EmpName, E.EmpID, 0, Cast('.'+E.EmpName+'.' as Varchar(MAX))
From @Employees E
Where E.ReportsTo is null
Union all
Select E.EmpName, E.EmpID, H.Level+1, H.FullyQualifiedName+'.'+E.EmpName+'.'
from @Employees E
inner join Hierarchy H on H.EmpID=E.ReportsTo
)
--Result
Select Space((Level-(Select Top(1) Level from Hierarchy H2 Where H2.EmpName=@manager))*4) + EmpName
from Hierarchy H
where CHARINDEX('.'+(Select Top(1) E.EmpName from @Employees E Where E.EmpName=@manager)+'.', H.FullyQualifiedName) > 0
order by H.FullyQualifiedName
You can read Syed’s solution here at his blog.
Syed already pointed out two key points of this challenge:
- The CTE itself and how to build the path, that Syed nicely named FullyQualifiedName, perhaps in reference to networks FQDN :-)
- Do no forget to remove extra spaces in the beginning.
So, what’s to learn from this solution ?
For this TSQL Challenge, we required a CTE, resulting in very similar solutions. Challengers had to find a way to make their solutions shaped.
We all like the words readibility, performance, scalability, understandable. All solutions were readable and understandable, thanks to all challengers!
Scalability is out in this cas. So let’s optimize!
Optimization
Syed already explained that he was not happy with the JOIN for removing extra spaces in the beginning.
Here are some explanations of the evaluation process and a comparison of two methods.
As we did not either provide a DDL script to create the table, neither populating data, we did 3 evaluations:
- First, with some test data using the variable table that was provided with this challenge.
- Secondly, with an employee table populated with 1000 lines, a primary key and clustered index on EmpId, with and without an index on EmpName.
- Lastly with an employee table populated with 3000 lines, a primary key and clustered index on EmpId, with and without an index on EmpName.
Here is a comparison of Syed’s solution, between JOIN and subqueries.
| |
CPU |
Reads |
Writes |
Duration |
| JOIN |
21294 |
808381 |
11158 |
26749 |
| SubQueries |
17454 |
813791 |
11169 |
23871 |
More I/O, but faster.
Also, Syed did not mention it, but he chose to use CHARINDEX() instead of LIKE.
It seems that CHARINDEX() is more efficient than LIKE in this case: separators are before and after the searched string.
LIKE uses the index only for S-ARGS, when the wild card is at the end (EmpName LIKE ‘SearchedString%’).
Choose carefully recursive CTEs anchor(s)
Challengers who are not comfortable with CTEs could be interested in CTEs related TSQL Challenges.
When using recursive CTEs, it crucial to get only the needed anchor(s), as Syed did (FROM @Employees WHERE ReportsTo IS NULL).
No need to do comparison here, perfomance issue IS HUGE!!
Do no trust data
As Matthieu told us. Do not rely on data. It’s acceptable to use empName column for path, but you should ensure that there will be no separators issue.
Syed, thanks again for sharing your experience and let us learn from you.
If you have questions about this challenge the dedicated forum thread is still open to discuss: http://beyondrelational.com/groups/tsqlchallenge/forum/t/66.aspx
Thanks, see you soon for the next TSQL Challenge!