beyondrelational.com



May 2009 - Posts

TSQL Challenge 6 Winners

Again there were a lot of participants for this challenge, and a lot of different solutions.

Regarding all these solutions this is the short list of the winners we choose for this challenge:

  • Arnaud Lhopiteau

    Arnaud Lhopiteau is a MCSD .NET & Senior developer at Alltitude.com, his father introduced him to database systems with R:Base in late 80’s .
    He loves to take care of his family, read Joe Celko’s book, smoke pipe and drink tea. Currently he uses SQL Server 2008 in his day to day work.
    Arnaud's solution is a nice GROUP BY WITH ROLLUP one.

 

  • Aurelien already closed the initial winners of the TSQL Challenge 5. He's solution for this challenge is a nice GROUP BY WITH ROLLUP one too.  

    Aurelien Verla

    Aurelien Verla is Online Strategies Team Manager at Wygwam. Web Developer since 10 years, his first approach with databases systems was Access !

    MVP around ASP.NET Platform, he also spend a lot of time around SQL Server for one reason: If the UI is slow, first look at the database ! You can find it's english blog here and the french one here.

  • Leonid Koyfman
     
    Again for this challenge, Leonid Koyfman provides us an elegant solution. In fact, Leonid sent two solutions: one using GROUP BY WITH ROLLUP like Aurelien's and Arnaud's one, and the other using one UNION for each level.
    There will be for sure some discussions about performance issues ! 
    Leonid is a ‘many times winner’ of the challenge and SQL Server Expert.

 

  • Cleber Augusto Martins

    Cleber is a new challenger. He provides us a nice and readable solution using UNION and CROSS JOIN.
    Cleber Martins works for ACI Worldwide as a Sr. Business Solutions Analyst for the Risk products. Cleber is  based in Brazil and has 8 years of experience supporting financial institutions to prevent fraud and money laundering activities with neural models, rules based technology and consulting services.
    Cleber's experience with SQL Server is very successful in transactional warehouses handling 1/2Billion transactions a month on terabyte databases where peaks can go over 4MM transactions per hour.

    Cleber said about TSQL Challenges:
    "I've been on engagements where customers want to extract very complex and detailed information from our product's databases, so once I'm not a SQL professional then I'm always researching the internet for clues. Now that I found beyondrelational.com and the challenges then I'm able to grow my knowledge on weekly basis and do my job better everyday."

 

We will publish in the few next days explanations for each solution.

The key point of this challenge was to show how it is possible to produce a full report inside the database without any other reporting facilities. I am not saying that Reporting Services or any other reporting solution is not good, I just say that most of the times this reporting computation should be done simply directly from the database in Tsql. What are the benefits of this solution? In a few words we can say:

  • all is done in the database, you should hope better performance ;-)
  • decrease unneeded network traffic
  • just get what you need

The idea is that you should have better performance if you avoid retrieving a lot of data to analyze in your application (that’s quite obvious isn’t it?). But all is not ideal, because with that you will have other troubles:

  • you will introduce domain logic in your database (and you should probably don’t want doing this if you work with an ORM)
  • You will have to write all in Tsql (as you do it by the hand, there is no drag and drop reporting facilities ;-).
  • It should be harder to write, understand and maintain (one big query is always harder than a lot of smaller queries).

All of that just to says that the best choice for doing reporting depends on your needs ;-). I just want with this challenge to present an old-fashioned way of doing reporting that is still a good option today!

The problem in this challenge is that we have to produce a complete report table with rows of data (the original rows of data) but also introduce some consolidation rows.

There are different ways to handle the problem in pure TSql. The options are:

  • Iterate through a CTE on each consolidation level
  • Suppose that you have a predefined number of levels and create one query for each consolidation level that you join with unions
  • Use the old ROLLUP functionality of TSQL (which may be deprecated in future versions but still useful right now)

I think we have already talked a lot about CTE in this challenges, and that’s why I present a solution based on the ROLLUP grouping.

So, for newcomers what is that? The grouping with ROLLUP function is a TSql option of the GROUP BY that will add automatically a consolidation row for each group in a hierarchical way. Let’s see a short example:

DECLARE @T TABLE (ID INT,Label CHAR(4),Tag CHAR(4))
INSERT INTO @T (ID,Label,Tag) 
SELECT 1,'A','SQL' UNION SELECT 2,'B','XML' 
UNION SELECT 3,'C','SQL' UNION SELECT 4,'B','SQL' 
UNION SELECT 5,'A','XML' UNION SELECT 8,'C','SQL' 
UNION SELECT 7,'B','SQL' UNION SELECT 6,'A','XML'
 
SELECT    Tag,Label,COUNT(ID) AS ItemsCount,
        GROUPING(Tag)+GROUPING(Label) As Level
FROM @T GROUP BY Tag,Label WITH ROLLUP

Which will produce the following table:

Tag    Label    ItemsCount   Level
SQL     A       1            0
SQL     B       2            0
SQL     C       2            0
SQL     NULL    5            1
XML     A       2            0
XML     B       1            0
XML     NULL    3            1
NULL    NULL    8            2

We can see that for each consolidation level of the hierarchy a row is added, a null is set for group by fields that are under the current level and that the aggregates fields are automatically computed for this level.

One other interesting feature is the GROUPING function.  The GROUPING function returns 0 if the column value came from the fact data, and 1 if the column value is a NULL generated by the aggregation. Adding a grouping for each group by column we can create the hierarchical levels.

We can tune our original example query to have better label information using this method:

SELECT  Tag = CASE WHEN GROUPING(Tag) = 1 THEN 'ALL' ELSE Tag END,
        Label= 
        CASE GROUPING(Tag)+GROUPING(Label)
            WHEN 1 THEN 'Total ' + Tag
            WHEN 2 THEN 'Total ALL TAGS'
            ELSE Label END,
        COUNT(ID) AS ItemsCount,
        GROUPING(Tag)+GROUPING(Label) As Level
FROM @T GROUP BY Tag,Label WITH ROLLUP

 

Which produce:

Tag  Label          ItemsCount  Level
---- -------------- ----------- -----
SQL  A              1           0
SQL  B              2           0
SQL  C              2           0
SQL  Total SQL      5           1
XML  A              2           0
XML  B              1           0
XML  Total XML      3           1
ALL  Total ALL TAGS 8           2

Coming back  to the challenge, there is one solution using this feature. It is not the shortest or the best, but is relatively readable and (I hope) maintainable:

SELECT [PERIOD_ID]= 
        CASE [LEVEL]
        WHEN 0 THEN CAST([YEAR] AS NVARCHAR)+ CAST([WEEK] AS NVARCHAR) 
             + CAST([WEEKDAY]    AS NVARCHAR)        
        WHEN 1 THEN CAST([YEAR] AS NVARCHAR)+ CAST([WEEK] AS NVARCHAR) + 'T'
        WHEN 2 THEN CAST([YEAR] AS NVARCHAR)+ 'T'
        END    
        ,[LEVEL]
        ,[PERIOD] =
            CASE [LEVEL]
                WHEN 0 THEN '   ' + [WEEKDAYNAME] + ' ' + [MONTHDAY]                
                WHEN 1 THEN 'TOTAL WEEK ' + CAST([WEEK] AS NVARCHAR)
                WHEN 2 THEN 'TOTAL YEAR ' + CAST([YEAR] AS NVARCHAR)
            END
        ,[HOME] = SUM(CASE WHEN Page = 'Home' THEN TOTAL ELSE 0 END)
        ,[CONTACT] = SUM(CASE WHEN Page = 'Contact' THEN TOTAL ELSE 0 END)
        ,[PRODUCTS] = SUM(CASE WHEN Page = 'Products' THEN TOTAL ELSE 0 END)
FROM (
SELECT    Page
        ,[YEAR] = DATEPART(YEAR,VisitDate)
        ,[WEEK] = DATEPART(WEEK,VisitDate)
        ,[WEEKDAY] = DATEPART(WEEKDAY,VisitDate)
        ,[WEEKDAYNAME] = LEFT(UPPER(DATENAME(WEEKDAY,VisitDate)),3)
        ,[MONTHDAY] = CAST(DATEPART(DAY,VisitDate) AS NVARCHAR) + '/' 
                    + CAST(DATEPART(MONTH,VisitDate) AS NVARCHAR)
        ,[TOTAL] = SUM(NbVisitors)
        ,[LEVEL] = GROUPING(DATEPART(WEEKDAY,VisitDate)) 
                 + GROUPING(DATEPART(YEAR,VisitDate)) 
                 + GROUPING(DATEPART(WEEK,VisitDate)) 
                 + GROUPING(CAST(DATEPART(DAY,VisitDate) AS NVARCHAR) 
                 + '/' + CAST(DATEPART(MONTH,VisitDate) AS NVARCHAR))
FROM @T
GROUP BY Page
        ,DATEPART(YEAR,VisitDate)
        ,DATEPART(WEEK,VisitDate)
        ,DATEPART(WEEKDAY,VisitDate)
        ,DATENAME(WEEKDAY,VisitDate)
        ,CAST(DATEPART(DAY,VisitDate) AS NVARCHAR) + '/' 
         + CAST(DATEPART(MONTH,VisitDate) AS NVARCHAR)
WITH ROLLUP
) T
WHERE 1=1
    AND [LEVEL] < 3
GROUP BY 
        CASE [LEVEL] --> by Period Id
        WHEN 0 THEN CAST([YEAR] AS NVARCHAR)+ CAST([WEEK] AS NVARCHAR) 
             + CAST([WEEKDAY]    AS NVARCHAR)        
        WHEN 1 THEN CAST([YEAR] AS NVARCHAR)+ CAST([WEEK] AS NVARCHAR) + 'T'
        WHEN 2 THEN CAST([YEAR] AS NVARCHAR)+ 'T'
        END,     
        CASE [LEVEL] --> by period
        WHEN 0 THEN '   ' + [WEEKDAYNAME] + ' ' + [MONTHDAY]                
        WHEN 1 THEN 'TOTAL WEEK ' + CAST([WEEK] AS NVARCHAR)
        WHEN 2 THEN 'TOTAL YEAR ' + CAST([YEAR] AS NVARCHAR)
        END
        ,[LEVEL]
ORDER BY [PERIOD_ID]

If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/groups/tsqlchallenge/forum/t/125.aspx

If you have generic questions or comments about the TSQL Challenge 6 you can go to this forum: http://beyondrelational.com/forums/t/37.aspx

Stay tunned and don't forget to check for TSQL Challenge 9


TSQL Challenge 5 - Solution by Aurelien Verla

Aurelien is a new challenger that closes the initial winners of the TSQL Challenge 5.

Aurelien Verla

Aurelien Verla is Online Strategies Team Manager at Wygwam. Web Developer since 10 years, his first approach with databases systems was Access !

MVP around ASP.NET Platform, he also spend a lot of time around SQL Server for one reason: If the UI is slow, first look at the database ! You can find it's english blog here and the french one here.

Here is his solution:

;WITH [FILTER_IDX] ([id], [data], [idx], [start], [stop]) AS (
      SELECT [id], [data], 1, 1, CHARINDEX(',', [data])
      FROM @filter
      UNION ALL
      SELECT [id], [data], [idx] + 1, [stop] + 1, 
            CHARINDEX(',', [data], [stop] + 1)
      FROM [FILTER_IDX]
      WHERE [stop] > 0
), [BLOG_IDX] ([id], [tags], [idx], [start], [stop]) AS (
      SELECT [id], [tags], 1, 1, 
            CHARINDEX(',', [tags])
      FROM @blog
      UNION ALL
      SELECT [id], [tags], [idx] + 1, [stop] + 1, 
            CHARINDEX(',', [tags], [stop] + 1)
      FROM [BLOG_IDX]
      WHERE [stop] > 0
), [FILTER_SPLIT] ([id], [data], [value]) AS (
    SELECT [id], [data], SUBSTRING([data], [start], 
        CASE WHEN [stop] > 0 
            THEN [stop] - [start] 
            ELSE LEN([data]) END)
    FROM [FILTER_IDX]
), [BLOG_SPLIT] ([id], [tags], [value]) AS (
    SELECT [id], [tags], SUBSTRING([tags], [start], 
        CASE WHEN [stop] > 0 
        THEN [stop] - [start] 
        ELSE LEN([tags]) END)
    FROM [BLOG_IDX]
)
SELECT T1.[id], T1.[data], T3.[name], T2.[tags], COUNT(*) AS 'RELEVANCE'
FROM [FILTER_SPLIT] AS T1
LEFT JOIN [BLOG_SPLIT] AS T2 ON T1.[value] = T2.[value]
INNER JOIN @blog AS T3 ON T3.[id] = T2.[id]
GROUP BY T1.[id], T1.[data], T3.[name], T2.[tags]
ORDER BY T1.[id], [RELEVANCE] DESC

Here is his comments about the solution:

  • My first goal was to make a clear and “read-able” query. That’s why I used CTE to split the query into logical parts: separator positions, split and then the join for relevance.
  • Probably not the best performance here, but a nice compromise between being sure that  someone else could understand it and performance.
In fact that this is also a key point that I think is really important in Sql, the readability! Most of the times, the Tsql code do its job, but if one wants to modify it after some time, it always needs to re-write it.
 
If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/forums/t/69.aspx

If you have generic questions or comments about the TSQL Challenge 5 you can go to this forum: http://beyondrelational.com/forums/t/18.aspx

Stay tunned and don't forget to check the current TSQL Challenge #8.

Congratulations Aurelien for this solutions and thanks for sharing it with us!

TSQL Challenge 8

Welcome to TSQL Challenge 8!

This challenge is more about processing HIERARCHIES. SQL Server 2008 introduced a new data type HIERARCHYID which is pretty good for processing hierarchies. However, the problem explained in this challenge should be fixed without using the HIERARCHYID data type.

The Problem

The problem is all about identifying all the employees directly or indirectly reporting to a given manager. Here is the organization chart of my fictitious company.

/*
Jacob
Jess
Steve
John
Bob
Michael
Paul
Rui
Smith
Lana
Johnson
Bobbey
Mic
Stev
Jacobson
Steffi
Paulson
Bracha
Jessica
*/

The Challenge

The challenge is to write a query that can take a Manager Name as a parameter and list all the employees reporting to that manager, directly or indirectly. If the parameter contains “Smith” the query should return:

Smith
Lana
Johnson

If the parameter passed is ‘Jacob’, the query should return:

Jacob
Jess
Steve
John
Bob
Michael
Paul

Your query should ideally look like the following:

DECLARE @manager VARCHAR(20)
SELECT @manager = 'Smith'
-- Your query here:

Adding Complexity
This query is pretty easy to write using a recursive CTE. To make the query complex, so that it will meet the complexity level expected for the ‘TSQL Challenges’ series, I am adding the following restriction.
 
“The query should be written using a recursive CTE, but the filter for "@manager” should not be applied inside the CTE”

Sample Data

Use the following sample data for writing your queries:

DECLARE @Employees TABLE (EmpID INT, EmpName VARCHAR(20), ReportsTo INT)
INSERT INTO @Employees(EmpID, EmpName, ReportsTo)
SELECT 1, 'Jacob', NULL UNION ALL
SELECT 2, 'Rui', NULL UNION ALL
SELECT 3, 'Jacobson', NULL UNION ALL
SELECT 4, 'Jess', 1 UNION ALL
SELECT 5, 'Steve', 1 UNION ALL
SELECT 6, 'Bob', 1 UNION ALL
SELECT 7, 'Smith', 2 UNION ALL
SELECT 8, 'Bobbey', 2 UNION ALL
SELECT 9, 'Steffi', 3 UNION ALL
SELECT 10, 'Bracha', 3 UNION ALL
SELECT 11, 'John', 5 UNION ALL
SELECT 12, 'Michael', 6 UNION ALL
SELECT 13, 'Paul', 6 UNION ALL
SELECT 14, 'Lana', 7 UNION ALL
SELECT 15, 'Johnson', 7 UNION ALL
SELECT 16, 'Mic', 8 UNION ALL
SELECT 17, 'Stev', 8 UNION ALL
SELECT 18, 'Paulson', 9 UNION ALL
SELECT 19, 'Jessica', 10

Notes:

  1. The query should run on SQL Server 2005 and 2008.
  2. Write a single query that returns the results. No temp tables, functions etc should be used.
  3. Send your entries to tc@beyondrelational.com with subject ‘TSQL Challenge 8’ and send your code as an attachment (.sql file)
  4. Use 4 spaces for indentation while generating the output hierarchy.
  5. Siblings within a parent node may be sorted either by Name or by ID (I have sorted them by ID in the sample given above)
  6. Last date to submit your entries: 1 June 2009
  7. Visit this forum for general questions on TSQL Challenges.
  8. Visit this forum for any question on TSQL Challenge 8.

TSQL Challenge 7

The Context

You have done some Sql Server development tasks for a small company that has one Sql Server with several bases. Before you leave, as they will not have any dba to monitor databases, they ask you to write a simple script to help them monitor space used by their data.

The Challenge

You have to write the shortest script to list the 5 biggest tables on the server. It should include all databases that are not system databases. The resulting table will include only the following fields: BASE as database name, TABLE as table name, SIZE as table size used only by data (no index or unused reserved space) in KB and then ROWS as the number of rows in the table (in order to correlate if the space used is big by data or because of the number of rows).

This is an example resulting table:

BASE     TABLE   SIZE        ROWS      
----------------------------------
base1    T1      162720 KB   14233       
base2    T2      38000 KB    8462       
base1    T3      10720 KB    5343        
base1    T4      9040 KB     5006       
base3    T5      2800 KB     1899  

The script should work on any SQL Server “as is”.

Notes:

  • The target of the challenge is to write the shortest code. We will only count the chars for this challenge, so don’t try to make it readable. Keep just the essential code.
  • You should write it as you want with any of the capabilities provided by a standard installation of SQL Server.
  • The script should target any version of SQL Server and it is not limited to only one query.
  • The script should not create any persistent objects (you should use temp tables or create proc/udf if you remove them after but not leave them on the server after your script runs).
  • The result has to be a table as described before (this means not a print) and should be reusable for future actions (we can plan for example to send the result table by mail or stored it on an other data table every 5 min with a job…)
  • Send your entries to tc@beyondrelational.com
  • Do not paste the code in the email body. Include it as an attachment: if possible a file named firstname_lastname_tsqlchallenge7.sql
  • Mention ‘TSQL Challenge 7’ in the email subject.
  • Last Date: 25 May 2009.
  • There is a forum linked to 'TSQL Challenges' for general questions and discussions on the 'TSQL Challenges' series.
  • For any questions or comments specific to TSQL Challenge 7, visit the dedicated thread

TSQL Challenges and Social Networking

To be able to connect better with the admirers of TSQL Challenges, we have been working on creating our workspace on a number of social networking websites/applications. Many thanks to my friend Rui, who is very passionate about TSQL Challenges and is taking lead on this front.

You can connect with ‘TSQL Challenges’ on any or all of the following locations.

twitter

linkedin

facebook

In addition to this, there is a dedicated group within www.beyondrelational.com for TSQL Challenges.

Happy Social Networking!


TSQL Challenge 5 - Solution by Friedrich Paul

Friedrich Paul is another many-times-winner of the TSQL Challenges.

He uses like Leonid only one CTE with the same split parsing recursively mechanism . This solution is very goods in terms of code optimization (even if it is too much compressed for a good readability in my point of view ;-)).

Here is his solution:

;with extract(id,name,tags,old_pos,new_pos)
as (
select id,name,
    substring(tags,0,
    case when charindex(',',tags,0) = 0 
    then len(tags) else charindex(',',tags,0) end),
    0 as old_pos,charindex(',',tags,0) + 1 as new_pos 
from @blog --Anchor member function
Union all
select b.id ,b.name,
    substring(b.tags,new_pos,
    case when charindex(',', b.tags,new_pos) = 0 
    then len(b.tags) 
    else charindex(',',b.tags,new_pos) - new_pos end),
    charindex(',',b.tags,new_pos) as old_pos,
    charindex(',',b.tags,new_pos) + 1 as new_pos
from @blog b 
inner join extract e on b.id = e.id 
where new_pos ! = 1 --Recursive member function
)
select f.id,f.data,e.name,b.tags,
    count(*) as relevance 
from @filter f,extract e,@blog b
where b.id = e.id and f.data like '%' + e.tags + '%'
group by f.id,f.data,e.name,b.tags
order by id,relevance desc

 

You can find all the explanations of this solution on the Friedrich's blog post about that: http://datawarehousejourney.blogspot.com/2009/05/solution-for-tsql-challenge-5.html

If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/forums/t/56.aspx

If you have generic questions or comments about the TSQL Challenge 5 you can go to this forum: http://beyondrelational.com/forums/t/18.aspx

Stay tunned and don't forget to check the TSQL Challenge #6.

Congratulations Freddy for this solutions and thanks for sharing it with us!


TSQL Challenge 5 - Solution by Divya Agrawal

Divya is a new challenger that sends us a very interesting solution. She show us that it can be done without CTE in one very short and simple query. We really enjoyed her solution not only because it is a completely different solution but also because of the global ergonomics of the query

She is working as a SQL Server developer  and had written many articles on MsSqlTips and SqlServerCentral.

Here is her solution:

select id,data,name,tags,COUNT(relevance) as Relevance
from
(
    select F.id,
           F.data,
           B.name,
           B.tags,
           filterdata,
           case when CHARINDEX(filterdata,B.tags) >0 
           then 1 else 0 end as relevance
    from (
        select F1.id,
               F1.data,
               O.filterdata              
        from
            (
                select *,
                    cast('<X>'+
                        replace(F.data,',','</X><X>')+
                        '</X>' as XML) 
                        as xmlfilter
                from @filter F
            )F1
            cross apply
            ( 
                select fdata.D.value('.','varchar(50)') as filterdata 
                from f1.xmlfilter.nodes('X') as fdata(D)) O
            ) F
            ,@blog B
)F1
where relevance =1
 group by F1.id,F1.data,F1.name,F1.tags
order by F1.id,Relevance desc

The key point of this solution is to transform the coma separated values to an xml values series then to get the values by a cross apply. Simple but efficient!.

You can find all the explanations of this solution on the Divya's blog post about that: http://divya-sqlserver.blogspot.com/2009/05/solution-of-t-sql-challenge-5.html.

Tejas also wrote a post about reading coma separated values with xml that should help you understanding this hack: http://beyondrelational.com/blogs/tejas/archive/2009/05/04/sql-server-read-values-from-comma-separated-variable.aspx.

If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/forums/t/55.aspx

If you have generic questions or comments about the TSQL Challenge 5 you can go to this forum: http://beyondrelational.com/forums/t/18.aspx

Stay tunned and don't forget to check the TSQL Challenge #6.

Congratulations Divya for this solutions and thanks for sharing it with us!

TSQL Challenge 5 - Solution by Leonid Koyfman

There were a lot of different solutions for this challenge but globaly the key point to release the split in line in an unique query was to use CTEs. As most people send us valuable queries with CTEs,  we took in consideration also other notation points as the length, the ergonomy of the query and brevity/simplicity.

 

Again for this challenge, Leonid Koyfman provides us a simple and sexy solution. There is only one CTE, the code is clear, readable, and also one with the best performances, just the things we need. Leonid is a ‘many times winner’ of the challenge and SQL Server Expert.

Here is it's solution:

;with cte
as(
    select 
        id,
        tag=case charindex(',',tags)
            when 0 then tags 
            else substring(tags,1,charindex(',',tags,1)-1)    
            end ,
        txt=substring(tags,charindex(',',tags,2)+1,len(tags))    
    from @blog
    union all
    select     
        id,
        tag=case charindex(',',txt)    
            when 0 then txt    
            else substring(txt,1,charindex(',',txt,1)-1) 
            end ,
        txt=substring(txt,charindex(',',txt,2)+1,len(txt))
    from cte
    where txt!=tag
)
select 
    f.id,
    f.data,
    b.[name],
    b.tags,
    RELEVANCE=count(*)
from
    cte join    
    @blog b on b.id=cte.id join    
    @filter f 
        on charindex(','+cte.tag+',' , ','+f.data+',')>0
group by 
    f.id,
    f.data,
    b.[name],
    b.tags
order by 
    id,
    count(*)desc,
    b.[name]

You can find a very detailled explanation of his solution on his blog: http://beyondrelational.com/blogs/leonid/archive/2009/05/11/solution-for-challenge-5.aspx.

If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/forums/t/54.aspx

If you have generic questions or comments about the TSQL Challenge 5 you can go to this forum: http://beyondrelational.com/forums/t/18.aspx

Stay tunned and don't forget to check the TSQL Challenge #6.

Again, congratulations Leonid and thanks for sharing this solution with us!


TSQL Challenge 5 winners

I spent a lot of time reviewing all the solutions and regarding the key points of our current notation for the challenge this is the list of the winners:

  • Leonid Koyfman which is a many times winner of the challenge and provide as usual a clean and very efficient solution
  • Divya Agrawal which share the first place with Leonid and which provide the shortest solution (and without CTE!)
  • Friedrich Paul and Aurélien Verla which provide a more traditional but efficient solution.

We will publish in the next days explanations for each solution.

I would like to mention also the fantastic solution of Michael Coles which provide a full Xml solution that we also talk about in a few days.

A few words about this challenge. The key starting point for this one was to wonder if it is possible to find some solutions to realize inline, inside a stand alone sql query something like the traditional sql split that all us sql developpers have used at least one time.

The exercice was interesting because in definitive, coma separated values are very often used (even if I admit it is not in most cases a good practice…).

We can imagine in this exemple that our query should help to populate an auto-complete search list by tags in a blog application. Coma separated values are not as easy as it seems to share between the application and the database. Most of the times, developpers move this difficulty from sql side to application side my doing the same with objects manipulation. I am sure we can write a pretty linq query with the same effect. But most of the times for larger data volumes that’s not the good performance solution.

Awaiting to present the different solutions of each winners, this is my first solution. It’s one solution, not the better one as you will see in next posts:

;WITH FILTER (id , data, VALUE) AS (
    SELECT    id
            ,data = CAST(data AS NVARCHAR(MAX))
            ,VALUE = CAST('' AS NVARCHAR(MAX))
    FROM @filter
    UNION ALL
    SELECT    id, data =    
            CASE WHEN CHARINDEX(',',data) > 0 
            THEN SUBSTRING(
                    data,
                    CHARINDEX(',',data)+1,
                    LEN(data)-CHARINDEX(',',data))
            ELSE '' END
            , VALUE =    
            CASE WHEN CHARINDEX(',',data) > 0 
            THEN SUBSTRING(data,1,CHARINDEX(',',data)-1) 
            ELSE data END
    FROM FILTER
    WHERE LEN(data) > 0
),
 BLOG_TAGS (id , tags, VALUE) AS (
    SELECT    id
            ,tags = CAST(tags AS NVARCHAR(MAX))
            ,VALUE = CAST('' AS NVARCHAR(MAX))
    FROM @blog
    UNION ALL
    SELECT    id, tags =    
            CASE WHEN CHARINDEX(',',tags) > 0 
            THEN SUBSTRING(
                    tags,
                    CHARINDEX(',',tags)+1,
                    LEN(tags)-CHARINDEX(',',tags))
            ELSE '' END
            , VALUE =    
            CASE WHEN CHARINDEX(',',tags) > 0 
            THEN SUBSTRING(tags,1,CHARINDEX(',',tags)-1) 
            ELSE tags END
    FROM BLOG_TAGS
    WHERE LEN(tags) > 0
)
 
SELECT F.ID,F.data,B.name,B.tags,RELEVANCE = COUNT(B.ID)
FROM(
    SELECT    F.ID,F.data,FILTER.VALUE 
    FROM    FILTER INNER JOIN @filter F ON FILTER.ID = F.ID
    WHERE LEN(VALUE) > 0
) F
INNER JOIN 
(
    SELECT    B.ID, B.name, B.tags,T.VALUE
    FROM    @blog B INNER JOIN BLOG_TAGS T ON B.ID = T.ID
    WHERE LEN(VALUE) > 0
) B ON F.VALUE = B.VALUE
GROUP BY F.ID,F.data,B.name,B.tags
ORDER BY F.ID,RELEVANCE DESC

If you have questions or comments about this solution, discuss it on the dedicated forum: http://beyondrelational.com/forums/t/53.aspx

If you have generic questions or comments about the TSQL Challenge 5 you can go to this forum: http://beyondrelational.com/forums/t/18.aspx

Stay tunned and don't forget to check the TSQL Challenge #6

cheers!

Posted: 05-08-2009 1:28 AM by Rui Carvalho | with no comments
Filed under: ,

TSQL Challenge 6

The context

For this challenge, you will have to analyse logs on the web server of your company.

Here is the sample data you have to work with:

ID          VisitDate               Page            NbVisitors   
----------- ----------------------- --------------- -----------   
1           2009-03-23 00:00:00.000 Home            10  
2           2009-03-24 00:00:00.000 Home            14  
3           2009-03-26 00:00:00.000 Home            22  
4           2009-03-27 00:00:00.000 Home            3  
5           2009-03-29 00:00:00.000 Home            4  
6           2009-04-01 00:00:00.000 Home            33  
7           2009-04-03 00:00:00.000 Home            2  
8           2009-03-26 00:00:00.000 Contact         22  
9           2009-03-27 00:00:00.000 Contact         10  
10          2009-03-29 00:00:00.000 Contact         35  
11          2009-03-30 00:00:00.000 Contact         13  
12          2009-03-27 00:00:00.000 Products        8  
13          2009-03-29 00:00:00.000 Products        12  
14          2009-04-01 00:00:00.000 Products        16  
  

The Challenge

There is an existing application that takes tabular data with hierarchical consolidation inside. Your boss ask you to produce a report about web site activity using this application. In order to use the hierarchical capabilites of this application, each row of the report you provide needs to have an id and hierarchical level information. The Id is a construction based on year, week number and day in week number. For each consolidation row A T is added at the end (see the resulting table).

The key point of this report is to count for each day, each week, each year, the number of visitors of each page of the site. A period label should be also produced with the following format : '3 letter of the name of the day in week' day / month (exemple: MON 23/3) for the day rows, TOTAL WEEK XX (exemple TOTAL WEEK 25) for the week rows and finally TOTAL YEAR XXXX for the year rows.
We supose that we only have 3 sections on the site corresponding to Home,Contact and Products. This should be considered as static information and will not be updated in the future.

This is the expected output based on the input data provided.

PERIOD_ID  LEVEL PERIOD          HOME        CONTACT     PRODUCTS    
---------- ----- --------------- ----------- ----------- -----------    
2009131    0        MON  23/3    10          0           0    
2009132    0        TUE  24/3    14          0           0    
2009134    0        THU  26/3    22          22          0    
2009135    0        FRI  27/3    3           10          8    
2009137    0        SUN  29/3    4           35          12    
200913T    1     TOTAL WEEK 13   53          67          20    
2009141    0        MON  30/3    0           13          0    
2009143    0        WED  1/4     33          0           16    
2009145    0        FRI  3/4     2           0           0    
200914T    1     TOTAL WEEK 14   35          13          16    
2009T      2     TOTAL YEAR 2009 88          80          36   


Notes:

  • You should write this in any number of queries to produce the final correct table (but solutions with only one query will have better score)
  • You have to use only pure T-SQL, no CLR.
  • The query can assume that there are only three pages in the website: home, contact and products.
  • The Query should target any version of Sql Server (but a solution that works on all will be prefered)
  • You have to use the exact sample data provided (please don't re-write the sample data with your naming,etc...)
  • Send your entries to jacob@beyondrelational.com
  • Do not paste the code in the email body. Include it as an attachment (.sql file)
  • Mention ‘TSQL Challenge 6’ in the email subject
  • Last Date: 18 May 2009
  • There is a forum linked to 'TSQL Challenges' for general questions and discussions on the 'TSQL Challenges' series.
  • For any questions specific to 'TSQL Challenge 6', visit this thread.


Sample Data:

SET DATEFIRST 1    
SET DATEFORMAT YMD    
SET NOCOUNT ON    
DECLARE @T TABLE (ID INT IDENTITY(1,1), VisitDate DATETIME, Page NVARCHAR(15),NbVisitors INT)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-23','Home',10)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-24','Home',14)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-26','Home',22)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-27','Home',3)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-29','Home',4)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-04-01','Home',33)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-04-03','Home',2)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-26','Contact',22)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-27','Contact',10)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-29','Contact',35)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-30','Contact',13)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-27','Products',8)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-03-29','Products',12)    
INSERT INTO @T (VisitDate,Page,NbVisitors) VALUES ('2009-04-01','Products',16)   

Copyright © Beyondrelational.com