TSQL Challenge 9
What is new with TSQL Challenges
‘TSQL Challenges Team’ is working very hard to take this series to the next level. As part of this process, we have redefined the goal and structure of TSQL Challenges.
Goal
The goal of TSQL Challenges is to help people enhance their SET based query writing skills. Most of the times, a SET based query performs better than a row-by-row (often referred as RBAR – Row By Agonizing Row). Most people find trouble writing set based operations because they are more familiar with procedural programming model. It takes some practice and exercises to develop the SET based thinking and query writing skills.
The challenges we bring through ‘TSQL Challenges’ series will encourage people to write SET based queries to solve common problems. It will help people who are not familiar with SET based queries, to start learning it. It will help people who are already familiar with SET based queries, to learn different, and often better ways of writing queries.
Structure
The challenges we publish through ‘TSQL Challenges’ series will follow a uniform structure. All the challenges will focus on transforming data from one form to another. Each challenge will show a set of source data and expected results. The solution of the challenge should be written using a single TSQL Query. The query can use CTEs if needed and it will be considered as part of the query that uses the CTE.
Your comments
We will be very happy to hear your comments and feedback on what we do with TSQL Challenges. You can post a comment or send an email to tc@beyondrelational.com.
TSQL Challenge 9
Let us move to Challenge 9. This is a real-life challenge that our ‘Challenge Team Lead’, Rui encountered recently. Here is the problem description in his own words.
The Context
“We are managing a messaging system that needs some consolidation functionalities. There are two systems that exchanges messages. For each message, there is a flag that indicates that the message was sent or not and another flag that indicates that the message was received or not. Obviously, the receive flag can not be true if sent flag is false. Delivery of a message may fail due to a number of reasons such as high network traffic, packet loss, network failure, server busy and so on.
We want to build a synchronization tool that is able to analyze the status of the messages and ask for synchronization updates during low network traffic periods. This is process workflow:
- multiple times per day when the traffic is low, the 'target server' will ask for a synchronization report to the source server
- The source server will build a report of ranges of messages depending on their status
- The target server will ask the source server to re-send messages range by range.
- The target server will send acknowledgement for each message received
to from the source server.”
The Challenge
The challenge is to build the ranges of sequential values. The table with these ranges will have the following properties:
- The first message identifier of the range
- The last message identifier of the range
- The Send status of the range
- The Acknowledgement status
The ranges of the messages to take will be selected between two dates. The table has an IDENTITY column to maintain the sequence.
Source Data
ID CreationDate Content SendState AckState
----------- ----------------------- ---------- --------- --------
1 2009-05-27 22:15:43.647 Msg #1 0 0
2 2009-05-28 00:39:43.647 Msg #2 0 0
3 2009-05-28 03:03:43.647 Msg #3 1 1
4 2009-05-28 05:27:43.647 Msg #4 1 1
5 2009-05-28 07:51:43.647 Msg #5 1 1
6 2009-05-28 10:15:43.647 Msg #6 1 0
7 2009-05-28 12:39:43.647 Msg #7 1 0
8 2009-05-28 15:03:43.647 Msg #8 1 0
9 2009-05-28 17:27:43.647 Msg #9 1 0
10 2009-05-28 19:51:43.647 Msg #10 1 1
Result Data
FirstIdInclusive LastIdInclusive SendState AcknoledgeState
---------------- --------------- --------- ---------------
1 2 0 0
3 5 1 1
6 9 1 0
10 10 1 1
Performance is a key factor for this challenge. There are plenty of solutions for this challenge but the work to build sequential ranges is not natural in SQL and not all solutions will perform and scale as well. You should suppose that messages selected between two dates may grow up to 1 million rows and that the ranges size may be one to thousands. As the process should be launched a lot of times per day, it needs to be as fast as possible.
You have complete freedom to choose any approach to solve this problem. There is no restriction on the version of SQL server you should target to. Your solution will be acceptable as long as it produces the correct results and considers performance and scalability.
Sample Data
Use the following script to generate the sample data for this challenge.
DECLARE @tc9 TABLE(
ID INT IDENTITY(1,1),
CreationDate DATETIME,
Content NVARCHAR(10),
SendState BIT,
AckState BIT
)
INSERT INTO @tc9 (CreationDate,Content,SendState,AckState)
SELECT GETDATE()-1.0,'Msg #1',0,0 UNION
SELECT GETDATE()-0.9,'Msg #2',0,0 UNION
SELECT GETDATE()-0.8,'Msg #3',1,1 UNION
SELECT GETDATE()-0.7,'Msg #4',1,1 UNION
SELECT GETDATE()-0.6,'Msg #5',1,1 UNION
SELECT GETDATE()-0.5,'Msg #6',1,0 UNION
SELECT GETDATE()-0.4,'Msg #7',1,0 UNION
SELECT GETDATE()-0.3,'Msg #8',1,0 UNION
SELECT GETDATE()-0.2,'Msg #9',1,0 UNION
SELECT GETDATE()-0.1,'Msg #10',1,1
SELECT * FROM @tc9
Notes
- Along with your entry, write an explanation of your code and logic used.
- Make sure that you have created your profile at beyondrelational.com and a photo is uploaded to the profile. If you have not registered with the site, you can do so by clicking on the ‘join’ link on the top right corner. If your entry wins, we will link to your profile when we publish the results.
- By submitting your entry for a challenge, you authorize beyondrelational.com to ‘edit’ (if needed) and ‘publish’ it.
- Visit this forum for general questions on TSQL Challenges.
- Visit this forum for any question on TSQL Challenge 9.