August 2009 - Posts
This is a real life scenario I came across recently. We build applications
for the health care industry and one of the modules in the application generates
invoices for the services provided to the patients and sent them to the
insurance companies. The information is sent to the insurance companies as
electronic files. Each company have their own format and rules for the
electronic file they accept.
Before I explain the rest of the requirements, let me show you some sample
data.
InvID BatchNumber InvoiceNumber VisitDate Amount
------ ----------- ------------- ----------- ------
1 10000001 20001 2009-01-01 50.00
2 10000001 20001 2009-01-02 50.00
3 10000001 20001 2009-01-03 50.00
4 10000001 20001 2009-01-04 50.00
5 10000001 20002 2009-01-01 50.00
6 10000001 20002 2009-01-02 50.00
7 10000001 20002 2009-01-03 50.00
8 10000001 20003 2009-01-01 50.00
9 10000001 20003 2009-01-02 50.00
10 10000001 20004 2009-01-01 50.00
11 10000001 20004 2009-01-02 50.00
12 10000001 20004 2009-01-03 50.00
13 10000001 20004 2009-01-04 50.00
14 10000001 20004 2009-01-05 50.00
15 10000001 20004 2009-01-06 50.00
16 10000001 20005 2009-01-01 50.00
17 10000001 20005 2009-01-02 50.00
18 10000001 20005 2009-01-13 50.00
19 10000001 20005 2009-01-14 50.00
20 10000001 20005 2009-01-15 50.00
21 10000001 20005 2009-01-06 50.00
22 10000001 20005 2009-01-07 50.00
23 10000001 20005 2009-01-08 50.00
24 10000001 20006 2009-01-01 50.00
25 10000002 20007 2009-01-01 50.00
26 10000002 20007 2009-01-02 50.00
27 10000002 20007 2009-01-03 50.00
28 10000002 20008 2009-01-01 50.00
29 10000002 20008 2009-01-02 50.00
30 10000002 20008 2009-01-03 50.00
Let me start explaining the rules and make the requirement further clear. The
insurance company needs an electronic file in a certain format (the format is
irrelevant for us). A separate file needs to be generated for each batch. So
based on the sample data we need to generate two files (one for each batch).
Now comes the complex part. Each batch should contain not more than 10 rows.
The maximum number of lines we can put in a file is 10. So we need to break each
batch into parts that contains not more than 10 rows. When breaking the batches,
there is one more rule to follow. While breaking the batches, we should take
care that we do not break invoices.
The challenge here is to generate an additional column that helps us to break
the batches into pieces of 10 or lesser rows. Here is the expected output:
InvID BatchNumber InvoiceNumber VisitDate Amount Set
------ ----------- ------------- ----------- ------ ---
1 10000001 20001 2009-01-01 50.00 1
2 10000001 20001 2009-01-02 50.00 1
3 10000001 20001 2009-01-03 50.00 1
4 10000001 20001 2009-01-04 50.00 1
5 10000001 20002 2009-01-01 50.00 1
6 10000001 20002 2009-01-02 50.00 1
7 10000001 20002 2009-01-03 50.00 1
8 10000001 20003 2009-01-01 50.00 1
9 10000001 20003 2009-01-02 50.00 1
10 10000001 20004 2009-01-01 50.00 2
11 10000001 20004 2009-01-02 50.00 2
12 10000001 20004 2009-01-03 50.00 2
13 10000001 20004 2009-01-04 50.00 2
14 10000001 20004 2009-01-05 50.00 2
15 10000001 20004 2009-01-06 50.00 2
16 10000001 20005 2009-01-01 50.00 3
17 10000001 20005 2009-01-02 50.00 3
18 10000001 20005 2009-01-13 50.00 3
19 10000001 20005 2009-01-14 50.00 3
20 10000001 20005 2009-01-15 50.00 3
21 10000001 20005 2009-01-06 50.00 3
22 10000001 20005 2009-01-07 50.00 3
23 10000001 20005 2009-01-08 50.00 3
24 10000001 20006 2009-01-01 50.00 3
25 10000002 20007 2009-01-01 50.00 1
26 10000002 20007 2009-01-02 50.00 1
27 10000002 20007 2009-01-03 50.00 1
28 10000002 20008 2009-01-01 50.00 1
29 10000002 20008 2009-01-02 50.00 1
30 10000002 20008 2009-01-03 50.00 1
Sample Data
Use the following script to generate the sample data needed for this challenge
DECLARE @t TABLE (
InvID INT IDENTITY,
BatchNumber INT,
InvoiceNumber INT,
VisitDate DATETIME,
Amount MONEY )
INSERT INTO @t(BatchNumber, InvoiceNumber, VisitDate, Amount)
SELECT 10000001,20001,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-04',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20003,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20003,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-04',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-05',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-06',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-13',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-14',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-15',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-06',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-07',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-08',50.00 UNION ALL
SELECT 10000001,20006,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-02',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-03',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-02',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-03',50.00
Notes
- Read the Submission Guidelines and make sure that your solution follows
them.
- The solution should work on SQL Server 2005, 2008 or later versions
- Last date to submit your entries: Monday 21 September 2009 Midnight GMT
- Use this forum for any
questions related to TSQL Challenge #13
About the Author.
For the TSQL Challenge #9, David BARBARIN is a new winner and he has decided to join TSQL Challenge Team.
David BARBARIN is living in Lyon, France. Actually, he finishs his engineer course. He has worked with SQL Server for 6 years, primarily as a developer and now he's dedicated to performance and tuning solutions. He's an active member and contributes of well-know french's site and forum www.developpez.com and msdn forum. Recently, he writes articles and expresses his experiences at his blog.
How did David work on the challenge?
Here is the comments by David on this challenge:
For this challenge i was inspired by a famous war's rule : "To attack a country, we must know its boundaries".
I like to make the analogies :-)
In the case of the challenge 9, to find this boundaries (firstIdInclusive and the lastIdInclusive), I used a simply auto left join operation with the table @tc9.
The column ID is shifted by using first ID value - 1 or first ID value + 1 to detect change which corresponding to the boundaries.
The null value column's ID represents respectively the value FirstIdInclusive and the value LastIdInclusive.
(Editor note : To explain I have modified the CTE : I have voluntary included the second ID Column and deleted the where condition.
To keep just the boundary in each case, we have to add a where condition (WHERE ID IS NULL))
For the first boundary :
-- FirstIdInclusive
WITH CTE_BORNE_INF
AS
(
SELECT
t.ID,
t.CreationDate,
t.Content,
t.SendState,
t.AckState,
t2.ID AS id_2,
ROW_NUMBER() OVER(ORDER BY t.ID) AS num
FROM @tc9 t
LEFT JOIN @tc9 t2
ON t.ID - 1 = t2.ID
AND t.SendState = t2.SendState
AND t.AckState = t2.AckState
)
The output is
------------------------------------------------------------------
1 2009-07-26 19:21:13.073 Msg #1 0 0 NULL 1 <-- FirstIdInclusive
2 2009-07-26 21:45:13.073 Msg #2 0 0 1 2
3 2009-07-27 00:09:13.073 Msg #3 1 1 NULL 3 <-- FirstIdInclusive
4 2009-07-27 02:33:13.073 Msg #4 1 1 3 4
5 2009-07-27 04:57:13.073 Msg #5 1 1 4 5
6 2009-07-27 07:21:13.073 Msg #6 1 0 NULL 6 <-- FirstIdInclusive
7 2009-07-27 09:45:13.073 Msg #7 1 0 6 7
8 2009-07-27 12:09:13.073 Msg #8 1 0 7 8
9 2009-07-27 14:33:13.073 Msg #9 1 0 8 9
10 2009-07-27 16:57:13.073 Msg #10 1 1 NULL 10 <-- FirstIdInclusive
For the second boundary :
-- LastIdInclusive
WITH CTE_BORNE_SUP
AS
(
SELECT
t.ID,
t.CreationDate,
t.Content,
t.SendState,
t.AckState,
t2.ID AS id_2,
ROW_NUMBER() OVER(ORDER BY t.ID) AS num
FROM @tc9 t
LEFT JOIN @tc9 t2
ON t.ID + 1 = t2.ID
AND t.SendState = t2.SendState
AND t.AckState = t2.AckState
)
The output is :
------------------------------------------------------------------
1 2009-07-26 19:22:37.293 Msg #1 0 0 2 1
2 2009-07-26 21:46:37.293 Msg #2 0 0 NULL 2 <-- LastIdInclusive
3 2009-07-27 00:10:37.293 Msg #3 1 1 4 3
4 2009-07-27 02:34:37.293 Msg #4 1 1 5 4
5 2009-07-27 04:58:37.293 Msg #5 1 1 NULL 5 <-- LastIdInclusive
6 2009-07-27 07:22:37.293 Msg #6 1 0 7 6
7 2009-07-27 09:46:37.293 Msg #7 1 0 8 7
8 2009-07-27 12:10:37.293 Msg #8 1 0 9 8
9 2009-07-27 14:34:37.293 Msg #9 1 0 NULL 9 <-- LastIdInclusive
10 2009-07-27 16:58:37.293 Msg #10 1 1 NULL 10 <-- LastIdInclusive
Now it's time to merge the results of two CTE. I used, for this, the ROW_NUMBER() function which ordered the CTE by ID value and give the exact correspondance to each set of boundaries (FirstIdInclusive and LastIdInclusive).
SELECT
t_inf.ID AS FirstIdInclusive,
t_sup.ID AS LastIdInclusive,
t_inf.SendState,
t_inf.AckState AS AcknoledgeState
FROM CTE_BORNE_INF t_inf
INNER JOIN CTE_BORNE_SUP t_sup
ON t_inf.num = t_sup.num;
In a final version I keep the most important columns for this challenge
WITH CTE_BORNE_SUP(ID,CreationDate,Content,SendState,AckState,Num)
AS
(
SELECT
t.ID,
t.CreationDate,
t.Content,
t.SendState,
t.AckState,
ROW_NUMBER() OVER(ORDER BY t.ID)
FROM @tc9 t
LEFT JOIN @tc9 t2
ON t.ID + 1 = t2.ID
AND t.SendState = t2.SendState
AND t.AckState = t2.AckState
WHERE t2.ID IS NULL
),
CTE_BORNE_INF(ID,CreationDate,Content,SendState,AckState,Num)
AS
(
SELECT
t2.ID,
t2.CreationDate,
t2.Content,
t2.SendState,
t2.AckState,
ROW_NUMBER() OVER(ORDER BY t2.ID)
FROM @tc9 t
RIGHT JOIN @tc9 t2
ON t.ID = t2.ID - 1
AND t.SendState = t2.SendState
AND t.AckState = t2.AckState
WHERE t.ID IS NULL
)
SELECT
t_inf.ID AS FirstIdInclusive,
t_sup.ID AS LastIdInclusive,
t_inf.SendState,
t_inf.AckState AS AcknoledgeState
FROM CTE_BORNE_INF t_inf
INNER JOIN CTE_BORNE_SUP t_sup
ON t_inf.num = t_sup.num;
So, congratulations again David and thank you for this great piece of explanations!
Why join TSQL Challenge Team ?
After this challenge David has decided to join TSQL Challenge's Team. He explains now why :
When I finished my first Challenge (Challenge 8), it was 2 am approximatively. I didn't see the time ! Damn I'll start working in 6 hours :-)
I joined immediately to the TSQL Challenge's concept ! That is why i wanted to be a volunteer !
Why to be volunteer ? Offering new challenges and selecting the winners require more work and I want to contribute to the TSQL Challenge project's developement. I think TSQL Challenge is a very good community projet and it's also a good human's experience. You will understand, I just want to share one of my passions with the TSQL's community !!
Submission Guidelines Make sure that your solution uses a single query that starts either with "SELECT" or "WITH". Make sure that you create a .sql file with your solution. The file should be named as 'firstname_lastname_tsqlchallenge_challengenumber...
The News
Here we go again with a new challenge, we hope that you will have some time to share with us working on the challenge even if it’s summer time. The TSQL Challenge teams are growing and we are very exited to have four new Sql Server MVPs in the persons of Mangal Pardeshi, Peter Larsson, Alejandro Messa and Adam Haines. I would like to wish them again a big welcome!
As usual we will be very happy to hear your comments and feedback on what we do with TSQL Challenges. You can post a comment or send an email to tc@beyondrelational.com.
The Context
You are working for an online gaming company and as usual when we talk about games we need to manage scores. Some features in your system had recently changed. Before when a user get connected you only shown him its last score, but now you have to present him a graph month by month of its best score in each one since its first game until the current date.
Here is the scores tables of an user for each month he played:
YearMonth Score
----------- -----------
200903 100
200803 95
200802 99
200801 100
200711 100
The Challenge
When a user connects for the first time after the deployment of the new system you will need to produce a table based on the original scores table with the following conditions:
-
Create a new couple year/month for each missing month between two valid months of the original table
-
For each new couple created, you should recopy the score of the last month he played.
-
Continue the list until the current month (included).
Here is the resulting table you need to produce:
YearMonth Score
----------- -----------
200908 100
200907 100
200906 100
200905 100
200904 100
200903 100
200902 95
200901 95
200812 95
200811 95
200810 95
200809 95
200808 95
200807 95
200806 95
200805 95
200804 95
200803 95
200802 99
200801 100
200712 100
200711 100
Sample data
Use the following script to generate the sample data:
DECLARE @Scores TABLE
(
YearMonth INT,
Score INT
)
INSERT @Scores VALUES(200903, 100)
INSERT @Scores VALUES(200803, 95)
INSERT @Scores VALUES(200802, 99)
INSERT @Scores VALUES(200801 ,100)
INSERT @Scores VALUES(200711, 100)
Notes
- Write a single query that produces the expected output.
- The query can target any version of Sql Server
- Send your entries to tc@beyondrelational.com
- Do not include your solutions in the body of the email. Send them as an attachment in the email. Name it “firstname_lastname.sql”
- Add ‘TSQL Challenge #12’ in the subject line of the email.
- Last date to submit your entries: Monday 24 August 2009
- Use this forum for any questions related to TSQL Challenge #12
About the Authors
|
Rui Carvalho is the director for TSQLChallenges. He is a senior developer on Sql Server and .Net mainly experienced in web applications. See complete profile.
|
|
Mangal is an E&TC Engineer from Pune University. He started his career as an ERP Technical Consultant in 2007. Currently he works as a BI Developer. His core areas of expertise are Data Warehousing and Business Intelligence.
See complete profile
|
|