We had a glitch in one of the automatic scripts that we use to evaluate the solutions and had to re-evaluate the solutions last night. After spending the whole night (and unlimited cups of coffee) evaluating and reviewing the different solutions, we have the final list of winning solutions.
Thank you Brad, Rui and David for your help in re-doing the evaluation.
Solution by Jim Carter
I found the solution by Jim Carter to be very simple and elegant. He used a CTE that uses ROW_NUMBER() and PARTITION BY along with an elegant logic to make sure that the invoices don’t break apart into different sets. Please see TSQL Challenge #13 for problem description, sample data and expected results.
Getting Started
Jim started with a simple CTE that produced a new column using ROW_NUMBER() which resets after each batch is completed.
SELECT
BatchNumber,
InvoiceNumber,
VisitDate,
(
ROW_NUMBER() OVER
(PARTITION BY BatchNumber ORDER BY InvoiceNumber)
) AS [Working_Set]
FROM @t
)
SELECT * FROM mycte
The above code produces the following output.
BatchNumber InvoiceNumber VisitDate Working_Set
----------- ------------- ---------- -----------
10000001 20001 2009-01-01 1
10000001 20001 2009-01-02 2
10000001 20001 2009-01-03 3
10000001 20001 2009-01-04 4
10000001 20002 2009-01-01 5
10000001 20002 2009-01-02 6
10000001 20002 2009-01-03 7
10000001 20003 2009-01-01 8
10000001 20003 2009-01-02 9
10000001 20004 2009-01-01 10
10000001 20004 2009-01-02 11
10000001 20004 2009-01-03 12
........ ..... ...continued..
Next, he added the code to group the rows into sets of 10 rows. This is done by doing a division by 10 on the new column working_set (taking advantage of the SQL behavior that always returns an INT when an arithmetic operation of two INT values takes place).
;WITH MyCTE AS (
SELECT
BatchNumber,
InvoiceNumber,
VisitDate,
(
(
ROW_NUMBER() OVER
(PARTITION BY BatchNumber ORDER BY InvoiceNumber) / 10
) + 1
) AS [Working_Set]
FROM @t
)
SELECT * FROM MyCTE
It produces the following output (Note the usage of division by 10 which produces an INT number - 0 for the first 9 rows, and 1 for the next 10 rows etc). Adding +1 to the result of the division helps to group the rows into the correct sets.
BatchNumber InvoiceNumber VisitDate Working_Set
----------- ------------- ---------- -----------
10000001 20001 2009-01-01 1
10000001 20001 2009-01-02 1
10000001 20001 2009-01-03 1
10000001 20001 2009-01-04 1
10000001 20002 2009-01-01 1
10000001 20002 2009-01-02 1
10000001 20002 2009-01-03 1
10000001 20003 2009-01-01 1
10000001 20003 2009-01-02 1
10000001 20004 2009-01-01 2
10000001 20004 2009-01-02 2
........ ..... continued....
Good so far! However, there is one more challenge. According to the problem description, an invoice should not be broken into two different sets. The above query does not implement this rule. If you examine the result set, you could see the following (which is wrong)
BatchNumber InvoiceNumber VisitDate Working_Set
----------- ------------- ---------- -----------
........ removed for brevity ....................
10000001 20005 2009-01-01 2
10000001 20005 2009-01-02 2
10000001 20005 2009-01-13 2
10000001 20005 2009-01-14 2
10000001 20005 2009-01-15 3 <= Problem!
10000001 20005 2009-01-06 3
10000001 20005 2009-01-07 3
10000001 20005 2009-01-08 3
........ removed for brevity ....................
Note that invoice “20005” is broken into two sets (set 2 and set 3). So the next challenge is to fix this behavior. All the rows of invoice “20005” should be either in set 2 or in set 3. At the same time, one set cannot have more than 10 rows.
Jim came up with the following logic to fix this problem (which produces the expected result).
;WITH MyCTE AS (
SELECT
Amount,
ROW_NUMBER() OVER (ORDER BY BatchNumber, InvoiceNumber) AS InvID,
BatchNumber,
InvoiceNumber,
VisitDate,
(
(
ROW_NUMBER() OVER
(PARTITION BY BatchNumber ORDER BY InvoiceNumber) / 10
) + 1
) AS [Working_Set]
FROM @t
)
SELECT
MyCTE.InvID,
MyCTE.BatchNumber,
MyCTE.InvoiceNumber,
MyCTE.VisitDate,
MyCTE.Amount,
t2.[Set]
FROM MyCTE
INNER JOIN (
SELECT
InvoiceNumber,
MAX([Working_Set]) AS [Set],
COUNT(InvID) AS number_in_file
FROM MyCTE
GROUP BY InvoiceNumber
) t2 ON MyCTE.InvoiceNumber = t2.InvoiceNumber
The final Solution
Here is the complete listing of the solution submitted by Jim.
DECLARE @t TABLE (
InvID INT IDENTITY,
BatchNumber INT,
InvoiceNumber INT,
VisitDate DATETIME,
Amount MONEY )
INSERT INTO @t(BatchNumber, InvoiceNumber, VisitDate, Amount)
SELECT 10000001,20001,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20001,'2009-01-04',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20002,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20003,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20003,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-03',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-04',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-05',50.00 UNION ALL
SELECT 10000001,20004,'2009-01-06',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-01',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-02',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-13',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-14',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-15',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-06',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-07',50.00 UNION ALL
SELECT 10000001,20005,'2009-01-08',50.00 UNION ALL
SELECT 10000001,20006,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-02',50.00 UNION ALL
SELECT 10000002,20007,'2009-01-03',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-01',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-02',50.00 UNION ALL
SELECT 10000002,20008,'2009-01-03',50.00
;WITH MyCTE AS (
SELECT
ROW_NUMBER() OVER (ORDER BY BatchNumber, InvoiceNumber) AS InvID,
BatchNumber,
InvoiceNumber,
VisitDate,
(
(
ROW_NUMBER() OVER
(PARTITION BY BatchNumber ORDER BY InvoiceNumber) / 10
) + 1
) AS [Working_Set]
FROM @t
)
SELECT
MyCTE.InvID,
MyCTE.BatchNumber,
MyCTE.InvoiceNumber,
MyCTE.VisitDate,
t2.[Set]
FROM MyCTE
INNER JOIN (
SELECT
InvoiceNumber,
MAX([Working_Set]) AS [Set],
COUNT(InvID) AS number_in_file
FROM MyCTE
GROUP BY InvoiceNumber
) t2 ON MyCTE.InvoiceNumber = t2.InvoiceNumber
/*
InvID BatchNumber InvoiceNumber VisitDate Set
----- ----------- ------------- ---------- ----
1 10000001 20001 2009-01-01 1
2 10000001 20001 2009-01-02 1
3 10000001 20001 2009-01-03 1
4 10000001 20001 2009-01-04 1
5 10000001 20002 2009-01-01 1
6 10000001 20002 2009-01-02 1
7 10000001 20002 2009-01-03 1
8 10000001 20003 2009-01-01 1
9 10000001 20003 2009-01-02 1
10 10000001 20004 2009-01-01 2
11 10000001 20004 2009-01-02 2
12 10000001 20004 2009-01-03 2
13 10000001 20004 2009-01-04 2
14 10000001 20004 2009-01-05 2
15 10000001 20004 2009-01-06 2
16 10000001 20005 2009-01-01 3
17 10000001 20005 2009-01-02 3
18 10000001 20005 2009-01-13 3
19 10000001 20005 2009-01-14 3
20 10000001 20005 2009-01-15 3
21 10000001 20005 2009-01-06 3
22 10000001 20005 2009-01-07 3
23 10000001 20005 2009-01-08 3
24 10000001 20006 2009-01-01 3
25 10000002 20007 2009-01-01 1
26 10000002 20007 2009-01-02 1
27 10000002 20007 2009-01-03 1
28 10000002 20008 2009-01-01 1
29 10000002 20008 2009-01-02 1
30 10000002 20008 2009-01-03 1
*/
Please join me to congratulate Jim for this wonderful solution!