Check out the original post @ http://www.epperlys.com
The Challenge
The details of the challenge can be found on Beyond Relational’s website under TSQL Challenge 30 – A Matrix Transposition challenge in TSQL. The challenge seemed to be targeting a couple of different skills:
- Effective use of the ROW_NUMBER() function
- Though not required, CTEs
- Pivoting and Unpivoting (at least conceptually)
- Keeping proper track of rows and columns
Before we dive in, I should point out that I received the 7th highest score . . . not the greatest, but I was happy just to be on the list. I tell you this not to boast, but to warn you that my solution has problems . . . but we’re also going to look at the steps (actually, one defining step) the winner took and why his solution was better.
My Solution
Let’s start with a view of the whole script and then attempt to unpack the thought processes.
with m as
(
select id, col1, ROW_NUMBER() over(partition by id order by col1) c from TC30_Metrics union all
select id, col2, ROW_NUMBER() over(partition by id order by col2) c from TC30_Metrics union all
select id, col3, ROW_NUMBER() over(partition by id order by col3) c from TC30_Metrics union all
select id, col4, ROW_NUMBER() over(partition by id order by col4) c from TC30_Metrics union all
select id, col5, ROW_NUMBER() over(partition by id order by col5) c from TC30_Metrics
)
,m2 as (
select
id
,col1
,c
,row_number() over(partition by id, c order by col1) as r
from m
)
,m3 as (
select
m2.id
,case m2.c when 1 then col1 end as col1
,case m2.c when 2 then col1 end as col2
,case m2.c when 3 then col1 end as col3
,case m2.c when 4 then col1 end as col4
,case m2.c when 5 then col1 end as col5
,r
from m2
)
select
m3.id
,max(m3.col1) as col1
,max(m3.col2) as col2
,max(m3.col3) as col3
,max(m3.col4) as col4
,max(m3.col5) as col5
from m3
group by m3.id, m3.r
order by id
New Column Assignment
First, I wanted to tag the values with their appropriate columns. As such, the goal of the first CTE was to put all of the values from the 5 columns into the same column of our first virtual table (effectively unpivoting). Furthermore, we need to reorder with them, so we use the ROW_NUMBER() function, ordering by the respective column. The “partition by id” portion of the OVER() clause is required because the challenge states that there may be multiple matrices that need to be transposed. This will start the ordering process over for each new ID column. The new column is simply named “c” to because it is the surrogate identifier for the columnar data. Now the view of the data looks like:
id col1 c
-- ----------- --------------------
1 1 1
1 3 2
1 7 3
1 8 4
1 9 5
1 1 1
1 3 2
1 4 3
1 5 4
1 8 5
1 0 1
1 1 2
1 2 3
1 2 4
1 5 5
1 1 1
1 1 2
1 3 3
1 4 4
1 6 5
1 3 1
1 4 2
1 5 3
1 6 4
1 7 5
Row, Row, ROW_NUMBER()
The second step in my logic was to assign the columnar data to rows according to value order. Again, simply performed by another ROW_NUMBER() function, partitioned by ID and c then ordered on the value. I named the new virtual column “r” because it is the surrogate identifier for the row in which the value is to be placed. This makes the data look like:
id col1 c r
-- ----------- -------------------- --------------------
1 0 1 1
1 1 1 2
1 1 1 3
1 1 1 4
1 3 1 5
1 1 2 1
1 1 2 2
1 3 2 3
1 3 2 4
1 4 2 5
1 2 3 1
1 3 3 2
1 4 3 3
1 5 3 4
1 7 3 5
1 2 4 1
1 4 4 2
1 5 4 3
1 6 4 4
1 8 4 5
1 5 5 1
1 6 5 2
1 7 5 3
1 8 5 4
1 9 5 5
Put It Back Together
This view of the data gives us the proper coordinates for constructing the final query. I chose to do this in two steps, but the following could have easily been combined. First, I performed the first step of a pivot so that the values got into the appropriate columns. You can note, by the data below, that this causes NULL values to be inserted as any one combination from the data above renders a value in only one column at a time. It’s kinda fun because we get this nice diagonal effect of the data:
id col1 col2 col3 col4 col5 r
-- ----------- ----------- ----------- ----------- ----------- --------------------
1 0 NULL NULL NULL NULL 1
1 1 NULL NULL NULL NULL 2
1 1 NULL NULL NULL NULL 3
1 1 NULL NULL NULL NULL 4
1 3 NULL NULL NULL NULL 5
1 NULL 1 NULL NULL NULL 1
1 NULL 1 NULL NULL NULL 2
1 NULL 3 NULL NULL NULL 3
1 NULL 3 NULL NULL NULL 4
1 NULL 4 NULL NULL NULL 5
1 NULL NULL 2 NULL NULL 1
1 NULL NULL 3 NULL NULL 2
1 NULL NULL 4 NULL NULL 3
1 NULL NULL 5 NULL NULL 4
1 NULL NULL 7 NULL NULL 5
1 NULL NULL NULL 2 NULL 1
1 NULL NULL NULL 4 NULL 2
1 NULL NULL NULL 5 NULL 3
1 NULL NULL NULL 6 NULL 4
1 NULL NULL NULL 8 NULL 5
1 NULL NULL NULL NULL 5 1
1 NULL NULL NULL NULL 6 2
1 NULL NULL NULL NULL 7 3
1 NULL NULL NULL NULL 8 4
1 NULL NULL NULL NULL 9 5
Lastly, we have to get rid of the NULL values and make sure that each matrix is ordered together (via ID column). This is accomplished by grouping on the ID and the surrogate row identifier (“r”) we established in the second step. With that, we have our solution!
What the Winner Did Better
The winner of this challenge, Muhammad AlPasha, smoked us all with one major approach difference. Looking back on the first thing I did for my solution (which EVERY other submission on the winners list did as well), unpivoting by using UNION ALL led to our demise. Performing 5 SELECT statements on the same table with, though you end up with one result set, causes 5 reads on the base table. Muhammad did it with one read (view his solution). Instead of the unpivot, he tacked on the new column assignments for each respective value by create 5 new columns. I think this approach feels antithetical to most of us out here in TSQL land because it feels like something a mainframe application would attempt to do. But it was genius because the number of reads that this solution incurred would always be 1/5 of anything the immediate UNPIVITORs were going to do. Below are the query plans of the new column id of the two solutions; the 5 reads from my solution is quite apparent. I realize you can’t see the detail of the pictures, but you can make out the differences:


The remainder of the mental gymnastics isn’t necessarily identical, but is quite similar. The second CTE explodes the data so that it is similar to my first step. It then continues to calculate the appropriate row and pivot the data back for the final result. It took a little more code, but was the most definitely the superior plan. Kudos, indeed.
. . . complete my joy by being of the same mind, having the same love, being in full accord and of one mind. Do nothing from rivalry or conceit, but in humility count others more significant than yourselves.
Philippians 2:2-3 ESV