MisterMagoo:I might have a play with pseudo-recursion to see if that can help at all with this one...
dishdy: MisterMagoo:I might have a play with pseudo-recursion to see if that can help at all with this one... So you are the second person mentioning this term 'pseudo-recursion'.I would be curious to know what this refers to.Stefan mentioned it in the previous challenge but the solutions are not yet posted.Thus I'm crying out of curiosity. :-)
Well, I picked up the term from looking at one of Stefan's solutions here : http://beyondrelational.com/tc/submissions/3196/default.aspx
Basically, it seems to involve a recursive CTE that selects zero rows in the recursive part deliberately.
This is the first time I have tried it - but for me it did not provide any benefit - in fact it made my solution slower.
I am not sure exactly what the effect is - but when I have time to investigate the execution plan i am sure it will be obvious.
This article sorts of explains it
http://bradsruminations.blogspot.com/2010/03/this-article-on-recurson-is-entitled.html
My understanding is that it prevents the input to a recursive cte from being evaluated lots of times. Instead it's spooled into a work table.
For my solution is decreased the time but increased the number of reads
regards
David
dishdy: @jimbobmcgeeI see what you're asking.You want to see the output from the performance data to see if your solution, besides going like lightning, also produces correct results. :-)I can give you an answer later today when I get to my computer.
@jimbobmcgeeI see what you're asking.You want to see the output from the performance data to see if your solution, besides going like lightning, also produces correct results. :-)I can give you an answer later today when I get to my computer.
I've given up on 'going like lightning' for the time being. Too much talk of sub-400ms execution times for me!
Correct results FTW!
Stefan_G: jimbobmcgee: Stefan_G; Since your performance test code is essentially adding, I am right in thinking that you can summarise the expected results as follows...? No, that is not correct.
jimbobmcgee: Stefan_G; Since your performance test code is essentially adding, I am right in thinking that you can summarise the expected results as follows...?
Stefan_G; Since your performance test code is essentially adding, I am right in thinking that you can summarise the expected results as follows...?
No, that is not correct.
Yeah, I realised that pretty soon after!!
Stefan_G: The expected result from the performance test is this: FirstCardNo LastCardNo Changes ----------- ----------- ----------- 11 116 204 196 200 165 ...9994 9971 41 (98 row(s) affected)
The expected result from the performance test is this:
FirstCardNo LastCardNo Changes ----------- ----------- ----------- 11 116 204 196 200 165 ...9994 9971 41 (98 row(s) affected)
Thanks for these. At this rate, I reckon if I could ever get my V1 to actually finish recursing 10,000+ rows, I might have a good chance of matching the rowcount, at least!!
Pseudo-recursion is my name for a general purpose method of forcing SQL server to insert a spool at almost any point in a execution plan.This method is quite useful for these challenges but in real life using a temporary table is almost always better.
Lets look at an example
Use the following script to generate some test data.
if object_id('tempdb..#t') is not null drop table #t create table #t (dt datetime, value decimal(20,4)) declare @i int=0 set nocount on begin tran while @i<10000 begin insert into #t (dt, value) (select dateadd(second, rand()*1000000, '20110101'), rand()*10) set @i=@i+1 end commit
We now calculate a running sum using a few different methods:
-- triangular join select dt, value, sum(aval) as rsum from ( select b.dt, b.value, a.value as aval from #t a, #t b where a.dt<=b.dt ) t group by dt, value order by dt -- CPU time = 26582 ms, elapsed time = 27036 ms. -- recursion ; with cte1 as ( select dt, value, row_number() over (order by dt) as rn from #t ) , cte2 as ( select dt, value, value as rsum, rn from cte1 where rn=1 union all select n.dt, n.value, cast(p.rsum+n.value as decimal(20,4)) as rsum, n.rn from cte2 p, cte1 n where p.rn+1 = n.rn ) select * from cte2 option (maxrecursion 0) -- CPU time = 78999 ms, elapsed time = 80245 ms. -- Now add some pseudo-recursion to the recursive solution ; with cte1 as ( select dt, value, row_number() over (order by dt) as rn from #t union all -- pseudo-recursion. This query should not return any rows select * from cte1 where rn<0 ) , cte2 as ( select dt, value, value as rsum, rn from cte1 where rn=1 union all select n.dt, n.value, cast(p.rsum+n.value as decimal(20,4)) as rsum, n.rn from cte2 p, cte1 n where p.rn+1 = n.rn ) select * from cte2 option (maxrecursion 0) -- CPU time = 327 ms, elapsed time = 410 ms.
As you can see pseudo-recursion can in some cases increase performance quite dramatically.
To see what is happening we can look at the query plan for the recursive query and the query with pseudo-recursion:
This query executes the sort to generate row numbers for each iteration in the recursion.
This query first uses pseudo-recursion to store the result of cte1 in a index spool. This means that the sort is perfored only once. The data is then moved to an index spool which is used to find the next row using the join condition in cte2.
The performance increase in this case is dramatic!
Stefan_G: Pseudo-recursion is my name for a general purpose method of forcing SQL server to insert a spool at almost any point in a execution plan.
Pseudo-recursion is my name for a general purpose method of forcing SQL server to insert a spool at almost any point in a execution plan.
Thanks for that last post - it is a very interesting technique - but I figure the hardest part is understanding when it will help ...
MisterMagoo: Stefan_G:Good job!Impressive that you manage to do this with only three scans of the original table. Thanks - I must admit it was a shock to me too! I had to run it again and again before I believed it!The sudden change from 8 seconds to 300ms came about from 1 change in the code that I kind of felt might help, but I had no idea it would help this much.I have since added a little bit to the reads on the worktable as I noticed that chains that consisted of one row of "no change" data (e.g. 77=>77) were not being picked up , but the change to the stats was so small not to bother posting.I might have a play with pseudo-recursion to see if that can help at all with this one...
Stefan_G:Good job!Impressive that you manage to do this with only three scans of the original table.
Congratulations Leszek - that was a very close race in the end!
MisterMagoo: Congratulations Leszek - that was a very close race in the end!
Thanks, and yes, differences are minimal.
I just received a mail that my solution is not accepted because it timed out.
Strange, because when I ran the load test, I have following results :
SQL Server parse and compile time:
CPU time = 47 ms, elapsed time = 53 ms.
(296 row(s) affected)
Table 'Worktable'. Scan count 59485, logical reads 754060, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TC47'. Scan count 15, logical reads 1560, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 4632 ms, elapsed time = 2107 ms.