Use this thread to discuss TSQL Challenge 43
Can there be garbage? (which is not the same as unknown elements)
For instance, will the first element always be BS?
No garbage.All BS elements will always be BS*segmentname.Invalid elements could have a length > 2.For example:RFK*XYZBSM*123DDDD*BEF*ABC.The first asterisk will always be there.
Any clue on the number of rows the load test will run with?
It's important to know beforehand because I'm going to balance I/O with CPU sonsumption but there could be some threesold were it would become overkill.
Sergejack:Any clue on the number of rows the load test will run with?It's important to know beforehand because I'm going to balance I/O with CPU sonsumption but there could be some threesold were it would become overkill.
Maybe I've overseen it, but how should the result be ordered? Is order by Sr correct?
Output should be sorted by the segment name, i.e. the name that appears after BS*.
Below is some simple tricky data, just to make sure we're all on the same wavelength. It should produce the following result:
truncate table tc43 insert into TC43select 'BS*T1SEG99*garbage' union all--------------------------------------------------select 'BS*T1SEG10*garbage' union allselect 'BE*' union all select 'BS*T1SEG09' union allselect 'SH*'+REPLICATE('garbage',200) union all select 'BS*T1SEG08' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BE*' union all select 'BS*T1SEG07' union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union all select 'BS*T1SEG06' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'BE*' union all--------------------------------------------------select 'BS*T1SEG20*garbage' union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'BE*' union all select 'BS*T1SEG19' union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union all select 'BS*T1SEG18' union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BE*' union all select 'BS*T1SEG17' union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BAD4*'+REPLICATE('garbage',200) union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union all select 'BS*T1SEG16' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'BAD5*'+REPLICATE('garbage',200) union allselect 'BAD4*'+REPLICATE('garbage',200) union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'BE*'--------------------------------------------------select * from TC43
insert into TC43select 'BS*T1SEG99*garbage' union all--------------------------------------------------select 'BS*T1SEG10*garbage' union allselect 'BE*' union all
select 'BS*T1SEG09' union allselect 'SH*'+REPLICATE('garbage',200) union all
select 'BS*T1SEG08' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BE*' union all
select 'BS*T1SEG07' union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union all
select 'BS*T1SEG06' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'BE*' union all--------------------------------------------------select 'BS*T1SEG20*garbage' union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'BE*' union all
select 'BS*T1SEG19' union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union all
select 'BS*T1SEG18' union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BE*' union all
select 'BS*T1SEG17' union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'SH*'+REPLICATE('garbage',200) union allselect 'BAD4*'+REPLICATE('garbage',200) union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union all
select 'BS*T1SEG16' union allselect 'NM*'+REPLICATE('garbage',200) union allselect 'RF*'+REPLICATE('garbage',200) union allselect 'BAD5*'+REPLICATE('garbage',200) union allselect 'BAD4*'+REPLICATE('garbage',200) union allselect 'BAD3*'+REPLICATE('garbage',200) union allselect 'BAD2*'+REPLICATE('garbage',200) union allselect 'BAD1*'+REPLICATE('garbage',200) union allselect 'BE*'--------------------------------------------------select * from TC43
dishdy,
thanks for the clarification - this make my third solution attempt :-)
Guenter
BTW: has anyone found a solution requiring less than one table scan per BS/BE-group? I think it should be possible but haven't found a better method to combine the 'missing' elements...
Guenther,I too have some performance problems. But I think it's probably due to my use of "for xml path('')" to pivot those bloody invalid elements.In any case, let's try this load testing script I included below. It has two variants, one with garbage data and one without garbage data (see the @garbage variable at the beginning). I get the following (horrible) statistics:
With garbage:(500 row(s) affected)Table 'Worktable'. Scan count 1004, logical reads 49186, physical reads 0, read-ahead reads 0, lob logical reads 560, lob physical reads 0, lob read-ahead reads 0.Table 'TC43'. Scan count 1514, logical reads 4286134, physical reads 1, read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 14671 ms, elapsed time = 14734 ms.Without garbage:(500 row(s) affected)Table 'Worktable'. Scan count 1004, logical reads 49186, physical reads 0, read-ahead reads 0, lob logical reads 560, lob physical reads 0, lob read-ahead reads 0.Table 'TC43'. Scan count 1514, logical reads 37850, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 11563 ms, elapsed time = 11562 ms.
truncate table tc43declare @maxsegs intdeclare @iseg intdeclare @ngood intdeclare @maxdups intdeclare @ndup intdeclare @i intdeclare @garbage varchar(2000)set @garbage=''set @garbage=REPLICATE('x',2000) -- comment this for the no-garbage version of the testset @maxsegs=500set @maxdups=15set @i=rand(1) -- fixed seed to make it repeatableset @iseg=@maxsegswhile @iseg>0 begin -- write start of sements insert into TC43 select 'BS*T2SEG'+right(convert(varchar,@iseg+1000000),5)+'*'+@garbage -- determine if we should generate no valid records or SH records or SH & NM records or SH & NM & RF records set @ngood=rand()*4 if @ngood>=1 begin set @ndup=rand()*@maxdups+1 set @i=0 while @i<@ndup begin insert into TC43 select 'RF*'+@garbage set @i=@i+1 end end if @ngood>=2 begin set @ndup=rand()*@maxdups+1 set @i=0 while @i<@ndup begin insert into TC43 select 'NM*'+@garbage set @i=@i+1 end end if @ngood>=3 begin set @ndup=rand()*@maxdups+1 set @i=0 while @i<@ndup begin insert into TC43 select 'SH*'+@garbage set @i=@i+1 end end -- in one forth of the segments generate a random number of invalid elements if rand()<=0.25 begin set @ndup=rand()*@maxdups+1 set @i=0 while @i<@ndup begin set @i=@i+1 insert into TC43 select 'BAD'+CONVERT(varchar,@ndup-@i+1)+'*'+@garbage insert into TC43 select 'BAD'+CONVERT(varchar,@ndup-@i+1)+'*'+@garbage end end -- for 4 out 5 segments generate a BE row set @i=RAND()*5 if @i>0 insert into TC43 select 'BE*'+@garbage set @iseg=@iseg-1 endselect * from TC43 order by Sr
@Dishdy, Nice load test script.
My first solution was much worse than yours so this inspired me to rewrite my solution completely.
I now get the following stats (with garbage)
(500 row(s) affected) Table 'Worktable'. Scan count 1123, logical reads 34778, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'TC43'. Scan count 129, logical reads 365199, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 920 ms, elapsed time = 1030 ms.
(500 row(s) affected)
Table 'Worktable'. Scan count 1123, logical reads 34778, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'TC43'. Scan count 129, logical reads 365199, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 920 ms, elapsed time = 1030 ms.
dishdy: All BS elements will always be BS*segmentname.
All BS elements will always be BS*segmentname.
I think this is slightly incorrect.
Quote from the challenge rules: The “BS” element may have additional information embedded into it (separated by asteriks). For example, “BS*AB12*SOMETHING ELSE*ETC”. In this example, “AB12’ should be identified as the segment name.
In other words: You cannot just assume that everything after BS* is the segmentname.
/SG
Stefan_G: dishdy: All BS elements will always be BS*segmentname.I think this is slightly incorrect.Quote from the challenge rules: The “BS” element may have additional information embedded into it (separated by asteriks). For example, “BS*AB12*SOMETHING ELSE*ETC”. In this example, “AB12’ should be identified as the segment name.In other words: You cannot just assume that everything after BS* is the segmentname./SG
dishdy: Stefan_G: dishdy: All BS elements will always be BS*segmentname.I think this is slightly incorrect.Quote from the challenge rules: The “BS” element may have additional information embedded into it (separated by asteriks). For example, “BS*AB12*SOMETHING ELSE*ETC”. In this example, “AB12’ should be identified as the segment name.In other words: You cannot just assume that everything after BS* is the segmentname./SG You're right. I stand corrected.
dishdy: On another topic, is anyone else using "for xml path('')" to pivot the bad elements into a comma separated list?I'm truly miffed by how badly it peforms inside a CTE (10000ms - yes, that's ten thousand) as opposed to outside a CTE (100ms - yes, that's one hundred) with the same data.Another thing that's surpising me is the effect of "options (force order)". Whenever that improves performance I rearrange tables in my queries until that effect goes away. Here it just doen't seem to go away. Has anyone every encountered this phenomena?
On another topic, is anyone else using "for xml path('')" to pivot the bad elements into a comma separated list?I'm truly miffed by how badly it peforms inside a CTE (10000ms - yes, that's ten thousand) as opposed to outside a CTE (100ms - yes, that's one hundred) with the same data.Another thing that's surpising me is the effect of "options (force order)". Whenever that improves performance I rearrange tables in my queries until that effect goes away. Here it just doen't seem to go away. Has anyone every encountered this phenomena?
Yes, I also use for xml path('').
Inside or outside a CTE should not matter. Can you show the two queries so we can take a look ?