-
I guess I don't understand why people value doing something "using a single query" so much. And, without a bit of documentation in the code, most people won't recognize what is being done.
commented on Jun 11 2012 7:14AM
|
-
Its a performance's matter. If you don't care about making 2 somewhat equivalent queries run in one transaction, you'd better have spare hardware resources.
commented on Jun 11 2012 7:51AM
|
-
If you insert to the temp table, then select from it, I can't really see this being a huge resource hog. Even if the select is from a real table there just can't be that much performance improvement with the MERGE. The server has to put the rows into the table variable either way. If it doesn't spill into tempdb it will all be in memory, anyway. If it does spill, you would have that issue. And if you really need all that data in a table variable it's because you're going to do something with it--most likely some kind of row-by-row processing or further inserting, which will necessitate reads from tempdb anyway.
Last, optimizing before you know there's a performance problem can be wasteful. Using MERGE in this way is clever, but it's not a perfect home run. Have you considered all the bugs that have been revealed with MERGE, especially in conjunction with parallelism? MERGE also is subject to race conditions that can yield blocking, deadlocking, or incorrect values (you still have to use appropriate locks when doing things like Max() from a table).
As I understand it, code should be, in this order:
- Correct: it must do the right thing
- Clear: it should be easy to understand (without sacrificing correctness)
- Concise: it should be a short as possible (as long as it is still clear and correct)
- Fast
Fast comes last. Sacrificing the other 3 to fast comes only after proven real-world system considerations force it (making a device driver able to perform real-time, a tight loop GUI routine able to paint fast enough, etc.).
commented on Jun 11 2012 11:41AM
|
-
Its a performance's matter. If you don't care about making 2 somewhat equivalent queries run in one transaction, you'd better have spare hardware resources.
Ok... you said it... now prove it. ;-) How is it a "performance's [sic] matter"??? I've taken many a long running query (up to several hours long) and have reduced them to 3 second to 10 minute queries simply by dividing up such "all in one queries" into smaller queries that use interim tables in Temp DB. One such example was a dupe check across 63 databases... they really want to do the check across 94 database but couldn't because the code took 10 to 24 hours to run and it usually failed. Using "Divide,'n'Conquer" methods, I got it down to 11 minutes and it hasn't failed yet.
commented on Jun 11 2012 5:51PM
|
-
@Jeff, for what it's worth, I'm deeply disappointed with SQL Server's inability to (reliably? at all?) reuse the data from a CTE when it is referenced multiple times. I have worked with Oracle not much, but have done a couple of complicated queries using CTEs with multiple references. The execution plan only had the CTE once, proving that Oracle, at least, does know how to reuse data from a CTE, whereas SQL Server would have had it once for each reference.
I mention this not to disagree with you on the utility of using temp tables in SQL Server, but simply to point out that at least some of the need to do so is actually a query engine limitation. I had to convert a CTE into a temp table just last week because the execution plan showed it doing the same work 3 times--but then I had serious problems with my SSIS package trying to get metadata because it uses SET FMTONLY ON, plus then the column data types had changed. It seems like sometimes you just can't win!
commented on Jun 11 2012 11:05PM
|
-
I haven't heard of MERGE being buggy. Do you have any online resources that discuss the matter ?
Anyway about the performance improvement consider this scenario:
You have two tables, a "parents" table and a "children" table where (as usual) a "parent" can have any number of "children".
You need to retrieve a few "parents" according to a very complex filtering rules (which is time consuming and requires joins with several other table).
And you also need to retrieve every "children" of those parents.
Now you can make it like this:
TRANSACTION
SELECT Parents + Very complex filtering (slow + many locks)
SELECT Children + JOIN Parents + Same very complex filtering (slow + many locks*)
COMMIT
- Way to raise lock escalation and deadlock situations.
Or you can make it like this:
TRANSACTION
MERGE : SELECT Parents & STORE parentID + Very complex filtering (slow + many locks)
SELECT Children + Filter on parentID (very fast + lock on "Children" only)
COMMIT
Hope, you'll find that relevant.
commented on Jun 12 2012 2:51AM
|
-
Joining on in on the discussion I too have found that splitting up large, complex queries into smaller, simplified building blocks makes for better code.
Better being (IMHO) :
- easier to understand for the person who has to maintain it
- apparently easier to understand for the Query Optimizer as in most cases (99%) these things run A LOT faster than their original counterparts.
An additional benefit being that you can report progress in-between the different steps.
There seems to be this 'misunderstanding' with the new generation of programmers that 'short code' means 'fast code'. In SQL this translates to people thinking that when they do a SELECT * FROM [someview], this MUST be fast, after all it's just 25-something characters long ! To get there they'll join gazillions of tables and views upon views until nobody really knows any more what's coming from where. Same with CTE's although there at least you see how things grow big quickly. In c# I've noticed people doing stuff like Dictionary.AddRange( [somedataset].ToArray() ).Contains([...]) (++) etc. It looks so innocent and yet gargantuan amounts of IO are hidden in those few 'words' ! Sure, in Test these things just work, but when they go to production everybody seems to be surprised it takes several minutes of 100% CPU to go through the thousands of entries in the starting collection.. Often I wished people would be forced to learn machine-code again before they can touch 4th generation languages... some basic understanding on how a CPU works wouldn't hurt.
(++: from the top of my head, I don't write c# often but am 'allowed' to suggest improvements here and there)
PS: I gave up on the TSQL challenges because I really can't see the benefit of me learning how to tweak stuff within the restriction that it needs to be a single query. Looking at some of the solutions there I can appreciate their ingenuity but most of them are "Rube Goldberg queries" IMHO.
commented on Jun 12 2012 5:56AM
|
-
Sergejack, here are a few of the bugs with MERGE. Some may be worked out.
Looking at your example, couldn't one just OUTPUT the ParentIDs to a table variable/temp table without needing to use MERGE at all? Then join to this for the children?
commented on Jun 12 2012 12:09PM
|
-
I haven't heard of MERGE being buggy.
Do you have any online resources that
discuss the matter ?
Anyway about the performance
improvement consider this scenario:
You have two tables, a "parents" table
and a "children" table where (as
usual) a "parent" can have any number
of "children". You need to retrieve a
few "parents" according to a very
complex filtering rules (which is time
consuming and requires joins with
several other table). And you also
need to retrieve every "children" of
those parents.
Now you can make it like this:
TRANSACTION
SELECT Parents + Very complex
filtering (slow + many locks)
SELECT Children + JOIN Parents + Same
very complex filtering (slow + many
locks*)
COMMIT
Way to raise lock escalation and
deadlock situations. Or you can make
it like this:
TRANSACTION
MERGE : SELECT Parents & STORE
parentID + Very complex filtering
(slow + many locks)
SELECT Children + Filter on parentID
(very fast + lock on "Children" only)
COMMIT
Hope, you'll find that relevant.
Apologies but that's no proof. That just you saying the same thing again without proof.
Let's do a little test. You write some "single query" code to a create a "Materialized Binary Path" and I'll do the same thing using a WHILE Loop (just to make it really interesting). If you're interested, I'll even provide the code to generate a million row "DAG" Hierarchy for you to play with. Then we'll compare and see which is faster and which uses more resources.
I have many more examples that we can test but one should be good for a start.
commented on Jun 12 2012 7:24PM
|
-
To clarify, the single query should convert an "Adjacency List" (using EmployeeID and ManagerID as the two concerning columns) to a binary "Materialized Path". To make life simple, the original IDs should be INT. The materialized path should be the INT converted to BINARY(4) and concatenated into a VARBINARY(8000).
Like I said, if you're game, I'll be happy to provide the code to build the "DAG" Hierarchy.
commented on Jun 12 2012 7:30PM
|
-
Jeff, I won't discuss that matter here, as it's not related to the tips.
Just know that I also have taken long running queries (taking hours) to reduce their duration to a few milliseconds (using indexes, and something hints).
commented on Jun 13 2012 2:28AM
|
-
@ErikEckhardt
Looking at your example, couldn't one just OUTPUT the ParentIDs to a table variable/temp table without needing to use MERGE at all? Then join to this for the children?
As you also need to retrieve the "parents' row" this would result in 3 queries.
- STORE parentID (INSERT)
- SELECT Parents (using parentID)
- SELECT Children (using parentID)
You could say "it's not that bad", but what if you don't only want "parents" then "children" but "parents + extra data" then "children"?
You could still store parentID + extra Data but that would lead to even more disk's writings.
My examples were quite simple on purpose but don't let that make you think this pattern isn't useful ;-)
commented on Jun 13 2012 5:08AM
|
-
Jeff, I won't discuss that matter
here, as it's not related to the tips.
To heck they're not. You brought up that doing things all in one query is better than breaking them up.
If you don't want to participate, that's fine. Just don't put something out there and not expect people to come back on you if they think you're wrong. ;-)
commented on Jun 24 2012 5:23PM
|
-
If my tip was "How to change your desktop wall paper" you would be asking "Why? Is the default not good enough?".
Explaining you what benefits there are in using a fewer number of statements and avoiding temporary table sounds as tiresome as explaining you why some people prefer to see a puppy instead of the sea when then turn on their computer.
commented on Jun 25 2012 2:17AM
|
-
Not a very good attitude, Sergejack. Fewer numbers of statements are patently not a pancea for performance nor is the avoidance of temporary tables. I issued a challenge to that notion. Either take the challenge or stop insisting it's the better way. Sometimes it is and sometimes it isn't.
commented on Jun 25 2012 6:47AM
|
-
You got me wrong, I don't disrespect your saying I'm just not willing to discuss it and I gave you my feelings as an invitation to check what may be said about it by people that actually wanted to discuss it.
commented on Jun 25 2012 7:02AM
|
-
Looking at your example, couldn't one just OUTPUT the ParentIDs to a table variable/temp table without needing to use MERGE at all? Then join to this for the children?
As you also need to retrieve the
"parents' row" this would result in 3
queries. - STORE parentID (INSERT) -
SELECT Parents (using parentID) -
SELECT Children (using parentID)
You could say "it's not that bad", but
what if you don't only want "parents"
then "children" but "parents + extra
data" then "children"? You could still
store parentID + extra Data but that
would lead to even more disk's
writings.
Then INSERT parents (all data) to a temp table, SELECT this data, and SELECT children by joining to the temp table. If the result set is small enough, it will fit in memory without using tempdb (whether a table variable or temp table). If the individual-statement method I'm suggesting has to spill to tempdb, I doubt you'll be getting the lazy-spool type of operation needed to avoid spilling to tempdb with the MERGE either.
I appreciate the tip you offered and like knowing how to use and abuse every feature of SQL Server. How else can I aspire to expertise in it but to know it inside-and-out this way? Yet, it was the performance claim that led me to object. I don't think it's good to sacrifice correctness, clarity, or conciseness for performance without dire need, and only after identifying the correct, clear, concise code as a true bottleneck or significant performance inhibitor.
Your query-choosing model seems to be about trying to spare the database from using unneeded resources (especially writing to disk). What I'm saying is that the MERGE statement may not in fact give the performance improvement you're looking for, or its performance improvement may be minuscule and thus not well-chosen for violating clarity. Have you considered that using the MERGE may in fact involve just as many reads and writes as the alternate ways I suggested? The number of SQL statements does not match up perfectly with the work performed, as you well know.
I would be intensely interested in your publishing a scenario showing the MERGE method giving substantially fewer reads than a more clear temp-table-insertion version.
commented on Jun 25 2012 12:50PM
|
-
Here you are.
SET NOCOUNT ON
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Part]') AND type in (N'U'))
DROP TABLE [dbo].[t388Part]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Article]') AND type in (N'U'))
DROP TABLE [dbo].[t388Article]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Blacklist]') AND type in (N'U'))
DROP TABLE [dbo].[t388Blacklist]
GO
SELECT
Src.name -- Non unique
INTO t388Blacklist
FROM (
VALUES('A%'), ('LAS%'), ('ERREM'), ('CORPORA'), ('AT'), ('SEA'), ('SIT'), ('PETENTIUM'), ('QUAERENDUM'), ('EU'), ('FASTIDII'), ('OFFICIIS'), ('MEI'), ('AN'), ('EST'), ('TE'), ('FABULAS'), ('PERICULIS'), ('EU'), ('TATION'), ('OFFICIIS'), ('HIS'), ('SIT'), ('AUTEM'), ('INVIDUNT'), ('UT')
) AS Src([name])
GO
SELECT DISTINCT
Src.name -- Unique
INTO t388Article
FROM (
VALUES('STET'), ('PORRO'), ('PUTENT'), ('NO'), ('PRO'), ('DICO'), ('DOLORES'), ('ACCOMMODARE'), ('NEC'), ('EU'), ('USU'), ('TE'), ('ZRIL'), ('SCRIPSERIT'), ('NECESSITATIBUS'), ('IUS'), ('SUMO'), ('QUIDAM'), ('FACILISIS'), ('EA'), ('IUS'), ('AT'), ('ATQUI'), ('REFERRENTUR'), ('EA'), ('VIRIS'), ('OCURRERET'), ('CONSTITUAM'), ('MEI'), ('PRI'), ('IN'), ('IRIURE'), ('RECTEQUE'), ('VOLUPTATUM'), ('EX'), ('DOLORE'), ('PUTENT'), ('VOLUPTATUM'), ('USU')
) AS Src([name])
CROSS APPLY (
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT))
) AS Random(Number)
GO
SELECT
A.name AS refArticle
, X.Number
INTO t388Part
FROM t388Article AS A
CROSS APPLY (
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT))
) AS Random(Number)
-- 1 to 5 parts
CROSS APPLY (
SELECT Src.Number
FROM ( VALUES(1),(2),(3),(4),(5) ) AS Src(Number)
WHERE Src.Number <= (Random.Number % 5) + 1
) AS X
GO
SET STATISTICS IO ON
PRINT '
[No op]
'
PRINT '[No op] Step 1: Articles blacklisted more than once'
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
PRINT('')
PRINT '[No op] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
WHERE P.refArticle IN (
SELECT
A.name
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
)
PRINT('')
GO
PRINT '
[insert]
'
PRINT '[insert] Step 1: Articles blacklisted more than once'
DECLARE @tmp TABLE (
[name] VARCHAR(MAX)
, NbBlacklist INT
, ExactMatch INT
)
INSERT @tmp
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
SELECT * FROM @tmp
PRINT('')
PRINT '[insert] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
INNER JOIN @tmp AS T ON (
P.refArticle = T.name
)
PRINT('')
GO
PRINT '
[Merge]
'
PRINT '[Merge] Step 1: Articles blacklisted more than once'
DECLARE @tmp TABLE (
[name] VARCHAR(MAX)
)
MERGE @tmp AS T
USING (
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
) AS Src
ON ( 1 = 2 )
WHEN NOT MATCHED THEN
INSERT ([name])
VALUES (Src.[name])
OUTPUT Src.*
;
PRINT('')
PRINT '[Merge] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
INNER JOIN @tmp AS T ON (
P.refArticle = T.name
)
PRINT('')
GO
commented on Jun 26 2012 3:17AM
|
-
Here you are.
SET NOCOUNT ON
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Part]') AND type in (N'U'))
DROP TABLE [dbo].[t388Part]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Article]') AND type in (N'U'))
DROP TABLE [dbo].[t388Article]
GO
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[t388Blacklist]') AND type in (N'U'))
DROP TABLE [dbo].[t388Blacklist]
GO
SELECT
Src.name -- Non unique
INTO t388Blacklist
FROM (
VALUES('A%'), ('LAS%'), ('ERREM'), ('CORPORA'), ('AT'), ('SEA'), ('SIT'), ('PETENTIUM'), ('QUAERENDUM'), ('EU'), ('FASTIDII'), ('OFFICIIS'), ('MEI'), ('AN'), ('EST'), ('TE'), ('FABULAS'), ('PERICULIS'), ('EU'), ('TATION'), ('OFFICIIS'), ('HIS'), ('SIT'), ('AUTEM'), ('INVIDUNT'), ('UT')
) AS Src([name])
GO
SELECT DISTINCT
Src.name -- Unique
INTO t388Article
FROM (
VALUES('STET'), ('PORRO'), ('PUTENT'), ('NO'), ('PRO'), ('DICO'), ('DOLORES'), ('ACCOMMODARE'), ('NEC'), ('EU'), ('USU'), ('TE'), ('ZRIL'), ('SCRIPSERIT'), ('NECESSITATIBUS'), ('IUS'), ('SUMO'), ('QUIDAM'), ('FACILISIS'), ('EA'), ('IUS'), ('AT'), ('ATQUI'), ('REFERRENTUR'), ('EA'), ('VIRIS'), ('OCURRERET'), ('CONSTITUAM'), ('MEI'), ('PRI'), ('IN'), ('IRIURE'), ('RECTEQUE'), ('VOLUPTATUM'), ('EX'), ('DOLORE'), ('PUTENT'), ('VOLUPTATUM'), ('USU')
) AS Src([name])
CROSS APPLY (
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT))
) AS Random(Number)
GO
SELECT
A.name AS refArticle
, X.Number
INTO t388Part
FROM t388Article AS A
CROSS APPLY (
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT))
) AS Random(Number)
-- 1 to 5 parts
CROSS APPLY (
SELECT Src.Number
FROM ( VALUES(1),(2),(3),(4),(5) ) AS Src(Number)
WHERE Src.Number <= (Random.Number % 5) + 1
) AS X
GO
SET STATISTICS IO ON
PRINT '
[No op]
'
PRINT '[No op] Step 1: Articles blacklisted more than once'
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
PRINT('')
PRINT '[No op] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
WHERE P.refArticle IN (
SELECT
A.name
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
)
PRINT('')
GO
PRINT '
[insert]
'
PRINT '[insert] Step 1: Articles blacklisted more than once'
DECLARE @tmp TABLE (
[name] VARCHAR(MAX)
, NbBlacklist INT
, ExactMatch INT
)
INSERT @tmp
OUTPUT inserted.*
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
PRINT('')
PRINT '[insert] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
INNER JOIN @tmp AS T ON (
P.refArticle = T.name
)
PRINT('')
GO
PRINT '
[Merge]
'
PRINT '[Merge] Step 1: Articles blacklisted more than once'
DECLARE @tmp TABLE (
[name] VARCHAR(MAX)
)
MERGE @tmp AS T
USING (
SELECT
A.name, COUNT(*) AS NbBlacklist, SUM(CASE WHEN A.name = BL.name THEN 1 ELSE 0 END) AS ExactMatch
FROM t388Article AS A
INNER JOIN t388Blacklist AS BL ON (
A.name LIKE BL.name
)
GROUP BY A.name
HAVING COUNT(*) > 1
) AS Src
ON ( 1 = 2 )
WHEN NOT MATCHED THEN
INSERT ([name])
VALUES (Src.[name])
OUTPUT Src.*
;
PRINT('')
PRINT '[Merge] Step 2: Related parts'
SELECT
P.refArticle
, P.Number
FROM t388Part AS P
INNER JOIN @tmp AS T ON (
P.refArticle = T.name
)
PRINT('')
GO
You will notice insert and merge performs the same, but you know it wouldn't if the temporary/variable table were filled enough to be written on the disks.
Note: I didn't found the submit button on the edit page :S
commented on Jun 26 2012 3:23AM
|
-
I modified your script to end up with the following row counts:
- t388article: 340,000
- t388part: 1,019,564
- t388BlackList: 26 -- all with '%' afterward to match my unique article names with numbers appended.
Here are the results:
Insert
Insert Step 1: Articles blacklisted more than once
- Table '#540C7B00'. Scan count 0, logical reads 30103, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 'Worktable'. Scan count 1, logical reads 920229, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Article'. Scan count 1, logical reads 942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Blacklist'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Insert Step 2: Related parts
- Table '#540C7B00'. Scan count 1, logical reads 104, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Part'. Scan count 3, logical reads 3326, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Merge
Merge Step 1: Articles blacklisted more than once
- Table '#55F4C372'. Scan count 0, logical reads 30073, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 'Worktable'. Scan count 1, logical reads 920229, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Article'. Scan count 1, logical reads 942, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Blacklist'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Merge Step 2: Related parts
- Table '#55F4C372'. Scan count 1, logical reads 74, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
- Table 't388Part'. Scan count 3, logical reads 3326, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Summary
- Insert: 954,706 reads
- Merge: 954,646 reads -- 60 fewer
That's 0.00628% fewer reads. If I did my testing wrong, please let me know. But so far, I don't think the obscurity of the MERGE method is worth this tiny improvement. Feel free to add indexes, or restructure the tables, anything you like. Please forgive me reformatting the code, it helps me see it when it's in my usual style.
SET NOCOUNT ON
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
IF Object_ID('dbo.t388Part', 'U') IS NOT NULL DROP TABLE dbo.t388Part;
IF Object_ID('dbo.t388Article', 'U') IS NOT NULL DROP TABLE dbo.t388Article;
IF Object_ID('dbo.t388Blacklist', 'U') IS NOT NULL DROP TABLE dbo.t388Blacklist;
GO
SELECT
Src.Name -- Non unique
INTO t388Blacklist
FROM (
VALUES('A%'), ('LAS%'), ('ERREM%'), ('CORPORA%'), ('AT%'), ('SEA%'), ('SIT%'), ('PETENTIUM%'), ('QUAERENDUM%'), ('EU%'), ('FASTIDII%'), ('OFFICIIS%'), ('MEI%'), ('AN%'), ('EST%'), ('TE%'), ('FABULAS%'), ('PERICULIS%'), ('EU%'), ('TATION%'), ('OFFICIIS%'), ('HIS%'), ('SIT%'), ('AUTEM%'), ('INVIDUNT%'), ('UT%')
) Src(Name);
GO
WITH A (N) AS (SELECT 1 FROM (VALUES (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)) X (N)),
B (N) AS (SELECT 1 FROM A A1, A A2, A A3),
C (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM B B1, B B2, B B3),
Nums AS (SELECT N FROM C WHERE N <= 10000)
SELECT DISTINCT
Src.Name + Convert(varchar(11), Nums.N) Name -- Unique
INTO t388Article
FROM
(
VALUES('STET'), ('PORRO'), ('PUTENT'), ('NO'), ('PRO'), ('DICO'), ('DOLORES'), ('ACCOMMODARE'), ('NEC'), ('EU'), ('USU'), ('TE'), ('ZRIL'), ('SCRIPSERIT'), ('NECESSITATIBUS'), ('IUS'), ('SUMO'), ('QUIDAM'), ('FACILISIS'), ('EA'), ('IUS'), ('AT'), ('ATQUI'), ('REFERRENTUR'), ('EA'), ('VIRIS'), ('OCURRERET'), ('CONSTITUAM'), ('MEI'), ('PRI'), ('IN'), ('IRIURE'), ('RECTEQUE'), ('VOLUPTATUM'), ('EX'), ('DOLORE'), ('PUTENT'), ('VOLUPTATUM'), ('USU')
) Src(Name)
CROSS JOIN Nums
GO
SELECT
A.Name refArticle,
X.Number
INTO t388Part
FROM
t388Article A
CROSS APPLY (
SELECT ABS(CAST(CAST(NEWID() AS VARBINARY) AS BIGINT))
) Random (Number)
-- 1 to 5 parts
CROSS APPLY (
SELECT Src.Number
FROM (VALUES (1),(2),(3),(4),(5)) Src (Number)
WHERE Src.Number <= (Random.Number % 5) + 1
) X
GO
SET STATISTICS IO ON;
PRINT '
No op
';
PRINT 'No op Step 1: Articles blacklisted more than once'
SELECT
A.Name,
Count(*) NbBlacklist,
Sum(CASE WHEN A.Name = BL.Name THEN 1 ELSE 0 END) ExactMatch
FROM
t388Article A
INNER JOIN t388Blacklist BL
ON A.Name LIKE BL.Name
GROUP BY A.Name
HAVING Count(*) >= 2;
PRINT '';
PRINT 'No op Step 2: Related parts'
SELECT
P.refArticle,
P.Number
FROM
t388Part P
WHERE
EXISTS (
SELECT *
FROM
t388Article A
INNER JOIN t388Blacklist BL
ON A.Name LIKE BL.Name
WHERE
P.refArticle = A.Name
GROUP BY A.Name
HAVING Count(*) >= 2
);
PRINT '';
GO
PRINT '
insert
';
PRINT 'insert Step 1: Articles blacklisted more than once';
DECLARE @tmp TABLE (
Name varchar(Max),
NbBlacklist int,
ExactMatch int
);
INSERT @tmp
OUTPUT inserted.*
SELECT
A.Name,
Count(*) NbBlacklist,
Sum(CASE WHEN A.Name = BL.Name THEN 1 ELSE 0 END) ExactMatch
FROM
t388Article A
INNER JOIN t388Blacklist BL
ON A.Name LIKE BL.Name
GROUP BY A.Name
HAVING Count(*) >= 2;
PRINT '';
PRINT 'insert Step 2: Related parts';
SELECT
P.refArticle,
P.Number
FROM
t388Part P
INNER JOIN @tmp T
ON P.refArticle = T.Name;
PRINT '';
GO
PRINT '
Merge
';
PRINT 'Merge Step 1: Articles blacklisted more than once';
DECLARE @tmp TABLE (
Name varchar(MAX)
);
MERGE @tmp T
USING (
SELECT
A.Name,
Count(*) NbBlacklist,
Sum(CASE WHEN A.Name = BL.Name THEN 1 ELSE 0 END) ExactMatch
FROM
t388Article A
INNER JOIN t388Blacklist BL
ON A.Name LIKE BL.Name
GROUP BY A.Name
HAVING Count(*) >= 2
) Src
ON 1 = 2
WHEN NOT MATCHED THEN
INSERT (Name)
VALUES (Src.Name)
OUTPUT Src.*;
PRINT '';
PRINT 'Merge Step 2: Related parts'
SELECT
P.refArticle,
P.Number
FROM
t388Part P
INNER JOIN @tmp T
ON P.refArticle = T.Name;
PRINT '';
GO
select count(*) from t388part
select count(*) from t388article
select count(*) from t388Blacklist
commented on Jun 26 2012 1:30PM
|
|