Query Plan:
Query plan is a read-only data structure with set of commands that specify all the physical operations required to execute a particular query. Among all the operations, join operator and parallelism operator are the point of interest here.
For a single query, two copies of query plan can be used and they are serial plan and the parallel plan.
Join Operator:
Using Single table only Once in Single SELECT statement generate execution plan which have JOIN operator. Explain the reason for the same.
Nested loops joins perform a search on the inner-table for each row of the outer table, typically using an index. SQL Server engine decides
based on anticipated costs, whether to sort the outer input in order to improve locality of the searches on the index over the inner input.
The most obvious case for nested loops is when one input is very small and the other input is ideally indexed for the join.
Join operator is used when a query requires more references to the source data set. To answer this question, I have created a test table
and insert 600 records into this table.
---Quiz_Test_Bonus Table
create table Quiz_Test_Bonus (ID int identity (1,1) primary key, ProdID int, ItemCost money)
go
declare @i int
declare @pID int
declare @cost money
set @pID =0
set @cost = 25
set @i=0;
while @i<600
begin
Begin tran
INSERT INTO Quiz_Test_Bonus (ProdID,ItemCost) values (@pID+1,@cost+25)
select @i=@i+1
select @pID = @pID+1
select @cost = @cost+25
commit tran
end
In this example, there 600 records and the following statement will create an execution plan that has join operator, along with a table
spool and lazy spool. The most interesting feature of SQL Server Optimizer is, it caches data in the intermediate results which is a
Table Spool or Lazy Spool. Lazy spool reads data only when individual rows are required. This creates a temp table and builds this table
in a lazy manner, which means; it reads and stores data in a temp table only when the parent operator asks for a row. In the following
example, the Table Spool holds the values of ProdID column of the Compute Scalar. The second stream aggregate is doing an average
of ItemCost.
A spool reads the data and saves it on TempDB. This process is used whenever the SQL Optimizer knows that the density of the column
is high and the intermediate result is very complex to calculate. If this is the case, SQL makes the computation once, and stores the result
in the temporary space so it can search it later.
The Nested Loop operator joins the first and second parts of the execution plan as shown below. As we now, the nested loops scan a
table and join it with another table one row at time. When the aggregation is done, the Spool operator is called again, and it in turn calls
the Segment operator, which reads another segment of rows. This cycle repeats until all rows are read. It will create a temporary table
in the TempDB database, and store all data returned from the Segment operator. The output of the Spool operator is just all data stored
in the TempDB table.
Following query produces join operator in execution plan:
--First Query
Select *,
avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost
From dbo.Quiz_Test_Bonus
Here is the execution plan showing join operator: Join Operator Plan
Parallelism Operator:
Using Single table only Once in Single SELECT statement generate execution plan which have parallelism operator. Explain the reason
for the same.
The Parallelism operator performs the distribute streams, gather streams, and repartition streams logical operations. Sometimes a query
can be so complex that the SQL Optimizer needs to extensively iterate through the optimization phases. While optimizing the query, if it finds
that the cost of the processing strategy is more than the cost threshold for parallelism, and then it evaluates the cost of the query using
multiple CPUs. If this is not the case in processing the query, than the SQL optimizer proceeds with the serial plan for that query.
Since parallel queries require more memory, the optimizer determines the amount of memory available before choosing the parallel plan.
The amount of memory required increases with the degree of parallelism. Cost threshold for parallelism default values is five second.
This is basically a cost estimated by the optimizer for the execution of a query. By adjusting the max degree of parallelism can also be
beneficial if a system has multiple processors, and this is to limit the number of processors used by parallel query executions. In order to
trigger the parallelism operator, records is incremented to 600000 as shown below:
Let is truncate QuizTestBonus table and then insert 600000 records.
truncate table Quiz_Test_Bonus
Again insert records in the table.
declare @i int
declare @pID int
declare @cost money
set @pID =0
set @cost = 25
set @i=0;
while @i<600000
begin
Begin tran
INSERT INTO Quiz_Test_Bonus (ProdID,ItemCost) values (@pID+1,@cost+25)
select @i=@i+1
select @pID = @pID+1
select @cost = @cost+25
commit tran
end
Select Statement that produces Parallelism operator in the execution plan
---Second Query
Select *,
avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost
From dbo.Quiz_Test_Bonus
Execution Plan with Parallelism Operator: Parallelism Operator Plan
The main difference in these two select statements is the data. In the first query, we have 600 records and the select statement triggered
Nested Loops (inner join), and the second query has 600000 records. Query optimizer should consider many factors in evaluating a query
for parallelism. Some of the factors are load on the SQL Server, how the query is designed, I/O throughput, memory, processor, and
other processes in the server.
Bonus:
Create a single query which satisfies both of the above statement.
Query that produces both Parallelism and Join Operator. Actually, the second select statement with 600000 records will generate both
operator but I will add ROW_NUMBER in the query, which is shown below:
Select *,
avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost,
ROW_NUMBER() OVER( order by ProdID)
From dbo.Quiz_Test_Bonus
Execution plan showing both join operator and parallelism operator from a single query above: Bonus Execution Plan
Thanks,
Abi Chapagai