Getting Started with Adobe After Effects - Part 6: Motion Blur
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.
Loading

1st Prize - Apple iPad


SQLServer Quiz 2011 - Do you know your execution plan - Two questions - One Answer

  • 1) Using Single table only Once in Single SELECT statement generate execution plan which have JOIN operator. Explain the reason for the same.

    2) Using Single table only Once in Single SELECT statement generate execution plan which have parallelism operator. Explain the reason for the same.

    Bonus: Create a single query which satisfy both of the above statement.

    Posted on 03-12-2011 00:00 |
    Pinal Dave
    146 · 1% · 326

4  Answers  

Subscribe to Notifications
  • Score
    10

    One example a JOIN operator is introduced is when the query requires more than one reference to the source data.

    In such scenarios usually the optimizer "decides" to benefit from 'caching' data as intermediate results (for instance Eager or Lazy Table Spool) by creating a table in tempdb and referring to that table during further processing instead of reading the data from disk again.

    Another reason is to store the original data when performing an UPDATE to protect against the "Halloween Problem" - but that's not related to the original question...

    A typical scenario leading to a JOIN operator and Table Spool operator is an aggregation including the OVER() clause.

    Here's an example: We'll create a table with 100 rows including duplicates:

    SELECT  TOP (100)
            ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 11 as data 
    INTO Test
    FROM    master.sys.columns C1;
    

    The following query will return each row of the table along with a separate column holding the number of occurence of this value in the DATA column:

    SELECT  DATA, COUNT  (DATA) OVER(PARTITION BY DATA%10) as number
    FROM    dbo.Test
    

    Here's the execution plan: Execution Plan with Joins

    The Table spool generated in the upper data stream hold the values of the DATA column together with the result of the Compute Scalar operator (that calculated the value of DATA%10 for each row). The data are ordered by [DATA%10] to make the COUNT() operation more efficient.

    The second data stream is used to count the number of rows per [DATA%10]. It's basically an internal SELECT COUNT(*) FROM dbo.Test GROUP BY (DATA%10). Those result sets then are joined to assign the value of the aggregation stream to the original data.

    This example did not include the Parallelism operator. Parallelism will be introduced if the query optimizer determines that there is a operation in the query with a cost higher than the value specified in "cost threshold for parallelism". In this case the load to perform the operation will be split between the available processors (whereas "available" does not only mean physically present but also involves avaialble threads and other factors).

    In order to visualize the effect we replace the Test table with one holding more rows (on my home PC parallelism kicked in at 100K rows but one might have to use a larger number depending on the system).

    DROP    TABLE dbo.Test;
    
    SELECT  TOP (1000000)
            ROW_NUMBER() OVER (ORDER BY (SELECT 0)) / 1179 as data 
    INTO Test
    FROM    master.sys.columns C1,
            master.sys.columns C2,
            master.sys.columns C3;
    

    The basic execution plan is similar, except for the additional parallelism object:

    Exceution plan with Parallelism

    There are scenarios where the additional effort to divide the operations and merge them back together adds more time than parallelism will gain. In such a scenario there are several options:

    1. if the slow down is realted to specific queries, it might help to reduce the number of allowed parallel processes by adding OPTION (MAXDOP "x"), wherreas "x" is the number of parallel processes. A value of 1 will lead to the same execution plan as before, a value of 0 allows to use all processes available.
    2. if parallel processing is used constantly "too early" or "too late", it might help to change the value of "cost threshold for parallelism" using sp_configure.
    3. if the degree of parallelism neds to be changed for all queries, the value can be changed using sp_configure 'max degree of parallelism'. This option should be used with caution since it will also influence maintenance tasks, such as reindexing.

    Analyzing the execution plan is more an art than it is a mystery. One of the beste resources when it comes to understanding execution plans is Grant Fritcheys book "Dissecting SQL Server Execution Plans".

    Replied on Mar 12 2011 6:47AM  . 
    lmu92
    0 · 0% · 0
  • Score
    10

    1) SELECT statement that has OVER() with Aggregate Functions will generate execution plan which has JOIN operator

    Use AdventureWorks
    Go 
    
    Select ProductID,
    SUM(ActualCost) OVER(Partition By TransactionID) As Total
    From Production.TransactionHistory
    

    It is similar to joining an aggregated copy of a SELECT to itself. The sql statement above is similar to:

    Select T.ProductID, T.ActualCost from Production.TransactionHistory T
    INNER JOIN
    (
        select ProductID, SUM(ActualCost) as Total from Production.TransactionHistory Group By ProductID
    ) H
    on T.ProductID=H.ProductID
    

    2) Using Single table only Once in Single SELECT statement generate execution plan which have parallelism operator.

    First of all - it is up to SQL Server to use parrallelism. You may prevent SQL Server from a parallel processing of single query with advanced configuration, but you can not force SQL Server to use more, then one processor for the execution of single query. Only SQL Server can decide how to take advantage of the available CPUs.

    So, if Maximum Degree of Parrallelism is not set to 1 (default 0 - use all available processors), and query execution time is more, then Cost Threshold for parallelism (default 5 sec), then the Query Optimizer will consider using parallelism for executing a query. The use of parallelism has its own overhead, and the time savings of using parallelism must exceed the overhead incurred before the Query Optimizer will use it. In a parallel query execution plan, the insert, update, and delete operators are executed serially. However, the WHERE clause of an UPDATE or a DELETE statement, or the SELECT part of an INSERT statement may be executed in parallel. The actual data changes are then serially applied to the database.

    So first assumption - we have configuration that allows parallel execution.

    We can compare very similar queries:

    Use AdventureWorks
    Go 
    --most probably will not trigger parallelism 
    Select  *,
    ROW_NUMBER() OVER(
        order by LocationID
    )  From Production.Location
    
    --probably will trigger parallelism
    Select *,
    ROW_NUMBER() OVER(
        order by productID
    ) 
    From production.TransactionHistory
    

    Difference - quantity of rows to be processed (needed processing time), first query has 14 rows, second query has 113443 rows. When the Query Optimizer evaluates a query for parallelism, it must consider many factors, such as the current load on the server, the nature of the query, I/O requirements, and so on.

    3. Bonus

    Select ProductID,
    SUM(ActualCost) OVER (Partition By TransactionID) As Total,
    ROW_NUMBER() OVER(order by productID) 
    From production.TransactionHistory42
    
    Replied on Mar 13 2011 12:27AM  . 
    Igor Zakharov
    158 · 1% · 300
  • Score
    10

    Hi Pinal,

    There are many cases when QO can create an execution plan that access the table using a join operator, and using parallelism. To show different Scenarios of our friends (who already showed the Spool) I’ll show the index intersection.

    It is used when SQL can join the result of two indexes to return the query.

    An interesting sample could be the following query using AdventureWorksDWR2:

    SELECT [ProductAlternateKey], [ProductSubcategoryKey]
      FROM DimProduct
     WHERE [ProductAlternateKey] = 'BA-8327'
        OR [ProductSubcategoryKey] = 12
    OPTION (RECOMPILE)
    

    The query is using the indexes by ProductAlternateKey and ProductSubcategoryKey on the table DimProduct to read all the data, joining the results using the Hash Join and then it applies the filter on the where clause.

    alt text

    Now to simulate a parallel plan, we can do some good tricks :-). First you can fool SQL about the number of CPUs that you have. Second you can fool SQL about the size of your table.

    To fool SQL about the number of CPUs you can start the SQL using the parameter –P16.

    SQL Server Configuration Manager:

    alt text

    Error Log file after restart SQL Server:

    alt text

    Alternatively you can check how many CPUs do you have using the DMV sys.dmossys_info.

    SELECT cpucount FROM sys.dmossysinfo

    cpu_count

    16

    And to fool SQL Server about the size of the table, we could change the statistics of the clustered index of the table. For that, we can use the UPDATE STATISTICS and specify the RowCount and PageCount options.

    UPDATE STATISTICS DimProduct [PK_DimProduct_ProductKey] WITH RowCount = 500000, PageCount = 100000
    

    After run the update statistics our query starts to use parallelism to perform the reads and join using all CPUs in parallel.

    SELECT [ProductAlternateKey], [ProductSubcategoryKey]
      FROM DimProduct
     WHERE [ProductAlternateKey] = 'BA-8327'
        OR [ProductSubcategoryKey] = 12
    OPTION (RECOMPILE)
    

    alt text

    Replied on Mar 23 2011 4:15AM  . 
    Fabiano Amorim
    839 · 0% · 34
  • Score
    10

    Query Plan:

    Query plan is a read-only data structure with set of commands that specify all the physical operations required to execute a particular query. Among all the operations, join operator and parallelism operator are the point of interest here. For a single query, two copies of query plan can be used and they are serial plan and the parallel plan.

    Join Operator:

    Using Single table only Once in Single SELECT statement generate execution plan which have JOIN operator. Explain the reason for the same.

    Nested loops joins perform a search on the inner-table for each row of the outer table, typically using an index. SQL Server engine decides based on anticipated costs, whether to sort the outer input in order to improve locality of the searches on the index over the inner input. The most obvious case for nested loops is when one input is very small and the other input is ideally indexed for the join. Join operator is used when a query requires more references to the source data set. To answer this question, I have created a test table and insert 600 records into this table.

    ---Quiz_Test_Bonus Table 
    create table Quiz_Test_Bonus (ID int identity (1,1) primary key, ProdID int, ItemCost money)
    go
    
    declare @i int
    declare @pID int
    declare @cost money
    set @pID =0
    set @cost = 25
    set @i=0;
    while @i<600
    begin
    Begin tran
    INSERT INTO Quiz_Test_Bonus (ProdID,ItemCost) values (@pID+1,@cost+25)
    select @i=@i+1
    select @pID = @pID+1
    select @cost = @cost+25
    commit tran
    end
    


    In this example, there 600 records and the following statement will create an execution plan that has join operator, along with a table spool and lazy spool. The most interesting feature of SQL Server Optimizer is, it caches data in the intermediate results which is a Table Spool or Lazy Spool. Lazy spool reads data only when individual rows are required. This creates a temp table and builds this table in a lazy manner, which means; it reads and stores data in a temp table only when the parent operator asks for a row. In the following example, the Table Spool holds the values of ProdID column of the Compute Scalar. The second stream aggregate is doing an average of ItemCost. A spool reads the data and saves it on TempDB. This process is used whenever the SQL Optimizer knows that the density of the column is high and the intermediate result is very complex to calculate. If this is the case, SQL makes the computation once, and stores the result in the temporary space so it can search it later.

    The Nested Loop operator joins the first and second parts of the execution plan as shown below. As we now, the nested loops scan a table and join it with another table one row at time. When the aggregation is done, the Spool operator is called again, and it in turn calls the Segment operator, which reads another segment of rows. This cycle repeats until all rows are read. It will create a temporary table in the TempDB database, and store all data returned from the Segment operator. The output of the Spool operator is just all data stored in the TempDB table.

    Following query produces join operator in execution plan:

    --First Query 
    Select *, 
    avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost
    From dbo.Quiz_Test_Bonus
    

    Here is the execution plan showing join operator: Join Operator Plan

    Parallelism Operator:

    Using Single table only Once in Single SELECT statement generate execution plan which have parallelism operator. Explain the reason for the same.

    The Parallelism operator performs the distribute streams, gather streams, and repartition streams logical operations. Sometimes a query can be so complex that the SQL Optimizer needs to extensively iterate through the optimization phases. While optimizing the query, if it finds that the cost of the processing strategy is more than the cost threshold for parallelism, and then it evaluates the cost of the query using multiple CPUs. If this is not the case in processing the query, than the SQL optimizer proceeds with the serial plan for that query. Since parallel queries require more memory, the optimizer determines the amount of memory available before choosing the parallel plan. The amount of memory required increases with the degree of parallelism. Cost threshold for parallelism default values is five second. This is basically a cost estimated by the optimizer for the execution of a query. By adjusting the max degree of parallelism can also be beneficial if a system has multiple processors, and this is to limit the number of processors used by parallel query executions. In order to trigger the parallelism operator, records is incremented to 600000 as shown below:

    Let is truncate QuizTestBonus table and then insert 600000 records.

    truncate table  Quiz_Test_Bonus
    

    Again insert records in the table.

    declare @i int
    declare @pID int
    declare @cost money
    set @pID =0
    set @cost = 25
    set @i=0;
    while @i<600000
    begin
    Begin tran
    INSERT INTO Quiz_Test_Bonus (ProdID,ItemCost) values (@pID+1,@cost+25)
    select @i=@i+1
    select @pID = @pID+1
    select @cost = @cost+25
    commit tran
    end
    

    Select Statement that produces Parallelism operator in the execution plan

    ---Second Query 
    Select *, 
    avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost
    From dbo.Quiz_Test_Bonus
    

    Execution Plan with Parallelism Operator: Parallelism Operator Plan

    The main difference in these two select statements is the data. In the first query, we have 600 records and the select statement triggered Nested Loops (inner join), and the second query has 600000 records. Query optimizer should consider many factors in evaluating a query for parallelism. Some of the factors are load on the SQL Server, how the query is designed, I/O throughput, memory, processor, and other processes in the server.

    Bonus:

    Create a single query which satisfies both of the above statement.

    Query that produces both Parallelism and Join Operator. Actually, the second select statement with 600000 records will generate both operator but I will add ROW_NUMBER in the query, which is shown below:

    Select *, 
    avg(ItemCost) OVER(Partition By ProdID) As AvgOfItemCost,
    ROW_NUMBER() OVER( order by ProdID)
    From dbo.Quiz_Test_Bonus
    

    Execution plan showing both join operator and parallelism operator from a single query above: Bonus Execution Plan

    Thanks,

    Abi Chapagai

    Replied on Apr 3 2011 4:03PM  . 
    Abi Chapagai
    69 · 3% · 808

Your Answer


Sign Up or Login to post an answer.
Please note that your answer will not be considered as part of the contest because the competition is over. You can still post an answer to share your knowledge with the community.

Copyright © Rivera Informatic Private Ltd.