Getting Started with Adobe After Effects - Part 6: Motion Blur
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.
Loading

1st Prize - Apple iPad


SQLServer Quiz 2011 - Are all Scalar User Defined Functions (UDF) always bad?

  • It is widely accepted that Scalar UDF (User Defined Functions) are really bad for performance. In most cases, the performance hit is very much noticeable if the UDF runs queries to read data from one or more tables.

    Most people create UDFs to encapsulate their business logic and reuse the code so that the logic remains in one place. Some of these rules may be very complex and others may be quite simple. The simple rules may not even need to read from any of the tables.

    Here is a UDF which performs a very simple calculation.

    CREATE FUNCTION GetExtPrice
    (
        @quantity INT,
        @rate MONEY
    )
    RETURNS MONEY 
    AS
    BEGIN
        DECLARE @price MONEY
        SELECT @price = @quantity * @rate 
        RETURN @price 
    END
    

    Look at the two queries given below. The first query uses the UDF to calculate the total price of items. The second query performs the calculation in-line and does not use the function.

    SELECT
        ItemNumber,
        dbo.GetExtPrice(Quantity, Rate) AS ExtPrice
    FROM SalesDetails 
    
    SELECT
        ItemNumber,
        Quantity * Rate AS ExtPrice
    FROM SalesDetails
    

    Is the first query bad? Explain and justify your answer.

    Posted on 03-02-2011 00:00 |
    Jacob Sebastian
    1 · 100% · 32235

16  Answers  

Subscribe to Notifications
Previous 1 | 2 Next
  • Score
    4
    • the optimizer can rewrite and optimize queries involving inline table valued UDFs. On the other hand, queries involving scalar UDFs are not rewritten by the optimizer – the execution of the first query includes one function call per row, which is very slow since it is having a scalar UDF there is nothing optimizer can do with it .
      • UDF often mean processing row by row rather than set-based and its reduce the performance.
      • yes, first query having a overhead on performance
    Replied on Mar 2 2011 1:30AM  . 
    indika saminda kannangara
    188 · 1% · 251
  • Score
    7

    Hello!

    Surprise! Contrary to popular belief, a single execution of the queries provided proves that there is a 50-50 split on the performance, i.e. both have the same performance impact when the actual execution plan is viewed.

    However, a simple modification, and the performance hit is seen. Here's the modification that results in a 52-48% performance split:

    SELECT
        ItemNumber,
        dbo.GetExtPrice(Quantity, Rate) AS ExtPrice
    FROM SalesDetails 
    WHERE dbo.GetExtPrice(Quantity, Rate) > 2000
    
    SELECT
        ItemNumber,
        Quantity * Rate AS ExtPrice
    FROM SalesDetails
    WHERE (Quantity * Rate) > 2000
    

    Here's the reason - scalar valued functions are non-deterministic. This can be confirmed by running the following T-SQL statement:

    SELECT OBJECTPROPERTY(OBJECT_ID('GetExtPrice'),'IsDeterministic')
    

    What this means is that SQL Server is forced to undertake RBAR (Row-By-Agonizing-Row), i.e. process each row, one at a time. In our case, there is an explicit filter operation that is undertaken once all values are computed - causing the performance hit.

    Thus, scalar UDFs are great when used in-line in the SELECT statements (there too, caution needs to be exercised, but generally they are pretty safe here). However, use then in WHERE and JOINs only if you absolutely have to.

    Until we meet next time,

    Be courteous. Drive responsibly.

    http://beyondrelational.com/blogs/nakul/default.aspx

    Replied on Mar 2 2011 3:23AM  . 
    Nakul Vachhrajani
    4 · 36% · 11606
  • Score
    5

    Given those two samples - both queries are pretty similar in performance.

    Still, the query that uses in-line calculation will perform better than query with UDF.

    Reason for that are not those usual concerns with scalar UDF because:

    1. No SELECT clause is used in it.
    2. UDF is not used in WHERE clause.

    GetExtPrice UDF does calcualation only. In this particular case row-by-row processing is similar to in-line calculation.

    And performance hit is with overhead caused by:

    1. Passing (assigning) values to UDF
    2. Returning values from UDF
    Replied on Mar 2 2011 1:13PM  . 
    Igor Zakharov
    162 · 1% · 300
  • Score
    10

    If one would follow the mantra that there are no stupid questions, then I would say there are no bad queries . . . just some that take a long time to answer and give some of us jobs :). So, is the first query bad? Eh - it depends.

    As the post by Nakul pointed out, the function, in its current state, is non-deterministic. However, it is not because it is scalar, but rather that it is not schema-bound. If the function were created like the following, it would be deterministic (note the with schemabinding clause included):

    CREATE FUNCTION GetExtPrice
    (
        @quantity INT,
        @rate MONEY
    )
    RETURNS MONEY with schemabinding
    AS
    BEGIN
        DECLARE @price MONEY
        SELECT @price = @quantity * @rate 
        RETURN @price
    END
    

    However, this is neither here nor there as the deterministic characteristic does not impact the behavior for this function. Deterministic or not, RBAR is eminent on the scalar UDF.

    "Not that there's anything wrong with that . . . " (-Seinfeld)

    Because good and bad are moral concepts and since we're not talking religion (where moral absolutes are defined), we'll fall back on the notion that good and bad can be defined by user perception. If the user of the query doesn't perceive any wrong-doing, such as poor performance, then all is well with the world. I could envision a scenario where this query is used perhaps once a day for a report and the company has only a couple thousand rows in the SalesDetails table (like a small consulting company might have). In such case, this query would execute sub-second on most any modern-day machine.

    It's important to remember that the function does the job and the query produces correct results. One could make the further argument that the function makes the business logic reusable, making for a good design pattern . . . but I wouldn't would be the one to suggest this :).

    What May Qualify It for Bad

    So, if we may introduce the DBA doctrine, scalar functions are evil (equating to bad for those of you would want to argue that evil is good :) ). They are evil because of potential & real IO and CPU cost they can impose on a system. To demonstrate, let's actually create & populate the table that these queries address so we can look at the performance impact the UDF adds:

    use tempdb;
    go
    
    ------------
    -- Set Up --
    ------------
    CREATE FUNCTION GetExtPrice
    (
        @quantity INT,
        @rate MONEY
    )
    RETURNS MONEY --with schemabinding
    AS
    BEGIN
        DECLARE @price MONEY
        SELECT @price = @quantity * @rate 
        RETURN @price
    END
    go
    create table SalesDetails (ItemNumber int primary key, Quantity int, Rate money)
    go
    with num(n) as
    (
        select 1 n union all
        select n + 1 from num where N < 100000
    )
    insert into SalesDetails
    select N, datepart(NANOSECOND, sysdatetime()) / 123456, 3
    from num
    option (maxrecursion 0)
    go
    
    -------------
    -- Queries --
    -------------
    set statistics io, time on;
    go
    
    SELECT
        ItemNumber,
        dbo.GetExtPrice(Quantity, Rate) AS ExtPrice
    FROM SalesDetails
    
    SELECT
        ItemNumber,
        Quantity * Rate AS ExtPrice
    FROM SalesDetails
    
    set statistics io, time off;
    go
    
    --------------
    -- Clean Up --
    --------------
    drop table SalesDetails;
    drop function GetExtPrice;
    

    Upon evaluation of the statistics IO & time output, we see that although the IO cost is identical for the two queries, the CPU cost for the first query is nearly 8x that of the second.

    alt text

    If we suppose that this query is executed, say, 2000 times a second on a 2 core machine, the CPU cost of using the scalar UDF should be enough to drive one toward using the inline calculation (the second query) - or at least updating their resume.

    Bottom line, it all comes down to how it's being used, who's using it (how much they yell), and how much it hurts. It's not so much about "good" and "bad" as it is about "works" and "works better." However, this perspective should not encourage and does not condone the use scalar UDFs in their overall design patterns . . . that would be bad :).

    Replied on Mar 2 2011 4:15PM  . 
    Scott Epperly
    160 · 1% · 301
  • Score
    10

    I don't consider the first query being bad. "Bad" would be a cursor or WHILE loop... since this would indicate the programmer forced the loop purposely. In the given scenario the programmer could have been trapped by a "hidden loop" he/she just didn't know about. But, for sure I consider it being suboptimal.

    In the first query, the function is called for each and every row, aka RBAR, usually causing a poor performance.

    Unfortunately, the Execution Plan will not indicate such a behavior, leading to a common but still false conclusion like the one drawn by Nakul Vachhrajani ("both have the same performance impact").

    Even STATISTICS IO won't show anything significant since it will usually show the same number for Scan count and logical reads. However, STATISTICS TIME will show a difference. But how to find the root cause? When running a Profiler trace with SQL:StmtCompleted and SQL:BatchCompleted we still don't see anything more detailed than we already know: the first query causes higher values for CPU and Duration.

    The reason becomes obvious when we include SP:Completed and flag the ObjectName column. All of a sudden the profiler trace will be flooded with repetitive rows of the SELECT statement indicating there is a RBAR involved with the GetExtPrice function as the root cause.

    The solution to create a reusable function and avoid RBAR at the same time is changing the Scalar-valued function to a Table-valued function:

     CREATE FUNCTION [dbo].[GetExtPriceScalar]
        (
            @quantity INT,
            @rate MONEY
        )
        WITH SCHEMABINDING
        RETURNS table 
        AS
        RETURN
        (    SELECT @quantity * @rate as ExtPrice
        )
    

    When comparing the performance using SET STATISTICS TIME ON, the impact usually is obvious. If needed, it can be demonstrated using Profiler and the SP:Completed event: The function will be called only once.

    As a side note: A minor performance improvement of the original function can be achieved by adding WITH SCHEMABINDING when creating the function. This will allow SQL Server to mark this function as schema-bound, not requiring additional work (e.g. Eager Spool) as well as choose better options for a query plan (especially when a WHERE clause is applied or the query is used in joins or insert statements). This will not have a significant impact on the given code sample. But since it's a function I would expect it is reused somewhere else ;-) However, this won't provide a significant improvement as much as changing the type of the function would do or using the inline-calculation approach.

    Finally, here are two related blog posts I'd like to highly recommend: Adam Machanic: "Scalar functions, inlining, and performance...", Kalen Delaney: Cost of Scalar Functions regarding the impact of scalar UDFs as well as TomerV: "Improving query plans..." regarding the advantage of a schema-bound UDF.

    Edit: link added

    Replied on Mar 2 2011 4:46PM  . 
    lmu92
    0 · 0% · 0
  • Score
    2

    On my machine statistics are as follow For first qurey CPU time = 952 ms, elapsed time = 1246 ms. For second query CPU time = 31 ms, elapsed time = 886 ms.

    Big difference in CPU time can be observed, proving second query much more faster then first one.

    Replied on Mar 3 2011 4:43AM  . 
    aasim abdullah
    23 · 7% · 2373
  • Score
    9

    If we put all the assumptions aside like “available indexes” or “number of records in table“ and only consider the mentioned function GetExtPrice() the way it is along with the provided two queries, I would say the first query is bad. Following are the reasons:

    (1) The function has been defined without “WITH SCHEMABINDING” option and thus it is non-deterministic function because, as Scott has already mentioned, a function defined without “WITH SCHEMABINDING” option is non-deterministic function. Due to this, the function would cause the query to recompile quite frequently. If you run SQL profiler, you can notice that there are frequent SP:CacheInsert events going on when you execute the first query repeatedly. As soon as you alter the function "WITH SCHEMABINDING" option, you will notice that the query starts to remain in procedure cache and frequent SP:CacheInsert events are stopped.

    But as this is not the case with us because the provided function GetExtPrice() is non-deterministic in the first query, it forces query to recompile frequently which does degrade the performance of the query. I executed both the queries after executing SET STATISTICS TIME ON. While executing both these queries repeatedly, I noticed that first query (query that uses function) takes CPU time almost 15 or more times than the CPU time taken by the second query which causes the first query to run 15 or more times slower than the second query. So, obviously the first query is bad.

    (2) The second reason why the first query is bad and its impact are even worst is that; imagine that you have users having only restricted and ReadOnly access to the database without “Execute” permission on our function. In this case, when your users will try to access the first query, the query will result in error as they are not having appropriate permission to execute the function GetExtPrice() in order to complete the query. You must assign appropriate permissions explicitly by any mean to allow them to execute the first query. At the same time, they are always able to execute the second query and retrieve desired data from SalesDetails table with desired calculation (ExtPrice) without assigning them any extra “Execute” permission.

    So, I conclude that the first query is bad.

    Replied on Mar 4 2011 2:34AM  . 
    Bihag Thaker
    829 · 0% · 35
  • Score
    8

    SQLServer Quiz 2011 - Are all Scalar User Defined Functions (UDF) always bad. Is the first query bad? Explain and justify your answer.

    Answer:

    What is User Defined Functions (UDF)?

    In any programming language functions are used to perform certain task and can be reused as needed. Microsoft SQL Server also uses user-defined functions and these functions are routines that accept input parameters, perform certain actions such as complex calculation, and return the result as a value. Some of the benefits of using UDFs are:

    • Modular programming capability.
    • Reusable.
    • Reduce the network traffic.
    • Faster execution.

    Types of UDF:

    There are two types of User Defined Functions and they are: Table Valued and Scalar functions.

    Table-valued Functions: It is known from the name of the function itself that user-defined table valued functions return a table data type.

    Scalar Functions: These functions return a single data value of the type defined in the return clause of the function. Scalar functions are UDFs that return a single value. If inline scalar function is used, there is no function body. With the inline scalar function the return value is the result set of a single SQL statement. In this example, the first query function is used and the function is called for each and every row which normally causes a poor performance. Whereas the second query is not using a function rather it is getting the result in a single select statement, which is better performance wise. As per my observation on these two queries, using actual execution plan I do not see the difference between them. Also by using SET STATISTICS ON and SET STATISTICS TIME ON, I do not see any difference in CPU time and logical reads between these two queries. Both queries are identical from the execution plan, looking at CPU time and the logical reads.

    Execution Plan, Logical Reads, CPU Time For first query: http://postimage.org/image/b1mbivtw/

    Execution Plan, Logical Reads, CPU Time for Second query: http://postimage.org/image/ayl61z8k/

    From my findings, both of the queries behave the same performance wise and I would not say first query is bad in this particular example. But User-Defined function is always great to use in inline functions because SQL Server Engine does not have to process row by row one at a time to get the result. As Nukul mentioned, the performance difference can be observed between these two queries using WHERE condition. Therefore, it really depends on how we use the UDF and what we are trying to achieve using the functions. There are several options available to improve the performance of scalar functions one is using SCHEMABINDING, using table-valued functions. Since the scalar user defined function is non-deterministic and it forces SQL Server engine to process each row one at a time and which causes performance issue. Therefore, in large sets of data, using scalar functions can be bad and inline functions can be better. I personally analyze both queries and look at the performance of them and the use the one that is taking less time to execute it. So proper testing and analyzing in our respective environment is a must before deciding to use one based on theory only. Performing benchmarking test is the best way to find out which UDF is performs better.

    Replied on Mar 4 2011 8:24PM  . 
    Abi Chapagai
    70 · 3% · 808
  • Score
    9

    In order to support my previous answer I had posted, I would like to add one more point to it.

    (3) In order to get better performance, if you decide to add one calculated column named ExtPrice to your table that is based on Quantity*Rate calculation and create an index on this calculated column or you decide to create an indexed view which uses the inline calculation for ExtPrice in its SELECT query, you can do so without any hassle in case of inline calculation. But in case of our user defined function, you can neither create an index on calculated column ExtPrice whose value is calculated by our user defined function nor can you create an indexed view which uses the same function for ExtPrice column in its SELECT query. The reason is, we have non deterministic function here.

    The most important benefit we get from User Defined Function is code reusability. But I would never sacrifice the performance of the queries in order to get code reusability for such simple business logic.

    Replied on Mar 7 2011 2:31AM  . 
    Bihag Thaker
    829 · 0% · 35
  • Score
    0

    I replaced lengthy T-SQL with scalar function - the queries were for each subject and Booked or Completed, days were Monday to Saturday. If you add function to stored procedure, try use GROUP BY. I reduced execution time from between (55 to 90 seconds) to (10 to15 seconds) for 95% of my queries by using GROUP BY when using scalar function.

    Replied on Mar 9 2011 5:05PM  . 
    kevin_nikolai
    300 · 0% · 146
Previous 1 | 2 Next

Your Answer


Sign Up or Login to post an answer.
Please note that your answer will not be considered as part of the contest because the competition is over. You can still post an answer to share your knowledge with the community.

Copyright © Rivera Informatic Private Ltd.