Getting Started with Adobe After Effects - Part 6: Motion Blur
First Time? You can support us by signing up. It takes only 5 seconds. Click here to sign up. If you already have an account, click here to login.
Loading

SQLServer Quiz 2010 - Scalar Functions and Performance Problems

  • How can scalar functions cause performance problems when used in the select list, join expression, or where clause?

    Posted on 10-21-2010 00:00 |
    Adam Haines
    230 · 1% · 194

22  Answers  

Subscribe to Notifications
Previous 1 | 2 | 3 Next
  • Score
    7

    IMO , the worst problem with scalar UDF's is that it will turn the operation into a nested loop as the engine is unable to optimize the logic within into a hash (or merge) join if deemed appropriate.

    When used in a join condition matters can be even worse. In a nested loop join , the functions are called for each comparison as i have shown here http://sqlblogcasts.com/blogs/sqlandthelike/archive/2010/07/02/using-a-udf-as-a-sarg-make-a-hash-of-it.aspx

    Replied on Oct 21 2010 2:11AM  . 
    dave ballantyne
    111 · 1% · 462
  • Score
    5

    Using Scalar UDS on Select list, Where clause and Join expression would result performance issue.

    Because SQL Server database engine must create its own internal cursor like processing, on using of scalar UDF.

    It must invoke each UDF on each row.

    If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows.

    If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.

    If the UDS is used in the Join expression this happen when joining the each records from the tables involved

    It’s the row by row processing that seems to slow SQL Server the most.

    Reference : Proposed Solution to the Performance Problem with SQL Server Scalar UDFs by Andrew Novick

    Replied on Oct 21 2010 2:54AM  . 
    Sivaprasad S - SIVA
    239 · 1% · 188
  • Score
    8

    This is a good Question.

    It's best practice to avoid scalar valued functions because they need to be run once for every row, and cannot be optimized by the query execution plan. Therefore, they incline to scale linearly even if the associated tables have indexes.

    The best practice is using an inline-table-valued function, since these are evaluated in line with the query, and can be optimized. We get the encapsulation that is required.

    Scalar functions are calculated on a single thread. Hence even If we move to a multi core machine there is no change in the performance.

    Hence it is recommended to avoid scalar functions.

    Replied on Oct 21 2010 6:57AM  . 
    Vamshi
    131 · 1% · 376
  • Score
    8

    Single thread – can be an issue if you are trying parallelism.

    One call every row- works similar to cursor.

    Very inefficient calls as query optimizer does not optimize these calls.

    Does not use indexes well

    Replied on Oct 21 2010 8:55AM  . 
    rpathak
    299 · 0% · 145
  • Score
    6

    RBAR is the (simple) answer. The query optimizer is forced to analyze every row. The following code returns then same result set, but maintains two very different execution plans. When a simple scalar function is used in the 'WHERE' clause, the would-be INDEX SEEK, i.e. the utilzation of a specific search predicate, is replaced with an INDEX SCAN followed by a FILTER operator that actually conducts the 'limitation' of the query.

    use AdventureWorks
    go
    
    if object_id('dbo.x') is not null 
        drop function dbo.x
    go
    
    create function dbo.x() returns int
    as
    begin return 776 end
    go 
    
    select * from Sales.SalesOrderDetail
    where ProductID=776
    go
    
    select
       *
    from
       Sales.SalesOrderDetail
    where
       ProductID=dbo.x()
    go
    

    Furthermore, you can see that it's batch cost is is quite a bit more....

    Replied on Oct 21 2010 10:41AM  . 
    mjfii
    344 · 0% · 119
  • Score
    5

    Let us see a quick example how UDF reduces the performance. I am creating this demo for WHERE condition. The same can be simulated for JOIN and later on for SELECT statement as well.

    USE tempdb
    GO
    -- Create Table
    CREATE TABLE UDFEffect (ID INT, 
                           FirstName VARCHAR(100), 
                           LastName VARCHAR(100), 
                           City VARCHAR(100))
    GO
    -- Insert One Hundred Thousand Records
    INSERT INTO UDFEffect (ID,FirstName,LastName,City)
    SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 
          'Bob', 
          CASE WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' 
           ELSE 'Brown' END,
          CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' 
              WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' 
              WHEN  ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' 
              WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'San Diego'
                ELSE 'Houston' END
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
    GO
    -- Create Indexes 
    -- Create Clustered Index
    CREATE CLUSTERED INDEX IX_UDFEffect_ID
    ON UDFEffect(ID)
    GO
    -- Create non clustered index
    CREATE NONCLUSTERED INDEX IX_UDFEffect_City
    ON UDFEffect (City)
    GO
    /* Enable execution plan using CTRL + M
    OR 
    Menu >> Query >> Include Actual Execution Plan
    */
    /* Run following Two select together and see the execution plan
    Compare the execution cost */
    -- Select Table
    SELECT ID, City
    FROM UDFEffect
    WHERE City = 'San Diego'
    GO
    -- Select Table
    SELECT ID, City
    FROM UDFEffect
    WHERE RTRIM(LTRIM(City)) = 'San Diego'
    GO
    

    Let us check the execution plan.

    execution plan

    You can clearly see how UDF reduces the performance.

    Due to Function SQL Server is not able to use indexes on the WHERE clause because of the same, it has to do complete scan of the table which resulting reducing the performance.

    Replied on Oct 21 2010 10:57AM  . 
    Nupur Dave
    172 · 1% · 284
  • Score
    10

    The only good functions are CLR scalar functions with no data access and inline table valued functions. All other functions are a bane to performance. This is because of the terrible and slow T-SQL interpreter and (unlike built-in functions) the very poor compiler optimization in function processing (scalar functions reference twice in the SELECT WHERE clause are executed twice etc..).

    Scalar and mult-statement table function cannot benefit from parallelism (inline table value functions can because they are processed like a view is processed, but parametrized).

    Functions in a WHERE or JOIN predicate are not SARGable just like any built-in function (except some built-ins where SQL Server has optimization coded for it, like LEFT, but don't rely on that), and won't benefit from indexes.

    Scalar functions are called sorta like how a stored procedure is called, just called once per outer row (10,000 rows = 10,000 calls, or 20,000 if referenced twice, 30,000 if reference 3 times). Data access in a scalar function is a death to performance. Non-data-access scalar function are hampered by the slow T-SQL interpreter - inline the 1 or 2 line ones, and convert the rest to CLR, or move the logic to the application (where it might belong). Try to convert all data access functions to inline-table-valued functions (processed as views, so limited to 1 query beginning with SELECT or WITH), and the ones that can't, convert to CLR or move out of the database to the app.

    Multi-statement table valued functions use TempDB, cannot use paralelism, and may suffer from the slow T-SQL interpreter, and are a bane to performance if use with APPLY. If the outside proc calls the function only to dump to a temp table, there is redundant tempDB usage.

    Inline table valued functions only! All other logic move to app, or where performance benefits, use CLR scalar functions (1-or-2 lines of code better off inlining in calling code if it can be made as one expression). Never have data access in scalar functions (T-SQL or CLR). If Data access in a scalar function is unavoidable, moving to the logic out of the database and into the app would be advisable.

    Replied on Oct 21 2010 1:36PM  . 
    Jesse Roberge
    81 · 2% · 707
  • Score
    6

    UDF is an “external black box” for the optimizer. Did you ever notice that Profiler is not able to display statement contained in the UDF ? Optimizer is not always able to generate a good plan for query that call UDF. When the query call a Scalar User Defined Function that is “non determinist” the performance of your query can be very very bad. I don't use very often UDF !
    Danny

    Replied on Oct 21 2010 7:56PM  . 
    danny presse
    254 · 1% · 174
  • Score
    5

    Each scalar function must be executed for each qualifying row in the query effectively creating cursor like performance, i.e. RBAR (row by agonizing row). For example, if your query returns a million rows and there is a scalar function in the where clause, the function must execute one million times. It defeats the purpose of set based processing. The behavior is similar to certain types of correlated subqueries.

    Replied on Oct 21 2010 8:45PM  . 
    bfrasca
    1009 · 0% · 25
Previous 1 | 2 | 3 Next

Your Answer


Sign Up or Login to post an answer.
Please note that your answer will not be considered as part of the contest because the competition is over. You can still post an answer to share your knowledge with the community.

Copyright © Rivera Informatic Private Ltd.