-
Replied on Oct 21 2010 1:53AM
.
|
-
Replied on Oct 21 2010 2:11AM
.
|
-
Using Scalar UDS on Select list, Where clause and Join expression would result performance issue.
Because SQL Server database engine must create its own internal cursor like processing, on using of scalar UDF.
It must invoke each UDF on each row.
If the UDF is used in the WHERE clause, this may happen as part of the filtering the rows.
If the UDF is used in the select list, this happens when creating the results of the query to pass to the next stage of query processing.
If the UDS is used in the Join expression this happen when joining the each records from the tables involved
It’s the row by row processing that seems to slow SQL Server the most.
Reference : Proposed Solution to the Performance Problem with SQL Server Scalar UDFs by Andrew Novick
Replied on Oct 21 2010 2:54AM
.
|
-
This is a good Question.
It's best practice to avoid scalar valued functions because they need to be run once for every row, and cannot be optimized by the query execution plan. Therefore, they incline to scale linearly even if the associated tables have indexes.
The best practice is using an inline-table-valued function, since these are evaluated in line with the query, and can be optimized. We get the encapsulation that is required.
Scalar functions are calculated on a single thread. Hence even If we move to a multi core machine there is no change in the performance.
Hence it is recommended to avoid scalar functions.
Replied on Oct 21 2010 6:57AM
.
|
-
Single thread – can be an issue if you are trying parallelism.
One call every row- works similar to cursor.
Very inefficient calls as query optimizer does not optimize these calls.
Does not use indexes well
Replied on Oct 21 2010 8:55AM
.
|
-
RBAR is the (simple) answer. The query optimizer is forced to analyze every row. The following code returns then same result set, but maintains two very different execution plans. When a simple scalar function is used in the 'WHERE' clause, the would-be INDEX SEEK, i.e. the utilzation of a specific search predicate, is replaced with an INDEX SCAN followed by a FILTER operator that actually conducts the 'limitation' of the query.
use AdventureWorks
go
if object_id('dbo.x') is not null
drop function dbo.x
go
create function dbo.x() returns int
as
begin return 776 end
go
select * from Sales.SalesOrderDetail
where ProductID=776
go
select
*
from
Sales.SalesOrderDetail
where
ProductID=dbo.x()
go
Furthermore, you can see that it's batch cost is is quite a bit more....

Replied on Oct 21 2010 10:41AM
.
|
-
Let us see a quick example how UDF reduces the performance. I am creating this demo for WHERE condition. The same can be simulated for JOIN and later on for SELECT statement as well.
USE tempdb
GO
-- Create Table
CREATE TABLE UDFEffect (ID INT,
FirstName VARCHAR(100),
LastName VARCHAR(100),
City VARCHAR(100))
GO
-- Insert One Hundred Thousand Records
INSERT INTO UDFEffect (ID,FirstName,LastName,City)
SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob',
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith'
ELSE 'Brown' END,
CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles'
WHEN ROW_NUMBER() OVER (ORDER BY a.name)%427 = 1 THEN 'San Diego'
ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO
-- Create Indexes
-- Create Clustered Index
CREATE CLUSTERED INDEX IX_UDFEffect_ID
ON UDFEffect(ID)
GO
-- Create non clustered index
CREATE NONCLUSTERED INDEX IX_UDFEffect_City
ON UDFEffect (City)
GO
/* Enable execution plan using CTRL + M
OR
Menu >> Query >> Include Actual Execution Plan
*/
/* Run following Two select together and see the execution plan
Compare the execution cost */
-- Select Table
SELECT ID, City
FROM UDFEffect
WHERE City = 'San Diego'
GO
-- Select Table
SELECT ID, City
FROM UDFEffect
WHERE RTRIM(LTRIM(City)) = 'San Diego'
GO
Let us check the execution plan.

You can clearly see how UDF reduces the performance.
Due to Function SQL Server is not able to use indexes on the WHERE clause because of the same, it has to do complete scan of the table which resulting reducing the performance.
Replied on Oct 21 2010 10:57AM
.
|
-
The only good functions are CLR scalar functions with no data access and inline table valued functions. All other functions are a bane to performance. This is because of the terrible and slow T-SQL interpreter and (unlike built-in functions) the very poor compiler optimization in function processing (scalar functions reference twice in the SELECT WHERE clause are executed twice etc..).
Scalar and mult-statement table function cannot benefit from parallelism (inline table value functions can because they are processed like a view is processed, but parametrized).
Functions in a WHERE or JOIN predicate are not SARGable just like any built-in function (except some built-ins where SQL Server has optimization coded for it, like LEFT, but don't rely on that), and won't benefit from indexes.
Scalar functions are called sorta like how a stored procedure is called, just called once per outer row (10,000 rows = 10,000 calls, or 20,000 if referenced twice, 30,000 if reference 3 times). Data access in a scalar function is a death to performance. Non-data-access scalar function are hampered by the slow T-SQL interpreter - inline the 1 or 2 line ones, and convert the rest to CLR, or move the logic to the application (where it might belong). Try to convert all data access functions to inline-table-valued functions (processed as views, so limited to 1 query beginning with SELECT or WITH), and the ones that can't, convert to CLR or move out of the database to the app.
Multi-statement table valued functions use TempDB, cannot use paralelism, and may suffer from the slow T-SQL interpreter, and are a bane to performance if use with APPLY. If the outside proc calls the function only to dump to a temp table, there is redundant tempDB usage.
Inline table valued functions only! All other logic move to app, or where performance benefits, use CLR scalar functions (1-or-2 lines of code better off inlining in calling code if it can be made as one expression). Never have data access in scalar functions (T-SQL or CLR). If Data access in a scalar function is unavoidable, moving to the logic out of the database and into the app would be advisable.
Replied on Oct 21 2010 1:36PM
.
|
-
UDF is an “external black box” for the optimizer. Did you ever notice that Profiler is not able to display statement contained in the UDF ? Optimizer is not always able to generate a good plan for query that call UDF. When the query call a Scalar User Defined Function that is “non determinist” the performance of your query can be very very bad. I don't use very often UDF !
Danny
Replied on Oct 21 2010 7:56PM
.
|
-
Each scalar function must be executed for each qualifying row in the query effectively creating cursor like performance, i.e. RBAR (row by agonizing row). For example, if your query returns a million rows and there is a scalar function in the where clause, the function must execute one million times. It defeats the purpose of set based processing. The behavior is similar to certain types of correlated subqueries.
Replied on Oct 21 2010 8:45PM
.
|
|