-
- the optimizer can rewrite and optimize queries involving inline table valued UDFs. On the other hand, queries involving scalar UDFs are not rewritten by the optimizer – the execution of the first query includes one function call per row, which is very slow since it is having a scalar UDF there is nothing optimizer can do with it .
- UDF often mean processing row by row rather than set-based and its reduce the performance.
- yes, first query having a overhead on performance
Replied on Mar 2 2011 1:30AM
.
|
-
Hello!
Surprise! Contrary to popular belief, a single execution of the queries provided proves that there is a 50-50 split on the performance, i.e. both have the same performance impact when the actual execution plan is viewed.
However, a simple modification, and the performance hit is seen. Here's the modification that results in a 52-48% performance split:
SELECT
ItemNumber,
dbo.GetExtPrice(Quantity, Rate) AS ExtPrice
FROM SalesDetails
WHERE dbo.GetExtPrice(Quantity, Rate) > 2000
SELECT
ItemNumber,
Quantity * Rate AS ExtPrice
FROM SalesDetails
WHERE (Quantity * Rate) > 2000
Here's the reason - scalar valued functions are non-deterministic. This can be confirmed by running the following T-SQL statement:
SELECT OBJECTPROPERTY(OBJECT_ID('GetExtPrice'),'IsDeterministic')
What this means is that SQL Server is forced to undertake RBAR (Row-By-Agonizing-Row), i.e. process each row, one at a time. In our case, there is an explicit filter operation that is undertaken once all values are computed - causing the performance hit.
Thus, scalar UDFs are great when used in-line in the SELECT statements (there too, caution needs to be exercised, but generally they are pretty safe here). However, use then in WHERE and JOINs only if you absolutely have to.
Until we meet next time,
Be courteous. Drive responsibly.
http://beyondrelational.com/blogs/nakul/default.aspx
Replied on Mar 2 2011 3:23AM
.
|
-
Given those two samples - both queries are pretty similar in performance.
Still, the query that uses in-line calculation will perform better than query with UDF.
Reason for that are not those usual concerns with scalar UDF because:
- No SELECT clause is used in it.
- UDF is not used in WHERE clause.
GetExtPrice UDF does calcualation only. In this particular case row-by-row processing is similar to in-line calculation.
And performance hit is with overhead caused by:
- Passing (assigning) values to UDF
- Returning values from UDF
Replied on Mar 2 2011 1:13PM
.
|
-
If one would follow the mantra that there are no stupid questions, then I would say there are no bad queries . . . just some that take a long time to answer and give some of us jobs :). So, is the first query bad? Eh - it depends.
As the post by Nakul pointed out, the function, in its current state, is non-deterministic. However, it is not because it is scalar, but rather that it is not schema-bound. If the function were created like the following, it would be deterministic (note the with schemabinding clause included):
CREATE FUNCTION GetExtPrice
(
@quantity INT,
@rate MONEY
)
RETURNS MONEY with schemabinding
AS
BEGIN
DECLARE @price MONEY
SELECT @price = @quantity * @rate
RETURN @price
END
However, this is neither here nor there as the deterministic characteristic does not impact the behavior for this function. Deterministic or not, RBAR is eminent on the scalar UDF.
"Not that there's anything wrong with that . . . " (-Seinfeld)
Because good and bad are moral concepts and since we're not talking religion (where moral absolutes are defined), we'll fall back on the notion that good and bad can be defined by user perception. If the user of the query doesn't perceive any wrong-doing, such as poor performance, then all is well with the world. I could envision a scenario where this query is used perhaps once a day for a report and the company has only a couple thousand rows in the SalesDetails table (like a small consulting company might have). In such case, this query would execute sub-second on most any modern-day machine.
It's important to remember that the function does the job and the query produces correct results. One could make the further argument that the function makes the business logic reusable, making for a good design pattern . . . but I wouldn't would be the one to suggest this :).
What May Qualify It for Bad
So, if we may introduce the DBA doctrine, scalar functions are evil (equating to bad for those of you would want to argue that evil is good :) ). They are evil because of potential & real IO and CPU cost they can impose on a system. To demonstrate, let's actually create & populate the table that these queries address so we can look at the performance impact the UDF adds:
use tempdb;
go
------------
-- Set Up --
------------
CREATE FUNCTION GetExtPrice
(
@quantity INT,
@rate MONEY
)
RETURNS MONEY --with schemabinding
AS
BEGIN
DECLARE @price MONEY
SELECT @price = @quantity * @rate
RETURN @price
END
go
create table SalesDetails (ItemNumber int primary key, Quantity int, Rate money)
go
with num(n) as
(
select 1 n union all
select n + 1 from num where N < 100000
)
insert into SalesDetails
select N, datepart(NANOSECOND, sysdatetime()) / 123456, 3
from num
option (maxrecursion 0)
go
-------------
-- Queries --
-------------
set statistics io, time on;
go
SELECT
ItemNumber,
dbo.GetExtPrice(Quantity, Rate) AS ExtPrice
FROM SalesDetails
SELECT
ItemNumber,
Quantity * Rate AS ExtPrice
FROM SalesDetails
set statistics io, time off;
go
--------------
-- Clean Up --
--------------
drop table SalesDetails;
drop function GetExtPrice;
Upon evaluation of the statistics IO & time output, we see that although the IO cost is identical for the two queries, the CPU cost for the first query is nearly 8x that of the second.

If we suppose that this query is executed, say, 2000 times a second on a 2 core machine, the CPU cost of using the scalar UDF should be enough to drive one toward using the inline calculation (the second query) - or at least updating their resume.
Bottom line, it all comes down to how it's being used, who's using it (how much they yell), and how much it hurts. It's not so much about "good" and "bad" as it is about "works" and "works better." However, this perspective should not encourage and does not condone the use scalar UDFs in their overall design patterns . . . that would be bad :).
Replied on Mar 2 2011 4:15PM
.
|
-
I don't consider the first query being bad. "Bad" would be a cursor or WHILE loop... since this would indicate the programmer forced the loop purposely. In the given scenario the programmer could have been trapped by a "hidden loop" he/she just didn't know about.
But, for sure I consider it being suboptimal.
In the first query, the function is called for each and every row, aka RBAR, usually causing a poor performance.
Unfortunately, the Execution Plan will not indicate such a behavior, leading to a common but still false conclusion like the one drawn by Nakul Vachhrajani ("both have the same performance impact").
Even STATISTICS IO won't show anything significant since it will usually show the same number for Scan count and logical reads. However, STATISTICS TIME will show a difference. But how to find the root cause?
When running a Profiler trace with SQL:StmtCompleted and SQL:BatchCompleted we still don't see anything more detailed than we already know: the first query causes higher values for CPU and Duration.
The reason becomes obvious when we include SP:Completed and flag the ObjectName column. All of a sudden the profiler trace will be flooded with repetitive rows of the SELECT statement indicating there is a RBAR involved with the GetExtPrice function as the root cause.
The solution to create a reusable function and avoid RBAR at the same time is changing the Scalar-valued function to a Table-valued function:
CREATE FUNCTION [dbo].[GetExtPriceScalar]
(
@quantity INT,
@rate MONEY
)
WITH SCHEMABINDING
RETURNS table
AS
RETURN
( SELECT @quantity * @rate as ExtPrice
)
When comparing the performance using SET STATISTICS TIME ON, the impact usually is obvious. If needed, it can be demonstrated using Profiler and the SP:Completed event: The function will be called only once.
As a side note: A minor performance improvement of the original function can be achieved by adding WITH SCHEMABINDING when creating the function. This will allow SQL Server to mark this function as schema-bound, not requiring additional work (e.g. Eager Spool) as well as choose better options for a query plan (especially when a WHERE clause is applied or the query is used in joins or insert statements). This will not have a significant impact on the given code sample. But since it's a function I would expect it is reused somewhere else ;-)
However, this won't provide a significant improvement as much as changing the type of the function would do or using the inline-calculation approach.
Finally, here are two related blog posts I'd like to highly recommend:
Adam Machanic: "Scalar functions, inlining, and performance...", Kalen Delaney: Cost of Scalar Functions regarding the impact of scalar UDFs
as well as TomerV: "Improving query plans..." regarding the advantage of a schema-bound UDF.
Edit: link added
Replied on Mar 2 2011 4:46PM
.
|
-
On my machine statistics are as follow
For first qurey
CPU time = 952 ms, elapsed time = 1246 ms.
For second query
CPU time = 31 ms, elapsed time = 886 ms.
Big difference in CPU time can be observed, proving second query much more faster then first one.
Replied on Mar 3 2011 4:43AM
.
|
-
If we put all the assumptions aside like “available indexes” or “number of records in table“ and only consider the mentioned function GetExtPrice() the way it is along with the provided two queries, I would say the first query is bad. Following are the reasons:
(1)
The function has been defined without “WITH SCHEMABINDING” option and thus it is non-deterministic function because, as Scott has already mentioned, a function defined without “WITH SCHEMABINDING” option is non-deterministic function. Due to this, the function would cause the query to recompile quite frequently. If you run SQL profiler, you can notice that there are frequent SP:CacheInsert events going on when you execute the first query repeatedly. As soon as you alter the function "WITH SCHEMABINDING" option, you will notice that the query starts to remain in procedure cache and frequent SP:CacheInsert events are stopped.
But as this is not the case with us because the provided function GetExtPrice() is non-deterministic in the first query, it forces query to recompile frequently which does degrade the performance of the query. I executed both the queries after executing SET STATISTICS TIME ON. While executing both these queries repeatedly, I noticed that first query (query that uses function) takes CPU time almost 15 or more times than the CPU time taken by the second query which causes the first query to run 15 or more times slower than the second query. So, obviously the first query is bad.
(2)
The second reason why the first query is bad and its impact are even worst is that; imagine that you have users having only restricted and ReadOnly access to the database without “Execute” permission on our function. In this case, when your users will try to access the first query, the query will result in error as they are not having appropriate permission to execute the function GetExtPrice() in order to complete the query. You must assign appropriate permissions explicitly by any mean to allow them to execute the first query. At the same time, they are always able to execute the second query and retrieve desired data from SalesDetails table with desired calculation (ExtPrice) without assigning them any extra “Execute” permission.
So, I conclude that the first query is bad.
Replied on Mar 4 2011 2:34AM
.
|
-
SQLServer Quiz 2011 - Are all Scalar User Defined Functions (UDF) always bad. Is the first query bad? Explain and justify your answer.
Answer:
What is User Defined Functions (UDF)?
In any programming language functions are used to perform certain task and can be reused as needed. Microsoft SQL Server also uses user-defined functions and these functions are routines that accept input parameters, perform certain actions such as complex calculation, and return the result as a value. Some of the benefits of using UDFs are:
- Modular programming capability.
- Reusable.
- Reduce the network traffic.
- Faster execution.
Types of UDF:
There are two types of User Defined Functions and they are: Table Valued and Scalar functions.
Table-valued Functions: It is known from the name of the function itself that user-defined table valued functions return a table data type.
Scalar Functions: These functions return a single data value of the type defined in the return clause of the function. Scalar functions are UDFs that return a single value. If inline scalar function is used, there is no function body. With the inline scalar function the return value is the result set of a single SQL statement. In this example, the first query function is used and the function is called for each and every row which normally causes a poor performance. Whereas the second query is not using a function rather it is getting the result in a single select statement, which is better performance wise. As per my observation on these two queries, using actual execution plan I do not see the difference between them. Also by using SET STATISTICS ON and SET STATISTICS TIME ON, I do not see any difference in CPU time and logical reads between these two queries. Both queries are identical from the execution plan, looking at CPU time and the logical reads.
Execution Plan, Logical Reads, CPU Time For first query: http://postimage.org/image/b1mbivtw/
Execution Plan, Logical Reads, CPU Time for Second query: http://postimage.org/image/ayl61z8k/
From my findings, both of the queries behave the same performance wise and I would not say first query is bad in this particular example. But User-Defined function is always great to use in inline functions because SQL Server Engine does not have to process row by row one at a time to get the result. As Nukul mentioned, the performance difference can be observed between these two queries using WHERE condition. Therefore, it really depends on how we use the UDF and what we are trying to achieve using the functions. There are several options available to improve the performance of scalar functions one is using SCHEMABINDING, using table-valued functions. Since the scalar user defined function is non-deterministic and it forces SQL Server engine to process each row one at a time and which causes performance issue. Therefore, in large sets of data, using scalar functions can be bad and inline functions can be better. I personally analyze both queries and look at the performance of them and the use the one that is taking less time to execute it. So proper testing and analyzing in our respective environment is a must before deciding to use one based on theory only. Performing benchmarking test is the best way to find out which UDF is performs better.
Replied on Mar 4 2011 8:24PM
.
|
-
In order to support my previous answer I had posted, I would like to add one more point to it.
(3) In order to get better performance, if you decide to add one calculated column named ExtPrice to your table that is based on Quantity*Rate calculation and create an index on this calculated column or you decide to create an indexed view which uses the inline calculation for ExtPrice in its SELECT query, you can do so without any hassle in case of inline calculation. But in case of our user defined function, you can neither create an index on calculated column ExtPrice whose value is calculated by our user defined function nor can you create an indexed view which uses the same function for ExtPrice column in its SELECT query. The reason is, we have non deterministic function here.
The most important benefit we get from User Defined Function is code reusability. But I would never sacrifice the performance of the queries in order to get code reusability for such simple business logic.
Replied on Mar 7 2011 2:31AM
.
|
-
I replaced lengthy T-SQL with scalar function - the queries were for each subject and Booked or Completed, days were Monday to Saturday. If you add function to stored procedure, try use GROUP BY. I reduced execution time from between (55 to 90 seconds) to (10 to15 seconds) for 95% of my queries by using GROUP BY when using scalar function.
Replied on Mar 9 2011 5:05PM
.
|
|