In this post, let’s delve a bit more in depth regarding wait stats.
The very first question: when do the wait stats occur?
Here is the simple answer. When SQL Server is executing any task, and if for any reason it has to wait for resources to execute the task, this wait is recorded by SQL Server with the reason for the delay. Later on we can analyze these wait stats to understand the reason the task was delayed and maybe we can eliminate the wait for SQL Server. It is not always possible to remove the wait type 100%, but there are few suggestions that can help.
Before we continue learning about wait types and wait stats, we need to understand three important milestones of the query life-cycle.
Running - a query which is being executed on a CPU is called a running query. This query is responsible for CPU time.
Runnable – a query which is ready to execute and waiting for its turn to run is called a runnable query. This query is responsible for Signal Wait time. (In other words, the query is ready to run but CPU is servicing another query).
Suspended – a query which is waiting due to any reason (to know the reason, we are learning wait stats) to be converted to runnable is suspended query. This query is responsible for wait time. (In other words, this is the time we are trying to reduce).
In simple words, query execution time is a summation of the query Executing CPU Time (Running) + Query Wait Time (Suspended) + Query Signal Wait Time (Runnable). Again, it may be possible a query goes to all these stats multiple times.
Let us try to understand the whole thing with a simple analogy of a taxi and a passenger.
Two friends, Tom and Danny, go to the mall together. When they leave the mall, they decide to take a taxi. Tom and Danny both stand in the line waiting for their turn to get into the taxi. This is the Signal Wait Time as they are ready to get into the taxi but the taxis are currently serving other customer and they have to wait for their turn. In other word they are in a runnable state.
Now when it is their turn to get into the taxi, the taxi driver informs them he does not take credit cards and only cash is accepted. Neither Tom nor Danny have enough cash, they both cannot get into the vehicle. Tom waits outside in the queue and Danny goes to ATM to fetch the cash. During this time the taxi cannot wait, they have to let other passengers get into the taxi. As Tom and Danny both are outside in the queue, this is the Query Wait Time and they are in the suspended state. They cannot do anything till they get the cash.
Once Danny gets the cash, they are both standing in the line again, creating one more Signal Wait Time. This time when their turn comes they can pay the taxi driver in cash and reach their destination. The time taken for the taxi to get from the mall to the destination is running time (CPU time) and the taxi is running.
I hope this analogy is bit clear with the wait stats. You can check the Signalwait stats using following query of Glenn Berry.
-- Signal Waits for instance
SELECT CAST(100.0 * SUM(signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%signal (cpu) waits],
CAST(100.0 * SUM(wait_time_ms - signal_wait_time_ms) / SUM (wait_time_ms) AS NUMERIC(20,2))
AS [%resource waits]
FROM sys.dm_os_wait_stats OPTION (RECOMPILE);
Higher the Signal wait stats are not good for the system. Very high value indicates CPU pressure. In my experience, when systems are running smooth and without any glitch the Signal wait stat is lower than 20%. Again, this number can be debated (and it is from my experience and is not documented anywhere). In other words, lower is better and higher is not good for the system.