As part of a SQL server monitoring review and agent upgrade, I was asked by the Tivoli Engineer to create a list of situations, that monitoring would be useful. The architecture used is Tivoli Monitoring /
Some considerations are:
1) By commiting to having to many situations monitored – “white noise” can distract from the critical situations. For example, receiving notification on every block – may obfuscate, some critical space full.
2) DBAs are busy – assessing every situation can be time consuming – therefore the notifications must be relevant
3) We are focusing here on monitoring situations – i.e a certain threshold has been reached or an event has been triggered which will notify the DBA. This is distinct from daily reporting which focuses more on preventative actions.
As a first attempt the SQL Server list is:
MS_SQL_DB_Space_Pct_Used_Crit (database space used > 90%)
MS_SQL_DB_Status_Crit (database status does not equal vailable)
MS_SQL_DB_Suspect_Crit (database in suspect mode)
MS_SQL_Log_Freespace_Critical (Log freespace < 10%)
MS_SQL_Status_Critical (If Status = Inactive)
MS_SQL_MS_Device_Free_Critical (Percent of Device free is < = 5)
MS_SQL_MS_Job_Failure_Critical (Monitors SQL Server Jobs)
MS_SQL_IO_Disk_Errors_Crit (Any io errors greater than 0)
MS_SQL_NT_Free_Disk_Warning (Based on NT_Logical_Disk_Space_Warning)
MS_SQL_NT_SQL_Services_Fatal (Monitors SQL Server and SQL Agent Services)
MS_SQL_UDB_SQL_Fail_High (An application experiences too many SQL statements failures)
I intentionally haven’t included OS monitoring.
It would be good to hear some feedback from other DBAs, and get a feel for the critical ones
Republished from http://www.sqlserver-dba.com.
Republished from SQL Server DBA [65 clicks].
Read the original version here [32134 clicks].