Today, we will talk about a slightly different, yet very common subject. We will talk about what are the different ways to measure the number of rows in a table.
I found a couple of posts over the Internet, and I will reference them as and when I demonstrate the associated methods. This post basically is an assimilation of these methods and attempts to perform a comparative analysis to categorize them in terms of:
- Performance (in terms of time)
- Accuracy
The most common methods
The most common of all methods is the one that every person who has even come in contact with Microsoft SQL Server for even 5 minutes knows about. It is the very generic:
SELECT COUNT(*) FROM HumanResources.Employee
Other similar options are:
SELECT COUNT(1) FROM HumanResources.Employee
SELECT COUNT(BusinessEntityID) FROM HumanResources.Employee
Some developers believe that COUNT(1) or using the 1st primary key column is faster for the COUNT() function, however, that is untrue. Here’s the proof:
If you have tables with rows greater than the range of INT data type, you may use COUNT_BIG() instead.
Using Catalog Views
Microsoft SQL Server comes with a few catalog views, which return information that is used by the SQL Server Database Engine. Obviously, this is one of the places where one should go and look for the row count information. A table with a clustered index will have at least one entry for that index in the catalog view – sys.sysindexes. This entry will have the count of the number of rows in the index, which in-turn is equal to the number of rows in the table (for SQL 2008 and up, a clustered index cannot be filtered and hence clustered indexes will always hold the information for the entire table).
SELECT si.rows AS 'RowCount'
FROM sys.sysindexes si
WHERE si.id = OBJECT_ID('HumanResources.Employee') AND si.indid < 2
Comparing against the conventional options, we find that this option is faster!
However, please note that the catalog views may report stale data.
Using DMVs
Mr. Pinal Dave (blog) provides a great method of accurately determining the number of rows in a table with the use of DMVs in his post here.
His query has been modified to only look for the HumanResources.Employee table below:
SELECT sc.name + '.' + ta.name TableName, SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc ON ta.schema_id = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND
pa.index_id IN (1,0) AND
sc.name = 'HumanResources' AND
ta.name = 'Employee'
GROUP BY sc.name, ta.name
ORDER BY SUM(pa.rows) DESC
This method falls behind in performance when searching for row counts of individual tables as demonstrated below. However, without the schema name and the table name filters added in the WHERE clause, this is a handy script to get the row counts for all tables in a database. As a matter of fact, I have verified with the help of the SQL Profiler that Microsoft also uses the sys.partitions DMV to fetch the row counts when showing them on the Object Explorer Details window. (If you would like to know more about the Object Explorer Details window, refer my posts – Part I and Part II, which are part of my series on the Underappreciated features of Microsoft SQL Server)
I would recommend this to be in the "tool-kit” of each and every database developer, administrator and operator.
Using Aggregation Functions
The following method, from Madhivanan’s post (blog) (read this post here) is quite interesting:
SELECT SUM(1) AS ‘RowCount’ FROM HumanResources.Employee
Performance-wise, this method is same as the well-known COUNT(*) or COUNT(1), but it’s interesting because it uses an aggregation function to get the row count.
Using System Stored Procedures
This one again comes from Madhivanan’s post, with a small extension from Books-On-Line.
Microsoft SQL Server comes with a big list of system functions (the most common ones being sp_help, sp_executesql and many others which we use on a regular basis). It is no surprise then that we also have a system function to provide information on the number of rows in a table.
Rows in a table occupy space, and hence, it is fit that the system stored procedure responsible to predict the space usage is also responsible to predict the number of rows in a table.
--Default version, may return stale results
EXEC sp_spaceused @objname=N'HumanResources.Employee';
--Force update of usage statistics, will always return current usage
EXEC sp_spaceused @objname=N'HumanResources.Employee', @updateusage=N'TRUE';
Here’s the catch: System stored procedures ALWAYS return an INT. If you have rows that exceed the range of an INT in number, do not use these.
Also, this is slower in performance when compared to the conventional methods mentioned above.
Read more on sp_spaceused on Books-On-Line at: http://msdn.microsoft.com/en-us/library/ms188776.aspx
The next method that also uses a system function is the one below which came from a friend:
SELECT COUNT(@@ROWCOUNT) FROM HumanResources.Employee
This method performs equally well as the conventional ones.
Summary
Summarizing, I would attempt to present a small table that would serve as a ready look-up for each of these methods. I hope you find it useful.
| Method |
Performance (Time) |
Accuracy |
Conventional
(SELECT COUNT(*) and others) |
No |
Yes |
Catalog Views
(sys.sysindexes) |
Yes |
Approximate |
| DMVs |
Partial* |
Yes |
| Aggregate Functions |
No |
Yes |
| System Stored Procedures |
No |
Approximate |
(*Partial = Fares low on performance for individual tables, but scores high for a group of tables).
There are a couple of points which I would like the reader to keep in mind when gathering data for row count analysis:
- System stored procedures will always return an INT value
- Size information is not maintained for non-clustered indexes – please exercise due diligence before using these methods with non-clustered indexes (or tables without clustered indexes, i.e. heaps)
- Row counts may be incorrect for large tables. For their accuracy, always update statistics by using the following before attempting to fetch an accurate row count:
--Update the table usage statistics
DBCC UPDATEUSAGE ('AdventureWorks2008','HumanResources.Employee')
GO
--Per Books On Line:
--Reports and corrects inaccuracies in the sysindexes table,
--which may result in incorrect space usage reports by the sp_spaceused system stored procedure.
Computing Row Counts for a range of values
When working with a range of values, getting the row counts is a bit difficult and has very limited options. To the best of my knowledge, for users using SQL Server 2005, we can only use the conventional queries:
SELECT COUNT(*) FROM HumanResources.Employee WHERE BirthDate BETWEEN '1950-01-01' and '1960-01-01'
However, for users using SQL Server 2008 and up, the option to create a filtered index (if the range is pre-defined) and then using the catalog view method is always open.
And finally…
Now for the big question – which of the above methods are my favourites? Here’s my choice:
- Conventional methods if I need the accurate row count for a single table
- Catalog Views if I only need approximate row counts
- For fetching row counts of all tables in a database or a schema, I would go in for Pinal’s script
Do you use any other methods? If yes, do let me know – I will publish the alternatives with due credit. Also, if you use any other methods to get the row counts for range of values, please do share your solutions – they will also be published with your name as the contributor.
Until we meet next time,
Be courteous. Drive responsibly.