Measuring the number of rows in a table – are there any alternatives to COUNT(*)?

Apr 4 2011 12:00AM by Nakul Vachhrajani   

Today, we will talk about a slightly different, yet very common, subject: the different ways to measure the number of rows in a table.

I found a couple of posts on the Internet, and I will reference them as and when I demonstrate the associated methods. This post collects those methods and attempts a comparative analysis, categorizing them in terms of:

  • Performance (in terms of time)
  • Accuracy

The most common methods

The most common of all methods is the one that anybody who has spent even five minutes with Microsoft SQL Server knows about. It is the very generic:

SELECT COUNT(*) FROM HumanResources.Employee

Other similar options are:

SELECT COUNT(1) FROM HumanResources.Employee
SELECT COUNT(BusinessEntityID) FROM HumanResources.Employee

Some developers believe that COUNT(1), or COUNT() over the first primary key column, is faster than COUNT(*); however, that is untrue. Here's the proof:

[Figure: execution statistics showing identical performance for COUNT(*), COUNT(1) and COUNT(BusinessEntityID)]

If you have tables with row counts greater than the range of the INT data type, you may use COUNT_BIG() instead.
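As a quick illustration (a sketch against the same AdventureWorks sample table used throughout this post), COUNT_BIG() is a drop-in replacement for COUNT():

```sql
-- COUNT(*) returns an INT, which overflows once the count exceeds
-- 2,147,483,647; COUNT_BIG(*) returns a BIGINT instead.
SELECT COUNT_BIG(*) AS EmployeeRowCount
FROM HumanResources.Employee;
```

The syntax, including the COUNT_BIG(1) and COUNT_BIG(column) variants, mirrors COUNT() exactly.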

Using Catalog Views

Microsoft SQL Server comes with a number of catalog views, which return information used by the SQL Server Database Engine. Obviously, this is one of the places to look for row count information. A table with a clustered index will have at least one entry for that index in the catalog view sys.sysindexes. This entry holds the count of the number of rows in the index, which in turn is equal to the number of rows in the table (a clustered index cannot be filtered, so it always covers the entire table).

SELECT si.rows AS 'RowCount'
FROM sys.sysindexes si
WHERE si.id = OBJECT_ID('HumanResources.Employee') AND si.indid < 2

Compared with the conventional options, we find that this option is faster!

[Figure: execution statistics showing the sys.sysindexes query outperforming the conventional COUNT queries]

However, please note that the catalog views may report stale data.

Using DMVs

Mr. Pinal Dave (blog) provides a great method of accurately determining the number of rows in a table with the use of DMVs in his post here.

His query has been modified to only look for the HumanResources.Employee table below:

SELECT sc.name + '.' + ta.name TableName, SUM(pa.rows) RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa ON pa.OBJECT_ID = ta.OBJECT_ID
INNER JOIN sys.schemas sc ON ta.schema_id = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND 
      pa.index_id IN (1,0) AND
      sc.name = 'HumanResources' AND
      ta.name = 'Employee'
GROUP BY sc.name, ta.name
ORDER BY SUM(pa.rows) DESC

This method falls behind in performance when searching for the row count of an individual table, as demonstrated below. However, without the schema name and table name filters in the WHERE clause, it is a handy script for getting the row counts of all tables in a database. As a matter of fact, I have verified with the help of the SQL Profiler that Microsoft also uses the sys.partitions DMV to fetch the row counts shown in the Object Explorer Details window. (If you would like to know more about the Object Explorer Details window, refer to my posts – Part I and Part II, which are part of my series on the underappreciated features of Microsoft SQL Server.)
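As a sketch, dropping the two name filters from the WHERE clause turns the query above into the all-tables version (same DMVs, no other changes assumed):

```sql
-- Row counts for every user table in the current database,
-- summed across partitions of the heap (index_id 0) or clustered index (index_id 1)
SELECT sc.name + '.' + ta.name AS TableName, SUM(pa.rows) AS RowCnt
FROM sys.tables ta
INNER JOIN sys.partitions pa ON pa.object_id = ta.object_id
INNER JOIN sys.schemas sc ON ta.schema_id = sc.schema_id
WHERE ta.is_ms_shipped = 0 AND pa.index_id IN (0, 1)
GROUP BY sc.name, ta.name
ORDER BY SUM(pa.rows) DESC;
```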

I would recommend this to be in the "tool-kit" of each and every database developer, administrator and operator.

[Figure: execution statistics for the sys.partitions query against a single table]

Using Aggregation Functions

The following method, from Madhivanan’s post (blog) (read this post here) is quite interesting:

SELECT SUM(1) AS 'RowCount' FROM HumanResources.Employee

Performance-wise, this method is the same as the well-known COUNT(*) or COUNT(1), but it is interesting because it uses an aggregation function to get the row count.

Using System Stored Procedures

This one again comes from Madhivanan’s post, with a small extension from Books-On-Line.

Microsoft SQL Server comes with a big list of system stored procedures (the most common ones being sp_help, sp_executesql and many others that we use on a regular basis). It is no surprise, then, that we also have a system stored procedure that provides information on the number of rows in a table.

Rows in a table occupy space, and hence it is fitting that the system stored procedure responsible for reporting space usage also reports the number of rows in a table.

--Default version, may return stale results
EXEC sp_spaceused @objname=N'HumanResources.Employee';

--Force update of usage statistics, will always return current usage
EXEC sp_spaceused @objname=N'HumanResources.Employee', @updateusage=N'TRUE';

[Figure: sp_spaceused output showing the rows column for HumanResources.Employee]

Here's the catch: system stored procedures ALWAYS return the row count as an INT. If the number of rows in your table exceeds the range of an INT, do not use this method.

Also, this method is slower than the conventional methods mentioned above.

[Figure: execution statistics showing sp_spaceused trailing the conventional COUNT queries]

Read more on sp_spaceused on Books-On-Line at: http://msdn.microsoft.com/en-us/library/ms188776.aspx
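If you need the count programmatically rather than on screen, one common pattern (a sketch; the column list below follows sp_spaceused's documented single-object result set) is to capture the procedure's output with INSERT ... EXEC:

```sql
-- Capture sp_spaceused's result set; note that the rows column
-- is returned as char(20), not as a numeric type
CREATE TABLE #SpaceUsed
(
    name       sysname,
    rows       char(20),
    reserved   varchar(18),
    data       varchar(18),
    index_size varchar(18),
    unused     varchar(18)
);

INSERT INTO #SpaceUsed
EXEC sp_spaceused @objname = N'HumanResources.Employee';

SELECT CAST(rows AS int) AS RowCnt FROM #SpaceUsed;

DROP TABLE #SpaceUsed;
```

The CAST back to int is safe only within the INT limitation already noted above.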

The next method, which also uses a system function, came from a friend:

SELECT COUNT(@@ROWCOUNT) FROM HumanResources.Employee

This method performs as well as the conventional ones.
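A quick way to convince yourself why (a sketch: inside COUNT(), @@ROWCOUNT is evaluated once and then behaves like any other non-NULL constant expression):

```sql
-- All four expressions count every row and return identical results;
-- @@ROWCOUNT here is just another non-NULL constant as far as COUNT() is concerned
SELECT COUNT(*)          AS CountStar,
       COUNT(1)          AS CountOne,
       COUNT('x')        AS CountLiteral,
       COUNT(@@ROWCOUNT) AS CountRowCount
FROM HumanResources.Employee;
```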

Summary

To summarize, here is a small table that can serve as a ready look-up for each of these methods. I hope you find it useful.

Method                                    | Performance (Time) | Accuracy
------------------------------------------|--------------------|------------
Conventional (SELECT COUNT(*) and others) | No                 | Yes
Catalog Views (sys.sysindexes)            | Yes                | Approximate
DMVs                                      | Partial*           | Yes
Aggregate Functions                       | No                 | Yes
System Stored Procedures                  | No                 | Approximate

(*Partial = Fares low on performance for individual tables, but scores high for a group of tables).

There are a couple of points which I would like the reader to keep in mind when gathering data for row count analysis:

  • System stored procedures will always return an INT value
  • Size information is not maintained for non-clustered indexes – please exercise due diligence before using these methods with non-clustered indexes (or tables without clustered indexes, i.e. heaps)
  • Row counts may be incorrect for large tables. To ensure accuracy, always update the usage statistics by running the following before fetching the row count:
--Update the table usage statistics
DBCC UPDATEUSAGE ('AdventureWorks2008','HumanResources.Employee')
GO
--Per Books On Line:
--Reports and corrects inaccuracies in the sysindexes table, 
--which may result in incorrect space usage reports by the sp_spaceused system stored procedure.

Computing Row Counts for a range of values

When working with a range of values, getting row counts is a bit more difficult, and the options are very limited. To the best of my knowledge, users on SQL Server 2005 can only use the conventional queries:

SELECT COUNT(*) FROM HumanResources.Employee WHERE BirthDate BETWEEN '1950-01-01' and '1960-01-01'

However, users on SQL Server 2008 and up always have the option of creating a filtered index (if the range is pre-defined) and then using the catalog view method.
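A hedged sketch of that approach (the index name is illustrative, and the filter must match the range you intend to count; note that filtered index predicates do not allow BETWEEN, only simple comparisons combined with AND):

```sql
-- Create a filtered index covering the pre-defined range once
CREATE NONCLUSTERED INDEX IX_Employee_BirthDate_1950s
ON HumanResources.Employee (BirthDate)
WHERE BirthDate >= '1950-01-01' AND BirthDate <= '1960-01-01';
GO
-- The filtered index's partition row count is the row count for the range
SELECT SUM(pa.rows) AS RangeRowCount
FROM sys.indexes si
INNER JOIN sys.partitions pa
        ON pa.object_id = si.object_id AND pa.index_id = si.index_id
WHERE si.object_id = OBJECT_ID('HumanResources.Employee')
  AND si.name = 'IX_Employee_BirthDate_1950s';
```

The same staleness caveat that applies to the catalog view method applies here.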

And finally…

Now for the big question – which of the above methods are my favourites? Here’s my choice:

  1. Conventional methods if I need the accurate row count for a single table
  2. Catalog Views if I only need approximate row counts
  3. For fetching row counts of all tables in a database or a schema, I would go for Pinal's script

Do you use any other methods? If yes, do let me know – I will publish the alternatives with due credit. Also, if you use any other methods to get the row counts for a range of values, please do share your solutions – they will also be published with your name as the contributor.

Until we meet next time,

Be courteous. Drive responsibly.

Tags: Development, Administration, T-SQL, Performance Tuning, Tips, SQL Server, #SQLServer, Best Practices, DBA, Myths

21 Comments

  • Excellent - Super!

    commented on Apr 3 2011 11:53PM
    Pinal Dave
  • Couple of points

    1. Comparing the performance of the conventional methods with the ones using system tables is meaningless, since the cost of the latter will always be the same, while the cost of the former varies based on the number of rows, the index used and the width of the row.

    2. In the approach that uses SELECT COUNT(@@ROWCOUNT), @@ROWCOUNT has no meaning even though it gives the illusion that it does. It is the same as SELECT COUNT(1) or SELECT COUNT('AnythingGoesHere').

    3. When you use SELECT COUNT(*) or a constant like COUNT(1), it gives the optimizer the choice of choosing the narrowest index available. If you specify a column name, the optimizer is forced to use an index on the column if one exists, or the clustered index.

    commented on Apr 5 2011 6:54AM
    Roji Thomas
  • The fact of the matter is that almost every application still uses the conventional methods. Ultimately, this leads to performance issues: where we just need an approximate count, the application tries to be accurate by doing a COUNT(*) or a COUNT(1). As for COUNT(@@ROWCOUNT), I have seen production code that uses it. Because it was being used by somebody somewhere, it found its way into my analysis.

    Also, for COUNT(*) v/s COUNT(1) v/s COUNT('ColumnName'), I have a SQL 7 instance in my test bed, and the difference was seen there – but with SQL 2005 and up I could not see any difference in behaviour.

    At the end of the day, the point is to choose the right method based on whether performance or accuracy is more important – there is always a tradeoff, and this holds true when you need to fetch the number of rows in a table. Hence the comparison between all the available methods (irrespective of whether they are conventional or not) has been presented.

    commented on Apr 6 2011 1:31AM
    Nakul Vachhrajani
  • Never knew that so many methods exist for finding the number of rows in a table. I always used count(*) or count(field); good to know these. Thanks

    Javin

    commented on Apr 29 2011 9:07AM
    Javin Paul
  • You are most welcome, Javin. Good to know that you enjoyed reading the post. Thanks!

    commented on Apr 30 2011 11:03AM
    Nakul Vachhrajani
  • Good one

    commented on May 27 2011 2:44AM
    Pravin Patel
  • Simply superb.. Thanks

    commented on Nov 1 2011 11:01PM
    Hardik Doshi
  • My team frequently needs to review the row count on several huge tables and we always use the catalog views. It is true that we do not need the exact number. A difference of a few rows is always acceptable. Reading from catalog views has become our favorite choice because the row count from tables having several billion rows can be fetched in a few milliseconds.

    Recently I have been struggling with a similar requirement on Service Broker. We need to frequently monitor the number of rows in sys.transmission_queue and the number of rows waiting in one or more Service Broker queues. A SELECT COUNT(*) is not desirable because it may take several minutes to return when those queues are really huge.

    I was looking for ways to read this from the system tables and Remus Rusanu helped with the following queries.

    Total number of rows in each of the service broker queues

    select 
    q.name, p.rows
    from sys.objects as o
    join sys.partitions as p on p.object_id = o.object_id
    join sys.objects as q on o.parent_object_id = q.object_id 
    where p.index_id = 1
    

    Total number of rows in the transmission queue

    select 
    p.rows
    from sys.objects as o
    join sys.partitions as p on p.object_id = o.object_id
    where o.name = 'sysxmitqueue'
    
    commented on Nov 2 2011 2:47AM
    Jacob Sebastian
  • Thank-you for sharing these queries and experiences, Jacob. I am sure they will be of great help to all who use Service Broker.

    commented on Nov 5 2011 12:38AM
    Nakul Vachhrajani
  • Great Work Nakul Sir.

    commented on Jan 2 2012 11:21PM
    Nirav
  • Hi Nakul... I have a doubt on the following line: "Catalog Views if I only need approximate row counts". Can you please explain the exact meaning of an approximate count? So it is not reliable, right? I am a little confused

    commented on Jan 19 2012 12:10AM
    satyajit
  • Satyajit,

    When we use catalog views to fetch the record counts, you would observe that we looked at the sys.sysindexes catalog view. What this means is that we are looking at the metadata associated with the table. However, when we perform bulk loads into a table with a very large number of rows (in the millions), the statistics might become stale and the indexes may not be updated instantaneously. In such cases, fetching row counts via the catalog views would yield the last known results, making them just an approximate count.

    Even the placement of index creation with respect to the load point may impact this count due to stale statistics. I touch upon this here: http://beyondrelational.com/blogs/nakul/archive/2011/11/07/sql-server-performance-best-practice-create-index-before-or-after-a-data-insert.aspx

    My recommendation would be to create a temp table, and pump in gigantic amounts of data into it, while noting the row counts as you go. If the confusion still remains, do let me know.

    commented on Jan 19 2012 11:53AM
    Nakul Vachhrajani
  • Hi Nakul,

    SELECT COUNT(BusinessEntityID) FROM HumanResources.Employee,

    This method may not be accurate because it ignores the rows where the BusinessEntityID column has NULL values. In that case COUNT(*) or COUNT(1) gives the correct result.

    commented on Mar 14 2012 1:37AM
    Paresh Prajapati
  • @Paresh: Most implementations that I have seen (wherein we need a count on the non-key column) for getting the row count involve counting the primary key values (COUNT(BusinessEntityID)) after filtering on the field required. The query would be like:

    USE AdventureWorks2008R2
    GO
    SELECT COUNT(BusinessEntityID)
    FROM HumanResources.Employee
    WHERE JobTitle = 'Design Engineer'
    

    Because counts should generally be taken based on primary keys (which can never be NULL), the method is accurate.

    commented on Mar 14 2012 2:06AM
    Nakul Vachhrajani
  • Yes Nakul, agreed with the example you have given. My assumption was for the non-primary-key columns used in aggregation.

    commented on Mar 14 2012 2:24AM
    Paresh Prajapati
  • Good Post

    commented on Apr 4 2012 5:12AM
    meechan
  • Nakul,

    Last month I came across loading data into inventory history tables through an SSIS package involving a huge number of records.

    In between loads, I used this code to find out how many records had been loaded: exec sp_spaceused 'employees' (I hope we have already discussed this a couple of days before).

    Question is: is this accurate? Or which one is the best way?

    Thanks in advance....

    commented on Jan 18 2013 4:26AM
    Bala Krishna
  • As shown above, sp_spaceused is neither accurate nor good from a performance standpoint. If the number of records in the table is higher than the limit of INT, the query would fail.

    If accuracy is not a major concern, I would recommend the use of catalog views.

    commented on Jan 19 2013 10:27AM
    Nakul Vachhrajani
  • Thanks for your reply

    commented on Jan 20 2013 10:09PM
    Bala Krishna