At the time being, SQL Server (version 2008 R2) still supports no solution for a general purpose sequencing structure. This may sound cryptic, but I’m talking about the ordinary “Tally Table”, a structure that can be used to achieve the same functionality. Many posts have been written about this matter, so I’m going to try to wrap up the important things and give some additional concerns about the actual implementations that use it. First I’m going to talk about creating the tally structure, what’s worth and what’s not.
Building the tally structure
Generally, there are few ways one can build the tally structure. Other possibilities are their derivatives - if there is one that’s not, please let me know and I’ll include it here.
To create a tally, we can use:
- a loop (while),
- recursive CTE,
- cross joining system tables and
- cross joining constant expressions
Roughly said, first two are really bad choices, because of RBAR style processing.
What will happen there is that SQL Server engine is going to work on one a row at a time.
Let’s say we want to fill the following table with numbers 1..100 000:
IF OBJECT_ID('Tally') IS NOT NULL
DROP TABLE Tally;
CREATE TABLE Tally (N int NOT NULL);
ALTER TABLE Tally
ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N);
WHILE loop
If WHILE loop is used there will be no loop performance optimization, so it’s not the good choice:
SET NOCOUNT ON; -- we can save a few % if we remove messages from resultset
TRUNCATE TABLE Tally;
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
DECLARE @Counter int;
SET @Counter = 1;
WHILE @Counter <= 100000
BEGIN
INSERT INTO Tally(N)
VALUES (@Counter);
SET @Counter = @Counter + 1;
END
Notice the CHECKPOINT + DBCC DROPCLEANBUFFERS instructions. I use these to reach as clean testing environment as I can get. However, you might not get perfectly clean buffer cache (see this) but for these tests I believe it will do.
Execution plan shows the ASSIGN/COND pair that comes with WHILE loop, and is really slow.

And Management Studio says execution lasts 28 s, not a great result.
One would think that a FAST_FORWARD CURSOR variant would do better, and would be right, if we only could iterate something using that cursor. But the task IS yet to build something to iterate, so we can’t use a cursor.
Recursive CTE (loop)
Another possibility is recursive CTE:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
TRUNCATE TABLE Tally;
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
;WITH TallyCte(n) AS (
SELECT n = 1
UNION ALL
SELECT n = n + 1 FROM TallyCte WHERE n < 100000)
INSERT INTO Tally(N)
SELECT n
FROM TallyCte
ORDER BY n
OPTION(MAXRECURSION 0); -– do not limit the recursion depth; else you get the "maximum recursion 100" error
First thing I do here is turn on TIME and IO statistics; I couldn’t do that for the loop, because I would get a result for each pass, and there is 100 000 passes. Here there is roughly said just one pass (internally there are 100 000) and I do get good stats.
To explain the query, in anchor part (SELECT n = 1) we start numbering and recursive part (SELECT n = n + 1 FROM TallyCte WHERE n < 100000) steps up until it reaches the limit given by WHERE clause.
On my machine, it executes ~10x faster than first query (CPU time = 1903 ms, elapsed time = 2830 ms).
Let’s examine the interesting part of exec plan:

- (top right branch): What happens here is that anchor is created (Constant Scan, Compute Scalar) and sent to UNION ALL (Concatenation).
- (left branch): Then the row is sent to lazy spool, which builds index structure and outputs the same row to Table Spool on the right…
- (middle right branch): …which in turn sends that row to Compute (n=n+1) and joins…
- (bottom right branch): …with the recursive SELECT expression using a filter (WHERE n < 1000000)
Spooling happens on and on, until the recursive part exceeds and stops sending rows to the spool. And here is our loop.
So, it not exactly a RBAR, but can be described as a very optimized RBAR.
Cross joining system tables
This is actually a nice approach to the tally problem. Let’s see what we are talking about here:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
TRUNCATE TABLE Tally;
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
INSERT INTO Tally(N)
SELECT TOP 1000000
row_number() OVER(ORDER BY (SELECT 0)) AS N
FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2
Again, we measure the execution time and get (CPU time = 468 ms, elapsed time = 1305 ms). This is almost 3x faster than the recursive CTE.
Since we are using 100 000 rows, there is really no big difference. However, if we try it with 1 000 000 rows, the difference would be obvious. On my machine it was 27s (r.CTE) : 14s (cross join)
Now, we’re not actually joining system tables but system views – because of that, another interesting part of the execution plan is going to be visible, but the time will not be spent there:

Most of the time SQL server will be spending on sorting the resultset prior to inserting it to our tally table:

Cross joining constant expressions
The trick here is not to use any existing structure to create the tally table, but to create the basic element “on the fly” and use for the cross join to get an adequate number of rows.
Let’s see:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
TRUNCATE TABLE Tally;
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
;WITH E1(N) AS (
SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 1)
,E2(N) AS (SELECT 1 FROM E1 a, E1 b)
,E4(N) AS (SELECT 1 FROM E2 a, E2 b)
,E6(N) AS (SELECT 1 FROM E4 a, E2 b)
INSERT INTO Tally(N)
SELECT TOP 100000
N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E6
So, our structure is 10-based - actually it has exactly 10 rows. Every higher cross join is done in next CTE and named after 10-exponentials (e.g. E4=E2*E2, or E6=E4*E2).
The execution time here is pretty good for 100 000 rows (CPU time = 421 ms, elapsed time = 1273 ms), and same for 1 000 000 rows (13 s).
Execution plan will give us some parallelisms, so I would expect some possibility of performance gain in multiprocessor environment; however, to prove that I would have to run some tests.
In the next part there will be a short discussion about the location and form of the structure and the impact on the performance.