-
Even I ask the same question during interview.
Primary Key and Clustered Index are two different things. Developers get confused because they usually misunderstand the following line "On a table when a Primary Key defined by default SQL Server creates a Unique Clustered Index on those column(s) where primary key is defined"
As we know that we can't have 2 clustered index on a table, When we create a Primary key where a Clustered index is already exist in that case , SQL Server creates a Unique Non Clustered Index.
Sometime I ask a different version of same topic question? i.e. What will happen if I wanted to create a Primary on a column on which already a clustered index exist?
Thanks & Regards,
Sandip Pani
commented on Aug 23 2012 1:00AM
|
-
Thank-you for sharing your views, Sandip. I do like the nice twist to the question. Thank-you for taking the time out and reading this post.
commented on Aug 23 2012 1:24AM
|
-
However there is a difference. When you create a table with column with Primary key defined, by default the column which is defined as Primary key will have clustered index. Now see what happens when you create this table
CREATE TABLE pkTest (Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
DummyId INT NOT NULL,
KeyName VARCHAR(20)
)
Column Id is automatically associated with a clustered index. Your case is valid only when you alter the table to have primary key constraint
commented on Aug 23 2012 6:03AM
|
-
@Madhivanan,
You are right,What you mentioned here is the default behavior.
Kindly refer the Conclusion section, Here It is mentioned that "Having a primary key does not imply that clustered index will exist on the primary key under all situations".
Thanks & Regards,
Sandip Pani
link text
commented on Aug 23 2012 6:21AM
|
-
@Madhivanan try this:
CREATE TABLE pkTest
(
[id] [int],
[DummyId] [int],
[KeyName] [varchar](20),
CONSTRAINT [PK__pkTest__Id] PRIMARY KEY NONCLUSTERED ( [id] )
)
There is no need for an ALTER TABLE, you can create a non-clustered primary key directly in the CREATE TABLE statement.
commented on Aug 23 2012 8:38AM
|
-
Here's the real question: Why have a clustered index on a column or columns OTHER than the Primary Key?
commented on Aug 23 2012 8:39AM
|
-
A clustered index can be defined on
any column of a table as long as the
column satisfies the criteria for a
clustered index
What exactly are the critieria for clustered indexes?
commented on Aug 23 2012 1:56PM
|
-
I have posted similar stuff in 2006 @ http://vadivel.blogspot.in/2006/03/primary-keys-without-clustered-index.html ...
commented on Aug 23 2012 9:32PM
|
-
Thank-you for sharing @Vadivel!
commented on Aug 24 2012 12:38PM
|
-
commented on Aug 25 2012 9:15AM
|
-
@Nakul, there are no "criteria" listed, only recommendations or situations where a clustered index should be considered.
commented on Aug 25 2012 9:30AM
|
-
@Marc
Here's the real question: Why have a
clustered index on a column or columns
OTHER than the Primary Key?
Well, for example, because primary key is multicolumn and large, so you risk inefficient INSERTs, many page splits, and having large, inefficient clustering key in every nonclustering index on the table. Acquaint yourself with the concept of surrogate keys.
commented on Aug 27 2012 2:10AM
|
-
@chojrak: After working with SQL Server for the past 18 years, including 5 years at Microsoft, I'm very familiar with the concepts (and worst practices) around primary keys, candidate keys and surrogate keys. I've spent a good portion of my career improving the performance of people who blindly create an [int] IDENTITY(1, 1) based prirmary key on every table
If your primary key is clustered, the risks you list are valid. If the primary key is nonclustered, none of the risks you list are valid.
Create your clustered index to satisfy your most common sorting requirements. Create your primary key to satisfy your most relevant uniqueness constraints. If they are the same column (or columns), use a clusterer primary key. If they are not, the most common sorting requirement with the least volitile data should be the clustered index.
commented on Aug 27 2012 8:52AM
|
-
@Marc
thus you just answered your very question. Offtopic - if possible, I'd like to know more about problems with those "[int] IDENTITY(1, 1) based prirmary key on every table". I have to confess that, although not blindly, I use them from time to time, and until now successfully :-)
commented on Aug 27 2012 9:49AM
|
-
@Chojrak11: my question was rhetorical... a question designed to get the audience to think. Here are some more rhetorical questions:
Problems with primary keys based on [int] IDENTITY(1,1)
Lets's say that you have a customer list, which is most often sorted by name.
Ask yourself:
A) What does a sorting operation on [CustomerName] look like when the primary key is clustered and based on [CustomerId]?
B) What does a sorting operation on [CustomerName] look like when the primary key is nonclustered and based on [CustomerID] and there is a clustered index on [CustomerName]?
Both A and B will work. Which will work faster?
Which will work faster when retrieving records based on a customer name (a lookup)?
Now let's say that you have a link table that links Vendors to the Products they sell. Products can be provided by multiple Vendors and Vendors sell multiple Products.
CREATE TABLE [Products]
(
[ProductId] [int] IDENTITY (1, 1),
[Description] [varchar](max)
)
CREATE TABLE [Vendors]
(
[VendorId] [int] IDENTITY (1,1),
[Name] [varchar](100)
)
CREATE TABLE [lnk_Vendors_Products]
(
[recordId] [int] IDENTITY(1,1),
[fkVendorId] [int],
[fkProductId] [int],
[StartDate] [date], -- date vendor starting offering product
[EndDate] [date] -- date vendor stopped offering product
)
Is it going to be best to create a clustered primary key on [lnk_Vendors_Products].[recordId] with a uniqueness constraint on ([fkVendorId], [fkProductId]) or just have a clustered primary key directly on ([fkVendorId], [fkProductId])?
In order to keep key proliferation to a minimum (any children of [lnk_Vendors_Products], should reference the primary key [recordId]. But the clustered index should go on ([fkVendorId], [fkProductId]). This allows children to reference only the [recordId], but allow for fast retrieval based on [fkVendorId].
Note: if you are searching for Products more often than Vendors, the clustered index should go on [fkProductId], [fkVendorId]; rather than the other way around.
commented on Aug 27 2012 10:47AM
|
-
@Marc Jellinek: Thank-you very much for the questions. They do get me to think hard and think more - which is always exciting. Also, thank-you very much for sharing some of the nuances around key design.
Regarding your note:
Note: if you are searching for
Products more often than Vendors, the
clustered index should go on
[fkProductId], [fkVendorId]; rather
than the other way around.
I am sure you remember, but for the benefit of others I would like to refer a previous post here which would highlight one of the many reasons behind the note: SQL Server – T-SQL - Best Practice - Composite Primary Keys – Performance (and accuracy) impact
This has been a very, very interesting discussion. Thank-you, all.
commented on Aug 27 2012 12:26PM
|
-
@Nakul, I'm not sure I understand the reference to the blog post about composite primary keys. For that matter, I don't understand what best practice it recommends (other than referencing ALL columns in a primary key when making a join)
commented on Aug 27 2012 1:20PM
|
-
@Marc: Yes, referencing ALL columns of the composite key is definitely a take-away and the other (I agree, it's not so obvious) point was that switching the order of the columns does have an impact on the way SQL Server would use the index defined on the key.
commented on Aug 27 2012 2:03PM
|
-
@Marc thank you for detailed answer. A lot of additional rhetorical questions in it :-)
So, this is an additional thing to consider while designing a table. In case of clustered keys on most often sorted columns that aren't ever-increasing on INSERT, we gain performance on /some/ queries, but one must be sure that amount of INSERTs (and associated splits) will not kill overall performance and not cause locks on high concurrency. Some performance can be gained by properly selecting fill factor, but then we increasing # of I/Os for reads. That said such a key may possibly be better for larger tables that are mostly read and sometimes updated, but for smaller transactional tables heavily INSERTed/DELETEd I'd use small, monotonic clustering key. But as always in SQL world, it'd be best to try and measure.
I'll do my homework and analyze this subject matter thoroughly.
Thanks again, I'd upvote you more times, but it doesn't let me.
commented on Aug 28 2012 1:37AM
|
-
Returning to the orignal theme of "default behaviour" for a moment... Adding a primary key to a table which is already clustered creates a new non-clustered
index by default.
-- The NONCLUSTERED keyword is optional in the DDL statement:
ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY /* NONCLUSTERED */ (Id)
GO
This is true even when the primary key is added to the clustering key:
-- A redundant nonclustered index is created on the clustering key, DummyId:
ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY /* NONCLUSTERED */ (DummyId)
GO
As you'd expect, you can't "promote" an already-existing clustering key to a primary key without dropping it first. eg:
ALTER TABLE pkTest
ADD CONSTRAINT pk_pkTest_pkTestId PRIMARY KEY CLUSTERED (DummyId)
GO
Fails with error:
Msg 1902, Level 16, State 3, Line 3
Cannot create more than one clustered index on table 'pkTest'. Drop the existing clustered index 'idx_pkTest_DummyId' before creating another.
Msg 1750, Level 16, State 0, Line 3
Could not create constraint. See previous errors.
commented on Aug 28 2012 3:57AM
|