Choosing a Natural key or Surrogate key depends on These below factors.
If you have a column which have below characterstics, we can choose that column as natural key.
- Its Unique
- Its an Integer or BigInt
- Its never going to update.
If the column doesn't have above characterstics, I will choose Surrogate key
These are pros and cons of Surrogate key:
- No Relation with Business Logic and
hidden from User.
- Fast in joins as 4 bytes(in case of
Int) or 8 bytes(in case of big int)
- Less storage space when the table is
related with multiple tables, as
every related table store the value.
1.It will leads to some additional join in some scenarios. assume, you have a table Customers(SSN, CustID int ). CustId is choosen as Primary key,assume another table CustomerExpenses, you will be relating CustID with this table.
So, now if we need a query "Give me SSN Numbers having Customer Expenses", We need to join the 2 tables.
If we choose "SSN" as key, We don't need an additional join here, as SSN is related with Expenses table instead of CustID field.
2.An additional index is required, as the Primary key always creates an index.
3.Cannot be used in Search
4.As Surrogate key choosen, there won't be any index on "SSN" field. IF we create an additional index on "SSN" field,it will take additional storage and slows inserts/updates/deletes
Pros and Cons of Natural Key
- No additional index is required.
- Can be used in Search.
- If for some reason, business
requirements changes, needs to do
updates in multiple tables.
- In case, if it is having more
length, it will take so much
additional storage, as it will be
stored in its related tables.
- If it is string, joins will become
Replied on Oct 7 2010 1:13AM
basically this is "IT DEPENDS" situation but in most cases I would like to go for "Surrogate" key over "Natural" key but this is my own choice and there could be endless debate on this topic as both has their own Pro and Cons. As my choice is Surrogate key, I would always, in most cases, prefer
IDENTITY in SQL Server. It is a small column and it is useful for data warehousing (unlike natural keys), guarantees uniqueness (only if you're careful!), and is completely automated by the system. however there could be debate on this too. :)
A surrogate key is basically making row unique which has no connection to the data and we can consider this attribute as one of the cons also because it has no connection to the data. we can have two rows with the exact same data in all columns except the key column. This is usually handled at the application side and is an acceptable downside.
A natural key is a row identifier composed of data that uniquely describes data using its own attributes. PAN Number, SSN Number, Passport number could be natural key.
- Totally separate stuff and end user
doesn't need to know it or change
- optimal join / index performance
- easy to reference as it is light
- highly useful for data warehousing
- native data type of the OS, and easy
to work with in all languages
- generated automatically and not
- An additional index is needed.
- nobody can assume it, only retrieved after insert.
- Generally search happens over fields
related to data not on surrogate
keys which has no relation to data.
- Always requires a join when browsing
the child table so you have to have
additional join in many cases.
- No additional Index.
- Can be used as a search key.
- Some of the natural key's value gets
change over the period of time (not
always) so needs to update multiple
tables and indexes
- Most of the Natural keys are not
numeric by default so slower join
- Locking contentions can arise if
using application driven generation
mechanism for the key.
BTW, I use surrogate keys, normally integer identity as I told you above too. Though at some places I would like to go for natural key for eg in country table where I would like to go for ISO standard codes but it might change also.
So, in short, this is all about "DEPENDS". :)
Replied on Oct 7 2010 2:32AM
Natural Keys should be used when they exists (which is rare) else it is surrogate.
Edit: On a data-warehouse only surrogates
Replied on Oct 7 2010 2:51AM
Surrogate Key: A key with no business meaning. This will just make the row unique but no connection to the given data. Ex: When you set an identity column, normally its no way related to data but still has unique incremental value for each row.
Dataware houses use this artificial or identity key for the tables primary keys. In ORACLE it is ORACLE Sequence . In SQLServer its identity column. Indexing on Surrogate key is easy. Normally it is used when that column is not changing or not getting updated. They can be implemented using UUID (Universally lly Unique IDentifiers), GUID (Globally Unique IDentifiers), (Persistent Object ID entifiers) POID or an identity value.
Natural Key: A key that is formed using attributes that already exists in the real world. Ex: SSN for US Citizens. PAN Number, Pass Port Number as they are always unique and exist in real time. Natural keys are by default a subject to change which is a disadvantage for a row identifier.
Most databases a row identifier is basis for the clustered index and non-clustered indexes. But natural keys are subject to change. When the clustered index key is changed ALL indexes have to be rebuilt since non-clustered indexes contain the full key of the clustered index. So every time the natural key, which is also a clustered index changes, all indexes have to be rebuilt. And this is not including changing the actual data type or it’s size, jut the key value.
Tip: Never use any business logic to the surrogate key other than simple CRUD operations.
When to use what?
Well. It depends on the context and the data. Depending on the data , we need to examine these questions for a column
Does the primary key is unique? - Yes
Does it apply to all rows? -Yes
Is it minimal? -Yes
Is it stable over time without changing? -Yes.
Then I will go for a natural key. If the key do not meet any of the above criteria then its better to use surrogate key.
I normally prefer using surrogate keys as they have broader perspective and number of advantages compared to natural keys depending on the context.
There is nice article on SqlServer magazine regarding this http://www.sqlmag.com/article/data-modeling/surrogate-key-vs-natural-key.aspx
Replied on Oct 7 2010 3:52AM
When interpreting your question as "When will you use a Surrogate Key and when will you use a Natural Key as a primary key?" I come to this answer:
"When in doubt use a Surrogate Key!"
Most of the Natural Keys do not fulfill the following requirements:
- not nullable
The first two requirements are important for performance (primary key is used as reference through foreign keys and in all nonclustered indices; depending on the size of the Natural Key the extra amount of space in the physically table and the extra index for an int/bigint-column will be very few in comparision with the space you are saving with every further index and every foreign key).
Uniqueness is required technically for primary keys and is not fulfilled in DWH-tables which consisting of slowly changing dimensions.
Even when all of the requirements are fulfilled, my experince tells me that there is never a guarantee that the business user will not change the Natural Key one time in the future. If this happens you get in troubles if you have used the Natural Key as a primary key (or got at least a lot of work to do in changing the primary key in the table, changing the foreign key in all other tables and maybe even correcting all queries). Even with above mentioned currencies I've participated in debates between business users if ISO-standard is a good enough standard for their requirements or changing from ISO-currency to their own currency-codes ... (fortunately I had an Surrogate Key on currency table and could stay calm and relaxed).
But even when there is a Surrogate Key I would recommend to create an unique index on the "Natural Key"-columns in the table in general.
When interpreting your question as "When will you use a Surrogate Key and when will you use a Natural Key for joining or in where/having clauses?" I come to this answer:
I would always use the primary key (either Surrogate Key or Natural Key; s. above) for joining within the same database and always use the Natural Key for where/having clauses and for joining throughout database borders.
Replied on Oct 7 2010 4:07AM
The following questions need all to be answered with "Yes" in order to use a natural key. As soon as there is one "No", a surrogate key should be considered.
And additionally: The natural key should be ascending or descending to avoid page splits.
A second source loks at it vice versa:
Create a Surrogate if
From my point of view, the anwser is "It depends." But a natural key always has to face the risk of a sudden change in terms of stability over time (for example: tax number in Serbia changed).
Therefore, most of the time a surrogate key is more appropriate.
Replied on Oct 7 2010 4:58AM
I’m a proponent of the use of surrogate keys in general because natural keys usually do not be confirmed to the rules of Primary Key.
In practical life we try to use natural keys but usually we can't (even though in specific cases we do use natural key. Many-to-many break up table, Look-up tables and configuration tables are examples of those cases) and that can be of many reasons like:
- The natural key value may be changed
by the user.
- The natural key is a composite key
(I will not discuss why composite
key may affects join performance, or
why if the PK is the clustered index
then it should not be on composite
key. That is another story).
- The uniqueness of the natural key is
not guaranteed (I do not mean the
user opinion because I have got into
many situations where the user
changes his mind).
To sum up the natural key and the surrogate key both exists and used. But the surrogate key is the winner in most cases, and if you are not aware of the details of choosing PK then you should opt to use surrogate key (do not forget to but unique constraint on the natural key).
Replied on Oct 7 2010 8:11AM
I will not go into the basic explanation but will address this issue with performance point of view.
Surrogate keys are artificial keys.
Natural keys are logical meaningful keys to the table.
Use natural keys when they are small in width which make sense in the building index on it. If natural key is too large it will build large width index - an index with large width will have lots of depth level - which will eventually lead to reduce the performance. You can read more about this over here: http://blog.sqlauthority.com/2010/07/04/sql-server-index-levels-page-count-record-count-and-dmv-%C2%A0sys-dmdbindexphysical_stats/
Additionally, there is a limit regarding how many columns one can have in the index level and the wide of the column. There are cases when natural keys are very large and are not the right fit for the index, this are the scenario when I will go for surrogate keys.
Replied on Oct 7 2010 9:50AM
use Surrogate keys if you know that there may be chance of change in your Composite primary key.As if there is change in your Key then you have to change foreign key reference in your whole database).
use natual key If you know that Your composite primary will not be violate in future.
Replied on Oct 7 2010 10:07AM
In fact, both should be used – Surrogate key as primary key with clustered index and natural key as unique key with non-clustered index. If the natural key is small and does not undergo updates and if the table is reasonably small with limited non-clustered index, we may go with natural key as primary key with clustered index otherwise, surrogate key is recommended which is created just for SQL performance and space, and which should not be exposed to business use at all.
Replied on Oct 7 2010 10:54AM