Dimension is a term in data management and data warehousing that refers to logical groupings of data such as geographical location, customer information, or product information.
Slowly Changing Dimensions (SCDs) are dimensions that have data that changes slowly, rather than changing on a time-based, regular schedule.
The Slowly Changing Dimension transformation coordinates the updating and inserting of records in data warehouse dimension tables.
The Slowly Changing Dimension transformation provides the following functionality for managing slowly changing dimensions:
Matching incoming rows with rows in the lookup table to identify new and existing rows.
Identifying incoming rows that contain changes when changes are not permitted.
Identifying inferred member records that require updating.
Identifying incoming rows that contain historical changes that require insertion of new records and the updating of expired records.
Detecting incoming rows that contain changes that require the updating of existing records, including expired ones.
The Slowly Changing Dimension transformation supports four types of changes: changing attribute, historical attribute, fixed attribute, and inferred member.
Changing attribute changes overwrite existing records. This kind of change is equivalent to a Type 1 change. The Slowly Changing Dimension transformation directs these rows to an output named Changing Attributes Updates Output.
Historical attribute changes create new records instead of updating existing ones. The only change that is permitted in an existing record is an update to a column that indicates whether the record is current or expired. This kind of change is equivalent to a Type 2 change. The Slowly Changing Dimension transformation directs these rows to two outputs: Historical Attribute Inserts Output and New Output.
Fixed attribute changes indicate the column value must not change. The Slowly Changing Dimension transformation detects changes and can direct the rows with changes to an output named Fixed Attribute Output.
Inferred member indicates that the row is an inferred member record in the dimension table. An inferred member exists when a fact table references a dimension member that is not yet loaded. A minimal inferred-member record is created in anticipation of relevant dimension data, which is provided in a subsequent loading of the dimension data.The Slowly Changing Dimension transformation directs these rows to an output named Inferred Member Updates. When data for the inferred member is loaded, you can update the existing record rather than create a new one.
EXAMPLE WITH DETAIL EXPLATNATION TO UNDESTAND THE CONCEPT
Please consider following example.............
Yogesh is a customer with ABC Inc. he first lived in Mumbai,Maharastra. So, the original entry in the customer lookup table has the following record:
|
Customer Key
|
Name
|
State
|
|
1001
|
Yogesh
|
Mumbai,Maharastra
|
At a later date, he moved to Pune,Maharastra on January, 2003. How should ABC Inc. now modify its customer table to reflect this change? This is the "Slowly Changing Dimension" problem.
There are in general three ways to solve this type of problem, and they are categorized as follows:
Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Type 3: The original record is modified to reflect the change.
We next take a look at each of the scenarios and how the data model and the data looks like for each of them. Finally, we compare and contrast among the three alternatives.
In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.
In our example, recall we originally have the following table:
|
Customer Key |
Name |
State |
|
1001 |
Yogesh |
Mumbai,Maharastra |
After Christina moved from Illinois to California, the new information replaces the new record, and we have the following table:
|
Customer Key |
Name |
State |
|
1001 |
Yogesh |
Pune,Maharastra |
Advantages:
- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.
Disadvantages:
- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Yogesh lived in Mumbai,Maharastra before.
Usage:
About 50% of the time.
When to use Type 1:
Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The newe record gets its own primary key.
In our example, recall we originally have the following table:
|
Customer Key |
Name |
State |
|
1001 |
Yogesh |
Mumbai,Maharastra |
After Yogesh moved from Mumbai to Pune, we add the new information as
a new row into the table:
|
Customer Key |
Name |
State |
|
1001 |
Yogesh |
Mumbai,Maharastra |
|
1005 |
Yogesh |
Pune,Maharastra |
Advantages:
- This allows us to accurately keep all historical information.
Disadvantages:
1) This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
2)This necessarily complicates the ETL process.
Usage: About 50% of the time.
When to use Type 2:
Type 2 slowly changing dimension should be used when it is necessary for the
data warehouse to track historical changes.
In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value.
There will also be a column that indicates when the current value becomes active.
In our example, recall we originally have the following table:
|
Customer Key |
Name |
State |
|
1001 |
Yogesh |
Mumbai,Maharastra |
To accommodate Type 3 Slowly Changing Dimension, we will now have the following
columns:
- Customer Key
- Name
- Original State
- Current State
- Effective Date
After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):
|
Customer Key |
Name |
Original State |
Current State |
Effective Date |
|
1001 |
Yogesh |
Mumbai,Maharastra |
Pune,Maharastra |
15-JAN-2003 |
Advantages:
1) This does not increase the size of the table, since new information is
updated.
2)This allows us to keep some part of history.
Disadvantages:
1) Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Yogesh later moves to Delhi on December 15, 2003, the Pune information will be lost.
Usage:Type 3 is rarely used in actual practice.
When to use Type 3:
Type III slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur for a finite number of time.
Hope everyone understood SDC concept from above example..............
(Type 6) Hybrid SCDs are combination of All SCD Types.
It may happen that in a table, some columns are important and we need to track changes for them i.e capture the historical data for them whereas in some columns even if the data changes, we don't care.
For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.
This is one of grt Example for SCD Please Click:: http://www.learndatamodeling.com/sldim.htm
Refrence Link :: http://msdn.microsoft.com/en-us/library/ms141715.aspx
Genral Information from WIKI:: http://en.wikipedia.org/wiki/Slowlychanging_dimension
PDF :: http://www.scribd.com/doc/58598142/130/Hybrid-Slowly-Changing-Dimensions
Thanks
Yogesh