A slowly changing dimension (SCD) is one that appropriately manages the change of dimension members over time. It applies when the values of a business entity change over time, and not on a set schedule.
For example, a customer SCD will have a set of columns that can change and
a set of fields to track when the record was added and modified.
Types of SCD -
- Type 0 – Fixed Dimension
- No changes allowed, dimension never changes
- Type 1 – No History
- Update record directly, there is no record of historical values, only current state
- Type 2 – Row Versioning
- Track changes as version records with current flag & active dates and other metadata
- Type 3 – Previous Value column
- Track change to a specific attribute, add a column to show the previous value, which is updated as further changes occur
- Type 4 – History Table
- Show current value in dimension table but track all changes in separate table
- Type 6 – Hybrid SCD
- Utilise techniques from SCD Types 1, 2 and 3 to track change
Star schema design theory refers to common SCD types.
The most common are type 1 and type 2.
In practice, a dimension table may
support a combination of history tracking methods,
including type 3 and type 6. Confusingly, there is no SCD type 5 in commonly agreed definitions.
After you have implemented your chosen dimension type, you can then point your fact records at the relevant business or surrogate key. Surrogate keys in these examples relate to a specific historical version of the record, removing join complexity from later data structures.
A type 0 SCD the table remains the same. This means our existing reports will continue to show the same figures, maybe it is a business requirement that each customer is always allocated to the country they signed up from.
A type 1 SCD always reflects the latest values,
and when changes in source data are detected,
the dimension table data is overwritten.
This design approach is common for
columns that store supplementary values,
like the email address or phone number of a customer.
When a customer email address or phone number changes,
the dimension table updates
the customer row with the new values.
A type 2 SCD supports versioning of dimension members.
Often the source system doesn't store versions,
so the data warehouse load, process,
detects, and manages changes in a dimension table.
In this case, the dimension table must use
a surrogate key to provide
a unique reference to a version of the dimension member.
It also includes columns that to find
the date range validity of the version, for example,
StartDate and EndDate,
and possibly a flag column,
for example, IsCurrent,
to easily filter by current dimension members.
In order to support type 2 changes, we need to add four columns to our table:
· Surrogate Key – the original ID will no longer be sufficient to identify the specific record we require, we therefore need to create a new ID that the fact records can join to specifically.
· Current Flag – A quick method of returning only the current version of each record
· Start Date – The date from which the specific historical version is active
· End Date – The date to which the specific historical version record is active
A type 3 SCD supports storing two versions of a dimension member as separate columns.
The table includes a column for the current value of
a member plus
either the original or previous value of the member.
Type 3 uses additional columns, in this case,
current email to track one key instance of
history rather than storing
additional rows to track each change,
like in a type 2 SCD.
This type of tracking may be used for
one or two columns in a dimension table.
It is not common to use it
for many members of the same table.
It is often used in combination
with type 1 or type 2 members.
A type 4 SCD , has no change to existing table here, we simply update the record as if a Type 1 change had occurred. However, we simultaneously maintain a history table to keep track of these changes:
Our Dimension table reads:

Whilst Type 4 historical table is created as:

Depending on requirements, place both ID and Surrogate Key onto the fact record so as to optimise performance whilst maintaining functionality.
Separating the historical data makesthe dimensions smaller and therefore reduces complexity and improves performance if the majority of uses only need the current value.
However, if you do require historical values, this structure adds complexity and data redundancy overheads. It is generally assumed that the system will use Type 1 or Type 2 rather than Type 4.
A type 6 SCD combines type 1, 2, and 3.
When a change happens to a type 2 member,
you create a new row with
appropriate StartDate and EndDate.
In type 6 design,
you also store the current value
in all versions of that entity,
so you can easily report on
the current value or the historical value.
Using the sales region example,
you split the region column into
current region and historical region.
The current region always shows the latest value and
the historical region shows the region that
was valid between the StartDate and EndDate.
So for the same salesperson,
every record would have
the latest region populated in current region while
historical region works exactly like
the region field in the type 2 SCD example.
Comments
Post a Comment