Monday, October 10, 2011

Entity Relational Modeling vs. Dimensional Modeling

Surrogate Keys
  • A substitution for the natural primary key.
  • It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
  • Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Oracle sequence, or SQL Server Identity values for the surrogate key.
  • It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
  • In a data warehouse, a surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design.
  • Every join between dimension tables and fact tables in a data warehouse environment should be based on surrogate keys, not natural keys.
  • It is up to the data extract logic to systematically look up and replace every incoming natural key with a data warehouse surrogate key each time either a dimension record or a fact record is brought into the data warehouse environment.
  • One of the most important use of surrogate Key the need to encode uncertain knowledge (When you have a I don't know" situation, you may want more than just this one special key for the anonymous customer )


ER to Dimensional modeling

Single Entity Relationship – breaks down into multiple Facts
  1. Convert ER to set of dimensional modeling
  2. Select many-to-many facts
  3. De-normalize remaining tables

No comments:

Post a Comment