Data Warehouse – Let’s Build it!!! – Part4

Now its time to discuss about SNOWFLAKE Schema, a type of star schema. It is called so because the diagram of the schema resembles a snowflake. Just like in Star Schema, Snowflake schema is represented by centralized fact tables which are connected to multiple dimensions, such that the ER diagram resembles a snowflake in shape. The only difference is that, Snowflake schema further divides the Dimension tables into sub-dimensional tables that represents a Dimensional Hierarchy. So, Snowflake encourages normalization of Dimension tables. In simple, it can be termed as normalized STAR schema.

In a Snowflake Schema, each dimension breaks down into multiple tables in a hierarchial fashion. As a result, all the low level dimension tables are tied to next higher level dimension by means of a natural key, whereas the highest level dimension tied to the FACT table by a Surrogate/Primary KEY.

The main reason behind normalizing dimensions is to eliminate redundancy. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. So, the more you normalize the dimension table, the more complicated SQL joins must be issued to query them.

Now our concern is about which one needs to be implemented in a particular environment – We cannot just decide which is better over the other. There are lot of factors that are to be considered to know which one to be used for a particular environment. Lets see some of those factors –

1.) Oracle recommends you choose a star schema over a 
    snowflake schema unless you have a clear reason not
    to. 

2.) It also depends on what the query tools being used in
    an environment.

3.) When a Dimensional table is very large  with number of 
    attributes, where it's been difficult for star schema 
    to represent data structure, then the better choice is
    using a SNOWFLAKE.

4.) If dimension table contains millions of rows, the use of 
    snowflake schemas should significantly improve performance 
    by moving out some data to other table. 

5.) General rule of thumb is keep away from snowflake schemas 
    as even though it may save you some space, but it costs 
    a lot in terms of query times.

Finally, we need to determine one thing .. Why we are building a Data Warehouse and for whom its been build for? Yes .. you are right!!! For Analysis purpose and for the Business users. I think this will enough to answer for the query which schema to use? That is the reason why Oracle itself recommended to go with STAR, until and unless you have specific reason for SNOWFLAKE. Even performance might not be a factor to choose between STAR and SNOWFLAKE(I mean it . It should be considerable, not big difference) and other factors should be given more priority.

So we conclude this post here and will discuss another topic in the next post.

Advertisements
Posted in Data Warehouse | Leave a comment