Data Warehouse – Let’s Build it!!! – Part 1

In previous posts, we come across –

What a Data Warehouse is?
What a Data Warehouse is not?
Why it is necessary?
What needs to be considered while loading data into Data Warehouse

Now, from this post onwards we move forward in designing/building a Data Warehouse.
The initial points that we need to consider while building a Data Warehouse are –

*Keep it simple 
*Fast data access
*Inputs to Data Warehouse(from where Data Warehouse got it's data)

But, how can you achieve this? – Does the normal Relational (two-dimensional) model is suitable for this kind of environment? Answer is NO. Of course this works well with Transactional systems, where Normalized tables are used to reduce the redundancy and require lots of table joins so as to retrieve the necessary data. Analytical queries against the Normalized tables will be very slow.

So, a Normal two-dimensional (rows, columns) way of storing data doesn’t work for a Data Warehouse. This means, data should be stored in Multi-dimensional form.

Eventough Dimensional model is different from its counterpart i.e. Relational model; it too consists of tables, rows, columns etc., and nothing more than that. The only difference is, it stores the data in different kinds of tables like Fact table, Dimension table etc., and reference each other just like in Relational model. This is the Physical form(model), that users use to access data warehouse data. But for building/modelling a data warehouse we should not just limit to final organization of the data. But as mentioned earlier we also need to consider from where data warehouse gets its data from. In normal operational database, data is directly fed into database by means of User Inputs. Whereas for Data Warehouse it is not the case, data is fed from various sources like Relational databases, non-relational databases, Flat files etc., Make note that we cannot directly load from source database to data warehouse database, as their designs are fundamentally different. So, a mapping is necessary between Source databases and data warehouse database.

What you infer from the above is that – A complete Data Warehouse Process is divided into 3 stages –

In the Extraction phase, it is required to identify from where and 
from what sources Data Warehouse gets its data. Once identified the 
sources, it is necessary to analyze suitability of the data with the 
target(i.e Data Warehouse). If the Source and the target are in the 
same format(design), then  extraction will be straight-forward 
(a simple Export will do). Otherwise, additional analysis like
Data Profiling (process of checking the suitability of the data, 
whether it meets expected pattern or not.)is used before being 
loaded into the data warehouse. 

In the Transformation Phase, necessary modifications needs to be 
performed on the extracted Source Data, so as to meet the business 
and technical needs of the target database. This phase also used as 
STAGING AREA for cleansing and transform of the extracted data.

In the Loading  Phase, once the data that is extracted and 
transformed into required format, it is directly loaded into the 
Data Warehouse.

During the entire Data Warehouse process, we will be taking numerous decisions through out all the phases, that will impact the end result. For example, in the first phase, we decide on which business elements are required from the source databases and how they related with each other and the model(ER model) they used will also impact the final organization of data in data warehouse.

So Keep in mind that while designing /building a data warehouse, we should take entire Data Warehouse process into consideration, but not just limit to the Data Warehouse itself.

Thus conclude this here and move forward in the next post with the physical model of data warehouse.

Advertisements
This entry was posted in Data Warehouse. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s