This post is the starting point for a long series of posts on the Data Warehouse.
Let’s start with the basics: what is a Data Warehouse?
A Data Warehouse is a repository that holds far more data than the related operational database(s), and it can reach terabytes in size depending on how much history needs to be kept. It is not synchronized in real time with the associated operational data; instead it is updated periodically, as often as once a day.
This raises a question:
Why do we need another database (the DWH) when an operational database already exists?
The reasons below describe why a DWH is necessary:
• A normal operational database does not hold historical data => an operational database contains only current data, and current data alone is not sufficient for making crucial business decisions.
• Separation of the normal customers of the business from the Business Users => a normal customer uses the operational database for his/her transactions and retrieves or modifies only his/her own data, whereas Business Users require all the available data in order to make strategic decisions. So an operational database cannot provide sufficient data for the Business Users.
• The business must be seen as a whole => in order to make strategic decisions, Business Users must have information about the business in its entirety. They cannot make decisions based on the information available from a single source, yet each department in a business maintains its records in its own way. For example, the Sales department might use an Oracle database for its records while the Accounts department uses Informix. Gathering information from these various sources is a time-consuming process for the Business Users, who are also not necessarily tech-savvy enough to obtain it from every source themselves. So there is a need for a centralized repository that contains data from all the sources and that Business Users can query easily to access the data they require.
This explains the necessity of a centralized repository that stores the data of the entire enterprise and can be used by the Business Users for making strategic decisions.
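To make the idea a little more concrete, here is a minimal sketch of a centralized repository that collects data from two departmental sources and keeps dated history. Everything in it is an assumption made for illustration: the table names (orders, payments, fact_customer_daily), the sample rows, the dates and the load_snapshot helper are hypothetical, and SQLite stands in for the Oracle and Informix systems mentioned above only so that the sketch runs anywhere. A real DWH load is considerably more involved.

```python
import sqlite3
from datetime import date


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database standing in for one department's system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical operational sources: each one holds only current data.
sales_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Acme", 120.0), (2, "Globex", 75.5)],
)
accounts_db = make_source(
    "CREATE TABLE payments (customer TEXT, paid REAL)",
    "INSERT INTO payments VALUES (?, ?)",
    [("Acme", 100.0), ("Globex", 75.5)],
)

# Hypothetical warehouse table: one row per customer per load date.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_customer_daily ("
    "load_date TEXT, customer TEXT, ordered REAL, paid REAL)"
)


def load_snapshot(load_date):
    """Append one dated snapshot; earlier snapshots are kept as history."""
    ordered = dict(sales_db.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
    paid = dict(accounts_db.execute(
        "SELECT customer, SUM(paid) FROM payments GROUP BY customer"))
    rows = [
        (load_date.isoformat(), c, ordered.get(c, 0.0), paid.get(c, 0.0))
        for c in sorted(set(ordered) | set(paid))
    ]
    warehouse.executemany(
        "INSERT INTO fact_customer_daily VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()
    return len(rows)


# Two daily loads, with a new order arriving in between.
load_snapshot(date(2024, 1, 1))
sales_db.execute("INSERT INTO orders VALUES (3, 'Acme', 30.0)")
load_snapshot(date(2024, 1, 2))

# One query over the whole business and its history.
outstanding = warehouse.execute(
    "SELECT load_date, customer, ordered - paid "
    "FROM fact_customer_daily ORDER BY load_date, customer"
).fetchall()
for row in outstanding:
    print(row)
```

The point of the sketch is that a Business User only has to query one place, and that the first day's figures are still available after the second load, which neither source system could offer on its own.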
Here ends our introduction to the Data Warehouse. In the next posts, let’s see how a Data Warehouse is designed and built, how data is loaded into the DWH from various sources, and what tools are available to perform those tasks.