The data warehouse is one of the most important business intelligence tools a business needs to have. It turns the massive amount of data generated from multiple sources into a format that is easy to understand.
Discuss the data warehouse concept.
Data warehousing is a technique for collecting and managing data
from various sources to provide meaningful business insights. A
data warehouse generalizes and consolidates data in
multidimensional space.
A data warehouse system is sometimes also referred to as a Decision
Support System, Executive Information System, Management
Information System, Business Intelligence Solution, or Analytic
Application.
The construction of data warehouses involves data cleaning, data
integration, and data transformation, and can be viewed as an
important preprocessing step for data mining.
Data warehouse systems allow for the integration of a variety of
application systems. They support information processing by
providing a solid platform of consolidated historical data for
analysis.
A data warehouse stores the information an enterprise needs to make
strategic decisions.
Data Warehouse Models:
1. Enterprise warehouse: An enterprise warehouse collects all of
the information about subjects spanning the entire organization.
2. Data mart: A data mart contains a subset of enterprise data that
is of importance to a particular group of users. Its scope is
confined to specific selected subjects.
3. Virtual warehouse: A virtual warehouse is a set of views over
operational databases, defined according to user needs.
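To illustrate the virtual warehouse idea, here is a minimal sketch
using Python's built-in sqlite3 module. The orders table, its
columns, and the sales_by_region view are hypothetical names chosen
for this example; an in-memory database stands in for a real
operational system.

```python
import sqlite3

# In-memory database standing in for an operational system (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("north", 80.0), ("south", 50.0)],
)

# The "virtual warehouse" here is just a view: no data is copied, and the
# summary is computed from the operational table each time it is queried.
conn.execute(
    """
    CREATE VIEW sales_by_region AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS order_count
    FROM orders
    GROUP BY region
    """
)

rows = conn.execute(
    "SELECT region, total_amount, order_count FROM sales_by_region ORDER BY region"
).fetchall()
print(rows)
```

Because the view holds no data of its own, it is cheap to set up,
but every query against it runs on the operational database.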
The data warehousing process involves:
1. Data extraction: collecting raw data from various internal or
external sources.
2. Data cleaning: detecting and correcting errors, outliers,
missing values, etc. in the raw data.
3. Data transformation: converting the data from its source format
into the form required by the business.
4. Data loading: sorting, summarizing, and consolidating the data
as it is loaded into the warehouse.
5. Data refreshing: propagating updates from the sources to the
warehouse at regular intervals.
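The steps above can be sketched as a small pipeline. This is only
an illustration under simple assumptions: the source records, the
field names (customer, amount), and the functions extract, clean,
transform, and load are all hypothetical, and a plain dictionary
stands in for the warehouse.

```python
# Hypothetical raw records from two sources; None marks a missing value.
crm_rows = [{"customer": " Alice ", "amount": "100.5"},
            {"customer": "bob", "amount": None}]
web_rows = [{"customer": "ALICE", "amount": "49.5"},
            {"customer": "carol", "amount": "30"}]


def extract(*sources):
    """Extraction: gather raw rows from every source into one list."""
    return [row for source in sources for row in source]


def clean(rows):
    """Cleaning: drop rows whose amount is missing."""
    return [row for row in rows if row["amount"] is not None]


def transform(rows):
    """Transformation: normalize customer names and convert amounts to float."""
    return [
        {"customer": row["customer"].strip().lower(), "amount": float(row["amount"])}
        for row in rows
    ]


def load(warehouse, rows):
    """Loading: consolidate rows into per-customer totals, sorted by customer."""
    totals = dict(warehouse)  # copy so the previous snapshot is left untouched
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return dict(sorted(totals.items()))


# Initial load.
warehouse = load({}, transform(clean(extract(crm_rows, web_rows))))

# Refresh: run newly arrived source rows through the same steps.
new_rows = [{"customer": "Carol", "amount": "20"}]
refreshed = load(warehouse, transform(clean(extract(new_rows))))
print(warehouse)
print(refreshed)
```

Here cleaning simply discards rows with missing amounts; a real
pipeline might instead fill in or flag such values.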
Metadata (data about data) are the data that define warehouse
objects. A metadata repository should contain the following:
1. A description of the data warehouse structure.
2. Operational metadata, containing the history of the extracted
data and the operations performed on it.
3. The algorithms used for summarization of the data.
4. The mapping from the operational environment to the data
warehouse.
5. Business metadata, describing the business meaning and use of
the data.
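One possible way to picture such a repository is a record with one
field per category listed above. The class name, field names, and
sample entries below are hypothetical and only meant as a sketch,
not as the layout of any particular warehouse product.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRepository:
    """Hypothetical container with one field per metadata category."""
    structure: dict = field(default_factory=dict)      # schema, views, dimensions
    operational: list = field(default_factory=list)    # extraction/operation history
    summarization: dict = field(default_factory=dict)  # how summaries are computed
    mappings: dict = field(default_factory=dict)       # source field -> warehouse field
    business: dict = field(default_factory=dict)       # business terms and definitions

    def record_operation(self, description):
        """Append one entry to the operational history."""
        self.operational.append(description)


repo = MetadataRepository()
repo.structure["sales_fact"] = ["date_key", "region_key", "amount"]
repo.mappings["crm.orders.total"] = "sales_fact.amount"
repo.summarization["monthly_sales"] = "SUM(amount) grouped by month"
repo.business["amount"] = "Order value in the reporting currency"
repo.record_operation("extracted crm.orders")

print(repo.mappings["crm.orders.total"])
```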
Components of a data warehouse:
1. Load Manager: also called the front component. It deals with
the extraction and loading of data into the warehouse.
2. Warehouse Manager: manages the data in the warehouse, including
updating, transformation, merging, and backing up data.
3. Query Manager: also called the backend component. It manages
user queries against the data.
4. End-user access tools: data reporting tools, query tools,
application development tools, EIS tools, and OLAP and data mining
tools.
Data warehousing is used in banking, airlines, healthcare, the
public sector, investment and insurance, telecommunications,
hospitality, and other industries.