Data Warehousing
A data warehouse can be defined as a subject-oriented, integrated,
time-variant, non-volatile collection of data that supports
management's decision-making process. It brings together a limited,
relevant set of data from multiple data sources, and it is fed by
the organization's transactional systems.
A data warehouse works best when people across the organization share a common way of describing each subject it covers. Here are a few characteristics of data warehousing.
Subject-oriented:
A data warehouse is organized around a particular subject area, such
as sales, rather than around day-to-day operations. This means the
warehousing process deals with a clearly defined subject.
A deep understanding of that subject helps in developing, for
example, sales analyses that stay within its bounds. A warehouse can
cover every subject area the organization chooses to store.
Time-variant:
Data in a warehouse is tied to specific points or periods in time.
It holds large amounts of historical data, whereas an online
transaction processing (OLTP) system usually holds only current
values.
Time variance is introduced when data is sent to the warehouse, for
example through staging files.
Most of this data ends up in large tables of facts that are added to
at each load.
Non-volatile:
Once data enters the warehouse, it is not changed or deleted by
day-to-day business operations. New data is added alongside the old,
which makes the warehouse suitable for analysis.
Non-volatility lets people understand what has occurred, and it keeps any analysis that has been done meaningful.
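The time-variant and non-volatile properties described above can be illustrated with a small Python sketch. It assumes a hypothetical product price that changes over time; the names `oltp_price`, `warehouse_price_history`, `record_price`, and `price_as_of` are invented for this example. The operational store overwrites the value, while the warehouse side only appends dated rows.

```python
from datetime import date

# Hypothetical operational (OLTP-style) store: holds only the current value.
oltp_price = {}

# Hypothetical warehouse-style store: append-only, every row carries a date.
warehouse_price_history = []


def record_price(product, price, as_of):
    """Apply a price change to both stores."""
    oltp_price[product] = price  # overwrite: the old value is lost
    warehouse_price_history.append(  # append: the old rows stay untouched
        {"product": product, "price": price, "as_of": as_of}
    )


def price_as_of(product, day):
    """Return the price that was in effect on a given day, or None."""
    candidates = [
        row for row in warehouse_price_history
        if row["product"] == product and row["as_of"] <= day
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["as_of"])["price"]


record_price("widget", 9.99, date(2024, 1, 1))
record_price("widget", 12.49, date(2024, 3, 1))
```

After the two calls, `oltp_price` only knows the latest price, while `price_as_of("widget", date(2024, 2, 1))` can still answer what the price was in February.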
Integrated:
Integration is closely related to subject orientation: data from
disparate sources is stored in a single consistent format. The
warehouse must resolve issues such as naming conventions, conflicting
definitions, units of measure, and inconsistent values, so that
information about each subject can be managed in one place.
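As a minimal sketch of integration, assume two hypothetical source systems (`crm_rows` and `erp_rows`, with invented field names) that describe the same kind of record differently. The Python below maps both onto one naming convention, one unit of measure, and one set of country codes. It is an illustration under those assumptions, not a prescribed method.

```python
# Hypothetical source records: different field names, units, and country labels.
crm_rows = [{"cust_name": "Ada", "weight_lb": 10.0, "country": "US"}]
erp_rows = [{"customer": "Grace", "weight_kg": 2.0, "country": "United States"}]

LB_TO_KG = 0.45359237  # definition of the international pound in kilograms

# Illustrative lookup that resolves inconsistent country values.
COUNTRY_CODES = {"US": "US", "United States": "US"}


def from_crm(row):
    """Rename fields and convert pounds to kilograms."""
    return {
        "customer_name": row["cust_name"],
        "weight_kg": round(row["weight_lb"] * LB_TO_KG, 3),
        "country_code": COUNTRY_CODES[row["country"]],
    }


def from_erp(row):
    """Rename fields; the weight is already in kilograms."""
    return {
        "customer_name": row["customer"],
        "weight_kg": row["weight_kg"],
        "country_code": COUNTRY_CODES[row["country"]],
    }


# One list, one format, regardless of where each record came from.
integrated = [from_crm(r) for r in crm_rows] + [from_erp(r) for r in erp_rows]
```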
Functions
A data warehouse works as a repository for the data held by an
organization, and it also provides a facility for backing up that data.
It can reduce the cost of storage and backup at the organizational level.
It stores facts in tables at a highly granular transaction level. The functions involved are:
Data consolidations
Data cleaning
Data integration
Data Extraction
Data Transformation
Data Loading
Refreshing
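The extraction, cleaning, transformation, and loading functions listed above can be sketched end to end in a few lines of Python. This is a toy example: the `raw_orders` data, the `fact_orders` table, and the four helper functions are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw data as it might arrive from a source system.
raw_orders = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "South", "amount": ""},  # missing amount
    {"order_id": "3", "region": "NORTH", "amount": "79.50"},
]


def extract():
    """Extraction: read rows from the source (here, a list in memory)."""
    return list(raw_orders)


def clean(rows):
    """Cleaning: drop rows whose amount is missing."""
    return [r for r in rows if r["amount"].strip()]


def transform(rows):
    """Transformation: fix types and normalize the region spelling."""
    return [
        (int(r["order_id"]), r["region"].strip().lower(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Loading: write the prepared rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))

total_by_region = dict(
    conn.execute(
        "SELECT region, SUM(amount) FROM fact_orders GROUP BY region"
    ).fetchall()
)
```

Refreshing would amount to running the same pipeline again on newly arrived rows.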
Alternative Names for a Data Warehouse System:
A data warehouse system is also known by the following names:
Decision Support System (DSS)
Executive Information System
Management Information System
Business Intelligence Solution
Analytic Application
Data Warehouse
Data flows into the warehouse in the following formats:
Structured
Semi-structured
Unstructured data
Types of Data Warehouse:
There are mainly 3 types of Data Warehouses, and they are
Enterprise Data Warehouse
Operational Data Store
Data Mart
Data Warehouse Stages:
Data warehousing was simple in its early days, but the procedures
for assessing data have changed a lot over time. The following are
the stages involved in the use of data warehousing:
Offline Operational Database
Offline Data Warehouse
Real-time Data Warehouse
Integrated Data Warehouse
Applications of Data Warehouse:
A data warehouse helps business executives organize and analyze
detailed data and use it to make decisions. The results of those
decisions can in turn be monitored, which closes the loop.
Data warehousing is mainly used in the following fields:
Airline
Banking services
Healthcare
Public sector
Investment and Insurance sector
Telecommunication
Hospitality Industry
Financial services
Retail sectors
Consumer goods
Controlled manufacturing
Steps to Implement Data Warehouse:
The risk connected with a data warehouse implementation is high and
needs to be considered as early as possible. The best way to manage
it is a three-level strategy:
Enterprise strategy
Phased delivery
Iterative Prototyping
Here are the steps in implementing a data warehouse, along with
their deliverables.
Data Warehouse Implementation Table
Step | Task | Output
1 | Specifying project scope | Scope definition
2 | Ascertain business needs | Logical data model
3 | Defining Operational Data Store requirements | Operational Data Store Model
4 | Develop or obtain extraction tools | Extract software and tools
5 | Specifying Data Warehouse data needs | Transition Data Model
6 | Document missing information | To Do Project List
7 | Mapping Operational Data Store to Data Warehouse | D/W Data Integration Map
8 | Improve Data Warehouse database design | D/W Database Design
9 | Pull out data from Operational Data Store | Integrated D/W Data Extracts
10 | Load Data Warehouse | Initial Data Load
11 | Manage Data Warehouse | Continuous Data Access and Subsequent Loads
Data Warehouse Tools:
Many data warehouse tools are available. A few of the better-known
ones are listed here:
Oracle
MarkLogic
Amazon Redshift
Pros or Advantages of Data Warehousing:
It is a common process for the new implementations in a business
that is based on variou
a. Cleans data:
Data warehousing involves data cleansing: removing errors and
inconsistencies to improve the quality of the data. Because the
warehouse draws on many files and a variety of sources, cleansing is
a necessary part of loading it.
Metadata must be kept in sufficient detail to describe the constraints on the data and the translations applied between systems.
b. Indexes multiple types:
Indexes of several types can be created on the warehouse tables to
speed up access to information.
This lets the warehouse handle large quantities of data and repeated, iterative queries without placing that load on the OLTP applications.
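To make the indexing point concrete, here is a small sketch using Python's built-in `sqlite3` module. The `sales` table, its columns, and the index name `idx_sales_region` are hypothetical. The sketch asks SQLite for its query plan before and after the index is created, so the change in access path can be inspected. The exact wording of the plan text varies between SQLite versions, so it is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north" if i % 2 else "south", float(i)) for i in range(1000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE region = ?"


def query_plan(connection):
    """Return SQLite's plan for QUERY as one string (detail is the last column)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("north",)).fetchall()
    return " ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
plan_after = query_plan(conn)   # the plan now refers to idx_sales_region
```

Printing `plan_before` and `plan_after` shows that only the second plan mentions the index.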
c. Secured data and its access:
Security is needed to guard against data breaches, and it has to be
applied to every aspect of the warehouse, with its trade-offs weighed
against how the warehouse is used.
Because the data is consolidated in one place, access controls can be enforced by the database in layers.
The same consolidation means that unauthorized access could compromise a large amount of critical data.
d. Query processing with multiple options:
Queries can be carried out in parallel. Query tools are designed to
process the data and load the results into various modules.
Data is accessed using simple logic against the repository. A large number of query tools are available, and they can manage heterogeneous resources and handle requests from online tools.
e. Enhanced business intelligence:
Insights develop from better access to information. Decision makers
rely less on gut feelings and can base strategy on credible facts
backed by evidence.
Decision makers with varied needs can make better-informed decisions than they could with limited data. Business tactics can be measured against facts in areas such as financial management and inventory management.
f. Increased system and query performance:
A data warehouse is constructed mainly to speed up the retrieval of
data. It is designed to store large volumes of data and to query
them quickly.
This supports business intelligence with modules matched to users' needs.
It also takes analytical work off the multiple operational subsystems and reduces the effort needed to extract information.
g. Business Intelligence:
Many enterprises keep detailed data in multiple subsystems, built on
different platforms and physically separate data sources. The
warehouse gives access to all of it in a single place.
It consolidates data from these platforms and sources into one enterprise-wide view.
It provides a single data repository on each subject, which helps ensure that data is not duplicated.
h. Timely access to data:
Users can reach data from different sources in one place, so they
spend less time on retrieval and more on analysis. Loading data into
the warehouse can be scheduled as a routine, which saves time for
information technology staff.
Business users can query the data and generate standard reports without much knowledge of a query language, which reduces the number of report requests that specialists have to handle.
i. Enhanced data consistency and quality:
The warehouse standardizes data that comes from separate source
systems, such as individual sales systems, and keeps it in one
repository.
Units and formats become consistent across the business, so the repository can serve as a single account of operations.
j. Return on investment is high:
Here ROI comes from increased revenue and decreased expenses. The
business recovers the capital spent on the project through the
revenue generated and the costs saved.
The impact on the business can be assessed by analyzing its financial results.
k. Increase revenues:
Data from previously isolated departmental systems is joined in the
warehouse, so records can be cross-checked from a central point.
This also supports a proactive approach: linked data and summarized reports help investigators detect and prevent problems that would otherwise reduce revenue streams.
l. Standardizes data across the organization:
Data standards are followed wherever data is shared securely. Every
application that contributes data follows the same standard, so the
data delivered by the warehouse is organized consistently and
conflicts between shared data sets are avoided.
m. Database normalization:
Data can be stored in and extracted from the warehouse in various
forms. Normalization is the process of organizing the data in a
relational database to minimize redundancy, which makes the data
easier to organize and maintain.
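As a small illustration of normalization, the Python sketch below starts from a hypothetical flat list of orders (`flat_orders`) in which each customer's city is repeated on every row, and splits it into a `customers` lookup and an `orders` list that refers to customers by key. All names and values are invented for the example.

```python
# Hypothetical denormalized rows: the customer's city is repeated per order.
flat_orders = [
    {"order_id": 1, "customer": "Ada", "city": "London", "amount": 30.0},
    {"order_id": 2, "customer": "Ada", "city": "London", "amount": 45.0},
    {"order_id": 3, "customer": "Grace", "city": "New York", "amount": 20.0},
]

customers = {}  # customer name -> {"customer_id": ..., "city": ...}
orders = []     # orders refer to customers by customer_id only

for row in flat_orders:
    name = row["customer"]
    if name not in customers:
        # Store each customer's details exactly once.
        customers[name] = {"customer_id": len(customers) + 1, "city": row["city"]}
    orders.append(
        {
            "order_id": row["order_id"],
            "customer_id": customers[name]["customer_id"],
            "amount": row["amount"],
        }
    )
```

After the split, a change to a customer's city is made in one place instead of on every order row.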
Cons or Disadvantages of Data Warehousing:
Even though there are many advantages, implementing a data warehouse
takes time and money. The drawbacks include data translation, long
implementation times, and a lack of flexibility in data transfer.
Here are some of the disadvantages of data warehousing explained:
a. Raising ownership:
Most of the data in a warehouse comes from source systems owned by
different parts of the organization, and it takes a long time to
implement the schema for it.
This raises issues of ownership, privacy, and security, and the long-term cost of ownership is high.
b. Extra reporting:
How the data warehouse is run depends on the organization. It
typically needs dedicated teams, and data that already exists in the
long-term databases may be duplicated. Extra reporting work takes
more time.
c. Data flexibility:
Imported data is static and has to be mapped onto a fixed schema, so
the warehouse has limited ability to answer ad hoc questions. There
is also a risk of leaking information about the organization's
customers.
Analysis reports must respect customer privacy, which limits what can be done with the data.
d. Compatibility with the existing system:
The data warehouse is fed by regular extracts of data that are loaded
into it. Adopting the technology may require modifying data and
existing systems, which is a major concern, and making it work with
all existing system functionality is complex.
e. Keeping data online:
Software may not allow the entire repository to be kept online after
a certain duration. Keeping data online becomes harder as the volume
of text and other large data grows, yet the data has to be recorded
and analyzed for future reference.
f. Dimensional technique:
This technique stores information about specific events. It holds a
limited amount of information, which must be identified with a proper
understanding of those events, and in many practical applications it
stores data redundantly.
Redundancy complicates updates, deletions, and insertions, which is one of the undesirable characteristics of data warehousing.
g. Costs:
Most businesses have now started using data warehouse techniques, so
prices have fallen into a range that most of them can afford.
Cost is still a complication for small businesses, which have to design and maintain the warehouse for their own data.
Most tools begin with transactions: they group all the transactions and report each operation in detail.
A warehouse gives access to a large amount of information. Users are
supposed to be trained before using warehouse techniques.
DIMENSIONAL MODELING (DM) is a data structure technique optimized
for data storage in a data warehouse. The purpose of a dimensional
model is to optimize the database for fast retrieval of data. The
concept of dimensional modelling was developed by Ralph Kimball and
consists of "fact" and "dimension" tables.
A dimensional model is designed to read, summarize, and analyze numeric information such as values, balances, counts, and weights in a data warehouse. In contrast, relational models are optimized for adding, updating, and deleting data in a real-time online transaction system.
These dimensional and relational models have their unique way of data storage that has specific advantages.
For instance, in the relational model, normalization and ER models reduce redundancy in data. In contrast, a dimensional model arranges data in such a way that it is easier to retrieve information and generate reports.
Hence, dimensional models are used in data warehouse systems and are not a good fit for relational systems.
In this tutorial, you will learn-
Elements of Dimensional Data Model
Fact
Dimension
Attributes
Fact Table
Dimension table
Steps of Dimensional Modelling
Step 1) Identify the business process
Step 2) Identify the grain
Step 3) Identify the dimensions
Step 4) Identify the Fact
Step 5) Build Schema
Rules for Dimensional Modelling
Benefits of dimensional modeling
Elements of Dimensional Data Model
Fact
Facts are the measurements or metrics from your business
process. For a Sales business process, a measurement would be the
quarterly sales number.
Dimension
Dimension provides the context surrounding a business process
event. In simple terms, they give who, what, where of a fact. In
the Sales business process, for the fact quarterly sales number,
dimensions would be
Who – Customer Names
Where – Location
What – Product Name
In other words, a dimension is a window to view information in the
facts.
Attributes
The Attributes are the various characteristics of the
dimension.
In the Location dimension, the attributes can be
State
Country
Zipcode etc.
Attributes are used to search, filter, or classify facts. Dimension
Tables contain Attributes
Fact Table
A fact table is a primary table in a dimensional model.
A Fact Table contains
Measurements/facts
Foreign key to dimension table
Dimension table
A dimension table contains the dimensions of a fact.
Dimension tables are joined to the fact table via a foreign key.
Dimension tables are de-normalized tables.
The dimension attributes are the various columns in a dimension
table
Dimensions offer descriptive characteristics of the facts with the
help of their attributes
There is no set limit on the number of dimensions
The dimension can also contain one or more hierarchical
relationships
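The fact and dimension tables described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_customer`, `dim_location`, `dim_product`) and the sample rows are hypothetical, chosen to mirror the Sales example with its Who, Where, and What dimensions. The query at the end uses a dimension attribute (`state`) to classify the facts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes (who, where, what).
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY,
                           state TEXT, country TEXT, zipcode TEXT);
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT);

-- Fact table: measurements plus a foreign key to each dimension table.
CREATE TABLE fact_sales (
    customer_id  INTEGER REFERENCES dim_customer(customer_id),
    location_id  INTEGER REFERENCES dim_location(location_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    sales_amount REAL
);

INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO dim_location VALUES (1, 'CA', 'US', '94105'), (2, 'NY', 'US', '10001');
INSERT INTO dim_product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales   VALUES (1, 1, 1, 100.0), (2, 1, 2, 50.0), (1, 2, 1, 25.0);
""")

# Classify the facts by an attribute of the Location dimension.
sales_by_state = dict(
    conn.execute("""
        SELECT l.state, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_location AS l ON l.location_id = f.location_id
        GROUP BY l.state
    """).fetchall()
)
```

Swapping `dim_location` for `dim_product` or `dim_customer` in the join gives the same measurements viewed through a different dimension.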
Steps of Dimensional Modelling
The accuracy of your dimensional model determines the
success of your data warehouse implementation. Here are the steps
to create a dimensional model:
Identify Business Process
Identify Grain (level of detail)
Identify Dimensions
Identify Facts
Build Schema
The model should describe the Why, How much, When/Where/Who and
What of your business process
Step 1) Identify the business process
Identify the actual business process the data warehouse should cover.
This could be Marketing, Sales, HR, etc. as per the data analysis
needs of the organization. The selection of the Business process
also depends on the quality of data available for that process. It
is the most important step of the Data Modelling process, and a
failure here would have cascading and irreparable defects.
To describe the business process, you can use plain text or use basic Business Process Modelling Notation (BPMN) or Unified Modelling Language (UML).
Step 2) Identify the grain
The Grain describes the level of detail for the business
problem/solution. It is the process of identifying the lowest level
of information for any table in your data warehouse. If a table
contains sales data for every day, then it should be daily
granularity. If a table contains total sales data for each month,
then it has monthly granularity.
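The difference between daily and monthly granularity can be shown with a short Python sketch. The `daily_sales` rows and the `to_monthly_grain` helper are hypothetical. Rolling daily rows up to months is straightforward, but the reverse is not possible, which is why the grain has to be chosen at the lowest level of detail the analysis will need.

```python
from collections import defaultdict

# Hypothetical fact rows at daily grain: (ISO date, sales amount).
daily_sales = [
    ("2024-01-05", 100.0),
    ("2024-01-20", 50.0),
    ("2024-02-03", 70.0),
]


def to_monthly_grain(rows):
    """Roll daily rows up to one total per month (YYYY-MM)."""
    totals = defaultdict(float)
    for day, amount in rows:
        month = day[:7]  # "2024-01-05" -> "2024-01"
        totals[month] += amount
    return dict(totals)


monthly_sales = to_monthly_grain(daily_sales)
```

Once only `monthly_sales` is stored, the individual daily amounts can no longer be recovered from it.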