Reflect on the data mining concepts, strategies, and best practices explored so far. Consider data mining from both a global perspective in the management of big data and the impact of data mining on individual organizations.
Data mining concepts:
Data mining refers to extracting or "mining" knowledge from large amounts of data. The term is
actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than
rock or sand mining. Thus, "data mining" should have been more appropriately named "knowledge mining from
data", which is unfortunately somewhat long.
The architecture of a typical data mining system may have the following major components:
1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration
techniques may be performed on the data.
2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the
relevant data, based on the user's data mining request.
3. Knowledge base. This is the domain knowledge that is used to guide the search or evaluate the
interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes
or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to
assess a pattern's interestingness based on its unexpectedness, may also be included. Other examples of domain
knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from
multiple heterogeneous sources).
4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional
modules for tasks such as characterization, association analysis, classification, evolution and deviation analysis.
5. Pattern evaluation module. This component typically employs interestingness measures (Section 1.5) and
interacts with the data mining modules so as to focus the search towards interesting patterns. It may access
interestingness thresholds stored in the knowledge base.
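The concept hierarchies mentioned under the knowledge base (item 3) can be pictured as a mapping from each attribute value to its parent at the next level of abstraction. The sketch below is only an illustration: the `location` hierarchy, the `PARENT` table, and the `generalize` helper are all hypothetical names and values, not part of any particular data mining system.

```python
# Hypothetical concept hierarchy for a "location" attribute:
# each value maps to its parent one level of abstraction higher.
PARENT = {
    "Vancouver": "British Columbia",
    "Victoria": "British Columbia",
    "Toronto": "Ontario",
    "British Columbia": "Canada",
    "Ontario": "Canada",
}

def generalize(value, levels=1):
    """Climb `levels` steps up the hierarchy, stopping at the top."""
    for _ in range(levels):
        if value not in PARENT:
            break
        value = PARENT[value]
    return value

cities = ["Vancouver", "Toronto", "Victoria"]
provinces = [generalize(c) for c in cities]            # city -> province
countries = [generalize(c, levels=2) for c in cities]  # city -> country
print(provinces)
print(countries)
```

Generalizing attribute values this way lets the same data be summarized at the city, province, or country level.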
Key points
Database technology has evolved from primitive file processing to the development of database management
systems with query and transaction processing. Further progress has led to the increasing demand for efficient
and effective data analysis and data understanding tools. This need is a result of the explosive growth in
data collected from applications including business and management, government administration, scientific
and engineering, and environmental control.
Data mining is the task of discovering interesting patterns from large amounts of data where the data can
be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field,
drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization,
information retrieval, and high performance computing. Other contributing areas include neural networks,
pattern recognition, spatial data analysis, image databases, signal processing, and inductive logic programming.
A knowledge discovery process includes data cleaning, data integration, data selection, data transformation,
data mining, pattern evaluation, and knowledge presentation.
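The steps of the knowledge discovery process can be sketched as a chain of small functions, one per step. This is a toy illustration under stated assumptions: the records, the field names, the age bands, and the `min_avg` threshold are all invented, and the "mining" step is a simple per-group average standing in for a real algorithm.

```python
# Toy records from two hypothetical sources; all names and values are invented.
source_a = [{"customer": "ann", "age": "34", "spend": "120.5"},
            {"customer": "bob", "age": "", "spend": "80"}]       # missing age
source_b = [{"customer": "cat", "age": "51", "spend": "300"},
            {"customer": "ann", "age": "34", "spend": "120.5"}]  # duplicate

def integrate(*sources):
    """Data integration: merge records from several sources."""
    return [row for src in sources for row in src]

def clean(rows):
    """Data cleaning: drop rows with missing fields and exact duplicates."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if all(row.values()) and key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(rows, fields):
    """Data selection: keep only the task-relevant attributes."""
    return [{f: row[f] for f in fields} for row in rows]

def transform(rows):
    """Data transformation: convert strings into numeric values."""
    return [{"age": int(r["age"]), "spend": float(r["spend"])} for r in rows]

def mine(rows):
    """Data mining (stand-in): average spend per age band."""
    bands = {}
    for r in rows:
        band = "under_40" if r["age"] < 40 else "40_plus"
        bands.setdefault(band, []).append(r["spend"])
    return {b: sum(v) / len(v) for b, v in bands.items()}

def evaluate(patterns, min_avg):
    """Pattern evaluation: keep patterns that pass a threshold."""
    return {b: avg for b, avg in patterns.items() if avg >= min_avg}

rows = transform(select(clean(integrate(source_a, source_b)), ["age", "spend"]))
patterns = evaluate(mine(rows), min_avg=200.0)
print(patterns)  # knowledge presentation, in its simplest form
```

Keeping each step as a separate function mirrors the process description: any step can be replaced, for example swapping the averaging stand-in for a real mining algorithm, without touching the others.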
Data patterns can be mined from many different kinds of databases, such as relational databases, data warehouses, and transactional, object-relational, and object-oriented databases. Interesting data patterns
can also be extracted from other kinds of information repositories, including spatial, time-related, text, multimedia, and legacy databases, and the World-Wide Web.
A data warehouse is a repository for long term storage of data from multiple sources, organized so as
to facilitate management decision making. The data are stored under a unified schema, and are typically
summarized. Data warehouse systems provide some data analysis capabilities, collectively referred to as OLAP
(On-Line Analytical Processing). OLAP operations include drill-down, roll-up, and pivot.
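The three OLAP operations named above can be illustrated on a tiny fact table using only the Python standard library. This is a rough sketch, not how an OLAP server is implemented: the `sales` tuples, the `aggregate` helper, and the choice of time and region dimensions are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, amount).
sales = [
    (2023, "Q1", "East", 100),
    (2023, "Q1", "West", 150),
    (2023, "Q2", "East", 120),
    (2024, "Q1", "East", 130),
    (2024, "Q1", "West", 170),
]

def aggregate(facts, key):
    """Sum the amount for each group produced by `key`."""
    totals = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact[3]
    return dict(totals)

# Roll-up: summarize along the time hierarchy at the coarse year level.
by_year = aggregate(sales, lambda f: f[0])

# Drill-down: move from the year level to the finer (year, quarter) level.
by_quarter = aggregate(sales, lambda f: (f[0], f[1]))

# Pivot: rotate the view so regions become rows and years become columns.
pivot = defaultdict(dict)
for (region, year), total in aggregate(sales, lambda f: (f[2], f[0])).items():
    pivot[region][year] = total
pivot = dict(pivot)

print(by_year)
print(by_quarter)
print(pivot)
```

Roll-up and drill-down differ only in the granularity of the grouping key, which is why a single `aggregate` helper covers both here.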
Data mining functionalities include the discovery of concept/class descriptions (i.e., characterization and
discrimination), association, classification, prediction, clustering, trend analysis, deviation analysis, and
similarity analysis. Characterization and discrimination are forms of data summarization.
A pattern represents knowledge if it is easily understood by humans, valid on test data with some degree
of certainty, potentially useful, novel, or validates a hunch about which the user was curious. Measures of
pattern interestingness, either objective or subjective, can be used to guide the discovery process.
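As an illustration of objective interestingness measures, the sketch below computes support and confidence for an association rule and checks them against thresholds. The text above does not name specific measures; support and confidence are chosen here as two commonly used ones. The transactions and the `MIN_SUPPORT` and `MIN_CONFIDENCE` values are invented for the example.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Assumed thresholds, of the kind a knowledge base might store.
MIN_SUPPORT, MIN_CONFIDENCE = 0.4, 0.6

antecedent, consequent = {"bread"}, {"milk"}
s = support(antecedent | consequent)
c = confidence(antecedent, consequent)
interesting = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
print(f"bread -> milk: support={s:.2f}, confidence={c:.2f}, interesting={interesting}")
```

Subjective measures, such as how unexpected a rule is relative to user beliefs, cannot be computed from the data alone and would need additional input from the user.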
Data mining systems can be classified according to the kinds of databases mined, the kinds of knowledge
mined, or the techniques used.