Question

Reflect on the data mining concepts, strategies, and best practices explored so far. Consider data mining from both a global perspective in the management of big data and the impact of data mining on individual organizations.

Solutions

Expert Solution

Data mining concepts

Data mining refers to extracting or "mining" knowledge from large amounts of data. The term is actually a misnomer. Remember that the mining of gold from rocks or sand is referred to as gold mining rather than rock or sand mining. Thus, "data mining" should have been more appropriately named "knowledge mining from data", which is unfortunately somewhat long.

The architecture of a typical data mining system may have the following major components:

1. Database, data warehouse, or other information repository. This is one or a set of databases, data warehouses, spreadsheets, or other kinds of information repositories. Data cleaning and data integration techniques may be performed on the data.

2. Database or data warehouse server. The database or data warehouse server is responsible for fetching the relevant data, based on the user's data mining request.

3. Knowledge base. This is the domain knowledge that is used to guide the search, or evaluate the interestingness of resulting patterns. Such knowledge can include concept hierarchies, used to organize attributes or attribute values into different levels of abstraction. Knowledge such as user beliefs, which can be used to assess a pattern's interestingness based on its unexpectedness, may also be included. Other examples of domain knowledge are additional interestingness constraints or thresholds, and metadata (e.g., describing data from multiple heterogeneous sources).

4. Data mining engine. This is essential to the data mining system and ideally consists of a set of functional modules for tasks such as characterization, association analysis, classification, and evolution and deviation analysis.

5. Pattern evaluation module. This component typically employs interestingness measures (Section 1.5) and interacts with the data mining modules so as to focus the search towards interesting patterns. It may access interestingness thresholds stored in the knowledge base.
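
As a rough illustration of how components 3 to 5 might fit together, the Python sketch below uses a toy concept hierarchy and threshold as the knowledge base, a counting step as the mining engine, and a threshold filter as the pattern evaluation module. Every name here (`CONCEPT_HIERARCHY`, `MIN_SUPPORT`, `mine_counts`, `evaluate_patterns`) and all of the data are hypothetical, chosen only for illustration; real systems are far more elaborate.

```python
from collections import Counter

# Knowledge base (hypothetical): a concept hierarchy that generalizes
# cities to countries, plus an interestingness threshold.
CONCEPT_HIERARCHY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}
MIN_SUPPORT = 0.5  # assumed threshold: minimum fraction of records

def mine_counts(records, hierarchy):
    """Mining engine: generalize each city to its country, then count."""
    return Counter(hierarchy.get(city, "unknown") for city in records)

def evaluate_patterns(counts, total, min_support):
    """Pattern evaluation: keep only patterns that meet the threshold."""
    return {k: c / total for k, c in counts.items() if c / total >= min_support}

# Toy data standing in for what the database server would fetch.
records = ["Vancouver", "Toronto", "Chicago", "Toronto"]
counts = mine_counts(records, CONCEPT_HIERARCHY)
interesting = evaluate_patterns(counts, len(records), MIN_SUPPORT)
print(interesting)  # {'Canada': 0.75}
```

Keeping the threshold and hierarchy outside the mining function mirrors the separation described above: the evaluation step reads its threshold from the knowledge base instead of hard-coding it.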

Key points

Database technology has evolved from primitive file processing to the development of database management systems with query and transaction processing. Further progress has led to an increasing demand for efficient and effective data analysis and data understanding tools. This need is a result of the explosive growth in data collected from applications including business and management, government administration, science and engineering, and environmental control.

Data mining is the task of discovering interesting patterns from large amounts of data where the data can be stored in databases, data warehouses, or other information repositories. It is a young interdisciplinary field, drawing from areas such as database systems, data warehousing, statistics, machine learning, data visualization, information retrieval, and high performance computing. Other contributing areas include neural networks, pattern recognition, spatial data analysis, image databases, signal processing, and inductive logic programming.

A knowledge discovery process includes data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation, and knowledge presentation.
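
The seven steps listed above can be sketched as a chain of small functions. This is a minimal illustration on made-up data, not a prescribed implementation: the two sources (`store_a`, `store_b`), the function names, and the `min_count` threshold are all assumptions.

```python
from collections import Counter

# Two hypothetical sources with inconsistent formatting and a missing value.
store_a = [{"customer": "ann", "item": "Milk "}, {"customer": "bob", "item": None}]
store_b = [{"customer": "cat", "item": "milk"}, {"customer": "dan", "item": "Bread"}]

def clean(rows):
    """Data cleaning: drop rows whose item is missing."""
    return [r for r in rows if r["item"] is not None]

def integrate(*sources):
    """Data integration: combine several sources into one list."""
    return [row for source in sources for row in source]

def select(rows):
    """Data selection: keep only the attribute relevant to the task."""
    return [r["item"] for r in rows]

def transform(items):
    """Data transformation: normalize whitespace and case."""
    return [i.strip().lower() for i in items]

def mine(items):
    """Data mining: count how often each item occurs."""
    return Counter(items)

def evaluate(counts, min_count=2):
    """Pattern evaluation: keep items seen at least min_count times."""
    return {item: n for item, n in counts.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: render the result as readable text."""
    return [f"{item} appears {n} times" for item, n in sorted(patterns.items())]

rows = integrate(clean(store_a), clean(store_b))
report = present(evaluate(mine(transform(select(rows)))))
print(report)  # ['milk appears 2 times']
```

Note that "Milk " and "milk" only count as the same item because the transformation step runs before mining, which is one reason the preprocessing steps come first.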

Data patterns can be mined from many different kinds of databases, such as relational databases, data warehouses, and transactional, object-relational, and object-oriented databases. Interesting data patterns can also be extracted from other kinds of information repositories, including spatial, time-related, text, multimedia, and legacy databases, and the World Wide Web.

A data warehouse is a repository for long-term storage of data from multiple sources, organized so as to facilitate management decision making. The data are stored under a unified schema, and are typically summarized. Data warehouse systems provide some data analysis capabilities, collectively referred to as OLAP (On-Line Analytical Processing). OLAP operations include drill-down, roll-up, and pivot.
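
To make the three OLAP operations concrete, here is a small sketch in plain Python on a hypothetical sales fact table. The table layout, the `aggregate` helper, and the figures are invented for illustration; an actual warehouse would expose these operations through its own query interface.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, sales)
facts = [
    (2023, "Q1", "East", 100),
    (2023, "Q1", "West", 150),
    (2023, "Q2", "East", 120),
    (2023, "Q2", "West", 130),
]

def aggregate(rows, key):
    """Sum the sales column, grouped by whatever the key function returns."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)

# Roll-up: summarize from the quarter level up to the year level.
by_year = aggregate(facts, lambda r: r[0])

# Drill-down: move from the year level back to the finer quarter level.
by_quarter = aggregate(facts, lambda r: (r[0], r[1]))

# Pivot: rotate the view so regions are rows and quarters are columns.
pivot = defaultdict(dict)
for year, quarter, region, sales in facts:
    pivot[region][quarter] = pivot[region].get(quarter, 0) + sales
pivot = dict(pivot)

print(by_year)     # {2023: 500}
print(by_quarter)  # {(2023, 'Q1'): 250, (2023, 'Q2'): 250}
print(pivot)       # {'East': {'Q1': 100, 'Q2': 120}, 'West': {'Q1': 150, 'Q2': 130}}
```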

Data mining functionalities include the discovery of concept/class descriptions (i.e., characterization and discrimination), association, classification, prediction, clustering, trend analysis, deviation analysis, and similarity analysis. Characterization and discrimination are forms of data summarization.
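
One way to picture the difference between the two forms of summarization: characterization summarizes a target class on its own, while discrimination compares the target class against a contrasting class. The sketch below assumes a toy list of student records; the attribute names and the helper functions `characterize` and `discriminate` are hypothetical.

```python
from statistics import mean

# Hypothetical records: (status, age)
students = [
    ("graduate", 27),
    ("graduate", 29),
    ("undergraduate", 20),
    ("undergraduate", 22),
]

def characterize(rows, target):
    """Characterization: summarize the general features of one class."""
    ages = [age for status, age in rows if status == target]
    return {"count": len(ages), "mean_age": mean(ages)}

def discriminate(rows, target, contrast):
    """Discrimination: compare the target class with a contrasting class."""
    t = characterize(rows, target)
    c = characterize(rows, contrast)
    return {"mean_age_difference": t["mean_age"] - c["mean_age"]}

grad_summary = characterize(students, "graduate")
grad_vs_undergrad = discriminate(students, "graduate", "undergraduate")
print(grad_summary)
print(grad_vs_undergrad)
```

Here the graduate class has a mean age of 28, and the discrimination step reports that this is 7 years above the undergraduate mean.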

A pattern represents knowledge if it is easily understood by humans, valid on test data with some degree of certainty, potentially useful, novel, or validates a hunch about which the user was curious. Measures of pattern interestingness, either objective or subjective, can be used to guide the discovery process.
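
Support and confidence are two commonly used objective interestingness measures for association rules. The sketch below computes both for the rule "bread implies milk" over a handful of made-up transactions; the data and the thresholds `MIN_SUPPORT` and `MIN_CONFIDENCE` are assumptions for illustration only.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

MIN_SUPPORT = 0.4      # assumed threshold
MIN_CONFIDENCE = 0.6   # assumed threshold

rule_support = support({"bread", "milk"}, transactions)         # 2 of 4 transactions
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of 3 bread baskets
is_interesting = rule_support >= MIN_SUPPORT and rule_confidence >= MIN_CONFIDENCE
print(rule_support, round(rule_confidence, 3), is_interesting)
```

A subjective measure, by contrast, would depend on what a particular user already believes or expects, so it cannot be computed from the data alone.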

Data mining systems can be classified according to the kinds of databases mined, the kinds of knowledge mined, or the techniques used.

