Question

After reviewing the resources for this module, discuss the power of clustering and association models. Give an example of a company that collects or uses data for various reasons. How can clustering or association models help the company complete the sentence "You might also be interested in

Solutions

Expert Solution

Hey,

Note: Brother, if you have any queries related to the answer, please do comment. I would be very happy to resolve all your queries.

Clustering is an exploratory analysis that tries to recognize structures within the data. It is used to identify
groups of cases when the grouping is not previously known. Clustering is often part of a sequence of analyses:
factor analysis, then cluster analysis, and finally discriminant analysis. The general categories of cluster
analysis methods are Joining (Tree Clustering), Two-way Joining (Block Clustering), Hierarchical Clustering, and
k-means Clustering. In short, whatever your line of business, sooner or later you will run into a clustering
problem of some form.
Hierarchical clustering is the most common method. It creates a series of models with cluster solutions from “1” (all
cases in one cluster) to “n” (each case is its own cluster). In addition, hierarchical cluster analysis can handle
nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement. K-means
clustering is a method for quickly clustering large data sets, which typically take a while to compute with
hierarchical cluster analysis.
The purpose of cluster analysis is to place objects into groups, or clusters, suggested by the data, not defined a priori,
such that objects in a given cluster tend to be similar to each other in some sense, and objects in different clusters
tend to be dissimilar. You can also use cluster analysis to summarize data rather than to find "natural" or "real"
clusters; this use of clustering is sometimes called dissection. Clustering techniques have been applied to a wide
variety of research problems.
For example, in the field of medicine, clustering diseases, cures for diseases, or symptoms of diseases can lead to
very useful taxonomies. What are the diagnostic clusters? To answer this question, the researcher would devise a
diagnostic questionnaire that covers the symptoms (for example, in psychology, standardized scales for anxiety,
depression, etc.). The cluster analysis can then identify groups of patients that present with similar symptoms while
maximizing the difference between the groups. In marketing, the question is: what are the customer segments? To
answer it, a market researcher conducts a survey, most commonly covering the needs, attitudes,
demographics, and behavior of customers. The researcher then uses cluster analysis to identify homogeneous
groups of customers that have similar needs and attitudes but are distinctly different from other customer
segments. In general, whenever we need to classify a "mountain" of information into manageable, meaningful piles,
cluster analysis is of great utility.
Association rules reveal interesting associations and correlation relationships among large sets of data items.
They show attribute-value conditions that frequently occur together in a given data set. A typical
example of association rule mining is market basket analysis. In data mining, association rules are helpful for
analyzing and predicting customer behavior. Data is collected using bar-code scanners in supermarkets. Such market
basket databases consist of a large number of transaction records. Each record lists all items purchased by a
customer in a single purchase transaction. The association model is often associated with "market basket analysis",
which is used to find relationships or correlations in a set of items. A typical association rule of this kind states
the probability of a joint purchase (the rule's confidence), for instance, "70% of the people who purchase
spaghetti, wine, and sauce also purchase garlic bread."
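
To make the garlic bread rule concrete, here is a minimal sketch of how support and confidence could be computed from transaction records and used to complete "You might also be interested in". The transactions, the function names, and the 0.5 confidence threshold are all hypothetical choices for illustration; a real retailer would use far more data and a dedicated rule-mining algorithm.

```python
# Hypothetical market-basket transactions (illustrative data, not from any real store).
transactions = [
    {"spaghetti", "sauce", "wine", "garlic bread"},
    {"spaghetti", "sauce", "garlic bread"},
    {"spaghetti", "sauce", "wine"},
    {"wine", "cheese"},
    {"spaghetti", "sauce", "wine", "garlic bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), baskets) / base


def also_interested_in(cart, baskets, min_confidence=0.5):
    """Rank items not yet in `cart` by the confidence of the rule cart -> item."""
    cart = set(cart)
    candidates = set().union(*baskets) - cart
    scored = [(item, confidence(cart, {item}, baskets)) for item in candidates]
    kept = [(item, conf) for item, conf in scored if conf >= min_confidence]
    # Highest confidence first; ties are broken alphabetically.
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


recommendations = also_interested_in({"spaghetti", "sauce", "wine"}, transactions)
for item, conf in recommendations:
    print(f"You might also be interested in {item} (confidence {conf:.0%})")
```

With this toy data, three baskets contain spaghetti, sauce, and wine, and two of those also contain garlic bread, so garlic bread is the only suggestion that passes the threshold.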
Clustering and association models can help organizations with market segmentation: one of the best uses of data
mining is to segment your customers. It is also fairly simple; from your data you can split your market into
meaningful segments by attributes such as age, income, occupation, or gender.
Segmentation can also help you understand your competition. This insight alone will help you recognize that
the usual suspects are not the only ones targeting the same customer money as you are.

The SAS/STAT procedures for clustering are oriented toward disjoint or
hierarchical clusters from coordinate data, distance data, or a correlation or covariance matrix.
The SAS/STAT cluster analysis procedures include the following:
• ACECLUS Procedure: Obtains approximate estimates of the pooled within-cluster
covariance matrix when the clusters are assumed to be multivariate normal with equal
covariance matrices
• CLUSTER Procedure: Hierarchically clusters the observations in a SAS data set
• DISTANCE Procedure: Computes various measures of distance, dissimilarity, or
similarity between the observations (rows) of a SAS data set. Proximity measures are
stored as a lower triangular matrix or a square matrix in an output data set that can then
be used as input to the CLUSTER, MDS, and MODECLUS procedures.
• FASTCLUS Procedure: Disjoint cluster analysis on the basis of distances computed
from one or more quantitative variables
• MODECLUS Procedure: Clusters observations in a SAS data set
• TREE Procedure: Produces a tree diagram, also known as a dendrogram or
phenogram, from a data set created by the CLUSTER or VARCLUS procedure
• VARCLUS Procedure: Divides a set of numeric variables into disjoint or hierarchical
clusters
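
The idea of storing proximity measures as a lower triangular matrix can be sketched in a few lines. This is plain Python and not SAS code; the observations, the function name, and the choice of Euclidean distance are assumptions made only for illustration.

```python
import math

# Hypothetical observations: each row is one case measured on two numeric variables.
rows = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def lower_triangular_distances(data):
    """Return Euclidean distances as a lower triangular list of lists.

    Row i holds the distances from observation i to observations 0..i,
    so the last entry of every row (the diagonal) is 0.0.
    """
    return [
        [math.dist(data[i], data[j]) for j in range(i + 1)]
        for i in range(len(data))
    ]


matrix = lower_triangular_distances(rows)
for line in matrix:
    print(line)
```

Only the lower triangle is stored because a distance matrix is symmetric, which roughly halves the storage needed before the matrix is handed to a clustering step.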

Statistical Significance Testing
Note that the above discussions refer to clustering algorithms and do not mention anything about
statistical significance testing. In fact, cluster analysis is not as much a typical statistical test as it
is a "collection" of different algorithms that "put objects into clusters according to well defined
similarity rules." The point here is that, unlike many other statistical procedures, cluster analysis
methods are mostly used when we do not have any a priori hypotheses, but are still in the
exploratory phase of our research. In a sense, cluster analysis finds the "most significant solution
possible." Therefore, statistical significance testing is really not appropriate here, even in cases
when p-levels are reported (as in k-means clustering).
Area of Application
Clustering techniques have been applied to a wide variety of research problems. Hartigan (1975)
provides an excellent summary of the many published studies reporting the results of cluster
analyses. For example, in the field of medicine, clustering diseases, cures for diseases, or
symptoms of diseases can lead to very useful taxonomies. In the field of psychiatry, the correct
diagnosis of clusters of symptoms such as paranoia, schizophrenia, etc. is essential for successful
therapy. In archeology, researchers have attempted to establish taxonomies of stone tools, funeral
objects, etc. by applying cluster analytic techniques. In general, whenever we need to classify a
"mountain" of information into manageable meaningful piles, cluster analysis is of great utility.

What is Cluster Analysis?
Cluster analysis is an exploratory analysis that tries to identify structures within the data.
Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it
tries to identify homogeneous groups of cases, i.e., observations, participants, respondents.
Cluster analysis is used to identify groups of cases if the grouping is not previously known.
Because it is exploratory, it does not make any distinction between dependent and independent
variables.
Cluster analysis is often part of a sequence of analyses: factor analysis, cluster analysis,
and finally, discriminant analysis. First, a factor analysis that reduces the dimensions and
therefore the number of variables makes it easier to run the cluster analysis. Also, the factor
analysis minimizes multicollinearity effects. The next analysis is the cluster analysis, which
identifies the grouping. Lastly, a discriminant analysis checks the goodness of fit of the model
that the cluster analysis found and profiles the clusters.

• Medicine – What are the diagnostic clusters? To answer this question the researcher would
devise a diagnostic questionnaire that entails the symptoms (for example in psychology
standardized scales for anxiety, depression etc.). The cluster analysis can then identify groups of
patients that present with similar symptoms and simultaneously maximize the difference between
the groups.
• Marketing – What are the customer segments? To answer this question a market researcher
conducts a survey most commonly covering needs, attitudes, demographics, and behavior of
customers. The researcher then uses the cluster analysis to identify homogenous groups of
customers that have similar needs and attitudes but are distinctively different from other
customer segments.
• Education – What are student groups that need special attention? The researcher measures a
couple of psychological, aptitude, and achievement characteristics. A cluster analysis then
identifies what homogeneous groups exist among students (for example, high achievers in all
subjects, or students that excel in certain subjects but fail in others, etc.). A discriminant analysis
then profiles these performance clusters and tells us what psychological, environmental,
aptitudinal, affective, and attitudinal factors characterize these student groups.
• Biology – What is the taxonomy of species? The researcher has collected a data set of
different plants and noted different attributes of their phenotypes. A hierarchical cluster analysis
groups those observations into a series of clusters and builds a taxonomy tree of groups and
subgroups of similar plants.
K-means clustering is a method for quickly clustering large data sets, which typically take a while to
compute with hierarchical cluster analysis. The researcher must define the
number of clusters in advance. This makes it useful for testing different models, each with a different
assumed number of clusters (for example, in customer segmentation).
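
As a rough illustration of what defining the number of clusters in advance means, here is a minimal k-means sketch (Lloyd's algorithm) in plain Python. The customer data, the function name `k_means`, and the seeded random start are assumptions for the example; production work would normally rely on a statistics package.

```python
import math
import random


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: return (centroids, labels) for a preset number k."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct observations
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (purchases per month, average basket size).
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

Calling `k_means` again with `k=3` or `k=4` is how the different assumed numbers of clusters would be compared.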
Hierarchical clustering is the most common method. It takes
time to calculate, but it generates a series of models with cluster solutions from 1 (all cases in
one cluster) to n (each case is its own cluster). Hierarchical clustering also works with
variables as opposed to cases; it can cluster variables together in a manner somewhat similar to
factor analysis. In addition, hierarchical cluster analysis can handle nominal, ordinal, and scale
data; however, it is not recommended to mix different levels of measurement.
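
The series of cluster solutions can be sketched with a small agglomerative procedure that starts from n single-case clusters and merges the two closest clusters until one remains. Single linkage (distance between the closest members) is an assumption here, since the text does not name a linkage rule, and the data points and function name are hypothetical.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns a list of cluster solutions, from n singleton clusters down to
    one cluster holding every case. Each cluster is a sorted list of
    indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    solutions = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair so far
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
        solutions.append([list(c) for c in clusters])
    return solutions


# Hypothetical cases measured on two scale variables.
cases = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
solutions = agglomerate(cases)
for solution in solutions:
    print(len(solution), "clusters:", solution)
```

Comparing every pair of clusters at every merge is why this approach takes time on large data sets, and the full list of solutions is what a dendrogram displays.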

Kindly revert for any queries

Thanks.

