Give an example of a company that collects or uses data for various reasons. How can clustering or association models help the company?
Hey,
Note: Brother, if you have any queries related to the answer, please do comment. I would be very happy to resolve all your queries.
Below is the answer I wrote to the same question during my college days.
Clustering is an exploratory analysis that tries to identify structures within the data. Clustering is used to identify groups of cases when the grouping is not known in advance. Clustering is often part of a sequence of analyses: factor analysis, then cluster analysis, and finally discriminant analysis. The general categories of cluster analysis methods are Joining (Tree Clustering), Two-way Joining (Block Clustering), Hierarchical Clustering, and k-means Clustering. In short, whatever the nature of your business is, sooner or later you will run into a clustering problem of some form.
Hierarchical clustering is the most common method. It creates a series of models with cluster solutions from “1” (all cases in one cluster) to “n” (every case is its own cluster). In addition, hierarchical cluster analysis can handle nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement. K-means clustering is a method for quickly clustering large data sets, which would typically take a long time to compute with the otherwise preferred hierarchical cluster analysis.
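To make the “1” to “n” idea concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in Python. It assumes numeric 2-D data, Euclidean distance, and single linkage (the distance between two clusters is the distance between their closest members); real tools offer several other linkage rules. The function names and sample points are hypothetical.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points):
    """Return every cluster solution, from n clusters down to 1.

    Each solution is a list of clusters; each cluster is a tuple of points.
    """
    clusters = [(p,) for p in points]  # start: every case is its own cluster
    solutions = [list(clusters)]
    while len(clusters) > 1:
        # find the pair of clusters with the smallest single-linkage distance
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        # drop the two merged clusters and add their union
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
        solutions.append(list(clusters))
    return solutions


cases = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (4.0, 5.0)]
for solution in agglomerative(cases):
    print(len(solution), "clusters:", sorted(sorted(c) for c in solution))
```

Each pass merges the two closest clusters, so the list of solutions runs from n clusters down to 1. Comparing every pair of clusters on every pass is slow for large data, which is why k-means is used when speed matters.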
The purpose of cluster analysis is to place objects into groups, or
clusters, suggested by the data, not defined a priori,
such that objects in a given cluster tend to be similar to each
other in some sense, and objects in different clusters
tend to be dissimilar. You can also use cluster analysis to
summarize data rather than to find "natural" or "real"
clusters; this use of clustering is sometimes called dissection.
Clustering techniques have been applied to a wide
variety of research problems.
For example, in the field of medicine, clustering diseases, cures for diseases, or symptoms of diseases can lead to very useful taxonomies. What are the diagnostic clusters? To answer this question, the researcher would devise a diagnostic questionnaire that covers the symptoms (for example, in psychology, standardized scales for anxiety, depression, etc.). The cluster analysis can then identify groups of patients that present with similar symptoms while maximizing the difference between the groups. In marketing, what are the customer segments? To answer this question, a market researcher conducts a survey most commonly covering the needs, attitudes, demographics, and behavior of customers. The researcher then uses cluster analysis to identify homogeneous groups of customers that have similar needs and attitudes but are distinctly different from other customer segments. In general, whenever we need to classify a "mountain" of information into manageable, meaningful piles, cluster analysis is of great utility.
Association rules reveal interesting associations and correlation relationships among large sets of data items. Association rules show attribute-value conditions that occur frequently together in a given data set. A typical example of association rule mining is market basket analysis. In data mining, association rules are helpful for analyzing and predicting customer behavior. Data is collected using bar-code scanners in supermarkets. Such market basket databases consist of a large number of transaction records, and each record lists all items purchased by a customer in a single purchase transaction. The association model is often associated with "market basket analysis", which is used to find relationships or correlations in a set of items. A typical association rule of this kind asserts the likelihood that, for instance, "70% of the people who buy spaghetti, wine, and sauce also buy garlic bread."
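The "70%" in a rule like this is usually called the rule's confidence: the share of baskets containing the left-hand items that also contain the right-hand item. The sketch below computes it, together with support (the share of all baskets that contain the items), in plain Python. The five baskets are made-up illustration data, so the result is about 67% rather than the 70% in the quoted example, and the function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Each basket is one purchase transaction (hypothetical data).
baskets = [
    {"spaghetti", "wine", "sauce", "garlic bread"},
    {"spaghetti", "wine", "sauce", "garlic bread"},
    {"spaghetti", "wine", "sauce"},
    {"spaghetti", "sauce"},
    {"milk", "eggs"},
]
rule_if = {"spaghetti", "wine", "sauce"}
rule_then = {"garlic bread"}
print(support(baskets, rule_if | rule_then))
print(confidence(baskets, rule_if, rule_then))
```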
Clustering and association models can help organizations with market segmentation, since one of the best uses of data mining is to segment your customers. Furthermore, it's really simple: from your data you can divide your market into meaningful segments based on attributes such as age, income, occupation, or gender. Segmentation can also help you understand your competition. This insight alone will help you identify that the usual suspects are not the only ones targeting the same customer money as you are.
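Splitting a market by known attributes such as age, income, and gender can be as simple as counting customers per combination of bands. The sketch below assumes a hypothetical customer list and arbitrary band cut-offs. Note that this kind of segmentation uses groups defined a priori; cluster analysis, by contrast, lets the data suggest the groups.

```python
from collections import Counter


def age_band(age):
    """Hypothetical age bands; a real business would choose its own cut-offs."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def income_band(income):
    """Hypothetical income threshold."""
    return "lower income" if income < 40_000 else "higher income"


def segment(customers):
    """Count customers in each (age band, income band, gender) segment."""
    return Counter(
        (age_band(c["age"]), income_band(c["income"]), c["gender"])
        for c in customers
    )


customers = [
    {"age": 24, "income": 31_000, "gender": "F"},
    {"age": 27, "income": 35_000, "gender": "F"},
    {"age": 41, "income": 72_000, "gender": "M"},
    {"age": 45, "income": 68_000, "gender": "M"},
    {"age": 63, "income": 38_000, "gender": "F"},
]
for key, count in segment(customers).most_common():
    print(key, count)
```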
https://blog.kissmetrics.com/data-mining/
Clustering and association models can help companies with market segmentation. One of the best uses of data mining is to segment your customers, and it's pretty simple. From your data you can break down your market into meaningful segments like age, income, occupation, or gender. Segmentation can also help you understand your competition. This insight alone will help you identify that the usual suspects are not the only ones targeting the same customer money as you are.
http://www.solver.com/xlminer/help/association-rules
Association rules are if/then statements that help uncover
relationships between seemingly
unrelated data in a relational database or other information
repository. An example of an
association rule would be "If a customer buys a dozen eggs, he is
80% likely to also purchase
milk." In data mining, association rules are useful for analyzing
and predicting customer
behavior. They play an important part in shopping basket data
analysis, product clustering,
catalog design and store layout.
Association rule mining finds interesting associations and
correlation relationships among large
sets of data items. Association rules show attribute value
conditions that occur frequently
together in a given data set. A typical example of association rule
mining is Market Basket
Analysis.
Data is collected using bar-code scanners in supermarkets. Such
market basket databases consist
of a large number of transaction records. Each record lists all
items bought by a customer on a
single purchase transaction. Managers would be interested to know
if certain groups of items are
consistently purchased together. They could use this data for
adjusting store layouts (placing
items optimally with respect to each other), for cross-selling, for
promotions, for catalog design,
and to identify customer segments based on buying patterns.
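To see which items are "consistently purchased together", a first step is to count how often each pair of items appears in the same basket and keep the pairs above a minimum support. The sketch below does this by brute force over all pairs in each basket; it is a simplification (full algorithms such as Apriori also handle larger itemsets and prune the search), and the baskets, threshold, and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs bought together in >= min_support of baskets."""
    counts = Counter()
    for basket in transactions:
        # sorted() so a pair is counted once regardless of the set's iteration order
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"beer", "chips"},
]
print(frequent_pairs(baskets, min_support=0.4))
```

Pairs that pass the threshold are candidates for store-layout, cross-selling, or promotion decisions like the ones listed above.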
https://docs.oracle.com/cd/B14117_01/datamine.101/b10698/4descrip.htm
The Association model is often associated with "market basket
analysis", which is used to
discover relationships or correlations in a set of items. It is
widely used in data analysis for direct
marketing, catalog design, and other business decision-making
processes. A typical association
rule of this kind asserts the likelihood that, for example, "70% of
the people who buy spaghetti,
wine, and sauce also buy garlic bread."
Association models capture the co-occurrence of items or events in
large volumes of customer
transaction data. Because of progress in bar-code technology, it is
now possible for retail
organizations to collect and store massive amounts of sales data,
referred to as "basket data."
Association models were initially defined on basket data, even
though they are applicable in
several other applications. Finding all such rules is valuable for
cross-marketing and mail-order
promotions, but there are other applications as well: catalog
design, add-on sales, store layout,
customer segmentation, web page personalization, and target
marketing.
http://support.sas.com/rnd/app/stat/procedures/ClusterAnalysis.html
The purpose of cluster analysis is to place objects into groups, or
clusters, suggested by the data,
not defined a priori, such that objects in a given cluster tend to
be similar to each other in some
sense, and objects in different clusters tend to be dissimilar. You
can also use cluster analysis to
summarize data rather than to find "natural" or "real" clusters;
this use of clustering is sometimes
called dissection. The SAS/STAT procedures for clustering are
oriented toward disjoint or
hierarchical clusters from coordinate data, distance data, or a
correlation or covariance matrix.
The SAS/STAT cluster analysis procedures include the following:
ACECLUS Procedure: Obtains approximate estimates of the pooled within-cluster covariance matrix when the clusters are assumed to be multivariate normal with equal covariance matrices
CLUSTER Procedure: Hierarchically clusters the observations in a SAS data set
DISTANCE Procedure: Computes various measures of distance, dissimilarity, or similarity between the observations (rows) of a SAS data set. Proximity measures are stored as a lower triangular matrix or a square matrix in an output data set that can then be used as input to the CLUSTER, MDS, and MODECLUS procedures.
FASTCLUS Procedure: Disjoint cluster analysis on the basis of distances computed from one or more quantitative variables
MODECLUS Procedure: Clusters observations in a SAS data set
TREE Procedure: Produces a tree diagram, also known as a dendrogram or phenogram, from a data set created by the CLUSTER or VARCLUS procedure
VARCLUS Procedure: Divides a set of numeric variables into disjoint or hierarchical clusters
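The DISTANCE procedure's idea of storing proximities as a lower triangular matrix can be illustrated outside SAS. The Python sketch below is not SAS code and uses only Euclidean distance (the procedure supports many more measures). For each observation i it keeps the distances to observations 0 through i-1, since the diagonal is always zero and the upper half mirrors the lower half. The function name and data are hypothetical.

```python
import math


def lower_triangular_distances(rows):
    """Euclidean distances between observations as a lower-triangular list of lists.

    Row i holds the distances from observation i to observations 0..i-1;
    the zero diagonal and the mirror-image upper half are omitted.
    """
    return [[math.dist(rows[i], rows[j]) for j in range(i)] for i in range(len(rows))]


obs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for i, row in enumerate(lower_triangular_distances(obs)):
    print(i, row)
```

A matrix like this is the usual input for hierarchical clustering, which only needs pairwise distances rather than the original coordinates.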
------------
http://documents.software.dell.com/Statistics/Textbook/Cluster-Analysis
The term cluster analysis (first used by Tryon, 1939) encompasses a
number of different
algorithms and methods for grouping objects of similar kind into
respective categories. A general
question facing researchers in many areas of inquiry is how to
organize observed data into
meaningful structures, that is, to develop taxonomies. In other
words, cluster analysis is an
exploratory data analysis tool which aims at sorting different
objects into groups in a way that
the degree of association between two objects is maximal if they
belong to the same group and
minimal otherwise. Given the above, cluster analysis can be used to
discover structures in data
without providing an explanation/interpretation. In other words,
cluster analysis simply discovers
structures in data without explaining why they exist.
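The idea that association should be maximal within a group and minimal across groups can be checked for any proposed grouping. The sketch below assumes that closeness is measured by Euclidean distance (smaller distance meaning stronger association) and compares the average distance of same-cluster pairs with that of different-cluster pairs. The points, labels, and function name are hypothetical, and the function expects at least one pair of each kind.

```python
import math
from itertools import combinations


def mean_within_between(points, labels):
    """Average pairwise distance for same-cluster pairs vs. different-cluster pairs."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        (within if lp == lq else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.3, 4.9)]
good = ["a", "a", "b", "b"]  # follows the visible structure
bad = ["a", "b", "a", "b"]   # cuts across it
print(mean_within_between(points, good))
print(mean_within_between(points, bad))
```

The sensible grouping gives a much smaller within-cluster average than between-cluster average, while the crosscutting grouping reverses that. As the text says, this only describes the structure; it does not explain why it exists.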
We deal with clustering in almost every aspect of daily life. For
example, a group of diners
sharing the same table in a restaurant may be regarded as a cluster
of people. In food stores items
of similar nature, such as different types of meat or vegetables
are displayed in the same or
nearby locations. There are countless examples in which
clustering plays an important
role. For instance, biologists have to organize the different
species of animals before a
meaningful description of the differences between animals is
possible. According to the modern
system employed in biology, man belongs to the primates, the
mammals, the amniotes, the
vertebrates, and the animals. Note how in this classification, the
higher the level of aggregation
the less similar are the members in the respective class. Man has
more in common with all other
primates (e.g., apes) than it does with the more "distant" members
of the mammals (e.g., dogs),
etc. The general categories of cluster analysis methods are Joining (Tree Clustering), Two-way Joining (Block Clustering), and k-Means Clustering. In short, whatever the nature of your business is, sooner or later you will run into a clustering problem of one form or another.
Statistical Significance Testing
Note that the above discussions refer to clustering algorithms and
do not mention anything about
statistical significance testing. In fact, cluster analysis is not
as much a typical statistical test as it
is a "collection" of different algorithms that "put objects into
clusters according to well defined
similarity rules." The point here is that, unlike many other
statistical procedures, cluster analysis
methods are mostly used when we do not have any a priori
hypotheses, but are still in the
exploratory phase of our research. In a sense, cluster analysis
finds the "most significant solution
possible." Therefore, statistical significance testing is really
not appropriate here, even in cases
when p-levels are reported (as in k-means clustering).
Area of Application
Clustering techniques have been applied to a wide variety of
research problems. Hartigan (1975)
provides an excellent summary of the many published studies
reporting the results of cluster
analyses. For example, in the field of medicine, clustering
diseases, cures for diseases, or
symptoms of diseases can lead to very useful taxonomies. In the
field of psychiatry, the correct
diagnosis of clusters of symptoms such as paranoia, schizophrenia,
etc. is essential for successful
therapy. In archeology, researchers have attempted to establish
taxonomies of stone tools, funeral
objects, etc. by applying cluster analytic techniques. In general,
whenever we need to classify a
"mountain" of information into manageable meaningful piles, cluster
analysis is of great utility.
-----
http://www.statisticssolutions.com/cluster-analysis-2/
What is Cluster Analysis?
Cluster analysis is an exploratory analysis that tries to identify structures within the data. Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it tries to identify homogeneous groups of cases, i.e., observations, participants, or respondents. Cluster analysis is used to identify groups of cases when the grouping is not previously known. Because it is exploratory, it does not make any distinction between dependent and independent variables.
Cluster analysis is often part of a sequence of analyses: factor analysis, cluster analysis, and finally, discriminant analysis. First, a factor analysis reduces the dimensions, and therefore the number of variables, which makes it easier to run the cluster analysis. The factor analysis also minimizes multicollinearity effects. The next analysis is the cluster analysis, which identifies the grouping. Lastly, a discriminant analysis checks the goodness of fit of the model that the cluster analysis found and profiles the clusters.
Medicine – What are the diagnostic clusters? To answer this
question the researcher would
devise a diagnostic questionnaire that entails the symptoms (for
example in psychology
standardized scales for anxiety, depression etc.). The cluster
analysis can then identify groups of
patients that present with similar symptoms and simultaneously
maximize the difference between
the groups.
Marketing – What are the customer segments? To answer this
question a market researcher
conducts a survey most commonly covering needs, attitudes,
demographics, and behavior of
customers. The researcher then uses the cluster analysis to
identify homogeneous groups of
customers that have similar needs and attitudes but are
distinctively different from other
customer segments.
Education – What are student groups that need special attention?
The researcher measures a
couple of psychological, aptitude, and achievement characteristics.
A cluster analysis then
identifies what homogeneous groups exist among students (for
example, high achievers in all
subjects, or students that excel in certain subjects but fail in
others, etc.). A discriminant analysis
then profiles these performance clusters and tells us what
psychological, environmental,
aptitudinal, affective, and attitudinal factors characterize these
student groups.
Biology – What is the taxonomy of species? The researcher has
collected a data set of
different plants and noted different attributes of their
phenotypes. A hierarchical cluster analysis
groups those observations into a series of clusters and builds a
taxonomy tree of groups and
subgroups of similar plants.
K-means clustering is a method to quickly cluster large data sets, which typically take a while to compute with the preferred hierarchical cluster analysis. The researcher must define the number of clusters in advance. This is useful for testing different models with different assumed numbers of clusters (for example, in customer segmentation).
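A minimal k-means sketch in Python shows why the number of clusters must be fixed in advance: `k` is an input to the algorithm. It assumes numeric data, Euclidean distance, and Lloyd's algorithm with starting centroids drawn from the data using a fixed random seed; the function names and data are hypothetical. Running it for several values of `k` and comparing the total within-cluster squared distance is one simple way to compare models with different numbers of clusters. That total usually falls as `k` grows, so choosing `k` still needs judgment.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k randomly chosen data points as starting centroids
    for _ in range(iterations):
        labels = assign(points, centroids)  # assignment step
        new = []
        for c in range(k):  # update step
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                # mean of each coordinate over the cluster's members
                new.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new.append(centroids[c])  # keep an empty cluster's centroid
        if new == centroids:  # converged: no centroid moved
            break
        centroids = new
    return centroids, assign(points, centroids)


def within_sse(points, centroids, labels):
    """Total squared distance from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


data = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8), (4.5, 9), (5, 9.5)]
for k in (1, 2, 3, 4):
    cents, labs = kmeans(data, k)
    print(k, round(within_sse(data, cents, labs), 2))
```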
Hierarchical clustering is the most common method. It takes time to calculate, but it generates a series of models with cluster solutions from 1 (all cases in one cluster) to n (every case is its own cluster). Hierarchical clustering can also work with variables instead of cases; it can cluster variables together in a manner somewhat similar to factor analysis. In addition, hierarchical cluster analysis can handle nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement.
Kindly comment if you have any queries.
Thanks.