Question

Give an example of a company that collects or uses data for various reasons. How can clustering or association models help the company?

Solutions

Expert Solution

Hey,

Note: Brother, if you have any queries related to the answer, please comment. I would be very happy to resolve all your queries.

Below is the answer I wrote to the same question during my college days.

Clustering is an exploratory analysis that tries to identify structures within the data. Clustering is used to
identify groups of cases when the grouping is not known in advance. Clustering is often part of a sequence of
analyses: factor analysis, then cluster analysis, and finally discriminant analysis. The general categories of cluster
analysis methods are Joining (Tree Clustering), Two-way Joining (Block Clustering), Hierarchical Clustering and k-means
Clustering. In short, whatever the nature of your business, sooner or later you will run into a clustering
problem of some form.
Hierarchical clustering is the most common method. It creates a series of models with cluster solutions from “1” (all
cases in one cluster) to “n” (each case is its own cluster). In addition, hierarchical cluster analysis can deal
with nominal, ordinal, and scale data; however, it is not recommended to mix different levels of measurement. K-means
clustering is a method for quickly clustering large data sets, which would typically take a long time to compute with
the otherwise preferred hierarchical cluster analysis.
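The k-means idea can be sketched in a few lines of plain Python. This is a minimal illustration of Lloyd's algorithm (assign each point to the nearest centroid, move each centroid to the mean of its members, repeat until nothing changes), assuming numeric features and Euclidean distance. The function name `kmeans` and the customer data (age, yearly spend in thousands) are hypothetical. Real tools add smarter initialization and usually standardize the variables first, because a feature on a larger scale (here, age) otherwise dominates the distance.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Minimal k-means (Lloyd's algorithm) on a list of numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
    return centroids, labels


# Hypothetical customers: (age, yearly spend in thousands)
customers = [(22, 1.0), (25, 1.2), (23, 0.9), (45, 5.0), (48, 5.5), (50, 4.8)]
centroids, labels = kmeans(customers, k=2)
# The three younger and the three older customers end up in separate clusters.
```

Because `k` has to be fixed before running, it is common to rerun the sketch with several values of `k` and compare the resulting groupings.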
The purpose of cluster analysis is to place objects into groups, or clusters, suggested by the data, not defined a priori,
such that objects in a given cluster tend to be similar to each other in some sense, and objects in different clusters
tend to be dissimilar. You can also use cluster analysis to summarize data rather than to find "natural" or "real"
clusters; this use of clustering is sometimes called dissection. Clustering techniques have been applied to a wide
variety of research problems.
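The "similar within, dissimilar between" goal can be checked numerically for any proposed grouping. A minimal sketch, assuming Euclidean distance and a labeling with at least one pair of points inside a cluster and one pair across clusters; the function name `within_between` and the points are hypothetical:

```python
import math
from itertools import combinations


def within_between(points, labels):
    """Return (average within-cluster distance, average between-cluster distance)."""
    within, between = [], []
    for (p, a), (q, b) in combinations(zip(points, labels), 2):
        # Pairs sharing a label count as "within"; all other pairs as "between".
        (within if a == b else between).append(math.dist(p, q))
    return sum(within) / len(within), sum(between) / len(between)


points = [(1, 1), (1, 2), (8, 8), (9, 8)]
good = within_between(points, [0, 0, 1, 1])  # keeps each nearby pair together
bad = within_between(points, [0, 1, 0, 1])   # splits each nearby pair
```

For the sensible grouping the within-cluster average is far smaller than the between-cluster average; for the poor grouping the relationship flips.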
For example, in the field of medicine, clustering diseases, cures for diseases, or symptoms of diseases can lead to
very useful taxonomies. What are the diagnostic clusters? To answer this question, the researcher would devise a
diagnostic questionnaire that covers the symptoms (for example, in psychology, standardized scales for anxiety,
depression, etc.). The cluster analysis can then identify groups of patients who present with similar symptoms while
maximizing the difference between the groups. In marketing, the question is: what are the customer segments? To
answer it, a market researcher conducts a survey, most commonly covering the needs, attitudes,
demographics, and behavior of customers. The researcher then uses cluster analysis to identify homogeneous
groups of customers that have similar needs and attitudes but are distinctly different from other customer
segments. In general, whenever we need to classify a "mountain" of information into manageable, meaningful piles,
cluster analysis is of great utility.
Association rule mining reveals interesting associations and correlation relationships among large sets of data items.
Association rules show attribute-value conditions that occur frequently together in a given data set. A typical
example of association rule mining is Market Basket Analysis. In data mining, association rules are helpful for
analyzing and predicting customer behavior. For example, a supermarket chain collects data using bar-code scanners
at its checkouts. Such market basket databases consist of a large number of transaction records. Each record lists
all items purchased by a customer in a single purchase transaction. The association model is often associated with
"market basket analysis", which is used to find relationships or correlations in a set of items. A typical association
rule of this kind asserts the likelihood that, for instance, "70% of the people who buy spaghetti, wine, and sauce also
buy garlic bread."
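A figure such as "70% of the people who buy spaghetti, wine, and sauce also buy garlic bread" is the confidence of a rule, computed from the support (the fraction of transactions containing a set of items). A minimal sketch, assuming each transaction is a set of item names; the function names and the baskets are hypothetical, and the baskets are constructed so that the rule comes out at 70%:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical baskets: 10 contain spaghetti, wine and sauce; 7 of those also garlic bread.
baskets = (
    [{"spaghetti", "wine", "sauce", "garlic bread"}] * 7
    + [{"spaghetti", "wine", "sauce"}] * 3
    + [{"milk", "eggs"}] * 10
)
rule_conf = confidence(baskets, {"spaghetti", "wine", "sauce"}, {"garlic bread"})
rule_supp = support(baskets, {"spaghetti", "wine", "sauce", "garlic bread"})
```

Here the rule holds in 7 of 20 baskets (support 0.35) and in 7 of the 10 baskets that contain its left-hand side (confidence 0.7).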
Clustering and association models can help organizations with market segmentation, which is one of the
best uses of data mining: segmenting your customers. And it is quite simple; from your data you can divide
your market into meaningful segments based on attributes such as age, income, occupation or gender.
Segmentation can also help you understand your competition. This insight alone can help you see that the
usual suspects are not the only ones targeting the same customer money as you are.
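Segmenting customers by attributes such as age or gender can be as simple as grouping records by predefined bands. The sketch below is hypothetical throughout: the field names (`id`, `age`, `gender`), the age cut-offs and the customer records are invented for illustration.

```python
from collections import defaultdict


def age_band(age):
    """Hypothetical age bands; real cut-offs depend on the business."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def segment(customers, keys):
    """Group customer ids by the tuple of values returned by each key function."""
    segments = defaultdict(list)
    for c in customers:
        segments[tuple(f(c) for f in keys)].append(c["id"])
    return dict(segments)


customers = [
    {"id": 1, "age": 24, "gender": "F"},
    {"id": 2, "age": 27, "gender": "F"},
    {"id": 3, "age": 41, "gender": "M"},
    {"id": 4, "age": 63, "gender": "M"},
    {"id": 5, "age": 35, "gender": "M"},
]
segments = segment(customers, [lambda c: age_band(c["age"]), lambda c: c["gender"]])
# {("under 30", "F"): [1, 2], ("30-49", "M"): [3, 5], ("50+", "M"): [4]}
```

Note the contrast with cluster analysis: here the segment boundaries are fixed in advance, whereas clustering lets the groups emerge from the data.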

https://blog.kissmetrics.com/data-mining/
Clustering and association models can help companies with market segmentation: one of the best uses
of data mining is to segment your customers. And it’s pretty simple.
From your data you can break down your market into meaningful segments like age, income,
occupation or gender. Segmentation can also help you understand your competition. This insight
alone will help you identify that the usual suspects are not the only ones targeting the same
customer money as you are.
http://www.solver.com/xlminer/help/association-rules
Association rules are if/then statements that help uncover relationships between seemingly
unrelated data in a relational database or other information repository. An example of an
association rule would be "If a customer buys a dozen eggs, he is 80% likely to also purchase
milk." In data mining, association rules are useful for analyzing and predicting customer
behavior. They play an important part in shopping basket data analysis, product clustering,
catalog design and store layout.
Association rule mining finds interesting associations and correlation relationships among large
sets of data items. Association rules show attribute value conditions that occur frequently
together in a given data set. A typical example of association rule mining is Market Basket
Analysis.
Data is collected using bar-code scanners in supermarkets. Such market basket databases consist
of a large number of transaction records. Each record lists all items bought by a customer on a
single purchase transaction. Managers would be interested to know if certain groups of items are
consistently purchased together. They could use this data for adjusting store layouts (placing
items optimally with respect to each other), for cross-selling, for promotions, for catalog design,
and to identify customer segments based on buying patterns.
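Finding out which items are consistently purchased together starts with counting how often each pair of items appears in the same transaction. A minimal sketch, assuming each receipt is a list of item names; the function name `pair_counts` and the receipts are hypothetical:

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # sorted() puts each pair in a canonical order, so (a, b) and (b, a) match.
        counts.update(combinations(sorted(set(basket)), 2))
    return counts


receipts = [
    ["bread", "milk", "eggs"],
    ["milk", "eggs"],
    ["bread", "butter"],
    ["milk", "eggs", "butter"],
]
top = pair_counts(receipts).most_common(1)
# [(("eggs", "milk"), 3)]: a candidate for adjacent shelving or a joint promotion
```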
https://docs.oracle.com/cd/B14117_01/datamine.101/b10698/4descrip.htm
The Association model is often associated with "market basket analysis", which is used to
discover relationships or correlations in a set of items. It is widely used in data analysis for direct
marketing, catalog design, and other business decision-making processes. A typical association
rule of this kind asserts the likelihood that, for example, "70% of the people who buy spaghetti,
wine, and sauce also buy garlic bread."
Association models capture the co-occurrence of items or events in large volumes of customer
transaction data. Because of progress in bar-code technology, it is now possible for retail
organizations to collect and store massive amounts of sales data, referred to as "basket data."
Association models were initially defined on basket data, even though they are applicable in
several other applications. Finding all such rules is valuable for cross-marketing and mail-order
promotions, but there are other applications as well: catalog design, add-on sales, store layout,
customer segmentation, web page personalization, and target marketing.
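Finding every rule above given thresholds is usually done with a level-wise search such as the Apriori algorithm. The sources above do not name an algorithm; Apriori is shown here as one common choice. The sketch assumes small, in-memory data; the function names, the thresholds `min_support` and `min_confidence`, and the transactions are illustrative.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets containing every item of the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    current = {frozenset([item]) for b in baskets for item in b}
    current = {s for s in current if support(s) >= min_support}
    frequent = {}
    size = 1
    while current:
        frequent.update({s: support(s) for s in current})
        size += 1
        # Join step: unite frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: drop candidates that have an infrequent (size - 1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for every rule meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # confidence(A -> C) = support(A and C together) / support(A)
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules


transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
    {"eggs"},
]
itemsets = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(itemsets, min_confidence=0.7)
```

With these five baskets, a minimum support of 0.4 and a minimum confidence of 0.7, only bread → milk (confidence 1.0) and milk → bread (confidence 0.75) qualify. The prune step relies on the fact that every subset of a frequent itemset must itself be frequent, which keeps the number of candidates small.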

http://support.sas.com/rnd/app/stat/procedures/ClusterAnalysis.html
The purpose of cluster analysis is to place objects into groups, or clusters, suggested by the data,
not defined a priori, such that objects in a given cluster tend to be similar to each other in some
sense, and objects in different clusters tend to be dissimilar. You can also use cluster analysis to
summarize data rather than to find "natural" or "real" clusters; this use of clustering is sometimes
called dissection. The SAS/STAT procedures for clustering are oriented toward disjoint or
hierarchical clusters from coordinate data, distance data, or a correlation or covariance matrix.
The SAS/STAT cluster analysis procedures include the following:
• ACECLUS Procedure: Obtains approximate estimates of the pooled within-cluster covariance matrix when the clusters are assumed to be multivariate normal with equal covariance matrices
• CLUSTER Procedure: Hierarchically clusters the observations in a SAS data set
• DISTANCE Procedure: Computes various measures of distance, dissimilarity, or similarity between the observations (rows) of a SAS data set. Proximity measures are stored as a lower triangular matrix or a square matrix in an output data set that can then be used as input to the CLUSTER, MDS, and MODECLUS procedures.
• FASTCLUS Procedure: Disjoint cluster analysis on the basis of distances computed from one or more quantitative variables
• MODECLUS Procedure: Clusters observations in a SAS data set
• TREE Procedure: Produces a tree diagram, also known as a dendrogram or phenogram, from a data set created by the CLUSTER or VARCLUS procedure
• VARCLUS Procedure: Divides a set of numeric variables into disjoint or hierarchical clusters
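For readers without SAS, the idea behind storing proximities as a lower triangular matrix (as the DISTANCE procedure description mentions) can be sketched in Python. This is not SAS code and does not reproduce SAS's output format; it assumes Euclidean distance on numeric rows, and the layout (row i holds the distances to rows 0..i, including the zero diagonal) and the function name are choices made for illustration.

```python
import math


def lower_triangular_distances(rows):
    """Euclidean distances stored as a lower triangular matrix.

    Row i holds the distances from observation i to observations 0..i
    (the last entry is the zero diagonal); the mirrored upper triangle is
    omitted because distance is symmetric.
    """
    return [[math.dist(rows[i], rows[j]) for j in range(i + 1)] for i in range(len(rows))]


obs = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
tri = lower_triangular_distances(obs)
# [[0.0], [5.0, 0.0], [10.0, 5.0, 0.0]]
```

Storing only one triangle roughly halves the memory needed, which matters because the full matrix grows with the square of the number of observations.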
------------
http://documents.software.dell.com/Statistics/Textbook/Cluster-Analysis
The term cluster analysis (first used by Tryon, 1939) encompasses a number of different
algorithms and methods for grouping objects of similar kind into respective categories. A general
question facing researchers in many areas of inquiry is how to organize observed data into
meaningful structures, that is, to develop taxonomies. In other words cluster analysis is an
exploratory data analysis tool which aims at sorting different objects into groups in a way that
the degree of association between two objects is maximal if they belong to the same group and
minimal otherwise. Given the above, cluster analysis can be used to discover structures in data
without providing an explanation/interpretation. In other words, cluster analysis simply discovers
structures in data without explaining why they exist.
We deal with clustering in almost every aspect of daily life. For example, a group of diners
sharing the same table in a restaurant may be regarded as a cluster of people. In food stores items
of similar nature, such as different types of meat or vegetables are displayed in the same or
nearby locations. There is a countless number of examples in which clustering plays an important
role. For instance, biologists have to organize the different species of animals before a
meaningful description of the differences between animals is possible. According to the modern
system employed in biology, man belongs to the primates, the mammals, the amniotes, the
vertebrates, and the animals. Note how in this classification, the higher the level of aggregation
the less similar are the members in the respective class. Man has more in common with all other
primates (e.g., apes) than it does with the more "distant" members of the mammals (e.g., dogs),
etc. For a review of the general categories of cluster analysis methods, see Joining (Tree
Clustering), Two-way Joining (Block Clustering), and k-Means Clustering. In short, whatever
the nature of your business is, sooner or later you will run into a clustering problem of one form
or another.

Statistical Significance Testing
Note that the above discussions refer to clustering algorithms and do not mention anything about
statistical significance testing. In fact, cluster analysis is not as much a typical statistical test as it
is a "collection" of different algorithms that "put objects into clusters according to well defined
similarity rules." The point here is that, unlike many other statistical procedures, cluster analysis
methods are mostly used when we do not have any a priori hypotheses, but are still in the
exploratory phase of our research. In a sense, cluster analysis finds the "most significant solution
possible." Therefore, statistical significance testing is really not appropriate here, even in cases
when p-levels are reported (as in k-means clustering).
Area of Application
Clustering techniques have been applied to a wide variety of research problems. Hartigan (1975)
provides an excellent summary of the many published studies reporting the results of cluster
analyses. For example, in the field of medicine, clustering diseases, cures for diseases, or
symptoms of diseases can lead to very useful taxonomies. In the field of psychiatry, the correct
diagnosis of clusters of symptoms such as paranoia, schizophrenia, etc. is essential for successful
therapy. In archeology, researchers have attempted to establish taxonomies of stone tools, funeral
objects, etc. by applying cluster analytic techniques. In general, whenever we need to classify a
"mountain" of information into manageable meaningful piles, cluster analysis is of great utility.
-----
http://www.statisticssolutions.com/cluster-analysis-2/
What is Cluster Analysis?
Cluster analysis is an exploratory analysis that tries to identify structures within the data.
Cluster analysis is also called segmentation analysis or taxonomy analysis. More specifically, it
tries to identify homogeneous groups of cases, i.e., observations, participants, or respondents.
Cluster analysis is used to identify groups of cases when the grouping is not previously known.
Because it is exploratory, it does not make any distinction between dependent and independent
variables.
Cluster analysis is often part of a sequence of analyses: factor analysis, cluster analysis, and
finally discriminant analysis. First, a factor analysis reduces the dimensions, and therefore the
number of variables, which makes it easier to run the cluster analysis. The factor analysis also
minimizes multicollinearity effects. The next analysis is the cluster analysis, which identifies the
grouping. Lastly, a discriminant analysis checks the goodness of fit of the model that the cluster
analysis found and profiles the clusters.

• Medicine: What are the diagnostic clusters? To answer this question, the researcher would devise a diagnostic questionnaire that covers the symptoms (for example, in psychology, standardized scales for anxiety, depression, etc.). The cluster analysis can then identify groups of patients who present with similar symptoms while maximizing the difference between the groups.
• Marketing: What are the customer segments? To answer this question, a market researcher conducts a survey, most commonly covering the needs, attitudes, demographics, and behavior of customers. The researcher then uses cluster analysis to identify homogeneous groups of customers that have similar needs and attitudes but are distinctly different from other customer segments.
• Education: Which student groups need special attention? The researcher measures a few psychological, aptitude, and achievement characteristics. A cluster analysis then identifies which homogeneous groups exist among students (for example, high achievers in all subjects, or students who excel in certain subjects but fail in others). A discriminant analysis then profiles these performance clusters and tells us which psychological, environmental, aptitudinal, affective, and attitudinal factors characterize these student groups.
• Biology: What is the taxonomy of species? The researcher has collected a data set of different plants and noted different attributes of their phenotypes. A hierarchical cluster analysis groups those observations into a series of clusters and builds a taxonomy tree of groups and subgroups of similar plants.
K-means clustering is a method to quickly cluster large data sets, which typically take a while to
compute with the otherwise preferred hierarchical cluster analysis. The researcher must define the
number of clusters in advance. This is useful for testing different models with different assumed
numbers of clusters (for example, in customer segmentation).
Hierarchical clustering is the most common method. It takes
time to calculate, but it generates a series of models with cluster solutions from 1 (all cases in
one cluster) to n (each case is its own cluster). Hierarchical clustering also works with
variables as opposed to cases; it can cluster variables together in a manner somewhat similar to
factor analysis. In addition, hierarchical cluster analysis can handle nominal, ordinal, and scale
data; however, it is not recommended to mix different levels of measurement.
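The series of solutions from n clusters down to 1 can be illustrated with a naive agglomerative sketch. It assumes numeric coordinates, Euclidean distance and single linkage (the distance between the closest members of two clusters); the text does not specify a linkage, and complete, average or Ward linkage are equally common choices. The function name `agglomerate` and the points are hypothetical. Recomputing all pairwise distances at every merge, as this version does, also shows why hierarchical clustering becomes slow on large data sets.

```python
import math


def agglomerate(points):
    """Single-linkage agglomerative clustering.

    Returns a list of solutions: solutions[0] has n singleton clusters, and
    each later entry merges the two closest clusters, ending with a single
    cluster that contains every point (clusters hold point indices).
    """
    clusters = [[i] for i in range(len(points))]
    solutions = [[list(c) for c in clusters]]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        x, y = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        solutions.append([list(c) for c in clusters])
    return solutions


points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
solutions = agglomerate(points)
# solutions[k] is the solution with len(points) - k clusters
```

Reading the list from the end backwards gives the tree (dendrogram) that a procedure such as TREE would draw: the last merge joins the outlying point (20, 20) to everything else.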

Kindly comment if you have any queries.

Thanks.

