Question



Customer Age Female Income Married Children Loan Mortgage
A 29 0 12623.4 1 1 1 0
B 25 0 23818.6 1 0 0 0
C 40 1 31473.9 0 2 0 1
D 48 0 20268 1 0 0 0
E 65 0 51417 1 2 0 0
F 59 1 30971.8 1 3 1 1
G 61 1 47025 0 2 1 1
H 30 1 9672.25 1 0 1 0
I 31 1 15976.3 1 0 1 0
J 29 0 14711.8 1 0 0 1

Know Thy Customer (KTC) is a financial consulting company that provides personalized financial advice to its clients. As a basis for developing this tailored advising, KTC would like to segment its customers into several representative groups based on key characteristics. Peyton Blake, the director of KTC’s fledgling analytics division, plans to establish the set of representative customer profiles based on 600 customer records in the file KnowThyCustomer. Each customer record contains data on age, gender, annual income, marital status, number of children, whether the customer has a car loan, and whether the customer has a home mortgage. KTC’s market research staff has determined that these seven characteristics should form the basis of the customer clustering.

The data contain both categorical variables (Female, Married, Loan, and Mortgage) and numerical variables (Age, Income, and Children).

  1. Use the matching coefficient to compute the dissimilarity between observations (hint: consider only the binary variables Female, Married, Loan, and Mortgage).

  2. Apply hierarchical clustering to the selected data set (hint: consider only the numerical variables, and normalize them by calculating z-scores before you run the cluster analysis).

  3. For the selected data set, apply k-means clustering using Age, Income, and Children as variables. Normalize the input variables. This produces two clusters. Describe the two clusters of clients by their “average” characteristics.

  4. For Question 3, set k = 3 (three clusters) and rerun the cluster analysis. Compare the clustering results (k = 2 vs. k = 3) and discuss which method is better, in your opinion, for describing customers’ average characteristics and designing marketing segmentation.

  5. Apply appropriate data visualization tool(s) to illustrate your analysis results.
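The solution below does not work through the matching-coefficient item. As a minimal sketch on the ten sample customers only (not the full 600-record file), the matching coefficient of two customers is the share of the four binary variables (Female, Married, Loan, Mortgage) on which they agree, and the dissimilarity is 1 minus that share. The names `BINARY`, `matching_dissimilarity`, and `dissimilarity_matrix` are illustrative, not from any package.

```python
from itertools import combinations

# (Female, Married, Loan, Mortgage) for customers A-J, copied from the sample table.
BINARY = {
    "A": (0, 1, 1, 0), "B": (0, 1, 0, 0), "C": (1, 0, 0, 1), "D": (0, 1, 0, 0),
    "E": (0, 1, 0, 0), "F": (1, 1, 1, 1), "G": (1, 0, 1, 1), "H": (1, 1, 1, 0),
    "I": (1, 1, 1, 0), "J": (0, 1, 0, 1),
}


def matching_dissimilarity(u, v):
    """1 - (number of matching binary variables / number of variables)."""
    matches = sum(a == b for a, b in zip(u, v))
    return 1 - matches / len(u)


def dissimilarity_matrix(data):
    """Symmetric dict keyed by (customer, customer), with a zero diagonal."""
    names = sorted(data)
    d = {(n, n): 0.0 for n in names}
    for p, q in combinations(names, 2):
        d[(p, q)] = d[(q, p)] = matching_dissimilarity(data[p], data[q])
    return names, d


if __name__ == "__main__":
    names, d = dissimilarity_matrix(BINARY)
    print("   " + "  ".join(f"{n:>4}" for n in names))
    for p in names:
        print(f"{p:>2} " + "  ".join(f"{d[(p, q)]:4.2f}" for q in names))
```

For example, A and B differ only on Loan, so their dissimilarity is 0.25, while H and I agree on all four variables and have dissimilarity 0.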

Solutions


Question:

Apply hierarchical clustering to the selected data set (hint: consider only the numerical variables, and normalize them by calculating z-scores before you run the cluster analysis).

Answer:

Cluster Analysis of Observations: Age, Income, Children

Standardized Variables, Euclidean Distance, Complete Linkage

Amalgamation Steps

At each step in the amalgamation process, view the clusters that are formed and examine their similarity and distance levels. The higher the similarity level, the more similar the observations are in each cluster. The lower the distance level, the closer the observations are in each cluster.

Ideally, the clusters should have a relatively high similarity level and a relatively low distance level. However, you must balance that goal with having a reasonable and practical number of clusters.
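The similarity and distance levels in the amalgamation table that follows are two views of the same number. The reported values are consistent with rescaling each merge distance by the largest pairwise Euclidean distance between standardized customers, which here is 4.08498 (the distance at the final step). This relationship is inferred from the reported numbers rather than stated in the output. Here z_i is the standardized (Age, Income, Children) vector of customer i and d_t is the distance level at step t; for step 1, 100 × (1 − 0.15756/4.08498) ≈ 96.14.

```latex
s_t = 100\left(1 - \frac{d_t}{d_{\max}}\right),
\qquad d_{\max} = \max_{i < j} \lVert z_i - z_j \rVert = 4.08498
```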

Key Results: Similarity level, Distance level

In these results, the data contain a total of 10 observations. In step 1, two clusters (observations 9 and 10 in the worksheet, i.e., customers I and J) are joined to form a new cluster. This step creates 9 clusters in the data, with a similarity level of 96.1429 and a distance level of 0.15756. Although the similarity level is high and the distance level is low, the number of clusters is too high to be useful. At each subsequent step, as new clusters are formed, the similarity level decreases and the distance level increases. At the final step, all the observations are joined into a single cluster.

Step  Number of clusters  Similarity level  Distance level  Clusters joined  New cluster  Number of obs. in new cluster
1     9                   96.1429           0.15756         9, 10            9            2
2     8                   90.1496           0.40238         5, 7             5            2
3     7                   89.1278           0.44413         8, 9             8            3
4     6                   77.8102           0.90645         1, 8             1            4
5     5                   70.7584           1.19451         1, 2             1            5
6     4                   62.9473           1.51359         3, 6             3            2
7     3                   60.7288           1.60422         1, 4             1            6
8     2                   47.5129           2.14409         3, 5             3            4
9     1                   0.0000            4.08498         1, 3             1            10
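A minimal pure-Python sketch of the procedure behind this table, assuming z-scores computed with the sample standard deviation and Euclidean distance between standardized rows. Under those assumptions it reproduces the distance levels above on the ten sample customers (0.15756 at step 1, 4.08498 at step 9). It recomputes every pairwise distance at every step, which is fine for 10 rows; for larger files a library routine such as SciPy's `scipy.cluster.hierarchy.linkage(..., method="complete")` does the same job more efficiently. The helper names (`zscores`, `complete_linkage`, `cut`) are illustrative.

```python
import math
import statistics

# Age, Income, Children for customers A-J, copied from the sample table.
NAMES = list("ABCDEFGHIJ")
DATA = [
    (29, 12623.4, 1), (25, 23818.6, 0), (40, 31473.9, 2), (48, 20268, 0),
    (65, 51417, 2), (59, 30971.8, 3), (61, 47025, 2), (30, 9672.25, 0),
    (31, 15976.3, 0), (29, 14711.8, 0),
]


def zscores(rows):
    """Standardize each column as (x - mean) / sample standard deviation."""
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(r, stats)) for r in rows]


def complete_linkage(points):
    """Naive agglomerative clustering with complete linkage.

    Returns one (distance, clusters_after_merge) pair per step; each cluster
    is a frozenset of 0-based row indices.
    """
    clusters = [frozenset([i]) for i in range(len(points))]

    def link(a, b):
        # Complete linkage: distance of the farthest pair across the two clusters.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    steps = []
    while len(clusters) > 1:
        a, b = min(
            ((x, y) for k, x in enumerate(clusters) for y in clusters[k + 1:]),
            key=lambda pair: link(*pair),
        )
        d = link(a, b)
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
        steps.append((d, list(clusters)))
    return steps


def cut(steps, n_obs, k):
    """Partition with k clusters (1 <= k < n_obs): the state after step n_obs - k."""
    return steps[n_obs - k - 1][1]


if __name__ == "__main__":
    z = zscores(DATA)
    steps = complete_linkage(z)
    d_max = max(math.dist(p, q) for p in z for q in z)
    for t, (d, clusters) in enumerate(steps, start=1):
        print(f"step {t}: {len(clusters)} clusters, "
              f"similarity {100 * (1 - d / d_max):.4f}, distance {d:.5f}")
    for c in cut(steps, len(DATA), 3):
        print("".join(NAMES[i] for i in sorted(c)))
```

`cut(steps, 10, 3)` returns the three-cluster state after step 7: rows 5 and 7 (E, G), rows 3 and 6 (C, F), and the remaining six customers.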

Final Partition

          Number of observations  Within cluster sum of squares  Average distance from centroid  Maximum distance from centroid
Cluster1  6                       2.66623                        0.585548                        1.09268
Cluster2  4                       3.76429                        0.943228                        1.24289

Cluster Centroids

Variable Cluster1 Cluster2 Grand centroid
Age -0.633492 0.95024 -0.0000000
Income -0.670189 1.00528 0.0000000
Children -0.721688 1.08253 0.0000000

Distances Between Cluster Centroids

Cluster1 Cluster2
Cluster1 0.00000 2.92756
Cluster2 2.92756 0.00000
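The Final Partition, Cluster Centroids, and Distances Between Cluster Centroids tables can be recomputed from the two-cluster membership (rows 1, 2, 4, 8, 9, 10 versus rows 3, 5, 6, 7, i.e., customers A, B, D, H, I, J versus C, E, F, G). A minimal sketch, assuming the same sample-standard-deviation z-scores as above; `summarize` and `GROUPS` are hypothetical names.

```python
import math
import statistics

# Age, Income, Children for customers A-J, copied from the sample table.
NAMES = list("ABCDEFGHIJ")
DATA = [
    (29, 12623.4, 1), (25, 23818.6, 0), (40, 31473.9, 2), (48, 20268, 0),
    (65, 51417, 2), (59, 30971.8, 3), (61, 47025, 2), (30, 9672.25, 0),
    (31, 15976.3, 0), (29, 14711.8, 0),
]

# Two-cluster membership from the hierarchical solution.
GROUPS = {"Cluster1": "ABDHIJ", "Cluster2": "CEFG"}


def zscores(rows):
    """Standardize each column as (x - mean) / sample standard deviation."""
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(r, stats)) for r in rows]


def summarize(points, members):
    """Size, centroid, within-cluster SS, and average/max distance to centroid."""
    pts = [points[i] for i in members]
    centroid = tuple(statistics.mean(col) for col in zip(*pts))
    dists = [math.dist(p, centroid) for p in pts]
    return {
        "n": len(pts),
        "centroid": centroid,
        "within_ss": sum(d * d for d in dists),
        "avg_dist": statistics.mean(dists),
        "max_dist": max(dists),
    }


if __name__ == "__main__":
    z = zscores(DATA)
    summary = {g: summarize(z, [NAMES.index(c) for c in m]) for g, m in GROUPS.items()}
    for g, s in summary.items():
        cent = ", ".join(f"{v:.5f}" for v in s["centroid"])
        print(f"{g}: n={s['n']}, SS={s['within_ss']:.5f}, avg={s['avg_dist']:.6f}, "
              f"max={s['max_dist']:.5f}, centroid=({cent})")
    gap = math.dist(summary["Cluster1"]["centroid"], summary["Cluster2"]["centroid"])
    print(f"distance between centroids: {gap:.5f}")
```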

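[Figure: Dendrogram of the 10 customers (standardized Age, Income, and Children; Euclidean distance; complete linkage)]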


Question:

For the selected data set, apply k-means clustering using Age, Income, and Children as variables. Normalize the input variables. This produces two clusters. Describe the two clusters of clients by their “average” characteristics.

Answer:

K-means Cluster Analysis: Age, Income, Children

Method

Number of clusters 2
Standardized variables Yes

Final Partition

          Number of observations  Within cluster sum of squares  Average distance from centroid  Maximum distance from centroid
Cluster1  4                       3.764                          0.943                           1.243
Cluster2  6                       2.666                          0.586                           1.093

Cluster Centroids

Variable Cluster1 Cluster2 Grand centroid
Age 0.9502 -0.6335 0.0000
Income 1.0053 -0.6702 0.0000
Children 1.0825 -0.7217 0.0000

Distances Between Cluster Centroids

Cluster1 Cluster2
Cluster1 0.0000 2.9276
Cluster2 2.9276 0.0000
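The output does not state how the initial cluster centers were chosen. As a minimal sketch, assuming the first k customers (A, B, ...) as seeds and plain Lloyd iterations (assign each standardized customer to the nearest center, recompute the means, repeat until assignments stop changing), the ten sample customers end up in the same clusters as the k = 2 and k = 3 tables. Translated back to original units, the k = 2 solution is Cluster1 (C, E, F, G): average age 56.25, income about $40,222, 2.25 children; and Cluster2 (A, B, D, H, I, J): average age 32, income about $16,178, about 0.17 children. That is, an older, higher-income group with larger families versus a younger, lower-income group with almost no children. Other seeds can give other partitions; `kmeans` and `raw_means` are illustrative names.

```python
import math
import statistics

# Age, Income, Children for customers A-J, copied from the sample table.
NAMES = list("ABCDEFGHIJ")
DATA = [
    (29, 12623.4, 1), (25, 23818.6, 0), (40, 31473.9, 2), (48, 20268, 0),
    (65, 51417, 2), (59, 30971.8, 3), (61, 47025, 2), (30, 9672.25, 0),
    (31, 15976.3, 0), (29, 14711.8, 0),
]


def zscores(rows):
    """Standardize each column as (x - mean) / sample standard deviation."""
    stats = [(statistics.mean(c), statistics.stdev(c)) for c in zip(*rows)]
    return [tuple((x - m) / s for x, (m, s) in zip(r, stats)) for r in rows]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means, seeded with the first k points (assumed seeding rule)."""
    centroids = [points[i] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(statistics.mean(col) for col in zip(*members))
    return labels


def raw_means(rows, labels, k):
    """Cluster averages in original units (age, income, children)."""
    return [tuple(statistics.mean(col)
                  for col in zip(*[r for r, lab in zip(rows, labels) if lab == c]))
            for c in range(k)]


if __name__ == "__main__":
    z = zscores(DATA)
    for k in (2, 3):
        labels = kmeans(z, k)
        print(f"k = {k}")
        for c, (age, income, kids) in enumerate(raw_means(DATA, labels, k)):
            members = "".join(n for n, lab in zip(NAMES, labels) if lab == c)
            print(f"  {members:<6}  age {age:5.2f}  income {income:9.2f}  children {kids:.2f}")
```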

Question:

For Question 3, set k = 3 (three clusters) and rerun the cluster analysis. Compare the clustering results (k = 2 vs. k = 3) and discuss which method is better, in your opinion, for describing customers’ average characteristics and designing marketing segmentation.

Answer:

K-means Cluster Analysis: Age, Income, Children

Method

Number of clusters 3
Standardized variables Yes

Final Partition

          Number of observations  Within cluster sum of squares  Average distance from centroid  Maximum distance from centroid
Cluster1  2                       0.398                          0.446                           0.446
Cluster2  4                       1.569                          0.562                           0.970
Cluster3  4                       3.764                          0.943                           1.243

Cluster Centroids

Variable Cluster1 Cluster2 Cluster3 Grand centroid
Age -0.7968 -0.5519 0.9502 0.0000
Income -1.0207 -0.4949 1.0053 0.0000
Children -0.4330 -0.8660 1.0825 0.0000

Distances Between Cluster Centroids

Cluster1 Cluster2 Cluster3
Cluster1 0.0000 0.7239 3.0747
Cluster2 0.7239 0.0000 2.8816
Cluster3 3.0747 2.8816 0.0000
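One quick way to compare the two k-means runs with numbers already reported is the total within-cluster sum of squares from the two Final Partition tables. The k = 3 run keeps the four-customer group (3.764) and splits the six-customer group into two, so the total can only go down; the size of the drop is what is informative, not the fact that it drops. `WITHIN_SS` and `total_within_ss` are illustrative names.

```python
# Within-cluster sums of squares copied from the two k-means Final Partition tables.
WITHIN_SS = {
    2: [3.764, 2.666],
    3: [0.398, 1.569, 3.764],
}


def total_within_ss(k):
    """Total within-cluster sum of squares for the k-cluster solution."""
    return sum(WITHIN_SS[k])


if __name__ == "__main__":
    for k in sorted(WITHIN_SS):
        print(f"k = {k}: total within-cluster SS = {total_within_ss(k):.3f}")
    drop = total_within_ss(2) - total_within_ss(3)
    print(f"reduction from k = 2 to k = 3: {drop:.3f} ({drop / total_within_ss(2):.1%})")
```

With the reported values this gives 6.430 for k = 2 and 5.731 for k = 3, a reduction of about 10.9%.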

In the hierarchical amalgamation table above, the similarity level drops by more than 13 points (from 60.7288 to 47.5129) between step 7 and step 8, when the number of clusters goes from 3 to 2. This is the largest drop before the final merge, which suggests that 3 clusters may be sufficient for the final partition. If this grouping makes intuitive sense, it is probably a good choice.
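The largest-drop reasoning can be made explicit by differencing the similarity levels from the amalgamation table. This small sketch excludes the final merge into one cluster, which is always a big drop because every customer is forced together (here the similarity falls to 0). `similarity_drops` is a hypothetical helper name.

```python
# Similarity levels from the hierarchical amalgamation table (steps 1-9).
SIMILARITY = [96.1429, 90.1496, 89.1278, 77.8102, 70.7584,
              62.9473, 60.7288, 47.5129, 0.0000]
N_OBS = 10


def similarity_drops(levels, n_obs):
    """Similarity drop at each step, keyed by (clusters before, clusters after)."""
    drops = {}
    for step in range(1, len(levels)):
        before = n_obs - step  # clusters left after the previous step
        drops[(before, before - 1)] = levels[step - 1] - levels[step]
    return drops


if __name__ == "__main__":
    drops = similarity_drops(SIMILARITY, N_OBS)
    # Exclude the final merge (2 -> 1), which always collapses everything.
    candidates = {key: v for key, v in drops.items() if key[1] > 1}
    (before, after), size = max(candidates.items(), key=lambda kv: kv[1])
    print(f"largest drop: {size:.4f} when going from {before} to {after} clusters")
```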

Cutting the dendrogram to give a final partition of 3 clusters happens at a similarity level of approximately 60 (between 60.7288 at step 7 and 47.5129 at step 8). The first cluster (far left) contains two observations, rows 5 and 7 of the worksheet (customers E and G). The second cluster, directly to the right, contains 6 observations, rows 1, 2, 4, 8, 9, and 10 (customers A, B, D, H, I, and J). The third cluster contains 2 observations, rows 3 and 6 (customers C and F). If you cut the dendrogram higher, there are fewer final clusters, but their similarity level is lower. If you cut it lower, the similarity level is higher, but there are more final clusters. Note that this three-cluster hierarchical partition is not the same as the k = 3 k-means partition reported above, which groups {A, H}, {B, D, I, J}, and {C, E, F, G}.

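In my opinion, k = 3 is better for describing customers’ average characteristics and designing marketing segmentation. Both k-means solutions isolate the same older, higher-income, larger-family group (C, E, F, G: average age 56.25, income about $40,222, 2.25 children). With k = 3, the remaining younger customers split into a lowest-income group (A, H: average age 29.5, income about $11,148, 0.5 children) and a childless group with somewhat higher income (B, D, I, J: average age 33.25, income about $18,694, no children). This gives marketing two distinct younger segments to target instead of one, and the total within-cluster sum of squares falls from 6.430 to 5.731.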

