Research conducted by Worldwide, Inc., a manufacturer of laptop computers, shows that potential laptop customers (i.e., “buyers”) differ in the importance that they attach to the following two laptop features: (1) that the laptop include a solid-state drive (SSD), and (2) that the laptop include a high-resolution screen. A sample of five representative (prospective) customers of Worldwide reveals the following set of preferences for these attributes/benefits (the data are comma-delimited):
1, 6, 8
2, 3, 4
3, 4, 1
4, 4, 8
5, 5.5, 7 (this is not a typo: the x value for customer #5 is 5 ½)
where:
(1) the measurement scale is continuous, and ranges from 1 = very unimportant to 10 = very important
(2) the first entry in a row is the respondent i.d. number
(3) the second entry (x-axis) is the importance weight attached to “includes an SSD”
(4) the third entry (y-axis) is the importance weight attached to “includes a high-resolution screen”
For Question #1, parts (a) through (h) below, perform a k-means cluster analysis of the Worldwide data. For purposes of this question, set k = 2, and use the point (3,5) as initial centroid #1 and the point (6,1) as initial centroid #2. Perform all numeric calculations to 3 decimal places of precision (e.g., 8.352).
(1a) Which customers (i.e., “buyers”) are assigned to starting centroid #1, and what is the Euclidean distance between each of these customers and starting centroid #1? In your answer clearly indicate each customer’s id, and the Euclidean distance (to 3 decimal places) between the customer and starting (i.e., initial) centroid #1.
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
(Note: here, and below, complete for as many customers as appropriate)
(1b) Which customers are assigned to starting centroid #2, and what is the Euclidean distance between each of these customers and starting centroid #2? In your answer clearly indicate each customer’s id, and the Euclidean distance (to 3 decimal places) between the customer and starting centroid #2.
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
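The assignment step in parts (1a) and (1b) can be checked with a short script. The following is a minimal sketch, not part of the original question; the names `customers`, `start_centroids`, `euclidean`, and `assign` are illustrative assumptions. Each customer is assigned to whichever starting centroid is nearer in Euclidean distance.

```python
import math

# Customer data from the problem: id -> (SSD importance, high-resolution importance)
customers = {1: (6.0, 8.0), 2: (3.0, 4.0), 3: (4.0, 1.0), 4: (4.0, 8.0), 5: (5.5, 7.0)}
# Starting centroids given in the problem statement
start_centroids = {1: (3.0, 5.0), 2: (6.0, 1.0)}

def euclidean(p, q):
    """Straight-line distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def assign(points, centroids):
    """Map each point id to (nearest centroid id, distance rounded to 3 places)."""
    result = {}
    for pid, p in points.items():
        dists = {cid: euclidean(p, c) for cid, c in centroids.items()}
        nearest = min(dists, key=dists.get)
        result[pid] = (nearest, round(dists[nearest], 3))
    return result

for pid, (cid, d) in assign(customers, start_centroids).items():
    print(f"Buyer id: {pid}  starting centroid #{cid}  Distance: {d:.3f}")
```

Under these assumptions the script places customers 1, 2, 4, and 5 with starting centroid #1 and customer 3 with starting centroid #2.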
(1c) Following your assignment (in 1a and 1b above) of customers to the two starting centroids, what are the revised (i.e., updated) centroid values for centroid #1 and centroid #2? (Note: you can refer to these revisions as “1st iteration”-revised centroids)
1st iteration-revised centroid #1: ___________
1st iteration-revised centroid #2: ___________
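The update step in part (1c) replaces each centroid with the coordinate-wise mean of the customers assigned to it. A minimal sketch follows; the `assignment` dictionary is an assumption obtained by applying the nearest-centroid rule to the starting centroids (3,5) and (6,1), and `update_centroids` is a hypothetical helper name.

```python
# Customer data from the problem: id -> (SSD importance, high-resolution importance)
customers = {1: (6.0, 8.0), 2: (3.0, 4.0), 3: (4.0, 1.0), 4: (4.0, 8.0), 5: (5.5, 7.0)}
# Assumed assignment to the starting centroids (customer id -> centroid id),
# derived by nearest Euclidean distance to (3, 5) and (6, 1)
assignment = {1: 1, 2: 1, 3: 2, 4: 1, 5: 1}

def update_centroids(points, assignment):
    """Return each centroid as the mean of its assigned points (3 decimal places)."""
    members = {}
    for pid, cid in assignment.items():
        members.setdefault(cid, []).append(points[pid])
    return {
        cid: (
            round(sum(x for x, _ in pts) / len(pts), 3),
            round(sum(y for _, y in pts) / len(pts), 3),
        )
        for cid, pts in members.items()
    }

for cid, c in sorted(update_centroids(customers, assignment).items()):
    print(f"1st iteration-revised centroid #{cid}: ({c[0]:.3f}, {c[1]:.3f})")
```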
(1d) Next, based on your answer to part (1c), and continuing the k-means clustering process, which customers should be assigned to the 1st iteration-revised centroid #1, and what is the Euclidean distance between each of these customers and the 1st iteration-revised centroid #1?
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
(1e) Similarly, based on your answer to part (1c), which customers should be assigned to the 1st iteration-revised centroid #2, and what is the Euclidean distance between each of these customers and the 1st iteration-revised centroid #2?
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
Buyer id: _________ Distance: ____________
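For parts (1d) and (1e) it helps to see both distances for every customer, because one customer may sit almost equally far from the two revised centroids. The sketch below assumes the 1st iteration-revised centroids are (4.625, 6.750) and (4.000, 1.000), which is what the mean update gives for the first assignment; `revised` and `distance_table` are illustrative names.

```python
import math

# Customer data from the problem: id -> (SSD importance, high-resolution importance)
customers = {1: (6.0, 8.0), 2: (3.0, 4.0), 3: (4.0, 1.0), 4: (4.0, 8.0), 5: (5.5, 7.0)}
# Assumed 1st iteration-revised centroids
revised = {1: (4.625, 6.750), 2: (4.000, 1.000)}

def distance_table(points, centroids):
    """Distance from every point to every centroid, rounded to 3 decimal places."""
    return {
        pid: {cid: round(math.dist(p, c), 3) for cid, c in centroids.items()}
        for pid, p in points.items()
    }

for pid, row in distance_table(customers, revised).items():
    nearest = min(row, key=row.get)
    print(f"Buyer id: {pid}  d1={row[1]:.3f}  d2={row[2]:.3f}  -> centroid #{nearest}")
```

With these centroids, customer 2 is 3.194 from centroid #1 and 3.162 from centroid #2, so it switches clusters by a narrow margin; rounding to fewer decimal places could hide that difference.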
(1f) Based on your customer assignments in parts (1d) and (1e), what are the 2nd iteration-revised centroid values, for centroid #1 and centroid #2?
2nd iteration-revised centroid #1: ___________
2nd iteration-revised centroid #2: ___________
(1g) Are any additional iterations needed in this k-means clustering problem? Yes or no? Why or why not?
___________________________________________________________________
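One way to reason about part (1g) is to run the assign/update cycle until no customer changes cluster; at that point the centroids can no longer move. The following is a minimal sketch of that loop, assuming the problem's starting centroids and assuming no cluster ever becomes empty (true for this data, but not handled in general). The helper names `nearest`, `mean_point`, and `kmeans` are hypothetical.

```python
import math

# Customer data from the problem: id -> (SSD importance, high-resolution importance)
customers = {1: (6.0, 8.0), 2: (3.0, 4.0), 3: (4.0, 1.0), 4: (4.0, 8.0), 5: (5.5, 7.0)}
start_centroids = {1: (3.0, 5.0), 2: (6.0, 1.0)}

def nearest(p, centroids):
    """Id of the centroid closest to point p."""
    return min(centroids, key=lambda cid: math.dist(p, centroids[cid]))

def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)

def kmeans(points, centroids, max_iter=20):
    """Repeat assign/update until the assignment stops changing."""
    assignment, history = None, []
    for _ in range(max_iter):
        new_assignment = {pid: nearest(p, centroids) for pid, p in points.items()}
        if new_assignment == assignment:
            break  # no customer moved, so another update would change nothing
        assignment = new_assignment
        centroids = {
            cid: mean_point([points[pid] for pid, a in assignment.items() if a == cid])
            for cid in centroids
        }
        history.append(dict(centroids))  # one entry per centroid update
    return assignment, centroids, history

assignment, final_centroids, history = kmeans(customers, start_centroids)
print("centroid updates performed:", len(history))
print("final assignment:", assignment)
```

In this sketch the third assignment pass reproduces the second one, so the loop stops after two centroid updates.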
(1h) What is the Euclidean distance between the final cluster centroids (i.e., the 2nd iteration-revised centers)?
Distance = _______________
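The distance in part (1h) is the ordinary Euclidean distance applied to the two centroids rather than to a customer and a centroid. A minimal sketch, assuming the final centroids are the ones reported in the cluster output further down, (5.1667, 7.6667) and (3.5, 2.5); the variable names are illustrative.

```python
import math

# Final centroids, written as exact means so that rounding does not accumulate:
# (15.5/3, 23/3) is approximately (5.1667, 7.6667)
final_c1 = (15.5 / 3, 23 / 3)
final_c2 = (3.5, 2.5)

gap = math.dist(final_c1, final_c2)
print(f"Distance = {gap:.3f}")
```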
Number of clusters: 2

| | Number of observations | Within-cluster sum of squares | Average distance from centroid | Maximum distance from centroid |
|---|---|---|---|---|
| Cluster1 | 3 | 2.833 | 0.952 | 1.213 |
| Cluster2 | 2 | 5.000 | 1.581 | 1.581 |

Cluster Centroids

| Variable | Cluster1 | Cluster2 | Grand centroid |
|---|---|---|---|
| SSD | 5.1667 | 3.5000 | 4.5000 |
| HIGHRES | 7.6667 | 2.5000 | 5.6000 |

Distances between Cluster Centroids

| | Cluster1 | Cluster2 |
|---|---|---|
| Cluster1 | 0.0000 | 5.4288 |
| Cluster2 | 5.4288 | 0.0000 |
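The per-cluster figures in this output (within-cluster sum of squares, average and maximum distance from the centroid) can be recomputed by hand or with a few lines of code. The sketch below assumes cluster 1 contains customers 1, 4, and 5 and cluster 2 contains customers 2 and 3, which is consistent with the 3/2 split reported; `clusters` and `summarize` are hypothetical names.

```python
import math

# Customer data from the problem: id -> (SSD importance, high-resolution importance)
customers = {1: (6.0, 8.0), 2: (3.0, 4.0), 3: (4.0, 1.0), 4: (4.0, 8.0), 5: (5.5, 7.0)}
# Assumed final cluster membership
clusters = {1: [1, 4, 5], 2: [2, 3]}

def summarize(points, member_ids):
    """Centroid, within-cluster sum of squares, and distance statistics for one cluster."""
    pts = [points[i] for i in member_ids]
    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n
    dists = [math.dist(p, (cx, cy)) for p in pts]
    return {
        "n": n,
        "centroid": (cx, cy),
        "wcss": sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts),
        "avg_dist": sum(dists) / n,
        "max_dist": max(dists),
    }

for cid, ids in clusters.items():
    s = summarize(customers, ids)
    print(f"Cluster{cid}: n={s['n']}  WCSS={s['wcss']:.3f}  "
          f"avg={s['avg_dist']:.3f}  max={s['max_dist']:.3f}")
```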
Initial Cluster Centers

| | Cluster 1 | Cluster 2 |
|---|---|---|
| SSD | 6.00 | 4.00 |
| HIGHRESO | 8.00 | 1.00 |

Iteration History (a)

| Iteration | Change in cluster center 1 | Change in cluster center 2 |
|---|---|---|
| 1 | .898 | 1.581 |
| 2 | .000 | .000 |

a. Convergence achieved due to no or small change in cluster centers. The maximum absolute coordinate change for any center is .000. The current iteration is 2. The minimum distance between initial centers is 7.280.

Final Cluster Centers

| | Cluster 1 | Cluster 2 |
|---|---|---|
| SSD | 5.17 | 3.50 |
| HIGHRESO | 7.67 | 2.50 |

| | Number of cases |
|---|---|
| Cluster 1 | 3.000 |
| Cluster 2 | 2.000 |
| Valid | 5.000 |
| Missing | .000 |