Complete a logistic regression, as well as a k-means cluster analysis, in Excel
Use the data to find four clusters of cities. Write a short report about the clusters you find. Does the clustering make sense? Can you provide descriptive, meaningful names for the clusters? Please show graphs (scatter plot of the clusters).
Metropolitan_Area | Cost_Living | Transportation | Jobs | Education |
Abilene, TX | 96.32 | 36.54 | 17.28 | 49.29 |
Akron, OH | 47.31 | 69.68 | 86.11 | 71.95 |
Albany, GA | 86.12 | 28.02 | 32.01 | 26.62 |
Albany-Schenectady-Troy, NY | 25.22 | 82.71 | 52.97 | 99.43 |
Albuquerque, NM | 44.48 | 84.13 | 90.65 | 71.67 |
Alexandria, LA | 92.36 | 42.49 | 19.26 | 11.61 |
Allentown-Bethlehem-Easton, PA | 33.72 | 66.57 | 29.46 | 63.45 |
Altoona, PA | 61.76 | 26.91 | 12.18 | 1.69 |
Amarillo, TX | 96.89 | 60.05 | 28.32 | 54.1 |
Anchorage, AK | 15.87 | 84.41 | 76.48 | 41.35 |
Ann Arbor, MI | 7.37 | 15.86 | 77.33 | 83 |
Anniston, AL | 93.21 | 7.08 | 7.64 | 21.81 |
Appleton-Oshkosh-Neenah, WI | 54.4 | 70.82 | 79.32 | 47.3 |
Asheville, NC | 54.11 | 54.67 | 54.1 | 59.77 |
Athens, GA | 62.4 | 29.46 | 47.3 | 45.32 |
Atlanta, GA | 39.38 | 98.3 | 99.15 | 82.71 |
Atlantic City-Cape May, NJ | 30.03 | 66.85 | 62.03 | 20.39 |
Augusta-Aiken, GA-SC | 77.91 | 35.41 | 63.45 | 46.45 |
Austin-San Marcos, TX | 50.43 | 78.75 | 98.01 | 98.3 |
K-means Cluster Analysis: cost_living, transportation, jobs, education
Final Partition
Number of clusters: 4
Cluster | Number of observations | Within cluster sum of squares | Average distance from centroid | Maximum distance from centroid |
Cluster1 | 2 | 349.031 | 13.210 | 13.210 |
Cluster2 | 9 | 13337.304 | 35.629 | 64.568 |
Cluster3 | 6 | 5671.536 | 28.924 | 39.832 |
Cluster4 | 2 | 1090.015 | 23.345 | 23.345 |
Cluster Centroids
Variable | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Grand centroid |
cost_living | 96.6050 | 38.1533 | 78.9600 | 29.4700 | 56.2784 |
transportation | 48.2950 | 69.2744 | 28.2283 | 74.6400 | 54.6689 |
jobs | 22.8000 | 80.3533 | 30.3067 | 41.2150 | 54.3711 |
education | 51.6950 | 64.0489 | 25.5833 | 81.4400 | 52.4321 |
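The centroid table can be checked directly against the raw data. The sketch below is a minimal, hedged example: the output does not list which city belongs to which cluster, so the `clusters` assignment is an inference (it is the grouping whose column means match the reported centroids), and the names `data`, `clusters`, `centroid`, and `summary` are illustrative only.

```python
import math

# Rows copied from the data table: (cost_living, transportation, jobs, education)
data = {
    "Abilene, TX": (96.32, 36.54, 17.28, 49.29),
    "Akron, OH": (47.31, 69.68, 86.11, 71.95),
    "Albany, GA": (86.12, 28.02, 32.01, 26.62),
    "Albany-Schenectady-Troy, NY": (25.22, 82.71, 52.97, 99.43),
    "Albuquerque, NM": (44.48, 84.13, 90.65, 71.67),
    "Alexandria, LA": (92.36, 42.49, 19.26, 11.61),
    "Allentown-Bethlehem-Easton, PA": (33.72, 66.57, 29.46, 63.45),
    "Altoona, PA": (61.76, 26.91, 12.18, 1.69),
    "Amarillo, TX": (96.89, 60.05, 28.32, 54.1),
    "Anchorage, AK": (15.87, 84.41, 76.48, 41.35),
    "Ann Arbor, MI": (7.37, 15.86, 77.33, 83.0),
    "Anniston, AL": (93.21, 7.08, 7.64, 21.81),
    "Appleton-Oshkosh-Neenah, WI": (54.4, 70.82, 79.32, 47.3),
    "Asheville, NC": (54.11, 54.67, 54.1, 59.77),
    "Athens, GA": (62.4, 29.46, 47.3, 45.32),
    "Atlanta, GA": (39.38, 98.3, 99.15, 82.71),
    "Atlantic City-Cape May, NJ": (30.03, 66.85, 62.03, 20.39),
    "Augusta-Aiken, GA-SC": (77.91, 35.41, 63.45, 46.45),
    "Austin-San Marcos, TX": (50.43, 78.75, 98.01, 98.3),
}

# ASSUMPTION: membership is inferred, not printed in the output above.
clusters = {
    1: ["Abilene, TX", "Amarillo, TX"],
    2: ["Akron, OH", "Albuquerque, NM", "Anchorage, AK", "Ann Arbor, MI",
        "Appleton-Oshkosh-Neenah, WI", "Asheville, NC", "Atlanta, GA",
        "Atlantic City-Cape May, NJ", "Austin-San Marcos, TX"],
    3: ["Albany, GA", "Alexandria, LA", "Altoona, PA", "Anniston, AL",
        "Athens, GA", "Augusta-Aiken, GA-SC"],
    4: ["Albany-Schenectady-Troy, NY", "Allentown-Bethlehem-Easton, PA"],
}

def centroid(names):
    """Column-wise mean of the rows belonging to one cluster."""
    rows = [data[n] for n in names]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

centroids = {k: centroid(names) for k, names in clusters.items()}

# Per cluster: (count, within-cluster sum of squares, average distance, maximum distance)
summary = {}
for k, names in clusters.items():
    d = [math.dist(data[n], centroids[k]) for n in names]
    summary[k] = (len(d), sum(x * x for x in d), sum(d) / len(d), max(d))

for k in clusters:
    print(k, [round(v, 4) for v in centroids[k]], [round(v, 3) for v in summary[k]])
```

Because the variables are used on their original 0 to 100 scale, the centroids are plain column means; no standardization is applied in this sketch.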
Distances Between Cluster Centroids
Cluster | Cluster1 | Cluster2 | Cluster3 | Cluster4 |
Cluster1 | 0.0000 | 85.5672 | 38.1076 | 80.1564 |
Cluster2 | 85.5672 | 0.0000 | 85.6401 | 44.0278 |
Cluster3 | 38.1076 | 85.6401 | 0.0000 | 88.5565 |
Cluster4 | 80.1564 | 44.0278 | 88.5565 | 0.0000 |
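The distances between centroids are ordinary Euclidean distances over the four variables. As a rough check, the sketch below recomputes them from the reported centroid values; the dictionary name `reported` and the helper `centroid_distances` are illustrative, and small differences in the last decimal place are expected because the inputs are rounded to four decimals.

```python
import math
from itertools import combinations

# Reported centroids: (cost_living, transportation, jobs, education)
reported = {
    "Cluster1": (96.6050, 48.2950, 22.8000, 51.6950),
    "Cluster2": (38.1533, 69.2744, 80.3533, 64.0489),
    "Cluster3": (78.9600, 28.2283, 30.3067, 25.5833),
    "Cluster4": (29.4700, 74.6400, 41.2150, 81.4400),
}

def centroid_distances(centroids):
    """Euclidean distance for every unordered pair of centroids."""
    return {
        (a, b): math.dist(centroids[a], centroids[b])
        for a, b in combinations(centroids, 2)
    }

distances = centroid_distances(reported)
for (a, b), d in distances.items():
    print(f"{a} - {b}: {d:.4f}")
```

Read this way, Cluster1 and Cluster3 are the closest pair (about 38.1) and Cluster3 and Cluster4 the farthest (about 88.6), which is one input to judging whether four clusters are well separated.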
The k-means algorithm is order-independent: for a given initial set of cluster centers (seeds), it produces the same partition of the data regardless of the order in which the observations are presented to the algorithm.
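The order-independence property can be illustrated with a small batch k-means sketch. This is a minimal example under stated assumptions, not the procedure used by any particular statistics package: the function `kmeans`, the helper `partition`, the six rows chosen from the data table, and the two seed centers are all illustrative choices. It applies to the batch form of the algorithm, where every observation is assigned before the centers are updated.

```python
import math
import random

def kmeans(points, centers, max_iter=100):
    """Batch k-means: assign all points, then update all centers, until stable."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                # fsum makes the mean independent of summation order
                centers[j] = tuple(math.fsum(col) / len(members) for col in zip(*members))
    return labels

def partition(names, labels):
    """Represent a clustering as a set of groups, ignoring row order."""
    groups = {}
    for name, lab in zip(names, labels):
        groups.setdefault(lab, set()).add(name)
    return {frozenset(g) for g in groups.values()}

# A few rows from the data table: (cost_living, transportation, jobs, education)
rows = [
    ("Abilene, TX", (96.32, 36.54, 17.28, 49.29)),
    ("Akron, OH", (47.31, 69.68, 86.11, 71.95)),
    ("Albany, GA", (86.12, 28.02, 32.01, 26.62)),
    ("Albuquerque, NM", (44.48, 84.13, 90.65, 71.67)),
    ("Alexandria, LA", (92.36, 42.49, 19.26, 11.61)),
    ("Atlanta, GA", (39.38, 98.3, 99.15, 82.71)),
]
seeds = [rows[0][1], rows[1][1]]  # ASSUMPTION: first two rows used as initial centers

shuffled_rows = rows[:]
random.Random(0).shuffle(shuffled_rows)  # same data, different presentation order

original = partition([n for n, _ in rows], kmeans([p for _, p in rows], seeds))
shuffled = partition([n for n, _ in shuffled_rows], kmeans([p for _, p in shuffled_rows], seeds))
print(original == shuffled)
```

The same seeds are passed to both runs; only the row order changes. Sequential (online) variants that update a center after each observation do not share this guarantee.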