COMPLETE A LOGISTIC REGRESSION, AS WELL AS A K-MEANS CLUSTER ANALYSIS IN EXCEL?

Use the data below to find four clusters of cities. Write a short report about the clusters you find. Does the clustering make sense? Can you provide descriptive, meaningful names for the clusters? Please show graphs (scatter plot/cluster plot).

Metropolitan_Area	Cost_Living	Transportation	Jobs	Education
Abilene, TX	96.32	36.54	17.28	49.29
Akron, OH	47.31	69.68	86.11	71.95
Albany, GA	86.12	28.02	32.01	26.62
Albany-Schenectady-Troy, NY	25.22	82.71	52.97	99.43
Albuquerque, NM	44.48	84.13	90.65	71.67
Alexandria, LA	92.36	42.49	19.26	11.61
Allentown-Bethlehem-Easton, PA	33.72	66.57	29.46	63.45
Altoona, PA	61.76	26.91	12.18	1.69
Amarillo, TX	96.89	60.05	28.32	54.1
Anchorage, AK	15.87	84.41	76.48	41.35
Ann Arbor, MI	7.37	15.86	77.33	83
Anniston, AL	93.21	7.08	7.64	21.81
Appleton-Oshkosh-Neenah, WI	54.4	70.82	79.32	47.3
Asheville, NC	54.11	54.67	54.1	59.77
Athens, GA	62.4	29.46	47.3	45.32
Atlanta, GA	39.38	98.3	99.15	82.71
Atlantic City-Cape May, NJ	30.03	66.85	62.03	20.39
Augusta-Aiken, GA-SC	77.91	35.41	63.45	46.45
Austin-San Marcos, TX	50.43	78.75	98.01	98.3

Expert Solution

K-means Cluster Analysis: cost_living, transportation, jobs, education

Final Partition

Number of clusters: 4

Cluster	Number of observations	Within cluster sum of squares	Average distance from centroid	Maximum distance from centroid
Cluster1	2	349.031	13.210	13.210
Cluster2	9	13337.304	35.629	64.568
Cluster3	6	5671.536	28.924	39.832
Cluster4	2	1090.015	23.345	23.345

Cluster Centroids

Variable	Cluster1	Cluster2	Cluster3	Cluster4	Grand centroid
cost_living	96.6050	38.1533	78.9600	29.4700	56.2784
transportation	48.2950	69.2744	28.2283	74.6400	54.6689
jobs	22.8000	80.3533	30.3067	41.2150	54.3711
education	51.6950	64.0489	25.5833	81.4400	52.4321
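Each centroid entry is the coordinate-wise mean of the cluster's members. The output does not list which cities belong to which cluster, so the sketch below rests on an assumption: that Cluster1 is made up of Abilene, TX and Amarillo, TX. That guess is only inferred from the fact that the mean of those two rows agrees with the Cluster1 column; the function name `centroid` is just illustrative.

```python
# Rows copied from the question's data table:
# (cost_living, transportation, jobs, education)
abilene = (96.32, 36.54, 17.28, 49.29)
amarillo = (96.89, 60.05, 28.32, 54.10)


def centroid(members):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(members)
    return tuple(sum(column) / n for column in zip(*members))


# Assumed membership of Cluster1 (not stated in the output).
pair_centroid = centroid([abilene, amarillo])
print([round(value, 4) for value in pair_centroid])
```

The same function applied to the members of any other cluster would give that cluster's column in the table.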

Distances Between Cluster Centroids

	Cluster1	Cluster2	Cluster3	Cluster4
Cluster1	0.0000	85.5672	38.1076	80.1564
Cluster2	85.5672	0.0000	85.6401	44.0278
Cluster3	38.1076	85.6401	0.0000	88.5565
Cluster4	80.1564	44.0278	88.5565	0.0000
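The distances in this table appear to be ordinary Euclidean distances between the centroid columns reported above. As a rough check, the sketch below recomputes them from the four-decimal centroid values; the names `CENTROIDS` and `centroid_distances` are made up for this example, and small differences in the last digit can come from rounding of the inputs.

```python
import math

# Centroid values copied from the "Cluster Centroids" table:
# (cost_living, transportation, jobs, education)
CENTROIDS = {
    "Cluster1": (96.6050, 48.2950, 22.8000, 51.6950),
    "Cluster2": (38.1533, 69.2744, 80.3533, 64.0489),
    "Cluster3": (78.9600, 28.2283, 30.3067, 25.5833),
    "Cluster4": (29.4700, 74.6400, 41.2150, 81.4400),
}


def centroid_distances(centroids):
    """Euclidean distance between every pair of centroids."""
    names = list(centroids)
    return {
        (a, b): math.dist(centroids[a], centroids[b])
        for a in names
        for b in names
    }


distances = centroid_distances(CENTROIDS)
for a in CENTROIDS:
    print(a, " ".join(f"{distances[(a, b)]:8.4f}" for b in CENTROIDS))
```

Read this way, Cluster1 and Cluster3 are the closest pair of centroids, and Cluster3 and Cluster4 are the farthest apart.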

A note on the method: k-means is order-independent. For a given initial set of cluster centers (seeds), it generates the same partition of the data regardless of the order in which the observations are presented to the algorithm.
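The order-independence property can be illustrated with a minimal batch k-means sketch in Python, run on the 19 cities from the question. This is not the procedure that produced the output above: the seed cities are a hypothetical choice made only for this example, the function `kmeans` is an illustrative helper, and the resulting partition is not claimed to match the reported clusters.

```python
import math

# Data copied from the question: (cost_living, transportation, jobs, education)
CITIES = {
    "Abilene, TX": (96.32, 36.54, 17.28, 49.29),
    "Akron, OH": (47.31, 69.68, 86.11, 71.95),
    "Albany, GA": (86.12, 28.02, 32.01, 26.62),
    "Albany-Schenectady-Troy, NY": (25.22, 82.71, 52.97, 99.43),
    "Albuquerque, NM": (44.48, 84.13, 90.65, 71.67),
    "Alexandria, LA": (92.36, 42.49, 19.26, 11.61),
    "Allentown-Bethlehem-Easton, PA": (33.72, 66.57, 29.46, 63.45),
    "Altoona, PA": (61.76, 26.91, 12.18, 1.69),
    "Amarillo, TX": (96.89, 60.05, 28.32, 54.10),
    "Anchorage, AK": (15.87, 84.41, 76.48, 41.35),
    "Ann Arbor, MI": (7.37, 15.86, 77.33, 83.00),
    "Anniston, AL": (93.21, 7.08, 7.64, 21.81),
    "Appleton-Oshkosh-Neenah, WI": (54.40, 70.82, 79.32, 47.30),
    "Asheville, NC": (54.11, 54.67, 54.10, 59.77),
    "Athens, GA": (62.40, 29.46, 47.30, 45.32),
    "Atlanta, GA": (39.38, 98.30, 99.15, 82.71),
    "Atlantic City-Cape May, NJ": (30.03, 66.85, 62.03, 20.39),
    "Augusta-Aiken, GA-SC": (77.91, 35.41, 63.45, 46.45),
    "Austin-San Marcos, TX": (50.43, 78.75, 98.01, 98.30),
}


def kmeans(points, centers, max_iter=100):
    """Batch k-means from explicit starting centers.

    points: dict mapping a name to a coordinate tuple.
    Returns a dict mapping each name to its cluster index.
    """
    centers = [tuple(c) for c in centers]
    labels = {}
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = {
            name: min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for name, p in points.items()
        }
        if new_labels == labels:
            break  # no assignment changed, so the partition is stable
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(len(centers)):
            members = [points[n] for n, lab in labels.items() if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical seeds (an assumption for this example, not from the solution).
seed_names = ("Abilene, TX", "Akron, OH", "Albany, GA", "Albany-Schenectady-Troy, NY")
seeds = [CITIES[name] for name in seed_names]

original = kmeans(CITIES, seeds)
reversed_order = kmeans(dict(reversed(list(CITIES.items()))), seeds)
print("same partition:", original == reversed_order)
```

Because every assignment step looks at all observations before any center is updated, presenting the rows in reverse order should leave the partition unchanged as long as the seeds are the same. Changing the seeds, on the other hand, can change the result.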

milcah answered 2 years ago

When should logistic regression be used for data analysis? What are the assumptions of logistic regression? How is an odds ratio interpreted?

Using the ruspini dataset provided with the cluster package in R, perform a k-means analysis. Document the findings and justify the choice of K. Hint: use data(ruspini) to load the dataset into the R workspace.

We've now had an introduction to several different models: Linear regression, logistic regression, k-means, hierarchical clustering, GMM, Naive Bayes, and decision trees. For this assignment, I would like you to choose three models from the above list and describe two problems that each of the models could potentially be used to solve. You can do one big post with all three models and six solvable problems or do three separate posts if you prefer. Short Explanation of Decision Trees Decision...

Using Excel and the data in chapter 14 data set 2, complete the analysis and interpret the results. It is a 2 x 3 experiment: there are two levels of severity, where Level 1 is severe and Level 2 is mild, and there are three levels of treatment, where Level 1 is Drug #1, Level 2 is Drug #2, and Level 3 is Placebo. This is an ANOVA with replication because each participant received all three levels of treatment,...

i. Use MS Excel Data Analysis ToolPak to perform a multiple regression analysis using Quality as the response variable and Helpfulness and Clarity as the explanatory variables. Write down the corresponding coefficient estimates and provide the regression output. j. Perform an F-test for the overall usefulness of the model in part i) using a 5% significance level. Make sure you follow all the steps for hypothesis testing indicated in the Instructions section and clearly state your conclusion. k. Test manually...

How would you differentiate among multiple discriminant analysis, regression analysis, logistic regression analysis, and analysis of variance and demonstrate statistical significance for each?

One way to cluster objects is called k-means clustering. The goal is to find k different clusters, each represented by a "prototype", defined as the centroid of the cluster. The centroid is computed as follows: the jth value in the centroid is the mean (average) of the jth values of all the members of the cluster. Our goal is for every member of a cluster to be closer to that cluster's prototype than to any of the other prototypes. Thus a prototype...

2. What is the consequence of using unconditional logistic regression to analyze the data collected from a 1:M matched case-control study?

In the exercise, you will implement the logistic regression algorithm using SGA, similar to the logistic regression algorithm that you have seen in class. You will work with the datasets attached to the assignment and complete the logisticRegression.py file to learn the coefficients and predict binary class labels. The data comes from breast cancer diagnosis, where each sample (30 features) is labeled with a diagnosis: either M (malignant) or B (benign) (recorded in the 31st column in the...

Describe the relationship of discriminant analysis to regression and ANOVA. List two uses of cluster analysis in marketing. List two market research problems where conjoint analysis could be applied. What is the most common use of the covariate in ANCOVA? Explain with an example.
