Question

USE R CODE AND SHOW OUTPUT

APPLIED STATISTICS 2

  1. Traditionally, a student's course grade follows this policy: 90 or above is an A; 80 to 89, B; 70 to 79, C; 60 to 69, D; and below 60, F.

Now suppose we use a new grading policy. We simply separate all students into four groups, assigning grade A to the first group, B to the second, then C, then D (no F). We use the data in RecordMath2526.txt to try out the new grading policy.

  1. The record file contains no student IDs. Assign each student an ID that is a random number between 10000 and 99999. You can use the following R code, which produces 20 random numbers between 10000 and 99999.

    ID<-sample(10000:99999, 20)

  1. Use the kmeans function to cluster the students into four clusters, based on their Exam1, Exam2, and Final scores.
  1. Produce a data file with the student IDs and their cluster numbers (1, 2, 3, 4).
  1. (Optional) Convert the cluster numbers (1, 2, 3, 4) into grades (A, B, C, D) or (D, C, B, A).

RecordMath2526.txt

Index Gender Hw1 Hw2 Hw3 Exam1 Hw4 Exam2 Hw5 Hw6 Hw7 Final
1     F      9   6   8   60    7   82    10  10  9   69
2     M      10  10  10  94    9   98    10  10  8   91
3     M      9   10  8   79    9   55    10  6   8   43
4     F      10  9   9   91    8   88    10  9   8   84
5     F      9   8   9   71    9   97    10  9   9   89
6     F      9   9   7   64    9   87    10  9   9   58
7     M      9   9   9   55    7   59    10  6   0   68
8     M      9   10  7   71    10  70    10  8   10  59
9     M      8   10  9   81    10  100   10  10  9   98
10    F      9   9   7   76    6   58    10  5   8   50
11    F      10  6   7   69    5   55    10  4   8   47
12    F      9   5   4   46    7   72    10  6   7   78
13    M      9   9   10  71    9   85    10  8   7   67
14    M      8   9   8   60    10  71    10  8   10  75
15    M      10  9   10  71    10  93    10  9   9   67
16    F      9   10  8   70    7   80    10  10  8   83
17    F      9   10  9   72    7   89    10  9   9   78
18    M      10  10  10  80    10  94    10  9   10  71
19    F      10  10  9   66    9   78    10  6   8   83
20    F      8   9   7   78    6   81    9   7   8   84

Solutions

Expert Solution

R code

##### Cluster Analysis####


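set.seed(2526) # arbitrary seed (an addition, not in the original answer) so the random IDs and clusters can be reproduced
# Read the data, previously saved as a CSV file.
# For the original whitespace-delimited RecordMath2526.txt, read.table(file.choose(), header = TRUE) would be used instead.
cluster_data <- read.csv(file.choose(), header = TRUE)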

  
## First part: assign random student IDs ##
ID <- sample(10000:99999, nrow(cluster_data)) # one distinct random ID per student (20 here)
cluster_data_ID <- data.frame(cluster_data, ID)

head(cluster_data_ID) # show the first rows


## Extract the Exam1, Exam2 and Final columns (columns 6, 8 and 12)
cluster_data_ID.new <- cluster_data_ID[, c("Exam1", "Exam2", "Final")]
cluster_data_ID.class <- cluster_data_ID[, "ID"]
head(cluster_data_ID.new)

## Normalize the data: min-max scaling maps each column onto [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

cluster_data_ID.new$Exam1 <- normalize(cluster_data_ID.new$Exam1)
cluster_data_ID.new$Exam2 <- normalize(cluster_data_ID.new$Exam2)
cluster_data_ID.new$Final <- normalize(cluster_data_ID.new$Final)

head(cluster_data_ID.new)


### K-means clustering
result <- kmeans(cluster_data_ID.new, centers = 4) # apply the k-means algorithm with k = 4 centroids
result$size # number of records in each cluster

result$centers # coordinates of the cluster centers (4 centers for k = 4)
result$cluster # cluster vector showing the cluster into which each record falls


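# Produce a data file with ID and cluster number

clust_number <- result$cluster

data_file <- data.frame(ID = cluster_data_ID.class, Cluster = clust_number)

# Write the table to disk; the file name is a placeholder, not given in the question
write.csv(data_file, "student_clusters.csv", row.names = FALSE)

data_file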
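
The solution above stops at cluster numbers. For the optional last part of the question, one possible approach (a sketch, not part of the original answer) is to rank the four clusters by the average of their center coordinates and hand out A to D in that order. The block below is self-contained: the three exam columns are copied from RecordMath2526.txt, the scores are clustered on their raw 0-100 scale rather than normalized, and the seed, nstart = 25, and the names scores, fit, centre_mean, grade_of_cluster, and graded are illustrative choices.

```r
# Exam scores copied from RecordMath2526.txt (students 1 to 20)
scores <- data.frame(
  Exam1 = c(60, 94, 79, 91, 71, 64, 55, 71, 81, 76,
            69, 46, 71, 60, 71, 70, 72, 80, 66, 78),
  Exam2 = c(82, 98, 55, 88, 97, 87, 59, 70, 100, 58,
            55, 72, 85, 71, 93, 80, 89, 94, 78, 81),
  Final = c(69, 91, 43, 84, 89, 58, 68, 59, 98, 50,
            47, 78, 67, 75, 67, 83, 78, 71, 83, 84)
)

set.seed(2526)                              # arbitrary seed (assumption) for reproducibility
ID <- sample(10000:99999, nrow(scores))     # distinct random IDs, one per student

# k-means with 4 clusters; nstart = 25 tries several random starts (assumption)
fit <- kmeans(scores, centers = 4, nstart = 25)

# Average score at each cluster center; a higher value means a stronger group
centre_mean <- rowMeans(fit$centers)

# Give "A" to the cluster with the highest center mean, then "B", "C", "D"
grade_of_cluster <- character(4)
grade_of_cluster[order(centre_mean, decreasing = TRUE)] <- c("A", "B", "C", "D")

# Look up each student's grade from their cluster number
graded <- data.frame(ID = ID,
                     Cluster = fit$cluster,
                     Grade = grade_of_cluster[fit$cluster])
print(graded)
```

Because k-means cluster labels are arbitrary, cluster 1 is not necessarily the strongest group; ranking the centers is what ties the labels to grades. Reversing the grade vector to c("D", "C", "B", "A") gives the other ordering mentioned in the question.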

