USE R CODE AND SHOW OUTPUT
APPLIED STATISTICS 2
Now suppose we use a new grading policy: we simply separate all students into four groups, with the first group assigned grade A, the second grade B, then C, then D (no F). We use the data in RecordMath2526.txt to try out the new grading policy.
ID<-sample(10000:99999, 20)
RecordMath2526.txt
Index Gender Hw1 Hw2 Hw3 Exam1 Hw4 Exam2 Hw5 Hw6 Hw7 Final
1 F 9 6 8 60 7 82 10 10 9 69
2 M 10 10 10 94 9 98 10 10 8 91
3 M 9 10 8 79 9 55 10 6 8 43
4 F 10 9 9 91 8 88 10 9 8 84
5 F 9 8 9 71 9 97 10 9 9 89
6 F 9 9 7 64 9 87 10 9 9 58
7 M 9 9 9 55 7 59 10 6 0 68
8 M 9 10 7 71 10 70 10 8 10 59
9 M 8 10 9 81 10 100 10 10 9 98
10 F 9 9 7 76 6 58 10 5 8 50
11 F 10 6 7 69 5 55 10 4 8 47
12 F 9 5 4 46 7 72 10 6 7 78
13 M 9 9 10 71 9 85 10 8 7 67
14 M 8 9 8 60 10 71 10 8 10 75
15 M 10 9 10 71 10 93 10 9 9 67
16 F 9 10 8 70 7 80 10 10 8 83
17 F 9 10 9 72 7 89 10 9 9 78
18 M 10 10 10 80 10 94 10 9 10 71
19 F 10 10 9 66 9 78 10 6 8 83
20 F 8 9 7 78 6 81 9 7 8 84
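The table above is whitespace-delimited with a header row, so it can be checked quickly in R before any clustering. The following is a minimal sketch, assuming only the first three rows pasted inline (the name `record` is hypothetical); for the full file, `read.table("RecordMath2526.txt", header = TRUE)` should work the same way if the file sits in the working directory.

```r
# First three rows of RecordMath2526.txt, pasted inline for illustration only
txt <- "Index Gender Hw1 Hw2 Hw3 Exam1 Hw4 Exam2 Hw5 Hw6 Hw7 Final
1 F 9 6 8 60 7 82 10 10 9 69
2 M 10 10 10 94 9 98 10 10 8 91
3 M 9 10 8 79 9 55 10 6 8 43"

# read.table() splits on whitespace by default; header = TRUE uses row 1 as names
record <- read.table(text = txt, header = TRUE)
str(record)

# Column positions of the three exam scores
exam_cols <- which(names(record) %in% c("Exam1", "Exam2", "Final"))
print(exam_cols)  # 6 8 12
```

The positions 6, 8 and 12 are the ones used to pick out the exam columns by index in the R code.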
R code
##### Cluster Analysis####
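# Read the whitespace-delimited text file (pick RecordMath2526.txt when prompted).
# If the data were saved as a comma-separated file instead, use read.csv().
cluster_data <- read.table(file.choose(), header = TRUE)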
## First part #########
ID<-sample(10000:99999,20)
cluster_data_ID<-data.frame(cluster_data,ID)
head(cluster_data_ID) # To see top rows
## Extracting Exam1, Exam2 and Final column
cluster_data_ID.new<- cluster_data_ID[,c(6,8,12)]
cluster_data_ID.class<- cluster_data_ID[,"ID"]
head(cluster_data_ID.new)
## Min-max normalize each score to the range [0, 1]
normalize <- function(x){
return ((x-min(x))/(max(x)-min(x)))
}
cluster_data_ID.new$Exam1 <- normalize(cluster_data_ID.new$Exam1)
cluster_data_ID.new$Exam2 <- normalize(cluster_data_ID.new$Exam2)
cluster_data_ID.new$Final <- normalize(cluster_data_ID.new$Final)
head(cluster_data_ID.new)
### K-means clustering
set.seed(123) # arbitrary seed (an assumption) so the clustering is reproducible
result <- kmeans(cluster_data_ID.new, centers = 4) # apply k-means with k = 4 centroids
result$size # number of records in each cluster
result$centers # coordinates of the cluster centers (4 rows for k = 4)
result$cluster # cluster number assigned to each record
# To produce data file with ID and Cluster number
clust_number<-result$cluster
data_file<-data.frame(cluster_data_ID.class,clust_number)
data_file
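The cluster numbers returned by kmeans() are arbitrary labels, so cluster 1 is not necessarily the best group. One possible way to turn them into the grades A to D is to rank the clusters by the average of their center coordinates and give A to the highest. The sketch below is only an illustration under that assumption: the ranking rule, the seed, `nstart = 25`, and the names `scores`, `scaled`, `grade_of_cluster` and `grades` are not part of the original answer. The exam scores are copied from RecordMath2526.txt so that the block runs on its own.

```r
# Exam scores copied from RecordMath2526.txt (rows 1 to 20)
scores <- data.frame(
  Exam1 = c(60, 94, 79, 91, 71, 64, 55, 71, 81, 76,
            69, 46, 71, 60, 71, 70, 72, 80, 66, 78),
  Exam2 = c(82, 98, 55, 88, 97, 87, 59, 70, 100, 58,
            55, 72, 85, 71, 93, 80, 89, 94, 78, 81),
  Final = c(69, 91, 43, 84, 89, 58, 68, 59, 98, 50,
            47, 78, 67, 75, 67, 83, 78, 71, 83, 84)
)

# Same min-max scaling as in the code above
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
scaled <- as.data.frame(lapply(scores, normalize))

set.seed(1)  # arbitrary seed (assumption) for reproducibility
result <- kmeans(scaled, centers = 4, nstart = 25)

# Rank clusters by the mean of their center coordinates, best first
center_mean <- rowMeans(result$centers)
ranked <- order(center_mean, decreasing = TRUE)

# grade_of_cluster[k] holds the letter grade for cluster number k
grade_of_cluster <- character(4)
grade_of_cluster[ranked] <- c("A", "B", "C", "D")

# Look up each student's grade from the cluster number
grades <- grade_of_cluster[result$cluster]
print(data.frame(scores, cluster = result$cluster, grade = grades))
print(table(grades))
```

Because k-means groups students by similarity rather than by rank, the four groups need not be equal in size. If equal-sized groups are wanted, cutting an overall score at its quartiles would be a different technique from the clustering used here.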