In: Statistics and Probability
R Programming Exercise Book
Problem 39 (Problem Difficulty: Easy Thinking Question)
Suppose you have an arbitrary data set "airquality.csv" that
contains temperature readings for different months of
the year. If you wanted to take 30 random temperature readings from
each month and then calculate their mean and standard deviation,
how would you do it? Name the commands that you would use to solve
this problem and briefly explain the logic of your
approach.
Assumption: The temperature column is the fourth column in the data
set and the month column is the fifth column in the data set. Also
assume that you used read.csv("airquality.csv", nrows = 150,
stringsAsFactors = FALSE) to read the data.
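Before sampling, it helps to confirm these assumptions and to check how many rows each month actually has. The sketch below is only an illustration: it assumes that "airquality.csv" has the same layout as R's built-in airquality data set, and it uses the first 150 rows of that data set as a stand-in for the file. The names rows_per_month and short_months are introduced here for illustration.

```r
# Stand-in for "airquality.csv": R's built-in airquality data (an assumption).
# head(..., 150) mimics reading the file with nrows = 150.
daq_all <- head(datasets::airquality, 150)

# Check that columns 4 and 5 are the temperature and month columns.
stopifnot(identical(names(daq_all)[4:5], c("Temp", "Month")))

# Count the rows available in each month.
rows_per_month <- table(daq_all$Month)
print(rows_per_month)

# Months with fewer than 30 rows cannot supply 30 readings
# when sampling without replacement.
short_months <- names(rows_per_month)[rows_per_month < 30]
print(short_months)
```

If a month turns up in short_months, drawing 30 rows from it without replacement will fail. In that case, either read the whole file or sample with replacement.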
R program and code:
library(dplyr)
# Read the CSV file selected through a file browser
daq_all <- read.csv(file.choose(), header = TRUE)
# Show the number of rows available for each month
table(daq_all$Month)
# Use a for loop to repeat the task for each month
# Store the month numbers in the variable m
m <- c(5, 6, 7, 8, 9)
for (i in m) {
  # Keep only the rows for the current month
  daq <- filter(daq_all, Month == i)
  # Draw a random sample of 30 rows (the month needs at least 30 rows)
  sn <- sample_n(daq, 30)
  # Calculate the mean of the sampled temperatures
  temp_mean <- mean(sn$Temp)
  # Calculate the standard deviation of the sampled temperatures
  temp_sd <- sd(sn$Temp)
  # Round the results to one decimal place
  temp_mean <- round(temp_mean, 1)
  temp_sd <- round(temp_sd, 1)
  # Print the results
  print(paste('Month=', i))
  print(paste('Mean', temp_mean, 'SD', temp_sd))
}
R-output: