R Programming Exercise Book
Problem 31
(a) "airquality.csv" is a data set consisting of ozone,
solar radiation, wind and temperature measurements taken in New
York City from May to September 1973. Use the command read.csv
to read the data set. Now write code that takes 7
random temperature values from each month and then calculates the
mean and the standard deviation of those 7 samples. Display
each mean as a variable whose name includes the name of the month, e.g.
Mean_Temp_May. Do the same for the standard deviation, e.g.
Standard_Deviation_Temp_May.
(b) Use the aggregate command to find the mean and standard deviation
of all the temperature values from each month.
Assume the temperature data is in column 4 and the month is given in column 5 as a number, e.g. May is 5.
a)
The R code below takes 7 random temperature values from each month and calculates the mean and the standard deviation of those 7 samples. Each result is stored in a variable whose name includes the name of the month, e.g. Mean_Temp_May and Standard_Deviation_Temp_May.
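library(datasets)
library(dplyr)   # provides filter(), sample_n(), group_by() and summarise()
# Choose airquality.csv interactively; the first row holds the column names
aqData <- read.csv(file.choose(), header = TRUE)
# Number of rows per month (each month needs at least 7 rows to sample from)
table(aqData$Month)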
# For month May (5)
m5 <- filter(aqData, Month == 5)   # all rows recorded in May
s5 <- sample_n(m5, 7)              # 7 rows drawn at random
Mean_Temp_May <- mean(s5$Temp)
Standard_Deviation_Temp_May <- sd(s5$Temp)
Mean_Temp_May                      # display the two values
Standard_Deviation_Temp_May
# For month June (6)
m6 <- filter(aqData, Month == 6)
s6 <- sample_n(m6, 7)
Mean_Temp_June <- mean(s6$Temp)
Standard_Deviation_Temp_June <- sd(s6$Temp)
Mean_Temp_June
Standard_Deviation_Temp_June
# For month July (7)
m7 <- filter(aqData, Month == 7)
s7 <- sample_n(m7, 7)
Mean_Temp_July <- mean(s7$Temp)
Standard_Deviation_Temp_July <- sd(s7$Temp)
Mean_Temp_July
Standard_Deviation_Temp_July
# For month August (8)
m8 <- filter(aqData, Month == 8)
s8 <- sample_n(m8, 7)
Mean_Temp_August <- mean(s8$Temp)
Standard_Deviation_Temp_August <- sd(s8$Temp)
Mean_Temp_August
Standard_Deviation_Temp_August
# For month September (9)
m9 <- filter(aqData, Month == 9)
s9 <- sample_n(m9, 7)
Mean_Temp_September <- mean(s9$Temp)
Standard_Deviation_Temp_September <- sd(s9$Temp)
Mean_Temp_September
Standard_Deviation_Temp_September
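The per-month steps for part a) are identical apart from the month number, so they can be folded into a loop. The following is a minimal sketch rather than the original answer's method. It assumes the built-in datasets::airquality data frame as a stand-in for airquality.csv (it has the same Temp and Month columns), it uses base R sample() in place of dplyr's sample_n(), and the set.seed(31) call is an arbitrary addition that only makes the random draw repeatable.

```r
# Assumption: the built-in airquality data frame stands in for airquality.csv.
aqData <- datasets::airquality

set.seed(31)  # arbitrary seed, only so the random draw can be repeated

for (m in sort(unique(aqData$Month))) {
  # All temperature readings recorded in month m
  temps <- aqData$Temp[aqData$Month == m]
  # Draw 7 readings at random, without replacement
  s <- sample(temps, 7)
  # month.name is a built-in constant: month.name[5] is "May"
  mean_name <- paste0("Mean_Temp_", month.name[m])
  sd_name <- paste0("Standard_Deviation_Temp_", month.name[m])
  # Create the variables in the global environment under the built names
  assign(mean_name, mean(s), envir = globalenv())
  assign(sd_name, sd(s), envir = globalenv())
  # Display each variable name next to its value
  cat(mean_name, "=", get(mean_name), "\n")
  cat(sd_name, "=", get(sd_name), "\n")
}
```

After the loop, variables such as Mean_Temp_May and Standard_Deviation_Temp_September exist in the workspace and can be used like any other variable. Building names with assign() is what the exercise asks for; in everyday R code a named vector or a data frame with one row per month is usually easier to work with.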
b)
The dplyr functions group_by and summarise (used here in place of the base R aggregate command) find the mean and standard deviation of all the temperature values from each month.
summarise(group_by(aqData, Month), Mean_Temp = mean(Temp, na.rm = TRUE), Standard_Deviation_Temp = sd(Temp, na.rm = TRUE))
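Part (b) names the aggregate command, which belongs to base R (the stats package) rather than dplyr. As a sketch of that route, again assuming the built-in datasets::airquality data frame in place of airquality.csv, the formula Temp ~ Month groups the temperatures by month and the anonymous function returns both statistics for each group. The names result, Mean_Temp and Standard_Deviation_Temp are illustrative choices.

```r
# Assumption: the built-in airquality data frame stands in for airquality.csv.
aqData <- datasets::airquality

# Group Temp by Month and compute both statistics for each group
result <- aggregate(
  Temp ~ Month,
  data = aqData,
  FUN = function(x) c(Mean_Temp = mean(x), Standard_Deviation_Temp = sd(x))
)

# aggregate() stores the two statistics as a matrix column named Temp;
# do.call(data.frame, ...) flattens it into ordinary columns
# (Temp.Mean_Temp and Temp.Standard_Deviation_Temp)
result <- do.call(data.frame, result)
print(result)
```

The result has one row per month, and its values should agree with the summarise output, since both use every temperature reading in the month rather than a random sample of 7.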