Question

R Programming Exercise Book
Problem 31

(a) "airquality.csv" is a data set consisting of ozone, solar radiation, wind and temperature measurements taken in New York City from May to September of 1973. Use the command read.csv to read the data set. Then write code that takes 7 random temperature values from each month and calculates the mean and the standard deviation of those 7 samples. Store the mean in a variable whose name includes the name of the month, e.g. Mean_Temp_May. Do the same for the standard deviation, e.g. Standard_Deviation_Temp_May.

(b) Use the aggregate command to find the mean and standard deviation of all the temperature values in each month.

Assume the temperature data is in column 4 and the month is in column 5, coded as a number (e.g. May is 5).
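To check the column assumption, the sketch below recreates airquality.csv from the copy of the same data that ships with R (the built-in `airquality` data frame from the datasets package) and reads it back with read.csv. Creating the file this way is an assumption made only so the example runs on its own; with the supplied file, just the read.csv() line is needed. Note that write.csv() writes into, and overwrites any existing file in, the current working directory.

```r
# Assumption for a self-contained example: rebuild airquality.csv from the
# built-in data frame. Skip this line if you already have the supplied file.
write.csv(airquality, "airquality.csv", row.names = FALSE)

# Read the data set as the exercise asks
aqData <- read.csv("airquality.csv", header = TRUE)

# Confirm the column layout the exercise assumes
names(aqData)[4:5]    # "Temp" "Month"

# Positional access, equivalent to aqData$Temp and aqData$Month
temps  <- aqData[[4]]
months <- aqData[[5]]

# Number of days recorded in each month (5 = May, ..., 9 = September)
table(months)
```

Writing with `row.names = FALSE` matters here: with the default, write.csv() adds a leading row-name column and every other column shifts one position to the right.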

Solutions

a)

The R code below takes 7 random temperature values from each month and calculates the mean and the standard deviation of those 7 samples. Each mean is stored in a variable whose name includes the month, e.g. Mean_Temp_May, and the standard deviations follow the same pattern, e.g. Standard_Deviation_Temp_May.

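library(dplyr)   # provides filter() and sample_n()

# Read the data set (assumes airquality.csv is in the working directory;
# read.csv(file.choose(), header = TRUE) lets you pick the file interactively)
aqData <- read.csv("airquality.csv", header = TRUE)

# Number of observations per month (Month is coded 5 to 9)
table(aqData$Month)

set.seed(1973)   # optional: makes the random samples reproducible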

# For month May (5)

m5 <- filter(aqData, Month == 5)   # all rows for May
s5 <- sample_n(m5, 7)              # 7 randomly chosen May rows
Mean_Temp_May <- mean(s5$Temp)
Standard_Deviation_Temp_May <- sd(s5$Temp)


# For month June (6)

m6 <- filter(aqData, Month == 6)
s6 <- sample_n(m6, 7)
Mean_Temp_June <- mean(s6$Temp)
Standard_Deviation_Temp_June <- sd(s6$Temp)


# For month July (7)

m7 <- filter(aqData, Month == 7)
s7 <- sample_n(m7, 7)
Mean_Temp_July <- mean(s7$Temp)
Standard_Deviation_Temp_July <- sd(s7$Temp)


# For month August (8)

m8 <- filter(aqData, Month == 8)
s8 <- sample_n(m8, 7)
Mean_Temp_August <- mean(s8$Temp)
Standard_Deviation_Temp_August <- sd(s8$Temp)


# For month September (9)

m9 <- filter(aqData, Month == 9)
s9 <- sample_n(m9, 7)
Mean_Temp_September <- mean(s9$Temp)
Standard_Deviation_Temp_September <- sd(s9$Temp)
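The five month blocks repeat the same three steps. A more compact sketch, using only base R, loops over the month codes, builds each variable name with paste0() and R's built-in month.name vector, and creates the variable with assign(). It uses the built-in airquality data frame as a stand-in for the CSV (an assumption so the example runs on its own), and the seed value is arbitrary.

```r
# Stand-in for read.csv("airquality.csv"): the built-in copy of the same data
aqData <- airquality

set.seed(1973)   # optional: makes the random samples reproducible

for (m in sort(unique(aqData$Month))) {
  # Draw 7 temperatures from month m, without replacement
  s <- sample(aqData$Temp[aqData$Month == m], 7)

  # month.name[5] is "May", month.name[6] is "June", and so on
  assign(paste0("Mean_Temp_", month.name[m]), mean(s))
  assign(paste0("Standard_Deviation_Temp_", month.name[m]), sd(s))
}

# Display the results, e.g. for May
Mean_Temp_May
Standard_Deviation_Temp_May
```

A named list (e.g. `results[["May"]]`) is usually easier to work with than many separately named variables; assign() is used here only because the exercise asks for month-named variables.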

b)

aggregate() is a base R function (from the stats package), not part of dplyr. The code below gets the same per-month mean and standard deviation of all temperature values with dplyr's group_by() and summarise().

summarise(group_by(aqData, Month),
          Mean_Temp = mean(Temp, na.rm = TRUE),
          SD_Temp = sd(Temp, na.rm = TRUE))
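Part (b) asks specifically for aggregate(). A minimal base R sketch using its formula interface is shown here; it again uses the built-in airquality data as a stand-in for read.csv("airquality.csv"), and the column names Temp_mean and Temp_sd come from the merge() suffixes chosen in this example.

```r
# Stand-in for read.csv("airquality.csv")
aqData <- airquality

# Temp grouped by Month, one aggregate() call per statistic
temp_mean <- aggregate(Temp ~ Month, data = aqData, FUN = mean)
temp_sd   <- aggregate(Temp ~ Month, data = aqData, FUN = sd)

# Combine into one table: the shared "Temp" column gets the suffixes
temp_stats <- merge(temp_mean, temp_sd, by = "Month",
                    suffixes = c("_mean", "_sd"))
temp_stats
```

With the formula interface, aggregate() drops rows containing missing values by default (`na.action = na.omit`), so no `na.rm` argument is needed.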
