Question


Question 2.4.1, problem 6: The data set airquality is one of R’s included data sets. It shows daily measurements of ozone concentration (Ozone), solar radiation (Solar.R), wind speed (Wind), and temperature (Temp) for 5 summer months in 1973 in New York City. Some of the observations are missing and are recorded as NA, meaning not available. View an overall summary of the variables in airquality with the command

> summary(airquality)

Ignore the summaries for Month and Day since those variables should be factors, not numeric variables, and their summaries are meaningless. Attach airquality to your workspace

> attach(airquality)

and make boxplots of Ozone, Solar.R, Wind, and Temp. Comment on any noteworthy features.

Solutions

Expert Solution

All lines beginning with # are comments.

a)

# R-code

# View some records of the dataset airquality
head(airquality)

#--output--

The output shows that the required variables are present and that some observations are recorded as NA.

b) View an overall summary of the variables in airquality

#R-Code

#View an overall summary of the variables in airquality
summary(airquality)

#--output

The output gives a six-number summary (minimum, first quartile, median, mean, third quartile, maximum) for each variable. Ozone has 37 missing values and Solar.R has 7.

Comparing the mean with the median gives a first indication of the skewness of each variable: a mean above the median suggests right skew, and a mean below the median suggests left skew. This is revisited with the boxplots in part c).
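The missing-value counts can also be checked directly rather than read off the summary. This is a minimal sketch using only base R; the name na_counts is introduced here for illustration.

```r
# Count the NA entries in each column of the built-in airquality data frame
na_counts <- colSums(is.na(airquality))
print(na_counts)
```

Columns with a count of zero have no missing observations.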
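As a quick numerical check of the mean-versus-median comparison, the sketch below tabulates both for the four measured variables. The names vars and centers are introduced here for illustration, and the sign of the difference is only a rough indicator of skewness, not a formal measure.

```r
# Variables of interest (Month and Day are excluded, as the problem suggests)
vars <- c("Ozone", "Solar.R", "Wind", "Temp")

# Mean and median per variable, dropping missing values
centers <- sapply(airquality[vars], function(x) {
  c(mean = mean(x, na.rm = TRUE), median = median(x, na.rm = TRUE))
})

# Add a row with mean - median: positive hints at right skew, negative at left skew
centers <- rbind(centers, diff = centers["mean", ] - centers["median", ])
print(round(centers, 2))
```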

c) Attach airquality to your workspace & make boxplots

#R code

#Attach airquality to your workspace
attach(airquality)

#arrange all four plots in one window (2 x 2 grid) -- comment this out if you want separate plots
par(mfrow=c(2,2))

#make boxplots of Ozone, Solar.R, Wind, and Temp
boxplot(Ozone,main="Ozone Concentration",xlab="ozone concentration",horizontal=TRUE)
boxplot(Solar.R,main="Solar Radiation",xlab="Solar Radiation",horizontal=TRUE)
boxplot(Wind,main="Wind Speed",xlab="wind speed",horizontal=TRUE)
boxplot(Temp,main="Temperature",xlab="Temperature",horizontal=TRUE)

#--output

Boxplots

  • Presence of outliers. Outliers are the observations that fall outside the fences. Let Q1 be the 1st quartile and Q3 the 3rd quartile. Then the interquartile range is IQR = Q3-Q1, the lower fence is lci = Q1-1.5*IQR, and the upper fence is uci = Q3+1.5*IQR. Any observation that is less than lci or greater than uci is considered an outlier.
    • Ozone concentration has 2 outliers and wind speed has 3 outliers, indicated by the dots beyond the whiskers.
  • Skewness
    • Ozone: The median and the box are located toward the left, so the data are right skewed. This is also indicated by the median (31.50) being less than the mean (42.13).
    • Solar radiation: The data are left skewed (the median is located toward the right). This is also indicated by the median (205.0) being larger than the mean (185.9).
    • Wind speed: Almost symmetric. This is also indicated by similar values for the mean and median.
    • Temperature: Almost symmetric. This is also indicated by similar values for the mean and median.
  • Spread. The variables are measured in different units, so their spreads cannot be compared directly, and each panel has its own axis scale, so box widths are not comparable across plots. In raw numbers only, solar radiation has the largest interquartile range among the four variables (see the quartiles in the summary output).
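The fence rule described in the list above can be sketched in a few lines of base R. The helper count_outliers and the name fences are hypothetical, introduced only for this illustration. Note that quantile() is used here for Q1 and Q3, whereas boxplot() computes hinges via fivenum(), so the fences may differ slightly from those drawn in the plots.

```r
# Hypothetical helper: apply the 1.5 * IQR fence rule to one numeric vector
count_outliers <- function(x) {
  x <- x[!is.na(x)]                                # drop missing values
  q <- quantile(x, c(0.25, 0.75), names = FALSE)   # Q1 and Q3
  iqr <- q[2] - q[1]
  lower <- q[1] - 1.5 * iqr                        # lower fence (lci)
  upper <- q[2] + 1.5 * iqr                        # upper fence (uci)
  c(lower = lower, upper = upper,
    n_outliers = sum(x < lower | x > upper))
}

# One column per variable: lower fence, upper fence, number of outliers
fences <- sapply(airquality[c("Ozone", "Solar.R", "Wind", "Temp")], count_outliers)
print(round(fences, 2))
```

The n_outliers row should agree with the number of points drawn beyond the whiskers in each boxplot.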
