Question

Please use R to solve parts (e) and (f).

The data file data2.txt gives a data set with two variables, x and y. The first column in the data set contains only row numbers and is not needed for this question.

(e) Use the Shapiro-Wilk test to test for Normality of the data. State your null and alternative hypotheses, p-value and conclusion. Use α = 0.05.

(f) Apply the transformation y0 = log(y) and run the regression of y0 on x. Now repeat parts (c), (d) and (e) with the residuals from this transformed model.

Data file:

Row x y
1 60.26 63.95
2 64.64 67.42
3 69.17 73.36
4 61.49 65.30
5 65.10 68.74
6 61.34 65.22
7 84.12 86.78
8 73.58 75.01
9 69.51 71.46
10 51.94 56.18
11 54.39 58.07
12 69.25 72.53
13 76.64 78.53
14 73.16 75.50
15 67.99 70.37
16 42.23 47.08
17 62.95 67.80
18 70.12 74.66
19 63.96 66.32
20 60.32 64.22
21 60.33 64.56
22 55.28 58.95
23 51.48 58.13
24 76.90 84.55
25 69.79 71.75
26 79.31 80.64
27 68.12 71.40
28 65.70 68.22
29 50.85 54.99
30 59.47 63.75

Solutions

Expert Solution

The R Markdown output is shown below. R commands appear as typed, lines beginning with ## are R output, and the remaining lines are comments and conclusions.

Importing the data frame from data2.txt (assumed to be in the working directory)

xy_data = read.table("data2.txt", header = TRUE)

Accessing x and y variables from the data

x = xy_data$x
y = xy_data$y
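
As a quick check on the import step, the sketch below writes the first three rows of the data listing to a temporary file and reads them back with read.table. The temporary file only stands in for data2.txt, and the names tmp and demo are illustrative, not part of the original solution.

```r
# Write a small stand-in for data2.txt (first three rows of the listing).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Row x y",
             "1 60.26 63.95",
             "2 64.64 67.42",
             "3 69.17 73.36"), tmp)

# header = TRUE takes the column names from the first line of the file.
demo <- read.table(tmp, header = TRUE)

# The Row column only holds row numbers, so keep x and y.
demo <- demo[, c("x", "y")]
str(demo)
```

The same read.table call should work on the full file as long as it is whitespace-separated with the header line Row x y.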


e) Checking normality of both variables with the Shapiro-Wilk test.
Level of significance (alpha) = 0.05.

Hypotheses -

H0: The data come from a normally distributed population. VS

H1: The data do not come from a normally distributed population.

For variable x -

shapiro.test(x)

## Shapiro-Wilk normality test
##
## data: x
## W = 0.98819, p-value = 0.9786

p-value = 0.9786
p-value > alpha, i.e. 0.9786 > 0.05
Decision - Fail to reject H0 (the null hypothesis is rejected only if p-value < alpha)
Conclusion - At the 5% level there is no evidence that variable x departs from a normal distribution.

For variable y -
H0: The data come from a normally distributed population.
VS

H1: The data do not come from a normally distributed population.


shapiro.test(y)

##
## Shapiro-Wilk normality test
##
## data: y
## W = 0.98766, p-value = 0.9735

p-value = 0.9735
p-value > alpha, i.e. 0.9735 > 0.05
Decision - Fail to reject H0
Conclusion - At the 5% level there is no evidence that variable y departs from a normal distribution.

Hence, at alpha = 0.05, both variables are consistent with a normally distributed population.
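
Part (f) asks to repeat the test "with the residuals from this transformed model", which suggests that part (e) may have been meant for the residuals of the untransformed regression of y on x rather than for x and y separately. The sketch below runs the test under that assumption. The vectors are typed in from the data listing in the question, and fit_raw, sw_raw and decision are illustrative names.

```r
# Data typed in from the listing in the question (30 observations).
x <- c(60.26, 64.64, 69.17, 61.49, 65.10, 61.34, 84.12, 73.58, 69.51, 51.94,
       54.39, 69.25, 76.64, 73.16, 67.99, 42.23, 62.95, 70.12, 63.96, 60.32,
       60.33, 55.28, 51.48, 76.90, 69.79, 79.31, 68.12, 65.70, 50.85, 59.47)
y <- c(63.95, 67.42, 73.36, 65.30, 68.74, 65.22, 86.78, 75.01, 71.46, 56.18,
       58.07, 72.53, 78.53, 75.50, 70.37, 47.08, 67.80, 74.66, 66.32, 64.22,
       64.56, 58.95, 58.13, 84.55, 71.75, 80.64, 71.40, 68.22, 54.99, 63.75)

# Regression on the original (untransformed) scale.
fit_raw <- lm(y ~ x)

# Shapiro-Wilk test on the residuals of that fit.
sw_raw <- shapiro.test(residuals(fit_raw))
print(sw_raw)

# Decision rule: reject H0 only when the p-value is below alpha.
alpha <- 0.05
decision <- if (sw_raw$p.value < alpha) "reject H0" else "fail to reject H0"
cat("Decision at alpha =", alpha, ":", decision, "\n")
```

The hypotheses are the same as above, with "the data" read as the model residuals.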

f) Transformation of y: y0 = log(y)

Using the natural logarithm (base e) to transform y

y0 = log(y)

Fitting the regression of y0 on x, with y0 as the response and x as the predictor.

fit = lm(y0~x)
summary(fit)

##
## Call:
## lm(formula = y0 ~ x)
##
## Residuals:
##       Min        1Q    Median        3Q       Max
## -0.046893 -0.012009 -0.004173 0.010305 0.051214
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) 3.3050759 0.0265154 124.65   <2e-16 ***
## x           0.0140579 0.0004061   34.62   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.02037 on 28 degrees of freedom
## Multiple R-squared: 0.9772, Adjusted R-squared: 0.9764
## F-statistic: 1198 on 1 and 28 DF, p-value: < 2.2e-16

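R-squared (0.9772) and adjusted R-squared (0.9764) are both close to 1, so the fitted line explains about 98% of the variation in log(y). The slope is highly significant (p-value < 2e-16).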
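
Because the response is log(y), the slope acts on a multiplicative scale: a one-unit increase in x multiplies the fitted y by exp(slope). The sketch below takes the slope and its standard error from the summary output above, converts the slope to an approximate percentage change, and checks that the squared t value matches the reported F-statistic, as it should in a regression with one predictor. The names b1, se_b1, pct_change, t_stat and f_stat are illustrative.

```r
# Slope estimate and standard error copied from the summary output.
b1    <- 0.0140579
se_b1 <- 0.0004061

# Percentage change in fitted y for a one-unit increase in x.
pct_change <- (exp(b1) - 1) * 100
cat("Approximate % change in y per unit of x:", round(pct_change, 2), "\n")

# With a single predictor, F equals t squared.
t_stat <- b1 / se_b1
f_stat <- t_stat^2
cat("t =", round(t_stat, 2), " t^2 =", round(f_stat, 0), "\n")
```

The result is roughly a 1.4% increase in y per unit of x, and t squared agrees with the reported F-statistic of 1198 up to rounding.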

Accessing residuals from the fitted model

residuals = fit$residuals

Checking normality of the residuals

H0: The residuals come from a normally distributed population.
VS

H1: The residuals do not come from a normally distributed population.

shapiro.test(residuals)

##
## Shapiro-Wilk normality test
##
## data: residuals
## W = 0.98349, p-value = 0.9088

p-value = 0.9088
p-value > alpha, i.e. 0.9088 > 0.05
Decision - Fail to reject H0
Conclusion - At the 5% level there is no evidence that the residuals from the transformed regression model depart from a normal distribution.

