Question

Regression Analysis/Need R code/Step By Step Explanation/No Hand Written Regression:

A) Provide a 95% confidence interval for each of the estimated parameters.
B) Use hypothesis testing to test the significance of regression, and perform residual analysis.
C) Perform a lack-of-fit test. Apply the corresponding transformation to correct model inadequacies, if any.
D) Check for multicollinearity and validate your model accordingly.

We seek a model of the form

B = A0 * X0 + A1 * X1 + A2 * X2 + A3 * X3 + A4 * X4

The dataset is in a format that can be run in RStudio directly:

a0: 1; a1: the petrol tax; a2: the per capita income; a3: the number of miles of paved highway; a4: the proportion of drivers; b: the consumption of petrol.

b <- c(541, 524,561, 414,410,457,344, 467, 464, 498, 580, 471, 525, 508, 566,635,603,714,865,640,649
,540, 464, 547, 460, 566, 577, 631, 574, 534, 571, 554, 577, 628, 487, 644, 640, 704, 648, 968
, 587, 699, 632, 591, 782, 510,610, 524)

a4 <- c(0.525, 0.572, 0.580, 0.529, 0.544, 0.571, 0.451, 0.553, 0.529, 0.552, 0.530, 0.525,
0.574,0.545,0.608,0.586,0.572,0.540,0.724,0.677,0.663,0.602,0.511,0.517,0.551,0.544, 0.548,
0.579, 0.563, 0.493, 0.518, 0.513, 0.578, 0.547, 0.487, 0.629, 0.566, 0.586, 0.663, 0.672
,0.626,0.563,0.603,0.508,0.672,0.571,0.623, 0.593)

a3<- c(1976,1250, 1586, 2351, 431, 1333, 11868, 2138, 8577, 8507, 5939, 14186, 6930, 6580,8159,
10340, 8508, 4725, 5915, 6010,7834,602, 2449, 4686,2619, 4746, 5399,9061,5975, 4650, 6905,
6594,6524,4121,3495, 7834, 17782, 6385, 3274, 3905,4639, 3985, 3635, 2611, 2302, 3942, 4083
, 9794)

a2 <- c(3571, 4092, 3865, 4870, 4399, 5342, 5319, 5126, 4447, 4512, 4391, 5126, 4817, 4207, 4332, 4318,
4206, 3718, 4716, 4341, 4593, 4983, 4897, 4258, 4574, 3721, 3448, 3846, 4188, 3601, 3640, 3333,
3063,3357,3528,3802,4045, 3897, 3635, 4345, 4449, 3656, 4300, 3745, 5215, 4476, 4296 ,5002)

a1 <- c(9.00, 9.00, 9.00, 7.50, 8.00, 10.00, 8.00, 8.00, 8.00, 7.00, 8.00, 7.50, 7.00, 7.00, 7.00, 7.00,
7.00, 7.00, 7.00, 8.50, 7.00, 8.00, 9.00, 9.00, 8.50, 9.00, 8.00, 7.50, 8.00, 9.00, 7.00,
7.00, 8.00, 7.50, 8.00, 6.58, 5.00, 7.00, 8.50, 7.00, 7.00, 7.00, 7.00, 7.00, 6.00, 9.00, 7.00
, 7.00)

Solutions

Expert Solution

Note: I assume X0 = 1, so A0 is the intercept.

b <- c(541, 524,561, 414,410,457,344, 467, 464, 498, 580, 471, 525, 508, 566,635,603,714,865,640,649,540, 464, 547, 460, 566, 577, 631, 574, 534, 571, 554, 577, 628, 487, 644, 640, 704, 648, 968, 587, 699, 632, 591, 782, 510,610, 524)
a4 <- c(0.525, 0.572, 0.580, 0.529, 0.544, 0.571, 0.451, 0.553, 0.529, 0.552, 0.530, 0.525,0.574,0.545,0.608,0.586,0.572,0.540,0.724,0.677,0.663,0.602,0.511,0.517,0.551,0.544, 0.548,0.579, 0.563, 0.493, 0.518, 0.513, 0.578, 0.547, 0.487, 0.629, 0.566, 0.586, 0.663, 0.672,0.626,0.563,0.603,0.508,0.672,0.571,0.623, 0.593)
a3<- c(1976,1250, 1586, 2351, 431, 1333, 11868, 2138, 8577, 8507, 5939, 14186, 6930, 6580,8159,10340, 8508, 4725, 5915, 6010,7834,602, 2449, 4686,2619, 4746, 5399,9061,5975, 4650, 6905,6594,6524,4121,3495, 7834, 17782, 6385, 3274, 3905,4639, 3985, 3635, 2611, 2302, 3942, 4083, 9794)
a2 <- c(3571, 4092, 3865, 4870, 4399, 5342, 5319, 5126, 4447, 4512, 4391, 5126, 4817, 4207, 4332, 4318,4206, 3718, 4716, 4341, 4593, 4983, 4897, 4258, 4574, 3721, 3448, 3846, 4188, 3601, 3640, 3333,3063,3357,3528,3802,4045, 3897, 3635, 4345, 4449, 3656, 4300, 3745, 5215, 4476, 4296 ,5002)
a1 <- c(9.00, 9.00, 9.00, 7.50, 8.00, 10.00, 8.00, 8.00, 8.00, 7.00, 8.00, 7.50, 7.00, 7.00, 7.00, 7.00,7.00, 7.00, 7.00, 8.50, 7.00, 8.00, 9.00, 9.00, 8.50, 9.00, 8.00, 7.50, 8.00, 9.00, 7.00,7.00, 8.00, 7.50, 8.00, 6.58, 5.00, 7.00, 8.50, 7.00, 7.00, 7.00, 7.00, 7.00, 6.00, 9.00, 7.00, 7.00)

m0 is the regression model.

 m0 <- lm(b~a1+a2+a3+a4) 
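To make the X0 = 1 assumption concrete, here is a minimal sketch on a small made-up data set (the vectors x1, x2 and y are hypothetical, not the petrol data). It shows that lm() adds the column of ones to the design matrix on its own, and that its coefficients agree with the normal equations.

```r
# Hypothetical toy data, for illustration only (not the petrol data)
x1 <- c(1, 2, 3, 4, 5, 6)
x2 <- c(2, 1, 4, 3, 6, 5)
y  <- c(3.1, 3.9, 7.2, 7.8, 11.1, 11.9)

fit <- lm(y ~ x1 + x2)

# Design matrix used by lm(): the first column is the constant X0 = 1
X <- model.matrix(fit)
print(X)

# Least-squares estimates from the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Side-by-side comparison with the coefficients reported by lm()
print(cbind(normal_equations = as.numeric(beta_hat),
            lm = as.numeric(coef(fit))))
```

The same holds for m0: writing b~a1+a2+a3+a4 is enough, and no separate a0 vector is needed.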

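a. 95% Confidence Intervals

confint(m0)

# to access an individual confidence interval, index by name, e.g. a2 (per capita income)
# note that row 1 of confint(m0) is the intercept, so row 2 is a1 (petrol tax), not a2

confint(m0)["a2", ]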
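As a rough check on what confint() does, the intervals can be rebuilt by hand as estimate ± t × standard error. The sketch below uses the rounded estimates and standard errors from the summary(m0) output quoted in part b, so it should match confint(m0) only up to rounding; the variable names are my own.

```r
# Estimates and standard errors as printed by summary(m0) in part b (rounded)
est <- c("(Intercept)" = 3.773e+02, a1 = -3.479e+01, a2 = -6.659e-02,
         a3 = -2.426e-03, a4 = 1.336e+03)
se  <- c("(Intercept)" = 1.855e+02, a1 = 1.297e+01, a2 = 1.722e-02,
         a3 = 3.389e-03, a4 = 1.923e+02)

df_resid <- 43                    # residual degrees of freedom: 48 - 5
t_crit   <- qt(0.975, df_resid)   # two-sided 95% critical value

ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)

# A coefficient is significant at the 5% level when its interval excludes 0
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0
print(excludes_zero)
```

Only the interval for a3 (miles of paved highway) contains zero, which agrees with its p-value of 0.478 in the summary table.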

b. Significance testing

summary(m0)

# you will get these results:

             Estimate    Std. Error  t value  Pr(>|t|)
(Intercept)   3.773e+02  1.855e+02    2.033   0.048207 *
a1           -3.479e+01  1.297e+01   -2.682   0.010332 *
a2           -6.659e-02  1.722e-02   -3.867   0.000368 ***
a3           -2.426e-03  3.389e-03   -0.716   0.477999
a4            1.336e+03  1.923e+02    6.950   1.52e-08 ***

Residual standard error: 66.31 on 43 degrees of freedom
Multiple R-squared: 0.6787, Adjusted R-squared: 0.6488
F-statistic: 22.71 on 4 and 43 DF, p-value: 3.907e-10

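Since the p-value of the overall F-test is 3.907 × 10^-10, which is less than 0.05, we reject the hypothesis that all slope coefficients are zero: the regression is useful in explaining variation in b (consumption of petrol). Looking at the individual t-tests, a1, a2 and a4 are significant at the 5% level, while a3 (p = 0.478) is not.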
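The overall F-statistic can be recovered from the reported R-squared alone, which is a quick way to see where the 22.71 comes from. This is a small sketch using the rounded figures from the summary output; n, k and the variable names are filled in by me from the problem statement.

```r
r2 <- 0.6787   # Multiple R-squared reported by summary(m0)
n  <- 48       # number of observations
k  <- 4        # number of predictors (a1..a4), excluding the intercept

# F = (explained variation per predictor) / (unexplained variation per residual df)
f_stat  <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_value <- pf(f_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE)

cat("F =", round(f_stat, 2), "on", k, "and", n - k - 1, "DF\n")
cat("p-value =", signif(p_value, 3), "\n")
```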

Residual Analysis.

par(mfrow = c(2,2))
par(mar=c(1,1,1,1))
plot(m0)

In the Residuals vs Fitted plot, you will see that most of the points (except points 18 and 40) lie around the centre line.

In the Normal Q-Q plot, you will see that the residuals follow a normal distribution (except for point 40).

In the Residuals vs Leverage plot, you will see that only point 40 stands out as an outlier.
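Reading outliers off plots can be backed up with numbers. The sketch below is an illustration on simulated data with one planted outlier (not the petrol data); the cut-offs of 3 for standardized residuals and 4/n for Cook's distance are common rules of thumb that I am assuming here, not values from the solution.

```r
# Simulated data with one planted outlier (illustration only, not the petrol data)
set.seed(1)
x <- 1:30
y <- 2 + 0.5 * x + rnorm(30)
y[30] <- y[30] + 15          # observation 30 is deliberately shifted upwards

fit <- lm(y ~ x)

std_res <- rstandard(fit)        # internally studentized residuals
lev     <- hatvalues(fit)        # leverage of each observation
cooks   <- cooks.distance(fit)   # influence on the fitted coefficients

# Rules of thumb (assumptions, not hard thresholds)
flagged <- which(abs(std_res) > 3 | cooks > 4 / length(y))
print(data.frame(obs = flagged, std_res = std_res[flagged],
                 leverage = lev[flagged], cooks = cooks[flagged]))
```

The same three calls (rstandard, hatvalues, cooks.distance) can be applied to m0 to check whether points 18 and 40 are flagged numerically as well.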

Lack of fit test.

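# test statistic: (n - p) * σ-hat^2 / σ^2, compared with a chi-square distribution on n - p degrees of freedom
# σ-hat = residual standard error = 66.31 on 43 degrees of freedom (n = 48, p = 5 coefficients including the intercept)
# σ^2 is the known error variance; it is not given here, so the code takes σ^2 = 1 (an assumption)
test.stat <- (48 - 5) * 66.31^2
1 - pchisq(test.stat, 43)

You will get the answer as 0 (a p-value far below 0.05), which means lack of fit. Note that this conclusion depends on the value assumed for σ^2.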
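When the error variance is not known, a different and widely used approach is the lack-of-fit F test based on pure error. It needs repeated observations at identical predictor values, which may not be available in the petrol data, so the sketch below uses simulated data with replicates. All names and the simulated curve are my own assumptions; it also shows one possible remedy, adding a curvature term.

```r
# Simulated data with replicated x values (illustration only, not the petrol data)
set.seed(42)
x <- rep(1:6, each = 4)             # 6 distinct levels, 4 replicates each
y <- 1 + x^2 + rnorm(length(x))     # the true relationship is curved

fit_linear <- lm(y ~ x)             # straight-line model under test
fit_means  <- lm(y ~ factor(x))     # one mean per x level (pure-error model)

# F test: lack-of-fit sum of squares against pure-error sum of squares
lof <- anova(fit_linear, fit_means)
print(lof)
p_lof <- lof[["Pr(>F)"]][2]

# One possible remedy: add the curvature term, then test again
fit_quad <- lm(y ~ x + I(x^2))
p_quad <- anova(fit_quad, fit_means)[["Pr(>F)"]][2]

cat("lack-of-fit p-value, linear:   ", signif(p_lof, 3), "\n")
cat("lack-of-fit p-value, quadratic:", signif(p_quad, 3), "\n")
```

A small p-value for the straight-line fit signals lack of fit; after a suitable transformation or added term, the p-value should no longer be small.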

Multicollinearity test

 car::vif(m0)

      a1        a2        a3        a4
1.625676  1.043274  1.496937  1.216355

Since all the VIFs are less than 5, we can safely say that multicollinearity is not a problem in this model.
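To see what these numbers mean, recall that VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The sketch below inverts that relation using the VIFs reported above; the variable names are my own.

```r
# VIFs as reported by car::vif(m0) in the solution
vif <- c(a1 = 1.625676, a2 = 1.043274, a3 = 1.496937, a4 = 1.216355)

# VIF_j = 1 / (1 - R_j^2)  =>  R_j^2 = 1 - 1 / VIF_j
r2_aux <- 1 - 1 / vif

# sqrt(VIF_j): factor by which the standard error of coefficient j is inflated
se_inflation <- sqrt(vif)

print(round(cbind(VIF = vif, R2_aux = r2_aux, SE_inflation = se_inflation), 3))
```

Even the most correlated predictor, a1, has less than 40% of its variance explained by the others, which supports the conclusion above.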

