BrainWeight | BodyWeight |
3.385 | 44.5 |
0.48 | 15.5 |
1.35 | 8.1 |
465 | 423 |
36.33 | 119.5 |
27.66 | 115 |
14.83 | 98.2 |
1.04 | 5.5 |
4.19 | 58 |
0.425 | 6.4 |
0.101 | 4 |
0.92 | 5.7 |
1 | 6.6 |
0.005 | 0.14 |
0.06 | 1 |
3.5 | 10.8 |
2 | 12.3 |
1.7 | 6.3 |
2547 | 4603 |
0.023 | 0.3 |
187.1 | 419 |
521 | 655 |
0.785 | 3.5 |
10 | 115 |
3.3 | 25.6 |
0.2 | 5 |
1.41 | 17.5 |
529 | 680 |
207 | 406 |
85 | 325 |
0.75 | 12.3 |
62 | 1320 |
6654 | 5712 |
3.5 | 3.9 |
6.8 | 179 |
35 | 56 |
4.05 | 17 |
0.12 | 1 |
0.023 | 0.4 |
0.01 | 0.25 |
1.4 | 12.5 |
250 | 490 |
2.5 | 12.1 |
55.5 | 175 |
100 | 157 |
52.16 | 440 |
10.55 | 179.5 |
0.55 | 2.4 |
60 | 81 |
3.6 | 21 |
4.288 | 39.2 |
0.28 | 1.9 |
0.075 | 1.2 |
0.122 | 3 |
0.048 | 0.33 |
192 | 180 |
3 | 25 |
160 | 169 |
0.9 | 2.6 |
1.62 | 11.4 |
0.104 | 2.5 |
4.235 | 50.4 |
a. Input the data into R and draw a scatter plot. You will see
that the current scale is not the best for display, so apply a
log-transformation to both variables. This can be done with the
log() function: put the old data.frame in the parentheses and
assign the output a name, so that you have a new data.frame
of the transformed data, something like
> new.data <- log(old.data)
Draw a scatter plot of the new data. Does it look much better?
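A minimal sketch of part a, assuming the table above is typed in by hand as a data.frame called old.data (the name follows the hint in the question, and the columns are labelled exactly as in the table header). It draws the scatter plot on the original scale and on the log scale side by side.

```r
# Data typed in from the table above (62 rows), columns labelled as in its header
old.data <- data.frame(
  BrainWeight = c(3.385, 0.48, 1.35, 465, 36.33, 27.66, 14.83, 1.04, 4.19, 0.425,
                  0.101, 0.92, 1, 0.005, 0.06, 3.5, 2, 1.7, 2547, 0.023,
                  187.1, 521, 0.785, 10, 3.3, 0.2, 1.41, 529, 207, 85,
                  0.75, 62, 6654, 3.5, 6.8, 35, 4.05, 0.12, 0.023, 0.01,
                  1.4, 250, 2.5, 55.5, 100, 52.16, 10.55, 0.55, 60, 3.6,
                  4.288, 0.28, 0.075, 0.122, 0.048, 192, 3, 160, 0.9, 1.62,
                  0.104, 4.235),
  BodyWeight = c(44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5, 58, 6.4,
                 4, 5.7, 6.6, 0.14, 1, 10.8, 12.3, 6.3, 4603, 0.3,
                 419, 655, 3.5, 115, 25.6, 5, 17.5, 680, 406, 325,
                 12.3, 1320, 5712, 3.9, 179, 56, 17, 1, 0.4, 0.25,
                 12.5, 490, 12.1, 175, 157, 440, 179.5, 2.4, 81, 21,
                 39.2, 1.9, 1.2, 3, 0.33, 180, 25, 169, 2.6, 11.4,
                 2.5, 50.4)
)

# Natural-log transform of every column; valid here because all values are positive
new.data <- log(old.data)

# Scatter plots on the original and the log scale, side by side
par(mfrow = c(1, 2))
plot(old.data$BodyWeight, old.data$BrainWeight,
     xlab = "BodyWeight", ylab = "BrainWeight", main = "Original scale")
plot(new.data$BodyWeight, new.data$BrainWeight,
     xlab = "log(BodyWeight)", ylab = "log(BrainWeight)", main = "Log scale")
```

On the original scale a few very large observations stretch both axes, which is why the question suggests the log scale.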
c. Fit a linear model on the original data. Plot the
residuals against the predictor using something similar to
> plot(old.data$BodyWeight, lm.fit$res)
What do you think about the assumption that the error term does not
depend on x?
d. Fit a linear model on the log-transformed data. Plot the residuals against the predictor. What do you see now?
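Parts c and d can be sketched together as follows. This is only one way to do it: the object names lm.fit and lm.fit.log are illustrative, BodyWeight is treated as the predictor as in the hint for part c, and the data are re-entered from the table so that the snippet runs on its own.

```r
# Data typed in from the table above (62 rows), columns labelled as in its header
old.data <- data.frame(
  BrainWeight = c(3.385, 0.48, 1.35, 465, 36.33, 27.66, 14.83, 1.04, 4.19, 0.425,
                  0.101, 0.92, 1, 0.005, 0.06, 3.5, 2, 1.7, 2547, 0.023,
                  187.1, 521, 0.785, 10, 3.3, 0.2, 1.41, 529, 207, 85,
                  0.75, 62, 6654, 3.5, 6.8, 35, 4.05, 0.12, 0.023, 0.01,
                  1.4, 250, 2.5, 55.5, 100, 52.16, 10.55, 0.55, 60, 3.6,
                  4.288, 0.28, 0.075, 0.122, 0.048, 192, 3, 160, 0.9, 1.62,
                  0.104, 4.235),
  BodyWeight = c(44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5, 58, 6.4,
                 4, 5.7, 6.6, 0.14, 1, 10.8, 12.3, 6.3, 4603, 0.3,
                 419, 655, 3.5, 115, 25.6, 5, 17.5, 680, 406, 325,
                 12.3, 1320, 5712, 3.9, 179, 56, 17, 1, 0.4, 0.25,
                 12.5, 490, 12.1, 175, 157, 440, 179.5, 2.4, 81, 21,
                 39.2, 1.9, 1.2, 3, 0.33, 180, 25, 169, 2.6, 11.4,
                 2.5, 50.4)
)
new.data <- log(old.data)

# Part c: linear model on the original scale
lm.fit <- lm(BrainWeight ~ BodyWeight, data = old.data)
# Part d: the same model on the log-transformed data
lm.fit.log <- lm(BrainWeight ~ BodyWeight, data = new.data)

# Residuals against the predictor, with a dashed reference line at zero
par(mfrow = c(1, 2))
plot(old.data$BodyWeight, resid(lm.fit),
     xlab = "BodyWeight", ylab = "Residual", main = "Original data")
abline(h = 0, lty = 2)
plot(new.data$BodyWeight, resid(lm.fit.log),
     xlab = "log(BodyWeight)", ylab = "Residual", main = "Log-transformed data")
abline(h = 0, lty = 2)
```

If the error term does not depend on x, the points should form a band of roughly constant width around the dashed zero line.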
Can you please show all work?
The R code is presented here:
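# Set the working directory to the folder that contains olddata.csv
setwd("C:/Users/hp/Desktop")
olddata <- read.csv("olddata.csv", header = TRUE)
names(olddata)
#---------- olddata-----------
lm.fit1 <- lm(olddata$BrainWeight ~ olddata$BodyWeight)
summary(lm.fit1)
# Predictor (BodyWeight) goes on the x-axis so that abline() matches the fitted model
plot(olddata$BodyWeight, olddata$BrainWeight, col = "red")
abline(lm.fit1, col = "blue")
# Residuals against the predictor
plot(olddata$BodyWeight, resid(lm.fit1))
#---------- newdata-----------
# Log-transform both variables
newdata <- log(olddata)
lm.fit2 <- lm(newdata$BrainWeight ~ newdata$BodyWeight)
summary(lm.fit2)
plot(newdata$BodyWeight, newdata$BrainWeight, col = "red")
abline(lm.fit2, col = "blue")
# Residuals against the log-transformed predictor
plot(newdata$BodyWeight, resid(lm.fit2))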
summary(lm.fit1)
Call:
lm(formula = olddata$BrainWeight ~ olddata$BodyWeight)
Residuals:
     Min       1Q   Median       3Q      Max
-1552.25    -8.00    47.36    55.10  1553.42
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -56.85555   42.97805  -1.323    0.191
olddata$BodyWeight   0.90291    0.04453  20.278   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 323.5 on 60 degrees of freedom
Multiple R-squared: 0.8727, Adjusted R-squared: 0.8705
F-statistic: 411.2 on 1 and 60 DF, p-value: < 2.2e-16
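As a rough cross-check of how the numbers in this summary relate to each other, the sketch below recomputes the t value, the F-statistic and the adjusted R-squared from the printed estimates. The inputs are copied from the output above and are already rounded, so the results should agree only approximately; the variable names are illustrative.

```r
# Values copied from the printed summary above (already rounded)
estimate  <- 0.90291   # slope for BodyWeight
std.error <- 0.04453   # its standard error
r.squared <- 0.8727    # multiple R-squared
n <- 62                # 60 residual degrees of freedom + 2 estimated coefficients

# t value of the slope: estimate divided by its standard error
t.value <- estimate / std.error
# With a single predictor, the F-statistic equals the squared t value
f.value <- t.value^2
# Adjusted R-squared for a model with one predictor
adj.r.squared <- 1 - (1 - r.squared) * (n - 1) / (n - 2)
# Two-sided p-value of the slope from the t distribution with n - 2 df
p.value <- 2 * pt(-abs(t.value), df = n - 2)

print(c(t = t.value, F = f.value, adj.R2 = adj.r.squared, p = p.value))
```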
Scatterplot of olddata
Residual plot for olddata
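In both plots the original data look unsuitable for this linear model. The residuals range from about -1552 to 1553 while the middle half lie between -8 and 55, so a few very large observations dominate, and the assumption of normal errors that do not depend on x is doubtful.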
Scatterplot of Newdata
Residual plot for newdata
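After the log transformation the residuals appear randomly scattered in the residual plot above, with no obvious pattern, so the assumption of normal errors that do not depend on x looks much more reasonable.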