Question

BrainWeight BodyWeight
3.385 44.5
0.48 15.5
1.35 8.1
465 423
36.33 119.5
27.66 115
14.83 98.2
1.04 5.5
4.19 58
0.425 6.4
0.101 4
0.92 5.7
1 6.6
0.005 0.14
0.06 1
3.5 10.8
2 12.3
1.7 6.3
2547 4603
0.023 0.3
187.1 419
521 655
0.785 3.5
10 115
3.3 25.6
0.2 5
1.41 17.5
529 680
207 406
85 325
0.75 12.3
62 1320
6654 5712
3.5 3.9
6.8 179
35 56
4.05 17
0.12 1
0.023 0.4
0.01 0.25
1.4 12.5
250 490
2.5 12.1
55.5 175
100 157
52.16 440
10.55 179.5
0.55 2.4
60 81
3.6 21
4.288 39.2
0.28 1.9
0.075 1.2
0.122 3
0.048 0.33
192 180
3 25
160 169
0.9 2.6
1.62 11.4
0.104 2.5
4.235 50.4

a. Input the data into R and draw a scatter plot. You will see that the current scale is not the best for display, so apply a log transformation to both variables. The log() function does this: put the old data.frame in the parentheses and assign the output a name, which gives you a new data.frame of the transformed data, something like the following:
> new.data <- log(old.data)
Draw a scatter plot of the new data. Does it look much better?
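
As a minimal sketch of part a, the data can be typed straight into R with read.table() rather than read from a file. To keep the example short it uses only the first eight rows of the table above, and the names old.data and new.data simply follow the question's own example; paste in all 62 rows to work on the full data set.

```r
# First eight rows of the table above, used only to keep the example short;
# paste all 62 rows into the string to reproduce the full analysis.
old.data <- read.table(header = TRUE, text = "
BrainWeight BodyWeight
3.385 44.5
0.48 15.5
1.35 8.1
465 423
36.33 119.5
27.66 115
14.83 98.2
1.04 5.5
")

# Scatter plot on the original scale, with the predictor on the x-axis.
plot(old.data$BodyWeight, old.data$BrainWeight,
     xlab = "BodyWeight", ylab = "BrainWeight")

# log() applied to a numeric data.frame transforms every column
# and returns a data.frame with the same column names.
new.data <- log(old.data)

# Scatter plot on the log-log scale.
plot(new.data$BodyWeight, new.data$BrainWeight,
     xlab = "log(BodyWeight)", ylab = "log(BrainWeight)")
```

Note that log() only works here because every value in the table is positive; a zero or negative entry would give -Inf or NaN.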

c. Fit a linear model to the original data. Plot the residuals against the predictor using something similar to
> plot(old.data$BodyWeight, lm.fit$res)
What do you think about the assumption that the error term does not depend on x?

d. Fit a linear model to the log-transformed data. Plot the residuals against the predictor. What do you see now?

Can you please show all work?

Solutions

Expert Solution

The R code is given below.

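# Set the working directory to the folder that contains olddata.csv.
setwd("C:/Users/hp/Desktop")
olddata <- read.csv("olddata.csv", header = TRUE)
names(olddata)
#---------- olddata-----------
lm.fit1 <- lm(olddata$BrainWeight ~ olddata$BodyWeight)
summary(lm.fit1)
# Predictor (BodyWeight) goes on the x-axis so that abline() draws the fitted line correctly.
plot(olddata$BodyWeight, olddata$BrainWeight, col = "red")
abline(lm.fit1, col = "blue")
# Residuals against the predictor, with a reference line at zero.
plot(olddata$BodyWeight, resid(lm.fit1))
abline(h = 0, lty = 2)
#---------- newdata-----------
# Log-transform both variables.
newdata <- log(olddata)
lm.fit2 <- lm(newdata$BrainWeight ~ newdata$BodyWeight)
summary(lm.fit2)
plot(newdata$BodyWeight, newdata$BrainWeight, col = "red")
abline(lm.fit2, col = "blue")
# Residuals against the log-transformed predictor.
plot(newdata$BodyWeight, resid(lm.fit2))
abline(h = 0, lty = 2)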

summary(lm.fit1)

Call:
lm(formula = olddata$BrainWeight ~ olddata$BodyWeight)

Residuals:
     Min       1Q   Median       3Q      Max
-1552.25    -8.00    47.36    55.10  1553.42

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -56.85555   42.97805  -1.323    0.191
olddata$BodyWeight   0.90291    0.04453  20.278   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 323.5 on 60 degrees of freedom
Multiple R-squared: 0.8727,   Adjusted R-squared: 0.8705
F-statistic: 411.2 on 1 and 60 DF, p-value: < 2.2e-16
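
To make the fitted line concrete, the coefficients reported above can be plugged in by hand. The sketch below copies the two estimates from the summary output; predict_brain is a hypothetical helper name introduced only for this illustration.

```r
# Coefficients copied from the summary output above.
b0 <- -56.85555   # intercept
b1 <- 0.90291     # slope for BodyWeight

# Hypothetical helper: fitted BrainWeight for a given BodyWeight,
# using the straight line b0 + b1 * BodyWeight.
predict_brain <- function(body) b0 + b1 * body

# Fitted values at a few predictor values.
predict_brain(c(1, 100, 1000))
```

Because the intercept is negative, the line gives a negative fitted BrainWeight for small BodyWeight values, which is one more hint that a straight line on the original scale describes the data poorly.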

Scatterplot of olddata

Residual plot for olddata

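In both plots for the original data, the points crowd together near the origin while a few very large observations dominate the fit, and the spread of the residuals changes with BodyWeight. The assumption that the error term does not depend on x therefore does not appear to hold.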

Scatterplot of Newdata

Residual plot for newdata

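After the log transformation, the residuals are scattered randomly about zero in the residual plot above, with no obvious pattern against the predictor, so the assumption that the error term does not depend on x now looks reasonable.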
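
As a rough numerical companion to the visual check, one can compare the spread of the residuals for small and large predictor values. The sketch below is an illustration rather than a formal test: it uses only twelve rows taken from the question's table, and spread_by_half is a hypothetical helper written for this example.

```r
# Twelve rows taken from the question's table (illustration only).
old.data <- read.table(header = TRUE, text = "
BrainWeight BodyWeight
3.385 44.5
0.48 15.5
1.35 8.1
465 423
36.33 119.5
27.66 115
14.83 98.2
1.04 5.5
4.19 58
0.425 6.4
0.101 4
0.92 5.7
")
new.data <- log(old.data)

# Hypothetical helper: standard deviation of the residuals for observations
# at or below the median of x, and for observations above it.
spread_by_half <- function(fit, x) {
  r <- resid(fit)
  lower <- x <= median(x)
  c(lower = sd(r[lower]), upper = sd(r[!lower]))
}

fit.raw <- lm(BrainWeight ~ BodyWeight, data = old.data)
fit.log <- lm(BrainWeight ~ BodyWeight, data = new.data)

# Similar values in the two halves are consistent with constant error variance;
# very different values suggest the error depends on x.
spread_by_half(fit.raw, old.data$BodyWeight)
spread_by_half(fit.log, new.data$BodyWeight)
```

The median split is an arbitrary choice made for simplicity; the residual plots remain the main evidence.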

