BrainWeight	BodyWeight
3.385	44.5
0.48	15.5
1.35	8.1
465	423
36.33	119.5
27.66	115
14.83	98.2
1.04	5.5
4.19	58
0.425	6.4
0.101	4
0.92	5.7
1	6.6
0.005	0.14
0.06	1
3.5	10.8
2	12.3
1.7	6.3
2547	4603
0.023	0.3
187.1	419
521	655
0.785	3.5
10	115
3.3	25.6
0.2	5
1.41	17.5
529	680
207	406
85	325
0.75	12.3
62	1320
6654	5712
3.5	3.9
6.8	179
35	56
4.05	17
0.12	1
0.023	0.4
0.01	0.25
1.4	12.5
250	490
2.5	12.1
55.5	175
100	157
52.16	440
10.55	179.5
0.55	2.4
60	81
3.6	21
4.288	39.2
0.28	1.9
0.075	1.2
0.122	3
0.048	0.33
192	180
3	25
160	169
0.9	2.6
1.62	11.4
0.104	2.5
4.235	50.4

a. Read the data into R and draw a scatter plot. You will see that the original scale is not well suited for display, so apply a log transformation to both variables. The log() function accepts the old data.frame as its argument; assign the output a name to get a new data.frame of the transformed data, for example:
> new.data <- log(old.data)
Draw a scatter plot of the new data. Does it look much better?
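
The transformation step in part a can be sketched as follows. This is only an illustration, assuming the first eight rows of the table above are typed in by hand; the name old.data follows the question, and the subset is not the full data set.

```r
# Illustrative subset: the first eight rows of the table above (assumption:
# typed in by hand rather than read from a file).
old.data <- data.frame(
  BrainWeight = c(3.385, 0.48, 1.35, 465, 36.33, 27.66, 14.83, 1.04),
  BodyWeight  = c(44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5)
)

# log() applied to an all-numeric data frame takes the natural log of every
# column and returns a data frame with the same column names.
new.data <- log(old.data)

# Side-by-side scatter plots: original scale vs. log-log scale.
par(mfrow = c(1, 2))
plot(old.data$BodyWeight, old.data$BrainWeight,
     xlab = "BodyWeight", ylab = "BrainWeight", main = "Original scale")
plot(new.data$BodyWeight, new.data$BrainWeight,
     xlab = "log(BodyWeight)", ylab = "log(BrainWeight)", main = "Log scale")
```

On the original scale a few very large values squeeze the remaining points into one corner of the plot; on the log scale the points spread out more evenly.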

c. Fit a linear model to the original data. Plot the residuals against the predictor using something similar to
> plot(old.data$BodyWeight, lm.fit$res)
What do you think about the assumption that the error term does not depend on x?

d. Fit a linear model to the log-transformed data. Plot the residuals against the predictor. What do you see now?

Can you please show all work?

Expert Solution

The R code is presented below.

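# Set the working directory to the folder that contains olddata.csv (machine-specific path)
setwd("C:/Users/hp/Desktop")
olddata<-read.csv("olddata.csv",header=TRUE)
names(olddata)
#---------- olddata-----------
# Response: BrainWeight; predictor: BodyWeight
lm.fit1<-lm(olddata$BrainWeight~olddata$BodyWeight)
summary(lm.fit1)
# Predictor on the x-axis so that the line drawn by abline() matches the plot
plot(olddata$BodyWeight,olddata$BrainWeight,col="red")
abline(lm.fit1,col="blue")
# Residuals against the predictor (part c)
plot(olddata$BodyWeight, resid(lm.fit1))
#---------- newdata-----------
# Natural log of both columns
newdata<-log(olddata)
lm.fit2<-lm(newdata$BrainWeight~newdata$BodyWeight)
summary(lm.fit2)
plot(newdata$BodyWeight,newdata$BrainWeight,col="red")
abline(lm.fit2,col="blue")
# Residuals against the predictor (part d)
plot(newdata$BodyWeight, resid(lm.fit2))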
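
For readers without the local CSV file, the residual comparison asked for in parts c and d can be sketched with a hand-typed subset. This is a minimal illustration, not the fit reported in this solution: it assumes only the first ten rows of the table, so its estimates will differ from the summary output shown here. The names fit.old and fit.new are hypothetical, and the data argument of lm() is used instead of the $ notation.

```r
# Illustrative subset (first ten rows of the table); NOT the full data set,
# so the estimates differ from the summary output printed in this solution.
olddata <- data.frame(
  BrainWeight = c(3.385, 0.48, 1.35, 465, 36.33, 27.66, 14.83, 1.04, 4.19, 0.425),
  BodyWeight  = c(44.5, 15.5, 8.1, 423, 119.5, 115, 98.2, 5.5, 58, 6.4)
)
newdata <- log(olddata)

# Same model on both scales: BrainWeight as response, BodyWeight as predictor.
fit.old <- lm(BrainWeight ~ BodyWeight, data = olddata)
fit.new <- lm(BrainWeight ~ BodyWeight, data = newdata)

# Residuals against the predictor, with a reference line at zero.
par(mfrow = c(1, 2))
plot(olddata$BodyWeight, resid(fit.old),
     xlab = "BodyWeight", ylab = "Residual", main = "Original data")
abline(h = 0, lty = 2)
plot(newdata$BodyWeight, resid(fit.new),
     xlab = "log(BodyWeight)", ylab = "Residual", main = "Log-transformed data")
abline(h = 0, lty = 2)
```

If the error term does not depend on x, the points should scatter around the dashed line with roughly constant spread across the whole range of the predictor.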

summary(lm.fit1)

Call:
lm(formula = olddata$BrainWeight ~ olddata$BodyWeight)

Residuals:
     Min       1Q   Median       3Q      Max
-1552.25    -8.00    47.36    55.10  1553.42

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)        -56.85555   42.97805  -1.323    0.191
olddata$BodyWeight   0.90291    0.04453  20.278   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 323.5 on 60 degrees of freedom
Multiple R-squared: 0.8727, Adjusted R-squared: 0.8705
F-statistic: 411.2 on 1 and 60 DF, p-value: < 2.2e-16
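
As a quick consistency check on the output above, the printed statistics can be recomputed from one another. The sketch below uses only numbers copied from the summary; the variable names are hypothetical, and small discrepancies are expected because the printed values are rounded.

```r
# Values copied from the summary(lm.fit1) output above.
estimate  <- 0.90291   # slope for BodyWeight
std.error <- 0.04453   # its standard error
t.printed <- 20.278    # printed t value
f.printed <- 411.2     # printed F-statistic
df.resid  <- 60        # residual degrees of freedom

# The t value is the estimate divided by its standard error.
t.value <- estimate / std.error

# With a single predictor, the F-statistic equals the squared slope t value.
f.value <- t.printed^2

# Two-sided p-value for the slope from the t distribution.
p.value <- 2 * pt(-abs(t.printed), df = df.resid)

print(c(t = t.value, F = f.value, p = p.value))
```

The recomputed t and F values agree with the printed ones up to rounding, and the p-value falls below the 2e-16 threshold that R reports.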

Scatterplot of olddata