Using RStudio.
The dataset weightloss.txt presents data for the weight loss of a compound for different amounts of time the compound was exposed to the air. Additional information was also available on the humidity of the environment during exposure. The relative humidity has been coded as A = 20%, B = 30% and C = 40% humidity, and the dummy variables x2 and x3 have been formed to code humidity accordingly.
a) Determine an overall simple LSRL model for predicting weight based on time and humidity (variables x2 and x3). Is this model a “good” model? Explain.
b) Create a scatterplot (change colors) to illustrate that there appears to be an interaction effect between time and level of humidity.
c) Create a regression model that includes the appropriate interaction terms. Does this model appear to be a “better” model than the one in part (a)? Explain.
d) Using the regression coefficients found in part (c), find the “best” fitting linear model for each of the three humidity levels.
e) Plot each of the three lines found in part (d) onto the original scatterplot along with the line from part (a).
weightloss.txt
> weight<-c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
> time<-c(4,5,6,7,4,5,6,7,4,5,6,7)
> x2<-c(1,1,1,1,0,0,0,0,0,0,0,0)
> x3<-c(0,0,0,0,1,1,1,1,0,0,0,0)
> humidity<-c('A','A','A','A','B','B','B','B','C','C','C','C')
Please see the complete R code below.
weight<-c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
time<-c(4,5,6,7,4,5,6,7,4,5,6,7)
x2<-c(1,1,1,1,0,0,0,0,0,0,0,0)
x3<-c(0,0,0,0,1,1,1,1,0,0,0,0)
humidity<-c('A','A','A','A','B','B','B','B','C','C','C','C')
df <- data.frame(weight,time,x2,x3,humidity)
### (a) fit the additive model
fit <- lm(weight ~ time + x2 + x3)
summary(fit)
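library(ggplot2)
# Basic scatter plot: weight (response) against exposure time (predictor)
ggplot(df, aes(x=time, y=weight)) + geom_point()
# (b) Change the point size and shape, and fill the points by humidity level
ggplot(df, aes(x=time, y=weight, fill=humidity)) +
geom_point(size=2, shape=23)
## Separate regression line for each humidity level;
## non-parallel lines suggest a time-by-humidity interaction
ggplot(df, aes(x=time, y=weight, color=humidity, shape=humidity)) +
geom_point() +
geom_smooth(method=lm, aes(fill=humidity))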
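library(dplyr)
library(broom)
## One simple regression of weight on time per humidity level.
## Note: the do() + tidy(dfh, fithumidity) pattern relies on an older broom release;
## recent versions may need a different approach, for example dplyr::group_modify().
dfh = df %>% group_by(humidity) %>% do(fithumidity=lm(weight ~ time, data=.))
## coefficients of each per-humidity fit
dfcoeff = tidy(dfh, fithumidity)
dfcoeff
## fit statistics (R-squared, sigma, ...) of each per-humidity fit
dfstats = glance(dfh, fithumidity)
dfstats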
The results are:
summary(fit)
Call:
lm(formula = weight ~ time + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max
-2.38000 -0.86875  0.08167  0.94708  2.23000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1483     2.3946   0.897    0.396
time          0.4367     0.4107   1.063    0.319
x2            1.1750     1.1246   1.045    0.327
x3            1.0500     1.1246   0.934    0.378

Residual standard error: 1.59 on 8 degrees of freedom
Multiple R-squared: 0.2343, Adjusted R-squared: -0.05286
F-statistic: 0.8159 on 3 and 8 DF, p-value: 0.5203
## (a) R-squared is only 0.2343, the adjusted R-squared is negative, no coefficient is significant, and the overall F-test has p-value 0.5203, hence the additive model is not a good model.
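Part (c) asks for a single model with interaction terms, which the summary above does not cover. A minimal sketch of one way to fit it and compare it with the additive model from part (a) follows. The object names fit_add and fit_int are illustrative, not part of the original answer, and the data are re-entered so the block runs on its own.

```r
# Data from weightloss.txt, as listed in the question
weight <- c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
time   <- c(4,5,6,7,4,5,6,7,4,5,6,7)
x2     <- c(1,1,1,1,0,0,0,0,0,0,0,0)  # 1 = humidity A
x3     <- c(0,0,0,0,1,1,1,1,0,0,0,0)  # 1 = humidity B; humidity C is the baseline
df     <- data.frame(weight, time, x2, x3)

# Additive model from part (a): one common slope for time
fit_add <- lm(weight ~ time + x2 + x3, data = df)

# Interaction model: the slope for time is allowed to differ by humidity level
fit_int <- lm(weight ~ time + x2 + x3 + time:x2 + time:x3, data = df)
summary(fit_int)

# Partial F-test for the two interaction terms (nested model comparison)
anova(fit_add, fit_int)
```

If the interaction terms are significant in the partial F-test and the adjusted R-squared rises well above the value from part (a), that supports calling the interaction model the better one.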
dfcoeff
# A tibble: 6 x 6
# Groups: humidity [3]
  humidity term        estimate std.error statistic p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 A        (Intercept)   11.9      0.445     26.8   0.00139
2 A        time          -1.13     0.0794   -14.2   0.00490
3 B        (Intercept)    0.54     1.24       0.436 0.706
4 B        time           0.920    0.221      4.16  0.0531
5 C        (Intercept)   -3.81     1.09      -3.49  0.0731
6 C        time           1.52     0.194      7.82  0.0160
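Part (d) asks for the three lines to be derived from the coefficients of the interaction model rather than from separate fits. As a sketch, assuming humidity C is the baseline (x2 = 0, x3 = 0), the intercept and slope for A add the x2 and time:x2 coefficients, and those for B add the x3 and time:x3 coefficients. The names fit_int and lines_by_humidity are illustrative only.

```r
# Data from weightloss.txt, as listed in the question
weight <- c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
time   <- c(4,5,6,7,4,5,6,7,4,5,6,7)
x2     <- c(1,1,1,1,0,0,0,0,0,0,0,0)  # 1 = humidity A
x3     <- c(0,0,0,0,1,1,1,1,0,0,0,0)  # 1 = humidity B; humidity C is the baseline

fit_int <- lm(weight ~ time + x2 + x3 + time:x2 + time:x3)
b <- coef(fit_int)

# Combine the baseline coefficients with the dummy and interaction shifts
lines_by_humidity <- data.frame(
  humidity  = c("A", "B", "C"),
  intercept = unname(c(b["(Intercept)"] + b["x2"],
                       b["(Intercept)"] + b["x3"],
                       b["(Intercept)"])),
  slope     = unname(c(b["time"] + b["time:x2"],
                       b["time"] + b["time:x3"],
                       b["time"]))
)
lines_by_humidity
```

The resulting intercepts and slopes should agree with the per-humidity estimates in the dfcoeff table, since a model with a full set of interactions reproduces the separate regressions.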
dfstats
# A tibble: 3 x 12
# Groups: humidity [3]
  humidity r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>
1 A            0.990         0.985 0.177     203.  0.00490     2  2.63  0.748 -1.09    0.063
2 B            0.897         0.845 0.494      17.3 0.0531      2 -1.47  8.94   7.10    0.488
3 C            0.968         0.952 0.435      61.1 0.0160      2 -0.957 7.91   6.07    0.378
## Humidity A gives the best-fitting line, as its R-squared (0.990) is the highest; humidity B has the lowest R-squared (0.897).
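For part (e), the per-humidity lines can be drawn on the scatterplot together with the part (a) fit. The sketch below uses base R graphics so that it has no package dependencies; the same lines could instead be added to the ggplot2 scatterplot. It assumes that "the line from part (a)" means the additive model, which gives three parallel lines (one per humidity level), drawn dashed here. The names fit_add, fits, cols and offsets are illustrative.

```r
# Data from weightloss.txt, as listed in the question
weight   <- c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
time     <- c(4,5,6,7,4,5,6,7,4,5,6,7)
x2       <- c(1,1,1,1,0,0,0,0,0,0,0,0)
x3       <- c(0,0,0,0,1,1,1,1,0,0,0,0)
humidity <- c('A','A','A','A','B','B','B','B','C','C','C','C')
df <- data.frame(weight, time, x2, x3, humidity)

# Part (a) additive fit and one simple regression per humidity level
fit_add <- lm(weight ~ time + x2 + x3, data = df)
fits <- lapply(split(df, df$humidity), function(d) lm(weight ~ time, data = d))

cols <- c(A = "red", B = "darkgreen", C = "blue")
plot(df$time, df$weight, col = cols[df$humidity], pch = 19,
     xlab = "Time", ylab = "Weight")

b <- coef(fit_add)
offsets <- c(A = b[["x2"]], B = b[["x3"]], C = 0)  # C is the baseline level
for (h in names(cols)) {
  # Solid: separate line for this humidity level
  abline(fits[[h]], col = cols[[h]], lwd = 2)
  # Dashed: additive model from part (a), same slope for every level
  abline(a = b[["(Intercept)"]] + offsets[[h]], b = b[["time"]],
         col = cols[[h]], lty = 2)
}
legend("bottomright", legend = paste("Humidity", names(cols)),
       col = cols, lwd = 2, bty = "n")
```

The solid lines have clearly different slopes (negative for A, positive for B and C), while the dashed lines are forced to be parallel, which makes the interaction visible.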