Question

Using R Studio.

The dataset weightloss.txt presents data for the weight loss of a compound for different amounts of time the compound was exposed to the air. Additional information was also available on the humidity of the environment during exposure. The relative humidity has been coded as A = 20%, B = 30% and C = 40% humidity, and the dummy variables x2 and x3 have been formed to code humidity accordingly.

a) Determine an overall simple LSRL model for predicting weight based on time and humidity (variables x2 and x3). Is this model a “good” model? Explain.

b) Create a scatterplot (change colors) to illustrate that there appears to exist an interaction effect between time and level of humidity.

c) Create a regression model that includes the appropriate interaction terms. Does this model appear to be a “better” model than part (a)? Explain.

d) Using the regression coefficients found in part (c), find the “best” fitting linear model for each of the three humidity levels.

e) Plot each of the three lines found in part (d) onto the original scatterplot along with the line from part (a).

weightloss.txt

> weight<-c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
> time<-c(4,5,6,7,4,5,6,7,4,5,6,7)
> x2<-c(1,1,1,1,0,0,0,0,0,0,0,0)
> x3<-c(0,0,0,0,1,1,1,1,0,0,0,0)

> humidity<-c('A','A','A','A','B','B','B','B','C','C','C','C')

Solutions

Expert Solution

Please see the complete R code below.

weight<-c(7.3,6.5,5.1,4,4,5.2,6.6,6.6,2,4,5.7,6.5)
time<-c(4,5,6,7,4,5,6,7,4,5,6,7)
x2<-c(1,1,1,1,0,0,0,0,0,0,0,0)
x3<-c(0,0,0,0,1,1,1,1,0,0,0,0)

humidity<-c('A','A','A','A','B','B','B','B','C','C','C','C')

df <- data.frame(weight,time,x2,x3,humidity)

### fit the model


fit <- lm(weight ~ time + x2 +x3)

summary(fit)


library(ggplot2)
# Basic scatter plot: weight (response) against exposure time (predictor)
ggplot(df, aes(x=time, y=weight)) + geom_point()
# Change the point size and shape, and fill the points by humidity level
ggplot(df, aes(x=time, y=weight, fill=humidity)) +
geom_point(size=2, shape=23)

## separate regression line for each humidity level

ggplot(df, aes(x=time, y=weight, colour=humidity, shape=humidity)) +
geom_point() +
geom_smooth(method=lm, aes(fill=humidity))


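library(dplyr)
library(broom)
# Fit weight ~ time separately within each humidity level.
# Recent broom releases no longer accept the output of do() in tidy()/glance(),
# so the per-group fits are summarised with group_modify() instead.
dfcoeff = df %>% group_by(humidity) %>% group_modify(~ tidy(lm(weight ~ time, data = .x)))
dfcoeff

dfstats = df %>% group_by(humidity) %>% group_modify(~ glance(lm(weight ~ time, data = .x)))
dfstats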
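Part (e) asks for the fitted lines on the original scatterplot. The following is a minimal sketch, not the original solution: it assumes that "the line from part (a)" means the fitted values of the additive model (which form three parallel lines, one per humidity level), and the column name fit_a is a hypothetical helper introduced here.

```r
library(ggplot2)

# Data from the question, rebuilt so the block is self-contained
df <- data.frame(
  weight   = c(7.3, 6.5, 5.1, 4, 4, 5.2, 6.6, 6.6, 2, 4, 5.7, 6.5),
  time     = rep(4:7, times = 3),
  x2       = rep(c(1, 0, 0), each = 4),
  x3       = rep(c(0, 1, 0), each = 4),
  humidity = rep(c("A", "B", "C"), each = 4)
)

# Part (a): additive model, one common time slope for all humidity levels
fit_a <- lm(weight ~ time + x2 + x3, data = df)
df$fit_a <- fitted(fit_a)  # hypothetical column holding the part (a) fitted values

p <- ggplot(df, aes(x = time, y = weight, colour = humidity)) +
  geom_point(size = 2) +
  # solid lines: separate least-squares line for each humidity level
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # dashed lines: fitted values of the part (a) model (parallel by construction)
  geom_line(aes(y = fit_a), linetype = "dashed")
print(p)
```

Because the colour aesthetic groups the data by humidity, geom_smooth() fits one least-squares line per level, which is the same set of lines a full time-by-humidity interaction model produces.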

The results are

summary(fit)

Call:
lm(formula = weight ~ time + x2 + x3)

Residuals:
     Min       1Q   Median       3Q      Max
-2.38000 -0.86875  0.08167  0.94708  2.23000

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1483     2.3946   0.897    0.396
time          0.4367     0.4107   1.063    0.319
x2            1.1750     1.1246   1.045    0.327
x3            1.0500     1.1246   0.934    0.378

Residual standard error: 1.59 on 8 degrees of freedom
Multiple R-squared: 0.2343, Adjusted R-squared: -0.05286
F-statistic: 0.8159 on 3 and 8 DF, p-value: 0.5203

## R-squared is only 0.2343, the adjusted R-squared is negative, and the overall F-test is not significant (p-value 0.5203), hence the model from part (a) is not a good model

dfcoeff
# A tibble: 6 x 6
# Groups: humidity [3]
  humidity term        estimate std.error statistic p.value
  <fct>    <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 A        (Intercept)   11.9      0.445     26.8   0.00139
2 A        time          -1.13     0.0794   -14.2   0.00490
3 B        (Intercept)    0.54     1.24       0.436 0.706
4 B        time           0.920    0.221      4.16  0.0531
5 C        (Intercept)   -3.81     1.09      -3.49  0.0731
6 C        time           1.52     0.194      7.82  0.0160
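Parts (c) and (d) ask for a single model with interaction terms and for the line it implies at each humidity level. As a hedged sketch (the object names fit_int, b and lines are hypothetical, and this is one reasonable reading of "appropriate interaction terms"), a model with time:x2 and time:x3 interactions can be fitted and its coefficients combined as follows. Humidity C is the baseline because x2 = x3 = 0 there.

```r
# Data from the question, rebuilt so the block is self-contained
df <- data.frame(
  weight = c(7.3, 6.5, 5.1, 4, 4, 5.2, 6.6, 6.6, 2, 4, 5.7, 6.5),
  time   = rep(4:7, times = 3),
  x2     = rep(c(1, 0, 0), each = 4),  # 1 for humidity A
  x3     = rep(c(0, 1, 0), each = 4)   # 1 for humidity B
)

# Interaction model: each humidity level gets its own intercept and slope
fit_int <- lm(weight ~ time * x2 + time * x3, data = df)
b <- coef(fit_int)

# Combine the coefficients into one intercept and slope per humidity level
lines <- data.frame(
  humidity  = c("A", "B", "C"),
  intercept = unname(c(b["(Intercept)"] + b["x2"],
                       b["(Intercept)"] + b["x3"],
                       b["(Intercept)"])),
  slope     = unname(c(b["time"] + b["time:x2"],
                       b["time"] + b["time:x3"],
                       b["time"]))
)
print(lines)
```

The intercepts and slopes obtained this way should agree with the separate per-humidity regressions in the dfcoeff table, since a model with a full set of interactions fits each group independently.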

dfstats
# A tibble: 3 x 12
# Groups: humidity [3]
  humidity r.squared adj.r.squared sigma statistic p.value    df logLik   AIC   BIC deviance
  <fct>        <dbl>         <dbl> <dbl>     <dbl>   <dbl> <int>  <dbl> <dbl> <dbl>    <dbl>
1 A            0.990         0.985 0.177     203.  0.00490     2  2.63  0.748 -1.09    0.063
2 B            0.897         0.845 0.494      17.3 0.0531      2 -1.47  8.94   7.10    0.488
3 C            0.968         0.952 0.435      61.1 0.0160      2 -0.957 7.91   6.07    0.378

## Humidity A has the highest R-squared (0.990), so its line fits best; humidity B has the lowest (0.897)
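To judge whether the interaction model is "better" than the model from part (a), one option is to compare adjusted R-squared values and run a partial F-test on the two nested models. This is a sketch under that assumption, not the original solution; fit_a, fit_int and cmp are hypothetical names.

```r
# Data from the question, rebuilt so the block is self-contained
df <- data.frame(
  weight = c(7.3, 6.5, 5.1, 4, 4, 5.2, 6.6, 6.6, 2, 4, 5.7, 6.5),
  time   = rep(4:7, times = 3),
  x2     = rep(c(1, 0, 0), each = 4),
  x3     = rep(c(0, 1, 0), each = 4)
)

fit_a   <- lm(weight ~ time + x2 + x3, data = df)          # part (a): no interaction
fit_int <- lm(weight ~ time * x2 + time * x3, data = df)   # part (c): with interaction

# Goodness of fit of each model
summary(fit_a)$adj.r.squared
summary(fit_int)$adj.r.squared

# Partial F-test: do the two interaction terms jointly improve the fit?
cmp <- anova(fit_a, fit_int)
print(cmp)
```

On these 12 observations the interaction model's R-squared comes out at about 0.96 (adjusted about 0.94), against 0.2343 for part (a), and the partial F-test rejects the no-interaction model at the 5% level, which supports preferring the model with interaction terms.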

