5.
a. Analyze the Bread variable in the SandwichAnts dataset using aov() in R and
interpret your results.
The data may be found here:
install.packages("Lock5Data")
library(Lock5Data)
data(SandwichAnts,package="Lock5Data")
attach(SandwichAnts)
b. State the linear model for this problem. Define all notation and model terms.
c. Create the design matrix for this problem.
d. Estimate model parameters for this problem using $\hat{\beta} = (X^T X)^{-1} X^T y$.
e. Interpret the meaning of the estimates from part d.
f. Rerun this problem using lm() in R. Interpret the coefficients in the output.
g. Rewrite the model as a linear regression using dummy variables. Confirm that the
results from part f agree with the results from part g.
h. Perform a one-way ANOVA of Bread using a randomization test on the
SandwichAnts dataset.

Expert Solution

(a)

You did not mention the dependent variable. I assume it is the Ants variable in the SandwichAnts data. If it is not, please let me know in the comments.

[I am giving the R code for the first 4 parts.]

library(Lock5Data)
data(SandwichAnts,package="Lock5Data")
attach(SandwichAnts)
summary(SandwichAnts)
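Part (a) asks for aov(), which the code above does not call. A minimal sketch, assuming Ants is the response as supposed above; `one_way_anova` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: one-way ANOVA of a numeric response on one factor.
one_way_anova <- function(data, response = "Ants", group = "Bread") {
  data[[group]] <- factor(data[[group]])   # make sure the grouping variable is a factor
  fit <- aov(reformulate(group, response = response), data = data)
  summary(fit)[[1]]                        # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
}

# Run on SandwichAnts only if Lock5Data is installed.
if (requireNamespace("Lock5Data", quietly = TRUE)) {
  data(SandwichAnts, package = "Lock5Data")
  print(one_way_anova(SandwichAnts))
}
```

A small value in the Pr(>F) column would suggest that the mean number of ants differs between at least two bread types; a large value would give no evidence of a difference.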

# (b)
# Here Ants is a discrete variable, Order is a continuous variable, and the others are categorical variables.
# Let us build the linear model for this.
# y = b1 + x1 + x2 + e, where y: Ants, b1: constant term, x1: effect of Bread, x2: effect of Filling, e: error (independent Normal with mean zero and unknown variance)

#(c)
# The design matrix is X = [1, x11, x12, x13, x14, x21, x22, x23],
# where 1 is the vector of ones, x1 is split into 4 categories, and x2 is split into 3 categories.
# x11, x12, ..., x14 are the four dummy variables for the categories of x1, and x21, x22, x23 are the three dummy variables for the categories of x2.

x11=as.numeric(Bread=="Multigrain") ; x12=as.numeric(Bread=="Rye")
x13=as.numeric(Bread=="White") ; x14=as.numeric(Bread=="Wholemeal")

x21=as.numeric(Filling=="Ham & Pickles") ; x22=as.numeric(Filling=="Peanut Butter") ;x23=as.numeric(Filling=="Vegemite")

X=cbind(rep(1,length(Ants)),x11,x12,x13,x14,x21,x22,x23) # one row per observation (24 here)

#(d)
library(matlib)
b=Ginv(t(X)%*%X)%*%(t(X)%*%Ants) # estimated parameters b = (X'X)^- X'y; X'X is singular for this X, so a generalized inverse (Ginv) replaces the ordinary inverse
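The matrix X above has 8 columns, but the four Bread dummies (and likewise the three Filling dummies) add up to the column of ones, so X'X cannot be inverted in the ordinary sense and the individual estimates are not unique. A sketch of the alternative, assuming one reference level per factor is dropped so the formula from part (d) can be applied literally and compared with lm(); `normal_eq` is a hypothetical helper name:

```r
# Hypothetical helper: solve the normal equations (X'X) b = X'y for a full-rank X.
normal_eq <- function(X, y) {
  drop(solve(crossprod(X), crossprod(X, y)))
}

if (requireNamespace("Lock5Data", quietly = TRUE)) {
  data(SandwichAnts, package = "Lock5Data")
  # model.matrix() builds treatment-coded dummies: the first level of each
  # factor is the reference and gets no column of its own.
  Xf  <- model.matrix(~ Bread + Filling, data = SandwichAnts)
  b   <- normal_eq(Xf, SandwichAnts$Ants)
  fit <- lm(Ants ~ Bread + Filling, data = SandwichAnts)
  print(cbind(normal_equations = b, lm = coef(fit)))  # the two columns should agree
}
```

With this coding the intercept is the expected count for the reference bread and filling, and each remaining coefficient is a difference from that reference level.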

[If anything is unclear, let me know.]
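Part (h) is not covered above. A minimal sketch of a randomization test for the one-way ANOVA, assuming the F statistic is used as the test statistic and the Bread labels are shuffled; `randomization_anova`, the number of shuffles, and the seed are illustrative choices:

```r
# Hypothetical helper: randomization p-value for the one-way ANOVA F statistic.
randomization_anova <- function(y, group, n_perm = 2000, seed = 1) {
  f_stat <- function(g) summary(aov(y ~ factor(g)))[[1]][["F value"]][1]
  observed <- f_stat(group)
  set.seed(seed)                           # fixed seed only to make the sketch reproducible
  shuffled <- replicate(n_perm, f_stat(sample(group)))
  # Share of shuffles with an F at least as large as the observed one;
  # the +1 counts the observed labelling itself.
  list(F = observed,
       p_value = (sum(shuffled >= observed) + 1) / (n_perm + 1))
}

if (requireNamespace("Lock5Data", quietly = TRUE)) {
  data(SandwichAnts, package = "Lock5Data")
  print(randomization_anova(SandwichAnts$Ants, SandwichAnts$Bread))
}
```

The p-value here comes from reshuffling the labels rather than from the F distribution, so it does not rely on the normality assumption stated in part (b).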

milcah answered 11 months ago
