Consider the following dataset:
obs  1       2       3      4      5    6    7      8    9
x1   Yellow  Yellow  Green  Green  Red  Red  Green  Red  Yellow
x2   5       2       10     4      3    7    2      5    4
y    1       3       7      5      10   18   4      8    3
(a) (3pts) Split the data into a train and test set where the test set contains observations 1 and 7. Answer this by writing down the two datasets.
(b) (2pts) Use RStudio to estimate a model (which uses both x1 and x2 as predictor variables) using the train set.
(c) (2pts) According to this model, what is the impact of an observation being Red (as opposed to Yellow) on our prediction of the response variable?
(d) (2pts) Use RStudio to estimate a model (which uses only x2 as the predictor variable) using the train set.
(e) (3pts) Compute the MAE on the test set for both estimated models. Note: You must show all calculations for this solution. That is, do not use RStudio except for arithmetic computations (like a calculator) and to obtain the estimated models from (b) and (d).
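One possible way to set up parts (a) through (d) in R is sketched below. The problem does not say which kind of model to estimate, so the sketch assumes ordinary linear regression via `lm`. It also assumes Yellow is made the reference level of x1, so that the `x1Red` coefficient reads directly as the Red versus Yellow shift asked about in (c). The names `dat`, `train`, `test`, `fit_full`, `fit_x2`, and `mae` are illustrative and do not come from the problem.

```r
# Data transcribed from the problem statement (observations 1 to 9).
# Assumption: Yellow is listed first so it becomes the reference level.
dat <- data.frame(
  x1 = factor(
    c("Yellow", "Yellow", "Green", "Green", "Red", "Red", "Green", "Red", "Yellow"),
    levels = c("Yellow", "Green", "Red")
  ),
  x2 = c(5, 2, 10, 4, 3, 7, 2, 5, 4),
  y  = c(1, 3, 7, 5, 10, 18, 4, 8, 3)
)

# Part (a): observations 1 and 7 form the test set, the rest the train set.
test_idx <- c(1, 7)
test  <- dat[test_idx, ]
train <- dat[-test_idx, ]

# Parts (b) and (d): assumed to be linear regressions fit on the train set.
fit_full <- lm(y ~ x1 + x2, data = train)
fit_x2   <- lm(y ~ x2, data = train)

# Part (c): with Yellow as reference, "x1Red" is the Red vs Yellow shift.
print(coef(fit_full))
print(coef(fit_x2))

# Optional check only: part (e) itself must be worked out by hand.
mae <- function(actual, predicted) mean(abs(actual - predicted))
print(mae(test$y, predict(fit_full, newdata = test)))
print(mae(test$y, predict(fit_x2, newdata = test)))
```

If x1 is left as a plain character column, R orders the levels alphabetically and uses Green as the reference. In that case the Red versus Yellow effect would be the difference between the `x1Red` and `x1Yellow` coefficients.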
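For part (e), the hand calculation can be organized around the usual definition of the mean absolute error. Here the test set has two observations (1 and 7, with observed responses y = 1 and y = 4), and the symbols \hat{y}_1 and \hat{y}_7 stand for whatever predictions the model under evaluation gives for them:

```latex
\mathrm{MAE}
  = \frac{1}{n} \sum_{i \in \text{test}} \lvert y_i - \hat{y}_i \rvert
  = \frac{1}{2} \left( \lvert 1 - \hat{y}_1 \rvert + \lvert 4 - \hat{y}_7 \rvert \right)
```

The same expression is evaluated twice, once with the predictions from the model in (b) and once with those from the model in (d).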