The Carseat data set contains sales of child car seats at 400 different stores. Your task is to develop a regression model to predict sales and to estimate that model using this data.
Description of variables:
Sales: Unit sales (in thousands) at each location
CompPrice: Price charged by competitor at each location
Income: Community income level (in thousands of dollars)
Advertising: Local advertising budget for company at each location (in thousands of dollars)
Population: Population size in region (in thousands)
Price: Price company charges for car seats at each site
ShelveLoc: A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
Age: Average age of the local population
Education: Education level at each location
Urban: A factor with levels No and Yes to indicate whether the store is in an urban or rural location
US: A factor with levels No and Yes to indicate whether the store is in the US or not
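The three factor variables above (ShelveLoc, Urban, US) cannot enter a regression as text, so each is usually replaced by 0/1 dummy variables, one per level except a baseline level. The following is a minimal sketch of that encoding in Python with NumPy. The helper name `dummy_code`, the choice of Bad as the baseline, and the sample rows are all assumptions made for illustration and are not part of the assignment or the data.

```python
import numpy as np

def dummy_code(values, levels, baseline):
    """Return (names, matrix): one 0/1 column per level except the baseline."""
    kept = [lvl for lvl in levels if lvl != baseline]
    values = np.asarray(values)
    # Each column is 1 where the observation equals that level, else 0.
    matrix = np.column_stack([(values == lvl).astype(int) for lvl in kept])
    return kept, matrix

# Hypothetical sample rows, not taken from the actual data set.
shelve_loc = ["Bad", "Good", "Medium", "Good"]

# Assumption: "Bad" is used as the baseline (reference) level.
names, dummies = dummy_code(
    shelve_loc, levels=["Bad", "Good", "Medium"], baseline="Bad"
)
print(names)    # ['Good', 'Medium']
print(dummies)  # rows: [0 0], [1 0], [0 1], [1 0]
```

A factor with k levels needs only k - 1 dummies, because the baseline is implied when all dummies are 0. Including all k columns together with an intercept would make the design matrix perfectly collinear.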
a. Which variables are categorical? For each categorical variable, define the corresponding dummy variables. [1pt]
b. Develop a correlation matrix between all variables. Copy your
correlation matrix here. [1pt]
c. Develop a regression model to predict sales and use all the data
to estimate your model. Write the estimated equation here. What
factors are significant predictors of sales? [1pt]
d. Use the first 300 rows of the data as the training set to estimate
your model and the remaining 100 rows as the test set to evaluate
its out-of-sample prediction performance. Use the root mean squared
error (RMSE) as the performance measure. What are the in-sample and
out-of-sample RMSE values? [1pt]
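The workflow in parts b through d can be sketched in Python with NumPy as shown here. This is only an illustration under stated assumptions: the data is synthetic (generated with made-up coefficients, not the real 400 stores), only three predictors are used, and the helper names `fit_ols`, `predict`, and `rmse` are hypothetical. RMSE is computed as the square root of the mean squared difference between actual and predicted sales.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between actual and predicted values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def fit_ols(X, y):
    """Least-squares coefficients; the first entry is the intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(X, beta):
    """Predicted values for rows of X given coefficients from fit_ols."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

# Synthetic stand-in for the data set (assumption: NOT the real data).
rng = np.random.default_rng(0)
n = 400
price = rng.uniform(50, 150, size=n)
advertising = rng.uniform(0, 30, size=n)
good_shelf = rng.integers(0, 2, size=n)  # a 0/1 dummy variable
X = np.column_stack([price, advertising, good_shelf])
# Made-up coefficients, used only to generate illustrative sales figures.
y = (12.0 - 0.05 * price + 0.1 * advertising + 2.0 * good_shelf
     + rng.normal(0, 1, size=n))

# Part b: correlation matrix of sales and the predictors (columns = variables).
corr = np.corrcoef(np.column_stack([y, X]), rowvar=False)

# Part d: first 300 rows for training, remaining 100 rows for testing.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

beta = fit_ols(X_train, y_train)
rmse_in = rmse(y_train, predict(X_train, beta))
rmse_out = rmse(y_test, predict(X_test, beta))
print("coefficients:", beta)
print("in-sample RMSE:", rmse_in)
print("out-of-sample RMSE:", rmse_out)
```

The split is by row position rather than random, as part d specifies. Significance tests for part c need standard errors and p-values, which this sketch does not compute; a statistics package would normally report them.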