The Carseat data set contains sales of child car seats at 400 different stores. Your task is to develop a regression model to predict sales and to use this data to estimate the model.
Description of variables:
Sales: Unit sales (in thousands) at each location
CompPrice: Price charged by competitor at each location
Income: Community income level (in thousands of dollars)
Advertising: Local advertising budget for company at each location (in thousands of dollars)
Population: Population size in region (in thousands)
Price: Price company charges for car seats at each site
ShelveLoc: A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
Age: Average age of the local population
Education: Education level at each location
Urban: A factor with levels No and Yes to indicate whether the store is in an urban or rural location
US: A factor with levels No and Yes to indicate whether the store is in the US or not
a. Which variables are categorical? For each categorical variable, define the corresponding dummy variables. [1pt]
Categorical variables
The categorical variables are ShelveLoc, Urban, and US.
Corresponding dummy variables
For a categorical variable with k different categories, we need to define k - 1 dummy variables. The category without its own dummy variable serves as the baseline (reference) level.
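The k - 1 rule can be sketched in a few lines of Python. This is only an illustration: the helper name dummy_encode and its arguments are hypothetical, not part of the assignment, and the choice of Good as the baseline is an assumption made for the example.

```python
def dummy_encode(value, levels, baseline):
    """Return k - 1 indicator (0/1) variables for one observation.

    value    : the observed category
    levels   : all k category labels
    baseline : the level left without its own indicator
    """
    if baseline not in levels:
        raise ValueError("baseline must be one of the levels")
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    # One indicator per non-baseline level; the baseline maps to all zeros.
    return {level: int(value == level) for level in levels if level != baseline}


shelve_levels = ["Bad", "Good", "Medium"]  # k = 3 levels
bad_row = dummy_encode("Bad", shelve_levels, baseline="Good")
good_row = dummy_encode("Good", shelve_levels, baseline="Good")
print(bad_row)
print(good_row)
```

With k = 3 levels the helper returns 2 indicators, and the baseline level is the one for which every indicator is 0.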
1)
ShelveLoc is a categorical variable with 3 categories (i.e. Bad, Medium and Good).
Hence the number of dummy variables = 2. Taking Good as the baseline, the dummy variables are:
X1 = 1, if Bad; X1 = 0, otherwise.
X2 = 1, if Medium; X2 = 0, otherwise.
A store with a Good shelving location has X1 = 0 and X2 = 0.
2)
Urban is a categorical variable with 2 categories (i.e. No and Yes).
Hence the number of dummy variables = 1. Taking No as the baseline, the dummy variable is:
Y = 1, if Yes; Y = 0, otherwise. (Here Y denotes the Urban dummy, not the response Sales.)
3)
US is a categorical variable with 2 categories (i.e. No and Yes).
Hence the number of dummy variables = 1. Taking No as the baseline, the dummy variable is:
Z = 1, if Yes; Z = 0, otherwise.
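As a quick check of the definitions of X1, X2, Y and Z above, the following Python sketch encodes a few stores. The function name encode_store and the three example stores are made up for illustration; they are not rows from the data set.

```python
def encode_store(shelve_loc, urban, us):
    """Map the three factors to the dummy variables X1, X2, Y and Z."""
    return {
        "X1": int(shelve_loc == "Bad"),     # 1 if ShelveLoc is Bad
        "X2": int(shelve_loc == "Medium"),  # 1 if ShelveLoc is Medium
        "Y": int(urban == "Yes"),           # 1 if the store is urban
        "Z": int(us == "Yes"),              # 1 if the store is in the US
    }


# Hypothetical stores: (ShelveLoc, Urban, US)
stores = [("Bad", "Yes", "Yes"), ("Good", "No", "Yes"), ("Medium", "Yes", "No")]
encoded = [encode_store(*store) for store in stores]

for store, row in zip(stores, encoded):
    print(store, row)
```

Each store gets four 0/1 columns in total (2 + 1 + 1), which would enter the regression alongside the numeric predictors.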