This question requires RStudio. The following commands install the ISLR package and load the data into R:
install.packages("ISLR")
library(ISLR)
data(Wage)
With the data installed and loaded, here is a description of the dataset:
This dataset contains economic and demographic data for 3000 individuals living in the mid-Atlantic region. For each of the 3000 individuals, the following 11 variables are recorded:
year: Year that wage information was recorded
age: Age of worker
maritl: A factor with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated, indicating marital status
race: A factor with levels 1. White, 2. Black, 3. Asian, and 4. Other, indicating race
education: A factor with levels 1. < HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree, indicating education level
region: Region of the country (mid-Atlantic only)
jobclass: A factor with levels 1. Industrial and 2. Information, indicating type of job
health: A factor with levels 1. <=Good and 2. >=Very Good, indicating health level of worker
health_ins: A factor with levels 1. Yes and 2. No, indicating whether worker has health insurance
logwage: Log of worker's wage
wage: Worker's raw wage (in thousands of dollars)
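A quick way to check the structure described above is to inspect the data frame directly. This is a minimal sketch using only base R functions on the Wage data from ISLR; the exact printed output depends on the installed ISLR version.

```r
library(ISLR)
data(Wage)

dim(Wage)               # number of rows (individuals) and columns (variables)
str(Wage)               # type of each variable and the levels of each factor
levels(Wage$education)  # the five education levels, in baseline-first order
summary(Wage$wage)      # distribution of the raw wage
```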
This question continues with the Wage dataset.
(a) Create a binary variable, wage150, that equals 1 if wage is above 150 and 0 if wage is below 150.
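One way to build the indicator for part (a) is a vectorized comparison with ifelse(). This is a sketch under one assumption: a wage of exactly 150 is coded as 0, since the question only specifies "above 150" and "below 150".

```r
library(ISLR)
data(Wage)

# 1 if wage is strictly above 150, otherwise 0.
# Assumption: a wage of exactly 150 falls in the 0 group.
Wage$wage150 <- ifelse(Wage$wage > 150, 1, 0)

# Count how many workers fall into each group
table(Wage$wage150)
```

Keeping wage150 numeric (0/1) rather than a factor works directly as the response in glm() with family = binomial.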
(b) Fit a logistic regression model on the training data, with wage150 as the response variable and year, age, and education as the predictor variables.
Please provide all the necessary R code (using RStudio).
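Part (b) can be sketched with glm() and family = binomial, which fits a logistic regression using the logit link by default. The training set is not defined in this part of the question, so the random 50% split below (the names `train` and `fit150`, and `set.seed(1)`) is a hypothetical stand-in; if an earlier part defines a training index, use that instead.

```r
library(ISLR)
data(Wage)

# Binary response from part (a); a wage of exactly 150 is coded as 0
Wage$wage150 <- ifelse(Wage$wage > 150, 1, 0)

# Hypothetical training split: a random half of the 3000 rows
set.seed(1)
train <- sample(nrow(Wage), nrow(Wage) / 2)

# Logistic regression of wage150 on year, age, and education,
# fitted only on the training rows via the subset argument
fit150 <- glm(wage150 ~ year + age + education,
              data = Wage, subset = train, family = binomial)

summary(fit150)   # coefficients, standard errors, z-values, p-values
coef(fit150)      # estimated coefficients on the log-odds scale
```

Because education is a factor, glm() expands it into indicator variables with the first level (1. < HS Grad) as the baseline, so the output shows an intercept, one coefficient each for year and age, and four education coefficients.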