Question

This question requires RStudio. Use the following commands to install the package and load the data into R:

> install.packages("ISLR")
> library(ISLR)
> data(Wage)

With the data installed and loaded, here is a description of the dataset:

This dataset contains economic and demographic data for 3000 individuals living in the mid-Atlantic region. For each of the
3000 individuals, the following 11 variables are recorded:

year: Year that wage information was recorded
age: Age of worker
maritl: A factor with levels 1. Never Married 2. Married 3. Widowed 4. Divorced and 5.
Separated indicating marital status
race: A factor with levels 1. White 2. Black 3. Asian and 4. Other indicating race
education: A factor with levels 1. < HS Grad 2. HS Grad 3. Some College 4. College Grad
and 5. Advanced Degree indicating education level
region: Region of the country (mid-atlantic only)
jobclass: A factor with levels 1. Industrial and 2. Information indicating type of job
health: A factor with levels 1. <=Good and 2. >=Very Good indicating health level of worker
health_ins: A factor with levels 1. Yes and 2. No indicating whether worker has health insurance
logwage: Log of workers wage
wage: Workers raw wage

This question continues with the Wage dataset.
(a) Fit a multiple regression model to predict wage using year, age, and jobclass.
(b) What is the predicted wage for a 45-year-old working in the industrial sector in the year
2009? What are the associated 95% confidence and prediction intervals?
(c) Create a binary variable, wage150, that contains a 1 if wage is above
150, and a 0 if wage is below 150.

Please provide all necessary R code. No screenshots are needed.

Solutions

Expert Solution

(a)

Run the command below in RStudio to fit the linear regression:

model = lm(wage ~ year + age + jobclass, data = Wage)

The model summary is:

> summary(model)

Call:
lm(formula = wage ~ year + age + jobclass, data = Wage)

Residuals:
     Min       1Q   Median       3Q      Max
-103.646  -24.525   -6.118   16.406  200.662

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)            -2.400e+03  7.252e+02  -3.309 0.000946 ***
year                    1.235e+00  3.616e-01   3.415 0.000646 ***
age                     6.362e-01  6.373e-02   9.982  < 2e-16 ***
jobclass2. Information  1.597e+01  1.471e+00  10.859  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 40.09 on 2996 degrees of freedom
Multiple R-squared: 0.07794,   Adjusted R-squared: 0.07702
F-statistic: 84.41 on 3 and 2996 DF, p-value: < 2.2e-16
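
As a rough sketch of how the coefficient table can be used, the snippet below pulls the estimates out of the fitted model and adds 95% confidence intervals for them. It refits the same model so that it runs on its own. coef() and confint() are standard R functions; the object names est and ci are only illustrative and are not part of the original solution.

```r
library(ISLR)   # assumes the ISLR package is already installed
data(Wage)

# Same model as in part (a)
model <- lm(wage ~ year + age + jobclass, data = Wage)

est <- coef(model)      # named vector of the four estimates
ci  <- confint(model)   # 95% confidence intervals by default

# Estimated difference in mean wage, Information vs. Industrial,
# holding year and age fixed (the "jobclass2. Information" row)
est["jobclass2. Information"]
ci["jobclass2. Information", ]
```

Because jobclass is a factor, R treats its first level, 1. Industrial, as the baseline, so the jobclass2. Information coefficient (about 15.97) is the estimated wage difference between an Information job and an Industrial job for the same year and age.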

(b)

The predicted wage for a 45-year-old working in the industrial sector in the year 2009 is obtained with the commands below.

> newdata = data.frame(year = 2009, age = 45, jobclass = "1. Industrial")
> predict(model, newdata, interval = "confidence")
       fit     lwr      upr
1 109.5603 106.517 112.6036
> predict(model, newdata, interval = "prediction")
       fit      lwr     upr
1 109.5603 30.89567 188.225

The predicted wage for a 45-year-old working in the industrial sector in the year 2009 is 109.5603.

The 95% confidence interval is (106.517, 112.6036).

The 95% prediction interval is (30.89567, 188.225).
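
The two intervals share the same center but answer different questions: the confidence interval is for the mean wage of all such workers, while the prediction interval is for a single worker's wage and so also includes the residual variation. The sketch below is an illustration, not part of the original solution. It rebuilds both intervals from the pieces returned by predict() with se.fit = TRUE. The names p, tcrit, ci_manual, and pi_manual are illustrative.

```r
library(ISLR)   # assumes the ISLR package is already installed
data(Wage)

model   <- lm(wage ~ year + age + jobclass, data = Wage)
newdata <- data.frame(year = 2009, age = 45, jobclass = "1. Industrial")

# se.fit = TRUE also returns the standard error of the fitted mean,
# the residual degrees of freedom, and the residual standard error
p     <- predict(model, newdata, se.fit = TRUE)
tcrit <- qt(0.975, df = p$df)   # two-sided 95% critical value

# Confidence interval: uncertainty in the estimated mean only
ci_manual <- p$fit + c(-1, 1) * tcrit * p$se.fit

# Prediction interval: adds the residual variance for one new worker
pi_manual <- p$fit + c(-1, 1) * tcrit * sqrt(p$se.fit^2 + p$residual.scale^2)

ci_manual
pi_manual
```

The residual standard error (40.09) is much larger than the standard error of the fitted mean, which is why the prediction interval is so much wider than the confidence interval.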

(c)

The binary variable wage150, which contains a 1 if wage is above 150 and a 0 otherwise, can be created as shown below. The comparison Wage$wage > 150 returns TRUE/FALSE, so as.numeric() converts it to 1/0.

wage150 = as.numeric(Wage$wage > 150)
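
One detail the definition leaves open is a wage of exactly 150, which is neither above nor below the cutoff. As a small illustration with made-up values (the vector toy_wage is hypothetical, not taken from the Wage data), a strict > comparison codes such a value as 0:

```r
# Hypothetical wages chosen to sit below, at, and above the cutoff
toy_wage <- c(75.0, 150.0, 200.5)

# TRUE/FALSE from the comparison, converted to 1/0
toy_wage150 <- as.numeric(toy_wage > 150)

toy_wage150         # 0 0 1
table(toy_wage150)  # counts of 0s and 1s
```

On the real data, table(wage150) gives the number of workers on each side of the cutoff in the same way.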

