Question

This question requires using RStudio. Use the following commands to install the package and load the data into R:

> install.packages("ISLR")
> library(ISLR)
> data(Wage)

With the required data installed and loaded, here is a description of the data:

This dataset contains economic and demographic data for 3000 individuals living in the mid-Atlantic region. For each of the
3000 individuals, the following 11 variables are recorded:

year: Year that wage information was recorded
age: Age of worker
maritl: A factor with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated, indicating marital status
race: A factor with levels 1. White, 2. Black, 3. Asian, and 4. Other, indicating race
education: A factor with levels 1. < HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree, indicating education level
region: Region of the country (mid-atlantic only)
jobclass: A factor with levels 1. Industrial and 2. Information, indicating type of job
health: A factor with levels 1. <=Good and 2. >=Very Good, indicating health level of worker
health_ins: A factor with levels 1. Yes and 2. No, indicating whether worker has health insurance
logwage: Log of worker's wage
wage: Worker's raw wage

This question continues with the Wage dataset.

You wish to fit a multiple regression model to predict wage using year, age, and jobclass. However, you are interested in whether the change in wage as a worker ages differs between industrial workers and information workers. Fit the appropriate model and test the hypothesis of interest. Include your results and your conclusion.

Please provide all necessary code using RStudio.

Solutions

Expert Solution

library(ISLR)

data(Wage)

head(Wage)

       year age     sex           maritl     race       education             region       jobclass         health health_ins
231655 2006 18 1. Male 1. Never Married 1. White    1. < HS Grad 2. Middle Atlantic 1. Industrial      1. <=Good      2. No
86582 2004 24 1. Male 1. Never Married 1. White 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good      2. No
161300 2003 45 1. Male       2. Married 1. White 3. Some College 2. Middle Atlantic 1. Industrial      1. <=Good     1. Yes
155159 2003 43 1. Male       2. Married 3. Asian 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good     1. Yes
11443 2005 50 1. Male      4. Divorced 1. White      2. HS Grad 2. Middle Atlantic 2. Information      1. <=Good     1. Yes
376662 2008 54 1. Male       2. Married 1. White 4. College Grad 2. Middle Atlantic 2. Information 2. >=Very Good     1. Yes
        logwage      wage
231655 4.318063 75.04315
86582 4.255273 70.47602
161300 4.875061 130.98218
155159 5.041393 154.68529
11443 4.318063 75.04315
376662 4.845098 127.11574

regmodel <- lm(wage ~ year+age+jobclass, data = Wage)

summary(regmodel)

 
Call:
lm(formula = wage ~ year + age + jobclass, data = Wage)
 
Residuals:
     Min       1Q   Median       3Q      Max 
-103.646  -24.525   -6.118   16.406  200.662 
 
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)            -2.400e+03  7.252e+02  -3.309 0.000946 ***
year                    1.235e+00  3.616e-01   3.415 0.000646 ***
age                     6.362e-01  6.373e-02   9.982  < 2e-16 ***
jobclass2. Information  1.597e+01  1.471e+00  10.859  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
 
Residual standard error: 40.09 on 2996 degrees of freedom
Multiple R-squared:  0.07794, Adjusted R-squared:  0.07702 
F-statistic: 84.41 on 3 and 2996 DF,  p-value: < 2.2e-16
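
The overall F-statistic on the last line of the summary, and its p-value, can also be pulled out of the fitted object directly. The following is a minimal sketch that refits the same additive model so it runs on its own; the names `fs` and `p_overall` are illustrative and not part of the original solution.

```r
library(ISLR)
data(Wage)

# Same additive model as in the solution
regmodel <- lm(wage ~ year + age + jobclass, data = Wage)

# summary()$fstatistic holds the F value and its two degrees of freedom
fs <- summary(regmodel)$fstatistic
fs

# Upper-tail probability of the F distribution gives the overall p-value
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]],
                lower.tail = FALSE)
p_overall
```

`summary()` prints very small p-values as `< 2.2e-16`, so computing the tail probability with `pf()` is one way to see the underlying number.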

Regression Equation

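wage = -2399.90 + 1.23 * year + 0.64 * age + 15.97 * (jobclass = 2. Information)

Here (jobclass = 2. Information) is an indicator equal to 1 for information workers and 0 for industrial workers, so 15.97 is the estimated wage difference between the two job classes at the same year and age.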
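
To check how the equation maps onto the fitted object, the sketch below compares `predict()` with the same value computed by hand from the coefficients. The worker in `new_worker` (year 2006, age 40, information job) is a hypothetical example chosen only for illustration.

```r
library(ISLR)
data(Wage)

# Same additive model as in the solution
regmodel <- lm(wage ~ year + age + jobclass, data = Wage)

# Coefficients rounded to two decimals, as used in the written equation
round(coef(regmodel), 2)

# Hypothetical worker (illustrative values, not from the original solution)
new_worker <- data.frame(
  year = 2006,
  age = 40,
  jobclass = factor("2. Information", levels = levels(Wage$jobclass))
)

# Prediction from predict() ...
pred <- predict(regmodel, newdata = new_worker)

# ... and the same prediction built term by term from the equation
b <- coef(regmodel)
by_hand <- b[["(Intercept)"]] + b[["year"]] * 2006 + b[["age"]] * 40 +
  b[["jobclass2. Information"]] * 1

# TRUE when both routes agree up to floating-point tolerance
all.equal(unname(pred), by_hand)
```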

Null and Alternate Hypothesis

H0: All slope coefficients of the linear model (year, age, jobclass) are zero

Ha: At least one slope coefficient of the linear model is non-zero

From the F-statistic in the summary output (F = 84.41 on 3 and 2996 DF, p-value < 2.2e-16), the p-value is less than 0.05, so we reject the null hypothesis: the model is significant, i.e. at least one slope coefficient is non-zero.

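Also, the p-value for each predictor (year, age, and jobclass) is less than 0.05, so each one is significant given the others, i.e. there is a relationship between wage and each of these variables. Note, however, that this additive model assumes the same age slope for both job classes, so it does not by itself test whether the change in wage with age differs between industrial and information workers; that hypothesis requires an age-by-jobclass interaction term.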
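
The question asks whether the age effect differs by job class, which corresponds to H0: the age:jobclass interaction coefficient is zero (equal age slopes) against Ha: it is non-zero. The following is a minimal sketch of one way to fit and test that model; the names `fit_add` and `fit_int` are illustrative, and no output is shown because the original solution did not run this model.

```r
library(ISLR)
data(Wage)

# Additive model: one common age slope for both job classes
fit_add <- lm(wage ~ year + age + jobclass, data = Wage)

# Interaction model: age * jobclass expands to age + jobclass + age:jobclass,
# so information workers get their own age slope
fit_int <- lm(wage ~ year + age * jobclass, data = Wage)

# t-test of H0: the age:jobclass coefficient is zero
int_row <- coef(summary(fit_int))["age:jobclass2. Information", ]
int_row

# Equivalent partial F-test comparing the two nested models
cmp <- anova(fit_add, fit_int)
cmp
```

If the interaction p-value is below 0.05, the conclusion would be that the change in wage with age differs between industrial and information workers; otherwise there is no evidence of a difference in slopes at that level. Because only one term is added, the F-statistic from `anova()` equals the square of the interaction t-statistic.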

