Question



Census data was collected on the 50 states and Washington, D.C. We are interested in determining whether average lifespan (LIFE) is related to the ratio of males to females in percent (MALE), birth rate per 1,000 people (BIRTH), divorce rate per 1,000 people (DIVO), number of hospital beds per 100,000 people (BEDS), percentage of population 25 years or older having completed 16 years of school (EDUC) and per capita income (INCO).

"STATE" "MALE" "BIRTH" "DIVO" "BEDS" "EDUC" "INCO" "LIFE"
AK 119.1 24.8 5.6 603.3 14.1 4638 69.31
AL 93.3 19.4 4.4 840.9 7.8 2892 69.05
AR 94.1 18.5 4.8 569.6 6.7 2791 70.66
AZ 96.8 21.2 7.2 536.0 12.6 3614 70.55
CA 96.8 18.2 5.7 649.5 13.4 4423 71.71
CO 97.5 18.8 4.7 717.7 14.9 3838 72.06
CT 94.2 16.7 1.9 791.6 13.7 4871 72.48
DC 86.8 20.1 3.0 1859.4 17.8 4644 65.71
DE 95.2 19.2 3.2 926.8 13.1 4468 70.06
FL 93.2 16.9 5.5 668.2 10.3 3698 70.66
GA 94.6 21.1 4.1 705.4 9.2 3300 68.54
HW 108.1 21.3 3.4 794.3 14.0 4599 73.60
IA 94.6 17.1 2.5 773.9 9.1 3643 72.56
ID 99.7 20.3 5.1 541.5 10.0 3243 71.87
IL 94.2 18.5 3.3 871.0 10.3 4446 70.14
IN 95.1 19.1 2.9 736.1 8.3 3709 70.88
KS 96.2 17.0 3.9 854.6 11.4 3725 72.58
KY 96.3 18.7 3.3 661.9 7.2 3076 70.10
LA 94.7 20.4 1.4 724.0 9.0 3023 68.76
MA 91.6 16.6 1.9 1103.8 12.6 4276 71.83
MD 95.5 17.5 2.4 841.3 13.9 4267 70.22
ME 94.8 17.9 3.9 919.5 8.4 3250 70.93
MI 96.1 19.4 3.4 754.7 9.4 4041 70.63
MN 96.0 18.0 2.2 905.4 11.1 3819 72.96
MO 93.2 17.3 3.8 801.6 9.0 3654 70.69
MS 94.0 22.1 3.7 763.1 8.1 2547 68.09
MT 99.9 18.2 4.4 668.7 11.0 3395 70.56
NC 95.9 19.3 2.7 658.8 8.5 3200 69.21
ND 101.8 17.6 1.6 959.9 8.4 3077 72.79
NE 95.4 17.3 2.5 866.1 9.6 3657 72.60
NH 95.7 17.9 3.3 878.2 10.9 3720 71.23
NJ 93.7 16.8 1.5 713.1 11.8 4684 70.93
NM 97.2 21.7 4.3 560.9 12.7 3045 70.32
NV 102.8 19.6 18.7 560.7 10.8 4583 69.03
NY 91.5 17.4 1.4 1056.2 11.9 4605 70.55
OH 94.1 18.7 3.7 751.0 9.3 3949 70.82
OK 94.9 17.5 6.6 664.6 10.0 3341 71.42
OR 95.9 16.8 4.6 607.1 11.8 3677 72.13
PA 92.4 16.3 1.9 948.9 8.7 3879 70.43
RI 96.2 16.5 1.8 960.5 9.4 3878 71.90
SC 96.5 20.1 2.2 739.9 9.0 2951 67.96
SD 98.4 17.6 2.0 984.7 8.6 3108 72.08
TN 93.7 18.4 4.2 831.6 7.9 3079 70.11
TX 95.9 20.6 4.6 674.0 10.9 3507 70.90
UT 97.6 25.5 3.7 470.5 14.0 3169 72.90
VA 97.7 18.6 2.6 835.8 12.3 3677 70.08
VT 95.6 18.8 2.3 1026.1 11.5 3447 71.64
WA 98.7 17.8 5.2 556.4 12.7 3997 71.72
WI 96.3 17.6 2.0 814.7 9.8 3712 72.48
WV 93.9 17.8 3.2 950.4 6.8 3038 69.48
WY 100.7 19.6 5.4 925.9 11.8 3672 70.29

Suppose we are interested in fitting a regression model using LIFE as the response variable and some subset of the variables MALE, BIRTH, DIVO and INCO as predictors:

lm(LIFE ~ MALE + BIRTH + DIVO + INCO, data = DATA)

(i.1) Perform variable selection by finding the subset model that minimizes the AIC criterion. State the 'best model'.
(i.2) Perform variable selection using forward selection. State the 'best model'.

Please answer the above question using R commands. List all the R commands you used, include a screenshot of the data you are referring to, and state which model is best for this data. Thanks!

Solutions

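R code:

library(MASS)  # provides stepAIC()
data <- read.table("C:/Users/Computer/Desktop/data.txt", header = TRUE, sep = "")
data
model <- lm(LIFE ~ MALE + BIRTH + DIVO + INCO, data = data)  # full model
summary(model)
st1 <- stepAIC(model, direction = "both")       # stepwise AIC search from the full model
# Forward selection must start from the intercept-only model, with the full
# model as the upper scope; started from the full model there is nothing to add
null_model <- lm(LIFE ~ 1, data = data)
st2 <- stepAIC(null_model, scope = ~ MALE + BIRTH + DIVO + INCO, direction = "forward")
st3 <- stepAIC(model, direction = "backward")   # backward elimination
summary(st1)
summary(st2)
summary(st3)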
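stepAIC() performs a stepwise search, which is not guaranteed to visit every subset. Part (i.1) asks for the subset that minimizes AIC, and with only four candidate predictors there are just 16 models, so an exhaustive search is cheap. The sketch below uses a hypothetical helper, `best_subset_aic()` (not part of any package), that fits every subset and scores it with extractAIC(), the same scale that stepAIC() prints in its trace.

```r
# Exhaustive ("all subsets") AIC search over a set of candidate predictors.
# best_subset_aic() is a hypothetical helper written for this example.
best_subset_aic <- function(data, response, predictors) {
  # Score on the same scale that step()/stepAIC() print: n*log(RSS/n) + 2p
  score <- function(f) extractAIC(lm(f, data = data))[2]

  # Candidate formulas: intercept-only plus every non-empty subset
  formulas <- list(reformulate("1", response = response))
  for (k in seq_along(predictors)) {
    for (vars in combn(predictors, k, simplify = FALSE)) {
      formulas[[length(formulas) + 1]] <- reformulate(vars, response = response)
    }
  }

  results <- data.frame(
    formula = vapply(formulas, function(f) paste(deparse(f), collapse = ""), character(1)),
    AIC     = vapply(formulas, score, numeric(1)),
    stringsAsFactors = FALSE
  )
  # Smallest AIC first
  results <- results[order(results$AIC), ]
  rownames(results) <- NULL
  results
}
```

With the census data loaded as `data`, `best_subset_aic(data, "LIFE", c("MALE", "BIRTH", "DIVO", "INCO"))` should return all 16 candidate models sorted by AIC; the first row is the AIC-minimizing subset.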

A) (i.1)

AIC (full model) = 33.41

AIC (reduced model) = 31.44

Therefore, LIFE ~ MALE + BIRTH + DIVO (dropping INCO) is the best model, since it has the smaller AIC.
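The answer does not say which function produced 33.41 and 31.44; presumably they are read from the stepAIC() trace, which reports extractAIC(), i.e. n*log(RSS/n) + 2p for a linear model with p coefficients. AIC() instead uses the full Gaussian log-likelihood, so for a given data set its values are larger by the constant n*(log(2*pi) + 1) + 2, and the two scales rank models identically. The check below uses R's built-in mtcars data purely as a stand-in, since the census file is not bundled with R.

```r
# Compare the two AIC scales R reports for linear models (mtcars as stand-in data)
fit_full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_reduced <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit_reduced)^2)
p   <- length(coef(fit_reduced))        # coefficients, including the intercept

# extractAIC() (printed by step()/stepAIC()): n*log(RSS/n) + 2p
manual <- n * log(rss / n) + 2 * p
c(manual = manual, extractAIC = extractAIC(fit_reduced)[2])

# AIC() is shifted by a constant that depends only on n
AIC(fit_reduced) - extractAIC(fit_reduced)[2]
n * (log(2 * pi) + 1) + 2

# So differences between models agree on both scales
extractAIC(fit_full)[2] - extractAIC(fit_reduced)[2]
AIC(fit_full) - AIC(fit_reduced)
```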

B) (i.2)

Judging by the adjusted R² values as well, the reduced model is the best model.
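Adjusted R² penalizes extra predictors: adj R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where p counts the predictors excluding the intercept. The sketch below checks that formula against the adj.r.squared reported by summary() and compares a larger model with a smaller one. It uses the built-in mtcars data as a stand-in dataset, and the helper name `adj_r2()` is hypothetical.

```r
# Adjusted R^2 computed by hand; p counts predictors, not the intercept
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)
  p  <- length(coef(fit)) - 1
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

fit_full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_reduced <- lm(mpg ~ wt + hp, data = mtcars)

# The hand-computed value matches summary()'s adj.r.squared
c(manual = adj_r2(fit_reduced), summary = summary(fit_reduced)$adj.r.squared)

# The model with the higher adjusted R^2 is preferred; plain R^2 never
# decreases when terms are added to a nested OLS model, so it cannot do this job
c(full = adj_r2(fit_full), reduced = adj_r2(fit_reduced))
```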

