Write an R Script that does the following for "sampleData1.dta"
a) Regress "earn," "adcc," and "tinc" on a constant.
b) Regress earnings on "higrade," "age," and "agesq." Display TSS, RSS, ESS and R Squared Value
c) Reestimate the regression from pt. b, this time omitting the constant term.
Before running any regression, you need to load the data into R. Run the following code first.
CODE:
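library(foreign) # provides read.dta() for Stata files
mydata=read.dta("sampleData1.dta")
Here, inside the quotes, provide the full path to the file, including the file name. For example, if it is in the Documents folder on the C drive, write "C:/Documents/sampleData1.dta". A .dta file is a Stata binary file rather than delimited text, so read.table() with a sep argument cannot parse it. Note that read.dta() reads files saved by Stata versions 5 to 12; for files saved by newer versions, read_dta() from the haven package is an alternative. To check that the data were read correctly, run the following code: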
View(mydata)
This opens a new window in R showing the data in a spreadsheet-style grid, much like an Excel sheet.
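Besides the viewer window, the import can be checked from the console. The sketch below is only an illustration: it writes a small made-up data frame (the name toy and its values are assumptions, not the contents of sampleData1.dta) to a temporary Stata file and reads it back, then inspects the result.

```r
library(foreign)  # normally bundled with R

# Made-up stand-in data, for illustration only
toy <- data.frame(earn = c(21000, 35500, 48250), age = c(25, 38, 51))

# Write it to a temporary Stata file, then read it back the same way
# one would read sampleData1.dta
tmp <- tempfile(fileext = ".dta")
write.dta(toy, tmp)
check <- read.dta(tmp)

names(check)  # column names available for use in lm()
str(check)    # storage type of each column
head(check)   # first few rows
```

Running names(mydata) on the real data is a quick way to confirm the exact spelling of the columns before writing the regression formulas.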
PART A
Here, you want to regress "earn", "adcc" and "tinc" on a constant term. I am assuming these are three response (dependent) variables with no explanatory variables, so each regression contains only the constant term, and the estimated intercept is simply the sample mean of that variable. In the R script, run the following code:
model1=lm(cbind(earn,adcc,tinc)~1,data=mydata)
model1 # prints the coefficients, here just the intercept for each variable
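For the standard error of each intercept and the residual standard error, run the following code. It does not print TSS, RSS or ESS directly; in a constant-only regression ESS is 0 and RSS equals TSS.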
summary(model1)
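To see what a regression on a constant actually returns, the following minimal sketch uses simulated data. The data frame demo and its values are made up for illustration and are not taken from sampleData1.dta. It shows that each intercept is the sample mean of its variable and that RSS equals TSS.

```r
# Simulated stand-in data; the values are made up for illustration only
set.seed(1)
demo <- data.frame(earn = rnorm(50, 30000, 8000),
                   adcc = rnorm(50, 200, 50),
                   tinc = rnorm(50, 35000, 9000))

fit0 <- lm(cbind(earn, adcc, tinc) ~ 1, data = demo)

# The single coefficient for each response is its sample mean
coef(fit0)
colMeans(demo)

# With no regressors the fitted value is the mean, so RSS = TSS and ESS = 0
rss <- colSums(residuals(fit0)^2)
tss <- colSums(scale(demo, scale = FALSE)^2)
ess <- tss - rss
rbind(TSS = tss, RSS = rss, ESS = ess)
```

Because nothing is explained beyond the mean, R squared is 0 for each of the three regressions.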
PART B
CODE:
model2=lm(earnings~higrade+age+agesq,data=mydata) # if the earnings column is named earn, as in Part A, use earn here
model2 # prints the regression coefficients
summary(model2)$r.squared # returns the R squared value
anova(model2) # Sum Sq column: one row per x variable (these add up to ESS) and a Residuals row (RSS); ESS + RSS = TSS
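The three sums of squares can also be computed directly from the fitted model, which makes the relationship TSS = ESS + RSS explicit. This is a minimal sketch on simulated data: the data frame demo, its values and the coefficients used to generate earnings are assumptions for illustration only.

```r
# Simulated stand-in data; the values are made up for illustration only
set.seed(42)
n <- 100
demo <- data.frame(higrade = sample(8:18, n, replace = TRUE),
                   age     = sample(20:60, n, replace = TRUE))
demo$agesq    <- demo$age^2
demo$earnings <- 2000 * demo$higrade + 900 * demo$age -
  8 * demo$agesq + rnorm(n, sd = 5000)

fit <- lm(earnings ~ higrade + age + agesq, data = demo)

y   <- demo$earnings
tss <- sum((y - mean(y))^2)            # total sum of squares
rss <- sum(residuals(fit)^2)           # residual sum of squares
ess <- sum((fitted(fit) - mean(y))^2)  # explained sum of squares

c(TSS = tss, RSS = rss, ESS = ess, R2 = ess / tss)
```

On the real data, replacing demo with mydata and fit with model2 should give the same quantities that anova(model2) and summary(model2) report.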
PART C
Run the same code as in part (b); the only change is the model formula, where -1 removes the constant term:
model3=lm(earnings~-1+higrade+age+agesq,data=mydata)
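For the TSS, ESS, RSS and R squared value, run the code from (b) with model2 replaced by model3. Note that without a constant term R computes R squared against the uncentered total sum of squares, sum(earnings^2), so the value is not directly comparable with the one from part (b).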
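How R squared behaves once the constant is dropped can be checked with a short sketch. As before, the data frame demo and its values are simulated assumptions, not the contents of sampleData1.dta.

```r
# Simulated stand-in data; the values are made up for illustration only
set.seed(42)
n <- 100
demo <- data.frame(higrade = sample(8:18, n, replace = TRUE),
                   age     = sample(20:60, n, replace = TRUE))
demo$agesq    <- demo$age^2
demo$earnings <- 2000 * demo$higrade + 900 * demo$age -
  8 * demo$agesq + rnorm(n, sd = 5000)

fit_nc <- lm(earnings ~ -1 + higrade + age + agesq, data = demo)

y   <- demo$earnings
rss <- sum(residuals(fit_nc)^2)

tss_uncentered <- sum(y^2)              # what R uses when there is no constant
tss_centered   <- sum((y - mean(y))^2)  # what R uses when a constant is present

r2_reported   <- summary(fit_nc)$r.squared
r2_uncentered <- 1 - rss / tss_uncentered  # matches r2_reported
r2_centered   <- 1 - rss / tss_centered    # comparable with part (b)

c(reported = r2_reported, uncentered = r2_uncentered, centered = r2_centered)
```

The reported value is usually much closer to 1 than the centered one, which reflects the change in the definition of TSS rather than a better fit.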