Using R
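# Read the data and name the columns
df <- read.table("../Documents/creditCardData.txt")
colnames(df) <- c("Income", "Size", "Charged")
# 2) Regress amount charged on annual income
model1 <- lm(Charged ~ Income, data = df)
summary(model1)
# 3) Regress amount charged on household size
model2 <- lm(Charged ~ Size, data = df)
summary(model2)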
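Note that summary() does not print confidence intervals, so the intervals quoted in part (d) below need a separate call. Here is a minimal sketch of one way to pull the slope, R-squared, p-value and 95% interval out of a fitted model. It runs on simulated stand-in data: the `demo` data frame and all of its values are made up for illustration and are not the creditCardData.txt values. With the real data, the same calls would be applied to model1 and model2.

```r
# Simulated stand-in data (NOT the real credit card data), for illustration only
set.seed(1)
demo <- data.frame(Income = runif(50, 20, 70))
demo$Charged <- 2000 + 40 * demo$Income + rnorm(50, sd = 500)

fit <- lm(Charged ~ Income, data = demo)
s   <- summary(fit)

slope   <- coef(fit)["Income"]                    # estimated slope
r2      <- s$r.squared                            # R-squared
p_slope <- s$coefficients["Income", "Pr(>|t|)"]   # p-value on the slope
ci      <- confint(fit, "Income", level = 0.95)   # 95% CI for the slope

print(slope); print(r2); print(p_slope); print(ci)
```

For the models above, confint(model1) and confint(model2) would give the intervals for both the intercept and the slope.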
2) Develop an estimated regression equation, using annual income as the independent variable. Insert Regression equation estimation results here (excluding the ANOVA):
a. Interpret the estimated slope coefficient.
slope = 106.482
If annual income increases by $1,000, the amount charged increases by $106.482 on average.
b. Interpret the R-square.
R^2 = 0.7427
This means 74.27% of the variation in amount charged is explained by annual income in this model.
c. Interpret the p-value on the slope.
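p-value ≈ 0 (it rounds to 0 at the reported precision; a p-value is never exactly 0)
Since this is far below 0.05, we reject the null hypothesis that the slope is 0: the slope is significantly different from 0.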
d. Interpret the 95% confidence interval.
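95% confidence interval of the slope is (92.3556, 120.6085)
We are 95% confident that each additional $1,000 of annual income raises the average amount charged by between $92.36 and $120.61. The interval does not contain 0, which agrees with the significant p-value.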
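As a quick sanity check on these numbers: a t-based confidence interval for a slope is symmetric about the point estimate, so its midpoint should reproduce the slope. A minimal sketch in R, using only the values reported above (the variable names are illustrative):

```r
# Reported values for the income model (Q2)
slope <- 106.482
lower <- 92.3556
upper <- 120.6085

midpoint   <- (lower + upper) / 2   # centre of the interval
half_width <- (upper - lower) / 2   # margin of error (t critical value * std. error)

# The midpoint should match the slope up to rounding
print(isTRUE(all.equal(midpoint, slope, tolerance = 1e-4)))

# An interval that excludes 0 agrees with rejecting H0: slope = 0 at the 5% level
excludes_zero <- lower > 0 || upper < 0
print(excludes_zero)
```

The same check can be run on the household size interval in Q3 by swapping in its slope and bounds.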
3) Develop an estimated regression equation, using household size as the independent variable. Insert Regression equation estimation results here (excluding the ANOVA):
a. Interpret the estimated slope coefficient.
Similar to Q2:
slope = 502.9
If household size increases by 1 person, the amount charged increases by $502.9 on average.
b. Interpret the R-square.
R^2 = 0.06136
This means only 6.136% of the variation in amount charged is explained by household size in this model.
c. Interpret the p-value on the slope.
p-value = 0.0267
Here p-value < 0.05 (alpha),
so we reject the null hypothesis that the slope is 0.
We conclude that the slope is statistically significant here too.
d. Interpret the 95% confidence interval.
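95% confidence interval of the slope is (59.50223, 946.2219)
We are 95% confident that each additional household member raises the average amount charged by between $59.50 and $946.22. The interval excludes 0 but is very wide, so the size of the effect is estimated imprecisely.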
4) Which of the two models is the better predictor of annual credit card charges? Defend your decision.
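Model 1 is the better predictor,
as it has a much higher R^2 (0.7427 vs. 0.06136): annual income explains about 74% of the variation in charges, while household size explains only about 6%.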
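Comparing R^2 directly is reasonable here because both models have the same response and a single predictor each. A small sketch of that comparison in R, using only the R^2 values reported in Q2 and Q3 (the vector name and labels are illustrative):

```r
# R-squared values reported in Q2 and Q3
r2 <- c(Income = 0.7427, Size = 0.06136)

best  <- names(which.max(r2))             # predictor with the larger R-squared
ratio <- r2[["Income"]] / r2[["Size"]]    # how many times more variation is explained

print(best)
print(round(ratio, 1))
```

With the fitted models available, the same vector could be built from summary(model1)$r.squared and summary(model2)$r.squared instead of typed-in values.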