1. If a dependent variable Y and independent variable X have a positive correlation, what can we predict about the sign of the slope coefficient between X and Y in a simple regression? How about in a multiple regression?
2. What would you suspect if your multiple regression t-statistics were all insignificant but the F-statistic was significant?
3. Suppose you built a 95% confidence interval around a simple regression slope coefficient and got a lower bound of 5 and an upper bound of 10. Interpret the confidence interval.
4. Suppose you wanted to know whether one categorical variable is associated with another and did a hypothesis test to see whether the proportions in a chosen category of one variable are the same for the two levels of the other variable. What two other methods could you use to see whether there is association between the variables?
1) In a simple linear regression, the slope coefficient will be positive, because the slope always has the same sign as the correlation between X and Y. In a multiple regression, the sign of X's coefficient cannot be predicted from the correlation alone: the coefficient measures the effect of X with the other predictors held fixed, so it can be negative even though the correlation between X and Y is positive, as in the sketch below.
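For illustration only (a toy example of my own, not part of the original answer): here X is positively correlated with Y, yet its coefficient in the multiple regression is negative once the model also includes the variable Z that X closely tracks.
set.seed(1)
Z = rnorm(200)
X = Z + rnorm(200, sd = 0.3)    #X is largely driven by Z
Y = 2*Z - X + rnorm(200, sd = 0.5)
cor(X, Y)                       #positive marginal correlation
coef(lm(Y ~ X))                 #simple regression: positive slope
coef(lm(Y ~ X + Z))             #multiple regression: slope on X is about -1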
2) This pattern typically arises when the predictor variables are highly correlated (multicollinearity): together the predictors explain the response, so the F-test is significant, but no individual coefficient is significant once the others are in the model. I will illustrate with a toy example in R.
RSS = 3:10 #Right shoe size
LSS = rnorm(length(RSS), RSS, 0.1) #Left shoe size - nearly identical to RSS
cor(LSS, RSS) #correlation ~ 0.99
weights = 120 + rnorm(length(RSS), 10*RSS, 10)
##Fit a joint model
m = lm(weights ~ LSS + RSS)
##The overall F-test is significant, but neither LSS nor RSS is individually significant
summary(m)
Call:
lm(formula = weights ~ LSS + RSS)

Residuals:
     1      2      3      4      5      6      7      8
-16.04   9.80   6.89   7.58  -7.09  11.48 -12.94   0.33

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   141.99      14.04   10.11  0.00016 ***
LSS            -1.74      54.13   -0.03  0.97563
RSS             9.29      54.04    0.17  0.87023
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.7 on 5 degrees of freedom
Multiple R-squared: 0.748, Adjusted R-squared: 0.647
F-statistic: 7.43 on 2 and 5 DF, p-value: 0.0318
##Fitting RSS or LSS separately gives a significant result.
summary(lm(weights ~ LSS))
Call:
lm(formula = weights ~ LSS)

Residuals:
   Min     1Q Median     3Q    Max
-15.43  -8.58   3.29   7.36  13.20

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   141.42      12.49   11.33 0.000028 ***
LSS             7.56       1.80    4.21   0.0057 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 11.6 on 6 degrees of freedom
Multiple R-squared: 0.747, Adjusted R-squared: 0.704
F-statistic: 17.7 on 1 and 6 DF, p-value: 0.00565
3) Suppose the 95% confidence interval for the slope coefficient beta_1 is (5, 10).
The interpretation is that we are 95% confident the true slope lies between 5 and 10; equivalently, each one-unit increase in X is associated with an estimated increase in Y of between 5 and 10 units. The "95%" refers to the procedure: if we drew repeated samples and built an interval from each, about 95 out of 100 such intervals would contain the true beta_1. Since the interval does not contain 0, we would also reject H0: beta_1 = 0 at the 5% significance level. A small coverage simulation is sketched below.
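A minimal sketch of the repeated-sampling interpretation (a toy simulation of my own, with an assumed true slope of 7.5): generate many datasets, build the 95% interval from confint() each time, and count how often it covers the true slope.
set.seed(2)
true_beta = 7.5
covered = replicate(1000, {
  x = runif(30)
  y = 2 + true_beta*x + rnorm(30)
  ci = confint(lm(y ~ x))["x", ]   #95% interval for the slope
  ci[1] <= true_beta && true_beta <= ci[2]
})
mean(covered)   #should be close to 0.95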
4) Besides testing whether the proportions in a chosen category of one variable are the same for the two levels of the other variable, two other methods to check whether there is an association between the variables are:
1) the chi-square test of independence;
2) the Kruskal-Wallis test.
The Kruskal-Wallis test tells you whether there are significant differences among the medians of two or more groups. It is an extension of the Mann-Whitney U test, and gives the same result as the Mann-Whitney U test when only two groups are compared. A small R illustration of the chi-square test is given below.
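A minimal sketch with made-up counts (not data from the question): on a 2x2 table, the two-proportion test and the chi-square test of independence lead to the same conclusion about association.
tab = matrix(c(30, 20, 15, 35), nrow = 2,
             dimnames = list(group = c("A", "B"), outcome = c("yes", "no")))
prop.test(tab)    #compares the proportion of "yes" between groups A and B
chisq.test(tab)   #chi-square test of independence on the same table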