Wageweb conducts surveys of salary data and presents summaries on its website. Based on salary data as of October 1, 2002, Wageweb reported that the average annual salary for sales vice presidents was $142,111, with an average annual bonus of $15,432. Assume the following data are a sample of the annual salary and bonus for 10 sales vice presidents. Data are in thousands of dollars.
Vice President | Salary | Bonus
1 | 135 | 12
2 | 115 | 14
3 | 146 | 16
4 | 167 | 19
5 | 165 | 22
6 | 176 | 24
7 | 98 | 7
8 | 136 | 17
9 | 163 | 18
10 | 119 | 11
Compute SST, SSR, and SSE
Compute the coefficient of determination r². Comment on the goodness of fit.
What is the value of the sample correlation coefficient?
Develop the null and alternative hypotheses to test for a linear relationship between salary and bonus.
At the .05 level of significance, determine whether salary and bonus are linearly related. Use the t test.
Solve the problem in Excel and compare your results.
f) Compute SST, SSR, and SSE
RELATIONSHIP AMONG SST, SSR, AND SSE
SST=SSR+SSE
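where SST is the total sum of squares, Σ(yi - ȳ)²; SSR is the sum of squares due to regression, Σ(ŷi - ȳ)²; and SSE is the sum of squares due to error, Σ(yi - ŷi)². SSR can be thought of as the explained portion of SST, and SSE as the unexplained portion. For these data (x = salary, y = bonus, x̄ = 142, ȳ = 16), SST = 240, SSR ≈ 205.26, and SSE ≈ 34.74.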
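A short Python sketch can reproduce these sums of squares. It assumes the ten (salary, bonus) pairs from the table, fits the least squares line with b1 = Sxy/Sxx and b0 = ȳ - b1x̄, and then evaluates each sum directly from its definition. The variable names are illustrative only.

```python
# Salary (x) and bonus (y) for the 10 vice presidents, in thousands of dollars
salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]

n = len(salary)
x_bar = sum(salary) / n
y_bar = sum(bonus) / n

# Least squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Fitted values y_hat_i = b0 + b1 * x_i
y_hat = [b0 + b1 * x for x in salary]

sst = sum((y - y_bar) ** 2 for y in bonus)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)             # explained by the regression
sse = sum((y - yh) ** 2 for y, yh in zip(bonus, y_hat))  # unexplained (error)

print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
```

The last line is a quick check of the identity SST = SSR + SSE. For this sample it holds up to floating-point rounding.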
g) Compute the coefficient of determination r². Comment on the goodness of fit.
The ratio SSR/SST, which will take values between zero and one, is used to evaluate the goodness of fit for the estimated regression equation. This ratio is called the coefficient of determination and is denoted by r². When we express the coefficient of determination as a percentage, r² can be interpreted as the percentage of the total sum of squares that can be explained by using the estimated regression equation.
Solution: r² = SSR/SST ≈ 205.26/240 ≈ 0.855.
The least squares line provides a very good fit: about 85.5% of the variability in y (bonus) is explained by the least squares line ŷ = -10.16 + 0.18x.
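The same r² can be computed in a few lines of Python. This sketch assumes the sample data from the table and uses the shortcut SSR = Sxy²/Sxx, which is algebraically equal to Σ(ŷi - ȳ)² for a least squares line.

```python
salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]  # x, $1000s
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]             # y, $1000s

n = len(salary)
x_bar, y_bar = sum(salary) / n, sum(bonus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)

sst = sum((y - y_bar) ** 2 for y in bonus)  # SST = Syy
ssr = s_xy ** 2 / s_xx                      # shortcut: SSR = b1 * Sxy = Sxy^2 / Sxx
r_squared = ssr / sst

print(f"r^2 = {r_squared:.4f} ({r_squared:.1%} of the variability in bonus explained)")
```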
h) What is the value of the sample correlation coefficient?
The correlation coefficient is a descriptive measure of the strength of linear association between two variables, x and y. Values of the correlation coefficient are always between -1 and +1. A value of +1 indicates that x and y are perfectly related in a positive linear sense; that is, all data points lie on a straight line with a positive slope. A value of -1 indicates that x and y are perfectly related in a negative linear sense, with all data points on a straight line with a negative slope. Values close to zero indicate that x and y are not linearly related. The sample correlation coefficient is positive if the estimated regression equation has a positive slope (b1 > 0) and negative if it has a negative slope (b1 < 0). Its magnitude is √r², so rxy = (sign of b1)√r².
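Solution: rxy = +√r² ≈ +√0.855 ≈ 0.925. The sign is positive because the slope b1 ≈ 0.184 is positive, so salary and bonus have a strong positive linear association.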
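To check this value, the following Python sketch computes r in two ways from the table data: directly as Sxy/√(Sxx·Syy), and as the sign of b1 times √r². The two results should agree.

```python
import math

salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]  # x, $1000s
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]             # y, $1000s

n = len(salary)
x_bar, y_bar = sum(salary) / n, sum(bonus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)
s_yy = sum((y - y_bar) ** 2 for y in bonus)

# Direct definition of the sample correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# Same value from the regression: sign of b1 times the square root of r^2
b1 = s_xy / s_xx
r_squared = s_xy ** 2 / (s_xx * s_yy)
r_from_fit = math.copysign(math.sqrt(r_squared), b1)

print(f"r = {r:.4f}, sign(b1) * sqrt(r^2) = {r_from_fit:.4f}")
```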
i) Develop the null and alternative hypotheses to test for a linear relationship between salary and bonus.
We want to test whether there is a significant linear relationship between x and y. The simple linear regression model is y = β0 + β1x + ε. If x and y are linearly related, we must have β1 ≠ 0. The purpose of the t test is to see whether we can conclude that β1 ≠ 0. We will use the sample data to test the following hypotheses about the parameter β1:
H0:β1=0
Ha:β1≠0
If H0 is rejected, we will conclude that β1≠0 and that a statistically significant relationship exists between the two variables. However, if H0 cannot be rejected, we will have insufficient evidence to conclude that a significant relationship exists. The properties of the sampling distribution of b1, the least squares estimator of β1, provide the basis for the hypothesis test.
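Solution: H0: β1 = 0 (salary and bonus are not linearly related) versus Ha: β1 ≠ 0 (salary and bonus are linearly related).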
j) At the .05 level of significance, determine whether salary and bonus are linearly related.
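The test statistic is t = b1/sb1, where sb1 = s/√Σ(xi - x̄)² and s = √MSE. The p-value can be found in Excel with =TDIST(t, df, tails) (T.DIST.2T(t, df) in newer versions of Excel).
Here t ≈ 0.1843/0.0268 ≈ 6.875 with df = n - 2 = 10 - 2 = 8. This is a two-tailed test, so p-value = TDIST(6.875, 8, 2) ≈ 0.00013.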
[p-value ≈ 0.00013] < [α = .05]
Since p-value < α, we reject H0 and conclude that β1 ≠ 0: there is a statistically significant linear relationship between salary and bonus.
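The t test can also be reproduced outside Excel. The Python standard library has no t distribution, so this sketch computes the two-tailed p-value with the classical closed-form series for Student's t. That series is exact only when the degrees of freedom are even (here df = 8), and it replaces Excel's TDIST for this case. The critical value 2.306 is a t-table value typed in by hand, not computed by the code.

```python
import math

salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]  # x, $1000s
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]             # y, $1000s

n = len(salary)
x_bar, y_bar = sum(salary) / n, sum(bonus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

df = n - 2
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(salary, bonus))
s = math.sqrt(sse / df)         # standard error of the estimate, sqrt(MSE)
s_b1 = s / math.sqrt(s_xx)      # estimated standard deviation of b1
t_stat = b1 / s_b1


def two_tailed_p_even_df(t, df):
    """Two-tailed Student t p-value P(|T| > |t|); closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    term = total = 1.0
    # series 1 + (1/2)c^2 + (1*3)/(2*4)c^4 + ... up to c^(df-2)
    for k in range(1, df // 2):
        term *= c2 * (2 * k - 1) / (2 * k)
        total += term
    return 1.0 - math.sin(theta) * total


p_value = two_tailed_p_even_df(t_stat, df)
alpha = 0.05
t_crit = 2.306  # t_.025 with 8 df, taken from a t table (assumed table value)

print(f"t = {t_stat:.3f}, p-value = {p_value:.5f}, critical t = {t_crit}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

Both decision rules reach the same conclusion. The p-value is below α = 0.05, and |t| exceeds the critical value.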
Where do you find the t statistic and its p-value in the Excel output?
In the coefficients table below the ANOVA table, in the row labeled with the name you used for x: the t statistic is in the t Stat column and the p-value is in the P-value column.
Testing method 2: F-test
An F test, based on the F probability distribution, can also be used to test for significance in regression. With only one independent variable, the F test will provide the same conclusion as the t test; that is, if the t test indicates β1≠ 0 and hence a significant relationship, the F test will also indicate a significant relationship. But with more than one independent variable, only the F test can be used to test for an overall significant relationship.
The logic behind the use of the F test for determining whether the regression relationship is statistically significant is based on the development of two independent estimates of σ². We explained how MSE provides an estimate of σ². If the null hypothesis H0: β1 = 0 is true, the sum of squares due to regression, SSR, divided by its degrees of freedom provides another independent estimate of σ². This estimate is called the mean square due to regression, or simply the mean square regression, and is denoted MSR. In general,
MSR = SSR / Regression degrees of freedom
For the models we consider in this text, the regression degrees of freedom is always equal to the number of independent variables in the model:
MSR = SSR / Number of independent variables
Because we consider only regression models with one independent variable in Chapter 12, we have MSR = SSR/1 = SSR.
If the null hypothesis (H0: β1 = 0) is true, MSR and MSE are two independent estimates of σ², and the sampling distribution of MSR/MSE follows an F distribution with numerator degrees of freedom equal to one and denominator degrees of freedom equal to n - 2. Therefore, when β1 = 0, the value of MSR/MSE should be close to 1. However, if the null hypothesis is false (β1 ≠ 0), MSR will overestimate σ² and the value of MSR/MSE will be inflated. Large values of MSR/MSE therefore lead to the rejection of H0 and the conclusion that the relationship between x and y is statistically significant.
Solution:
Use F = MSR/MSE as the test statistic. If the model assumptions are valid and H0 is true, F follows (over repeated sampling) an F distribution with k numerator degrees of freedom and n - k - 1 denominator degrees of freedom, where k = 1 for simple linear regression. p-value approach: reject H0 if p-value < α.
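Critical value approach: reject H0 if F > Fα, where Fα is the upper-tail critical value of the F distribution with 1 and n - 2 degrees of freedom (F.05 ≈ 5.32 for 1 and 8 degrees of freedom).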
F can be calculated by hand, or F and its p-value can be read from the ANOVA table in the Excel output. The test statistic F is in the F column, and its p-value is in the Significance F column.
The numerator degrees of freedom are 1 and the denominator degrees of freedom are n - 2, so df1 = 1 and df2 = 8.
In Excel:
p-value = FDIST(F, df1, df2) (F.DIST.RT(F, df1, df2) in newer versions of Excel)
Here F = MSR/MSE ≈ 205.26/4.343 ≈ 47.27, so p-value = FDIST(47.27, 1, 8) ≈ 0.00013.
Since p-value < α, we reject H0 and conclude that β1 ≠ 0: salary and bonus are significantly related. With one independent variable, the F test agrees with the t test (F = t² ≈ 6.875² ≈ 47.27).
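The F statistic can be checked with a Python sketch that builds MSR and MSE from the table data. It also confirms numerically that F = t² when there is a single independent variable. The critical value 5.32 is an F-table value entered by hand, which is an assumption of the sketch and not something the code computes.

```python
import math

salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]  # x, $1000s
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]             # y, $1000s

n = len(salary)
x_bar, y_bar = sum(salary) / n, sum(bonus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)
sst = sum((y - y_bar) ** 2 for y in bonus)

ssr = s_xy ** 2 / s_xx
sse = sst - ssr
msr = ssr / 1          # regression df = number of independent variables = 1
mse = sse / (n - 2)    # error df = n - 2
f_stat = msr / mse

# t statistic for H0: beta1 = 0, used to check F = t^2 with one independent variable
t_stat = (s_xy / s_xx) / math.sqrt(mse / s_xx)

f_crit = 5.32  # F_.05 with 1 and 8 df, taken from an F table (assumed table value)
print(f"MSR = {msr:.2f}, MSE = {mse:.3f}, F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("reject H0" if f_stat > f_crit else "do not reject H0")
```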
k) Solve the problem in Excel and compare your results.
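Solution: use the file Regression Excel tutorials. Excel's Regression tool (Analysis ToolPak) should give the same results as the calculations by hand: SST = 240, SSR ≈ 205.26, SSE ≈ 34.74, R Square ≈ 0.855, Multiple R ≈ 0.925, t Stat ≈ 6.875, F ≈ 47.27, and p-value ≈ 0.00013.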
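General form of the ANOVA table for simple linear regression:
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F
Regression | SSR | 1 | MSR = SSR/1 | F = MSR/MSE
Error | SSE | n - 2 | MSE = SSE/(n - 2) |
Total | SST | n - 1 | |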
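You can compare the ANOVA table with Excel by filling it in for this sample in Python. The sketch assumes the table data and uses the rows Regression, Error, and Total. Excel's Regression output labels the error row "Residual". The column order here is a layout choice for the printout.

```python
salary = [135, 115, 146, 167, 165, 176, 98, 136, 163, 119]  # x, $1000s
bonus = [12, 14, 16, 19, 22, 24, 7, 17, 18, 11]             # y, $1000s

n = len(salary)
x_bar, y_bar = sum(salary) / n, sum(bonus) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(salary, bonus))
s_xx = sum((x - x_bar) ** 2 for x in salary)

sst = sum((y - y_bar) ** 2 for y in bonus)
ssr = s_xy ** 2 / s_xx
sse = sst - ssr
df_reg, df_err, df_tot = 1, n - 2, n - 1
msr, mse = ssr / df_reg, sse / df_err

# (source, SS, df, MS, F); None marks cells left blank in the ANOVA table
rows = [
    ("Regression", ssr, df_reg, msr, msr / mse),
    ("Error", sse, df_err, mse, None),
    ("Total", sst, df_tot, None, None),
]

print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
for source, ss, df, ms, f in rows:
    ms_txt = f"{ms:10.3f}" if ms is not None else " " * 10
    f_txt = f"{f:8.2f}" if f is not None else ""
    print(f"{source:<12}{ss:10.3f}{df:>5}{ms_txt}{f_txt}")
```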