Our societal values: do taller basketball players get better paid? Consider the data set labeled NBA 2008-2009 Data.
(a) Select 25 basketball players (use random.org as explained in Problem 2), and record their heights and annual salary in two columns. Display your data values. There should be 25 data values in each column.
Player # | Height (in) | Annual salary
13 | 81 | $10,000,000
21 | 73 | $2,700,000
49 | 84 | $711,517
53 | 83 | $2,295,480
57 | 77 | $1,931,160
79 | 80 | $1,600,000
83 | 76 | $21,372,000
121 | 79 | $25,000
127 | 77 | $442,114
270 | 84 | $2,986,080
283 | 82 | $1,141,838
290 | 80 | $1,173,480
294 | 81 | $9,226,550
350 | 82 | $1,542,600
363 | 78 | $12,222,221
386 | 79 | $1,983,453
392 | 83 | $5,784,480
397 | 81 | $3,395,760
400 | 79 | $4,000,000
418 | 82 | $7,348,018
419 | 75 | $4,250,000
428 | 83 | $21,372,000
433 | 78 | $1,081,440
451 | 75 | $998,398
452 | 84 | $5,500,000
(b) You would like to see whether there is a correlation between the players’ height and annual salary. Let height be the explanatory (X) variable, and annual salary be the response (Y) variable. Use appropriate software to obtain a full regression output.
(c) Identify the intercept and slope, and write the regression equation. Identify the coefficient of determination, and interpret the result.
(d) Calculate the coefficient of correlation, and interpret the result.
(e) Find a 95% confidence interval for the population slope. Does the population slope exceed 0? To answer this question, state the hypotheses, identify the p-value, and interpret the result.
(f) Provide a scatter diagram produced by EXCEL.
(g) Identify and interpret the greatest positive residual. Provide the complete list of residuals.
(b) You would like to see whether there is a correlation between the players’ height and annual salary. Let height be the explanatory (X) variable, and annual salary be the response (Y) variable. Use appropriate software to obtain a full regression output.
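1. Enter the data in Excel, with height and annual salary in two columns, as shown below.
2. Choose Regression from the Data Analysis tool (found under the Data tab).
3. Fill in the dialog box as shown below: the salary column is the Input Y Range and the height column is the Input X Range.
4. Excel then generates the regression output shown below.
5. The regression equation is read from the Coefficients column of the output (highlighted in green).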
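As a cross-check on the Excel output, the least-squares fit can be reproduced outside Excel. The following is a minimal Python sketch (standard library only, not the method used in this solution); the names `least_squares`, `heights`, and `salaries` are our own, and the two lists simply transcribe the 25 sampled players from part (a).

```python
# The 25 sampled players from part (a): height in inches, annual salary in dollars.
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]


def least_squares(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares(heights, salaries)
print(f"Annual Salary = {b0:,.2f} + {b1:,.2f} (Height)")
```

Up to rounding, this should agree with the Excel intercept and slope reported in part (c).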
(c) Identify the intercept and slope, and write the regression equation. Identify the coefficient of determination, and interpret the result.
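Intercept = -977,566.51
Slope = 74,911.19
The regression equation is
Annual Salary = -977,566.51 + 74,911.19 (Height)
The slope says that each additional inch of height is associated with about $74,911 more in predicted annual salary, although, as shown in part (e), this slope is not statistically significant.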
The coefficient of determination is the R Square value in the output (highlighted in blue). It lies between 0 and 1 and measures the proportion of the variability in annual salary that is explained by the regression on height; the closer it is to 1, the better the model fits the data.
In this case R Square = 0.001587082, so height explains only about 0.16% of the variation in annual salary. This is very low, and the model is not a useful one.
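The R Square value can be checked from its definition, R² = 1 - SSE/SST, where SST is the total sum of squares of salary about its mean and SSE is the sum of squared residuals about the fitted line. A minimal stdlib-only Python sketch, assuming the same 25 data points from part (a):

```python
# Same 25 players as in part (a): height (in) and annual salary ($).
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]

# Least-squares fit.
n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(salaries) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, salaries))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Total variation in salary, and the part left unexplained by the fitted line.
sst = sum((y - y_bar) ** 2 for y in salaries)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, salaries))
r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.9f} ({100 * r_squared:.2f}% of salary variation explained)")
```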
(d) Calculate the coefficient of correlation, and interpret the result.
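In simple linear regression the coefficient of correlation is the square root of R Square, with the sign of the slope: r = +√0.001587082 ≈ 0.0398. This is very close to 0, indicating essentially no linear relationship between a player's height and annual salary.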
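The correlation can also be computed directly as r = s_xy / √(s_xx · s_yy), which in simple regression equals ±√R² with the sign of the slope. A small Python sketch of that formula; the helper `pearson_r` is a hypothetical name written for illustration:

```python
import math

# Same 25 players as in part (a): height (in) and annual salary ($).
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]


def pearson_r(x, y):
    """Pearson correlation coefficient from centered sums of squares and products."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(heights, salaries)
print(f"r = {r:.4f}")
```

On Python 3.10 or later, `statistics.correlation(heights, salaries)` should return the same value.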
(e) Find a 95% confidence interval for the population slope. Does the population slope exceed 0? To answer this question, state the hypotheses, identify the p-value, and interpret the result.
Hypotheses (β1 is the population slope for height):
H0: β1 ≤ 0
H1: β1 > 0 (the population slope exceeds 0)
From the regression output (highlighted in orange) we get
t stat = 0.191209
p-value = 0.8500 (two-tailed, as reported by Excel)
Because the question asks whether the slope exceeds 0, the test is one-tailed. The t statistic is positive, so the one-tailed p-value is 0.8500 / 2 = 0.4250. This is far greater than 0.05, so we fail to reject the null hypothesis: there is no evidence that the population slope exceeds 0, and height is not a significant predictor of annual salary.
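The t statistic and p-values can be reproduced as t = b1 / SE(b1), with SE(b1) = √(SSE / (n - 2) / s_xx) and n - 2 = 23 degrees of freedom. To stay within the standard library, this sketch approximates the t-distribution tail area with Simpson's rule instead of calling a statistics package; the helpers `t_pdf` and `t_area_0_to` are our own illustrative names, not part of any library.

```python
import math

# Same 25 players as in part (a): height (in) and annual salary ($).
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]

# Least-squares fit and standard error of the slope.
n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(salaries) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, salaries))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, salaries))
df = n - 2
se_b1 = math.sqrt(sse / df / s_xx)
t_stat = b1 / se_b1


def t_pdf(u, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c) * (1 + u * u / nu) ** (-(nu + 1) / 2)


def t_area_0_to(t, nu, steps=2000):
    """Simpson's-rule approximation of P(0 <= T <= |t|); steps must be even."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, nu) + t_pdf(b, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(a + k * h, nu)
    return total * h / 3


area = t_area_0_to(t_stat, df)
p_two_sided = 1 - 2 * area                           # the value Excel reports
p_upper = 0.5 - area if t_stat >= 0 else 0.5 + area  # one-sided, H1: beta1 > 0
print(f"t = {t_stat:.6f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_upper:.4f}")
```

The two-sided p-value should match Excel's 0.8500, and the one-sided value should come out near 0.425. With a recent SciPy installed, `scipy.stats.linregress(heights, salaries, alternative="greater")` is a shorter route to the one-sided p-value.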
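Confidence interval
From the Lower 95% and Upper 95% columns of the output, the 95% confidence interval for the population slope is (-732,148.01, 881,970.40).
Since the interval contains 0, a population slope of 0 is plausible; this agrees with the test above: height is not a significant predictor of annual salary.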
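The interval is b1 ± t · SE(b1), where t is the upper 2.5% point of the t distribution with 23 degrees of freedom. In this sketch that critical value is hardcoded from a standard t table (2.0687); the constant is an assumption of the sketch, not something read from the Excel output.

```python
import math

# Same 25 players as in part (a): height (in) and annual salary ($).
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]

# Least-squares fit and standard error of the slope.
n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(salaries) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, salaries))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(heights, salaries))
se_b1 = math.sqrt(sse / (n - 2) / s_xx)

T_CRIT = 2.0687  # assumed t-table value for 95% confidence, df = n - 2 = 23
lower = b1 - T_CRIT * se_b1
upper = b1 + T_CRIT * se_b1
print(f"95% CI for the slope: ({lower:,.2f}, {upper:,.2f})")
print("interval contains 0" if lower <= 0 <= upper else "interval excludes 0")
```

Depending on the critical value and rounding used, the endpoints may differ somewhat from the Excel interval quoted above, but the interval still straddles 0.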
(f) Provide a scatter diagram produced by EXCEL.
(g) Identify and interpret the greatest positive residual. Provide the complete list of residuals.
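Part (g) asks for the residuals e = y - ŷ, the difference between each player's actual salary and the salary the fitted line predicts for his height; a positive residual means the player earns more than his height alone would predict. A minimal Python sketch that lists every residual and picks the largest positive one, assuming the same 25 players as in part (a) (the `players` list holds the random.org player numbers):

```python
# Same 25 players as in part (a): player number, height (in), annual salary ($).
players = [13, 21, 49, 53, 57, 79, 83, 121, 127, 270, 283, 290, 294,
           350, 363, 386, 392, 397, 400, 418, 419, 428, 433, 451, 452]
heights = [81, 73, 84, 83, 77, 80, 76, 79, 77, 84, 82, 80, 81,
           82, 78, 79, 83, 81, 79, 82, 75, 83, 78, 75, 84]
salaries = [10_000_000, 2_700_000, 711_517, 2_295_480, 1_931_160,
            1_600_000, 21_372_000, 25_000, 442_114, 2_986_080,
            1_141_838, 1_173_480, 9_226_550, 1_542_600, 12_222_221,
            1_983_453, 5_784_480, 3_395_760, 4_000_000, 7_348_018,
            4_250_000, 21_372_000, 1_081_440, 998_398, 5_500_000]

# Least-squares fit.
n = len(heights)
x_bar = sum(heights) / n
y_bar = sum(salaries) / n
s_xx = sum((x - x_bar) ** 2 for x in heights)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, salaries))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residual = actual salary minus the salary predicted from height.
residuals = [y - (b0 + b1 * x) for x, y in zip(heights, salaries)]
for p, x, y, e in zip(players, heights, salaries, residuals):
    print(f"Player {p:>3}: height {x} in, salary ${y:>12,}, residual {e:>15,.2f}")

best = max(range(n), key=lambda i: residuals[i])
print(f"Greatest positive residual: player #{players[best]}, {residuals[best]:,.2f}")
```

By our arithmetic, the greatest positive residual belongs to player #83 (76 in, $21,372,000): about $16.66 million above the roughly $4.72 million the line predicts for a 76-inch player. His salary is far higher than his height would suggest, which fits the earlier finding that height explains almost none of the variation in pay.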