# Reading the data into R:
my.datafile <- tempfile()
cat(file=my.datafile, "
71 15
74 19
70 11
71 15
69 12
73 17
72 15
75 19
72 16
74 18
71 13
72 15
73 17
72 16
71 15
75 20
71 15
75 19
78 22
79 23
72 16
75 20
76 21
74 19
70 13
", sep=" ")
options(scipen=999) # suppressing scientific notation
simpbasketball <- read.table(my.datafile, header=FALSE, col.names=c("height", "goals"))
COMPUTER CALCULATIONS:
I need to know how to code in R for the solutions, not by hand.
2. Look at the data in Table 7.18 on page 368 of the textbook. These data are also
given in the SAS code labeled “SAS_basketball_goal_data” and the R code labeled “basketball goal data”.
The dependent variable is goals and the independent variable is height of basketball players.
Complete a SAS / R program and answer the following questions about the data set:
(a) Does a scatter plot indicate a linear relationship between the two variables?
Is there anything disconcerting about the scatter plot? Explain.
(b) Fit the least-squares regression line (using SAS / R) and interpret the estimated slope
in the context of this data set. Does it make sense to interpret the estimated intercept? Explain.
(c) For these data, what is the unbiased estimate of the error variance? (Give a number.)
(d) Using the SAS / R output, test the hypothesis that the true slope of the regression line
is zero (as opposed to nonzero). State the appropriate null and alternative hypotheses,
give the value of the test statistic and give the appropriate P-value. (Use significance
level of 0.05.) Explain what this means in terms of the relationship between the two
variables.
(e) Using SAS / R, find a 95% confidence interval for the mean basketball goal for
a player with a height of 77 inches. In addition find a 95% prediction interval for
basketball goal for a player with a height of 77 inches.
(a)
> height<-c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
> height
[1] 71 74 70 71 69 73 72 75 72 74 71 72 73 72 71 75 71 75 78 79 72 75 76 74 70
> goals<-c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)
> goals
[1] 15 19 11 15 12 17 15 19 16 18 13 15 17 16 15 20 15 19 22 23 16 20 21 19 13
> plot(height,goals)
Yes, the scatter plot indicates a positive linear relationship between height and goals.
No, there is nothing disconcerting about this scatter plot.
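The visual impression can be backed up with a couple of numeric checks. The sketch below repeats the plot with axis labels, computes the Pearson correlation, and counts how many distinct (height, goals) pairs there are, since repeated pairs are drawn on top of each other and the plot shows fewer than 25 separate points. The object names r and n_distinct are introduced here for illustration only.

```r
# Data as entered in the transcript above
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)

# Scatter plot with the dependent variable (goals) on the vertical axis
plot(height, goals,
     xlab = "Height (inches)", ylab = "Goals",
     main = "Goals versus height")

# Pearson correlation: a numeric summary of the strength of the linear association
r <- cor(height, goals)
r

# Number of distinct (height, goals) pairs; duplicates overplot in the scatter plot
n_distinct <- nrow(unique(data.frame(height, goals)))
n_distinct
```

A correlation close to 1 agrees with the linear pattern seen in the plot. If the overlapping points are a concern, plot(jitter(height), jitter(goals)) is one way to separate them visually.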
(b) & (c)
> m<-lm(goals~height)
> coef(m)
(Intercept)     height 
 -71.450541   1.209459 
> summary(m)$sigma
[1] 0.8039781
> summary(m)$sigma^2
[1] 0.6463807
Goals is the dependent variable, so it goes on the left-hand side of the model formula and height goes on the right.
summary(m) also reports Multiple R-squared: 0.9357 and F-statistic: 334.9 on 1 and 23 DF, p-value: 0.000000000000003317.
The estimated slope is 1.209459 and the estimated intercept is -71.450541.
Interpretation of the slope: each additional inch of height is associated with an estimated increase of about 1.21 goals, on average.
Interpretation of the intercept: it is the predicted number of goals for a player with a height of 0 inches. It does not make sense to interpret it here, because a height of 0 is far outside the observed range (69 to 79 inches) and a negative number of goals is impossible.
The unbiased estimate of the error variance is the mean squared error, i.e. the square of the residual standard error on n - 2 = 23 degrees of freedom: (0.8039781)^2 = 0.6463807.
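As a cross-check, the least-squares estimates and the error variance can be computed directly from the textbook formulas and compared with lm(). This is a minimal sketch with goals as the response and height as the predictor, as the problem statement specifies; the names Sxx, Sxy, b0, b1, sse and mse are introduced here for illustration.

```r
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)
n <- length(goals)

# Corrected sums of squares and cross-products
Sxx <- sum((height - mean(height))^2)
Sxy <- sum((height - mean(height)) * (goals - mean(goals)))

# Least-squares slope and intercept
b1 <- Sxy / Sxx
b0 <- mean(goals) - b1 * mean(height)

# Unbiased estimate of the error variance: SSE / (n - 2)
sse <- sum((goals - (b0 + b1 * height))^2)
mse <- sse / (n - 2)

# The same quantities from lm()
fit <- lm(goals ~ height)
all.equal(unname(coef(fit)), c(b0, b1))   # should be TRUE
all.equal(summary(fit)$sigma^2, mse)      # should be TRUE
c(intercept = b0, slope = b1, mse = mse)
```

Dividing by n - 2 rather than n is what makes the estimate unbiased: two degrees of freedom are used up by estimating the intercept and the slope.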
(d)
H0: β1 = 0 (the true slope of the regression line is zero) vs. H1: β1 ≠ 0
Test statistic: F = 334.9 on 1 and 23 DF (equivalently t = 18.3 on 23 DF, since F = t^2 in simple linear regression), p-value = 0.000000000000003317
At the alpha = 0.05 level of significance we reject H0, since the p-value is less than alpha.
In terms of the relationship between the two variables, there is strong evidence of a linear relationship between height and goals. The estimated slope is positive, so taller players tend to have more goals.
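The test in (d) can be read straight from the coefficient table, and the intervals asked for in (e) can be obtained with predict(). The sketch below assumes goals is the response and height the predictor, as the problem statement specifies; the names fit, slope_row, new_player, conf_int and pred_int are introduced here for illustration.

```r
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)
fit <- lm(goals ~ height)

# (d) Slope row of the coefficient table:
#     Estimate, Std. Error, t value, Pr(>|t|) for H0: slope = 0
slope_row <- coef(summary(fit))["height", ]
slope_row

# (e) A new player with a height of 77 inches
new_player <- data.frame(height = 77)

# 95% confidence interval for the MEAN goals of all players who are 77 inches tall
conf_int <- predict(fit, newdata = new_player,
                    interval = "confidence", level = 0.95)

# 95% prediction interval for the goals of ONE new player who is 77 inches tall
pred_int <- predict(fit, newdata = new_player,
                    interval = "prediction", level = 0.95)

conf_int
pred_int
```

Both intervals are centred on the same fitted value, about 21.68 goals. The confidence interval should come out near (21.04, 22.32) and the prediction interval near (19.90, 23.46). The prediction interval is wider because it also accounts for the variability of an individual player around the mean.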