Question

# Reading the data into R:

my.datafile <- tempfile()
cat(file=my.datafile, "
 71 15 
 74 19 
 70 11 
 71 15 
 69 12 
 73 17 
 72 15 
 75 19 
 72 16 
 74 18 
 71 13 
 72 15 
 73 17 
 72 16 
 71 15 
 75 20 
 71 15 
 75 19 
 78 22 
 79 23 
 72 16 
 75 20 
 76 21 
 74 19 
 70 13 
", sep=" ")

options(scipen=999) # suppressing scientific notation

simpbasketball <- read.table(my.datafile, header=FALSE, col.names=c("height", "goals")) 

COMPUTER CALCULATIONS:

I need to know how to code in R for the solutions, not by hand.

2. Look at the data in Table 7.18 on page 368 of the textbook. These data are also given in the SAS code labeled “SAS_basketball_goal_data” and the R code labeled basketball goal data. The dependent variable is goals and the independent variable is height of basketball players. Complete a SAS/R program and answer the following questions about the data set:

(a) Does a scatter plot indicate a linear relationship between the two variables? Is there anything disconcerting about the scatter plot? Explain.

(b) Fit the least-squares regression line (using SAS/R) and interpret the estimated slope in the context of this data set. Does it make sense to interpret the estimated intercept? Explain.

(c) For these data, what is the unbiased estimate of the error variance? (Give a number.)

(d) Using the SAS/R output, test the hypothesis that the true slope of the regression line is zero (as opposed to nonzero). State the appropriate null and alternative hypotheses, give the value of the test statistic and give the appropriate P-value. (Use significance level of 0.05.) Explain what this means in terms of the relationship between the two variables.

(e) Using SAS/R, find a 95% confidence interval for the mean basketball goal for a player with a height of 77 inches. In addition, find a 95% prediction interval for basketball goal for a player with a height of 77 inches.

Solutions

Expert Solution

(a)
> height<-c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
> height
[1] 71 74 70 71 69 73 72 75 72 74 71 72 73 72 71 75 71 75 78 79 72 75 76 74 70
> goals<-c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)
> goals
[1] 15 19 11 15 12 17 15 19 16 18 13 15 17 16 15 20 15 19 22 23 16 20 21 19 13
> plot(height,goals)

Yes, the scatter plot indicates a linear relationship.
No, there is nothing disconcerting about this scatter plot.
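A reader who wants to check this judgment can label the axes, overlay the least-squares line, and count how many observations share the same coordinates. The following is a minimal sketch, not part of the original answer; the names `r` and `n_distinct` are illustrative, and the dashed line assumes goals is the response, as the question states.

```r
# Data as entered in the solution above
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)

# Scatter plot with the predictor on the x-axis and the response on the y-axis
plot(height, goals,
     xlab = "Height (inches)", ylab = "Goals",
     main = "Goals versus height")

# Overlay the least-squares line (goals as response) as a visual guide
abline(lm(goals ~ height), lty = 2)

# Sample correlation as a numeric summary of the linear association
r <- cor(height, goals)

# Number of distinct (height, goals) pairs; if this is smaller than the
# number of observations, some plotted points lie on top of each other
n_distinct <- nrow(unique(data.frame(height, goals)))

print(c(correlation = r, observations = length(height), distinct_points = n_distinct))
```

If the count of distinct points is well below the number of observations, a jittered plot, for example `plot(jitter(height), jitter(goals))`, may give a fairer picture of where the data are concentrated.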

(b) & (c)
> m<-lm(goals~height)
> summary(m)

Because goals is the dependent variable and height is the independent variable, the model formula is goals ~ height. The key values from the summary output, rounded, are:

Coefficients:
              Estimate   Std. Error   t value
(Intercept)  -71.45054     4.82700    -14.80
height         1.20946     0.06609     18.30

Residual standard error: 0.804 on 23 degrees of freedom
Multiple R-squared: 0.9357, Adjusted R-squared: 0.9329
F-statistic: 334.9 on 1 and 23 DF, p-value: 0.000000000000003317

> (0.804)^2
[1] 0.646416

The estimated slope is 1.20946 and the estimated intercept is -71.45054.
The slope means that each additional inch of height is associated with an estimated increase of about 1.21 goals, on average. The intercept is the predicted number of goals for a player of height zero. Interpreting it does not make sense here: a height of zero lies far outside the observed range of 69 to 79 inches, and a negative number of goals is impossible.

The unbiased estimate of the error variance is the square of the residual standard error, i.e. (0.804)^2, which is approximately 0.646.
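The same quantities can be pulled from the fitted object instead of being read off the printed summary and squared by hand. This is a minimal sketch assuming goals is the response and height the predictor, as the question states; the names `fit`, `slope`, `intercept` and `sigma2_hat` are illustrative and do not appear in the original answer.

```r
# Data as entered in the solution above
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)

# Response on the left of ~, predictor on the right
fit <- lm(goals ~ height)

# Least-squares estimates
intercept <- unname(coef(fit)["(Intercept)"])
slope     <- unname(coef(fit)["height"])

# Unbiased estimate of the error variance: SSE / (n - 2)
sigma2_hat <- sum(resid(fit)^2) / df.residual(fit)

# The same value is the squared residual standard error from summary()
same_as_summary <- isTRUE(all.equal(sigma2_hat, summary(fit)$sigma^2))

print(c(intercept = intercept, slope = slope, sigma2_hat = sigma2_hat))
```

Computing the variance from the residuals avoids the small rounding error that comes from squaring a residual standard error printed to three digits.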

(d)
H0: β1 = 0 (the true slope of the regression line is zero) vs. H1: β1 ≠ 0
Test statistic: t = 18.30 on 23 degrees of freedom (equivalently, F = t^2 = 334.9 on 1 and 23 DF), p-value: 0.000000000000003317
Since the p-value is less than alpha = 0.05, we reject H0.
In terms of the relationship between the two variables, there is strong evidence of a linear relationship between height and goals, i.e. height is a significant predictor of goals.
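The test statistic and p-value for (d) can be extracted directly from the coefficient table, and the two intervals asked for in (e) can be obtained with `predict()`. The sketch below assumes the model goals ~ height from the question; the names `fit`, `t_stat`, `p_value`, `new_player`, `ci_mean` and `pi_new` are illustrative only.

```r
# Data as entered in the solution above
height <- c(71,74,70,71,69,73,72,75,72,74,71,72,73,72,71,75,71,75,78,79,72,75,76,74,70)
goals  <- c(15,19,11,15,12,17,15,19,16,18,13,15,17,16,15,20,15,19,22,23,16,20,21,19,13)

fit <- lm(goals ~ height)

# (d) t test of H0: slope = 0, taken from the "height" row of the coefficient table
slope_row <- summary(fit)$coefficients["height", ]
t_stat    <- unname(slope_row["t value"])
p_value   <- unname(slope_row["Pr(>|t|)"])
reject_h0 <- p_value < 0.05

# (e) intervals at height = 77 inches; the column name must match the predictor
new_player <- data.frame(height = 77)

# 95% confidence interval for the mean number of goals at this height
ci_mean <- predict(fit, newdata = new_player, interval = "confidence", level = 0.95)

# 95% prediction interval for an individual player of this height
pi_new <- predict(fit, newdata = new_player, interval = "prediction", level = 0.95)

print(c(t = t_stat, p = p_value))
print(ci_mean)
print(pi_new)
```

With these data the confidence interval for the mean should come out near (21.04, 22.32) and the prediction interval near (19.90, 23.46), both centered on a fitted value of about 21.68 goals. The prediction interval is wider because it also accounts for the variability of an individual player around the mean.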

