Question 2
With the growth of internet service providers, a researcher decides to examine whether there is a correlation between the monthly cost of internet service (rounded to the nearest dollar) and the degree of customer satisfaction (on a scale of 1 to 10, with 1 being not at all satisfied and 10 being extremely satisfied). The researcher includes only programs with comparable types of services. A sample of the data is provided below.
dollars | satisfaction
11 | 6
18 | 8
17 | 10
15 | 4
9 | 9
5 | 6
12 | 3
19 | 5
22 | 2
25 | 10
a) Find the equation of the fitted regression line.
b) Estimate the standard deviation of the error term e_i.
c) Compute the correlation coefficient.
d) What is the R² for this model?
e) Calculate a 90% confidence interval for b1.
f) Test the null hypothesis that b1 = 0 (perform a two-sided test), using α = 0.1. Is the model useful?
g) Perform the regression using SAS, and give the p-value for the test in part f). Verify that the p-value agrees with your conclusion in part f).
Soln
a)
dollars (X) | satisfaction (Y) | X*Y | X² | Y²
11 | 6 | 66 | 121 | 36
18 | 8 | 144 | 324 | 64
17 | 10 | 170 | 289 | 100
15 | 4 | 60 | 225 | 16
9 | 9 | 81 | 81 | 81
5 | 6 | 30 | 25 | 36
12 | 3 | 36 | 144 | 9
19 | 5 | 95 | 361 | 25
22 | 2 | 44 | 484 | 4
25 | 10 | 250 | 625 | 100
∑X = 153 | ∑Y = 63 | ∑XY = 976 | ∑X² = 2679 | ∑Y² = 471
Let the regression equation be Y = a + bX, where
Slope (b) = [n·∑XY − ∑X·∑Y] / [n·∑X² − (∑X)²] = (10·976 − 153·63) / (10·2679 − 153²) = 121 / 3381 = 0.0358
and a = ∑Y/n − b·∑X/n = 6.3 − 0.0358 × 15.3 = 5.75
Hence,
Regression Equation
Satisfaction = 5.75 + 0.036 * dollars
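The column totals and the slope and intercept formulas above can be checked with a short Python sketch. It uses only the standard library; the list names `dollars` and `satisfaction` are illustrative labels for the sample data in the question.

```python
# Sample data from the question.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]

n = len(dollars)
sum_x = sum(dollars)
sum_y = sum(satisfaction)
sum_xy = sum(x * y for x, y in zip(dollars, satisfaction))
sum_x2 = sum(x * x for x in dollars)
sum_y2 = sum(y * y for y in satisfaction)

# Least-squares slope and intercept computed from the column totals.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 153 63 976 2679 471
print(f"Satisfaction = {a:.2f} + {b:.4f} * dollars")
```

The second line prints "Satisfaction = 5.75 + 0.0358 * dollars", matching the fitted line above (0.036 after rounding).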
b)
satisfaction (Y) | Y Predicted | Residual
6 | 6.15 | -0.15
8 | 6.40 | 1.60
10 | 6.36 | 3.64
4 | 6.29 | -2.29
9 | 6.07 | 2.93
6 | 5.93 | 0.07
3 | 6.18 | -3.18
5 | 6.43 | -1.43
2 | 6.54 | -4.54
10 | 6.65 | 3.35
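The standard deviation of the error term is estimated from the residuals above as s = √[∑eᵢ² / (n − 2)], using n − 2 = 8 degrees of freedom because two parameters (a and b) are estimated. Here ∑eᵢ² = 73.67, so:
Standard Deviation = √(73.67 / 8) = 3.03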
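A minimal sketch of the residual calculation, assuming the usual regression estimator s = √(SSE/(n − 2)). For comparison it also prints the result of dividing by n − 1 instead (the ordinary sample SD of the residuals, whose mean is zero), to show how much the choice of divisor matters with only 10 points.

```python
import math

# Sample data from the question.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]
n = len(dollars)

x_bar = sum(dollars) / n
y_bar = sum(satisfaction) / n
sxx = sum((x - x_bar) ** 2 for x in dollars)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dollars, satisfaction))
b = sxy / sxx
a = y_bar - b * x_bar

# Residuals e_i = y_i - (a + b * x_i); least squares makes them sum to zero.
residuals = [y - (a + b * x) for x, y in zip(dollars, satisfaction)]
sse = sum(e * e for e in residuals)

# Regression estimate of the error SD: n - 2 df because a and b were estimated.
s = math.sqrt(sse / (n - 2))
# For comparison only: dividing by n - 1 instead.
sd_n_minus_1 = math.sqrt(sse / (n - 1))

print(f"SSE = {sse:.2f}, s = {s:.2f}, n - 1 version = {sd_n_minus_1:.2f}")
```

This prints SSE = 73.67, s = 3.03 and 2.86 for the n − 1 version; the n − 2 value is the one used for the standard error of the slope in parts e and f.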
c)
Correlation coefficient, using the totals from part a:
r = [n·∑XY − ∑X·∑Y] / √{[n·∑X² − (∑X)²]·[n·∑Y² − (∑Y)²]} = 121 / √(3381 × 741) = 0.0764
d)
R² = r² = 0.0764² = 0.0058
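The correlation coefficient and R² can be reproduced from the same column totals. This is a small sketch; the variable names `s_xy`, `s_xx` and `s_yy` are illustrative labels for the n-scaled numerator and denominator pieces of the formula.

```python
import math

# Sample data from the question.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]
n = len(dollars)

sum_x, sum_y = sum(dollars), sum(satisfaction)
# Pieces of the correlation formula, all built from the part a totals.
s_xy = n * sum(x * y for x, y in zip(dollars, satisfaction)) - sum_x * sum_y
s_xx = n * sum(x * x for x in dollars) - sum_x ** 2
s_yy = n * sum(y * y for y in satisfaction) - sum_y ** 2

r = s_xy / math.sqrt(s_xx * s_yy)
r_squared = r ** 2  # for simple linear regression, R^2 equals r^2

print(f"r = {r:.4f}, R^2 = {r_squared:.4f}")
```

The pieces come out to 121, 3381 and 741, giving r ≈ 0.0764 and R² ≈ 0.0058.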
e)
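90% CI for b1 = b ± t(0.05, 8) × SE(b), where SE(b) = s / √Sxx, with s = √(SSE/(n − 2)) = 3.03 and Sxx = ∑X² − (∑X)²/n = 338.1, so SE(b) = 3.03 / √338.1 = 0.165, and t(0.05, 8) = 1.860:
90% CI for b1 = 0.0358 ± 1.860 × 0.165 = (-0.271, 0.343)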
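A sketch of the interval calculation. The critical value t(0.05, 8) = 1.8595 is hard-coded from a standard t-table rather than computed, since Python's standard library has no t quantile function.

```python
import math

# Sample data from the question.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]
n = len(dollars)

x_bar = sum(dollars) / n
y_bar = sum(satisfaction) / n
sxx = sum((x - x_bar) ** 2 for x in dollars)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dollars, satisfaction))
b = sxy / sxx
a = y_bar - b * x_bar

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(dollars, satisfaction))
s = math.sqrt(sse / (n - 2))
se_b = s / math.sqrt(sxx)  # standard error of the slope

# Upper 5% point of the t distribution with n - 2 = 8 df (t-table value).
t_crit = 1.8595
lower, upper = b - t_crit * se_b, b + t_crit * se_b

print(f"SE(b1) = {se_b:.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
```

The interval comes out to about (-0.271, 0.343). Because it contains 0, it already anticipates the conclusion of the two-sided test at α = 0.1.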
f)
alpha = 0.1
Null and Alternative Hypotheses
H0: b1 = 0
Ha: b1 ≠ 0
Test Statistic
t = b / SE(b) = 0.0358 / 0.165 = 0.22
p-value = TDIST(0.2169, 10 − 2, 2) = 0.8337 (two-sided, using the unrounded t)
Result
Since the p-value is greater than 0.1, we fail to reject the null hypothesis
Conclusion
The model is not useful: the slope is not significantly different from zero, and the R² of the model is very low (0.58%), so monthly cost explains almost none of the variation in satisfaction.
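The test statistic and two-sided p-value can be cross-checked without Excel or SAS. This sketch swaps in a different technique: it integrates the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `two_sided_p` are hypothetical, written only for this illustration.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

# Sample data from the question.
dollars = [11, 18, 17, 15, 9, 5, 12, 19, 22, 25]
satisfaction = [6, 8, 10, 4, 9, 6, 3, 5, 2, 10]
n = len(dollars)

x_bar = sum(dollars) / n
y_bar = sum(satisfaction) / n
sxx = sum((x - x_bar) ** 2 for x in dollars)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(dollars, satisfaction)) / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(dollars, satisfaction))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)

t_stat = b / se_b
p_value = two_sided_p(t_stat, n - 2)
alpha = 0.1
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The statistic is about 0.217 and the p-value is close to 0.834, far above α = 0.1, so H0 is not rejected. As a sanity check, `two_sided_p(1.8595, 8)` should return roughly 0.10.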
g)
P-value using SAS = 0.8337
This agrees with the value calculated in part f, so the SAS output leads to the same conclusion: fail to reject H0 at α = 0.1.
SAS Code
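/* Drop any earlier copy of the imported table (SAS Studio helper macro). */
%web_drop_table(WORK.IMPORT);
/* Read the CSV file containing the dollars and satisfaction columns. */
FILENAME REFFILE '/home/rkvimal0/sasuser.v94/Data_5Feb20.csv';
PROC IMPORT DATAFILE=REFFILE
    DBMS=CSV
    OUT=WORK.IMPORT
    REPLACE;
    GETNAMES=YES;
RUN;
PROC CONTENTS DATA=WORK.IMPORT; RUN;

ods noproctitle;
ods graphics / imagemap=on;
/* ALPHA=0.1 with CLB prints 90% confidence limits for b1 (part e). */
proc reg data=WORK.IMPORT alpha=0.1 plots(only)=(diagnostics residuals
        fitplot observedbypredicted);
    model satisfaction=dollars / clb;
run;
quit;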