General guidelines:
Use EXCEL or PHStat to do the necessary computer work.
Do all the necessary analysis and hypothesis test constructions, and explain completely.
Read Chapter 13 of the textbook. Imagine that you are managing a mobile phone company. You want to construct a simple linear regression model, at the 95% confidence level, to represent the relationship between the number of customers and the annual sales level. You conducted a pilot study over the past fifteen years and collected the yearly observations given in the data below. The number of customers in a year is represented by the Profiled Customers variable, measured in millions of customers, and the sales level is represented by the Annual Sales variable, measured in millions of US dollars.
1) Investigate the agreement between the model and the data set for:
A) LINEARITY.
A1) Construct the "Dot Plot", a.k.a. "Scatter Plot," for this data. Visually inspect for the linear relationship between the number of customers and the sales level. Make comments based on your observations.
A2) Conduct the F-Test for linearity.
A3) If you have seen evidence of linearity in the F-Test, then:
Conduct the t-Test for the partial slope.
Construct the 95% Confidence Interval Estimator for the partial slope.
Thus, make comments about the linear relationship between the Profiled Customers and the Annual Sales, based on the partial slope information.
B) NORMALITY.
Construct the "Normal Probability Plot" for the Annual Sales variable, and make comments about the normality of annual sales level, based on your observations.
C) HOMOSCEDASTICITY.
Construct the "Residual Plot" and make comments about the variance of annual sales level, based on your observations.
D) INDEPENDENCE.
This data set is a Time-Series. Hence, investigate for the independence of observations in this time-series, based on the Durbin-Watson test.
2) If there is evidence of agreement between the model and data, and independence of observations, then construct the simple linear regression equation for this data set, based on the least-squares method.
2A) Construct the 95% confidence interval for the actual average annual sales level for all the years that you have 5 million customers in a year.
2B) Construct the 95% prediction interval for the actual annual sales level for one year that you have 5 million customers in that year.
Year | Profiled Customers (millions) | Annual Sales (million US dollars)
1 | 3.7 | 5.7
2 | 3.6 | 5.9
3 | 2.8 | 6.7
4 | 5.6 | 9.5
5 | 3.3 | 5.4
6 | 2.2 | 3.5
7 | 3.3 | 6.2
8 | 3.1 | 4.7
9 | 3.2 | 6.1
10 | 3.5 | 4.9
11 | 5.2 | 10.7
12 | 4.6 | 7.6
13 | 5.8 | 11.8
14 | 2.9 | 4.1
15 | 3.0 | 4.1
1) Investigate the agreement between the model and the data set for:
A) LINEARITY.
A1) Construct the "Dot Plot", a.k.a. "Scatter Plot," for this data. Visually inspect for the linear relationship between the number of customers and the sales level. Make comments based on your observations.
A2) Conduct the F-Test for linearity.
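The calculated F statistic is 75.6355 with a p-value reported as 0.0000 (that is, below 0.0001), which is less than the 0.05 level of significance. We therefore reject the null hypothesis of no linear relationship: there is a significant linear relationship between Annual Sales and Profiled Customers.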
Simple Linear Regression Analysis
Regression Statistics
Multiple R | 0.9238
R Square | 0.8533
Adjusted R Square | 0.8420
Standard Error | 0.9778
Observations | 15
ANOVA
 | df | SS | MS | F | Significance F
Regression | 1 | 72.3079 | 72.3079 | 75.6355 | 0.0000
Residual | 13 | 12.4281 | 0.9560 | |
Total | 14 | 84.7360 | | |
 | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept | -1.3885 | 0.9371 | -1.4817 | 0.1622 | -3.4130 | 0.6359
Profiled Customers | 2.1098 | 0.2426 | 8.6969 | 0.0000 | 1.5857 | 2.6339
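The ANOVA figures in this output can be reproduced by hand. The following is a minimal Python sketch, not the Excel/PHStat procedure itself; the variable names are illustrative, and the F critical value is an assumption obtained by squaring the two-tailed t value 2.160369 that PHStat reports for 13 degrees of freedom.

```python
# Data from the assignment table: customers (millions), sales (million USD).
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 2.9, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
ssx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

ssr = sxy ** 2 / ssx   # regression sum of squares, 1 degree of freedom
sse = sst - ssr        # residual sum of squares, n - 2 degrees of freedom
msr = ssr / 1
mse = sse / (n - 2)
f_stat = msr / mse
r_squared = ssr / sst

# Assumption: F(1, 13) critical value at alpha = 0.05, taken as the square
# of the two-tailed t critical value shown in the PHStat output.
f_crit = 2.160369 ** 2

print(f"SSR={ssr:.4f}  SSE={sse:.4f}  SST={sst:.4f}")
print(f"F={f_stat:.4f}  R^2={r_squared:.4f}  reject H0: {f_stat > f_crit}")
```

The printed sums of squares, F statistic, and R Square should agree with the ANOVA table to rounding.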
A3) If you have seen evidence of linearity in the F-Test, then:
Conduct the t-Test for the partial slope.
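The calculated t statistic is 8.6969 (13 degrees of freedom) with a p-value reported as 0.0000, which is less than the 0.05 level of significance. The slope is significantly different from zero.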
Construct the 95% Confidence Interval Estimator for the partial slope.
95% CI for slope = (1.5857, 2.6339).
Thus, make comments about the linear relationship between the Profiled Customers and the Annual Sales, based on the partial slope information.
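For each increase of 1 million in Profiled Customers, Annual Sales is estimated to increase by 2.1098 million US dollars on average. With 95% confidence, the true increase lies between 1.5857 and 2.6339 million US dollars; because this interval does not contain zero, the positive linear relationship is statistically significant.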
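The slope test and its confidence interval follow from the standard error of the slope. Below is a small Python sketch of that arithmetic under stated assumptions: the names are illustrative, and the critical value 2.160369 is copied from the PHStat output rather than computed.

```python
import math

# Data from the assignment table: customers (millions), sales (million USD).
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 2.9, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / ssx             # least-squares slope
b0 = y_bar - b1 * x_bar    # least-squares intercept

# Standard error of the estimate and of the slope.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))
se_b1 = s_yx / math.sqrt(ssx)

t_stat = b1 / se_b1        # tests H0: slope = 0
t_crit = 2.160369          # assumption: two-tailed t(13) at 95%, from PHStat
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print(f"b1={b1:.4f}  SE={se_b1:.4f}  t={t_stat:.4f}")
print(f"95% CI for slope: ({ci[0]:.4f}, {ci[1]:.4f})")
```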
B) NORMALITY.
Construct the "Normal Probability Plot" for the Annual Sales variable, and make comments about the normality of annual sales level, based on your observations.
The normal probability plot does not show a violation of the normality assumption.
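For readers without the plot at hand, the coordinates of a normal probability plot can be generated as follows. This is only a sketch: it assumes plotting positions of i / (n + 1), which is one common convention and may differ from what Excel or PHStat uses. Points that fall roughly on a straight line are consistent with normality.

```python
from statistics import NormalDist

# Annual Sales values (million USD) from the assignment table.
sales = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(sales)
ordered = sorted(sales)

# Assumption: cumulative plotting positions i / (n + 1); other conventions exist.
z_scores = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Each point pairs a standard normal quantile with an ordered sales value.
points = list(zip(z_scores, ordered))
for z, value in points:
    print(f"{z:7.3f}  {value:5.1f}")
```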
C) HOMOSCEDASTICITY.
Construct the "Residual Plot" and make comments about the variance of annual sales level, based on your observations.
The residual plot shows no pattern, so the homoscedasticity (equal variance) assumption is not violated.
D) INDEPENDENCE.
This data set is a Time-Series. Hence, investigate for the independence of observations in this time-series, based on the Durbin-Watson test.
Durbin-Watson Calculations
Sum of Squared Difference of Residuals | 36.4291
Sum of Squared Residuals | 12.4281
Durbin-Watson Statistic | 2.9312
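The Durbin-Watson statistic is D = 2.93. For n = 15 and one independent variable at the 0.05 level, the critical values are dL = 1.08 and dU = 1.36; since D is greater than dU, there is no evidence of positive autocorrelation. Note, however, that 2.93 is not close to 2: it slightly exceeds 4 - dL = 2.92, which suggests possible negative autocorrelation, so the independence assumption should be treated with caution.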
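The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal Python sketch of that calculation, with illustrative variable names and the observations kept in year order, is:

```python
# Data in time order: customers (millions), sales (million USD).
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 2.9, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# Residuals from the fitted line, in the original year order.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Numerator: squared differences of successive residuals.
sum_sq_diff = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
# Denominator: sum of squared residuals (SSE).
sum_sq_res = sum(e ** 2 for e in residuals)

dw = sum_sq_diff / sum_sq_res
print(f"numerator={sum_sq_diff:.4f}  denominator={sum_sq_res:.4f}  D={dw:.4f}")
```

The statistic ranges from 0 to 4; values near 2 indicate no first-order autocorrelation, values near 0 indicate positive autocorrelation, and values near 4 indicate negative autocorrelation.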
2) If there is evidence of agreement between the model and data, and independence of observations, then construct the simple linear regression equation for this data set, based on the least-squares method.
Annual Sales = -1.3885 + 2.1098 * Profiled Customers
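The coefficients in this equation come from the usual least-squares formulas: the slope is the sum of cross-products divided by the sum of squares of X, and the intercept makes the line pass through the point of means. A short Python sketch, where `predict_sales` is a hypothetical helper name introduced only for illustration:

```python
# Data from the assignment table: customers (millions), sales (million USD).
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 2.9, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: cross-products about the means over the sum of squares of x.
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
# Intercept: forces the fitted line through (x_bar, y_bar).
b0 = y_bar - b1 * x_bar


def predict_sales(customers_millions):
    """Predicted annual sales (million USD); hypothetical helper."""
    return b0 + b1 * customers_millions


print(f"Annual Sales = {b0:.4f} + {b1:.4f} * Profiled Customers")
print(f"Predicted sales at 5 million customers: {predict_sales(5):.4f}")
```

Predictions are only reliable inside the observed range of Profiled Customers (2.2 to 5.8 million).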
2A) Construct the 95% confidence interval for the actual average annual sales level for all the years that you have 5 million customers in a year.
95% CI = (8.2960, 10.0252) million US dollars.
2B) Construct the 95% prediction interval for the actual annual sales level for one year that you have 5 million customers in that year.
95% PI = (6.8782, 11.4430) million US dollars. The prediction interval is wider than the confidence interval because it covers a single year's sales rather than the average over all such years.
Confidence Interval Estimate
Data
X Value | 5
Confidence Level | 95%
Intermediate Calculations
Sample Size | 15
Degrees of Freedom | 13
t Value | 2.160369
XBar, Sample Mean of X | 3.72
Sum of Squared Differences from XBar | 16.244
Standard Error of the Estimate | 0.977755
h Statistic | 0.167529
Predicted Y (YHat) | 9.160576
For Average Y
Interval Half Width | 0.8646
Confidence Interval Lower Limit | 8.2960
Confidence Interval Upper Limit | 10.02515
For Individual Response Y
Interval Half Width | 2.2824
Prediction Interval Lower Limit | 6.8782
Prediction Interval Upper Limit | 11.44298
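The intermediate calculations in this output can be retraced step by step. The sketch below is one way to do so in Python; the names are illustrative, and the t value is copied from the PHStat output rather than computed. The h statistic is 1/n plus the squared distance of X from its mean divided by the sum of squared differences from XBar.

```python
import math

# Data from the assignment table: customers (millions), sales (million USD).
x = [3.7, 3.6, 2.8, 5.6, 3.3, 2.2, 3.3, 3.1, 3.2, 3.5, 5.2, 4.6, 5.8, 2.9, 3.0]
y = [5.7, 5.9, 6.7, 9.5, 5.4, 3.5, 6.2, 4.7, 6.1, 4.9, 10.7, 7.6, 11.8, 4.1, 4.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate

x_value = 5.0                     # 5 million customers
t_value = 2.160369                # assumption: t(13), 95%, from PHStat output

y_hat = b0 + b1 * x_value
h = 1 / n + (x_value - x_bar) ** 2 / ssx

# Confidence interval for the mean response at x_value.
ci_half = t_value * s_yx * math.sqrt(h)
ci = (y_hat - ci_half, y_hat + ci_half)

# Prediction interval for a single response at x_value (note the extra 1).
pi_half = t_value * s_yx * math.sqrt(1 + h)
pi = (y_hat - pi_half, y_hat + pi_half)

print(f"YHat={y_hat:.6f}  h={h:.6f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})  half width {ci_half:.4f}")
print(f"95% PI: ({pi[0]:.4f}, {pi[1]:.4f})  half width {pi_half:.4f}")
```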