In: Statistics and Probability
The LA Times, a well known paper has been going through some difficulties due to the advent of online news from Facebook and other media sources. The paper decides to study other major papers in big cities within the US. The data analyst, Michael has realized that the key variables related to the number of subscriptions are the population of the area, the advertising budget, average income of the families as well as average age of the labor force in that city
Subscriptions |
= |
Number of subscriptions ( in thousands) |
Population |
= |
The metropolitan population ( in thousands) |
Advertising budget |
= |
The advertising budget of the paper ( in $ thousands) |
Income |
= |
The median family income in the metropolitan area ( in $ thousands) |
Subscriptions |
Population |
Advertising Budget |
Income |
median age |
38.5 |
591.3 |
29.4 |
36.2 |
36 |
37.95 |
588.9 |
13.2 |
35.1 |
35 |
37.66 |
585.3 |
13.2 |
34.7 |
37 |
37.55 |
566.3 |
19.8 |
34.8 |
37 |
38.78 |
642.9 |
17.6 |
35.1 |
37 |
37.67 |
624.2 |
17.6 |
34.6 |
37 |
38.23 |
603.9 |
15.4 |
34.8 |
36 |
36.9 |
571.9 |
11 |
34.7 |
33 |
38.28 |
584.3 |
28.6 |
35.3 |
40 |
38.95 |
605 |
28.6 |
35.1 |
29 |
39.27 |
676.3 |
17.6 |
35.6 |
39 |
38.3 |
587.4 |
17.6 |
34.9 |
38 |
38.84 |
576.4 |
22 |
35.4 |
38 |
38.14 |
570.8 |
17.6 |
35 |
38 |
38.39 |
586.5 |
15.4 |
35.5 |
37 |
37.29 |
544 |
11 |
34.9 |
37 |
39.15 |
611.1 |
24.2 |
35 |
39 |
38.29 |
643.3 |
17.6 |
35.3 |
39 |
38.09 |
635.6 |
19.8 |
34.8 |
38 |
37.83 |
598.9 |
15.4 |
35.1 |
39 |
39.37 |
657 |
22 |
35.3 |
42 |
37.81 |
595.2 |
15.4 |
35.1 |
42 |
37.42 |
520 |
19.8 |
35.1 |
42 |
38.83 |
629.6 |
22 |
35.3 |
42 |
38.33 |
680 |
24.2 |
34.7 |
42 |
40.24 |
651.2 |
33 |
35.8 |
40 |
41.22 |
653.1 |
34 |
36.21 |
29 |
38.12 |
640 |
19.3 |
34.2 |
33 |
40 |
641.3 |
19.9 |
34.4 |
44 |
39.2 |
656 |
21 |
35.9 |
44 |
Solution
we will solve it by using excel and the steps are
Enter the Data into excel
Click on Data tab
Click on Data Analysis
Select Regression
Select input Y Range as Range of dependent variable.(subscriptions )
Select Input X Range as Range of independent variable (population, advertising budget, income and median age)
click on labels if your selecting data with labels
click on ok.
So this is the output of Regression in Excel.
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.8417 | |||||||
R Square | 0.7085 | |||||||
Adjusted R Square | 0.6618 | |||||||
Standard Error | 0.5382 | |||||||
Observations | 30.0000 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 4.0000 | 17.5997 | 4.3999 | 15.1887 | 0.0000 | |||
Residual | 25.0000 | 7.2421 | 0.2897 | |||||
Total | 29.0000 | 24.8419 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 99.0% | Upper 99.0% | |
Intercept | 15.1060 | 8.8167 | 1.7133 | 0.0990 | -3.0522 | 33.2643 | -9.4699 | 39.6820 |
Population | 0.0106 | 0.0028 | 3.8014 | 0.0008 | 0.0049 | 0.0163 | 0.0028 | 0.0184 |
Advertising Budget | 0.0657 | 0.0216 | 3.0398 | 0.0055 | 0.0212 | 0.1101 | 0.0055 | 0.1259 |
Income | 0.4468 | 0.2517 | 1.7751 | 0.0881 | -0.0716 | 0.9653 | -0.2548 | 1.1485 |
median age | -0.0029 | 0.0277 | -0.1063 | 0.9162 | -0.0599 | 0.0540 | -0.0800 | 0.0741 |
subscriptions = 15.1060+0.0106*Population+0.0657*Advertising Budget+0.4468*Income - 0.0029*median age
ANOVA | |||||
df | SS | MS | F | Significance F | |
Regression | 4.0000 | 17.5997 | 4.3999 | 15.1887 | 0.0000 |
Residual | 25.0000 | 7.2421 | 0.2897 | ||
Total | 29.0000 | 24.8419 |
Since the P-value = 0.0000 < 0.01 level of significance so we reject the null hypothesis and conclude that is there a significant relationship between subscriptions and population, advertising budget, income and median age of subscribers.
the multiple regression model
subscriptions = 15.1060+0.0106*Population+0.0657*Advertising Budget+0.4468*Income - 0.0029*median age
subscriptions = 15.1060+0.0106*450+0.0657*25+0.4468*33 - 0.0029*37
subscriptions = 36.16