Label both axes with words.
In order to analyze the data we will use the line of best fit.
12) Can the regression line be used for prediction?
Quarter | Number in millions |
Q4 '08 | 100 |
Q1 '09 | 197 |
Q2 '09 | 242 |
Q3 '09 | 305 |
Q4 '09 | 360 |
Q1 '10 | 431 |
Q2 '10 | 482 |
Q3 '10 | 550 |
Q4 '10 | 608 |
Q1 '11 | 680 |
Q2 '11 | 739 |
Q3 '11 | 800 |
Q4 '11 | 845 |
Q1 '12 | 901 |
Q2 '12 | 955 |
Q3 '12 | 1,007 |
Q4 '12 | 1,056 |
Q1 '13 | 1,110 |
Q2 '13 | 1,155 |
Q3 '13 | 1,189 |
Q4 '13 | 1,228 |
Q1 '14 | 1,276 |
Q2 '14 | 1,317 |
Q3 '14 | 1,350 |
Q4 '14 | 1,393 |
Q1 '15 | 1,441 |
Q2 '15 | 1,490 |
Q3 '15 | 1,545 |
Q4 '15 | 1,591 |
Q1 '16 | 1,654 |
Q2 '16 | 1,712 |
Q3 '16 | 1,788 |
Q4 '16 | 1,860 |
Q1 '17 | 1,936 |
Answer:
Dependent variable: Number in millions
Independent variable: Period (Quarterly)
Explanation:
Since we want to predict the number of users (in millions) from the quarter, the number of users depends on the time period. Hence the number of users is the dependent variable and the quarterly time period is the independent variable.
Scatterplot
Answer:
Explanation:
The scatterplot is obtained in Excel by following these steps:
Step 1: Enter the data values in Excel. The screenshot is shown below.
Step 2: Select the Time and Number of users columns, then choose INSERT > Recommended Chart > XY Scatter > OK.
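The Time column selected in Step 2 is not part of the original table, so it has to be built first. The sketch below shows one way to do that in Python, assuming the quarters are numbered consecutively with Q4 '08 as Time = 1 (this numbering is implied by the prediction step, where Q1 2008 corresponds to Time = -2). The function name `quarter_labels` and the variable names are illustrative, not taken from the original answer.

```python
def quarter_labels(start_quarter, start_year, count):
    """Generate labels such as "Q4 '08" for `count` consecutive quarters."""
    labels = []
    quarter, year = start_quarter, start_year
    for _ in range(count):
        labels.append(f"Q{quarter} '{year % 100:02d}")
        quarter += 1
        if quarter > 4:  # roll over into the next calendar year
            quarter, year = 1, year + 1
    return labels


# Number of users in millions, copied from the table above.
users = [
    100, 197, 242, 305, 360, 431, 482, 550, 608, 680, 739, 800,
    845, 901, 955, 1007, 1056, 1110, 1155, 1189, 1228, 1276, 1317,
    1350, 1393, 1441, 1490, 1545, 1591, 1654, 1712, 1788, 1860, 1936,
]

labels = quarter_labels(4, 2008, len(users))
time = list(range(1, len(users) + 1))  # assumption: Q4 '08 is Time = 1

# (Time, users) pairs are the points that go on the scatterplot.
points = list(zip(time, users))
print(labels[0], points[0])
print(labels[-1], points[-1])
```

The first and last printed pairs can be checked against the first and last rows of the table to confirm that no quarter was skipped.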
Line of best fit.
Answer:
From the scatterplot, we can observe a positive linear trend between the number of users and the quarterly time. There is a strong association between the two variables: the data values lie close to a straight line, which is why the linear trendline fits them very well.
y-intercept, a= 127.604
Slope to 3 decimal places, b= 52.024
The correlation coefficient, r = 0.9976
n = 34
The coefficient of determination, r^2 = 0.995
Line of best fit equation: y = a + bx = 127.604 + 52.024x, where x = Time (quarter number) and y = number of users in millions
There is a very strong positive linear correlation between the time and the number of users.
The t-test in the parameter estimation table shows that the slope coefficient is statistically significant at the 5% significance level (i.e. p-value = 1.31E-18 < 0.05), which indicates a significant positive association between the time and the number of users.
The coefficient of determination (R-square value) tells how well the regression model fits the data values. The R-square value of the model is 0.9952, which means the model explains approximately 99.52% of the variance in the number of users. Based on this evidence, we can conclude that the model is a good fit.
Hence we can predict the number of users based on the time very accurately.
Explanation:
The best-fit line and the regression analysis are done in Excel by following these steps:
Step 1: Enter the data values in Excel. The screenshot is shown below.
Step 2: DATA > Data Analysis > Regression > OK. The screenshot is shown below.
Step 3: Select Input Y Range: 'Number of users' column, Input X Range: 'Time' column, tick Line Fit Plots, then click OK. The screenshot is shown below.
The result is obtained. The screenshot is shown below.
Best-fit line
The regression equation is defined as
y = a + bx
where a = intercept and b = slope of the best-fit equation.
From the regression output summary, a = 127.604 and b = 52.024, so
y = 127.604 + 52.024x
Correlation coefficient
From the regression output summary,
Multiple R | 0.997573 |
Coefficient of determination
From the regression output summary,
R Square | 0.995151 |
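The values read from the Excel output can be cross-checked outside Excel. The sketch below is a plain least-squares fit in Python using the textbook formulas b = Sxy / Sxx, a = mean(y) - b * mean(x), and r = Sxy / sqrt(Sxx * Syy). It assumes Time runs from 1 (Q4 '08) to 34 (Q1 '17); the function name `least_squares` is illustrative and is not part of the original answer.

```python
import math

# Number of users in millions, copied from the table.
users = [
    100, 197, 242, 305, 360, 431, 482, 550, 608, 680, 739, 800,
    845, 901, 955, 1007, 1056, 1110, 1155, 1189, 1228, 1276, 1317,
    1350, 1393, 1441, 1490, 1545, 1591, 1654, 1712, 1788, 1860, 1936,
]
time = list(range(1, len(users) + 1))  # assumption: Q4 '08 is Time = 1


def least_squares(x, y):
    """Return (intercept, slope, correlation) for the line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


a, b, r = least_squares(time, users)
print(f"n = {len(users)}")
print(f"a = {a:.3f}, b = {b:.3f}, r = {r:.4f}, r^2 = {r * r:.4f}")
```

If the Time numbering assumption holds, the printed intercept, slope, r and r^2 should agree with the Excel values quoted above to the reported number of decimal places.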
Prediction
Answer:
For the first quarter of 2008 => total monthly users = 23.556 million
For the first quarter of 2020 => total monthly users = 2520.722 million
Explanation:
The regression equation is
y = 127.604 + 52.024 × Time
For the first quarter of 2008 => Time = -2
For the first quarter of 2020 => Time = 46
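The two predictions can be reproduced by substituting the Time values into the regression equation. The sketch below does this in Python with the rounded coefficients a = 127.604 and b = 52.024 from the answer; the function name `predict_users` is illustrative. With rounded coefficients the Q1 2020 value comes out at about 2520.708 rather than 2520.722, presumably because Excel carries the unrounded coefficients through the calculation.

```python
A = 127.604  # intercept, rounded to 3 decimal places as reported
B = 52.024   # slope, rounded to 3 decimal places as reported


def predict_users(time):
    """Predicted number of users (millions) for a given quarter number."""
    return A + B * time


for label, t in [("Q1 2008", -2), ("Q1 2020", 46)]:
    print(f"{label}: Time = {t}, predicted users = {predict_users(t):.3f} million")
```

Both Time values lie outside the observed range of 1 to 34, so these are extrapolations of the fitted line rather than interpolations within the data.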