( PARTS 5-8 Only )
1. Generate a scatter plot for CREDIT BALANCE vs. SIZE, including the graph of the "best fit" line. Interpret.
2. Determine the equation of the "best fit" line, which describes the relationship between CREDIT BALANCE and SIZE. Interpret the values for slope and intercept.
3. Determine the coefficient of correlation. Interpret.
4. Determine the coefficient of determination. Interpret.
5. Test the utility of this regression model (use a two tail test with α=.05) by setting up the appropriate test of hypothesis. Interpret your results, including the p-value.
6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT BALANCE? Explain.
7. Compute the 98% confidence interval for β1 (the population slope). Interpret this interval.
8. What can we say about the credit balance for a customer that has a household size of 9? Explain your answer.
Location | Income ($1000) | Size | Years | Credit Balance ($) |
Urban | 54 | 3 | 12 | 4,016 |
Rural | 30 | 2 | 12 | 3,159 |
Suburban | 32 | 4 | 17 | 5,100 |
Suburban | 50 | 5 | 14 | 4,742 |
Rural | 31 | 2 | 4 | 1,864 |
Urban | 55 | 2 | 9 | 4,070 |
Rural | 37 | 1 | 20 | 2,731 |
Urban | 40 | 2 | 7 | 3,348 |
Suburban | 66 | 4 | 10 | 4,764 |
Urban | 51 | 3 | 16 | 4,110 |
Urban | 25 | 3 | 11 | 4,208 |
Urban | 48 | 4 | 16 | 4,219 |
Rural | 27 | 1 | 19 | 2,477 |
Rural | 33 | 2 | 12 | 2,514 |
Urban | 65 | 3 | 12 | 4,214 |
Suburban | 63 | 4 | 13 | 4,965 |
Urban | 55 | 6 | 15 | 4,412 |
Urban | 21 | 2 | 18 | 2,448 |
Rural | 44 | 1 | 7 | 2,995 |
Urban | 37 | 5 | 5 | 4,171 |
Suburban | 62 | 6 | 13 | 5,678 |
Urban | 21 | 3 | 16 | 3,623 |
Suburban | 55 | 7 | 15 | 5,301 |
Rural | 42 | 2 | 19 | 3,020 |
Urban | 41 | 7 | 18 | 4,828 |
Suburban | 54 | 6 | 14 | 5,573 |
Rural | 30 | 1 | 14 | 2,583 |
Urban | 48 | 2 | 8 | 3,866 |
Urban | 34 | 5 | 5 | 3,586 |
Suburban | 67 | 4 | 13 | 5,037 |
Rural | 50 | 2 | 11 | 3,605 |
Urban | 67 | 5 | 1 | 5,345 |
Urban | 55 | 6 | 10 | 5,370 |
Urban | 52 | 2 | 11 | 3,890 |
Urban | 62 | 3 | 2 | 4,705 |
Urban | 64 | 2 | 6 | 4,157 |
Suburban | 22 | 3 | 18 | 3,899 |
Urban | 29 | 4 | 4 | 3,890 |
Suburban | 39 | 2 | 18 | 2,972 |
Rural | 35 | 1 | 11 | 3,121 |
Urban | 39 | 4 | 15 | 4,183 |
Suburban | 54 | 3 | 9 | 3,730 |
Suburban | 23 | 6 | 18 | 4,127 |
Rural | 27 | 2 | 1 | 2,921 |
Urban | 26 | 7 | 17 | 4,603 |
Suburban | 61 | 2 | 14 | 4,273 |
Rural | 30 | 2 | 14 | 3,067 |
Rural | 22 | 4 | 16 | 3,074 |
Suburban | 46 | 5 | 13 | 4,820 |
Suburban | 66 | 4 | 20 | 5,149 |
Rural | 53 | 1 | 7 | 2,845 |
Urban | 44 | 6 | 5 | 3,962 |
Suburban | 74 | 7 | 12 | 5,394 |
Urban | 25 | 3 | 15 | 3,442 |
Suburban | 66 | 7 | 14 | 5,036 |
1. Generate a scatter plot for CREDIT BALANCE vs. SIZE, including the graph of the "best fit" line. Interpret.
Select the Size and Credit Balance data, go to the Insert tab, and choose the scatter plot chart.
- Select one of the data points, right-click, and choose "Add Trendline".
- In the Format Trendline pane on the right, check "Display Equation on chart" and "Display R-squared value on chart".
The points rise from left to right along the trendline, so credit balance tends to increase with household size.
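2. Determine the equation of the "best fit" line, which describes the relationship between CREDIT BALANCE and SIZE. Interpret the values for slope and intercept.
The trendline added in part 1 already gives the best-fit line, with its slope and intercept:
y = 385.06x + 2620.3
Slope: each additional household member is associated with an average increase of about $385.06 in credit balance.
Intercept: the model predicts a balance of about $2,620.30 for a household of size 0, which has no practical meaning because no household in the data has size 0.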
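The trendline equation can be cross-checked outside Excel. The following is a minimal sketch of an ordinary least-squares fit in plain Python; the balances are copied from the data table and grouped by household size, and the names `BALANCES_BY_SIZE` and `least_squares` are illustrative, not part of the original solution.

```python
# Credit balances ($) from the data table, grouped by household size.
BALANCES_BY_SIZE = {
    1: [2731, 2477, 2995, 2583, 3121, 2845],
    2: [3159, 1864, 4070, 3348, 2514, 2448, 3020, 3866, 3605, 3890,
        4157, 2972, 2921, 4273, 3067],
    3: [4016, 4110, 4208, 4214, 3623, 4705, 3899, 3730, 3442],
    4: [5100, 4764, 4219, 4965, 5037, 3890, 4183, 3074, 5149],
    5: [4742, 4171, 3586, 5345, 4820],
    6: [4412, 5678, 5573, 5370, 4127, 3962],
    7: [5301, 4828, 4603, 5394, 5036],
}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Flatten the grouped data into parallel lists of sizes and balances.
sizes = [size for size, bals in BALANCES_BY_SIZE.items() for _ in bals]
balances = [bal for bals in BALANCES_BY_SIZE.values() for bal in bals]

slope, intercept = least_squares(sizes, balances)
print(f"y = {slope:.2f}x + {intercept:.1f}")
```

The printed coefficients should agree with the Excel trendline up to rounding.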
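3. Determine the coefficient of correlation. Interpret.
=CORREL(C3:C57,D3:D57)
(the two ranges must hold the Size and Credit Balance values)
r = 0.7632
The positive value of r indicates a positive linear association: larger households tend to carry higher credit balances.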
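4. Determine the coefficient of determination. Interpret.
The R-squared value is also shown on the scatter plot from part 1.
A simple way to calculate it is "r * r":
r-squared = 0.7632 * 0.7632
= 0.5825
About 58% of the variation in credit balance is explained by household size.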
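As a check on the CORREL result, r and r-squared can be computed directly from the sums of squares. This is a sketch in plain Python, assuming the balances below were copied correctly from the data table; the variable names are illustrative.

```python
import math

# Credit balances ($) from the data table, grouped by household size.
BALANCES_BY_SIZE = {
    1: [2731, 2477, 2995, 2583, 3121, 2845],
    2: [3159, 1864, 4070, 3348, 2514, 2448, 3020, 3866, 3605, 3890,
        4157, 2972, 2921, 4273, 3067],
    3: [4016, 4110, 4208, 4214, 3623, 4705, 3899, 3730, 3442],
    4: [5100, 4764, 4219, 4965, 5037, 3890, 4183, 3074, 5149],
    5: [4742, 4171, 3586, 5345, 4820],
    6: [4412, 5678, 5573, 5370, 4127, 3962],
    7: [5301, 4828, 4603, 5394, 5036],
}

pairs = [(size, bal) for size, bals in BALANCES_BY_SIZE.items() for bal in bals]
n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
syy = sum((y - y_bar) ** 2 for _, y in pairs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
r_squared = r ** 2               # coefficient of determination
print(f"r = {r:.4f}, r-squared = {r_squared:.4f}")
```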
5. Test the utility of this regression model (use a two tail test with α=.05) by setting up the appropriate test of hypothesis. Interpret your results, including the p-value.
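Hypotheses: H0: β1 = 0 (SIZE has no linear relationship with CREDIT BALANCE) versus Ha: β1 ≠ 0.
In the Excel regression output, Significance F is the p-value for this test.
A p-value is the probability of seeing a result at least as extreme as the one observed if the null hypothesis were true. In hypothesis testing, when the p-value is less than the alpha level you selected (here 0.05), you reject the null hypothesis in favor of the alternative hypothesis.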
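For a simple linear regression, the test statistic for the slope can be recovered from r and n alone, since t = r * sqrt(n - 2) / sqrt(1 - r^2). The sketch below uses the r reported in part 3 and the 55 rows of the table; the critical value 2.006 is an approximate t-table entry for 53 degrees of freedom and is an assumption, not a figure from the original answer.

```python
import math

r = 0.7632    # sample correlation reported in part 3
n = 55        # number of customers in the data table
df = n - 2    # degrees of freedom for the slope test

# Slope t statistic expressed through the correlation coefficient.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Assumption: approximate two-tailed 5% critical value for 53 df,
# read from a t-table rather than computed here.
T_CRIT = 2.006

reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, reject H0 at alpha = 0.05: {reject_h0}")
```

A statistic this far beyond the critical value corresponds to a p-value well below 0.05, which is consistent with rejecting H0.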
6. Based on your findings in 1-5, what is your opinion about using SIZE to predict CREDIT BALANCE? Explain.
An R-squared value between 0.5 and 0.7 is generally considered a moderate effect size, so the R-squared of 0.58 indicates that SIZE has a moderate effect on CREDIT BALANCE.
The best-fit line also clearly shows a positive linear relationship between size and credit balance.
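7. Compute the 98% confidence interval for β1 (the population slope). Interpret this interval.
Find the slope estimate b1, its standard error SE(b1), and the sample size n, then compute b1 ± t * SE(b1), where t is the critical value for 98% confidence with n - 2 = 53 degrees of freedom. In Excel, the Regression tool reports this interval when its confidence level is set to 98%.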
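A confidence interval for the slope has the form b1 ± t * SE(b1). The sketch below derives SE(b1) from the slope and correlation reported earlier, using t = b1 / SE(b1). The critical value 2.399 is an approximate t-table entry for 53 degrees of freedom with 1% in each tail; it is an assumption and should be checked against a table or the Excel output.

```python
import math

b1 = 385.06   # slope of the best-fit line from part 2
r = 0.7632    # correlation from part 3
n = 55        # number of customers in the data table
df = n - 2

# The slope t statistic equals b1 / SE(b1), so SE(b1) = b1 / t.
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
se_b1 = b1 / t_stat

# Assumption: approximate critical value for 98% confidence, 53 df.
T_CRIT_98 = 2.399

margin = T_CRIT_98 * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"98% CI for the slope: ({lower:.1f}, {upper:.1f})")
```

Because the whole interval lies above zero, it supports a positive relationship: each additional household member is associated with a higher average credit balance.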
8. What can we say about the credit balance for a customer that has a household size of 9? Explain your answer.
The largest value of the predictor variable (size) used to fit the regression model is 7, which is well below 9. Predicting at size 9 would be extrapolation beyond the range of the data, so we cannot use this model to accurately estimate the credit balance for a customer that has a household size of 9.
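The extrapolation concern can be made explicit with a small guard around the fitted line. This is an illustrative sketch; `predict_balance` is a hypothetical helper, and the range 1 to 7 is the range of household sizes in the data table.

```python
SLOPE, INTERCEPT = 385.06, 2620.3   # best-fit line from part 2
MIN_SIZE, MAX_SIZE = 1, 7           # range of SIZE observed in the data


def predict_balance(size):
    """Return (predicted balance, True if size is outside the observed range)."""
    prediction = SLOPE * size + INTERCEPT
    is_extrapolation = not (MIN_SIZE <= size <= MAX_SIZE)
    return prediction, is_extrapolation


for size in (4, 9):
    balance, extrapolated = predict_balance(size)
    note = "extrapolation, not reliable" if extrapolated else "within data range"
    print(f"size {size}: ${balance:,.2f} ({note})")
```

The formula still returns a number for size 9, but the flag shows it falls outside the data used to fit the line, so the value should not be treated as a trustworthy estimate.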