The consumer food database contains five variables: Annual Food Spending per Household, Annual Household Income, Non-Mortgage Household Debt, Geographic Region of the U.S. of the Household, and Household Location. There are 200 entries for each variable in this database representing 200 different households from various regions and locations in the United States. Annual Food Spending per Household, Annual Household Income, and Non-Mortgage Household Debt are all given in dollars. The variable Region tells in which one of four regions the household resides. In this variable, the Northeast is coded as 1, the Midwest is coded 2, the South is coded as 3, and the West is coded as 4. The variable Location is coded as 1 if the household is in a metropolitan area and 2 if the household is outside a metro area. The data in this database were randomly derived and developed based on actual national norms.
Provide a detailed 1,600-word statistical report that includes the following:
This assignment is broken down into four parts:
Part 1 - Preliminary Analysis (3-4 paragraphs)
Generally, as a statistics consultant, you will be given a problem and data. At times, you may have to gather additional data. For this assignment, assume all the data is already gathered for you.
State the objective:
Describe the population in the study clearly and in sufficient detail:
Discuss the types of data and variables:
Part 2 - Descriptive Statistics (3-4 paragraphs)
Examine the given data.
Present the descriptive statistics (mean, median, mode, range, standard deviation, variance, CV, and five-number summary).
Identify any outliers in the data.
Present any graphs or charts you think are appropriate for the data.
Note: Ideally, we want to assess the conditions of normality too. However, for the purpose of this exercise, assume data is drawn from normal populations.
Part 3 - Inferential Statistics (2-3 paragraphs)
Use the Part 3: Inferential Statistics document.
Hint: A final conclusion saying "reject the null hypothesis" by itself without explanation is basically worthless to those who hired you. Similarly, stating the conclusion is false or rejected is not sufficient.
Part 4 - Conclusion and Recommendations (1-2 paragraphs)
Include the following:
Preliminary Analysis
The purpose of this case study is to statistically analyze the consumer food spending data provided by the University of Phoenix for the four regions of the United States, with emphasis on the Midwest, coded as Region 2 in the data set. The analysis uses five variables measured on a sample of 200 households and is centered on three objectives: (1) test whether the average annual food spending for a household in the Midwest region of the U.S. is more than $8,000, using a 1% level of significance; (2) test whether there is a significant difference in annual food spending between households in metro areas and households outside metro areas, using α = 0.01; and (3) perform three one-way ANOVAs, one for each of the three dependent variables (Annual Food Spending, Annual Household Income, and Non-Mortgage Household Debt), using Region as the independent variable with four classification levels (the four regions of the U.S.), and find all significant differences by region.
The central question of the case study is, "Is the average annual food spending for a household located in the Midwest region of the United States greater than $8,000?" The population of interest is households in the United States; the 200 households in the database are a sample from that population, drawn from various regions and locations. The independent variables are qualitative. Region is coded as 1 = Northeast, 2 = Midwest, 3 = South, and 4 = West. Location is coded as 1 if the household is in a metropolitan area and 2 if the household is outside a metropolitan area.
The data set also contains three quantitative variables that serve as the dependent variables in the case study: Annual Food Spending per Household, Annual Household Income, and Non-Mortgage Household Debt, all measured in U.S. dollars and broken down by region and location. These are continuous variables measured on a ratio scale, because dollar amounts have a true zero and ratios between them are meaningful (a household that spends $10,000 a year on food spends twice as much as one that spends $5,000). Region and Location, by contrast, are nominal variables: their numeric codes are only category labels, so averaging or ordering the codes has no meaning. Region is the independent variable for the ANOVAs, and Location is the grouping variable for the comparison of metro and non-metro households.
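To make the distinction between nominal codes and ratio-scale dollar amounts concrete, the short Python sketch below (standard library only) decodes Region and Location into labels and then groups a dollar variable by those labels. This is the kind of split used for the Midwest subset in Test 1, the metro comparison in Test 2, and the regional groups in Test 3. The record layout, the field names `food`, `region`, and `location`, and the rows themselves are hypothetical illustrations, not entries from the database.

```python
from collections import defaultdict
from statistics import mean

# Codes as described for the consumer food database; the numbers are labels only.
REGION = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}
LOCATION = {1: "Metro", 2: "Outside metro"}


def group_by(records, key, labels, value):
    """Collect a dollar-valued (ratio) variable under the decoded label of a nominal code."""
    groups = defaultdict(list)
    for rec in records:
        groups[labels[rec[key]]].append(rec[value])
    return dict(groups)


# Hypothetical rows (field names assumed), for illustration only.
records = [
    {"food": 8100, "region": 2, "location": 1},
    {"food": 9300, "region": 1, "location": 1},
    {"food": 7600, "region": 2, "location": 2},
    {"food": 10200, "region": 4, "location": 1},
]

by_region = group_by(records, "region", REGION, "food")
by_location = group_by(records, "location", LOCATION, "food")
midwest_mean = mean(by_region["Midwest"])  # averages dollars, never the codes
print(by_region, by_location, midwest_mean)
```

Keeping the codes in a lookup table means the analysis never does arithmetic on them; only the dollar values are averaged.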
Descriptive Statistics
Descriptive statistics are important tools for describing the main features of a data set. They provide simple summaries of a sample, such as measures of center (mean, median, mode) and measures of spread (range, standard deviation, variance, coefficient of variation). The overall function of descriptive statistics is to describe what the data are and what the data show, and they help simplify large amounts of data in a sensible way ("Descriptive Statistics", 2017). The descriptive statistics in this section of the case study were produced with the Descriptive Statistics function of the Analysis ToolPak in Microsoft Excel 2016. Two outliers were identified. The first was in Annual Food Spending, where a value of 17740 lies above the upper bound of 16974. The second was in Annual Household Income, where a value of 96132 lies above the upper bound for that variable. No outliers were identified in Non-Mortgage Household Debt. The data tables below present the mean, median, mode, range, standard deviation, variance, CV, and five-number summary for each variable.
A.) Descriptive Analysis for Consumer Food data: Annual Food Spending
B.) Descriptive Analysis for Consumer Food data: Annual Household Income
C.) Descriptive Analysis for Consumer Food data: Non-Mortgage Household Debt
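The statistics listed above can be sketched in Python with the standard `statistics` module (Python 3.8 or later). This is a minimal illustration, not the Excel procedure used for the tables. It assumes outliers are flagged with the common 1.5 x IQR rule (values below Q1 - 1.5 IQR or above Q3 + 1.5 IQR), because the report does not state which rule produced the upper bound of 16974. Quartile conventions also differ between tools, so fences computed this way may not match Excel's exactly. The sample values are hypothetical.

```python
import statistics


def describe(values):
    """Descriptive statistics for one variable, using sample (n - 1) variance as Excel does."""
    data = sorted(values)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    # Quartiles by linear interpolation ("inclusive" method, as in Excel's QUARTILE.INC).
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: 1.5 x IQR fences.
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.multimode(data),  # every most-frequent value
        "range": data[-1] - data[0],
        "std_dev": sd,
        "variance": statistics.variance(data),
        "cv_percent": 100 * sd / mean,
        "five_number": (data[0], q1, q2, q3, data[-1]),
        "fences": (lower_fence, upper_fence),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical annual food spending values (dollars), for illustration only.
sample = [6200, 7100, 7800, 8000, 8000, 8900, 9600, 21000]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that `statistics.multimode` returns every value when no value repeats, whereas Excel's MODE function returns #N/A in that case.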
Inferential Analysis
In this part of the case study, several inferential tests are run, in which conclusions about the population are drawn from the sample data. The first test uses the Midwest region data to determine whether the average annual food spending for a household in the Midwest region of the U.S. is more than $8,000, at a 1% level of significance. The second test determines whether there is a significant difference in annual food spending between households in metro areas and households outside metro areas, using α = 0.01. The third test analyzes the quantitative variables annual food spending, annual household income, and non-mortgage household debt by region to determine whether their means differ significantly.
Test 1
To test whether the average Annual Food Spending per Household in the Midwest region of the U.S. is more than $8,000, the data were filtered to Region 2 (the Midwest), giving 45 households with a sample mean of $8,659.69. A one-tailed z-test was run to test the null hypothesis that the average annual food spending in the Midwest region equals $8,000 against the alternative hypothesis that it is greater than $8,000; as the output below shows, it was run with Excel's z-Test: Two Sample for Means tool, with a second column fixed at $8,000. The test statistic z = 1.34 is below the one-tailed critical value of 2.33, and the one-tailed p-value (0.090) is greater than α = 0.01, so the test fails to reject the null hypothesis. The Midwest sample average is above $8,000, but not by enough to conclude at the 1% level that the Midwest population average exceeds $8,000.
Test Hypothesis:
H0: µ = 8000
H1: µ > 8000
Test Statistics:
| z-Test: Two Sample for Means | Annual Food Spending | Test |
|---|---|---|
| Mean | 8659.688889 | 8000 |
| Known Variance | 5449631 | 5449631 |
| Observations | 45 | 45 |
| Hypothesized Mean Difference | 0 | |
| z | 1.34043846 | |
| P(Z<=z) one-tail | 0.09005142 | |
| z Critical one-tail | 2.326347874 | |
| P(Z<=z) two-tail | 0.180102839 | |
| z Critical two-tail | 2.575829304 | |
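The statistic in the table can be checked from the summary values it reports. The Python sketch below (standard library only) assumes, as the table's "Known Variance" row does, that the sample variance of 5,449,631 stands in for a known population variance. It reproduces the reported z by giving the constant $8,000 column the same variance, which puts a second variance term into the standard error. It also computes the usual one-sample statistic, z = (mean - 8000) / sqrt(variance / n), in which only the Midwest sample contributes variance. Both values are compared with the one-tailed critical value at α = 0.01.

```python
from math import sqrt
from statistics import NormalDist

# Summary values reported in the Test 1 output table (Midwest, Region 2).
mean_mw = 8659.688889
variance = 5449631  # sample variance, treated as known
n = 45
mu0 = 8000
alpha = 0.01

# Two-column value: the constant $8,000 column is given the same variance,
# so the standard error contains two variance terms.
z_excel = (mean_mw - mu0) / sqrt(variance / n + variance / n)

# One-sample z statistic: only the Midwest sample contributes variance.
z_one = (mean_mw - mu0) / sqrt(variance / n)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - alpha)  # one-tailed critical value
p_excel = 1 - std_normal.cdf(z_excel)   # right-tail p-values
p_one = 1 - std_normal.cdf(z_one)

for label, z, p in [("two-column", z_excel, p_excel), ("one-sample", z_one, p_one)]:
    decision = "reject H0" if z > z_crit else "fail to reject H0"
    print(f"{label}: z = {z:.4f}, p = {p:.4f}, z crit = {z_crit:.4f} -> {decision}")
```

Both versions stay below the critical value, so the decision not to reject H0 does not depend on which form is used. Because the variance is actually estimated from the sample, a one-sample t-test with 44 degrees of freedom would be the more standard choice; its critical value is larger than the z critical value, so the conclusion would be the same.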
Test 2
The second test was performed to determine whether annual food spending differs significantly between households in metro areas and households outside metro areas, with α = 0.01. The annual food spending data were split by Location (1 = metro, 2 = outside metro), and a two-sample t-test assuming unequal variances was used to test the null hypothesis of equal means. Metro households (n = 120) averaged $9,435.93 and households outside metro areas (n = 80) averaged $8,261.26. The test statistic t = 2.72 exceeds the two-tailed critical value of 2.60, and the two-tailed p-value (0.0072) is below 0.01, so the null hypothesis is rejected in favor of the alternative: mean annual food spending differs significantly between the two locations and is higher for metro households.
Test Hypothesis:
H0: µ metro = µ outside metro
H1: µ metro ≠ µ outside metro
Test Statistics:
| t-Test: Two-Sample Assuming Unequal Variances | 1 Inside Metro | 2 Outside Metro |
|---|---|---|
| Mean | 9435.933333 | 8261.2625 |
| Variance | 10526695.37 | 7904552.956 |
| Observations | 120 | 80 |
| Hypothesized Mean Difference | 0 | |
| df | 185 | |
| t Stat | 2.719835073 | |
| P(T<=t) one-tail | 0.003576947 | |
| t Critical one-tail | 2.34667322 | |
| P(T<=t) two-tail | 0.007153893 | |
| t Critical two-tail | 2.602665303 | |
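The t statistic and the degrees of freedom in the table can be recomputed from the reported means, variances, and counts. The Python sketch below implements Welch's unequal-variance t statistic and the Welch-Satterthwaite approximation to the degrees of freedom. It takes the two-tailed critical value from the Excel output rather than computing it, because the Python standard library has no t distribution. The unrounded degrees of freedom come out just under 185, so Excel's reported 185 appears to be that value rounded.

```python
from math import sqrt

# Summary values reported in the Test 2 output table.
mean_metro, var_metro, n_metro = 9435.933333, 10526695.37, 120
mean_out, var_out, n_out = 8261.2625, 7904552.956, 80
t_crit_two_tail = 2.602665303  # taken from the Excel output, alpha = 0.01

# Squared standard error of each sample mean; each group keeps its own variance.
var_mean_metro = var_metro / n_metro
var_mean_out = var_out / n_out

# Welch's t statistic.
t_stat = (mean_metro - mean_out) / sqrt(var_mean_metro + var_mean_out)

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (var_mean_metro + var_mean_out) ** 2 / (
    var_mean_metro**2 / (n_metro - 1) + var_mean_out**2 / (n_out - 1)
)

reject = abs(t_stat) > t_crit_two_tail
print(f"t = {t_stat:.4f}, df = {df:.1f}, reject H0 at alpha = 0.01: {reject}")
```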
Test 3
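The third test was intended to determine whether each of the three dependent variables (Annual Food Spending, Annual Household Income, and Non-Mortgage Household Debt) differs significantly among the four regions, using a one-way ANOVA for each variable with Region as the factor. The null hypothesis is that the four regional means are equal; the alternative is that at least one regional mean differs. The ANOVA tables below, however, were run separately within each region, with the three variables as the groups (2 between-groups degrees of freedom, rather than the 3 that four regions would give). Their very large F statistics and tiny p-values therefore show that food spending, income, and debt have very different average dollar amounts within every region; they do not by themselves show a difference among regions. The regional comparison can still be made from the counts, sums, and variances in the SUMMARY tables.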
Test Hypothesis:
H0: µ NE = µ MW = µ South = µ West
H1: At least one of µ NE, µ MW, µ South, µ West differs from the others
Test Analysis:
The SUMMARY section of each ANOVA table reports the count, sum, average, and variance of each variable within one region. Average Annual Food Spending was similar in the Northeast (Region 1, $9,467.98) and the West (Region 4, $9,492.55), and lower in the Midwest (Region 2, $8,659.69) and the South (Region 3, $7,833.95). Average Annual Household Income ranged from a low of $50,508.15 in the South to a high of $58,141.73 in the West, and the four regional means average about $55,117.60. Average Non-Mortgage Household Debt was $13,742.61 in the Northeast, $12,807.16 in the Midwest, $18,716.95 in the South, and $17,659.54 in the West, so it was higher in the South and West than in the Northeast and Midwest. The much larger figures in the Sum column (for example, $824,556.30 for the Northeast and $971,274.90 for the West) are totals over all households in a region, not debt per household. Because non-mortgage debt reflects borrowing for purposes other than food, it relates more to overall consumer spending than to food spending. The tables below are the four one-way ANOVA outputs, one for each region, in which the three dependent variables are the groups being compared.
ANOVA Tables
ANOVA Table A: Single Factor, Region 1 (Northeast)

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Annual Food Spending ($) | 60 | 568079 | 9467.98 | 13937489.34 |
| Annual Household Income ($) | 60 | 3441731 | 57362.2 | 288077734.2 |
| Non-Mortgage Household Debt ($) | 60 | 824556.3 | 13742.6 | 64029624.43 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 84295915653 | 2 | 4.2E+10 | 345.4327364 | 8E-62 | 4.727093 |
| Within Groups | 21596646029 | 177 | 1.2E+08 | | | |
| Total | 1.05893E+11 | 179 | | | | |
ANOVA Table B: Single Factor, Region 2 (Midwest)

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Annual Food Spending ($) | 45 | 389686 | 8659.69 | 5449631 |
| Annual Household Income ($) | 45 | 2E+06 | 54458.4 | 1.8E+08 |
| Non-Mortgage Household Debt ($) | 45 | 576322 | 12807.2 | 4.7E+07 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 5.77E+10 | 2 | 2.9E+10 | 364.159 | 1.86E-54 | 4.769637 |
| Within Groups | 1.05E+10 | 132 | 7.9E+07 | | | |
| Total | 6.82E+10 | 134 | | | | |
ANOVA Table C: Single Factor, Region 3 (South)

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Annual Food Spending ($) | 40 | 313358 | 7834 | 7410059 |
| Annual Household Income ($) | 40 | 2E+06 | 50508 | 1.72E+08 |
| Non-Mortgage Household Debt ($) | 40 | 748678 | 18717 | 99289894 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 3.934E+10 | 2 | 2E+10 | 211.9474 | 1.26E-39 | 4.791 |
| Within Groups | 1.086E+10 | 117 | 9E+07 | | | |
| Total | 5.019E+10 | 119 | | | | |
ANOVA Table D: Single Factor, Region 4 (West)

SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Annual Food Spending ($) | 55 | 522090 | 9492.545 | 9378327.9 |
| Annual Household Income ($) | 55 | 3197795 | 58141.73 | 172415144 |
| Non-Mortgage Household Debt ($) | 55 | 971274.9 | 17659.54 | 69306094 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 7.47E+10 | 2 | 3.73E+10 | 445.98594 | 1.32E-66 | 4.738598 |
| Within Groups | 1.36E+10 | 162 | 83699855 | | | |
| Total | 8.82E+10 | 164 | | | | |
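The comparison stated in the hypotheses (one ANOVA per variable, with Region as the four-level factor) can be computed from the per-region counts, sums, and variances in the SUMMARY tables, because the between-groups and within-groups sums of squares depend only on those quantities. The Python sketch below is a minimal version under that assumption. As a check, it first reproduces the F statistic of Table A, where the three variables are the groups. It then applies the same function to the Annual Food Spending rows of the four regions. Some Region 2 entries for income and debt are shown only in rounded form (for example 1.8E+08), so results for those variables would be approximate. The resulting F should be compared with the F critical value for (3, 196) degrees of freedom at α = 0.01, which this sketch does not compute.

```python
def one_way_anova(groups):
    """One-way ANOVA F statistic from (count, sum, sample variance) for each group."""
    k = len(groups)
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(s for _, s, _ in groups) / n_total
    # Between groups: weighted squared deviations of group means from the grand mean.
    ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups)
    # Within groups: (n - 1) * sample variance recovers each group's sum of squares.
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Check: the three variables of Region 1 as groups should reproduce Table A.
region1_variables = [
    (60, 568079, 13937489.34),   # Annual Food Spending
    (60, 3441731, 288077734.2),  # Annual Household Income
    (60, 824556.3, 64029624.43), # Non-Mortgage Household Debt
]
f_check, _, _ = one_way_anova(region1_variables)

# Annual Food Spending by region: (count, sum, variance) from the SUMMARY tables.
food_by_region = [
    (60, 568079, 13937489.34),  # Region 1, Northeast
    (45, 389686, 5449631),      # Region 2, Midwest
    (40, 313358, 7410059),      # Region 3, South
    (55, 522090, 9378327.9),    # Region 4, West
]
f_food, df_b, df_w = one_way_anova(food_by_region)

print(f"Table A check: F = {f_check:.4f}")
print(f"Annual Food Spending by region: F({df_b}, {df_w}) = {f_food:.3f}")
```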
Conclusion
The mean Annual Food Spending in the Midwest sample ($8,659.69) is greater than $8,000, but the difference is not statistically significant at the 1% level (one-tailed p ≈ 0.09), so it could plausibly have arisen by chance through sampling variation. Factors such as the season, available produce, the opening and closing of restaurants, and household income may also contribute to variation in food spending. In contrast, mean Annual Food Spending for households inside metro areas ($9,435.93) was significantly higher than for households outside metro areas ($8,261.26) at α = 0.01. This is consistent with living in a city or metro area being more expensive than living outside it, possibly because of differences in availability and convenience, although the test measures food spending only and does not establish the cause. Households moving into a metro area can therefore be advised to prepare for higher food expenditures, and prospective investors can be advised to expect the same. Across regions, average food spending was similar in the Northeast and West and lower in the Midwest and South, while average household income varied from about $50,508 in the South to about $58,142 in the West. The conclusions of this case study do not extend beyond the household food spending, income, and debt measures analyzed here.
Furthermore, this type of information can help determine which types of restaurants, health services, stores, or even schools would benefit particular parts of the United States. Used properly, statistical tests show whether the differences observed among groups are large relative to the variation within those groups, and therefore whether the differences are likely to be real rather than the result of chance.