Scenario: Upon successful completion of the MBA program, imagine you work in the analytics department for a consulting company. Your assignment is to analyze one of the following databases:
Select one of the databases based on the information in the Signature Assignment Options.
Provide a 1,600-word detailed, four part, statistical report with the following sections:
Part 1 - Preliminary Analysis
Generally, as a statistics consultant, you will be given a problem and data. At times, you may have to gather additional data. For this assignment, assume all the data is already gathered for you.
State the objective:
Describe the population in the study clearly and in sufficient detail:
Discuss the types of data and variables:
Part 2 - Descriptive Statistics
Examine the given data.
Present the descriptive statistics (mean, median, mode, range, standard deviation, variance, CV, and five-number summary).
Identify any outliers in the data.
Present any graphs or charts you think are appropriate for the data.
Note: Ideally, we want to assess the conditions of normality too. However, for the purpose of this exercise, assume data is drawn from normal populations.
Part 3 - Inferential Statistics
Use the Part 3: Inferential Statistics document.
Hint: A final conclusion saying "reject the null hypothesis" by itself without explanation is basically worthless to those who hired you. Similarly, stating the conclusion is false or rejected is not sufficient.
Part 4 - Conclusion and Recommendations
Include the following:
Part I – Preliminary Analysis
The Consumer Food scenario offers the opportunity to test whether annual household spending on food in the Midwest (Region 2) is more than $8,000. The significance level for this test is α = 0.01, and annual food spending is assumed to be normally distributed in the population. A second test will determine whether household food spending differs between metro and non-metro areas. The hypothesis tests use a sample of 200 households drawn from four regions. The data set contains five variables: Annual Household Food Spending, Annual Household Income, Non-Mortgage Household Debt, Region, and Location. Region identifies the part of the US in which the household is located: Northeast is coded as 1, Midwest as 2, South as 3, and West as 4 (see figure 1). Location is coded as 1 for metropolitan areas and 2 for non-metropolitan areas.
The three variables analyzed in this scenario (food spending, household income, and non-mortgage debt) are quantitative because they measure dollar amounts. Region and Location, although coded with numbers, are categorical labels used only to group households. The distinction between quantitative and qualitative data is important because it determines the level of measurement. The level of measurement classifies the nature of the information carried by the numbers assigned to a variable, and it is described by four scales: nominal, ordinal, interval, and ratio. The dollar-valued variables are measured on the ratio scale, while Region and Location are nominal. At α = 0.01, the difference in annual food spending between metro and non-metro households turns out to be statistically significant, as the second hypothesis test in Part III shows.
Part II – Examination of Descriptive Statistics
Descriptive statistics are brief coefficients that summarize a data set, which may represent an entire population or only a sample of it. They are generally used to measure central tendency (mean, median, and mode) and variability, or spread (range, standard deviation, variance, coefficient of variation, and the five-number summary).
The consumer food scenario provides data sampled from 200 households.
Annual Food Spending: the mean is 8,966, the median is 8,932, the mode is 6,314, the range is 15,154, the standard deviation is 3,125.01, the variance is 9,765,674.92, and the coefficient of variation (CV) is 0.35. The five-number summary is: minimum 2,587, Q1 6,933.75, median 8,932, Q3 10,950, and maximum 17,740.
Annual Household Income: the mean is 55,552, the median is 54,957, there is no mode, the range is 74,486, the standard deviation is 14,661.36, the variance is 214,955,478.81, and the CV is 0.26. The five-number summary is: minimum 21,647, Q1 46,162.96, median 54,957, Q3 64,933.54, and maximum 96,132.
Non-Mortgage Household Debt: the mean is 15,604, the median is 16,100, the mode is 0, the range is 36,374, the standard deviation is 8,583.54, the variance is 73,677,143.95, and the CV is 0.55. The five-number summary is: minimum 0, Q1 9,191.93, median 16,100, Q3 21,259.13, and maximum 36,374.
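The assignment also asks for outliers to be identified. One common convention, Tukey's 1.5 × IQR rule, can be applied using only the five-number summaries reported above. The sketch below is illustrative and is not necessarily the method used in the original analysis; the helper name `iqr_fences` is hypothetical. Working from summaries alone, it can only show whether the minimum or maximum falls outside the fences, not how many observations do.

```python
# Five-number summaries as reported above: (min, Q1, median, Q3, max).
summaries = {
    "Annual Food Spending": (2587, 6933.75, 8932, 10950, 17740),
    "Annual Household Income": (21647, 46162.96, 54957, 64933.54, 96132),
    "Non-Mortgage Household Debt": (0, 9191.93, 16100, 21259.13, 36374),
}


def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) Tukey fences; k=1.5 is the usual convention."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


flags = {}
for name, (minimum, q1, median, q3, maximum) in summaries.items():
    lower, upper = iqr_fences(q1, q3)
    # A minimum below the lower fence or a maximum above the upper fence
    # means at least one observation is an outlier under this rule.
    flags[name] = {
        "low_outlier": minimum < lower,
        "high_outlier": maximum > upper,
    }
    print(f"{name}: fences=({lower:.2f}, {upper:.2f}), {flags[name]}")
```

Under this rule, the maximum food spending (17,740) and the maximum income (96,132) both lie above their upper fences (about 16,974 and 93,089), so each variable has at least one high outlier. The debt maximum stays inside its fences.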
Part III – Examination of Inferential Statistics
Inferential statistics are useful for all types of businesses. They are used to generalize from a sample to the population from which it was drawn, most commonly through hypothesis testing. A hypothesis is a testable statement about a population parameter, and a hypothesis test uses sample data, treated as observations of random variables, to assess that statement. To gain a better understanding of the chosen scenario, the author will conduct the first hypothesis test to determine whether the average annual food spending in the Midwest region (Region 2) is greater than $8,000. Inferring this information from the sample data will create opportunities for better marketing and insight into expansion. The second hypothesis test will determine whether average annual food spending differs between metro and non-metro areas. Over time, this test can help identify trends that support well-informed managerial decisions, e.g., the pricing of consumer foods. Lastly, the third hypothesis test compares annual food spending, annual household income, and non-mortgage household debt across the four regions. The results will provide direction on steps that could be taken to improve quality of life and serve as an analysis of the differences between regions.
Midwest Household Spending
Data from four different regions of the US were gathered, and the Midwest households were used to determine whether the average annual food spending in the Midwest region is greater than $8,000. For this hypothesis, a one-sample z test was utilized. The test weighs the null hypothesis (the average household spending in the Midwest region is equal to $8,000) against the alternative hypothesis (the average household spending in the Midwest region is greater than $8,000). The hypothesis test is written as:
H0: μ = 8,000 versus H1: μ > 8,000, where μ is the mean annual household food spending in the Midwest region.
Metro vs Non-Metro Households
To determine whether there were any significant differences between metro and non-metro annual household spending, Excel was used to first sort the data into the two areas. The mean, median, mode, minimum, maximum, range, standard deviation, variance, and coefficient of variation were then calculated for each area to obtain descriptive statistics (see figure 3). The hypothesis test is written as:
H0: μ1 = μ2 versus H1: μ1 ≠ μ2, where μ1 is the mean annual food spending of metro households and μ2 is the mean for non-metro households.
For this hypothesis, a two-sample z test was conducted to test the null hypothesis.
Three Variable Comparison
To determine whether each of the three variables (annual spending, annual income, and non-mortgage debt) differs significantly by region (NE, MW, S, W), the data for each variable were split into their respective regions. A one-way Analysis of Variance (ANOVA) was then conducted on each variable to test the null hypothesis (the regional means are equal) against the alternative hypothesis (the regional means are not all equal). The hypothesis test is written as:
H0: μ1 = μ2 = μ3 = μ4 versus H1: at least one regional mean differs, where μ1 through μ4 are the means of the variable under test in the Northeast, Midwest, South, and West.
Observation: Midwest Hypothesis
The Midwest region’s mean household spending was calculated to be 8,660, with a standard deviation of 2,334.444 (see figure 4). The sample mean is therefore above $8,000. To determine whether this observation could have arisen by chance, a p-value was computed. Because the sample size is larger than 30, the central limit theorem was assumed to apply. A z score of 1.895666 was obtained, with a p-value of 0.029002. Because the p-value exceeds the significance level of α = 0.01, the null hypothesis is not rejected.
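The calculation behind this test can be sketched from the reported summary statistics alone. This is an illustration under stated assumptions, not the original Excel workbook: the function name `one_sample_z_upper` is hypothetical, and the Midwest sample size is not given in the report, so n = 45 is assumed because it approximately reproduces the reported z score.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_upper(sample_mean, sd, n, mu0):
    """Upper-tailed one-sample z test; returns (z, p_value)."""
    standard_error = sd / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Upper-tail area under the standard normal curve beyond z.
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Mean and standard deviation are the reported Midwest values.
# n = 45 is an ASSUMPTION: the report does not state the Midwest sample size.
z, p = one_sample_z_upper(sample_mean=8660, sd=2334.444, n=45, mu0=8000)

alpha = 0.01
reject_null = p < alpha
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```

With these inputs the p-value falls between 0.01 and 0.05, so the decision depends on the chosen significance level: the null hypothesis is not rejected at α = 0.01, although it would be at α = 0.05.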
Observation: Metro versus Non-Metro
For the second test, the average household spending in metro areas was calculated to be 9,436, with a standard deviation of 3,244.487. The average household spending in non-metro areas was calculated to be 8,261, with a standard deviation of 2,811.504. The z score for this test was 2.719835, with a p-value of 0.007154. Because the p-value is below the significance level of α = 0.01, the null hypothesis is rejected: there is a significant difference between household spending in metro areas and non-metro areas.
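The two-sample statistic can be sketched in the same way. The group sizes are not stated in the report; the split of 120 metro and 80 non-metro households used below is an assumption, chosen because it is consistent with the overall mean of 8,966 across 200 households. The function name `two_sample_z` is hypothetical. The p-value from the standard normal distribution may differ slightly from the reported 0.007154, which may have been computed with a different reference distribution, but the decision at α = 0.01 is the same.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Two-tailed two-sample z test for a difference in means; returns (z, p_value)."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    z = (mean1 - mean2) / standard_error
    # Two-tailed p-value: area in both tails beyond |z|.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Means and standard deviations are the reported values.
# n1 = 120 and n2 = 80 are ASSUMPTIONS; the report does not state group sizes.
z, p = two_sample_z(mean1=9436, sd1=3244.487, n1=120,
                    mean2=8261, sd2=2811.504, n2=80)

alpha = 0.01
reject_null = p < alpha
print(f"z = {z:.3f}, p = {p:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```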
Observation: Three Variable Comparison
Referencing the ANOVA tables, the Annual Food Spending groups produced an F statistic of 3.019608 with a p-value of 0.030955. Because this p-value exceeds α = 0.01, there is no significant difference in mean spending by region at that level. The Annual Household Income groups produced an F statistic of 2.599594 and a p-value of 0.053419, which also indicates no significant difference across the groups. Finally, Non-Mortgage Household Debt produced an F statistic of 5.717824 and a p-value of 0.000902, indicating a clear and significant difference in non-mortgage household debt between regions.
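The decision rule applied to the three ANOVA results can be made explicit with a short sketch. The F statistics and p-values are those reported above; the helper name `significant` and the extra comparison at α = 0.05 are illustrative additions, not part of the original analysis.

```python
# F statistics and p-values as reported in the ANOVA tables above.
anova_results = {
    "Annual Food Spending": {"F": 3.019608, "p": 0.030955},
    "Annual Household Income": {"F": 2.599594, "p": 0.053419},
    "Non-Mortgage Household Debt": {"F": 5.717824, "p": 0.000902},
}


def significant(p_value, alpha):
    """Reject the null hypothesis of equal regional means when p < alpha."""
    return p_value < alpha


# Evaluate each variable at the report's alpha (0.01) and, for comparison, 0.05.
decisions = {
    name: {alpha: significant(result["p"], alpha) for alpha in (0.01, 0.05)}
    for name, result in anova_results.items()
}

for name, decision in decisions.items():
    print(f"{name}: reject at 0.01? {decision[0.01]}; reject at 0.05? {decision[0.05]}")
```

The comparison shows that the food spending conclusion is sensitive to the significance level: the regional difference is not significant at the α = 0.01 used in this report, but it would be at α = 0.05.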
Part IV – Conclusion and Recommendations
The data suggest that the mean annual household spending in the Midwest region was not significantly greater than $8,000 at the 0.01 level. Although the sample mean was, in fact, greater than $8,000, the difference is small enough that it can be attributed to chance. The test of whether annual household spending differs between metro and non-metro areas found a significant difference. The results from that test suggest that food costs are higher in metro areas than in non-metro areas. Based on these results, the author recommends that commodity prices be fairly controlled to protect residents from being exploited for large profit gains. A review of hardship allowances and salaries for persons in these areas is also recommended to promote equity between the two areas.
Performing the ANOVA by region for the three variables showed that mean annual food spending was not significantly different between the four regions at the 0.01 level. This suggests that the cost of living is similar across regions. This information can be useful to anyone who has the opportunity to choose between jobs in different regions and wants to maximize annual income. It could also be useful for investors interested in the four regions, who might favor the region with the highest payout since living costs are similar. Since annual household income is similar between the regions, businesses and suppliers should price their products similarly. Similar annual household incomes suggest that consumers in these regions would respond to prices in a similar way. Non-mortgage household debt, by contrast, was found to be significantly different between the four regions. A recommendation to financial institutions would be to impose restrictions when lending in the regions with higher mean non-mortgage household debt. These regions were identified using the ANOVA and Fisher’s Least Significant Difference (LSD) technique. Households with higher debt in these regions should be offered incentives to improve their situations.