Decide on the two grocery stores to use in this activity.
Decide on the 15 products you want to compare.
The brand name, product, and size have to be exactly the same at each store. Therefore, do not compare generic (store) brands, since they have different names at different stores.
You may have to wait until your visit to the first store to determine the “size”, as you may not be aware of the different package sizes available for different products.
Use a variety of products to get a good representation of all items at the stores.
At each store, record the price of each product on your list. (A question always comes up about whether to use a sale price or a club-card price. Use the price you would pay for the item on the particular day you visit the store.)
If you didn’t record the prices in an electronic spreadsheet (such as an Excel spreadsheet) at each store, do so after you collect all your data.
Questions to answer after collecting your data
The question of interest is, “Are the items at one of the two grocery stores in your study more expensive, on average, than the other store?”
Answer the following questions to address the question of interest. (R tutorial 2 may be helpful in answering some of them.)
1. (1 point) Give the two stores you are comparing and a personal motivation on why you chose those two stores.
2. (2 points) Give a brief summary of how you chose the 15 items you used in the study. Do you feel these items are representative of all items at the store? (In other words, do you feel that you’ll be able to answer the question of interest based on the items in your sample?) Why or why not?
3. (3 points) What method of inference did you use, and why? (Include a check of the conditions for using that method. If you use a graph to assess any condition, include the graph.) (Hint: think about the samples you took. Are the samples independent or dependent?)
4. (3 points) State the null and alternative hypotheses in statistical notation. Define any parameters used.
5. (2 points) Obtain and include an appropriate graphical display that will allow you to make an initial guess as to whether you feel the null hypothesis will be rejected or not. (Hint: think about what method you will be using to perform the hypothesis test.) Comment on whether or not you feel the null hypothesis will be rejected and why or why not.
6. (1 point) Perform the analysis in R. Report the test statistic (with degrees of freedom) and p-value.
7. (3 points) State a conclusion in the context of the problem that answers the question of interest supported with the p-value obtained in #6.
8. (3 points) Use R to construct a 95% confidence interval for the average difference in prices between the two stores. Include and interpret the confidence interval in the context of the problem.
9. (2 points) Which store would you shop at? Why?
10. (2 points) Provide a copy of your data.
DATA (prices in dollars):
Product                  Vons   Smiths
Almond Milk              3.49   3.19
Strawberry Pop-Tarts     2.59   2.29
Bananas (1 lb)           0.69   0.59
Head lettuce             1.69   0.99
Pace Salsa               3.39   3.29
Ball Park Beef Franks    4.49   4.99
Ball Park Buns           2.49   2.99
Kraft American Cheese    5.99   3.19
Crest toothpaste         4.00   2.99
Strawberries             3.50   2.50
Special K                4.99   2.49
Hidden Valley Ranch      3.99   3.29
Core Water               1.99   1.50
Jif Peanut Butter        3.09   2.79
Eggland's Best           3.99   2.89
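Here is a minimal R sketch, assuming the prices are typed in by hand in whatever order each store's list was recorded. Each price gets a product name, and the Smiths vector is then indexed by the Vons names. This guarantees that element i of both vectors is the same item before any paired analysis. The short product names are labels chosen for this sketch only.

```r
# Vons prices, named by product (short labels chosen for this sketch)
vons <- c("Almond Milk" = 3.49, "Pop-Tarts" = 2.59, "Bananas" = 0.69,
          "Lettuce" = 1.69, "Salsa" = 3.39, "Franks" = 4.49, "Buns" = 2.49,
          "Cheese" = 5.99, "Toothpaste" = 4.00, "Strawberries" = 3.50,
          "Special K" = 4.99, "Ranch" = 3.99, "Core Water" = 1.99,
          "Peanut Butter" = 3.09, "Eggs" = 3.99)
# Smiths prices, entered in a different item order
smiths <- c("Almond Milk" = 3.19, "Pop-Tarts" = 2.29, "Bananas" = 0.59,
            "Salsa" = 3.29, "Lettuce" = 0.99, "Franks" = 4.99, "Buns" = 2.99,
            "Cheese" = 3.19, "Toothpaste" = 2.99, "Strawberries" = 2.50,
            "Special K" = 2.49, "Core Water" = 1.50, "Peanut Butter" = 2.79,
            "Ranch" = 3.29, "Eggs" = 2.89)
# Reorder Smiths by name so element i of both vectors is the same product
smiths <- smiths[names(vons)]
stopifnot(identical(names(smiths), names(vons)))
# Per-item difference; positive means the item costs more at Vons
d <- vons - smiths
print(round(d, 2))
```

Matching by name is safer than relying on the typing order, because a single swapped pair silently changes every paired statistic.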
I begin answering from question 3 onward, since only you can answer questions 1 and 2.
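3. Each of the 15 products was priced at both stores, so the two samples are dependent (paired), not independent. We therefore use a paired t-test with unknown population variance. This is a one-sample t-test on the 15 price differences (Vons price minus Smiths price). The conditions are:
- the products are a random, or at least representative, sample of items;
- the differences are independent of one another;
- because n = 15 is small, the differences come from an approximately normal population.
Although it is not required for a paired test, a check for equal variances is also reported below.
Now look at the normal probability plots of the prices for the two stores. All the points lie within the 95% confidence bands. The Anderson-Darling p-values shown on the right side of the graph are also large (greater than 0.05). So there is no evidence against normality of the prices. Strictly, for a paired test it is the distribution of the differences that needs to be approximately normal, so the same check should be applied to the 15 differences.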
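The normality condition for a paired t-test concerns the differences, so a check along these lines could be run in R. Base R has no Anderson-Darling test; it is available in the add-on nortest package. This sketch therefore uses the Shapiro-Wilk test from `shapiro.test()`, together with a normal Q-Q plot. The vectors assume the Smiths prices are entered in the Vons item order.

```r
# Prices with both vectors in the same item order (Vons list order)
vons   <- c(3.49, 2.59, 0.69, 1.69, 3.39, 4.49, 2.49, 5.99, 4.00,
            3.50, 4.99, 3.99, 1.99, 3.09, 3.99)
smiths <- c(3.19, 2.29, 0.59, 0.99, 3.29, 4.99, 2.99, 3.19, 2.99,
            2.50, 2.49, 3.29, 1.50, 2.79, 2.89)
d <- vons - smiths  # paired differences, Vons minus Smiths
# Normal Q-Q plot of the differences with a reference line
qqnorm(d, main = "Normal Q-Q plot of price differences (Vons - Smiths)")
qqline(d)
# Shapiro-Wilk test: a p-value below 0.05 would be evidence against normality
sw <- shapiro.test(d)
print(sw)
```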
The results of the tests for equality of variances are as follows.
Null hypothesis: σ(Vons) / σ(Smiths) = 1
Alternative hypothesis: σ(Vons) / σ(Smiths) ≠ 1
Significance level: α = 0.05

Statistics
Variable   N   StDev   Variance
Vons      15   1.345   1.808
Smiths    15   1.058   1.120

Ratio of standard deviations = 1.270
Ratio of variances = 1.614

Tests
Method                           DF1   DF2   Statistic   P-Value
F test (normal)                   14    14        1.61     0.381
Levene's test (any continuous)     1    28        0.87     0.358

Both p-values (0.381 and 0.358) are well above 0.05, so we fail to reject the null hypothesis. The data are consistent with equal population variances.
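The F test in the table can be reproduced in base R with `var.test()`, which tests the ratio of the two variances under a normality assumption. This sketch does not reproduce Levene's test, which is not in base R. It is available, for example, as `leveneTest()` in the car package.

```r
vons   <- c(3.49, 2.59, 0.69, 1.69, 3.39, 4.49, 2.49, 5.99, 4.00,
            3.50, 4.99, 3.99, 1.99, 3.09, 3.99)
smiths <- c(3.19, 2.29, 0.59, 0.99, 3.29, 4.99, 2.99, 3.19, 2.99,
            2.50, 2.49, 3.29, 1.50, 2.79, 2.89)
# Two-sided F test of H0: var(Vons) / var(Smiths) = 1 (assumes normal populations)
vt <- var.test(vons, smiths, alternative = "two.sided")
print(vt)
```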
4. The sample means are 3.358 for Vons and 2.665 for Smiths. Since Smiths appears cheaper, we test whether the mean price at Vons is statistically greater than the mean price at Smiths. Let μd be the population mean of the price difference (Vons price minus Smiths price) over all products sold at both stores. The null and alternative hypotheses are:
H0: μd = 0 versus H1: μd > 0
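For reference, the paired t statistic is computed from the per-item differences. Here x_{V,i} and x_{S,i} are the Vons and Smiths prices of item i, and μ_d is the population mean difference (Vons minus Smiths). The statistic is compared with a t distribution with n − 1 = 14 degrees of freedom.

```latex
\[
d_i = x_{V,i} - x_{S,i}, \qquad
\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad
s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2}
\]
\[
t = \frac{\bar{d}}{s_d/\sqrt{n}} \sim t_{n-1} \ \text{under } H_0\!: \mu_d = 0,
\qquad
\text{p-value} = P\left(T_{n-1} \ge t\right) \ \text{for } H_1\!: \mu_d > 0
\]
```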
5. The graph of product prices for the two stores shows that, except for a few points, Vons prices are higher than Smiths prices. From the data, Vons is more expensive for 13 of the 15 items; only the Ball Park franks and buns cost more at Smiths. Therefore we expect the null hypothesis to be rejected.
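One way to produce a display for #5 in R is a boxplot of the 15 paired differences with a reference line at zero. This sketch assumes the Smiths prices are entered in the Vons item order. It also counts how many items cost more at Vons.

```r
vons   <- c(3.49, 2.59, 0.69, 1.69, 3.39, 4.49, 2.49, 5.99, 4.00,
            3.50, 4.99, 3.99, 1.99, 3.09, 3.99)
smiths <- c(3.19, 2.29, 0.59, 0.99, 3.29, 4.99, 2.99, 3.19, 2.99,
            2.50, 2.49, 3.29, 1.50, 2.79, 2.89)
d <- vons - smiths  # paired differences, Vons minus Smiths
# Boxplot of the differences; a box lying mostly above the dashed zero line
# suggests items tend to cost more at Vons
boxplot(d, ylab = "Vons price - Smiths price ($)",
        main = "Paired price differences, 15 items")
abline(h = 0, lty = 2)
n_higher <- sum(d > 0)  # number of items that cost more at Vons
cat("Items more expensive at Vons:", n_higher, "of", length(d), "\n")
```

A plot of the differences suits the paired test better than two separate boxplots, because the test itself works only with the differences.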
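6. The following commands were used in R for the analysis. For a paired test, element i of both vectors must be the same product. The Smiths prices were recorded in a slightly different item order: salsa and lettuce are swapped, and ranch, water, and peanut butter are in a different order. They are therefore entered here in the Vons order.
# Prices in the same item order as the Vons list
von=c(3.49,2.59,0.69,1.69,3.39,4.49,2.49,5.99,4.00,3.50,4.99,3.99,1.99,3.09,3.99)
smith=c(3.19,2.29,0.59,0.99,3.29,4.99,2.99,3.19,2.99,2.50,2.49,3.29,1.50,2.79,2.89)
# One-sided paired t-test of H1: mean(von - smith) > 0 (var.equal is ignored when paired = TRUE)
t.test(von,smith,alternative="greater",paired=TRUE)
The output of the paired t-test reports the following:
- test statistic t = 2.886 with 14 degrees of freedom;
- one-sided p-value of about 0.006 (between 0.005 and 0.01);
- one-sided 95 percent confidence interval for the mean difference of (0.270, Inf);
- mean of the differences of 0.6933.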
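As a cross-check on the R output, the test statistic and one-sided p-value can be computed directly from the differences with `mean()`, `sd()` and `pt()`. This sketch assumes the same correctly ordered vectors. It should agree with `t.test()` up to rounding error.

```r
vons   <- c(3.49, 2.59, 0.69, 1.69, 3.39, 4.49, 2.49, 5.99, 4.00,
            3.50, 4.99, 3.99, 1.99, 3.09, 3.99)
smiths <- c(3.19, 2.29, 0.59, 0.99, 3.29, 4.99, 2.99, 3.19, 2.99,
            2.50, 2.49, 3.29, 1.50, 2.79, 2.89)
d      <- vons - smiths             # paired differences
n      <- length(d)
d_bar  <- mean(d)                   # mean difference
s_d    <- sd(d)                     # standard deviation of the differences
t_stat <- d_bar / (s_d / sqrt(n))   # paired t statistic with n - 1 df
p_val  <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-sided, H1: mu_d > 0
# Built-in paired test for comparison
tt <- t.test(vons, smiths, alternative = "greater", paired = TRUE)
cat(sprintf("manual: t = %.4f, df = %d, p = %.5f\n", t_stat, n - 1L, p_val))
cat(sprintf("t.test: t = %.4f, df = %d, p = %.5f\n",
            tt$statistic, as.integer(tt$parameter), tt$p.value))
```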
7. Conclusion: the p-value is smaller than 0.05, so we reject H0 at the 5% significance level. There is statistically significant evidence that, on average, the same products cost more at Vons than at Smiths. In other words, Smiths is cheaper on average.
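8. The interval in the output for #6 is one-sided, of the form (lower bound, Inf), because the test used alternative = "greater". Question 8 asks for a 95% confidence interval for the average difference. To get it, rerun t.test(von, smith, paired = TRUE), whose default is alternative = "two.sided", and report its 95 percent confidence interval. Interpretation: we are 95% confident that the true mean price difference (Vons minus Smiths) for products sold at both stores lies between the interval's lower and upper limits. An interval lying entirely above 0 indicates that Vons is more expensive on average.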
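Here is a sketch for #8, assuming the same correctly ordered vectors. Calling `t.test()` with `paired = TRUE` and the default two-sided alternative gives the two-sided 95% interval. With these data it comes out to roughly (0.18, 1.21). So we would be 95% confident that, on average, a product costs between about 18 cents and $1.21 more at Vons than at Smiths.

```r
vons   <- c(3.49, 2.59, 0.69, 1.69, 3.39, 4.49, 2.49, 5.99, 4.00,
            3.50, 4.99, 3.99, 1.99, 3.09, 3.99)
smiths <- c(3.19, 2.29, 0.59, 0.99, 3.29, 4.99, 2.99, 3.19, 2.99,
            2.50, 2.49, 3.29, 1.50, 2.79, 2.89)
# Two-sided 95% CI for the mean paired difference (default alternative)
ci <- t.test(vons, smiths, paired = TRUE, conf.level = 0.95)$conf.int
print(round(ci, 3))
```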
9. Since Smiths turns out to be cheaper on average, I would shop at Smiths.