How is statistical software useful?
In: Statistics and Probability
Exxon is developing a new, faster pump design. They have narrowed development to two design options and want to know how the pump design might affect daily gas sales (Sales). In a test across 31 stations, they try new design A in some stations (code = 2) and new design B in other stations (code = 3); stations that have not had a change serve as the control (code = 1). From prior research, Exxon knows that three other factors are crucial in predicting gas sales for any particular station: advertising amount in the market (ad), relative pricing (relprice), and the number and density of competing stations (compet). Use 95% confidence when assessing any changes.
Output: regression results tables.
a) Run a multiple regression analysis to assess the effect of the new pump designs, and interpret the results.
b) Are the new designs better than the current design (i.e., do they lead to higher sales)?
c) Assume a 1% profit margin and an investment of $1M per 100 stations for the change to a new design. Will any change be profitable within the first year?
Data:
Store | Pump Design | Sales | ad | relprice | compet |
1 | 2 | 29100 | 25410 | 1.18 | 9.4 |
2 | 3 | 25620 | 26400 | 1.14 | 9.4 |
3 | 1 | 23850 | 25950 | 1.18 | 9.7 |
4 | 1 | 25200 | 27010 | 1.20 | 11.9 |
5 | 2 | 21420 | 27850 | 1.24 | 13.4 |
6 | 3 | 21300 | 25090 | 1.46 | 9.6 |
7 | 1 | 21900 | 25700 | 1.54 | 9.2 |
8 | 1 | 23700 | 26670 | 1.48 | 13.6 |
9 | 2 | 22080 | 28780 | 1.48 | 14.4 |
10 | 3 | 21960 | 28350 | 1.48 | 15.3 |
11 | 1 | 17580 | 28970 | 1.48 | 15.1 |
12 | 1 | 19440 | 27440 | 1.66 | 11.8 |
13 | 2 | 20940 | 25820 | 1.76 | 12.8 |
14 | 3 | 19110 | 26130 | 1.88 | 12.4 |
15 | 1 | 20310 | 25290 | 2.00 | 9.3 |
16 | 1 | 20460 | 25440 | 2.14 | 7.9 |
17 | 2 | 25020 | 26330 | 2.08 | 7.8 |
18 | 3 | 22380 | 28780 | 2.02 | 8.4 |
19 | 1 | 23940 | 30510 | 1.88 | 9.1 |
20 | 1 | 25860 | 32740 | 1.70 | 8.8 |
21 | 2 | 28980 | 35940 | 1.58 | 9.2 |
22 | 3 | 24480 | 37740 | 1.50 | 9.8 |
23 | 1 | 24600 | 38610 | 1.50 | 10.3 |
24 | 1 | 26460 | 39190 | 1.44 | 8.8 |
25 | 2 | 29880 | 40400 | 1.48 | 8.2 |
26 | 3 | 29670 | 41330 | 1.46 | 7.5 |
27 | 1 | 24390 | 43030 | 1.42 | 7.1 |
28 | 1 | 25980 | 43930 | 1.40 | 7.2 |
29 | 2 | 30450 | 45600 | 1.42 | 8.9 |
30 | 3 | 32130 | 45870 | 1.42 | 7.7 |
31 | 1 | 26850 | 47160 | 1.38 | 7.4 |
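One way to set up part (a) is to code the pump design as two dummy variables (design A and design B, with the current design as the baseline) and fit ordinary least squares. The sketch below uses only the Python standard library and solves the normal equations directly; the helper `solve`, the rescaling of ad to thousands, and the reading of Sales as dollars per day over a 365-day year are assumptions made for illustration. It produces point estimates only; a statistics package would also report the standard errors and p-values needed for the 95% confidence judgement.

```python
# Rows: (pump design code, sales, ad, relprice, compet), copied from the table above.
DATA = [
    (2, 29100, 25410, 1.18, 9.4), (3, 25620, 26400, 1.14, 9.4), (1, 23850, 25950, 1.18, 9.7),
    (1, 25200, 27010, 1.20, 11.9), (2, 21420, 27850, 1.24, 13.4), (3, 21300, 25090, 1.46, 9.6),
    (1, 21900, 25700, 1.54, 9.2), (1, 23700, 26670, 1.48, 13.6), (2, 22080, 28780, 1.48, 14.4),
    (3, 21960, 28350, 1.48, 15.3), (1, 17580, 28970, 1.48, 15.1), (1, 19440, 27440, 1.66, 11.8),
    (2, 20940, 25820, 1.76, 12.8), (3, 19110, 26130, 1.88, 12.4), (1, 20310, 25290, 2.00, 9.3),
    (1, 20460, 25440, 2.14, 7.9), (2, 25020, 26330, 2.08, 7.8), (3, 22380, 28780, 2.02, 8.4),
    (1, 23940, 30510, 1.88, 9.1), (1, 25860, 32740, 1.70, 8.8), (2, 28980, 35940, 1.58, 9.2),
    (3, 24480, 37740, 1.50, 9.8), (1, 24600, 38610, 1.50, 10.3), (1, 26460, 39190, 1.44, 8.8),
    (2, 29880, 40400, 1.48, 8.2), (3, 29670, 41330, 1.46, 7.5), (1, 24390, 43030, 1.42, 7.1),
    (1, 25980, 43930, 1.40, 7.2), (2, 30450, 45600, 1.42, 8.9), (3, 32130, 45870, 1.42, 7.7),
    (1, 26850, 47160, 1.38, 7.4),
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Design matrix columns: intercept, dummy for design A, dummy for design B,
# ad in thousands (rescaled for numerical stability), relprice, compet.
X = [[1.0, float(d == 2), float(d == 3), ad / 1000.0, rp, comp]
     for d, _, ad, rp, comp in DATA]
y = [float(sales) for _, sales, _, _, _ in DATA]

k = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
beta = solve(xtx, xty)  # least-squares coefficients
residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]

names = ["intercept", "design_A", "design_B", "ad_thousands", "relprice", "compet"]
for name, b in zip(names, beta):
    print(f"{name:>13}: {b:12.3f}")

# Part (c), assuming Sales is in dollars per day and a 365-day year:
# each station must recover $1M / 100 = $10,000 at a 1% margin.
cost_per_station = 1_000_000 / 100
break_even_daily_sales = cost_per_station / (0.01 * 365)
print(f"Daily sales increase needed to break even: {break_even_daily_sales:.2f}")
```

The `design_A` and `design_B` coefficients estimate the change in daily sales relative to the current design with ad, relprice, and compet held fixed; comparing them with the break-even figure addresses part (c) under the stated assumptions.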
In: Statistics and Probability
Use this scenario to answer questions below.
The Collins Research Crew (CRC) is interested in examining the number of vape/smoking stores (i.e. stores that sell vaping and cigarette/cigar smoking products) in low-income neighborhoods compared to other types of neighborhoods. CRC's research question is, "Do low-income neighborhoods have more vape/smoke shops than other types of neighborhoods?" Low-income neighborhoods were defined as those where the median household income is less than the U.S. federal poverty line. Non-low-income neighborhoods are those where the median household income is greater than the U.S. federal poverty line.
CRC employed a team of undergraduate researchers to go out and count the number of vape/smoke shops in a random selection of low-income and non-low-income neighborhoods. They define the population as all neighborhoods in King County.
They found a significant difference in the number of vape/smoke shops across neighborhoods. Specifically, low-income neighborhoods had a greater number of vape/smoke shops compared to non-low-income neighborhoods.
Match the null hypothesis, directional hypothesis, and non-directional hypothesis with their most appropriate statement. The same three statements are offered for each.
Hypothesis types:
Null Hypothesis
Directional Hypothesis
Non-directional Hypothesis
Statements:
There is a relationship between the average number of vape/smoke shops and neighborhood type.
There is no relationship between the number of vape/smoke shops and neighborhood type.
Low-income neighborhoods have more vape/smoke shops than non-low-income neighborhoods, on average.
Given the research question asked in the scenario above, the best research hypothesis is a non-directional hypothesis.
True OR False
"Specifically, low-income neighborhoods had a greater number of vape/smoke shops compared to non-low-income neighborhoods." What is this sentence indicating?
A. Low-income people vape/smoke at higher levels than the average King County resident
B. Any difference is due to chance and not some systematic influence
C. Any difference is due to some systematic influence and not by chance
D. There is no difference in the number of vape/smoke shops
In: Statistics and Probability
Name the pros and cons of spreadsheets in statistics.
In: Statistics and Probability
Students of a large university spend an average of $7 a day on lunch. The standard deviation of the expenditure is $2. A simple random sample of 25 students is taken. What is the probability that the sample mean will be at least $4? Jason spent $15 on his lunch. Explain, in terms of standard deviation, why his expenditure is unusual. Explain what information is given in a z table. For example, if a student calculated a z value of 2.77, what is the four-digit number in the z table that corresponds to that value? What exactly is that four-digit number telling us? Explain why we use z formulas. Why don't we just leave the data alone? Why do we convert? Show your work.
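The numeric parts of this question can be checked with a short sketch using the standard library's `statistics.NormalDist`. It assumes the sample mean is approximately normal with standard error σ/√n; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 7.0, 2.0, 25

# Sampling distribution of the mean: standard error = sigma / sqrt(n)
std_error = sigma / sqrt(n)                  # 0.4
z_mean = (4 - mu) / std_error                # about -7.5
p_at_least_4 = 1 - NormalDist().cdf(z_mean)  # practically 1

# Jason's single purchase is compared with the population standard deviation
z_jason = (15 - mu) / sigma                  # 4 standard deviations above the mean

# A z table lists the cumulative area to the left of z
table_value = round(NormalDist().cdf(2.77), 4)

print(z_mean, p_at_least_4, z_jason, table_value)
```

Converting to z puts every value on the same scale (standard deviations from the mean), which is what makes a single table usable for any normal distribution.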
In: Statistics and Probability
A research project has been tracking the health and cognitive functions of the elderly population in Arizona. The table below shows the memory test scores from 10 elderly residents, tested first when they were 65 years old and again when they were 75 years old. The researcher wants to know if there is a significant decline in memory functions from age 65 to age 75 based on this sample. In other words, it is hypothesized that the memory score at age 75 is significantly lower than the memory score at age 65. So the null and alternative hypotheses should be directional. The alpha level was set at α = .05 for a one-tailed hypothesis test.
Memory score
Subject | Age 65 | Age 75 |
1 | 62 | 60 |
2 | 95 | 88 |
3 | 55 | 56 |
4 | 90 | 89 |
5 | 98 | 90 |
6 | 73 | 75 |
7 | 73 | 70 |
8 | 71 | 75 |
9 | 82 | 80 |
10 | 66 | 62 |
d. Calculate the difference score by subtracting each “Age 65” score from the associated “Age 75” score for each subject. Fill in the column in the table below for “difference score.” (1 point total: deduct .5 for each error up to 1 point.)
Hint: The difference score is calculated as (age 75 minus age 65), so a negative number indicates a decline in memory performance, which is the researcher’s hypothesis.
Subject | Difference score (Age 75 – Age 65) |
1 | |
2 | |
3 | |
4 | |
5 | |
6 | |
7 | |
8 | |
9 | |
10 | |
e. Calculate the mean from the sample of difference scores
f. Estimate the standard deviation of the population of difference scores
g. Calculate the standard error (standard deviation of the sampling distribution)
h. Calculate the t statistic for the sample of difference scores
i. Figure out the degree of freedom, and then determine the critical t value(s) based on the type of test and the preset alpha level.
j. Compare the t statistic with the critical t value. Is the calculated t statistic more extreme or less extreme than the critical t value? Then make a decision about the hypothesis test, stating explicitly “reject” or “fail to reject” accordingly. (2 points total: 1 for each answer)
k. Interpret the result in 1-2 sentences to answer the research question
l. Calculate the standardized effect size of this hypothesis test
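Parts d through l follow the usual paired-samples (dependent) t test. The sketch below walks through them with the standard library; the critical value -1.833 (one-tailed, α = .05, df = 9) is taken from a standard t table rather than computed, and Cohen's d is used as the standardized effect size, which is an assumption about what part l intends.

```python
from math import sqrt
from statistics import mean, stdev

age65 = [62, 95, 55, 90, 98, 73, 73, 71, 82, 66]
age75 = [60, 88, 56, 89, 90, 75, 70, 75, 80, 62]

# d. difference scores (Age 75 minus Age 65); negative means decline
diffs = [later - earlier for earlier, later in zip(age65, age75)]

n = len(diffs)
mean_diff = mean(diffs)          # e. sample mean of the differences
sd_diff = stdev(diffs)           # f. estimated population SD (n - 1 in the denominator)
std_error = sd_diff / sqrt(n)    # g. standard error of the mean difference
t_stat = mean_diff / std_error   # h. t statistic against a null mean difference of 0

df = n - 1                       # i. degrees of freedom
t_critical = -1.833              # i. table value: one-tailed, alpha = .05, df = 9

reject = t_stat < t_critical     # j. lower-tailed decision rule
cohens_d = mean_diff / sd_diff   # l. standardized effect size (Cohen's d)

print(diffs)
print(f"mean = {mean_diff}, sd = {sd_diff:.3f}, se = {std_error:.3f}")
print(f"t({df}) = {t_stat:.3f}, reject H0: {reject}, d = {cohens_d:.3f}")
```

With these scores the t statistic is less extreme than the critical value, so the decision in part j is to fail to reject the null hypothesis.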
In: Statistics and Probability
Please answer the following questions:
Check all outliers in this data set of book costs (dollars).
In the following data set of candy bag weights (lbs), determine the z-score of 20.
z-score = ____ [three decimal places]
A distribution of book costs (dollars) has the following 5-number summary. What percentage of the data is between 51 and 65?
Min | Q1 | Median | Q3 | Max |
25 | 51 | 65 | 93 | 118 |
Percentage is ____ % [do not include the % sign]
In: Statistics and Probability
A data set is given below. (a) Draw a scatter diagram. Comment on the type of relation that appears to exist between x and y. (b) Given that x̅ = 3.6667, sx = 2.0656, ȳ = 4.2000, sy = 1.4805, and r = −0.9287, determine the least-squares regression line. (c) Graph the least-squares regression line on the scatter diagram drawn in part (a).
x y
1 5.2
2 5.8
3 5.4
4 3.8
6 2.4
6 2.6
(a) Choose the correct graph below.
Graph B
There appears to be a linear, negative relationship.
(b)
ŷ =__?__x+(__?__)
(Round to three decimal places as needed.)
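For part (b), the least-squares slope and intercept follow from the summary statistics as b1 = r·sy/sx and b0 = ȳ − b1·x̄. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Summary statistics given in the question
x_bar, s_x = 3.6667, 2.0656
y_bar, s_y = 4.2000, 1.4805
r = -0.9287

slope = r * s_y / s_x                # b1 = r * sy / sx
intercept = y_bar - slope * x_bar    # b0 = y_bar - b1 * x_bar

print(f"y-hat = {slope:.3f}x + ({intercept:.3f})")
```

The negative slope agrees with the negative linear relationship noted in part (a).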
In: Statistics and Probability
In this exercise, we examine the effect of combining investments
with positively correlated risks, negatively correlated risks, and
uncorrelated risks. A firm is considering a portfolio of assets. The
portfolio is comprised of two assets, which we will call "A" and
"B." Let X denote the annual rate of return from asset A in the
following year, and let Y denote the annual rate of return from
asset B in the following year. Suppose that
E(X) = 0.15 and E(Y) = 0.20,
SD(X) = 0.05 and SD(Y) = 0.06,
and CORR(X, Y) = 0.30.
(a) What is the expected return of investing 50% of the portfolio
in asset A and 50% of the portfolio in asset B? What is the
standard deviation of this return?
(b) Replace CORR(X, Y) = 0.30 by CORR(X, Y) = 0.60 and answer the
questions in part (a). Do the same for CORR(X, Y) = −0.60, −0.30, and
0.0.
(c) (Spreadsheet Exercise). Use a spreadsheet to perform the
following analysis. Suppose that the fraction of the portfolio that
is invested in asset B is f, and so the fraction of the portfolio
that is invested in asset A is (1 − f). Letting f vary from f = 0.0
to f = 1.0 in increments of 5% (that is, f = 0.0, 0.05, 0.10, 0.15,
. . . ), compute the mean and the standard deviation of the annual
rate of return of the portfolio (using the original data for the
problem). Notice that the expected return of the portfolio varies
(linearly) from 0.15 to 0.20, and the standard deviation of the
return varies (non-linearly) from 0.05 to 0.06. Construct a chart
plotting the standard deviation as a function of the expected
return.
(d) (Spreadsheet Exercise). Perform the same analysis as in part
(c) with CORR(X, Y) = 0.30 replaced by CORR(X, Y) = 0.60, 0.0,
−0.30, and −0.60.
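The calculations in parts (a) through (d) all rest on two formulas: E(R) = (1 − f)E(X) + fE(Y) and Var(R) = (1 − f)²SD(X)² + f²SD(Y)² + 2f(1 − f)·CORR(X, Y)·SD(X)·SD(Y). The sketch below evaluates them in Python instead of a spreadsheet; the function name `portfolio` and its parameter names are illustrative.

```python
from math import sqrt


def portfolio(f, corr, mean_a=0.15, mean_b=0.20, sd_a=0.05, sd_b=0.06):
    """Return (expected return, standard deviation) with fraction f in asset B."""
    w_a, w_b = 1 - f, f
    expected = w_a * mean_a + w_b * mean_b
    variance = ((w_a * sd_a) ** 2 + (w_b * sd_b) ** 2
                + 2 * w_a * w_b * corr * sd_a * sd_b)
    return expected, sqrt(variance)


# Part (a): 50/50 split with CORR(X, Y) = 0.30
mean_half, sd_half = portfolio(0.5, 0.30)
print(f"E = {mean_half:.4f}, SD = {sd_half:.4f}")

# Part (c): f = 0.00, 0.05, ..., 1.00 with the original correlation
sweep = [(i / 20, *portfolio(i / 20, 0.30)) for i in range(21)]
for f, expected, sd in sweep:
    print(f"f = {f:.2f}  E = {expected:.4f}  SD = {sd:.4f}")
```

Calling `portfolio` with a different `corr` value repeats the analysis for parts (b) and (d); lower correlation gives a smaller standard deviation at the same expected return.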
In: Statistics and Probability
For the accompanying data set, (a) draw a scatter diagram of the data, (b) compute the correlation coefficient, and (c) determine whether there is a linear relation between x and y.
Data set
x | 7 | 6 | 6 | 7 | 9 |
y | 3 | 2 | 6 | 9 | 5 |
Critical Values for Correlation Coefficient
n | Critical value |
3 | 0.997 |
4 | 0.950 |
5 | 0.878 |
6 | 0.811 |
7 | 0.754 |
8 | 0.707 |
9 | 0.666 |
10 | 0.632 |
11 | 0.602 |
12 | 0.576 |
13 | 0.553 |
14 | 0.532 |
15 | 0.514 |
16 | 0.497 |
17 | 0.482 |
18 | 0.468 |
19 | 0.456 |
20 | 0.444 |
21 | 0.433 |
22 | 0.423 |
23 | 0.413 |
24 | 0.404 |
25 | 0.396 |
26 | 0.388 |
27 | 0.381 |
28 | 0.374 |
29 | 0.367 |
30 | 0.361 |
Compute the correlation coefficient.
The correlation coefficient is
r=__?__.
(Round to three decimal places as needed.)
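The correlation coefficient can be computed from the sums of squares and cross-products, r = Sxy / √(Sxx·Syy), and then compared in absolute value with the critical value for n = 5 from the table. A minimal sketch with illustrative variable names:

```python
from math import sqrt

x = [7, 6, 6, 7, 9]
y = [3, 2, 6, 9, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)

critical_value = 0.878   # from the critical values table, n = 5
linear_relation = abs(r) > critical_value

print(f"r = {r:.3f}, linear relation: {linear_relation}")
```

Because |r| is well below the critical value, these five points give no evidence of a linear relation between x and y.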
In: Statistics and Probability
Test the hypothesis using the P-value approach. Be sure to verify the requirements of the test. H0: p=0.55 versus H1: p<0.55
n=150, x=72, α=0.01
Is np0(1 − p0) ≥ 10?
No
Yes
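The requirement check and the P-value for this left-tailed one-proportion z test can be sketched with the standard library as follows. It assumes the sample is a simple random sample that is small relative to the population, which the question does not state.

```python
from math import sqrt
from statistics import NormalDist

p0, n, x, alpha = 0.55, 150, 72, 0.01

# Requirement for the normal approximation: n * p0 * (1 - p0) >= 10
requirement = n * p0 * (1 - p0)
requirement_met = requirement >= 10

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# H1: p < 0.55 is left-tailed, so the P-value is the area to the left of z
p_value = NormalDist().cdf(z)
reject = p_value < alpha

print(f"np0(1-p0) = {requirement:.3f}, z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Since the P-value exceeds α = 0.01, the null hypothesis is not rejected at this significance level.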
In: Statistics and Probability
Please find the percentage for all the questions below.
Scores for professional golfers on 18-hole courses are
bell-shaped with a mean of 72 strokes and a standard deviation of 5
strokes. Using the Empirical Rule, what is the approximate
percentage of golfer scores between 62 and 82 strokes?
Percentage is:
A child's piano practice times are normally distributed with a
mean of 22 minutes and a standard deviation of 6 minutes. Using the
Empirical Rule, what is the approximate percentage of practice
times running between 16 and 28 minutes?
Percentage is:
Weights of Old English Sheepdogs are normally distributed with a
mean of 59 pounds and a standard deviation of 6 pounds. Using the
Empirical Rule, what is the approximate percentage of sheepdogs
weighing between 41 and 77 pounds?
Percentage is:
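Each of these intervals is the mean plus or minus a whole number of standard deviations, so the Empirical Rule (about 68%, 95%, and 99.7% within 1, 2, and 3 standard deviations) applies directly. A small sketch, with the helper name `empirical_percentage` chosen for illustration:

```python
# Approximate percentages within k standard deviations of the mean
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(mean, sd, low, high):
    """Return the Empirical Rule percentage for a symmetric interval mean ± k·sd."""
    k_low = (mean - low) / sd
    k_high = (high - mean) / sd
    if k_low != k_high or k_low not in EMPIRICAL_RULE:
        raise ValueError("interval is not the mean ± 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[int(k_low)]


golf = empirical_percentage(72, 5, 62, 82)        # mean ± 2 SD
piano = empirical_percentage(22, 6, 16, 28)       # mean ± 1 SD
sheepdogs = empirical_percentage(59, 6, 41, 77)   # mean ± 3 SD

print(golf, piano, sheepdogs)
```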
In: Statistics and Probability
The data below are the survival times after treatment (in days) of some advanced colon cancer patients who were treated with ascorbate.
248, 377, 189, 1843, 180, 537, 519, 455, 406, 365, 942, 776, 372, 163, 101, 20, 283
(a)
i) What is the point estimate for the average or mean of this data?
ii) Report an appropriate 96% confidence interval estimate for the mean survival time after treatment of all advanced colon cancer patients who might be treated with ascorbate. Assume the survival times are normally distributed and take Standard Deviation to be 427.17.
iii) Interpret the confidence interval in words and in context.
(b) Based on the data at hand, would 600 days be considered a reasonable guess as to the average survival time?
(c) What was the margin of error (ME) of the confidence interval?
(d) If we wanted to be 99% confident that our estimate was within 60 days of the population mean survival time, how many patients should we observe?
(e) If the sample included more patients, would the 96% confidence interval have been narrower or wider?
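Because the question supplies a standard deviation and assumes normality, parts (a) through (d) can be worked with a z interval. The sketch below treats 427.17 as a known standard deviation, which is an assumption about how the question intends it to be used; variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean

times = [248, 377, 189, 1843, 180, 537, 519, 455, 406,
         365, 942, 776, 372, 163, 101, 20, 283]
sigma = 427.17

n = len(times)
x_bar = mean(times)                        # (a)(i) point estimate of the mean

# (a)(ii) 96% z interval: x_bar ± z * sigma / sqrt(n)
z96 = NormalDist().inv_cdf(1 - 0.04 / 2)
margin = z96 * sigma / sqrt(n)             # (c) margin of error
ci = (x_bar - margin, x_bar + margin)

# (b) 600 days is plausible if it falls inside the interval
plausible_600 = ci[0] <= 600 <= ci[1]

# (d) sample size for 99% confidence and a margin of 60 days, rounded up
z99 = NormalDist().inv_cdf(1 - 0.01 / 2)
n_needed = ceil((z99 * sigma / 60) ** 2)

print(f"mean = {x_bar:.2f}, ME = {margin:.2f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"600 plausible: {plausible_600}, n needed: {n_needed}")
```

For part (e), the margin of error is proportional to 1/√n, so a larger sample gives a narrower interval at the same confidence level.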
In: Statistics and Probability
One of our dealerships has the following average unit sales per team member. We are concerned that the dealership variance and the population variance are different. We would like to be 95% confident about our findings. Historically the population variance has been about 390,000. Use the following data as a basis for making inferences about the population variance. Prepare the information about a Confidence Interval and Hypothesis Testing.
Team Member | Average Dollar Sale | xbar | x-xbar | (x-xbar)^2 |
1 | 25000 | | | |
2 | 25500 | | | |
3 | 23750 | | | |
4 | 25250 | | | |
5 | 24250 | | | |
6 | 24750 | | | |
7 | 25750 | | | |
8 | 24500 | | | |
9 | 25375 | | | |
10 | 24625 | | | |
Find:
1. N = ?
2. Sample variance?
3. Confidence coefficient?
4. Level of significance?
5. Chi-square value (Lower tail)?
6. Chi-square value (Upper tail)?
7. Point estimate?
8. Lower limit?
9. Upper limit?
10. Sample Mean?
11. Hypothesized value?
12. Test statistic?
13. P-value (two tail 4 decimals)?
14. Conclusion
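Most of items 1 through 14 follow from the chi-square procedures for a single variance: the interval ((n − 1)s²/χ²_upper, (n − 1)s²/χ²_lower) and the test statistic (n − 1)s²/σ₀². The sketch below uses the standard library, so the chi-square critical values for df = 9 (2.700 and 19.023) are copied from a standard table rather than computed, and the two-tailed P-value in item 13 is left to statistical software.

```python
from statistics import mean, variance

sales = [25000, 25500, 23750, 25250, 24250, 24750, 25750, 24500, 25375, 24625]
sigma2_0 = 390_000                        # hypothesized (historical) population variance

n = len(sales)                            # 1. number of observations
x_bar = mean(sales)                       # 10. sample mean
s2 = variance(sales)                      # 2 and 7. sample variance (point estimate)

confidence = 0.95                         # 3. confidence coefficient
alpha = 1 - confidence                    # 4. level of significance

# 5 and 6. table values for df = n - 1 = 9 (left-tail areas 0.025 and 0.975)
chi2_lower, chi2_upper = 2.700, 19.023

# 8 and 9. confidence limits for the population variance
ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)

# 12. test statistic for H0: sigma^2 = 390,000 (two-tailed)
test_stat = (n - 1) * s2 / sigma2_0
reject = test_stat < chi2_lower or test_stat > chi2_upper

print(f"n = {n}, mean = {x_bar}, s^2 = {s2:.2f}")
print(f"95% CI for variance: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"chi-square = {test_stat:.4f}, reject H0: {reject}")
```

The test statistic falls between the two critical values and the interval contains 390,000, so under these assumptions the data do not show that the dealership variance differs from the historical value.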
We want to compare the variances in sales at two of our dealerships, A and B. Use the following table to develop a comparison of the two population variances. Prepare the information for Hypothesis Testing.
15. Mean of first set?
16. Mean of second set?
17. Variance of first set?
18. Variance of second set?
19. Observations of the first set?
20. Observations of the second set?
21. Degrees of freedom first set?
22. Degrees of freedom second set?
23. Calculated test statistic? (4 decimal places)
24. P-value? (4 decimal places)
25. Critical value? (4 decimal places)
In: Statistics and Probability
Washington state reported more cases of hepatitis C in 2010 (105,800) than Oregon (79,800). Why is the reported prevalence rate per 100 higher in OR (3.05) than in WA (2.30),
assuming the OR population at that time was 2,964,621 and the WA population was 5,143,186?
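A prevalence rate divides cases by the population at risk, so a state with fewer cases can still have the higher rate if its population is much smaller. The sketch below applies rate = cases / population × 100 to the counts and populations stated in the question; the rates it yields from those figures need not reproduce the quoted 3.05 and 2.30, which may rest on different denominators.

```python
def rate_per_100(cases, population):
    """Prevalence expressed as cases per 100 people."""
    return cases / population * 100


or_rate = rate_per_100(79_800, 2_964_621)
wa_rate = rate_per_100(105_800, 5_143_186)

print(f"OR: {or_rate:.2f} per 100, WA: {wa_rate:.2f} per 100")
print("OR rate is higher:", or_rate > wa_rate)
```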
In: Statistics and Probability