A clinical trial is run to assess the effects of different forms of regular exercise on HDL levels in persons between the ages of 18 and 29. Participants in the study are randomly assigned to one of three exercise groups (Weight Training, Aerobic Exercise, or Stretching/Yoga) and instructed to follow the program for 8 weeks. Their HDL levels are measured after 8 weeks and are summarized below.
Weight Training | Aerobic Exercise | Stretching/Yoga |
48 | 38 | 56 |
49 | 41 | 57 |
51 | 42 | 60 |
53 | 37 | 55 |
45 | 39 | 58 |
47 | 44 | 64 |
54 | 43 | 61 |
49 | 42 | 62 |
50 | 36 | 60 |
47 | 34 | 56 |
48 | 33 | 55 |
45 | 35 | 59 |
44 | 38 | 61 |
55 | 40 | 62 |
52 | 34 | 63 |
55 | 32 | 67 |
50 | 35 | 64 |
Is there a significant difference in mean HDL levels among the exercise groups? Run the test at a 5% level of significance. (10 points)
a) Hypotheses (1 point)
H_{0}: μ_{1} = μ_{2} = μ_{3} vs. H_{A}: at least one mean is different
b) Summary statistics (2 points)
Group | Exercise Group | N | Mean | Std Dev |
1 | Weight Training | 17 | 49.5294 | 3.4481 |
2 | Aerobic Exercise | 17 | 37.8235 | 3.7289 |
3 | Stretching/Yoga | 17 | 60 | 3.5 |
c) MSB (1 point)
MSB =
d) MSW (1 point)
MSW =
e) Compute F-stat (3 points)
F-stat =
f) P-value (1 point)
g) Conclusion (1 point) (circle one)
Accept H0 / Reject H0
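One way to check parts (c) through (e) is to compute the one-way ANOVA sums of squares directly from the raw data. The sketch below uses only the Python standard library; the variable names (`weight`, `aerobic`, `yoga`, `msb`, `msw`, `f_stat`) are illustrative and not part of the original question.

```python
from statistics import mean

# HDL levels after 8 weeks, one list per exercise group (from the table above)
weight = [48, 49, 51, 53, 45, 47, 54, 49, 50, 47, 48, 45, 44, 55, 52, 55, 50]
aerobic = [38, 41, 42, 37, 39, 44, 43, 42, 36, 34, 33, 35, 38, 40, 34, 32, 35]
yoga = [56, 57, 60, 55, 58, 64, 61, 62, 60, 56, 55, 59, 61, 62, 63, 67, 64]
groups = [weight, aerobic, yoga]

k = len(groups)                                   # number of groups
n_total = sum(len(g) for g in groups)             # total sample size
grand_mean = mean([x for g in groups for x in g])

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: squared distance of each observation
# from its own group mean
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)          # mean square between, df = k - 1
msw = ssw / (n_total - k)    # mean square within, df = N - k
f_stat = msb / msw

print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_stat:.2f}")
```

The P-value in part (f) comes from the F distribution with (k - 1, N - k) degrees of freedom, which needs an F table or a statistics package; it is not computed here.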
In: Statistics and Probability
The University of California, Berkeley (Cal) and Stanford University are athletic archrivals in the Pacific 10 conference. Stanford fans claim Stanford's basketball team is better than Cal's team; Cal fans challenge this assertion. In 2004, Stanford University's basketball team went nearly undefeated within the Pac 10. Stanford's record, and those of Cal and the other eight teams in the conference, are listed in In all, there were 89 games played among the Pac 10 teams in the season. Stanford won 17 of the 18 games it played; Cal won 9 of 18. We would like to use these data to test the Stanford fans' claim that Stanford's team is better than Cal's. That is, we would like to determine whether the difference between the two teams' performance reasonably could be attributed to chance, if the Stanford and Cal teams in fact have equal skill.
To test the hypothesis, we shall make a number of simplifying assumptions. First of all, we shall ignore the fact that some of the games were played between Stanford and Cal: we shall pretend that all the games were played against other teams in the conference. One strong version of the hypothesis that the two teams have equal skill is that the outcomes of the games would have been the same had the two teams swapped schedules. That is, suppose that when Washington played Stanford on a particular day, Stanford won. Under this strong hypothesis, had Washington played Cal that day instead of Stanford, Cal would have won.
A weaker version of the hypothesis is that the outcome of Stanford's games is determined by independent draws from a 0-1 box that has a fraction p_{S} of tickets labeled "1" (Stanford wins the game if the ticket drawn is labeled "1"), that the outcome of Berkeley's games is determined similarly, by independent draws from a 0-1 box with a fraction p_{C} of tickets labeled "1," and that p_{S} = p_{C}. This model has some shortcomings. (For instance, when Berkeley and Stanford play each other, the independence assumption breaks down, and the fraction of tickets labeled "1" would need to be 50%. Also, it seems unreasonable to think that the chance of winning does not depend on the opponent. We could refine the model, but that would require knowing more details about who played whom, and the outcome.)
Nonetheless, this model does shed some light on how surprising the records would be if the teams were, in some sense, equally skilled. This box model version allows us to use Fisher's Exact test for independent samples, considering "treatment" to be playing against Stanford, and "control" to be playing against Cal, and conditioning on the total number of wins by both teams (26).
Q1) On the assumption that the null hypothesis is true, the bootstrap estimate of the standard error of the sample percentage of games won by Stanford is ?
Q2) On the assumption that the null hypothesis is true, the bootstrap estimate of the standard error of the sample percentage of games won by Cal is ?
Q3) The approximate P-value for z test against the two-sided alternative that the Stanford and Berkeley teams have different skills is ?
Note: The z-score for the difference in sample percentages is 2.97685.
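A minimal sketch of the arithmetic behind Q1 through Q3, assuming the "bootstrap estimate under the null hypothesis" means using the pooled win fraction (26 wins in 36 games) for both teams. That reading is an assumption, though it does reproduce the z-score quoted in the note. The variable names are illustrative.

```python
from math import sqrt, erfc

n = 18                          # games played by each team
wins_stanford, wins_cal = 17, 9

# Under the null hypothesis both teams share one win probability,
# estimated by pooling the two records.
pooled = (wins_stanford + wins_cal) / (2 * n)

# Standard error of each team's sample fraction of wins
# (identical for both teams because n is the same)
se_each = sqrt(pooled * (1 - pooled) / n)

# Standard error of the difference of two independent sample fractions
se_diff = sqrt(2) * se_each

z = (wins_stanford / n - wins_cal / n) / se_diff

# Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
p_two_sided = erfc(abs(z) / sqrt(2))

print(f"SE per team = {100 * se_each:.2f}%")
print(f"z = {z:.5f}, two-sided P-value = {p_two_sided:.4f}")
```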
In: Statistics and Probability
A social scientist would like to analyze the relationship between educational attainment (in years of higher education) and annual salary (in $1,000s). He collects data on 20 individuals. A portion of the data is as follows:
Salary | Education |
34 | 1 |
52 | 4 |
83 | 2 |
56 | 5 |
75 | 1 |
60 | 3 |
121 | 7 |
52 | 0 |
30 | 2 |
43 | 3 |
93 | 5 |
50 | 6 |
60 | 7 |
58 | 8 |
147 | 10 |
56 | 0 |
81 | 2 |
74 | 6 |
136 | 5 |
25 | 0 |
a. Find the sample regression equation for the model: Salary = β_{0} + β_{1}Education + ε. (Round answers to 2 decimal places.)
Predicted Salary = ____ + ____ Education
b. Interpret the coefficient for Education.
As Education increases by 1 unit, an individual’s annual salary is predicted to increase by $6,530.
As Education increases by 1 unit, an individual’s annual salary is predicted to decrease by $6,530.
As Education increases by 1 unit, an individual’s annual salary is predicted to increase by $8,590.
As Education increases by 1 unit, an individual’s annual salary is predicted to decrease by $8,590.
c. What is the predicted salary for an individual who completed 7 years of higher education? (Round coefficient estimates to at least 4 decimal places and final answer to the nearest whole number.)
Predicted Salary = $ ____
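The least-squares coefficients for parts (a) through (c) can be sketched with the textbook formulas b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The names `b0`, `b1`, and `predicted_7` are illustrative only.

```python
# Salary (in $1,000s) and years of higher education for the 20 individuals
salary = [34, 52, 83, 56, 75, 60, 121, 52, 30, 43,
          93, 50, 60, 58, 147, 56, 81, 74, 136, 25]
education = [1, 4, 2, 5, 1, 3, 7, 0, 2, 3,
             5, 6, 7, 8, 10, 0, 2, 6, 5, 0]

n = len(salary)
x_bar = sum(education) / n
y_bar = sum(salary) / n

# Sums of squares and cross-products about the means
sxx = sum((x - x_bar) ** 2 for x in education)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(education, salary))

b1 = sxy / sxx             # slope: change in salary per extra year
b0 = y_bar - b1 * x_bar    # intercept

# Part (c): predicted salary (in $1,000s) at 7 years of higher education
predicted_7 = b0 + b1 * 7

print(f"Salary-hat = {b0:.2f} + {b1:.2f} * Education")
print(f"Predicted salary at 7 years: ${round(predicted_7)} thousand")
```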
In: Statistics and Probability
3. A researcher is trying to determine the average SAT score for 2018 SAT test-takers (scores are known to follow a normal distribution). He gathers a sample of 11 students and calculates x̄ = 990 and s = 230. What is the 95% confidence interval for the mean SAT score, based on this sample?
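Because the sample is small and only the sample standard deviation is known, a t-interval is the usual choice. The sketch below hard-codes the critical value t = 2.228 for 10 degrees of freedom, taken from a standard t table; that constant is an assumption to check against your own table.

```python
from math import sqrt

n = 11
x_bar = 990.0
s = 230.0

# Two-sided 95% critical value of the t distribution with n - 1 = 10
# degrees of freedom (standard table value, hard-coded here)
t_crit = 2.228

margin = t_crit * s / sqrt(n)   # margin of error
lower = x_bar - margin
upper = x_bar + margin

print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```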
In: Statistics and Probability
A transect is an archaeological study area that is 1/5 mile wide and 1 mile long. A site in a transect is the location of a significant archaeological find. Let x represent the number of sites per transect. In a section of Chaco Canyon, a large number of transects showed that x has a population variance σ^{2} = 42.3. In a different section of Chaco Canyon, a random sample of 26 transects gave a sample variance s^{2} = 46.7 for the number of sites per transect. Use a 5% level of significance to test the claim that the variance in the new section is greater than 42.3. Find a 95% confidence interval for the population variance.
(a) What is the level of significance?
State the null and alternate hypotheses.
H_{o}: σ^{2} = 42.3; H_{1}: σ^{2} < 42.3
H_{o}: σ^{2} = 42.3; H_{1}: σ^{2} > 42.3
H_{o}: σ^{2} > 42.3; H_{1}: σ^{2} = 42.3
H_{o}: σ^{2} = 42.3; H_{1}: σ^{2} ≠ 42.3
(b) Find the value of the chi-square statistic for the sample.
(Round your answer to two decimal places.)
What are the degrees of freedom?
What assumptions are you making about the original distribution?
We assume a uniform population distribution.
We assume a binomial population distribution.
We assume a normal population distribution.
We assume an exponential population distribution.
(c) Find or estimate the P-value of the sample test statistic.
P-value > 0.100
0.050 < P-value < 0.100
0.025 < P-value < 0.050
0.010 < P-value < 0.025
0.005 < P-value < 0.010
P-value < 0.005
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis?
Since the P-value > α, we fail to reject the null hypothesis.
Since the P-value > α, we reject the null hypothesis.
Since the P-value ≤ α, we reject the null hypothesis.
Since the P-value ≤ α, we fail to reject the null hypothesis.
(e) Interpret your conclusion in the context of the application.
At the 5% level of significance, there is insufficient evidence to conclude that the variance is greater in the new section.
At the 5% level of significance, there is sufficient evidence to conclude that the variance is greater in the new section.
(f) Find the requested confidence interval for the population variance. (Round your answers to two decimal places.)
lower limit | |
upper limit | |
Interpret the results in the context of the application.
We are 95% confident that σ^{2} lies below this interval.
We are 95% confident that σ^{2} lies outside this interval.
We are 95% confident that σ^{2} lies within this interval.
We are 95% confident that σ^{2} lies above this interval.
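A rough sketch of the computations for parts (b) and (f), assuming a normal population. The chi-square critical values for 25 degrees of freedom (13.120, 34.382 and 40.646) are hard-coded from a standard chi-square table and should be checked against the table you are using; the variable names are illustrative.

```python
n = 26
s2 = 46.7        # sample variance
sigma2_0 = 42.3  # variance under the null hypothesis

df = n - 1
# Test statistic for a claim about a single variance
chi2_stat = df * s2 / sigma2_0

# Standard table values for df = 25 (hard-coded assumptions)
chi2_upper_10 = 34.382    # right-tail area 0.100
chi2_right_025 = 40.646   # right-tail area 0.025
chi2_right_975 = 13.120   # right-tail area 0.975

# Right-tailed test: if the statistic is below the 0.100 cutoff,
# the P-value exceeds 0.100
p_value_above_10 = chi2_stat < chi2_upper_10

# 95% confidence interval for the population variance
lower = df * s2 / chi2_right_025
upper = df * s2 / chi2_right_975

print(f"chi-square = {chi2_stat:.2f}, df = {df}")
print(f"P-value > 0.100: {p_value_above_10}")
print(f"95% CI for variance: ({lower:.2f}, {upper:.2f})")
```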
In: Statistics and Probability
Q8 Researcher A recruited a random sample of 100 pandas. Researcher B recruited a random sample of 100 hippos. Suppose the health scores for the panda population have a mean of 100 and a standard deviation of 20, and the health scores for the hippo population have a mean of 100 and a standard deviation of 50. Which researcher is more likely to get a sample mean of 80 or lower?
A. Researcher A
B. Researcher B
Q9 A sample of 100 people was taken from a population with a mean of 50 and a standard deviation of 20. What is the typical amount that the sample mean deviates from the population mean?
A. 1
B. 1.5
C. 2
D. 2.5
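Both questions turn on the standard error of the sample mean, σ / √n. The sketch below assumes the sample mean is approximately normal (reasonable for n = 100 by the central limit theorem); `standard_error` is a hypothetical helper introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sample mean for a sample of size n."""
    return sigma / sqrt(n)

# Q8: same population mean and sample size, different spreads
se_panda = standard_error(20, 100)
se_hippo = standard_error(50, 100)

# Approximate chance that each sample mean is 80 or lower
p_panda = NormalDist(mu=100, sigma=se_panda).cdf(80)
p_hippo = NormalDist(mu=100, sigma=se_hippo).cdf(80)

# Q9: typical deviation of the sample mean from the population mean
se_q9 = standard_error(20, 100)

print(f"SE pandas = {se_panda}, SE hippos = {se_hippo}")
print(f"P(mean <= 80): pandas {p_panda:.3g}, hippos {p_hippo:.3g}")
print(f"Q9 standard error = {se_q9}")
```

The larger population spread gives the hippo sample mean a wider sampling distribution, so a value as low as 80 is less extreme for it.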
In: Statistics and Probability
AT&T would like to test the hypothesis that the proportion of 18- to 34-year-old Americans that own a cell phone is less than the proportion of 35- to 49-year-old Americans. A random sample of 200 18- to 34-year-old Americans found that 126 owned a smartphone. A random sample of 175 35- to 49-year-old Americans found that 119 owned a smartphone. If Population 1 is defined as 18- to 34-year-old Americans and Population 2 is defined as 35- to 49-year-old Americans, and using α = 0.01, the conclusion for this hypothesis test would be that because the test statistic is ____________.
less than the critical value, AT&T can conclude that the proportion of 18- to 34-year-old Americans that own a cell phone is less than the proportion of 35- to 49-year-old Americans
less than the critical value, AT&T cannot conclude that the proportion of 18- to 34-year-old Americans that own a cell phone is less than the proportion of 35- to 49-year-old Americans
more than the critical value, AT&T can conclude that the proportion of 18- to 34-year-old Americans that own a cell phone is less than the proportion of 35- to 49-year-old Americans
more than the critical value, AT&T cannot conclude that the proportion of 18- to 34-year-old Americans that own a cell phone is less than the proportion of 35- to 49-year-old Americans
none of these answers are correct
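A sketch of the pooled two-proportion z-test for this question, with a lower-tailed alternative (p1 < p2). The critical value is taken from the standard normal distribution via `statistics.NormalDist`; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 126, 200   # Population 1: ages 18 to 34
x2, n2 = 119, 175   # Population 2: ages 35 to 49
alpha = 0.01

p1_hat = x1 / n1
p2_hat = x2 / n2

# Pooled proportion under H0: p1 = p2
p_pool = (x1 + x2) / (n1 + n2)
se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se

# Lower-tailed test: reject H0 only if z falls below this cutoff
z_crit = NormalDist().inv_cdf(alpha)

reject_h0 = z < z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```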
In: Statistics and Probability
The relationship between hospital patient-to-nurse ratio and various characteristics of job satisfaction and patient care has been the focus of a number of research studies. Suppose x = patient-to-nurse ratio is the independent variable. For each of the following potential dependent variables, indicate whether you expect the slope of the least-squares line to be positive or negative and give a brief explanation for your choice.
(a) y = a measure of nurse's job satisfaction (higher values indicate higher satisfaction)
We expect the slope to be negative. With more patients to nurses, the nurses will have more stress and less job satisfaction.
We expect the slope to be negative. With more patients to nurses, the nurses will have less stress and more job satisfaction.
We expect the slope to be positive. With more patients to nurses, the nurses will have less stress and more job satisfaction.
We expect the slope to be positive. With more patients to nurses, the nurses will have more stress and less job satisfaction.
(b) y = a measure of patient satisfaction with hospital care (higher values indicate higher satisfaction)
We expect the slope to be positive. With more patients to nurses, the nurses will have less time with patients and the patients will have less satisfaction with their care.
We expect the slope to be positive. With more patients to nurses, the nurses will have more time with patients and the patients will have more satisfaction with their care.
We expect the slope to be negative. With more patients to nurses, the nurses will have less time with patients and the patients will have less satisfaction with their care.
We expect the slope to be negative. With more patients to nurses, the nurses will have more time with patients and the patients will have more satisfaction with their care.
(c) y = a measure of patient quality of care
We expect the slope to be positive. With more patients to nurses, the nurses will have more time with patients and the quality of care will increase.
We expect the slope to be negative. With more patients to nurses, the nurses will have less time with patients and the quality of care will decrease.
We expect the slope to be negative. With more patients to nurses, the nurses will have more time with patients and the quality of care will increase.
We expect the slope to be positive. With more patients to nurses, the nurses will have less time with patients and the quality of care will decrease.
In: Statistics and Probability
A machine is used to fill containers with a liquid product. Fill volume can be assumed to be normally distributed. A random sample of ten containers is selected, and the net contents (oz) are as follows: 12.03, 12.01, 12.04, 12.02, 12.05, 11.98, 11.96, 12.02, 12.05, and 11.99.
(d) Predict with 95% confidence the value of the 11th filled container.
(e) Predict with 95% confidence the interval containing 90% of the filled containers from the process.
*please calculate by hand (not in R)
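As a cross-check on the hand calculation for part (d), the 95% prediction interval for a single future observation is x̄ ± t·s·√(1 + 1/n). The sketch below hard-codes t = 2.262 for 9 degrees of freedom from a standard t table (an assumption to verify). Part (e) needs a tolerance-interval factor from a separate table and is not computed here.

```python
from math import sqrt
from statistics import mean, stdev

# Net contents (oz) of the ten sampled containers
fills = [12.03, 12.01, 12.04, 12.02, 12.05,
         11.98, 11.96, 12.02, 12.05, 11.99]

n = len(fills)
x_bar = mean(fills)
s = stdev(fills)   # sample standard deviation (divides by n - 1)

# Two-sided 95% t critical value for n - 1 = 9 degrees of freedom
# (standard table value, hard-coded here)
t_crit = 2.262

# Prediction interval half-width for one new observation
half_width = t_crit * s * sqrt(1 + 1 / n)
pi_lower = x_bar - half_width
pi_upper = x_bar + half_width

print(f"mean = {x_bar:.3f}, s = {s:.4f}")
print(f"95% prediction interval: ({pi_lower:.3f}, {pi_upper:.3f})")
```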
In: Statistics and Probability
Two hundred and eighty boys completed a test that measures the distance that the subject can walk on a flat, hard surface in 6 minutes. For each age group shown in the table, the median distance walked by the boys in that age group is also given.
Age Group | Representative Age (Midpoint of Age Group) | Median Six-Minute Walk Distance (meters) |
---|---|---|
3–5 | 4 | 544.3 |
6–8 | 7 | 584.0 |
9–11 | 10 | 667.3 |
12–15 | 13.5 | 701.1 |
16–18 | 17 | 727.6 |
This experiment also reported the 6-minute walk distances (in meters) for 248 girls age 3 to 18 years. The median 6-minute walk distances for girls for the five age groups, respectively, were 494.4, 580.3, 657.8, 655.6, and 662.9.
Find the equation of the least-squares regression line that describes the relationship between median distance walked in 6 minutes and representative age for girls. (Round your values to three decimal places.)
(d) Compute the five residuals. (Round your answers to three decimal places.)
Representative Age (x) | Residual (meters) |
---|---|
4 | |
7 | |
10 | |
13.5 | |
17 | |
(c) Find the equation of the least-squares regression line that describes the relationship between median distance walked in 6 minutes and representative age for girls. (Round your values to three decimal places.)
ŷ = ____ + (_____)x
The researchers decided to use a curve rather than a straight line to describe the relationship between median distance walked in 6 minutes and age for girls. What aspect of the residual plot supports this decision?
A. The decision to use a curve is supported by the clear negative linear pattern in the residual plot.
B. The decision to use a curve is not supported by the residual plot as there are no unusual patterns.
C. The decision to use a curve is supported by the clear positive linear pattern in the residual plot.
D. The decision to use a curve is supported by the clear curved pattern in the residual plot.
E. The decision to use a curve is supported by the outliers in the residual plot.
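The regression line and residuals for the girls' data can be sketched as follows, using the representative ages from the table as x and the girls' median distances as y. The variable names are illustrative.

```python
# Representative ages and girls' median 6-minute walk distances (meters)
ages = [4, 7, 10, 13.5, 17]
girls = [494.4, 580.3, 657.8, 655.6, 662.9]

n = len(ages)
x_bar = sum(ages) / n
y_bar = sum(girls) / n

sxx = sum((x - x_bar) ** 2 for x in ages)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages, girls))

slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed value minus value predicted by the fitted line
residuals = [y - (intercept + slope * x) for x, y in zip(ages, girls)]

print(f"y-hat = {intercept:.3f} + ({slope:.3f})x")
for x, r in zip(ages, residuals):
    print(f"age {x}: residual {r:.3f}")
```

Looking at how the signs of the residuals change across the ages is one way to judge whether a straight line fits or whether the residual plot shows a curved pattern.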
In: Statistics and Probability
Please TYPE your answers:
(a) Name the types of NONsampling error and define them in a sentence or two.
(b) Name the types of sampling error and define them in a sentence or two.
In: Statistics and Probability
During the first 13 weeks of the television season, the Saturday evening 8:00 P.M. to 9:00 P.M. audience proportions were recorded as ABC 30%, CBS 28%, NBC 22%, and independents 20%. A sample of 300 homes two weeks after a Saturday night schedule revision yielded the following viewing audience data: ABC 93 homes, CBS 70 homes, NBC 82 homes, and independents 55 homes. Test with α = .05 to determine whether the viewing audience proportions changed. Round your answers to two decimal places.
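This is a chi-square goodness-of-fit test. A minimal sketch follows, with the critical value 7.815 (3 degrees of freedom, α = .05) hard-coded from a standard chi-square table; the dictionary names are illustrative.

```python
# Audience shares before the schedule revision
expected_share = {"ABC": 0.30, "CBS": 0.28, "NBC": 0.22, "Independents": 0.20}

# Observed homes in the sample of 300 after the revision
observed = {"ABC": 93, "CBS": 70, "NBC": 82, "Independents": 55}

n = sum(observed.values())

# Expected counts if the proportions had not changed
expected = {k: share * n for k, share in expected_share.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

df = len(observed) - 1
chi2_crit = 7.815   # table value for df = 3, alpha = .05 (hard-coded)

reject_h0 = chi2_stat > chi2_crit

print(f"chi-square = {chi2_stat:.2f}, df = {df}, reject H0: {reject_h0}")
```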
In: Statistics and Probability
interpreting conclusions of hypothesis tests:
In: Statistics and Probability
Define the following terms: (a) simple random sampling, (b) systematic sampling, (c) systematic random sampling, (d) haphazard sampling, and (e) block sampling. What are specific situations when it would be appropriate to use each and is it ever a good idea to use more than one of these?
In: Statistics and Probability
Please TYPE your answer:
What is the primary feature of a good sample?
In: Statistics and Probability