In: Statistics and Probability
NASA has been working with utility companies to set up expensive power generating windmills. In order for a site to be effective, the average wind speed must be more than 20 mph. At a specific site, NASA took a sample of forty-five wind speed readings. The sample average wind speed is 21.7 mph with a standard deviation of 4.2 mph. NASA will only build the windmill if they are sure that the true average wind speed is greater than 20 mph.
a. Should they build the windmill? Justify by conducting a hypothesis test at α = 0.05.
b. Describe in the words of the problem what making a Type I error would be and its likely consequences.
c. Describe in the words of the problem what making a Type II error would be and its likely consequences.
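One way to check the arithmetic for part (a) is a one-sample t statistic. This is only a sketch under stated assumptions: it takes the test to be right-tailed (H0: μ = 20 against H1: μ > 20), and the critical value 1.680 is a t-table lookup for 44 degrees of freedom, not a number given in the question.

```python
import math

# Values quoted in the question
n = 45
sample_mean = 21.7
sample_sd = 4.2
mu0 = 20.0  # threshold wind speed under H0

# One-sample t statistic
standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - mu0) / standard_error

# Assumption: upper 5% point of the t distribution with 44 df, read from a table
t_critical = 1.680
reject_h0 = t_stat > t_critical

print(f"t = {t_stat:.3f}, critical = {t_critical}, reject H0: {reject_h0}")
```

Rejecting H0 here would correspond to concluding that the true average wind speed exceeds 20 mph at the 5% level.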
In: Statistics and Probability
We have two possible models in two-way ANOVA: the fixed effects model and the random effects model. What are the differences between these two models and how does the analysis differ?
In: Statistics and Probability
Suppose that 32% of people have a dog, 27% of people have a cat, and 12% have both. What is the probability that someone owns a dog but not a cat?
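The arithmetic can be sketched as follows, assuming the event "dog but not cat" is the dog owners minus those who own both. The variable names are illustrative.

```python
# Percentages from the question, written as probabilities
p_dog = 0.32
p_cat = 0.27
p_both = 0.12

# P(dog and not cat) = P(dog) - P(dog and cat)
p_dog_not_cat = p_dog - p_both

# For comparison: P(dog or cat) by inclusion-exclusion
p_dog_or_cat = p_dog + p_cat - p_both

print(f"P(dog, no cat) = {p_dog_not_cat:.2f}")
print(f"P(dog or cat)  = {p_dog_or_cat:.2f}")
```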
In: Statistics and Probability
Recorded in the table below are the blood pressure measurements (in millimeters) for a sample of 12 adults. Does there appear to be a linear relationship between the diastolic and systolic blood pressures? At the 5% significance level, test the claim that systolic blood pressure and diastolic blood pressure have a linear relationship.
Systolic | Diastolic
107 | 71
110 | 74
133 | 91
115 | 83
118 | 88
134 | 87
123 | 77
154 | 94
119 | 69
130 | 76
108 | 69
112 | 75
Data Table: Blood Pressure 7
Hypotheses:
H0: Slope and Correlation are both zero
H1: Slope and Correlation are both not zero
Results:
What is the correlation coefficient? Use 4 decimal places in the answer.
r = __________
What percent of the variation in diastolic blood pressure is explained by the model? Round to the nearest hundredth of a percent (e.g., 65.31%).
R² = ____________
What is the equation of the regression line? Use 2 decimal places in the answers.
Diastolic = _________ (Systolic) + _________
State the p-value. Round the answer to the nearest hundredth of a percent (e.g., 2.55%).
p-value = ________
Conclusion:
We ___________ sufficient evidence to support the claim that the correlation coefficient and slope of the regression line are both statistically different from zero (p _____ 0.05).
(Use “have” or “lack” for the first blank and “<” or “>” for the second blank.)
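A minimal sketch of the computations behind these blanks, assuming the table pairs each systolic reading with the diastolic reading listed next to it. The critical value 2.228 is a t-table lookup (two-sided 5%, 10 degrees of freedom) and is not given in the question.

```python
import math

systolic = [107, 110, 133, 115, 118, 134, 123, 154, 119, 130, 108, 112]
diastolic = [71, 74, 91, 83, 88, 87, 77, 94, 69, 76, 69, 75]

n = len(systolic)
mean_x = sum(systolic) / n
mean_y = sum(diastolic) / n

# Corrected sums of squares and cross-products
sxx = sum((x - mean_x) ** 2 for x in systolic)
syy = sum((y - mean_y) ** 2 for y in diastolic)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(systolic, diastolic))

r = sxy / math.sqrt(sxx * syy)        # correlation coefficient
r_squared = r ** 2                    # share of variation explained
slope = sxy / sxx                     # least-squares slope
intercept = mean_y - slope * mean_x   # least-squares intercept

# t statistic for H0: slope (equivalently, correlation) is zero, df = n - 2
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
t_critical = 2.228  # assumption: table value for two-sided 5%, 10 df
significant = abs(t_stat) > t_critical

print(f"r = {r:.4f}, R^2 = {100 * r_squared:.2f}%")
print(f"Diastolic = {slope:.2f} (Systolic) + {intercept:.2f}")
print(f"t = {t_stat:.3f}, significant at 5%: {significant}")
```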
In: Statistics and Probability
Three fair coins are tossed simultaneously 10 times. Find the probability that "2 heads and one tail" will show up (a) at least once and (b) at most once.
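The setup can be sketched as a binomial calculation, assuming the 10 tosses are independent and that "2 heads and one tail" means exactly two heads among the three coins.

```python
import math

# P(exactly 2 heads in one toss of 3 fair coins) = C(3, 2) / 2^3
p = math.comb(3, 2) / 2 ** 3
trials = 10


def binom_pmf(k, n, prob):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * prob ** k * (1 - prob) ** (n - k)


p_at_least_once = 1 - binom_pmf(0, trials, p)
p_at_most_once = binom_pmf(0, trials, p) + binom_pmf(1, trials, p)

print(f"single-toss probability = {p}")
print(f"(a) at least once: {p_at_least_once:.4f}")
print(f"(b) at most once:  {p_at_most_once:.4f}")
```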
In: Statistics and Probability
Given that the scores on a certain exam are normally distributed with a mean of 75 and a standard deviation of 5:
a. Calculate the z-score for 80. Find the percentage of students with scores above 80.
b. Calculate the z-score for 60. Find the percentage of students with scores below 60.
c. Calculate the z-scores for 70 and 90. Find the percentage of students with scores between 70 and 90.
d. What is the median?
e. What test score value has a Z-score of -2.25?
f. What test score is the 85th percentile?
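These parts can be checked with Python's standard-library `statistics.NormalDist`. This is a sketch only; the variable names are illustrative.

```python
from statistics import NormalDist

exam = NormalDist(mu=75, sigma=5)


def z_score(x):
    """Standardize a raw score against the exam distribution."""
    return (x - exam.mean) / exam.stdev


# a. share of scores above 80
z_80 = z_score(80)
pct_above_80 = 100 * (1 - exam.cdf(80))

# b. share of scores below 60
z_60 = z_score(60)
pct_below_60 = 100 * exam.cdf(60)

# c. share of scores between 70 and 90
pct_between = 100 * (exam.cdf(90) - exam.cdf(70))

# d. for a normal distribution the median equals the mean
median = exam.inv_cdf(0.5)

# e. raw score at z = -2.25
score_at_z = exam.mean + (-2.25) * exam.stdev

# f. 85th percentile
p85 = exam.inv_cdf(0.85)

print(z_80, round(pct_above_80, 2))
print(z_60, round(pct_below_60, 2))
print(z_score(70), z_score(90), round(pct_between, 2))
print(median, score_at_z, round(p85, 2))
```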
In: Statistics and Probability
An experiment was performed on a certain metal to determine if
the strength is a function of heating time (hours). Results based
on 25 metal sheets are given below. Use the simple linear
regression model.
∑X = 50
∑X² = 200
∑Y = 75
∑Y² = 1600
∑XY = 400
Find the estimated y intercept and slope. Write the equation of the
least squares regression line and explain the coefficients.
Estimate Y when X is equal to 4 hours. Also determine the standard
error, the Mean Square Error, the coefficient of determination and
the coefficient of correlation. Check the relation between
correlation coefficient and Coefficient of Determination. Test the
significance of the slope.
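The quantities requested can be derived from the summary sums alone. The sketch below assumes n = 25 observations, with X as heating time and Y as strength; the critical value 2.069 is a t-table lookup (two-sided 5%, 23 degrees of freedom) and is not part of the question.

```python
import math

n = 25
sum_x, sum_x2 = 50, 200
sum_y, sum_y2 = 75, 1600
sum_xy = 400

# Corrected sums of squares and cross-products
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n
sxy = sum_xy - sum_x * sum_y / n

slope = sxy / sxx
intercept = sum_y / n - slope * sum_x / n
y_hat_at_4 = intercept + slope * 4

# Error variance and goodness of fit
sse = syy - slope * sxy
mse = sse / (n - 2)
std_error = math.sqrt(mse)
r_squared = 1 - sse / syy
r = math.copysign(math.sqrt(r_squared), slope)  # r carries the slope's sign

# t test for H0: slope = 0
t_stat = slope / (std_error / math.sqrt(sxx))
t_critical = 2.069  # assumption: table value for two-sided 5%, 23 df
slope_significant = abs(t_stat) > t_critical

print(f"Y-hat = {intercept:.2f} + {slope:.2f} X; at X = 4: {y_hat_at_4:.2f}")
print(f"MSE = {mse:.4f}, s = {std_error:.4f}")
print(f"R^2 = {r_squared:.4f}, r = {r:.4f}")
print(f"t = {t_stat:.3f}, significant at 5%: {slope_significant}")
```

The relation to check is that r² equals the coefficient of determination and that r takes the sign of the slope.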
In: Statistics and Probability
Describe the conditions in which a nonparametric test would be a better selection than a parametric test. Illustrate your ideas with a specific example of when you would use each type of test using similar variables for each example.
In: Statistics and Probability
A data set of 27 male African elephants shows that their weights are normally distributed, with an average weight of 111 kg and a 95% confidence interval of (104, 119). I was asked to find an appropriate hypothesis test for this data set, carry out the calculations, and interpret the results; if I feel there is no natural hypothesis test to carry out, I should state why not. I personally feel there is no natural hypothesis test for this data: the weights are normally distributed with no outliers and, despite the sample size being only 27, 111 kg seems a reasonable weight for baby elephants. Please let me know whether my answer is correct; if not, please explain how to arrive at the right answer. Thank you.
In: Statistics and Probability
The Book of R (Question 20.2) Please answer using R code.
Continue using the survey data frame from the package MASS for the next few exercises.
Now, turn back to the ready-to-use mtcars data set. One of the variables in this data frame is qsec, described as the time in seconds it takes to race a quarter mile; another is gear, the number of forward gears (cars in this data set have either 3, 4, or 5 gears).
In: Statistics and Probability
In 2012 the Centers for Disease Control and Prevention reported that in a sample of 4,349 African Americans 31% were Vitamin D deficient. A 90% confidence interval based on this sample is (0.30, 0.32). It is believed that among the general population of Americans 8% suffer from Vitamin D deficiency.
Define the appropriate parameter and state the appropriate hypotheses for testing the claim that, among African Americans, Vitamin D deficiency occurs at a rate other than 8%.
Does this confidence interval provide evidence that among African Americans Vitamin D deficiency occurs at a rate other than 8%? What significance level is being used to make this decision? Briefly justify your answer.
Using the definition of a p-value, explain why the area in the tail of a randomization distribution is used to compute a p-value.
In a test of the hypotheses vs. , the observed sample results in a p-value of 0.0256. Would you expect a 95% confidence interval for based on this sample to contain 0? Briefly explain why or why not.
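As a check on the confidence-interval part of the Vitamin D question, the quoted interval can be reproduced with the usual normal approximation for a proportion. This is a sketch under that assumption; the question does not say how the CDC interval was actually computed.

```python
import math
from statistics import NormalDist

n = 4349
p_hat = 0.31
confidence = 0.90
p_null = 0.08  # rate believed to hold in the general population

# Two-sided critical value for the chosen confidence level
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Normal-approximation (Wald) interval for a proportion
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

# A two-sided test at alpha = 1 - confidence rejects H0 when
# the null value falls outside the interval
null_inside = low <= p_null <= high

print(f"{confidence:.0%} CI: ({low:.2f}, {high:.2f})")
print(f"null value {p_null} inside interval: {null_inside}")
```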
In: Statistics and Probability
Suppose a medical researcher claims that µ, the mean concentration of lead (in mcg/g, micrograms of lead per gram of medicine), is less than 16 mcg/g. Express in symbolic form the null and alternative hypotheses needed to test the researcher's claim.
H0 : µ ≥ 16 mcg/g
HA : µ < 16 mcg/g
H0 : µ = 16 mcg/g
HA : µ < 16 mcg/g
H0 : µ = 16 mcg/g
HA : µ ≠ 16 mcg/g
H0 : µ > 16 mcg/g
HA : µ = 16 mcg/g
To perform the hypothesis test in Question 4, the researcher selected a simple random sample of the medicine and measured the lead concentration in each unit. (The sample data are in the StatCrunch data set for this problem.) Use the data set and the results from Question 4 to calculate the p-value for the hypothesis test. Assume lead concentrations are approximately normally distributed. Round your answer to three decimal places; add trailing zeros as needed.
The p-value = [LeadPValue].
DATA
Lead (mcg/g)
18
6.5
22
19.5
11.5
16.5
5.5
3
13.5
4
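A hedged sketch of the p-value calculation using only the Python standard library. It assumes a left-tailed one-sample t test (HA: µ < 16) on the ten values listed above. Because the standard library has no t distribution, the helper `t_cdf` is a hypothetical function written for this example that integrates the t density numerically with Simpson's rule.

```python
import math
import statistics

lead = [18, 6.5, 22, 19.5, 11.5, 16.5, 5.5, 3, 13.5, 4]
mu0 = 16.0

n = len(lead)
mean = statistics.mean(lead)
sd = statistics.stdev(lead)  # sample standard deviation (n - 1 denominator)
t_stat = (mean - mu0) / (sd / math.sqrt(n))
df = n - 1


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry around 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Left-tailed test: the p-value is the area below the observed t
p_value = t_cdf(t_stat, df)

print(f"mean = {mean}, s = {sd:.4f}, t = {t_stat:.4f}, df = {df}")
print(f"p-value = {p_value:.3f}")
```

Statistical software such as StatCrunch reports the same tail area directly; the numerical integration here only stands in for that lookup.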
In: Statistics and Probability
A programming team is in the process of testing a new software module. As part of the effort, they need to estimate the success rate of the module when used with a particular operating system. To do this, they plan to run the module on a randomly selected set of computers, record how many individual runs execute properly, and use that result to calculate the sample success rate (p-hat, the number of successes divided by the total number of tests). Assuming a confidence level of 99%, calculate n, the number of computers they need to use for the test in order to ensure a 0.03 margin of error in the success rate. Calculate n for the following two cases: (1) no assumption is made about the value of the sample success rate, and (2) in a recent test of a similar software module, that module ran successfully in 94% of the tests. Round your answers upward to the next higher integer.
(1) If no assumptions are made about the sample success rate, the sample size required to ensure a margin of error of 0.03 is n = ______.
(2) If it is assumed that the new module will run successfully in roughly 94% of the tests, the sample size required to ensure a margin of error of 0.03 is n = ______.
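Both cases follow from the standard sample-size formula for a proportion, n = z² p(1 − p) / E², rounded up. The sketch below assumes the conservative choice p = 0.5 when nothing is known about the success rate; the function name is illustrative.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence, p=0.5):
    """Smallest n such that z * sqrt(p * (1 - p) / n) <= margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


# (1) No prior estimate: p = 0.5 maximizes p * (1 - p)
n_no_assumption = required_sample_size(margin=0.03, confidence=0.99)

# (2) Prior estimate of 94% from the similar module
n_with_prior = required_sample_size(margin=0.03, confidence=0.99, p=0.94)

print(n_no_assumption, n_with_prior)
```

A prior estimate far from 0.5 shrinks p(1 − p), which is why the second case needs far fewer computers.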
In: Statistics and Probability