In a 2008 survey, people were asked their opinions on astrology: whether it was very scientific, somewhat scientific, or not at all scientific. Of the 1436 who responded, 71 said astrology was very scientific.
a. Find the proportion of people in the survey who believe astrology is very scientific. Answer ______ (Round to four decimal places as needed)
b. Find a 95% confidence interval for the population proportion with this belief. Answer (____, and _____)
c. Suppose a TV news anchor said that 5% of people in the general population think astrology is very scientific. Would you say that is plausible? Which one is correct?
Choose the correct answer below (A, B, C, or D). Answer ______
A. This is not plausible because 5% is outside the interval.
B. This is not plausible because 5% is inside the interval.
C. This is plausible because 5% is inside the interval.
D. This is plausible because 5% is outside the interval.
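A minimal sketch of how parts a to c could be checked numerically, assuming the usual large-sample (Wald) interval for a proportion; the question does not say which interval method is expected, so that choice and the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

n = 1436   # respondents
x = 71     # respondents who said "very scientific"

p_hat = x / n                              # part a: sample proportion
z = NormalDist().inv_cdf(0.975)            # two-sided 95% critical value
se = sqrt(p_hat * (1 - p_hat) / n)         # Wald standard error (assumed method)
lower, upper = p_hat - z * se, p_hat + z * se   # part b

print(f"p_hat = {p_hat:.4f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
print("0.05 inside the interval:", lower <= 0.05 <= upper)   # part c
```

A claimed population value is usually called plausible when it falls inside the interval, which is the comparison the last line prints.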
In: Statistics and Probability
A statistics professor is at a supermarket waiting in line to buy some groceries. While waiting for the line to move, he listens to several people complaining about the long delays. From the different conversations going on, he quickly gathers some information and estimates from the large sample that the mean waiting time is about 12 minutes. He then estimates the population standard deviation to be 1.5 minutes.
1. (a) Explain in detail using full sentences how he should go about finding a 90% confidence interval for the time he has to wait in line to be served. Enumerate each step.
2. Let’s assume that from the question in part (1) it is determined that its margin of error is approximately 3.5 minutes. (a) What does this mean exactly for a 90% confidence interval?
3. Let’s assume now that the professor finished his computation and determined that the 90% confidence interval will be within (8.7 – 15.3) minutes. (a) If he now wants a 95% confidence interval, will the range be bigger or smaller than (8.7 – 15.3) minutes? Explain.
4. It is normal to think that more is better. In the question from part (3) (a) Is the 95% confidence interval better than the 90% interval? Explain. (b) If that is the case why do we not use a 100% confidence interval? Explain.
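For part 3, a rough sketch of how the interval changes when only the confidence level changes. It assumes a z-based interval and takes the quoted 90% interval (8.7, 15.3) at face value; the standard error is left unchanged and only the multiplier is swapped.

```python
from statistics import NormalDist

lower90, upper90 = 8.7, 15.3              # 90% interval quoted in part 3
center = (lower90 + upper90) / 2
half90 = (upper90 - lower90) / 2

z90 = NormalDist().inv_cdf(0.95)          # 5% in each tail
z95 = NormalDist().inv_cdf(0.975)         # 2.5% in each tail

half95 = half90 * z95 / z90               # same standard error, larger multiplier
print(f"90% half-width: {half90:.2f}")
print(f"95% half-width: {half95:.2f}")
print(f"95% interval: ({center - half95:.2f}, {center + half95:.2f})")
```

The multiplier grows without bound as the confidence level approaches 100%, which is why a 100% interval for a normal model would be infinitely wide.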
In: Statistics and Probability
Provide an appropriate answer for each of the mean confidence interval problems.
1) Construct a 94% confidence interval for the population mean, μ. Assume the population has a normal distribution. A sample of 40 part-time workers had mean annual earnings of $3120 with a standard deviation of $677. Round to the nearest dollar. (Show work)
2) In order to set rates, an insurance company is trying to estimate the number of sick days that full-time workers at a local bank take per year. Based on earlier studies, it is known that the standard deviation is 12.3 days per year. How large a sample must be selected if the company wants to be 95% confident that its estimate is within 3 days of the true mean?
Provide an appropriate answer for each of the proportion confidence interval problems.
3) A survey of 500 non-fatal accidents showed that 122 involved uninsured drivers. Construct a 96% confidence interval for the proportion of non-fatal accidents that involved uninsured drivers. (Show work)
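A short sketch for problems 2 and 3, assuming the standard z-based formulas: n = (z·σ/E)² rounded up for the sample size, and the Wald interval for the proportion. The helper name `z_critical` is illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided standard normal critical value for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Problem 2: sample size for estimating a mean when sigma is known
sigma, margin = 12.3, 3.0
n_required = ceil((z_critical(0.95) * sigma / margin) ** 2)   # always round up

# Problem 3: 96% Wald interval for a proportion
x, n = 122, 500
p_hat = x / n
half_width = z_critical(0.96) * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - half_width, p_hat + half_width)

print("required sample size:", n_required)
print(f"96% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Problem 1 is left out because a small-sample interval with an estimated standard deviation is normally based on the t-distribution, which the Python standard library does not provide.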
In: Statistics and Probability
Cotinine level (ng/ml) was measured in the meconium of newborns of mothers who were active smokers, passive smokers, or nonsmokers. The mothers were consecutive women arriving for delivery at one hospital. The alkaloid cotinine is the main metabolite of nicotine. With a half-life of around 20 hours and detectable for several days after exposure, it is a biomarker for exposure to tobacco smoke.
Cotinine level (ng/ml)
Active Smokers (490, 418, 405, 328, 700, 292, 295, 272, 240, 232)
Passive Smokers (254, 219, 287, 257, 271, 282, 148, 273, 350, 293)
Nonsmokers (158, 163, 153, 207, 211, 159, 199, 187, 200, 213)
1. Create your own Reditol file to store, analyze, and graph this data set as called for in the questions below. Save this R program file as smoking Elba.R if your name is Elba; otherwise use your own name. I should be able to execute your code to produce the answers and the graph you submitted for this problem.
Descriptive statistics for Cotinine level for these 3 smoking groups. Round-off to appropriate levels.
Smoking group | N | Mean | SD | 95% CI for mean (by t-distribution) |
---|---|---|---|---|
Active Smokers | 10 | ______ | ______ | ______ |
Passive Smokers | 10 | ______ | ______ | ______ |
Nonsmokers | 10 | ______ | ______ | ______ |
2. Perform one-way ANOVA. Report the p-value to full resolution; round off the other statistics to appropriate levels.
Source | df | SS | MS | F | P |
---|---|---|---|---|---|
Age group | 2 | ______ | ______ | ______ | ______ |
Unexplained | 27 | ______ | ______ | ______ | ______ |
3. Perform Kruskal-Wallis test.
P= ______
4. R-square : Among groups= _______%
5. Cohen's D = ___________ (standardized effect size)
6. Using plotmeans() in the gplots package, construct a publication-quality graph of the means with their 95% confidence intervals calculated using the t-distribution.
7. In several sentences suitable for a scientific journal, express the results of the one-way ANOVA and associated analyses, including insight from effect sizes, CIs, multiple comparisons, and graphical visualization.
In: Statistics and Probability
In a section of an English 201 class, the professor decides to be “generous” with the students and will grade the next exam in a unique way. Grades will be assigned according to the following rule: The top 10% receive A’s, the next 20% receive B’s, the middle 40% receive C’s, the next 20% receive D’s, and the bottom 10% receive F’s. Some may refer to this type of grading as “curving” which gave rise to the phrase, “Professor, do you curve the grades?”
1. (a) Where did the term “curving” come from? (b) Which curve is this referring to? (c) What do you think is the purpose of grading exams on a “curve”.
2. Students usually like it when professors grade on a curve, even though they likely do not understand what that involves. (a) Who do you think benefits from grading exams on a curve? Do you think all students will like this method? Explain.
3. Do you think this method of grading is fair to all students? Under what circumstances would some students NOT like this method of grading? Explain.
4. Under pressure from the students to grade an exam on a curve, a Math Professor proposes the following curving method: The top 5% receive A’s, the next 10% receive B’s, the middle 30% receive C’s, the next 35% receive D’s, and the bottom 20% receive F’s. (a) What is the difference between this curving method and the one specified at the outset? (b) Do you think the students will accept this “curving” approach? Explain.
In: Statistics and Probability
15 17 15 18 13 13 15 18 17 11 (i) Use a calculator with sample mean and standard deviation keys to find x̄ and s. (Round your answers to two decimal places.) x̄ = ______ s = ______
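In place of calculator keys, the same two statistics can be sketched with the Python standard library. `stdev` uses the sample (n − 1) denominator, which is what the symbol s normally denotes.

```python
from statistics import mean, stdev

data = [15, 17, 15, 18, 13, 13, 15, 18, 17, 11]

x_bar = mean(data)    # sample mean
s = stdev(data)       # sample standard deviation (n - 1 in the denominator)

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}")
```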
In: Statistics and Probability
5. Let n = 60, which is not a product of distinct prime numbers. Let B_n be the set of all positive divisors of n. Define addition and multiplication to be lcm and gcd, respectively. Show that B_n cannot form a Boolean algebra under those two operations.
Hint: Find the 0 and 1 elements first. Then find an element of B_n whose complement cannot be found to satisfy both equalities, no matter how we define the complement operator.
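The hint can be explored by brute force. The sketch below assumes the usual identification of 0 with the lcm identity (1) and 1 with the gcd identity (n), then lists every divisor that has no candidate complement; it is a search aid, not a proof.

```python
from math import gcd

def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)

n = 60
divisors = [d for d in range(1, n + 1) if n % d == 0]

# lcm(a, 1) == a and gcd(a, n) == a for every divisor a of n
zero, one = 1, n

# A complement b of a must satisfy lcm(a, b) == one and gcd(a, b) == zero.
no_complement = [
    a for a in divisors
    if not any(lcm(a, b) == one and gcd(a, b) == zero for b in divisors)
]
print("divisors:", divisors)
print("no complement:", no_complement)
```

Any single element in the second list is enough for the argument, since a Boolean algebra requires every element to have a complement.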
In: Statistics and Probability
A retail company has started a new advertising campaign in order to increase sales. In the past, the mean spending in both the 18–35 and 35+ age groups was at most $70.00.
a. Formulate a hypothesis test to determine if the mean spending has statistically increased to more than $70.00.
b. After the new advertising campaign was launched, a marketing study found that the sample mean spending for 400 respondents in the 18–35 age group was $73.65, with a sample standard deviation of $56.60. Is there sufficient evidence to conclude that the advertising strategy significantly increased sales in this age group with significance level of 5%?
c. For 600 respondents in the 35+ age group, the sample mean and sample standard deviation were $73.42 and $45.44, respectively. Is there sufficient evidence to conclude that the advertising strategy significantly increased sales in this age group with significance level of 5%?
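A sketch of the one-sided test in parts b and c, with H0: μ ≤ 70 against Ha: μ > 70. Because both samples are large, the standard normal distribution is used here as an approximation to the t-distribution; that substitution and the function name are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_test(sample_mean, sample_sd, n, mu0=70.0):
    """Return the z statistic and one-sided p-value for Ha: mu > mu0."""
    z = (sample_mean - mu0) / (sample_sd / sqrt(n))
    return z, 1 - NormalDist().cdf(z)

z_young, p_young = upper_tail_test(73.65, 56.60, 400)   # 18-35 age group
z_older, p_older = upper_tail_test(73.42, 45.44, 600)   # 35+ age group

for label, z, p in [("18-35", z_young, p_young), ("35+", z_older, p_older)]:
    print(f"{label}: z = {z:.2f}, p = {p:.4f}, reject H0 at 5%: {p < 0.05}")
```

The two groups have similar sample means, so any difference in the conclusions comes from the standard errors, which depend on the sample size and spread.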
In: Statistics and Probability
In an outpatient clinic, a nurse practitioner observes a high prevalence of obesity in young female patients. The nurse practitioner further observes that there appears to be an association between the economic status of the patient and their obesity. The nurse practitioner reviews the related literature but does not find a study that directly relates obesity and economic status. The nurse practitioner decides to conduct a study to determine if a relationship exists between obesity in younger females, ages 15 to 18 years, and their economic status. The nurse practitioner elicits help from four outpatient clinics and receives data on the Body Mass Index (BMI) for all female patients from age 15 to 18 years. The nurse practitioner also obtains information on whether each patient's family is above or below the federal poverty line. The nurse practitioner uses a bivariate correlation between BMI and the dichotomous variable of economic status. The results did not reflect a statistically significant relationship.
In: Statistics and Probability
The overhead reach distances of adult females are normally distributed with a mean of 205 cm and a standard deviation of 8 cm.
a. Find the probability that an individual distance is greater than 217.50 cm.
b. Find the probability that the mean for 20 randomly selected distances is greater than 203.50 cm.
c. Why can the normal distribution be used in part (b), even though the sample size does not exceed 30?
a. The probability is 0.0591. (Round to four decimal places as needed.)
b. The probability is ..........
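A small sketch of both reach-distance probabilities using the standard library's `NormalDist`. Part (b) relies on the sample mean of normal data being normal with standard deviation σ/√n, which is also the point of part (c).

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 205.0, 8.0

# (a) a single distance greater than 217.50 cm
p_a = 1 - NormalDist(mu, sigma).cdf(217.50)

# (b) the mean of n = 20 distances greater than 203.50 cm
n = 20
p_b = 1 - NormalDist(mu, sigma / sqrt(n)).cdf(203.50)

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
```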
An engineer is going to redesign an ejection seat for an airplane. The seat was designed for pilots weighing between 130 lb and 171 lb. The new population of pilots has normally distributed weights with a mean of 135 lb and a standard deviation of 30.1 lb.
a. If a pilot is randomly selected, find the probability that his weight is between 130 lb and 171 lb. The probability is approximately....... (Round to four decimal places as needed.)
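The ejection-seat probability is a difference of two normal CDF values. A minimal sketch, with `weights` and `p_fit` as illustrative names:

```python
from statistics import NormalDist

weights = NormalDist(mu=135.0, sigma=30.1)

# P(130 < X < 171) = F(171) - F(130)
p_fit = weights.cdf(171.0) - weights.cdf(130.0)
print(f"{p_fit:.4f}")
```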
In: Statistics and Probability
A deficiency of the trace element selenium in the diet can negatively impact growth, immunity, muscle and neuromuscular function, and fertility. The introduction of selenium supplements to dairy cows is justified when pastures have low selenium levels. Authors of a research paper supplied the following data on milk selenium concentration (mg/L) for a sample of cows given a selenium supplement (the treatment group) and a control sample given no supplement, both initially and after a 9-day period.
Treatment (initial) | Control (initial) |
---|---|
11.3 | 9.1 |
9.6 | 8.7 |
10.1 | 9.7 |
8.5 | 10.8 |
10.4 | 10.9 |
10.6 | 10.6 |
11.9 | 10.1 |
9.9 | 12.3 |
10.8 | 8.8 |
10.4 | 10.4 |
10.2 | 10.9 |
11.3 | 10.4 |
9.2 | 11.6 |
10.6 | 10.9 |
10.9 | |
8.2 | |

Treatment (after 9 days) | Control (after 9 days) |
---|---|
138.3 | 9.4 |
104 | 8.9 |
96.4 | 8.9 |
89 | 10.1 |
88 | 9.6 |
103.8 | 8.6 |
147.3 | 10.3 |
97.1 | 12.3 |
172.6 | 9.4 |
146.3 | 9.5 |
99 | 8.3 |
122.3 | 8.7 |
103 | 12.5 |
117.8 | 9.1 |
121.5 | |
93 | |
(a) Use the given data for the treatment group to determine if there is sufficient evidence to conclude that the mean selenium concentration is greater after 9 days of the selenium supplement. (Use α = 0.05. Use a statistical computer package to calculate the P-value. Use μd = μinitial − μ9-day. Round your test statistic to two decimal places, your df down to the nearest whole number, and your P-value to three decimal places.)
t=
df=
P-value=
(b) Are the data for the cows in the control group (no selenium supplement) consistent with the hypothesis of no significant change in mean selenium concentration over the 9-day period? (Use α = 0.05. Use a statistical computer package to calculate the P-value. Use μd = μinitial − μ9-day. Round your test statistic to two decimal places, your df down to the nearest whole number, and your P-value to three decimal places.)
t=
df=
P-value=
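A sketch of the paired-difference computation for parts (a) and (b). It assumes the first table holds the initial measurements and the second the 9-day measurements, and that row i in both tables is the same cow. Only the t statistic and degrees of freedom are computed; the P-value needs a t-distribution CDF from a statistical package, as the question says.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(initial, after):
    """t statistic and df for paired differences d = initial - after."""
    d = [a - b for a, b in zip(initial, after)]
    n = len(d)
    return mean(d) / (stdev(d) / sqrt(n)), n - 1

treat_initial = [11.3, 9.6, 10.1, 8.5, 10.4, 10.6, 11.9, 9.9,
                 10.8, 10.4, 10.2, 11.3, 9.2, 10.6, 10.9, 8.2]
treat_after = [138.3, 104, 96.4, 89, 88, 103.8, 147.3, 97.1,
               172.6, 146.3, 99, 122.3, 103, 117.8, 121.5, 93]
ctrl_initial = [9.1, 8.7, 9.7, 10.8, 10.9, 10.6, 10.1,
                12.3, 8.8, 10.4, 10.9, 10.4, 11.6, 10.9]
ctrl_after = [9.4, 8.9, 8.9, 10.1, 9.6, 8.6, 10.3,
              12.3, 9.4, 9.5, 8.3, 8.7, 12.5, 9.1]

t_treat, df_treat = paired_t(treat_initial, treat_after)
t_ctrl, df_ctrl = paired_t(ctrl_initial, ctrl_after)

print(f"treatment: t = {t_treat:.2f}, df = {df_treat}")
print(f"control:   t = {t_ctrl:.2f}, df = {df_ctrl}")
```

With μd defined as initial minus 9-day, an increase after supplementation shows up as a negative t, so part (a) is a lower-tailed test and part (b) is two-sided.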
In: Statistics and Probability
Please use R
"Team","WINS","HR","BA","ERA"
"Anaheim Angels",99,152,.282,3.69
"Baltimore Orioles",67,165,.246,4.46
"Boston Red Sox",93,177,.277,3.75
"Chicago White Sox",81,217,.268,4.53
"Cleveland Indians",74,192,.249,4.91
"Detroit Tigers",55,124,.248,4.93
"Kansas City Royals",62,140,.256,5.21
"Minnesota Twins",94,167,.272,4.12
"New York Yankees",103,223,.275,3.87
"Oakland Athletics",103,205,.261,3.68
"Seattle Mariners",93,152,.275,4.07
"Tampa Bay Devil Rays",55,133,.253,5.29
"Texas Rangers",72,230,.269,5.15
"Toronto Blue Jays",78,187,.261,4.8
"Arizona Diamondbacks",98,165,.267,3.92
"Atlanta Braves",101,164,.26,3.13
"Chicago Cubs",67,200,.246,4.29
"Cincinnati Reds",78,169,.253,4.27
"Colorado Rockies",73,152,.274,5.2
"Florida Marlins",79,146,.261,4.36
"Houston Astros",84,167,.262,4
"Los Angeles Dodgers",92,155,.264,3.69
"Milwaukee Brewers",56,139,.253,4.73
"Montreal Expos",83,162,.261,3.97
"New York Mets",75,160,.256,3.89
"Philadelphia Phillies",80,165,.259,4.17
"Pittsburgh Pirates",72,142,.244,4.23
"St. Louis Cardinales",97,175,.268,3.7
"San Diego Padres",66,136,.253,4.62
"San Francisco Giants",95,198,.267,3.54
The data above cover the following variables for the 30 major league baseball teams during the 2002 season:
• WINS: number of games won
• HR: number of home runs hit
• BA: average batting average
• ERA: earned run average
(a) Using WINS as the dependent variable, run the regression relating the three predictor variables to WINS. Report the fitted regression line.
(b) Construct the ANOVA table of the above model.
(c) Plot the residuals e_i against the fitted values ŷ_i. What departures from the regression model assumptions can be studied from this plot? What are your findings? (Note: If you are not sure about the validity of any of the assumptions, perform a formal test to verify your answer.)
(d) Prepare a normal probability plot (QQ plot) of the residuals. Which assumption can be tested from this plot and what do you conclude? (Note: You can also use the formal test to reinforce your conclusion).
(e) If there is no problem with any of the assumptions, you can safely continue on making inference. Test for the significance of the regression using a 0.05 significance level.
(f) What percentage of the variability in y is explained by the regression?
(g) Using the individual t-tests, comment on the significance of each predictor variable, using a 0.05 significance level.
Hint:
data = read.table('hmw6_prob2.txt', header = TRUE, sep = ',')
y = data$WINS
x1 = data$HR
x2 = data$BA
x3 = data$ERA
In: Statistics and Probability
"FATALS","CUTTING"
270,15692
183,16198
319,17235
103,18463
149,18959
124,19103
62,19618
298,20436
330,21229
486,18660
302,17551
373,17466
187,17388
347,15261
168,14731
234,14237
68,13216
162,12017
27,11845
40,11905
26,11881
41,11974
116,11892
84,11810
43,12076
292,12342
89,12608
148,13049
166,11656
32,13305
72,13390
27,13625
154,13865
44,14445
3,14424
3,14315
153,13761
11,12471
9,10960
17,9218
2,9054
5,9218
63,8817
41,7744
10,6907
3,6440
26,6021
52,5561
31,5309
3,5320
19,4784
10,4311
12,3663
88,3060
0,2779
41,2623
2,2058
5,1890
2,1535
0,1515
0,1595
23,1803
4,1495
0,1432
The above contains data on the following two variables
• FATALS: the annual number of fatalities from gas and dust explosions in coal mines for years 1915 to 1978.
• CUTTING: the number of cutting machines in use
(a) Fit the regression model using FATALS as the dependent variable and CUTTING as the independent variable.
(b) Using appropriate residual plots and formal tests, investigate the violation of any assumptions. Do any assumptions of the linear regression model appear to be violated? If so, which one (or ones)?
(Hint: Plot of residuals versus fitted values can be used for linearity, zero mean, and constant variance. Normal probability plot of the residuals can be used for normality. We also have formal tests for the constant variance and normality assumptions that you can do in R).
Hint:
data = read.table('hmw6_prob3.txt', header = TRUE, sep = ',')
y = data$FATALS
x = data$CUTTING
In: Statistics and Probability
The data worksheet entitled "FUELCON4" contains the following variables for all 50 states plus the District of Columbia.
FUELCON (y): | Per capita fuel consumption in gallons |
DRIVERS (x1): | The ratio of licensed drivers to private and commercial motor vehicles registered |
HWYMILES (x2): | The number of miles of federally funded highways |
GASTAX (x3) : | The tax per gallon of gasoline in cents |
INCOME (x4): | The average household income in dollars |
Run the regression analysis with FUELCON as the dependent variable and the other four variables as independent variables and obtain the appropriate model diagnostic statistics: Use the Shapiro-Wilk test statistic to test the assumption of the normality of the model residuals.
(a) What is being tested here? (Choose one)
• The assumption of linearity.
• The assumption of normally-distributed disturbances.
• Whether there is a linear relationship between x and y.
• The assumption of constant variance.
• The assumption of independence.
• Whether all of the x variables are important in predicting y.
(b) Which hypotheses are being tested? (Choose one)
• H0: β1 = 1.0; Ha: β1 ≠ 1.0
• H0: All of the x variables in the model are not important; Ha: At least one of the x variables is important
• H0: The model variance is constant; Ha: The model variance is not constant
• H0: Disturbances are normal; Ha: Disturbances are non-normal
• H0: β1 = 0; Ha: β1 ≠ 0
(c) State the decision rule.
• Reject H0 if p < 0.10. Do not reject H0 if p ≥ 0.10.
• Reject H0 if p > 0.10. Do not reject H0 if p ≤ 0.10.
• Reject H0 if p < 0.05. Do not reject H0 if p ≥ 0.05.
• Reject H0 if p > 0.05. Do not reject H0 if p ≤ 0.05.
(d) What is the name of the test statistic? (Choose one)
• Anderson-Darling's A²
• Shapiro-Wilk's W
• Test of Constant Variance
• Kolmogorov-Smirnov's D
• The Partial F Test
• Test of Independence
(e) State the appropriate test statistic name, test statistic value, and the associated p-value (enter the test statistic value to three decimal places and the p-value to four decimal places).
Test statistic (select one: z, A, D, F, t, W) = ______, p-value (select one: <, ≥, ≤, >) = ______
(f) What conclusion can be drawn from the test result?
• Do not reject H0. The assumption of normally-distributed disturbances has been met.
• Do not reject H0. The assumption of independence has been met.
• Reject H0. There is a linear relationship between x and y.
• Reject H0. The assumption of independence has not been met.
• Do not reject H0. There is not a linear relationship between x and y.
• Reject H0. The assumption of constant variance has not been met.
• Do not reject H0. The assumption of constant variance has been met.
• Reject H0. The assumption of normally-distributed disturbances has not been met.
FUELCON | DRIVERS | HWYMILES | GASTAX | INCOME |
---|---|---|---|---|
547.92 | 0.85 | 11,849 | 18 | 24426 |
440.38 | 0.81 | 4,532 | 8 | 30997 |
456.9 | 0.9 | 9,455 | 18 | 25479 |
530.08 | 1.07 | 7,949 | 21.7 | 22912 |
426.21 | 0.76 | 32,478 | 18 | 32678 |
474.78 | 0.71 | 11,015 | 22 | 32957 |
432.44 | 0.92 | 3,820 | 25 | 41930 |
492.97 | 0.88 | 1,260 | 23 | 32121 |
461.55 | 0.91 | 17,272 | 13.6 | 28493 |
564.82 | 0.81 | 16,950 | 7.5 | 28438 |
336.97 | 0.92 | 1,089 | 16 | 28554 |
484.83 | 0.69 | 6,466 | 25 | 24257 |
406.99 | 0.8 | 19,700 | 19 | 32755 |
524.01 | 0.74 | 10,261 | 15 | 27532 |
532.39 | 0.61 | 10,037 | 20 | 27283 |
483.31 | 0.81 | 10,494 | 21 | 28507 |
532.77 | 0.77 | 10,302 | 16.4 | 25057 |
513.8 | 0.77 | 8,954 | 20 | 24084 |
472.68 | 0.94 | 3,474 | 22 | 36385 |
463.46 | 0.89 | 6,387 | 23.5 | 34950 |
436.57 | 0.9 | 7,264 | 21 | 38845 |
504.95 | 0.84 | 16,942 | 19 | 29538 |
532.52 | 0.66 | 12,509 | 20 | 32791 |
541.06 | 0.97 | 8,747 | 18.4 | 21643 |
549.16 | 0.92 | 13,580 | 17 | 28029 |
549.35 | 0.68 | 10,456 | 27 | 23532 |
503.1 | 0.79 | 8,067 | 24.5 | 28564 |
448.81 | 1.13 | 5,976 | 24.75 | 29860 |
541.67 | 0.87 | 2,405 | 19.5 | 33928 |
465.52 | 0.89 | 9,150 | 10.5 | 38153 |
504.77 | 0.89 | 9,654 | 18.5 | 23162 |
296.44 | 1.1 | 18,998 | 22 | 35884 |
510.05 | 0.97 | 13,632 | 24.1 | 27418 |
580.32 | 0.66 | 7,415 | 21 | 25538 |
458.31 | 0.74 | 16,807 | 22 | 28619 |
523.89 | 0.68 | 11,123 | 17 | 24787 |
439.09 | 0.85 | 10,138 | 24 | 28000 |
417.36 | 0.87 | 18,448 | 26 | 30617 |
382.82 | 0.88 | 1,037 | 29 | 29984 |
557.53 | 0.92 | 9,272 | 16 | 24594 |
577.84 | 0.7 | 7,753 | 22 | 26301 |
506.3 | 0.83 | 12,036 | 20 | 26758 |
502.17 | 0.93 | 49,678 | 20 | 28486 |
430.53 | 0.87 | 7,310 | 24.5 | 24202 |
555.78 | 0.99 | 2,138 | 20 | 27992 |
529.52 | 0.81 | 14,453 | 17.5 | 32295 |
446.63 | 0.83 | 10,802 | 23 | 31582 |
466.31 | 0.94 | 5,390 | 25.65 | 22725 |
466.08 | 0.83 | 13,088 | 27.3 | 28911 |
715.55 | 0.67 | 7,841 | 14 | 28807 |
289.99 | 1.38 | 391 | 20 | 40498 |
In: Statistics and Probability
Consider the probability that no less than 92 out of 157 registered voters will vote in the presidential election. Assume the probability that a given registered voter will vote in the presidential election is 64%.
Approximate the probability using the normal distribution. Round your answer to four decimal places.
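A sketch of the normal approximation to the binomial for this question. It applies a continuity correction (P(X ≥ 92) ≈ P(Y > 91.5)), which is the common textbook convention but is an assumption here, since the question does not say whether a correction is expected.

```python
from math import sqrt
from statistics import NormalDist

n, p, k = 157, 0.64, 92

mu = n * p                         # binomial mean
sigma = sqrt(n * p * (1 - p))      # binomial standard deviation

# "No less than 92" means X >= 92; the continuity correction uses 91.5.
prob = 1 - NormalDist(mu, sigma).cdf(k - 0.5)
print(f"{prob:.4f}")
```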
In: Statistics and Probability