In: Statistics and Probability
The alpha value is 0.05 if it is not specified in the problem.
**Everything should be done in R code.**
4. Back to the iris dataset one last time! We wish to estimate the probability of a flower’s species based on the available measurements.
a. Build a Multinomial model to predict Species based on Sepal.Length. Use it to estimate the probability of each species for a flower with a sepal 6.3 cm long.
b. Build a Multinomial model to predict Species based on Petal.Length. Use it to estimate the probability of each species for a flower with a petal 5.1 cm long.
c. Compare both the residual deviances and the AICs for the two previous models. Which appears to be the “better” model based on these metrics?
d. Build a Multinomial model to predict Species based on all four of the measurements (no interactions). Use the model to estimate the probability of each species for a flower with a Sepal.Length of 6.3 cm, a Sepal.Width of 2.8 cm, a Petal.Length of 5.1 cm, and a Petal.Width of 1.5 cm.
e. The flower described in part d is actually in the dataset. What species was it?
In: Statistics and Probability
If there are 1296 different ways that 4 (6 sided) dice can be rolled, how many of the 1296 possibilities have less than 2 fives rolled?
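A count like this can be checked by brute-force enumeration. The sketch below, in Python, lists all 6^4 = 1296 ordered rolls and counts those showing fewer than two fives; the variable names are illustrative only.

```python
from itertools import product

# Enumerate every ordered outcome of four six-sided dice.
rolls = list(product(range(1, 7), repeat=4))

# Keep the outcomes that show zero fives or exactly one five.
fewer_than_two_fives = sum(1 for roll in rolls if roll.count(5) < 2)

print(len(rolls), fewer_than_two_fives)
```

The same count follows from 5^4 + 4 * 5^3: the rolls with no five plus the rolls with exactly one five.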
In: Statistics and Probability
Hank would like to know how many customers are entering his propane store within a given timeframe. Prior data indicate that on average 8 customers arrive in a given hour.
a. Create the appropriate probability distribution below for 0-12 arrivals.
b. What is the probability that 8 or fewer customers will arrive in the next hour?
c. What is the probability that exactly 10 customers arrive in the next hour?
d. What is the probability that more than 12 customers will arrive in the next hour?
e. How likely is it that Hank has a "large crowd" entering his store in the next hour?
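Arrivals at a constant average rate are usually modeled with a Poisson distribution, so parts a through d can be sketched as follows, assuming a Poisson model with a mean of 8 arrivals per hour. The helper name `poisson_pmf` is illustrative. Part e depends on what counts as a "large crowd", which the problem does not define, so it is left out.

```python
from math import exp, factorial

RATE = 8  # average arrivals per hour, from the problem statement


def poisson_pmf(k, lam=RATE):
    """P(X = k) for a Poisson random variable with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)


# Part a: probability distribution for 0 to 12 arrivals.
pmf_table = {k: poisson_pmf(k) for k in range(13)}

# Part b: P(X <= 8).
p_at_most_8 = sum(pmf_table[k] for k in range(9))
# Part c: P(X = 10).
p_exactly_10 = pmf_table[10]
# Part d: P(X > 12) is the complement of P(X <= 12).
p_more_than_12 = 1 - sum(pmf_table.values())

for k, p in pmf_table.items():
    print(f"{k:2d}  {p:.4f}")
print(f"P(X <= 8) = {p_at_most_8:.4f}")
print(f"P(X = 10) = {p_exactly_10:.4f}")
print(f"P(X > 12) = {p_more_than_12:.4f}")
```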
In: Statistics and Probability
The table below gives the number of hours five randomly selected students spent studying and their corresponding midterm exam grades. Using this data, consider the equation of the regression line, ŷ = b0 + b1x, for predicting the midterm exam grade that a student will earn based on the number of hours spent studying. Keep in mind, the correlation coefficient may or may not be statistically significant for the data given. Remember, in practice, it would not be appropriate to use the regression line to make a prediction if the correlation coefficient is not statistically significant.
Hours Studying | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Midterm Grades | 70 | 77 | 84 | 88 | 95 |
1. Find the estimated slope. Round your answer to three decimal places.
2. Find the value of the coefficient of determination. Round your answer to three decimal places.
3. Find the estimated y-intercept. Round your answer to three decimal places.
4. Determine the value of the dependent variable ŷ at x = 0.
5. According to the equation of the regression line, if the independent variable is increased by one unit, what is the change in the dependent variable ŷ?
6. Not all points predicted by the linear model fall on the same line. True or False?
7. Substitute the values found in steps 1 and 3 into the equation of the regression line to find the linear model. According to this model, if the value of the independent variable is increased by one unit, find the change in the dependent variable ŷ.
In: Statistics and Probability
What two z scores cut off the middle 95% of the normal distribution?
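The middle 95% leaves 2.5% in each tail, so the cutoffs are the 2.5th and 97.5th percentiles of the standard normal distribution. A minimal sketch using Python's standard library:

```python
from statistics import NormalDist

middle = 0.95
tail = (1 - middle) / 2  # area left in each tail

# Inverse CDF of the standard normal at the upper cutoff.
z_upper = NormalDist().inv_cdf(1 - tail)
z_lower = -z_upper  # the distribution is symmetric about zero

print(f"{z_lower:.2f}, {z_upper:.2f}")
```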
In: Statistics and Probability
In order to analyze water samples using a spectrophotometer or plate reader, it is necessary to turn the molecules of nitrate into a dye molecule that can be quantified. The first step in turning nitrate (NO3-) into a dye molecule is reducing it to a molecule of nitrite (NO2-). This is done by reacting the NO3- with cadmium.
After the reduction reaction, the NO2- is reacted with two additional reagents. The first reagent, Reagent A, is a solution of sulfanilamide and hydrochloric acid. The second reagent, Reagent B, is a solution of N-(1-naphthyl)-ethylenediamine, called NNED for short. The compounds are mixed with the water sample and produce a purple color. The intensity of the purple color is directly related to the concentration of nitrite in the water sample. We can measure how purple the water turns as absorbance on a spectrophotometer and then convert the absorbance to concentration of nitrate.
To make Reagent A, we will need to make a solution of 10.0 g of sulfanilamide in 1 L of 2.4 molar hydrochloric acid (HCl).
The stock solution of HCl is 6 molar HCl. How many milliliters (mL) of 12 M HCl would you add to produce 0.15 liters (L) of 2.4 molar HCl? mL HCl
After creating 0.15 L of 2.4 molar HCl solution, how many grams of sulfanilamide will be added? g sulfanilamide
After reacting the nitrate with cadmium to produce nitrite, the nitrite is then reacted with sulfanilamide and N-(1-naphthyl)-ethylenediamine to produce a purple dye molecule that can be quantified on a spectrophotometer.
The N-(1-naphthyl)-ethylenediamine reagent, called NNED for convenience, is made by mixing 1 gram of NNED in 1 liter of water. However, we don't always want to make an entire liter of solution because the NNED solution only lasts about 1 month before going bad and turning brown.
How many milligrams of NNED will need to be added to make 0.125 liters of solution?
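The reagent calculations above are a dilution (C1 * V1 = C2 * V2) plus two scale-downs of a per-liter recipe. The sketch below, in Python, works them through. The problem text mentions both a 6 M and a 12 M stock, so both are evaluated rather than guessing which is intended; the function names are illustrative.

```python
def stock_volume_ml(stock_molar, target_molar, final_volume_l):
    """Volume of stock solution (mL) needed, from C1 * V1 = C2 * V2."""
    return target_molar * final_volume_l / stock_molar * 1000


def solute_mass_g(grams_per_liter, volume_l):
    """Mass of solute (g) to keep the same concentration in a smaller volume."""
    return grams_per_liter * volume_l


# 0.15 L of 2.4 M HCl, for each stock concentration named in the problem.
hcl_from_6m = stock_volume_ml(6, 2.4, 0.15)
hcl_from_12m = stock_volume_ml(12, 2.4, 0.15)

# Reagent A recipe: 10.0 g sulfanilamide per liter, scaled to 0.15 L.
sulfanilamide_g = solute_mass_g(10.0, 0.15)

# NNED recipe: 1 g per liter, scaled to 0.125 L and converted to milligrams.
nned_mg = solute_mass_g(1.0, 0.125) * 1000

print(hcl_from_6m, hcl_from_12m, sulfanilamide_g, nned_mg)
```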
After converting the nitrate into a purple dye, and measuring the absorbance of the purple dye on a spectrophotometer, a standard curve is used to convert the absorbance into concentration.
To make a standard curve, samples with known concentrations of NO3- are run on the spectrophotometer. The samples with known concentrations are called standards. A linear regression is then performed to relate the concentration of NO3- to measured absorbance values.
Here is a link to a spreadsheet containing a simulated data set. There are standards and their related absorbance values, and there are samples from two sites that were diluted, prior to processing and measuring their absorbances. The groundwater originates from the upslope site, and the hope is that the microbes in the soil are removing the NO3- from the groundwater before it reaches the downslope site.
Using the given data, create a standard curve in Excel, and use Trendline to add a linear regression with the equation. Then use the standard curve and the dilutions to determine the concentration of NO3- in all the samples. Using the Data Analysis ToolPak, perform the appropriate t-test to determine whether the nitrate concentration upslope is less than or greater than the nitrate concentration downslope. When performing a t-test using the Data Analysis ToolPak, the output will include the means for both groups.
What is the average NO3- concentration at the upslope site?
Report your answer, from the Data Analysis ToolPak output, to 3 decimal places.
What is the average NO3- concentration at the downslope site?
Report your answer, from the Data Analysis ToolPak output, to 3 decimal places.
Given the EPA drinking water quality standard is 10 mg/L of nitrate, is the upslope site safe to drink based only on nitrate content? (Enter yes or no)
Is the downslope site safe to drink, based only on NO3- concentration? (Enter yes or no)
Assuming the two sites are hydrologically well connected, the transit time between the two sites is fast, and the two sites cannot be treated as independent samples, what kind of t-test should be performed to show that the upslope site is greater than the downslope site? Enter the letter of your answer choice in the answer blank
A. one-tailed unpaired t-test
B. two-tailed unpaired t-test
C. one-tailed paired t-test
D. two-tailed paired t-test
What is the calculated t statistic, rounded to 4 decimal places?
Is the calculated t statistic greater or less than the critical t value reported by the Data Analysis ToolPak? (Enter greater or less)
Is the nitrate concentration at the upslope site significantly greater than the downslope site? (Enter yes or no)
Based on this statistical result, and assuming no diffusion or dilution occurs between the upslope and downslope site, do you think microbes are removing NO3- from the ground water? (Enter yes or no)
DATA
Standards:
mg N per L | Abs |
---|---|
0 | 0 |
0.1 | 0.12 |
0.2 | 0.225 |
0.4 | 0.432 |
0.6 | 0.585 |
Samples:
Sample ID | Upslope Absorbance | Dilution | mg N | Downslope Absorbance | Dilution |
---|---|---|---|---|---|
1 | 0.449 | 0.01 | | 0.316 | 0.5 |
2 | 0.243 | 0.01 | | 0.251 | 0.5 |
3 | 0.331 | 0.01 | | 0.256 | 1 |
4 | 0.45 | 0.1 | | 0.2 | 1 |
5 | 0.551 | 0.01 | | 0.563 | 1 |
6 | 0.561 | 0.01 | | 0.316 | 0.5 |
7 | 0.541 | 0.02 | | 0.951 | 1 |
8 | 0.244 | 0.01 | | 0.317 | 1 |
9 | 0.532 | 0.01 | | 0.2 | 0.5 |
10 | 0.5 | 0.02 | | 0.269 | 1 |
11 | 0.332 | 0.01 | | 0.2 | 0.5 |
12 | 0.443 | 0.02 | | 0.313 | 0.5 |
13 | 0.655 | 0.1 | | 0.2 | 1 |
14 | 0.675 | 0.01 | | 0.745 | 1 |
15 | 0.5 | 0.1 | | 0.119 | 0.5 |
16 | 0.39 | 0.01 | | 0.103 | 1 |
17 | 0.5 | 0.02 | | 0.149 | 1 |
18 | 0.532 | 0.01 | | 0.311 | 0.5 |
19 | 0.5 | 0.1 | | 0.918 | 1 |
20 | 0.108 | 0.01 | | 0.328 | 1 |
21 | 0.119 | 0.1 | | 0.2 | 0.5 |
22 | 0.689 | 0.01 | | 0.206 | 1 |
23 | 0.5 | 0.02 | | 0.2 | 0.5 |
24 | 0.329 | 0.1 | | 0.508 | 0.5 |
25 | 0.753 | 0.01 | | 0.256 | 0.5 |
26 | 0.511 | 0.01 | | 0.294 | 0.5 |
27 | 0.839 | 0.02 | | 0.417 | 0.5 |
28 | 0.543 | 0.01 | | 0.149 | 1 |
29 | 0.392 | 0.02 | | 0.118 | 0.5 |
30 | 0.444 | 0.01 | | 0.201 | 1 |
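The standard-curve step can be sketched outside Excel as well. The Python below fits a least-squares line to the five standards in the DATA table and converts one absorbance back to a concentration. It rests on one assumption that the problem does not state: the dilution value is the fraction of original sample in the measured mixture, so the curve result is divided by it. The function names are illustrative.

```python
# Standards from the DATA table: (mg N per L, absorbance).
standards = [(0, 0), (0.1, 0.12), (0.2, 0.225), (0.4, 0.432), (0.6, 0.585)]


def fit_line(points):
    """Ordinary least-squares slope and intercept for (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(standards)


def concentration(absorbance, dilution):
    """Invert the standard curve, then undo the dilution.

    Assumption: dilution is the fraction of original sample in the
    measured mixture, so the diluted concentration is divided by it.
    """
    return (absorbance - intercept) / slope / dilution


# Sample 1, upslope: absorbance 0.449 at a dilution of 0.01.
sample_1_upslope = concentration(0.449, 0.01)
print(f"Abs = {slope:.4f} * conc + {intercept:.4f}")
print(f"Sample 1 upslope: {sample_1_upslope:.2f} mg N per L")
```

Applying `concentration` to every row gives the two columns of concentrations that the t-test compares.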
In: Statistics and Probability
Assume that a simple random sample has been selected and test the given claim. Identify the null and alternative hypotheses, test statistic, P-value, and state the final conclusion that addresses the original claim. Listed below are brain volumes in cm^3 of unrelated subjects used in a study. Use a 0.05 significance level to test the claim that the population of brain volumes has a mean equal to 1099.8 cm^3.
964
1028
1273
1080
1070
1173
1067
1347
1099
1203
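With the population standard deviation unknown, this is a one-sample, two-tailed t-test of H0: mean = 1099.8 against H1: mean ≠ 1099.8. The Python sketch below computes the test statistic with the standard library. The standard library has no t-distribution CDF, so instead of a P-value the statistic is compared with the two-tailed critical value for 9 degrees of freedom taken from a standard t table.

```python
from math import sqrt
from statistics import mean, stdev

volumes = [964, 1028, 1273, 1080, 1070, 1173, 1067, 1347, 1099, 1203]
claimed_mean = 1099.8

n = len(volumes)
standard_error = stdev(volumes) / sqrt(n)  # stdev uses the n - 1 divisor
t_stat = (mean(volumes) - claimed_mean) / standard_error

# Two-tailed critical value for df = 9 at alpha = 0.05 (standard t table).
T_CRITICAL = 2.262
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
```

Because |t| falls below the critical value, the sample does not give sufficient evidence to reject the claim that the mean is 1099.8 cm^3.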
In: Statistics and Probability
A test for diabetes classifies 99% of people with the disease as diabetic and 10% of those who don't have the disease as diabetic. It is known that 12% of the population is diabetic.
a) What are the false positive and false negative rates?
b) What is the probability that someone classified as diabetic does in fact have the disease?
i) Solve the problem by drawing up a contingency table, and
ii) solve the problem using conditional probability and the law of total probability.
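Both approaches can be sketched in a few lines of Python. The contingency counts below use a hypothetical population of 10,000 people, a figure chosen only to give whole numbers; the second half applies the law of total probability and Bayes' rule directly.

```python
sensitivity = 0.99          # P(classified diabetic | disease)
false_positive_rate = 0.10  # P(classified diabetic | no disease)
prevalence = 0.12           # P(disease)

false_negative_rate = 1 - sensitivity

# i) Contingency table for a hypothetical population of 10,000.
population = 10_000
diseased = round(population * prevalence)
healthy = population - diseased
true_positives = round(diseased * sensitivity)
false_positives = round(healthy * false_positive_rate)
table_answer = true_positives / (true_positives + false_positives)

# ii) Law of total probability, then Bayes' rule.
p_positive = sensitivity * prevalence + false_positive_rate * (1 - prevalence)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(true_positives, false_positives)
print(f"{table_answer:.4f} {p_disease_given_positive:.4f}")
```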
In: Statistics and Probability
Suppose that in one region of the country, the mean amount of credit card debt per household in households having credit card debt is $8,000, with standard deviation $1,000. Find the probability that the mean amount of credit card debt in a sample of 400 such households will be between $7,925 and $8,100.
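By the central limit theorem, the sample mean of 400 households is approximately normal with mean 8,000 and standard error 1,000 / sqrt(400) = 50. A minimal Python sketch under that normal approximation:

```python
from math import sqrt
from statistics import NormalDist

population_mean = 8000
population_sd = 1000
sample_size = 400

# Sampling distribution of the sample mean.
sampling_dist = NormalDist(mu=population_mean,
                           sigma=population_sd / sqrt(sample_size))

probability = sampling_dist.cdf(8100) - sampling_dist.cdf(7925)
print(f"{probability:.4f}")
```

The two bounds sit at z = -1.5 and z = 2 on the standard normal scale.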
In: Statistics and Probability
Give descriptive statistics about the current COVID-19 crisis (mean, mode, median, variance, correlation) and state your sources.
In: Statistics and Probability
Question one
A researcher in a large supermarket wishes to study sickness absences among its employees. The organisation has branches in all the provinces, and each branch keeps full records of sickness leave. A random sample of ten such branches produced the following data showing the number of days of sickness per branch in the year 2017.
18 23 26 30 32 35 39 45 48 54
Required:
Using the above data:
a) Calculate (manually and using computer software such as Excel or SPSS) a 95% confidence interval for the mean number of sickness days per branch.
b) Estimate the number of branches that should be included in a simple random sample so that a 95% confidence interval for the mean number of days of sickness does not have a width greater than 4 days. (5 marks)
c) After the sample was collected, it became apparent that the branches fell into three natural groups in terms of sales: small, medium and large. From the data on all of the branches in the provinces, the researcher found that of 210 randomly selected staff, 90 worked in small branches, 36 in medium-sized branches, and the rest worked in large branches. In total, 96 of the selected staff had no days off for sickness, of which 52 worked in small branches and 29 worked in large branches.
i) Form a table showing the information clearly. (5 marks)
ii) Carry out an appropriate statistical test to investigate whether the size of branch influences the occurrence of sickness absence, and interpret your results clearly.
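Parts a and b can be sketched in Python as below. Two choices here are assumptions rather than anything the question fixes: the interval uses the t critical value for 9 degrees of freedom from a standard t table, and the sample-size estimate uses the sample standard deviation as a planning value together with z = 1.96.

```python
from math import ceil, sqrt
from statistics import mean, stdev

days = [18, 23, 26, 30, 32, 35, 39, 45, 48, 54]
n = len(days)

T_CRITICAL = 2.262  # two-sided 95%, df = 9, from a standard t table
Z_CRITICAL = 1.96   # two-sided 95% normal critical value

# Part a: 95% confidence interval for the mean.
margin = T_CRITICAL * stdev(days) / sqrt(n)
ci = (mean(days) - margin, mean(days) + margin)

# Part b: a width of at most 4 days means a margin of at most 2 days.
max_margin = 4 / 2
required_n = ceil((Z_CRITICAL * stdev(days) / max_margin) ** 2)

print(f"CI: ({ci[0]:.2f}, {ci[1]:.2f}), required n: {required_n}")
```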
In: Statistics and Probability
The margin of error in a confidence interval does not account for all types of error.
(a) What kind of error does the margin of error in a CI account for?
(b) Give an example of a kind of error which the margin of error does NOT account for.
In: Statistics and Probability
Forty-minute workouts of one of the following activities three days a week will lead to a loss of weight. The following sample data show the number of calories burned during 40-minute workouts for three different activities.
Swimming | Tennis | Cycling |
415 | 385 | 408 |
380 | 485 | 250 |
425 | 450 | 295 |
400 | 420 | 402 |
427 | 530 | 268 |
Use a .05 level of significance. Use Table 1 of Appendix B.
a. What is the sum of the ranks for Swimming, Tennis and Cycling (to the nearest whole number)?
Sum of Rank Swimming | |
Sum of Rank Tennis | |
Sum of Rank Cycling |
b. What is the value of the test statistic (to 2 decimals)?
How many degrees of freedom?
c. What is the p-value?
- Select your answer -
less than .005
between .005 and .01
between .01 and .025
between .025 and .05
between .05 and .10
greater than .10
Do these data indicate differences in the amount of calories burned for the three activities?
- Select your answer -
Yes
No
What is your conclusion?
- Select your answer -
Conclude that the populations of calories burned by the three activities are identical.
Conclude that the populations of calories burned by the three activities are not identical.
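Rank sums and a chi-square reference distribution point to the Kruskal-Wallis test, which can be sketched in Python as follows. The ranking shortcut relies on this sample having no tied values, and the p-value line uses the fact that a chi-square upper tail with 2 degrees of freedom equals exp(-H / 2).

```python
from math import exp

calories = {
    "Swimming": [415, 380, 425, 400, 427],
    "Tennis": [385, 485, 450, 420, 530],
    "Cycling": [408, 250, 295, 402, 268],
}

# Rank all 15 observations together (valid as written only with no ties).
pooled = sorted(v for values in calories.values() for v in values)
rank_of = {value: i + 1 for i, value in enumerate(pooled)}

rank_sums = {name: sum(rank_of[v] for v in values)
             for name, values in calories.items()}

# Kruskal-Wallis H statistic.
n_total = len(pooled)
h_stat = (12 / (n_total * (n_total + 1))
          * sum(rank_sums[name] ** 2 / len(values)
                for name, values in calories.items())
          - 3 * (n_total + 1))

df = len(calories) - 1
# Chi-square upper tail for exactly 2 degrees of freedom.
p_value = exp(-h_stat / 2)

print(rank_sums, round(h_stat, 2), df, round(p_value, 4))
```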
In: Statistics and Probability
The following data show the brand, price ($), and the overall score for six stereo headphones that were tested by a certain magazine. The overall score is based on sound quality and effectiveness of ambient noise reduction. Scores range from 0 (lowest) to 100 (highest). The estimated regression equation for these data is
ŷ = 21.258 + 0.327x,
where x = price ($) and y = overall score.
Brand | Price ($) | Score |
---|---|---|
A | 180 | 76 |
B | 150 | 69 |
C | 95 | 63 |
D | 70 | 54 |
E | 70 | 38 |
F | 35 | 24 |
(a)
Compute SST (Total Sum of Squares), SSR (Regression Sum of Squares), and SSE (Error Sum of Squares). (Round your answers to three decimal places.)
SST =
SSR =
SSE =
(b)
Compute the coefficient of determination r². (Round your answer to three decimal places.)
r² =
Comment on the goodness of fit. (For purposes of this exercise, consider a proportion large if it is at least 0.55.)
The least squares line provided a good fit as a large proportion of the variability in y has been explained by the least squares line.
The least squares line did not provide a good fit as a small proportion of the variability in y has been explained by the least squares line.
The least squares line provided a good fit as a small proportion of the variability in y has been explained by the least squares line.
The least squares line did not provide a good fit as a large proportion of the variability in y has been explained by the least squares line.
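The three sums of squares and r² can be sketched in Python as below. The line is refit from the data by least squares rather than taken from the rounded equation in the problem, so answers computed from ŷ = 21.258 + 0.327x may differ slightly in the later decimal places.

```python
prices = [180, 150, 95, 70, 70, 35]
scores = [76, 69, 63, 54, 38, 24]

n = len(prices)
mean_x = sum(prices) / n
mean_y = sum(scores) / n

# Least-squares slope and intercept.
sxx = sum((x - mean_x) ** 2 for x in prices)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(prices, scores))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

predicted = [b0 + b1 * x for x in prices]

sst = sum((y - mean_y) ** 2 for y in scores)                      # total
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))  # error
ssr = sst - sse                                                   # regression
r_squared = ssr / sst

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}, r2 = {r_squared:.3f}")
```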
In: Statistics and Probability