Question 1. Given the following table, answer the questions.
Score
Respondent | X | Y | x − x̄ | (x − x̄)² | y − ȳ | (y − ȳ)² | (x − x̄)(y − ȳ)
A. Rose | 18 | 92
B. Bush | 36 | 65
C. Novicevic | 24 | 91
D. Vitell | 28 | 85
E. Walker | 25 | 70
a. Calculate Pearson’s product-moment correlation coefficient (r). Show your work.
b. Based on the correlation coefficient you calculated, how would you describe, in two words, the relationship between the two variables in the “test”?
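The working columns in the Question 1 table can be filled in mechanically. The following is a minimal R sketch, assuming the X and Y scores are entered in the order listed; the variable names (`dx`, `dy`, `work`) are illustrative only.

```r
# Scores from the Question 1 table
x <- c(18, 36, 24, 28, 25)
y <- c(92, 65, 91, 85, 70)

# Deviations from the means (the "x - xbar" and "y - ybar" columns)
dx <- x - mean(x)
dy <- y - mean(y)

# Worked table: deviations, squared deviations, and cross-products
work <- data.frame(x, y, dx, dx2 = dx^2, dy, dy2 = dy^2, dxdy = dx * dy)
print(work)

# Pearson's r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
print(r)
```

The same steps apply to Question 2 after swapping in its X and Y columns; `cor(x, y)` should give the same value as the hand formula.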
Question 2. PHR Score
Score
Respondent | X | Y | x − x̄ | (x − x̄)² | y − ȳ | (y − ȳ)² | (x − x̄)(y − ȳ)
A. Selber | 96 | 92
B. Franklin | 56 | 65
C. Nichols | 84 | 91
D. Vitello | 88 | 85
E. Grado | 72 | 70
a. Calculate Pearson’s product-moment correlation coefficient (r). Show your work.
b. Based on the correlation coefficient you calculated, how would you describe, in two words, the relationship between the two variables in the “test”?
Question 3. Which of the two “tests” above would you choose? Why?
In: Statistics and Probability
The types of browse favored by deer are shown in the following table. Using binoculars, volunteers observed the feeding habits of a random sample of 320 deer.
Type of Browse | Plant Composition in Study Area | Observed Number of Deer Feeding on This Plant
Sage brush | 32% | 108
Rabbit brush | 38.7% | 113
Salt brush | 12% | 39
Service berry | 9.3% | 32
Other | 8% | 28
Use a 5% level of significance to test the claim that the natural distribution of browse fits the deer feeding pattern.
(a) What is the level of significance?
State the null and alternate hypotheses.
H0: The distributions are the same. H1: The distributions are the same.
H0: The distributions are the same. H1: The distributions are different.
H0: The distributions are different. H1: The distributions are different.
H0: The distributions are different. H1: The distributions are the same.
(b) Find the value of the chi-square statistic for the sample. (Round the expected frequencies to at least three decimal places. Round the test statistic to three decimal places.)
Are all the expected frequencies greater than 5?
Yes
No
What sampling distribution will you use?
Student's t
normal
binomial
uniform
chi-square
What are the degrees of freedom?
(c) Estimate the P-value of the sample test statistic.
P-value > 0.100
0.050 < P-value < 0.100
0.025 < P-value < 0.050
0.010 < P-value < 0.025
0.005 < P-value < 0.010
P-value < 0.005
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis that the population fits the specified distribution of categories?
Since the P-value > α, we fail to reject the null hypothesis.
Since the P-value > α, we reject the null hypothesis.
Since the P-value ≤ α, we reject the null hypothesis.
Since the P-value ≤ α, we fail to reject the null hypothesis.
(e) Interpret your conclusion in the context of the application.
At the 5% level of significance, the evidence is sufficient to conclude that the natural distribution of browse does not fit the feeding pattern.
At the 5% level of significance, the evidence is insufficient to conclude that the natural distribution of browse does not fit the feeding pattern.
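A goodness-of-fit calculation of this kind can be checked in R. This is a sketch, assuming the observed counts and plant-composition percentages are read off the table in the order given; the object names are illustrative.

```r
# Observed deer counts and hypothesised proportions from the browse table
observed <- c(sage = 108, rabbit = 113, salt = 39, service = 32, other = 28)
expected_prop <- c(0.32, 0.387, 0.12, 0.093, 0.08)

# Chi-square goodness-of-fit test against the given proportions
fit <- chisq.test(observed, p = expected_prop)

fit$expected    # expected frequencies, n * p for each category
fit$statistic   # sum((O - E)^2 / E)
fit$parameter   # degrees of freedom = number of categories - 1
fit$p.value     # compare with alpha = 0.05
```

The household question further down has the same structure, so the same call should work there with its own counts and percentages.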
In: Statistics and Probability
Suppose we wanted to predict the selling price of a house, using its size, in a certain area of a city. A random sample of six houses was selected from the area. The data are presented in the following table, with size given in hundreds of square feet and sale price in thousands of dollars:
Temperature (°F): Xi | 16 | 28 | 13 | 22 | 25 | 19
Number of Calls: Yi | 95 | 120 | 70 | 115 | 130 | 85
We are interested in fitting the following simple linear regression model: Y = Xβ + ε
a) Calculate X′X, (X′X)⁻¹ and X′Y, and then calculate the least squares estimates of β0 and β1.
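The matrix quantities in part (a) can be computed directly. The sketch below uses the Xi and Yi values exactly as tabulated (the table labels them temperature and calls, while the text describes size and price; the arithmetic is the same either way). Object names such as `XtX` are illustrative.

```r
# Data as tabulated
x <- c(16, 28, 13, 22, 25, 19)
y <- c(95, 120, 70, 115, 130, 85)

# Design matrix: a column of ones (intercept) and the predictor
X <- cbind(1, x)

XtX     <- t(X) %*% X        # X'X
XtX_inv <- solve(XtX)        # (X'X)^{-1}
XtY     <- t(X) %*% y        # X'Y

# Least squares estimates: beta_hat = (X'X)^{-1} X'Y
beta_hat <- XtX_inv %*% XtY
print(XtX)
print(XtX_inv)
print(XtY)
print(beta_hat)
```

As a cross-check, `coef(lm(y ~ x))` should agree with `beta_hat`.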
In: Statistics and Probability
The type of household for the U.S. population and for a random sample of 411 households from a community in Montana are shown below.
Type of Household | Percent of U.S. Households | Observed Number of Households in the Community
Married with children | 26% | 106
Married, no children | 29% | 101
Single parent | 9% | 31
One person | 25% | 102
Other (e.g., roommates, siblings) | 11% | 71
Use a 5% level of significance to test the claim that the distribution of U.S. households fits the Dove Creek distribution.
(a) What is the level of significance?
State the null and alternate hypotheses.
H0: The distributions are the same. H1: The distributions are different.
H0: The distributions are different. H1: The distributions are the same.
H0: The distributions are different. H1: The distributions are different.
H0: The distributions are the same. H1: The distributions are the same.
(b) Find the value of the chi-square statistic for the sample. (Round the expected frequencies to two decimal places. Round the test statistic to three decimal places.)
Are all the expected frequencies greater than 5?
Yes
No
What sampling distribution will you use?
normal
Student's t
binomial
chi-square
uniform
What are the degrees of freedom?
(c) Find or estimate the P-value of the sample test statistic. (Round your answer to three decimal places.)
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis that the population fits the specified distribution of categories?
Since the P-value > α, we fail to reject the null hypothesis.
Since the P-value > α, we reject the null hypothesis.
Since the P-value ≤ α, we reject the null hypothesis.
Since the P-value ≤ α, we fail to reject the null hypothesis.
(e) Interpret your conclusion in the context of the application.
At the 5% level of significance, the evidence is sufficient to conclude that the community household distribution does not fit the general U.S. household distribution.
At the 5% level of significance, the evidence is insufficient to conclude that the community household distribution does not fit the general U.S. household distribution.
In: Statistics and Probability
The table below lists the weights of some college students in September and later in February of their freshman year.
September weight | 62 54 71 65 53 58 74 49 66 56 69 74 61 56 71 51 70 52 70 78 59 60 73 60
February weight | 67 53 67 71 56 56 77 54 67 55 64 70 56 59 74 56 69 53 65 72 59 57 77 54
At the 0.05 significance level, test the claim of no difference between September weights and February weights.
The test statistic is
The critical value is
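Because each student is measured twice, one natural reading is a paired t-test on the September minus February differences. The sketch below assumes that reading and a two-sided test; the direction of the difference is a choice made here, not something the question fixes.

```r
september <- c(62, 54, 71, 65, 53, 58, 74, 49, 66, 56, 69, 74,
               61, 56, 71, 51, 70, 52, 70, 78, 59, 60, 73, 60)
february  <- c(67, 53, 67, 71, 56, 56, 77, 54, 67, 55, 64, 70,
               56, 59, 74, 56, 69, 53, 65, 72, 59, 57, 77, 54)

# Paired differences (assumed direction: September - February)
d <- september - february
n <- length(d)

# Test statistic: mean difference divided by its standard error
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided critical value at alpha = 0.05 with n - 1 degrees of freedom
t_crit <- qt(1 - 0.05 / 2, df = n - 1)

# Cross-check with the built-in paired test
check <- t.test(september, february, paired = TRUE)
print(c(t_stat = t_stat, t_crit = t_crit, p_value = check$p.value))
```

Reject the claim of no difference only if the absolute value of `t_stat` exceeds `t_crit`.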
In: Statistics and Probability
Take a look at the four requirements for binomial probability distributions:
1. Fixed number of single observations (trials)
2. Each trial is independent
3. Each trial must have outcomes that fall into one of two categories (success, failure)
4. The probability of success remains the same for every trial.
Come up with an example scenario in which you would have a binomial probability distribution to work with.
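One scenario that appears to meet all four requirements is an inspection of a fixed number of items, each independently defective with the same probability. The numbers below (20 items, 5% defect rate) are hypothetical and chosen only for illustration.

```r
# Hypothetical scenario: inspect 20 items, each defective with probability 0.05
n_trials <- 20     # requirement 1: fixed number of trials
p_defect <- 0.05   # requirement 4: same success probability on every trial

# Probability of each possible number of defectives, 0 through 20
probs <- dbinom(0:n_trials, size = n_trials, prob = p_defect)

# Probability of at most one defective item
p_at_most_one <- pbinom(1, size = n_trials, prob = p_defect)
print(p_at_most_one)
```

Independence (requirement 2) and the two-category outcome (requirement 3) are assumptions about how the items are produced and classified, not something the code can verify.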
In: Statistics and Probability
Fit a Beta distribution to the following data. Start from initial guesses shape1=3, shape2=6.
0.18573722 0.41073334 0.56355831 0.06673358 0.43762574 0.45158744 0.27369696
0.27787527 0.27522373 0.22834730 0.28829185 0.21342879 0.24438748 0.37434856
0.53836814 0.34561632 0.28320219 0.22540641 0.23575321 0.38607398 0.28625720
0.29384326 0.44312820 0.25625404 0.15563416 0.46424265 0.21000100 0.36114007
0.22198265 0.56719777
What is the estimated shape2?
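One way to fit the Beta distribution is maximum likelihood with a general-purpose optimiser. The sketch below uses base R `optim` and, as an assumption made here for numerical safety, optimises over the logarithms of the shape parameters so they stay positive; it starts from shape1 = 3, shape2 = 6 as the question asks.

```r
x <- c(0.18573722, 0.41073334, 0.56355831, 0.06673358, 0.43762574, 0.45158744,
       0.27369696, 0.27787527, 0.27522373, 0.22834730, 0.28829185, 0.21342879,
       0.24438748, 0.37434856, 0.53836814, 0.34561632, 0.28320219, 0.22540641,
       0.23575321, 0.38607398, 0.28625720, 0.29384326, 0.44312820, 0.25625404,
       0.15563416, 0.46424265, 0.21000100, 0.36114007, 0.22198265, 0.56719777)

# Negative log-likelihood, parameterised by log(shape1) and log(shape2)
negloglik <- function(log_par) {
  shape <- exp(log_par)
  -sum(dbeta(x, shape[1], shape[2], log = TRUE))
}

# Nelder-Mead search from the required initial guesses
fit <- optim(log(c(3, 6)), negloglik)

shape_hat <- exp(fit$par)
names(shape_hat) <- c("shape1", "shape2")
print(shape_hat)
```

`MASS::fitdistr(x, "beta", start = list(shape1 = 3, shape2 = 6))` is an alternative that should give a similar estimate, possibly with warnings if the optimiser steps outside the valid range.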
In: Statistics and Probability
For experts using R
I have tried to solve this question (using the faithful data), but every time I run my solution I get an error. I have tried many times, so anything you write will be helpful.
Modify the EM-algorithm functions to work for a general K-component Gaussian mixture. Please use these functions to fit K = 1, 2, 3, 4 models to the Old Faithful data available in R. (You need to initialize the EM algorithm first.)
Which model seems to fit the data better? (Hint: use BIC to compare models.)
Here is what I tried to use:
## EM algorithm for univariate normal mixture (two components)
# The E-step: posterior probability that each observation belongs to each component
E.step <- function(x, pi, Mu, S2){
  K <- length(pi)
  n <- length(x)
  tau <- matrix(rep(NA, n * K), ncol = K)
  for (i in 1:n){
    for (k in 1:K){
      tau[i,k] <- pi[k] * dnorm(x[i], Mu[k], sqrt(S2[k]))
    }
    tau[i,] <- tau[i,] / sum(tau[i,])
  }
  return(tau)
}
# The M-step: update weights, means and variances (written for K = 2 only)
M.step <- function(x, tau){
  n <- length(x)
  K <- dim(tau)[2]
  tau.sum <- apply(tau, 2, sum)
  pi <- tau.sum / n
  Mu <- t(tau) %*% x / tau.sum
  S2 <- numeric(K)  # allocate locally instead of relying on a global S2
  S2[1] <- t(tau[,1]) %*% (x - Mu[1])^2 / tau.sum[1]
  S2[2] <- t(tau[,2]) %*% (x - Mu[2])^2 / tau.sum[2]
  return(list(pi = pi, Mu = Mu, S2 = S2))
}
## The log-likelihood function (written for K = 2 only)
logL <- function(x, pi, Mu, S2){
  n <- length(x)
  ll <- 0
  for (i in 1:n){
    ll <- ll + log(pi[1] * dnorm(x[i], Mu[1], sqrt(S2[1])) +
                   pi[2] * dnorm(x[i], Mu[2], sqrt(S2[2])))
  }
  return(ll)
}
## The algorithm: iterate E- and M-steps until the relative gain in logL is below tol
EM <- function(x, pi, Mu, S2, tol){
  t <- 0
  ll.old <- -Inf
  ll <- logL(x, pi, Mu, S2)
  repeat{
    t <- t + 1
    if ((ll - ll.old) / abs(ll) < tol) break
    ll.old <- ll
    tau <- E.step(x, pi, Mu, S2)
    M <- M.step(x, tau)
    pi <- M$pi
    Mu <- M$Mu
    S2 <- M$S2
    ll <- logL(x, M$pi, M$Mu, M$S2)
    cat("Iteration", t, "logL =", ll, "\n")
  }
  return(list(pi = M$pi, Mu = M$Mu, S2 = M$S2, tau = tau, logL = ll))
}
## generate data
set.seed(1)
pi <- c(0.3, 0.7)
Mu <- c(5, 10)
S2 <- c(1, 1)
n <- 1000
n1 <- rbinom(1, n, pi[1])
n2 <- n - n1
x1 <- rnorm(n1, Mu[1], sqrt(S2[1]))
x2 <- rnorm(n2, Mu[2], sqrt(S2[2]))
x <- c(x1, x2)
hist(x, freq = FALSE, ylim = c(0, 0.2))
# pick initial values
pi.init <- c(0.5, 0.5)
Mu.init <- c(3, 10)
S2.init <- c(0.4, 2)
#Run EM
A <- EM(x, pi.init, Mu.init, S2.init, tol = 10^-6)
#plot
t <- seq(0, 15, by = 0.01)
y <- pi[1] * dnorm(t, Mu[1], sqrt(S2[1])) +
     pi[2] * dnorm(t, Mu[2], sqrt(S2[2]))
y.est <- A$pi[1] * dnorm(t, A$Mu[1], sqrt(A$S2[1])) +
         A$pi[2] * dnorm(t, A$Mu[2], sqrt(A$S2[2]))
points(t, y, type = "l")
points(t, y.est, type = "l", col = 2, lty = 2)
# assign observations to components - clustering
d <- function(x) which(x == max(x))
apply(A$tau, 1, d)
apply(A$tau, 1, which.max)
# assess misclassification
table(apply(A$tau, 1, which.max), c(rep(1, n1), rep(2, n2)))
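The script above hard-codes two components in the M-step and the log-likelihood. A possible generalisation to K components is sketched below. Several choices here are assumptions rather than part of the question: the function name `em_gmm`, the use of `faithful$waiting` as the variable, the initialisation (splitting the sorted data into K equal-size groups), and counting 3K − 1 free parameters for BIC = −2 logL + p log(n), where lower is better.

```r
# EM for a univariate Gaussian mixture with K components (illustrative sketch)
em_gmm <- function(x, K, tol = 1e-8, max_iter = 1000) {
  n <- length(x)
  # Assumed initialisation: split the sorted data into K equal-size groups
  grp  <- ceiling(K * rank(x, ties.method = "first") / n)
  pi_k <- rep(1 / K, K)
  mu   <- as.vector(tapply(x, grp, mean))
  s2   <- as.vector(tapply(x, grp, var))
  ll_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # n x K matrix of weighted component densities pi_k * N(x | mu_k, s2_k)
    dens <- sapply(seq_len(K), function(k) pi_k[k] * dnorm(x, mu[k], sqrt(s2[k])))
    dens <- matrix(dens, nrow = n)
    ll <- sum(log(rowSums(dens)))          # log-likelihood at current parameters
    # Stop on a non-finite likelihood or a negligible relative improvement
    if (!is.finite(ll) || (ll - ll_old) < tol * abs(ll)) break
    ll_old <- ll
    tau <- dens / rowSums(dens)            # E-step: responsibilities
    nk   <- colSums(tau)                   # M-step: weights, means, variances
    pi_k <- nk / n
    mu   <- as.vector(crossprod(tau, x)) / nk
    s2   <- sapply(seq_len(K), function(k) sum(tau[, k] * (x - mu[k])^2) / nk[k])
  }
  n_par <- 3 * K - 1                       # (K - 1) weights + K means + K variances
  list(pi = pi_k, mu = mu, s2 = s2, logL = ll,
       BIC = -2 * ll + n_par * log(n))
}

# Fit K = 1, 2, 3, 4 to the Old Faithful waiting times and compare BIC
x <- faithful$waiting
fits <- lapply(1:4, function(K) em_gmm(x, K))
bic <- sapply(fits, function(f) f$BIC)
names(bic) <- paste0("K=", 1:4)
print(bic)
```

EM can converge to different local maxima depending on the starting values, so it may be worth rerunning with other initialisations before settling on the model with the smallest BIC.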
In: Statistics and Probability
D.48 Predicting Percent Body Fat
Data 10.1 introduces the dataset BodyFat. Computer output is shown for using this sample to create a multiple regression model to predict percent body fat using the other nine variables.
The regression equation is Bodyfat = − 23.7 + 0.0838 Age − 0.0833 Weight + 0.036 Height + 0.001 Neck − 0.139 Chest + 1.03 Abdomen + 0.226 Ankle + 0.148 Biceps − 2.20 Wrist
Predictor | Coef | SE Coef | T | P
Constant | −23.66 | 29.46 | −0.80 | 0.424
Age | 0.08378 | 0.05066 | 1.65 | 0.102
Weight | −0.08332 | 0.08471 | −0.98 | 0.328
Height | 0.0359 | 0.2658 | 0.14 | 0.893
Neck | 0.0011 | 0.3801 | 0.00 | 0.998
Chest | −0.1387 | 0.1609 | −0.86 | 0.391
Abdomen | 1.0327 | 0.1459 | 7.08 | 0.000
Ankle | 0.2259 | 0.5417 | 0.42 | 0.678
Biceps | 0.1483 | 0.2295 | 0.65 | 0.520
Wrist | −2.2034 | 0.8129 | −2.71 | 0.008
S = 4.13552 | R-Sq = 75.7% | R-Sq(adj) = 73.3%
Analysis of Variance
Source | DF | SS | MS | F | P
Regression | 9 | 4807.36 | 534.15 | 31.23 | 0.000
Residual Error | 90 | 1539.23 | 17.10 | |
Total | 99 | 6346.59 | | |
(a) Interpret the coefficients of Age and Abdomen in context. Age is measured in years and Abdomen is abdomen circumference in centimeters.
(b) Use the p-value from the ANOVA test to determine whether the model is effective.
(c) Interpret R² in context.
(d) Which explanatory variable is most significant in the model? Which is least significant?
(e) Which variables are significant at a 5% level?
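The summary statistics in the output can be reproduced from the ANOVA sums of squares, which may help when interpreting parts (b) and (c). This is a sketch using only numbers printed in the output above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ss_reg <- 4807.36; ss_res <- 1539.23; ss_tot <- 6346.59
df_reg <- 9;       df_res <- 90

ms_reg <- ss_reg / df_reg            # mean square for regression
ms_res <- ss_res / df_res            # mean square error
f_stat <- ms_reg / ms_res            # overall F statistic
p_model <- pf(f_stat, df_reg, df_res, lower.tail = FALSE)

r2 <- ss_reg / ss_tot                # proportion of variability explained
s  <- sqrt(ms_res)                   # residual standard error, "S" in the output

# Individual t-test for one coefficient, here Abdomen: t = Coef / SE Coef
t_abdomen <- 1.0327 / 0.1459
p_abdomen <- 2 * pt(-abs(t_abdomen), df_res)

print(c(F = f_stat, p_model = p_model, R2 = r2, S = s,
        t_abdomen = t_abdomen, p_abdomen = p_abdomen))
```

The same `Coef / SE Coef` division and two-sided `pt` call apply to every row of the coefficient table, which is one way to rank the predictors for parts (d) and (e).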
In: Statistics and Probability
Let x = age in years of a rural Quebec woman at the time of her first marriage. In the year 1941, the population variance of x was approximately σ² = 5.1. Suppose a recent study of age at first marriage for a random sample of 31 women in rural Quebec gave a sample variance s² = 2.4. Use a 5% level of significance to test the claim that the current variance is less than 5.1. Find a 90% confidence interval for the population variance.
(a) What is the level of significance?
State the null and alternate hypotheses.
H0: σ² = 5.1; H1: σ² > 5.1
H0: σ² = 5.1; H1: σ² ≠ 5.1
H0: σ² = 5.1; H1: σ² < 5.1
H0: σ² < 5.1; H1: σ² = 5.1
(b) Find the value of the chi-square statistic for the sample.
(Round your answer to two decimal places.)
What are the degrees of freedom?
What assumptions are you making about the original distribution?
We assume an exponential population distribution.
We assume a binomial population distribution.
We assume a uniform population distribution.
We assume a normal population distribution.
(c) Find or estimate the P-value of the sample test statistic.
P-value > 0.100
0.050 < P-value < 0.100
0.025 < P-value < 0.050
0.010 < P-value < 0.025
0.005 < P-value < 0.010
P-value < 0.005
(d) Based on your answers in parts (a) to (c), will you reject or fail to reject the null hypothesis?
Since the P-value > α, we fail to reject the null hypothesis.
Since the P-value > α, we reject the null hypothesis.
Since the P-value ≤ α, we reject the null hypothesis.
Since the P-value ≤ α, we fail to reject the null hypothesis.
(e) Interpret your conclusion in the context of the application.
At the 5% level of significance, there is insufficient evidence to conclude that the variance of age at first marriage is less than 5.1.
At the 5% level of significance, there is sufficient evidence to conclude that the variance of age at first marriage is less than 5.1.
(f) Find the requested confidence interval for the population variance. (Round your answers to two decimal places.)
lower limit |
upper limit |
Interpret the results in the context of the application.
We are 90% confident that σ² lies within this interval.
We are 90% confident that σ² lies outside this interval.
We are 90% confident that σ² lies above this interval.
We are 90% confident that σ² lies below this interval.
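The chi-square statistic, left-tailed P-value and confidence interval can be sketched in R as follows, assuming a normal population as the test requires. Variable names are illustrative.

```r
n <- 31           # sample size
s2 <- 2.4         # sample variance
sigma2_0 <- 5.1   # variance under the null hypothesis
df <- n - 1

# Test statistic: (n - 1) * s^2 / sigma0^2
chi2 <- df * s2 / sigma2_0

# Left-tailed P-value, because the claim is that the variance is less than 5.1
p_value <- pchisq(chi2, df)

# 90% confidence interval for sigma^2:
# ((n - 1) s^2 / chi2_upper, (n - 1) s^2 / chi2_lower) with 5% in each tail
ci <- df * s2 / qchisq(c(0.95, 0.05), df)
names(ci) <- c("lower", "upper")

print(c(chi2 = chi2, df = df, p_value = p_value))
print(ci)
```

Note that the larger chi-square quantile produces the lower limit, since it appears in the denominator.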
In: Statistics and Probability
In a clinic, the systolic blood pressures in mmHg of a random sample of 10 patients with a certain metabolic disorder were collected. Assume blood pressure is normally distributed in the population. The mean of this sample was 103.2 mmHg with a (sample) standard deviation of 15.0 mmHg. Test the hypothesis that the mean blood pressure of patients with this disorder differs from the known population mean systolic blood pressure of 121.2 mmHg.
Show your working including null hypothesis, alternative hypothesis, test statistic and p-value, and interpret your p-value. Also give a 95% confidence interval. What experimental design is this? What do we mean by “a random sample” in the question?
If the population standard deviation was actually known to be 15.0 mmHg exactly (from previous large studies), compute your p-value in this case.
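Both versions of the calculation, with the standard deviation estimated from the sample (t) and with it treated as known (z), can be sketched from the summary statistics alone. The sketch assumes a two-sided alternative, as the word "differs" suggests.

```r
n <- 10; xbar <- 103.2; s <- 15.0; mu0 <- 121.2

se <- s / sqrt(n)                 # standard error of the mean
stat <- (xbar - mu0) / se         # same value serves as t or z statistic

# Unknown sigma: Student's t with n - 1 degrees of freedom
p_t <- 2 * pt(-abs(stat), df = n - 1)
ci_t <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * se

# Sigma known to be 15.0: standard normal reference distribution
p_z <- 2 * pnorm(-abs(stat))

print(c(statistic = stat, p_t = p_t, p_z = p_z))
print(ci_t)
```

The z-based P-value comes out smaller than the t-based one because the normal distribution has lighter tails than a t distribution with 9 degrees of freedom.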
In: Statistics and Probability
A special diet is intended to reduce systolic blood pressure among patients diagnosed with stage 2 hypertension. If the diet is effective, the target is to have the average systolic blood pressure of this group be below 150. After six months on the diet, an SRS of 28 patients had an average systolic blood pressure of x̄ = 143 with standard deviation s = 21. Is this sufficient evidence that the diet is effective in meeting the target? Assume the distribution of the systolic blood pressure for patients in this group is approximately Normal with mean μ. Given a P-value between 0.01 and 0.05, what conclusion should you draw at the 5% level of significance?
No conclusion can be drawn without knowing the exact P-value.
Accept the null hypothesis, because the P-value is less than the level of significance.
Reject the null hypothesis, because the P-value is less than the level of significance.
Fail to reject the null hypothesis, because the P-value is less than the level of significance.
In: Statistics and Probability
The average amount of sugar in a generic brand of cereal is 660 mg, and the standard deviation is 35 mg. Assume the variable is normally distributed.
a.) If a single cereal is selected, find the probability that the sugar content will be more than 670 mg.
b.) If a sample of 10 cereals is selected, find the probability that the mean of the sample will be larger than 670 mg.
c.) Why is the probability for part (a) greater than for part (b)?
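Parts (a) and (b) differ only in the standard deviation used: 35 for a single cereal and 35/√10 for the mean of ten. A short R sketch, with illustrative variable names:

```r
mu <- 660; sigma <- 35; n <- 10

# (a) Single observation: X ~ N(660, 35)
p_single <- pnorm(670, mean = mu, sd = sigma, lower.tail = FALSE)

# (b) Sample mean of 10: Xbar ~ N(660, 35 / sqrt(10))
p_mean <- pnorm(670, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)

print(c(p_single = p_single, p_mean = p_mean))
```

The sample mean varies less than a single observation, so 670 sits further out in its distribution and the upper-tail probability is smaller, which is the point of part (c).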
In: Statistics and Probability
The quality control manager at a light bulb factory needs to estimate the mean life of a large shipment of light bulbs. The standard deviation is 108 hours. A random sample of 64 light bulbs indicated a sample mean life of 410 hours. Complete parts (a) through (d) below.
a. Construct a 95% confidence interval estimate for the population mean life of light bulbs in this shipment.
The 95% confidence interval estimate is from a lower limit of ___ hours to an upper limit of ___ hours.
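Since the standard deviation is given rather than estimated from the sample, a z-based interval seems the intended approach. A minimal sketch under that assumption:

```r
sigma <- 108; n <- 64; xbar <- 410

# Standard error of the mean and the 97.5th percentile of the standard normal
se <- sigma / sqrt(n)
z <- qnorm(0.975)

# 95% confidence interval: xbar +/- z * se
ci <- xbar + c(-1, 1) * z * se
names(ci) <- c("lower", "upper")
print(round(ci, 2))
```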
In: Statistics and Probability
Your instructor randomly chooses one of two coins, each with probability 0.5, and asks you to decide which coin was chosen based on the outcome of 3 tosses. Tossing coin 1 yields a head with probability P(X1 = H) = 0.3 (and a tail with P(X1 = T) = 0.7). Tossing coin 2 yields a head with probability P(X2 = H) = 0.6 (and a tail with P(X2 = T) = 0.4). You earn $1 if you correctly guess the coin and $0 otherwise. Design the optimum decision rule and estimate your average earnings.
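With equal priors and a $1 reward for a correct guess, the earnings-maximising rule is to pick the coin with the larger posterior probability given the data. The order of the tosses carries no extra information, so the sketch below works with the number of heads in 3 tosses; the variable names are illustrative.

```r
p_head <- c(coin1 = 0.3, coin2 = 0.6)
prior  <- c(0.5, 0.5)
heads  <- 0:3

# Joint probability P(coin j, k heads) for k = 0..3 (rows) and j = 1, 2 (columns)
joint <- sapply(seq_along(p_head),
                function(j) prior[j] * dbinom(heads, size = 3, prob = p_head[j]))
rownames(joint) <- paste0(heads, " heads")
colnames(joint) <- names(p_head)

# Decision rule: for each outcome, guess the coin with the larger joint probability
decision <- apply(joint, 1, which.max)

# Expected earnings = probability of a correct guess under this rule
expected_earning <- sum(apply(joint, 1, max))

print(joint)
print(decision)
print(expected_earning)
```

Under this rule the guess switches from coin 1 to coin 2 once two or more heads are observed.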
In: Statistics and Probability