Gaussian Mixture Model:
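The initial means and variances of two clusters in a GMM are as follows: $\mu^{(1)} = -3$, $\mu^{(2)} = 2$, $\sigma_1^2 = \sigma_2^2 = 4$. Let $p_1 = p_2 = 0.5$.
Let $x^{(1)} = 0.2$, $x^{(2)} = -0.9$, $x^{(3)} = -1$, $x^{(4)} = 1.2$, $x^{(5)} = 1.8$ be five points that need to be clustered.
Need to find the posterior probability p(1|i) that each point i belongs to cluster 1: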
1) p(1|1)
2) p(1|2)
3) p(1|3)
4) p(1|4)
5) p(1|5)
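One way to approach these five quantities is sketched below. It assumes the garbled symbols in the question stand for cluster means (-3 and 2), a shared variance of 4, mixing weights of 0.5 each, and the five listed data points, and that p(1|i) means the posterior probability of cluster 1 given point i. The names `normal_pdf` and `posterior_cluster1` are illustrative, not from the question.

```python
import math

# Assumed reading of the question: means, variances, mixing weights, points.
means = [-3.0, 2.0]
variances = [4.0, 4.0]
weights = [0.5, 0.5]
points = [0.2, -0.9, -1.0, 1.2, 1.8]


def normal_pdf(x, mean, var):
    """Density of a univariate normal with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_cluster1(x):
    """Bayes' rule: weighted density of cluster 1 over the total mixture density."""
    joint = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    return joint[0] / sum(joint)


posteriors = [posterior_cluster1(x) for x in points]
for i, (x, p) in enumerate(zip(points, posteriors), start=1):
    print(f"p(1|{i}) for x = {x}: {p:.5f}")
```

Because both clusters share the same variance and weight, the posterior depends only on how much closer a point is to -3 than to 2, so points further to the left get a higher p(1|i).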
In: Statistics and Probability
A national survey of Canadian undergraduate students, conducted in 2005, used a questionnaire to ask whether they work part-time jobs. Of the 1250 students who participated, 802 said they do. In 2020, a researcher claims that the proportion of undergraduate students working part-time jobs has increased because of higher tuition fees. In January 2020, he found that 963 of 1420 randomly selected students work part-time jobs. Conduct a hypothesis test at the 7% significance level to test the researcher’s claim. Answer the following to carry out the test:
State null and alternative hypotheses.
State your decision rule.
Calculate the test statistic.
State your conclusion.
Find the p-value of the test.
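A minimal sketch of one possible setup follows. It assumes both surveys are treated as independent samples and compared with a right-tailed two-proportion z-test using a pooled standard error (H0: the 2020 proportion is no larger than the 2005 proportion). If the 2005 figure were instead treated as a known population proportion, a one-sample test would apply and the numbers would differ.

```python
from math import sqrt
from statistics import NormalDist

# Survey counts taken from the question.
x_2005, n_2005 = 802, 1250
x_2020, n_2020 = 963, 1420
alpha = 0.07

p_2005 = x_2005 / n_2005
p_2020 = x_2020 / n_2020

# Pooled proportion under H0 (assumption: two independent samples).
p_pooled = (x_2005 + x_2020) / (n_2005 + n_2020)
std_err = sqrt(p_pooled * (1 - p_pooled) * (1 / n_2005 + 1 / n_2020))

z_stat = (p_2020 - p_2005) / std_err
z_crit = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - NormalDist().cdf(z_stat)     # right-tailed p-value
reject_null = z_stat > z_crit

print(f"z = {z_stat:.4f}, critical z = {z_crit:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

The decision rule here is "reject H0 if z exceeds the critical value", which is equivalent to comparing the p-value with 0.07.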
In: Statistics and Probability
An eating disorders clinic would like to evaluate the effectiveness of a mindfulness training program on binge eating disorder (BED). A psychologist was therefore hired to conduct a small study to examine the potential effect of mindfulness training on number of binges per month. The psychologist recruited 10 patients in the clinic to participate in a mindfulness training program and 12 patients in the clinic with similar demographics who do not participate in the mindfulness program. Afterward, all participants reported the number of binges from the previous month. The data are listed in the table below. The psychologist is not predicting a particular direction of the potential differences between the two groups and she sets the alpha level at .05 for the hypothesis test.
Subject ID # (mindfulness training) | # of binges | Subject ID # (no mindfulness training) | # of binges |
---|---|---|---|
1 | 6 | 11 | 7 |
2 | 4 | 12 | 8 |
3 | 4 | 13 | 6 |
4 | 5 | 14 | 9 |
5 | 3 | 15 | 4 |
6 | 4 | 16 | 6 |
7 | 5 | 17 | 6 |
8 | 5 | 18 | 8 |
9 | 7 | 19 | 9 |
10 | 3 | 20 | 8 |
 | | 21 | 7 |
 | | 22 | 5 |
f) Calculate the pooled standard deviation for the populations and then use it to calculate the standardized effect size of this test.
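The pooled standard deviation and standardized effect size for part f) could be computed roughly as follows. This sketch assumes the effect size intended is Cohen's d (mean difference divided by the pooled standard deviation) and that subjects 1 to 10 form the mindfulness group and subjects 11 to 22 the comparison group, as in the table.

```python
from math import sqrt
from statistics import mean, variance

# Number of binges per subject, read from the table.
mindfulness = [6, 4, 4, 5, 3, 4, 5, 5, 7, 3]           # subjects 1-10
no_mindfulness = [7, 8, 6, 9, 4, 6, 6, 8, 9, 8, 7, 5]  # subjects 11-22

n1, n2 = len(mindfulness), len(no_mindfulness)

# Pooled variance: sample variances weighted by their degrees of freedom.
pooled_var = (
    (n1 - 1) * variance(mindfulness) + (n2 - 1) * variance(no_mindfulness)
) / (n1 + n2 - 2)
pooled_sd = sqrt(pooled_var)

# Cohen's d (assumed to be the "standardized effect size" asked for).
cohens_d = (mean(mindfulness) - mean(no_mindfulness)) / pooled_sd

print(f"pooled SD = {pooled_sd:.4f}, Cohen's d = {cohens_d:.4f}")
```

The sign of d only reflects which group is subtracted from which; the magnitude is what is usually reported.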
In: Statistics and Probability
I would like to know how to solve this problem using POM or Excel
Alan Resnik, a friend of Ray Cahnman, bet Ray $5 that Ray’s car would not start 5 days from now (see Problem 14-8).
What is the probability that it will not start 5 days from now if it started today?
What is the probability that it will not start 5 days from now if it did not start today?
What is the probability that it will start in the long run if the matrix of transition probabilities does not change?
In: Statistics and Probability
Consider the following preferences and election problem. Assume that a president has to be elected. Four candidates representing different political ideologies want to become president: A is a left-wing candidate, B is a social democrat, C is a right-liberal candidate, and D is a right-wing candidate. 20% of the voters (group left) have the preference A≻B≻C≻D, 30% of the voters (group social democrats) have the preference B≻A≻C≻D, 10% of the voters (group right-liberal) have the preference C≻B≻A≻D, and 40% of the voters (group right) have the preference D≻C≻B≻A. Note that the sign “≻” means “preferred to.” In total there are 100 voters.
A. What is the election outcome if a pure plurality voting system is applied? How many votes does the winner receive?
B. Who will be elected if the Instant Runoff voting system is applied? Who will be the first- and second-ranked candidates in the first election round? How many votes will the winner get in the second round?
C. Who will win if the system of unanimity is applied?
D. Who will win if point-count voting is applied? Note that the voters rank their preferences so that the most preferred candidate gets 3 points, the second-ranked candidate gets 2 points, and the third-ranked candidate gets 1 point. The fourth-ranked candidate gets zero points. How many points will be distributed in total, and how many points will the winner receive?
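The tallies for parts A to D can be checked with a short sketch like the one below. It assumes 100 voters split 20/30/10/40 across the four rankings, implements instant runoff by repeatedly eliminating the candidate with the fewest first-choice votes, and uses 3-2-1-0 points for the point count. The helper names `first_choice_tally` and `instant_runoff` are illustrative.

```python
from collections import Counter

# Ballots: ranking (most preferred first) -> number of voters.
ballots = {
    ("A", "B", "C", "D"): 20,
    ("B", "A", "C", "D"): 30,
    ("C", "B", "A", "D"): 10,
    ("D", "C", "B", "A"): 40,
}
total_voters = sum(ballots.values())


def first_choice_tally(ballots, eliminated=frozenset()):
    """Count each ballot for its highest-ranked candidate still in the race."""
    tally = Counter()
    for ranking, voters in ballots.items():
        top = next(c for c in ranking if c not in eliminated)
        tally[top] += voters
    return tally


def instant_runoff(ballots):
    """Eliminate the weakest candidate until someone holds a strict majority."""
    eliminated = set()
    while True:
        tally = first_choice_tally(ballots, eliminated)
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > total_voters:
            return leader, votes
        eliminated.add(min(tally, key=tally.get))


# A. Plurality: only first choices count.
plurality = first_choice_tally(ballots)
plurality_winner, plurality_votes = plurality.most_common(1)[0]

# B. Instant runoff.
irv_winner, irv_votes = instant_runoff(ballots)

# C. Unanimity: a winner needs every voter's first choice.
unanimous_winner = plurality_winner if plurality_votes == total_voters else None

# D. Point count with 3, 2, 1, 0 points by rank.
borda = Counter()
for ranking, voters in ballots.items():
    for candidate, pts in zip(ranking, [3, 2, 1, 0]):
        borda[candidate] += pts * voters
borda_winner = max(borda, key=borda.get)

print("Plurality:", plurality_winner, plurality_votes)
print("Instant runoff:", irv_winner, irv_votes)
print("Unanimity:", unanimous_winner)
print("Point count:", dict(borda), "total", sum(borda.values()))
```

If the question intends a two-round runoff between the top two first-round candidates rather than one-by-one elimination, the final round for this particular profile is the same pair of candidates.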
In: Statistics and Probability
Problem 11-15 (Algorithmic)
Ocala Software Systems operates a technical support center for its software customers. If customers have installation or use problems with Ocala software products, they may telephone the technical support center and obtain free consultation. Currently, Ocala operates its support center with one consultant. If the consultant is busy when a new customer call arrives, the customer hears a recorded message stating that all consultants are currently busy with other customers. The customer is then asked to hold and is told that a consultant will provide assistance as soon as possible. The customer calls follow a Poisson probability distribution, with an arrival rate of six calls per hour. On average, it takes 8.5 minutes for a consultant to answer a customer's questions. The service time follows an exponential probability distribution. To improve customer service, Ocala Software Systems wants to investigate the effect of using a second consultant at its technical support center.
What effect would the additional consultant have on customer service? Would two technical consultants enable Ocala to meet its service guidelines (no more than 35% of all customers having to wait for technical support and an average customer waiting time of two minutes or less)? Round your answers to two decimal places.
With two consultants, ____% of customers have to wait, with an average waiting time of ____ minutes.
A. % of customers
B. Average waiting time
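A sketch of the two-consultant calculation is given below, assuming the standard M/M/k queueing model (Poisson arrivals, exponential service, a single shared waiting line) with k = 2. The variable names are illustrative, and the guideline thresholds are those stated in the problem.

```python
from math import factorial

arrival_rate = 6.0           # calls per hour
service_rate = 60.0 / 8.5    # calls per hour handled by one consultant
servers = 2

offered_load = arrival_rate / service_rate   # lambda / mu
utilization = offered_load / servers         # must be below 1 for stability

# Term shared by P0 and the probability of waiting (Erlang C formula).
tail_term = offered_load**servers / (factorial(servers) * (1 - utilization))

# Probability that the system is empty.
p0 = 1.0 / (
    sum(offered_load**n / factorial(n) for n in range(servers)) + tail_term
)

p_wait = tail_term * p0                          # P(an arriving caller must wait)
lq = p_wait * utilization / (1 - utilization)    # mean number waiting
wq_minutes = lq / arrival_rate * 60              # mean wait in queue, minutes

meets_guidelines = p_wait <= 0.35 and wq_minutes <= 2.0

print(f"% who wait = {100 * p_wait:.2f}, average wait = {wq_minutes:.2f} min")
print("Meets guidelines:", meets_guidelines)
```

Changing `servers` to 1 reproduces the current single-consultant situation for comparison.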
In: Statistics and Probability
Multiple Choice
Select the best answer from the available choices for each question.
1. Which of the following is NOT part of the definition of a sample space S?
• S can be discrete or continuous
• Each outcome must be in S at most once
• Each element in S is equally likely
• Each outcome must be in S at least once
• S is a set of possible outcomes in an experiment
2. Three A’s, three B’s, and two C’s are arranged at random in a row. What is the probability that the letters on both ends are different?
• 0.01
• 0.25
• 0.75
• 0.99
• None of the above
3. When constructing this exam, Diana and Audrey made a question bank of 40 Multiple Choice questions (you get 15), 30 True/False questions (you get 10), 30 Identify the Distribution questions (you get 10), and 4 versions of each of the 5 Long Answer questions. Assuming the order of the questions does not matter, how many different exams could be created?
• 4.029 x 10^10
• 5.260 x 10^22
• 3.718 x 10^28
• 6.403 x 10^53
• none of the above
4. Three people independently choose one of five sections of a course to enroll in. What is the probability that at least two enroll in the same section?
• 0.28
• 0.48
• 0.52
• 0.72
• None of the above
5. Three people are acting in a play. Character A has 50% of the lines, Character B has 40%, and Character C has 10%. Each person has a different probability of making a mistake on any line, independently: 0.03 for A, 0.07 for B, and 0.20 for C. Given that a mistake occurred, what is the probability that it was C’s line?
• 0.02
• 0.063
• 0.317
• 0.444
• None of the above
6. Which matches the conditions for using a Binomial approximation to the Hypergeometric?
• N must be large and n must be small relative to N
• The sampling must be done with replacement
• n must be large and r/N must be small
• Two or more of the above
• None of the above
7. Suppose the amount of data you use on your phone (in units of 100 MB) has a Poisson distribution with mean 8 per month. You pay $15 per month plus $3 per 100 MB of data. Find the standard deviation of a random month's phone bill.
• 6.245
• 8.485
• 9.327
• 72
• None of the above
8. The amount of time a user plays Animal Crossing has an Exponential distribution with mean 30 minutes. Given that the user has been playing for 1 hour, what is the probability they will still be playing half an hour later?
• 0.012
• 0.135
• 0.368
• 0.632
• None of the above
9. Which of the following 4 statements is FALSE?
• If X~Y (i.e. X and Y have the same distribution), then they have a correlation of 1
• If X and Y are independent, they have a correlation of 0
• If X and Y have a correlation of 0, and Y and Z have a correlation of 1, then X and Z have correlation of 0
• Correlation tells us the strength and direction of the linear relationship between two variables, whereas covariance only tells us the direction
• Two or more of the above are false
10. Which of the following 4 statements is TRUE?
• If X and Y have a negative correlation, they are independent
• If X and Y have a correlation of 0, they are independent
• If X and Y have a positive correlation, and Y and Z have a positive correlation, then X and Z have positive correlation
• If X = -0.5Y, then the correlation of X and Y is -1
• Two or more of the above are true
11. Which of the following 4 statements is FALSE?
• The Normal distribution has mean, median, and mode all equal to μ
• A Normal random variable with a larger variance has a smaller maximum height of its pdf
• Approximately 95% of the area under the Standard Normal pdf is between -1 and 1
• The cdf F(x) of a Normal random variable is strictly increasing for all Real values of x
• Two or more of the above are false
12. The return on stock X has mean 5 and standard deviation 3. The return on stock Y has mean 8 and standard deviation 10. The standard deviation of an equal portfolio of the two stocks (that is, 0.5X + 0.5Y) is 24.25. What is the correlation between X and Y?
• -0.2
• 0
• 0.2
• 0.4
• None of the above
13. Suppose the length in cm of a male soccer player's foot follows a Normal distribution with mean 30 and variance 25. Suppose the length in cm of a female soccer player's foot follows a Normal distribution with mean 26 and variance 16. A male and female soccer player are selected at random. What is the probability the female player has a longer foot than the male player?
• 0.09
• 0.27
• 0.33
• 0.73
• 0.91
14. Independently of one another, a randomly surveyed individual can be a non-smoker, a light smoker, or a heavy smoker with probabilities 60%, 30% and 10%, respectively. Suppose we randomly survey a sample of 100 people. Given that 37 of them are light smokers, what is the expected number of heavy smokers?
• 6.3
• 9
• 27
• 54
• None of the above
15. The Princess Theatre has 280 seats. 200 tickets are sold for the new popular movie: The Attack of the Goose. Despite every attendee having an assigned seat, they all decide to sit in seats at random. Find the variance of the number of people sitting in their assigned seat. (Hint: use indicator variables)
• 0.712
• 0.714
• 5.246
• 11.704
• None of the above
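Several of the numerical items above reduce to one-line calculations. The sketch below works through questions 2, 4, 5, 7, and 8 only, under the usual textbook readings (for example, question 8 relies on the memoryless property of the Exponential distribution). It is meant as a way to check arithmetic, not as an answer key for the whole exam.

```python
from math import exp, sqrt

# Q2: ends differ = 1 - P(both ends show the same letter).
letter_counts = {"A": 3, "B": 3, "C": 2}
n_letters = sum(letter_counts.values())
p_same_ends = sum(c * (c - 1) for c in letter_counts.values()) / (
    n_letters * (n_letters - 1)
)
p_different_ends = 1 - p_same_ends

# Q4: at least two of three people share a section = 1 - P(all different).
p_shared_section = 1 - (5 * 4 * 3) / 5**3

# Q5: Bayes' rule over whose line the mistake was on.
line_share = {"A": 0.50, "B": 0.40, "C": 0.10}
mistake_rate = {"A": 0.03, "B": 0.07, "C": 0.20}
p_mistake = sum(line_share[k] * mistake_rate[k] for k in line_share)
p_c_given_mistake = line_share["C"] * mistake_rate["C"] / p_mistake

# Q7: bill = 15 + 3X with X ~ Poisson(8), so SD(bill) = 3 * sqrt(8).
bill_sd = 3 * sqrt(8)

# Q8: memoryless property, P(T > 90 | T > 60) = P(T > 30) with mean 30.
p_still_playing = exp(-30 / 30)

print(p_different_ends, p_shared_section, p_c_given_mistake, bill_sd, p_still_playing)
```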
In: Statistics and Probability
Amazon's third-largest market is the United Kingdom. In 2019, the employees at a local firm in London received their salary and decided to spend a portion of it on making online purchases from Amazon. The probability that the online purchases will be shipped on time to each employee is 0.68. Determine the following:
a. The probability that three products will be shipped on time.
b. The probability that out of three, two products will be shipped on time.
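Both parts can be treated as binomial probabilities. The sketch below assumes three products whose shipments are independent, each on time with probability 0.68, and reads part b as exactly two of the three arriving on time.

```python
from math import comb

p_on_time = 0.68
n_products = 3   # assumption: three independent shipments


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success chance p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_all_three = binomial_pmf(3, n_products, p_on_time)     # part a
p_exactly_two = binomial_pmf(2, n_products, p_on_time)   # part b

print(f"a) {p_all_three:.6f}")
print(f"b) {p_exactly_two:.6f}")
```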
In: Statistics and Probability
I want to see if there is a relation between makeup sales and the age of people buying them on any given day at Sephora. 10 people were randomly sampled.
X-axis: 13, 14, 16, 17, 21, 25, 30, 37, 50, 62 (age)
Y-axis: 2, 7, 30, 26, 15, 33, 7, 3, 14, 1 (Makeup sales)
What is the impact of using a linear regression model in this case? What options, other than linear regression, can you see? You do not need to collect any data.
For your response to a classmate (two responses required, one in each option), examine your classmate’s problem to assess the appropriateness and accuracy of using a linear regression model. Discuss the meaning of the standard error of the estimate and how it affects the predicted values of Y for that analysis.
In: Statistics and Probability
Failure times of silicon wafer microchips. Researchers at National Semiconductor experimented with tin-lead solder bumps used to manufacture silicon wafer integrated circuit chips. The failure times of the microchips (in hours) were determined at different solder temperatures (degrees Centigrade). The data for one experiment are saved in the WAFER file. The researchers want to predict failure time (y) based on solder temperature (x).
a. Construct a scatterplot for the data. What type of relationship, linear or curvilinear, appears to exist between failure time and solder temperature?
b. Fit the model $y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$ to the data. Give the least squares prediction equation.
c. Conduct a test to determine if there is upward curvature in the relationship between failure time and solder temperature. (Use α = .05)
Please use RStudio software to get the data.
TEMP | FAILTIME |
---|---|
165 | 200 |
162 | 200 |
164 | 1200 |
158 | 500 |
158 | 600 |
159 | 750 |
156 | 1200 |
157 | 1500 |
152 | 500 |
147 | 500 |
149 | 1100 |
149 | 1150 |
142 | 3500 |
142 | 3600 |
143 | 3650 |
133 | 4200 |
132 | 4800 |
132 | 5000 |
134 | 5200 |
134 | 5400 |
125 | 8300 |
123 | 9700 |
In: Statistics and Probability
An instructor is interested in assessing students' typing speed in the class. He knows that the average at the university is 49 words per minute (wpm) with an SD of 18 wpm. He collects data from 31 of his students.
a) What is the probability of the average typing speed being less than 49 wpm?
probability =
b) What is the probability of the average typing speed being between 51 and 53 wpm?
probability =
Note: Do NOT input probability responses as percentages; e.g., do NOT input 0.9194 as 91.94.
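A possible way to compute both probabilities is sketched below. It assumes the sample mean of 31 students is approximately Normal with mean 49 and standard error 18/√31 (the central limit theorem), and that the class is a random sample from the university population.

```python
from math import sqrt
from statistics import NormalDist

pop_mean, pop_sd, n = 49, 18, 31

# Sampling distribution of the mean (assumed approximately Normal).
sample_mean_dist = NormalDist(mu=pop_mean, sigma=pop_sd / sqrt(n))

p_below_49 = sample_mean_dist.cdf(49)                                # part a
p_between_51_53 = sample_mean_dist.cdf(53) - sample_mean_dist.cdf(51)  # part b

print(f"a) {p_below_49:.4f}")
print(f"b) {p_between_51_53:.4f}")
```

Both results are printed as proportions rather than percentages, matching the input note in the question.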
In: Statistics and Probability
True or False
1) If the obtained sample data are in the critical region, then we could conclude that the data provide support for the null hypothesis.
2) If a researcher is predicting that a treatment will increase scores, then the critical region for a directional test will be in the right-hand tail.
3) Whenever the statistical decision is to reject the null hypothesis, there is a risk of a Type I error.
4) In a Type II error, the experimenter concludes there is evidence for an effect when in fact an effect does not exist.
5) All other things being equal (mean, standard deviation, M, and α), you are more likely to make a Type I error with a sample of n=4 than with a sample of n=11.
6) If the null-hypothesis is rejected using a one-tailed test, then it certainly would be rejected if the researcher used a two-tailed test.
7) Increasing the alpha level (for example from .01 to .05) will decrease the size of the critical region.
8) The t-distribution is symmetrical and has a mean of 0
9) The t statistic is used for hypothesis tests in situations where the population standard deviation (or variance) is unknown.
In: Statistics and Probability
You are about to play a series of 9 chess games online against an opponent called EliteChampion. The first to win 5 games wins the series. (Ignore the possibility of a draw.) You know that EliteChampion is most likely your friend, Jenna. There’s a 20% chance EliteChampion is your Mom, and that’s the only other possibility. You beat your Mom 70% of the time, but you only beat Jenna 40% of the time. Given that you won the first game, what is the probability you will win the series? Give an exact answer and also give a numerical approximation, correct to four decimal places.
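The calculation can be organized in two steps: update the chance of each opponent given the first-game win, then find the chance of getting the remaining 4 wins before the opponent gets 5. The sketch below assumes Jenna has prior probability 0.8 (since Mom has 0.2 and there are no other possibilities), that games are independent given the opponent, and that "first to 5" is equivalent to winning at least 4 of the 8 remaining games if all were played. Exact arithmetic uses `fractions.Fraction`.

```python
from fractions import Fraction
from math import comb

prior = {"Jenna": Fraction(4, 5), "Mom": Fraction(1, 5)}
win_prob = {"Jenna": Fraction(2, 5), "Mom": Fraction(7, 10)}


def p_at_least(wins_needed, games, p):
    """P(at least wins_needed wins in `games` independent games)."""
    return sum(
        comb(games, k) * p**k * (1 - p) ** (games - k)
        for k in range(wins_needed, games + 1)
    )


# Step 1: Bayes' rule after observing a win in game 1.
p_first_win = sum(prior[o] * win_prob[o] for o in prior)
posterior = {o: prior[o] * win_prob[o] / p_first_win for o in prior}

# Step 2: need 4 more wins out of the 8 remaining games.
p_series = sum(posterior[o] * p_at_least(4, 8, win_prob[o]) for o in prior)

print("Exact:", p_series)
print(f"Approximate: {float(p_series):.4f}")
```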
In: Statistics and Probability
The following table gives the data from a local school district on children's ages (x) and the reading level (y)
Ages (in yrs.), x | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|
Reading level, y | 1.3 | 2.2 | 3.7 | 4.1 | 4.9 | 5.2 | 6.0 | 7.1 | 8.5 | 9.7 |
a) Find the correlation coefficient (r) between age (in years) and reading level. (Write your calculator steps.)
b) Find the coefficient of determination (r²)
c) Find the slope of the regression line
d) Find the intercept of the regression line
e) Find the estimated equation of the regression line
f) Predict the reading level for age 10.5 years
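Parts a) to f) all follow from three sums of squares. The sketch below computes them directly from the table instead of using calculator steps. The variable names (`sxx`, `syy`, `sxy`) are illustrative.

```python
from math import sqrt

ages = list(range(6, 16))   # 6, 7, ..., 15
reading = [1.3, 2.2, 3.7, 4.1, 4.9, 5.2, 6.0, 7.1, 8.5, 9.7]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(reading) / n

# Sums of squares and cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x in ages)
syy = sum((y - mean_y) ** 2 for y in reading)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, reading))

r = sxy / sqrt(sxx * syy)                        # a) correlation coefficient
r_squared = r**2                                 # b) coefficient of determination
slope = sxy / sxx                                # c) slope
intercept = mean_y - slope * mean_x              # d) intercept
predicted_at_10_5 = intercept + slope * 10.5     # f) prediction at age 10.5

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"e) y-hat = {intercept:.4f} + {slope:.4f} x")
print(f"f) predicted reading level at 10.5 years = {predicted_at_10_5:.2f}")
```

Since 10.5 happens to be the mean age in the table, the prediction in part f) coincides with the mean reading level.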
In: Statistics and Probability