Questions
Consider the two dependent discrete random variables X and Y . The variable X takes values...

Consider the two dependent discrete random variables X and Y. The
variable X takes values in {−1, 1} while Y takes values in {1, 2, 3}. We observe that

P(Y = 1 | X = −1) = 1/6
P(Y = 2 | X = −1) = 1/2
P(Y = 1 | X = 1) = 1/2
P(Y = 2 | X = 1) = 1/4
P(X = 1) = 2/5

(a) Find the marginal probability mass function (pmf) of Y.
(b) Sketch the cumulative distribution function (cdf) of Y.
(c) Compute the expected value E(Y) of Y.
(d) Compute the conditional expectation E[Y | X = 1].
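
One way to check parts (a), (c) and (d) is the law of total probability with exact fractions. The sketch below assumes the two missing conditional probabilities P(Y = 3 | X = x) are whatever makes each conditional distribution sum to 1; the variable names are illustrative only.

```python
from fractions import Fraction as F
from itertools import accumulate

# Marginal of X: P(X = 1) is given, P(X = -1) is its complement.
p_x = {1: F(2, 5)}
p_x[-1] = 1 - p_x[1]

# Conditional pmfs of Y given X; the Y = 3 entries are filled in by
# the assumption that each row sums to 1.
p_y_given_x = {
    -1: {1: F(1, 6), 2: F(1, 2)},
    1: {1: F(1, 2), 2: F(1, 4)},
}
for row in p_y_given_x.values():
    row[3] = 1 - row[1] - row[2]

# (a) marginal pmf of Y via the law of total probability
pmf_y = {y: sum(p_x[x] * p_y_given_x[x][y] for x in p_x) for y in (1, 2, 3)}

# (b) cdf values at the jump points y = 1, 2, 3 (a step function in between)
cdf_y = dict(zip((1, 2, 3), accumulate(pmf_y[y] for y in (1, 2, 3))))

# (c) E(Y) and (d) E[Y | X = 1]
e_y = sum(y * p for y, p in pmf_y.items())
e_y_given_x1 = sum(y * p for y, p in p_y_given_x[1].items())

print(pmf_y, cdf_y, e_y, e_y_given_x1)
```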

In: Statistics and Probability

Stats I, Item # Q-02 At the Joseph Biden Middle School in Down-the-Shore, Delaware, student reading...

Stats I, Item # Q-02

At the Joseph Biden Middle School in Down-the-Shore, Delaware, student reading comprehension was evaluated, both pre-test and post-test, bookending a pilot program intervention advocated by the district superintendent and board of education.

Pre-Test 66 45 70 32 60 84 25 76 50 88 75 64
Post-Test 80 54 78 28 76 76 50 72 75 90 75 56

The vice-principal optimistically expected that the intervention would improve reading comprehension scores, whereas the principal pessimistically anticipated that the instrument confused students and scores dropped. Identify the mean scores on the pre-test and post-test instruments, and the average change across all participants reported in this sample. Test, at the 90% confidence level, the claim that the intervention had an impact of some kind. State the hypotheses and the conclusions, both technically and contextually. Confirm the findings with the corresponding p-value and confidence interval.
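
Because each student is measured twice, a paired t-test on the post-minus-pre differences is a natural fit. The sketch below assumes the two score lists are in the same student order, and hard-codes the two-sided critical value t(0.05, df = 11) = 1.796 from a standard t table, since the Python standard library has no t distribution. The exact p-value would need a t CDF from a statistics package.

```python
import math
import statistics

pre = [66, 45, 70, 32, 60, 84, 25, 76, 50, 88, 75, 64]
post = [80, 54, 78, 28, 76, 76, 50, 72, 75, 90, 75, 56]

# Paired differences (post - pre), assuming matching student order.
diffs = [b - a for a, b in zip(pre, post)]

mean_pre = statistics.mean(pre)
mean_post = statistics.mean(post)
mean_diff = statistics.mean(diffs)

# Standard error of the mean difference and the paired t statistic.
se = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_diff / se

# Two-sided test of H0: mean difference = 0 at the 10% level.
T_CRIT = 1.796  # assumption: table value for df = 11, 5% in each tail
reject_h0 = abs(t_stat) > T_CRIT
ci_90 = (mean_diff - T_CRIT * se, mean_diff + T_CRIT * se)

print(mean_pre, mean_post, mean_diff, round(t_stat, 3), reject_h0, ci_90)
```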

In: Statistics and Probability

The number of faults over a period of time was collected for a sample of 100...

The number of faults over a period of time was collected for a sample of 100 data transmission lines. We want to test if the data come from a Poisson distribution.

number of faults 0 1 2 3 4 5 >5
number of lines 38 30 16 9 5 2 0

(a) Assuming the number of faults for a data-transmission line, Yi , i = 1, . . . , 100, follows a Poisson distribution with parameter λ, find the maximum likelihood estimate of λ for these data.

(b) Test the hypothesis that the number of faults for a data-transmission line follows a Poisson distribution using the chi-squared test. Do we reject the null hypothesis at the 5% significance level? Hint: You should combine the observed data into several groups such that expected frequencies are greater or equal to 5 for each group.
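
A minimal sketch of both parts, using only the standard library. It relies on two facts: the Poisson MLE of λ is the sample mean, and a chi-squared variable with 2 degrees of freedom has survival function exp(−x/2), which gives the p-value in closed form once the cells are pooled. The pooling rule (merge cells from the right until the last expected count reaches 5) is one reasonable reading of the hint, not the only one.

```python
import math

observed = [38, 30, 16, 9, 5, 2, 0]  # faults 0..5; the last cell is ">5"
n = sum(observed)

# (a) MLE of lambda = sample mean (the ">5" cell is empty, so it adds nothing).
lam = sum(k * c for k, c in enumerate(observed[:-1])) / n

# Expected counts under Poisson(lam) for 0..5, plus the ">5" tail.
probs = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(6)]
probs.append(1 - sum(probs))
exp_cells = [n * p for p in probs]
obs_cells = observed[:]

# (b) Pool cells from the right until the last expected count is >= 5.
while exp_cells[-1] < 5:
    tail_e = exp_cells.pop()
    exp_cells[-1] += tail_e
    tail_o = obs_cells.pop()
    obs_cells[-1] += tail_o

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_cells, exp_cells))
df = len(obs_cells) - 1 - 1  # one extra df lost for estimating lambda

# Closed-form p-value, valid only because df == 2 here.
p_value = math.exp(-chi2 / 2)

print(lam, obs_cells, round(chi2, 3), df, round(p_value, 4))
```

With these data the pooled cells are 0, 1, 2 and "3 or more"; the statistic can then be compared with the 5% critical value 5.991 for 2 degrees of freedom.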

In: Statistics and Probability

Please use R or Rstudio for this exercise and show everything, including the R output. Pay...

Please use R or RStudio for this exercise and show everything, including the R output. Pay attention to everything in bold, please.

" The quality of Pinot Noir wine is thought to be related to the properties of clarity, aroma, body, flavor, and oakiness. Data for 38 wines are given in stat5_prob1.

(a) Fit a multiple linear regression model relating wine quality to these regressors.

(b) Construct the ANOVA table.

(c) Test for the significance of the regression at the 0.05 significance level. What conclusions can you draw?

(d) Use the t tests to assess the individual contribution of each regressor to the model at the 0.05 significance level. Discuss your findings.

(e) What is the contribution of the set of clarity and aroma to the model, given that all of the other regressors are included? Perform this hypothesis test using the 0.05 significance level.

(f) Find a 95% confidence interval for the regression coefficient for flavor.

(g) Calculate R^2 and R^2 adj for this model. Compare these values to the R^2 and R^2 adj for the regression model relating wine quality to aroma and flavor. Discuss your results.

***Here is the data for the 38 wines***

# quality is y
# clarity is x1, aroma is x2, body is x3, flavor is x4, oakiness is x5.

y=c(9.8, 12.6, 11.9, 11.1, 13.3, 12.8, 12.8, 12, 13.6, 13.9, 14.4, 12.3, 16.1, 16.1, 15.5, 15.5, 13.8, 13.8, 11.3, 7.9, 15.1, 13.5, 10.8, 9.5, 12.7, 11.6, 11.7, 11.9, 10.8, 8.5, 10.7, 9.1, 12.1, 14.9, 13.5, 12.2, 10.3, 13.2)

x1=c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0.5, 0.8, 0.7, 1, 0.9, 1, 1, 1, 0.9, 0.9, 1, 0.7, 0.7, 1, 1, 1, 1, 1, 1, 1, 0.8, 1, 1, 0.8, 0.8, 0.8, 0.8)

x2=c(3.3, 4.4, 3.9, 3.9, 5.6, 4.6, 4.8, 5.3, 4.3, 4.3, 5.1, 3.3, 5.9, 7.7, 7.1, 5.5, 6.3, 5, 4.6, 3.4, 6.4, 5.5, 4.7, 4.1, 6, 4.3, 3.9, 5.1, 3.9, 4.5, 5.2, 4.2, 3.3, 6.8, 5, 3.5, 4.3, 5.2)

x3=c(2.8, 4.9, 5.3, 2.6, 5.1, 4.7, 4.8, 4.5, 4.3, 3.9, 4.3, 5.4, 5.7, 6.6, 4.4, 5.6, 5.4, 5.5, 4.1, 5, 5.4, 5.3, 4.1, 4, 5.4, 4.6, 4, 4.9, 4.4, 3.7, 4.3, 3.8, 3.5, 5, 5.7, 4.7, 5.5, 4.8)

x4=c(3.1, 3.5, 4.8, 3.1, 5.5, 5, 4.8, 4.3, 3.9, 4.7, 4.5, 4.3, 7, 6.7, 5.8, 5.6, 4.8, 5.5, 4.3, 3.4, 6.6, 5.3, 5, 4.1, 5.7, 4.7, 5.1, 5, 5, 2.9, 5, 3, 4.3, 6, 5.5, 4.2, 3.5, 5.7)

x5=c(4.1, 3.9, 4.7, 3.6, 5.1, 4.1, 3.3, 5.2, 2.9, 3.9, 3.6, 3.6, 4.1, 3.7, 4.1, 4.4, 4.6, 4.1, 3.1, 3.4, 4.8, 3.8, 3.7, 4, 4.7, 4.9, 5.1, 5.1, 4.4, 3.9, 6, 4.7, 4.5, 5.2, 4.8, 3.3, 5.8, 3.5). "

In: Statistics and Probability

Age at diagnosis for each of 20 patients under treatment for meningitis was given in a...

Age at diagnosis for each of 20 patients under treatment for meningitis was given in a research paper. Suppose the ages (in years) were as follows.

18 18 27 19 23 20 66 18 21 18 20 18
18 20 18 19 28 16 18 18

(a)

Calculate the values of the sample mean and the standard deviation. (Round your standard deviation to three decimal places.)

sample mean =
standard deviation =

(b)

Compute the upper quartile, the lower quartile, and the interquartile range.

upper quartile =
lower quartile =
interquartile range =

(c)

Are there any mild or extreme outliers present in this data set? (Enter your answers as comma-separated lists. If there is no answer, enter NONE.)

mild outliers =

extreme outliers =
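
Quartile conventions differ between textbooks and software, so the sketch below states its assumptions: quartiles are the medians of the lower and upper halves of the sorted data, mild outliers lie more than 1.5 IQR but at most 3 IQR beyond the nearest quartile, and extreme outliers lie more than 3 IQR beyond it. A different quartile rule may give slightly different fences.

```python
import statistics

ages = [18, 18, 27, 19, 23, 20, 66, 18, 21, 18, 20, 18,
        18, 20, 18, 19, 28, 16, 18, 18]

# (a) sample mean and sample standard deviation (n - 1 denominator)
mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)

# (b) quartiles as medians of the lower and upper halves (assumed convention)
s = sorted(ages)
half = len(s) // 2
q1 = statistics.median(s[:half])
q3 = statistics.median(s[-half:])
iqr = q3 - q1

# (c) outlier fences at 1.5 * IQR (mild) and 3 * IQR (extreme)
def distance_beyond_quartiles(x):
    """How far x lies outside [q1, q3]; 0 if inside."""
    return max(q1 - x, x - q3, 0)

mild = [x for x in s if 1.5 * iqr < distance_beyond_quartiles(x) <= 3 * iqr]
extreme = [x for x in s if distance_beyond_quartiles(x) > 3 * iqr]

print(mean_age, round(sd_age, 3), q1, q3, iqr, mild, extreme)
```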

In: Statistics and Probability

What is the real difference between between correlation and regression analysis? I see that correlation allows...

What is the real difference between correlation and regression analysis? I see that correlation allows us to examine whether two variables exhibit some kind of association with each other. Does correlation mean that one variable is dependent on another? Is this an example of correlation: tips are dependent on the actual restaurant bill? Can you give other examples of correlation and regression?

Thanks.

In: Statistics and Probability

1. A school administrator sends out grade school students to sell boxes of candy to raise...

1. A school administrator sends out grade school students to sell boxes of candy to raise funds. Below is a selection of four students and the mean number of boxes they sold over a weekend. The administrator wants to calculate the average number of boxes sold across students, but wants to weight this by the number of nearby houses (because students with more houses nearby will sell more boxes). For these data, what is the weighted mean?

Mean candy sold 5 4 18 10
Number of nearby houses 3 4 12 9
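
A weighted mean divides the sum of value × weight by the sum of the weights. A short sketch with exact fractions, treating the number of nearby houses as the weight as the question describes:

```python
from fractions import Fraction

boxes = [5, 4, 18, 10]   # mean boxes sold per student
houses = [3, 4, 12, 9]   # weights: nearby houses per student

# Weighted mean = sum(value * weight) / sum(weight)
weighted_mean = Fraction(sum(b * h for b, h in zip(boxes, houses)), sum(houses))

print(weighted_mean, float(weighted_mean))
```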

2.

Number of songs 10 15 20 25 30 35 40 45 50
Proportion 0.1 0.14 0.15 0.11 0.13 0.16 0.09 0.07 0.05

What is the average expected number of songs from this sample? (the mean of the probability distribution)

3.

Number of songs 10 15 20 25 30 35 40 45 50
Proportion 0.1 0.14 0.15 0.11 0.13 0.16 0.09 0.07 0.05

What is the standard deviation of the number of songs from this sample? (the SD of the probability distribution)
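
For questions 2 and 3, the mean of a discrete distribution is Σ x·p(x) and the variance is Σ x²·p(x) minus the squared mean. A small sketch, treating the listed proportions as the probabilities:

```python
import math

songs = [10, 15, 20, 25, 30, 35, 40, 45, 50]
props = [0.10, 0.14, 0.15, 0.11, 0.13, 0.16, 0.09, 0.07, 0.05]

# Mean: sum of x * p(x)
mean_songs = sum(x * p for x, p in zip(songs, props))

# Variance: E[X^2] - (E[X])^2, then the SD is its square root
second_moment = sum(x * x * p for x, p in zip(songs, props))
var_songs = second_moment - mean_songs**2
sd_songs = math.sqrt(var_songs)

print(round(mean_songs, 2), round(var_songs, 2), round(sd_songs, 2))
```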

4.

Intervals Frequency Cumulative Percent
10-20 1 3
21-30 3 13
31-40 7 35
41-50 10 68
51-60 8 94
61-70 2 100

What number is at the 55th percentile? (You may round to a whole number for the answer)

In: Statistics and Probability

A 2010 Pew Research poll asked 1,306 Americans "From what you've read and heard, is there...

A 2010 Pew Research poll asked 1,306 Americans "From what you've read and heard, is there solid evidence that the average temperature on earth has been getting warmer over the past few decades, or not?". The table below shows the distribution of responses by party and ideology, where the counts have been replaced with relative frequencies.

                          Earth is warming   Not warming   Don't know (or refuse)   Total
Conservative Republican   0.11               0.2           0.02                     0.33
Mod/Lib Republican        0.06               0.06          0.01                     0.13
Mod/Cons Democrat         0.25               0.07          0.02                     0.34
Liberal Democrat          0.18               0.01          0.01                     0.2
Total                     0.6                0.34          0.06                     1

a) Are believing that the earth is warming and being a liberal Democrat mutually exclusive? (not mutually exclusive / mutually exclusive)

b) What is the probability that a randomly chosen respondent believes the earth is warming or is a liberal Democrat? (please round to four decimal places)

c) What is the probability that a randomly chosen respondent believes the earth is warming given that he is a liberal Democrat? (please round to four decimal places)

d) What is the probability that a randomly chosen respondent believes the earth is warming given that he is a conservative Republican? (please round to four decimal places)

e) Does it appear that whether or not a respondent believes the earth is warming is independent of their party ideology? (belief in global warming and party ideology are dependent / belief in global warming and party ideology are independent)

f) What is the probability that a randomly chosen respondent is a moderate/liberal Republican given that he does not believe that the earth is warming? (please round to four decimal places)
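
The probabilities in parts (b) to (f) follow from the addition rule and the definition of conditional probability, P(A | B) = P(A and B) / P(B). The sketch below stores the joint relative frequencies from the table in a nested dictionary; the key names are illustrative.

```python
joint = {
    "Conservative Republican": {"warming": 0.11, "not warming": 0.20, "don't know": 0.02},
    "Mod/Lib Republican": {"warming": 0.06, "not warming": 0.06, "don't know": 0.01},
    "Mod/Cons Democrat": {"warming": 0.25, "not warming": 0.07, "don't know": 0.02},
    "Liberal Democrat": {"warming": 0.18, "not warming": 0.01, "don't know": 0.01},
}

# Marginals recovered from the joint cells.
p_group = {g: sum(row.values()) for g, row in joint.items()}
p_answer = {a: sum(row[a] for row in joint.values())
            for a in ("warming", "not warming", "don't know")}

# (a) not mutually exclusive if the joint cell is non-zero
mutually_exclusive = joint["Liberal Democrat"]["warming"] == 0

# (b) addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_warming_or_libdem = (p_answer["warming"] + p_group["Liberal Democrat"]
                       - joint["Liberal Democrat"]["warming"])

# (c), (d), (f) conditional probabilities
p_warming_given_libdem = joint["Liberal Democrat"]["warming"] / p_group["Liberal Democrat"]
p_warming_given_consrep = (joint["Conservative Republican"]["warming"]
                           / p_group["Conservative Republican"])
p_modrep_given_not = joint["Mod/Lib Republican"]["not warming"] / p_answer["not warming"]

print(round(p_warming_or_libdem, 4), round(p_warming_given_libdem, 4),
      round(p_warming_given_consrep, 4), round(p_modrep_given_not, 4))
```

For part (e), comparing the conditional probabilities from (c) and (d) with the overall P(warming) shows whether belief varies by group.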

In: Statistics and Probability

In Excel, What are some of the functions that can be used to manipulate text?

In Excel, What are some of the functions that can be used to manipulate text?

In: Statistics and Probability

Carbon monoxide concentrations (in µg/m3 ) are measured on 6 different days in city A and...

Carbon monoxide concentrations (in µg/m³) are measured on 6 different days in city A and on 4 different days in city B. The following measurements have been obtained in cities A and B, respectively: 3.9, 4.7, 7.1, 6.9, 4.3, 6.3 and 7.3, 6.9, 7.6, 9.1. We assume that these two samples are independent.

(a) Assume that both samples are from normally distributed populations. We also assume that the same type of measurement device was used in both cities so that the measurement error (variance) is the same for both samples. Construct the 95% confidence interval for δ = µA − µB, where µA and µB are mean concentrations of carbon monoxide in cities A and B, respectively.

(b) Do we reject H0 : δ = 0 in favor of H1 : δ ≠ 0 at the 5% significance level? Why or why not? Find the p-value of this test.

(c) Now assume that both samples are from normally distributed populations, as before, but two different measurement devices were used and their measurement errors are known. We assume that the variances are 2 and 1, respectively, for the two samples. Construct the 95% confidence interval for δ.

(d) Compare confidence intervals you obtained in (a) and (c). Which one is narrower? Briefly explain why.

(e) Show that (3.9, 7.1) is an approximate 97% confidence interval for the median concentration of carbon monoxide in city A.
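
A sketch of the interval calculations in (a), (c) and (e). For (a) it uses the pooled-variance t interval with the critical value t(0.025, df = 8) = 2.306 hard-coded from a standard t table, since the standard library has no t distribution. For (c) it uses a z interval with the known variances. For (e) it uses the fact that the sample minimum and maximum miss the median only when all six observations fall on the same side of it.

```python
import math
from statistics import NormalDist, mean, variance

city_a = [3.9, 4.7, 7.1, 6.9, 4.3, 6.3]
city_b = [7.3, 6.9, 7.6, 9.1]
n_a, n_b = len(city_a), len(city_b)
delta_hat = mean(city_a) - mean(city_b)

# (a) equal unknown variances: pooled variance and t interval, df = n_a + n_b - 2
sp2 = ((n_a - 1) * variance(city_a) + (n_b - 1) * variance(city_b)) / (n_a + n_b - 2)
se_pooled = math.sqrt(sp2 * (1 / n_a + 1 / n_b))
T_CRIT_8DF = 2.306  # assumption: table value, 2.5% in each tail, df = 8
ci_pooled = (delta_hat - T_CRIT_8DF * se_pooled, delta_hat + T_CRIT_8DF * se_pooled)

# (c) known variances 2 (city A) and 1 (city B): z interval
z = NormalDist().inv_cdf(0.975)
se_known = math.sqrt(2 / n_a + 1 / n_b)
ci_known = (delta_hat - z * se_known, delta_hat + z * se_known)

# (e) coverage of (min, max) for the median of a continuous distribution:
# 1 - P(all below median) - P(all above median) = 1 - 2 * (1/2)^n
median_interval = (min(city_a), max(city_a))
coverage = 1 - 2 * 0.5**n_a

print(ci_pooled, ci_known, median_interval, coverage)
```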

In: Statistics and Probability

(a)Evaluate the following set of data for possible and probable outliers. 5 8 2 9 5...

(a) Evaluate the following set of data for possible and probable outliers.
5 8 2 9 5 3 7 4 2 7 4 10 4 3 17

(b) A firm pays 5/12 of its labor force an hourly wage of $5, 1/3 of the labor force a wage
of $6 and ¼ a wage of $7. Determine the average wage paid by the firm.


(c) For the same amount of capital invested in each of 3 years, an investor earned a rate of
return of 1% during the first year, 4% during the second year and 16% during the third.
Find the simple arithmetic mean and the geometric mean. Which do you think is more
appropriate in this case? Explain.


(d) A plane travelled 200 miles at 600 mph and 100 miles at 500mph. What was the
average speed for the entire journey?


(e) A driver purchased $10 worth of gasoline at $0.90 per gallon and another $10 at
$1.10 per gallon. What is the average price per gallon?
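
Parts (b) to (e) each call for a different kind of average: a weighted mean for the wages, arithmetic and geometric means for the returns, and a total-over-total (harmonic-style) average for speed and price. The sketch below computes each one; the geometric mean is taken directly over the percentages 1, 4 and 16, which is how the question is phrased, rather than over growth factors.

```python
import statistics
from fractions import Fraction as F

# (b) weighted mean wage: shares of the labor force are the weights
avg_wage = F(5, 12) * 5 + F(1, 3) * 6 + F(1, 4) * 7

# (c) arithmetic vs geometric mean of the three rates (in percent)
rates = [1, 4, 16]
arith_mean = statistics.mean(rates)
geom_mean = statistics.geometric_mean(rates)

# (d) average speed = total distance / total time
total_time = F(200, 600) + F(100, 500)   # hours
avg_speed = F(200 + 100) / total_time    # mph

# (e) average price = total dollars / total gallons
total_gallons = F(10) / F(9, 10) + F(10) / F(11, 10)
avg_price = F(10 + 10) / total_gallons   # dollars per gallon

print(float(avg_wage), arith_mean, geom_mean, float(avg_speed), float(avg_price))
```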

In: Statistics and Probability

A plant manufacturers 12 voltage DC Power Supplies. Two engineers, Kelsier and Dockson, are given access...

A plant manufactures 12-volt DC power supplies. Two engineers, Kelsier and Dockson, are given access to a batch of 500 power supplies to test. Their tests determine the maximum power that the power supply can provide for a sustained period of time.

Kelsier, a newly hired quality control engineer, randomly selects 10 power supplies from this batch, and his tests result in a sample mean of 150.030 W and a sample standard deviation of 3.502 W.

Dockson, another quality control engineer, has worked for the company for a long time and knows from past experience that the average capacity of the power supplies that they make is 151 W, with a sample standard deviation of 4 W.

Management requires that the maximum power that the DC power supplies can provide should be at least 148 W during sustained use. Assume that the capacity of power supplies is normally distributed.

a) How confident is Kelsier that the average maximum power that the average power supply of this batch can provide is at least 148 Watts?

b) Based on the data provided by Dockson, estimate the number of power supplies from the given batch that do not satisfy the management requirement. List all your assumptions.
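
A rough sketch of the two calculations. For (a) it computes the one-sample t statistic for the 148 W threshold; turning that into a confidence level needs a t CDF with 9 degrees of freedom, which the standard library lacks, so the code only compares it with the table value t(0.05, df = 9) = 1.833. For (b) it assumes Dockson's 151 W and 4 W can be treated as the true normal mean and standard deviation for this batch.

```python
import math
from statistics import NormalDist

REQUIRED_W = 148
BATCH_SIZE = 500

# (a) Kelsier: t statistic for H0: mu = 148 against mu > 148
n, xbar, s = 10, 150.030, 3.502
t_stat = (xbar - REQUIRED_W) / (s / math.sqrt(n))
T_TABLE_9DF_05 = 1.833  # assumption: one-sided 5% table value, df = 9

# (b) Dockson: treat N(151, 4) as the population (an assumption) and
# estimate how many of the 500 units fall below the requirement.
population = NormalDist(mu=151, sigma=4)
frac_below = population.cdf(REQUIRED_W)
expected_failures = BATCH_SIZE * frac_below

print(round(t_stat, 3), T_TABLE_9DF_05, round(frac_below, 4), round(expected_failures, 1))
```

Since the t statistic is very close to the 5% table value, the confidence in (a) comes out near 95% under these assumptions.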

In: Statistics and Probability

A stock analyst wondered whether the mean rate of return of financial, energy, and utility stocks...

A stock analyst wondered whether the mean rate of return of financial, energy, and utility stocks differed over the past 5 years. He obtained a simple random sample of eight companies from each of the three sectors and obtained the 5-year rates of return shown in the accompanying table (in percent). Complete parts (a) through (d) below.

Are the mean rates of return different at the α = 0.05 level of significance?

Use technology to find the F-test statistic for this data set.

Find the P-value.

Financial Energy Utilities
10.73 12.89 11.88
15.05 13.96 5.76
17.01 6.43 13.46
5.07 11.19 9.90
19.59 18.93 3.95
8.16 20.73 3.44
10.38 9.60 7.11
6.52 17.40 15.70
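
The one-way ANOVA F statistic is the between-group mean square divided by the within-group mean square. The sketch below computes it from scratch with the standard library; the P-value would need an F-distribution CDF (for example from a statistics package), which is not included here.

```python
from statistics import mean

groups = {
    "Financial": [10.73, 15.05, 17.01, 5.07, 19.59, 8.16, 10.38, 6.52],
    "Energy": [12.89, 13.96, 6.43, 11.19, 18.93, 20.73, 9.60, 17.40],
    "Utilities": [11.88, 5.76, 13.46, 9.90, 3.95, 3.44, 7.11, 15.70],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)
k = len(groups)        # number of groups
n = len(all_values)    # total sample size

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

df_between, df_within = k - 1, n - k
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(df_between, df_within, round(f_stat, 3))
```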

In: Statistics and Probability

In a study on the physical activity of young adults, pediatric researchers measured overall physical activity...

In a study on the physical activity of young adults, pediatric researchers measured overall physical activity as the total number of registered movements (counts) over a period of time and then computed the number of counts per minute (cpm) for each subject (International Journal of Obesity, Jan. 2007). The study revealed that the overall physical activity of obese young adults has a mean of μ = 320 cpm and a standard deviation of σ = 100 cpm. (In comparison, the mean for young adults of normal weight is 540 cpm.) In a random sample of n = 100 obese young adults, consider the sample mean counts per minute, x̄.

a. Describe the sampling distribution of x̄.

b. What is the probability that the mean overall physical activity level of the sample is between 300 and 310 cpm?

c. What is the probability that the mean overall physical activity level of the sample is greater than 360 cpm?
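
With n = 100, the central limit theorem makes the sample mean approximately normal with mean 320 and standard error 100/√100 = 10. A short sketch of parts (b) and (c) under that approximation:

```python
import math
from statistics import NormalDist

mu, sigma, n = 320, 100, 100

# (a) sampling distribution of the sample mean: approx. N(mu, sigma / sqrt(n))
se = sigma / math.sqrt(n)
xbar_dist = NormalDist(mu=mu, sigma=se)

# (b) P(300 < xbar < 310)
p_between = xbar_dist.cdf(310) - xbar_dist.cdf(300)

# (c) P(xbar > 360), four standard errors above the mean
p_above = 1 - xbar_dist.cdf(360)

print(se, round(p_between, 4), p_above)
```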

In: Statistics and Probability

W74) Please answer in detail with EXCEL Function since I am learning in EXCEL function Tests...

W74) Please answer in detail with Excel functions, since I am learning Excel functions.

Tests with Chi-Square distribution

1. Variance

A random sample of the number of games played by individual NBA scoring leaders is shown below and reproduced in the Excel answer workbook. If a sports analyst argues that this sample variance is no different from 40 at α = .05, is she correct? Assume, of course, that the number of games played variable is normally distributed. Use the P-value method. (Round to 4 digits.) In your answer, as you'll see in the Excel worksheet, you're to identify the hypotheses, chi-square test statistic, and p-value, compute those values from the data set, and type in the correct conclusion.

(Hint: this is a two-tailed test. So compute the P/2 value corresponding to the chi-square test statistic and compare it with α/2. Which tail of the chi-square distribution to use? Compare your computed sample variance with the hypothesized population variance. If (n-1) s² / σ² > (n-3), use right tail; if less, use left tail.)

88 86 80 74 82
79 82 78 60 75

2.

A university administrator found that 60% of all students view courses as very useful, 20% as somewhat useful, and 20% as worthless. Of a random sample of 100 students taking business courses, 68 found the course in question very useful; 18, somewhat useful; and 14, worthless. Test the null hypothesis that the population distribution for business courses is the same as that for all courses at a 10% level of significance. In your answer, as usual, identify the hypotheses, compute the chi-square test statistic and p-value, and type in the correct conclusion.
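
The question asks for Excel, but the two test statistics can be cross-checked in a few lines of Python. The sketch computes the variance-test statistic (n − 1)s²/σ₀² and the goodness-of-fit statistic Σ(O − E)²/E. For the goodness-of-fit test there are 3 − 1 = 2 degrees of freedom, and a chi-squared variable with 2 degrees of freedom has survival function exp(−x/2), so that p-value is exact here. The variance-test p-value has 9 degrees of freedom and needs a chi-squared CDF (in Excel, a function such as CHISQ.DIST.RT), so it is left out.

```python
import math
from statistics import variance

# Variance test: H0: sigma^2 = 40
games = [88, 86, 80, 74, 82, 79, 82, 78, 60, 75]
sigma0_sq = 40
s_sq = variance(games)                        # sample variance, n - 1 denominator
chi2_var = (len(games) - 1) * s_sq / sigma0_sq
df_var = len(games) - 1

# Goodness of fit: H0: business courses follow the 60/20/20 split
observed = [68, 18, 14]
expected = [100 * p for p in (0.60, 0.20, 0.20)]
chi2_gof = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df_gof = len(observed) - 1

# Closed-form right-tail p-value, valid only because df_gof == 2
p_gof = math.exp(-chi2_gof / 2)
reject_gof = p_gof < 0.10

print(round(s_sq, 4), round(chi2_var, 4), df_var)
print(round(chi2_gof, 4), df_gof, round(p_gof, 4), reject_gof)
```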

In: Statistics and Probability