The following data show the brand, price ($), and the overall score for six stereo headphones that were tested by a certain magazine. The overall score is based on sound quality and effectiveness of ambient noise reduction. Scores range from 0 (lowest) to 100 (highest). The estimated regression equation for these data is
ŷ = 23.194 + 0.318x,
where x = price ($) and y = overall score.
Brand | Price ($) | Score |
---|---|---|
A | 180 | 76 |
B | 150 | 71 |
C | 95 | 61 |
D | 70 | 58 |
E | 70 | 38 |
F | 35 | 26 |
(a) Compute SST, SSR, and SSE. (Round your answers to three decimal places.)
SST =
SSR =
SSE =
(b) Compute the coefficient of determination r². (Round your answer to three decimal places.)
r² =
(c) What is the value of the sample correlation coefficient? (Round your answer to three decimal places.)
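One way to check parts (a) through (c) is to refit the least-squares line from the six data points and decompose the variation directly. The sketch below is only illustrative: the variable names are made up here, and it uses the unrounded slope and intercept, so results can differ in the last decimals from values obtained with the rounded equation ŷ = 23.194 + 0.318x.

```python
# Headphone data from the table above: price ($) and overall score.
prices = [180, 150, 95, 70, 70, 35]
scores = [76, 71, 61, 58, 38, 26]

n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(scores) / n

# Least-squares slope (b1) and intercept (b0).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, scores))
s_xx = sum((x - x_bar) ** 2 for x in prices)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sums of squares: total, error, and regression (SST = SSR + SSE).
fitted = [b0 + b1 * x for x in prices]
sst = sum((y - y_bar) ** 2 for y in scores)
sse = sum((y - f) ** 2 for y, f in zip(scores, fitted))
ssr = sst - sse

# Coefficient of determination and sample correlation.
# The correlation takes the sign of the slope.
r_squared = ssr / sst
r = (1 if b1 >= 0 else -1) * r_squared ** 0.5

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"r^2 = {r_squared:.3f}, r = {r:.3f}")
```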
In: Statistics and Probability
10m Wind Speed Data
Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 |
---|---|---|---|---|---|---|
3.226 | 2.908 | 2.355 | 3.341 | 1.507 | 6.355 | 6.846 |
3.144 | 3.026 | 2.235 | 3.402 | 1.418 | 6.465 | 6.581 |
3.197 | 2.843 | 2.168 | 3.242 | 1.433 | 6.613 | 6.894 |
3.007 | 2.995 | 2.125 | 3.039 | 1.6 | 6.363 | 6.401 |
3.05 | 3.245 | 2.474 | 2.983 | 1.837 | 6.149 | 7.19 |
3.02 | 3.349 | 2.421 | 3.165 | 2.054 | 7.19 | 6.795 |
3.001 | 3.085 | 2.369 | 2.915 | 1.669 | 5.826 | 7.16 |
2.957 | 3.003 | 2.344 | 2.414 | 2.136 | 6.628 | 6.583 |
3.012 | 3.01 | 2.509 | 1.619 | 2.849 | 5.999 | 6.14 |
3.249 | 3.141 | 2.796 | 1.681 | 2.876 | 6.501 | 6.472 |
3.304 | 3.338 | 2.928 | 1.673 | 3.536 | 6.388 | 7.8 |
3.239 | 3.165 | 2.867 | 2.1 | 3.517 | 5.757 | 7.14 |
3.063 | 2.969 | 3.002 | 2.312 | 3.476 | 6.314 | 6.789 |
2.833 | 3.049 | 3.12 | 2.56 | 4.368 | 7.04 | 7.02 |
2.876 | 3.058 | 3.179 | 2.352 | 4.778 | 6.51 | 5.736 |
2.855 | 3.032 | 3.005 | 2.133 | 4.708 | 6.734 | 7.02 |
3.252 | 3.015 | 2.813 | 1.882 | 4.599 | 6.788 | 5.754 |
3.409 | 2.823 | 2.97 | 2.015 | 5.207 | 6.347 | 6.617 |
3.198 | 2.921 | 3.113 | 2.046 | 5.66 | 7.05 | 5.253 |
2.797 | 2.866 | 3.329 | 1.78 | 5.837 | 6.327 | 6.159 |
For the above wind data set, find the 3 M’s (mean, median, and mode) as well as the range of the data for each day.
Based on the 3 M’s and the range of the data, which day has the most favourable wind speeds? Consider variability, overall wind speed, and stability of wind speed as your deciding factors. Elaborate on your reasoning.
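The per-day summaries can be sketched with Python's standard `statistics` module. The block below assumes the readings are listed row by row across Day 1 to Day 7 (20 readings per day); the names `rows`, `days`, and `summary` are illustrative only. Because the readings are recorded to three decimals, most values occur once, so `multimode` often returns every value, meaning the day has no single mode.

```python
from statistics import mean, median, multimode

# Each tuple is one row of readings: (Day 1, Day 2, ..., Day 7).
rows = [
    (3.226, 2.908, 2.355, 3.341, 1.507, 6.355, 6.846),
    (3.144, 3.026, 2.235, 3.402, 1.418, 6.465, 6.581),
    (3.197, 2.843, 2.168, 3.242, 1.433, 6.613, 6.894),
    (3.007, 2.995, 2.125, 3.039, 1.6, 6.363, 6.401),
    (3.05, 3.245, 2.474, 2.983, 1.837, 6.149, 7.19),
    (3.02, 3.349, 2.421, 3.165, 2.054, 7.19, 6.795),
    (3.001, 3.085, 2.369, 2.915, 1.669, 5.826, 7.16),
    (2.957, 3.003, 2.344, 2.414, 2.136, 6.628, 6.583),
    (3.012, 3.01, 2.509, 1.619, 2.849, 5.999, 6.14),
    (3.249, 3.141, 2.796, 1.681, 2.876, 6.501, 6.472),
    (3.304, 3.338, 2.928, 1.673, 3.536, 6.388, 7.8),
    (3.239, 3.165, 2.867, 2.1, 3.517, 5.757, 7.14),
    (3.063, 2.969, 3.002, 2.312, 3.476, 6.314, 6.789),
    (2.833, 3.049, 3.12, 2.56, 4.368, 7.04, 7.02),
    (2.876, 3.058, 3.179, 2.352, 4.778, 6.51, 5.736),
    (2.855, 3.032, 3.005, 2.133, 4.708, 6.734, 7.02),
    (3.252, 3.015, 2.813, 1.882, 4.599, 6.788, 5.754),
    (3.409, 2.823, 2.97, 2.015, 5.207, 6.347, 6.617),
    (3.198, 2.921, 3.113, 2.046, 5.66, 7.05, 5.253),
    (2.797, 2.866, 3.329, 1.78, 5.837, 6.327, 6.159),
]

# Transpose the rows so that each day maps to its own list of readings.
days = {f"Day {i + 1}": list(col) for i, col in enumerate(zip(*rows))}

summary = {}
for day, speeds in days.items():
    summary[day] = {
        "mean": mean(speeds),
        "median": median(speeds),
        # multimode lists every most-frequent value; if all readings are
        # distinct it returns all of them (no single mode).
        "mode": multimode(speeds),
        "range": max(speeds) - min(speeds),
    }

for day, stats in summary.items():
    modes = stats["mode"] if len(stats["mode"]) < len(days[day]) else "none"
    print(f"{day}: mean={stats['mean']:.3f}, median={stats['median']:.3f}, "
          f"mode={modes}, range={stats['range']:.3f}")
```

A small range together with a mean close to the median points to stable winds; whether a higher or lower overall speed counts as "better" depends on the application, which the question leaves open.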
In: Statistics and Probability
A simple random sample of 800 elements generates a sample proportion of 0.70.
Provide a 90% confidence interval for the population proportion. (Round to two decimal places.) [Answer, Answer]
Provide a 99% confidence interval for the population proportion. (Round to two decimal places.) [Answer, Answer]
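Both intervals can be sketched with the usual large-sample (Wald) interval, p̂ ± z·√(p̂(1 − p̂)/n). The helper name `proportion_ci` is hypothetical, and the normal approximation is an assumption that seems reasonable here because n = 800 is large.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Wald confidence interval for a population proportion."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


ci_90 = proportion_ci(0.70, 800, 0.90)
ci_99 = proportion_ci(0.70, 800, 0.99)

print(f"90% CI: [{ci_90[0]:.2f}, {ci_90[1]:.2f}]")
print(f"99% CI: [{ci_99[0]:.2f}, {ci_99[1]:.2f}]")
```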
In: Statistics and Probability
2. A researcher believes that smoking affects a person’s sense of smell. To test this, he takes a sample of 25 smokers and gives them a test of olfactory sensitivity. In this test, higher scores indicate greater sensitivity. For his sample, the mean score on the test is 14.8 with a standard deviation of 2.4. The researcher knows the mean score in the population is 16.2, but the population standard deviation is unknown.
(a) What are the null and alternative hypotheses in this study (stated mathematically)?
(b) Should the researcher use a one-tailed or a two-tailed test?
(c) Compute the appropriate test statistic for testing the hypothesis.
(d) Using α = 0.01, do you conclude that smoking affects a person’s sense of smell? Be sure to include a discussion of the critical value in your answer.
(e) What type of error might the researcher be making in part (d)?
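Because the population standard deviation is unknown, a one-sample t-test with n − 1 = 24 degrees of freedom fits parts (c) and (d). The sketch below assumes a two-tailed test (the researcher asks whether smoking "affects" smell, without stating a direction) and takes the critical value 2.797 from a standard t-table rather than computing it, since Python's standard library has no t-distribution.

```python
from math import sqrt

n = 25
sample_mean = 14.8
sample_sd = 2.4
mu_0 = 16.2  # population mean under H0: mu = 16.2 (H1: mu != 16.2)

# One-sample t statistic with n - 1 = 24 degrees of freedom.
standard_error = sample_sd / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error

# Two-tailed critical value for alpha = 0.01 and df = 24,
# taken from a standard t-table (assumption: not computed here).
T_CRITICAL = 2.797
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, critical value = ±{T_CRITICAL}, reject H0: {reject_h0}")
```

If the null hypothesis is rejected, the only error that could have been made is a Type I error (rejecting a true null hypothesis).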
In: Statistics and Probability
What are the forecast and MSE using regression? Use 2019 as the holdout sample and "car sales" as the independent variable.
Shipments | Car Sales | Fasteners |
---|---|---|
Jan-17 | 17680000 | 335798 |
Feb-17 | 17650000 | 297853 |
Mar-17 | 17130000 | 318399 |
Apr-17 | 17230000 | 311730 |
May-17 | 17200000 | 363876 |
Jun-17 | 17200000 | 296832 |
Jul-17 | 17180000 | 297513 |
Aug-17 | 17020000 | 321144 |
Sep-17 | 18380000 | 317677 |
Oct-17 | 18200000 | 325487 |
Nov-17 | 17860000 | 272937 |
Dec-17 | 17700000 | 276282 |
Jan-18 | 17550000 | 335439 |
Feb-18 | 17560000 | 310514 |
Mar-18 | 17690000 | 407754 |
Apr-18 | 17770000 | 356169 |
May-18 | 17780000 | 345322 |
Jun-18 | 17700000 | 331997 |
Jul-18 | 17380000 | 343059 |
Aug-18 | 17360000 | 350277 |
Sep-18 | 17840000 | 265205 |
Oct-18 | 18000000 | 389332 |
Nov-18 | 17880000 | 310474 |
Dec-18 | 17890000 | 308429 |
Jan-19 | 17240000 | 385807 |
Feb-19 | 17030000 | 332529 |
Mar-19 | 17770000 | 407606 |
Apr-19 | 17050000 | 361946 |
May-19 | 17930000 | 453432 |
Jun-19 | 17710000 | 412892 |
Jul-19 | 17440000 | 447359 |
Aug-19 | 17510000 | 363769 |
Sep-19 | 17720000 | 361232 |
Oct-19 | 17050000 | 451421 |
Nov-19 | 17450000 | 363724 |
Dec-19 | 17160000 | 331619 |
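A minimal sketch of the holdout procedure follows. It assumes that fastener shipments are the dependent variable, that the line is fitted on 2017 and 2018 only, and that the MSE is computed on the twelve 2019 months; `fit_line` and the other names are illustrative.

```python
# (car sales, fasteners) pairs copied from the table above.
train = [  # Jan-17 to Dec-18
    (17680000, 335798), (17650000, 297853), (17130000, 318399),
    (17230000, 311730), (17200000, 363876), (17200000, 296832),
    (17180000, 297513), (17020000, 321144), (18380000, 317677),
    (18200000, 325487), (17860000, 272937), (17700000, 276282),
    (17550000, 335439), (17560000, 310514), (17690000, 407754),
    (17770000, 356169), (17780000, 345322), (17700000, 331997),
    (17380000, 343059), (17360000, 350277), (17840000, 265205),
    (18000000, 389332), (17880000, 310474), (17890000, 308429),
]
holdout = [  # Jan-19 to Dec-19
    (17240000, 385807), (17030000, 332529), (17770000, 407606),
    (17050000, 361946), (17930000, 453432), (17710000, 412892),
    (17440000, 447359), (17510000, 363769), (17720000, 361232),
    (17050000, 451421), (17450000, 363724), (17160000, 331619),
]


def fit_line(pairs):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(train)

# Forecast each 2019 month from its car sales, then score on the holdout.
forecasts = [intercept + slope * x for x, _ in holdout]
mse = sum((y - f) ** 2 for (_, y), f in zip(holdout, forecasts)) / len(holdout)

print(f"fasteners ≈ {intercept:.1f} + {slope:.6f} × car sales")
for (_, actual), forecast in zip(holdout, forecasts):
    print(f"actual = {actual}, forecast = {forecast:.0f}")
print(f"holdout MSE = {mse:.1f}")
```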
In: Statistics and Probability
An insurance company was analysing the performance of its claims handling processes, and process cycle time was one of its concerns. It collected sample data on process cycle time across a number of different claims handling processes over the past six months. However, the data followed a multimodal (non-normal) distribution rather than a normal distribution. Explain the possible reason(s) for this.
The company then focused on the CTP insurance claims handling processes and collected sample data on the process cycle time of about 500 CTP insurance claims from the past six months. These data followed a normal distribution characterised by mean = 20.5 days and standard deviation = 5.25 days. Answer the following questions based on the above information.
(a) Given the sample data, how often could the cycle time of a CTP insurance claim process fall within the range [15.25, 20.75] days? Why?
(b) If the expected mean cycle time of CTP insurance claims is 20.2 days, did the company meet this target in the past six months? Conduct an appropriate statistical test to draw a conclusion.
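Parts (a) and (b) can be sketched numerically as below. The code assumes n = 500 exactly (the question says "about 500") and a two-sided z-test of H0: μ = 20.2, which is one reasonable reading of "appropriate statistical test"; with n this large a t-test would give nearly the same result.

```python
from math import sqrt
from statistics import NormalDist

# (a) Probability that a cycle time falls in [15.25, 20.75] days.
cycle_time = NormalDist(mu=20.5, sigma=5.25)
prob_in_range = cycle_time.cdf(20.75) - cycle_time.cdf(15.25)

# (b) Two-sided z-test of H0: mu = 20.2 (assumed sample size n = 500).
n = 500
target = 20.2
z = (20.5 - target) / (5.25 / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"P(15.25 <= X <= 20.75) = {prob_in_range:.4f}")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

Note that 15.25 is exactly one standard deviation below the mean, so the interval runs from z = −1 to just above z = 0.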
In: Statistics and Probability
In this chapter, you learn four steps that should be used to evaluate a regression model. What is the first step and why is it important? Explain the other three steps, indicating what you learn from each of those three steps.
In: Statistics and Probability
2. Numerical “proof” that for independent X and Y, Var(X + Y) = Var(X − Y) = σ_X² + σ_Y²:
2.1 As we did in Lab 4, generate a sample of (x, y)-values by independently generating x-values and y-values. (You may choose a sample size of 50000.) State the two distributions you will use for generating the x-values and y-values, and the corresponding population variances.
2.2 Compute the sample variances of the (X + Y)-values and of the (X − Y)-values. What value are these two sample variances supposed to estimate?
2.3 Use the formula that explains the difference between the two sample variances, and recompute them using the sample variances of the x- and y-values and their covariance.
Please include R code.
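The question asks for R, so the following is only a Python analogue of the same experiment, not the requested submission. The two distributions are arbitrary choices made for this sketch: X ~ Normal(0, 2²) with variance 4 and Y ~ Exponential(mean 3) with variance 9, so both sample variances should estimate 4 + 9 = 13.

```python
import random
from statistics import mean, variance

random.seed(0)  # fixed seed so the run is repeatable
n = 50_000

# Independent draws: Var(X) = 4 and Var(Y) = 9 (illustrative choices).
xs = [random.gauss(0, 2) for _ in range(n)]
ys = [random.expovariate(1 / 3) for _ in range(n)]

# 2.2: sample variances of the sums and of the differences.
var_sum = variance([x + y for x, y in zip(xs, ys)])
var_diff = variance([x - y for x, y in zip(xs, ys)])

# 2.3: Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y) holds exactly for
# sample quantities, so the gap between the two is 4 * Cov(X, Y).
var_x, var_y = variance(xs), variance(ys)
x_bar, y_bar = mean(xs), mean(ys)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
recomputed_sum = var_x + var_y + 2 * cov_xy
recomputed_diff = var_x + var_y - 2 * cov_xy

print(f"Var(X+Y) = {var_sum:.4f}, recomputed = {recomputed_sum:.4f}")
print(f"Var(X-Y) = {var_diff:.4f}, recomputed = {recomputed_diff:.4f}")
print(f"sample covariance = {cov_xy:.4f} (close to 0 under independence)")
```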
In: Statistics and Probability
A college admissions director wishes to estimate the mean age of all students currently enrolled. In a random sample of 22 students, the mean age is found to be 21.4 years. From past studies, the ages of enrolled students are normally distributed with a standard deviation of 10.2 years. Construct a 90% confidence interval for the mean age of all students currently enrolled.
b. The standard deviation of the sample mean:
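Since the population standard deviation is given (σ = 10.2), a z-interval applies. The sketch below computes the standard deviation of the sample mean, σ/√n, and the 90% interval; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 22
sample_mean = 21.4
sigma = 10.2  # known population standard deviation

# Standard deviation of the sample mean (standard error).
standard_error = sigma / sqrt(n)

# 90% z-interval: mean ± z_{0.95} * standard error.
z = NormalDist().inv_cdf(0.95)
margin = z * standard_error
ci = (sample_mean - margin, sample_mean + margin)

print(f"standard error = {standard_error:.3f}")
print(f"90% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```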
In: Statistics and Probability
1. For a certain population, systolic BP is normally distributed with μ=122 and σ=14. Hypertension is defined as systolic BP over 150 mmHg.
a. What is the z-score for a systolic BP of 150, given this information? z =
b. What percentage of the population is hypertensive? (You may use the empirical rule.) =
2. Continuing from the previous question where BP follows a normal distribution with μ=122 and σ=14, suppose we randomly sample n=194 individuals from this population. What is the probability that the sample mean of systolic BP obtained from this sample will be between 135 and 146 mmHg? It is recommended to first compute the standard error, then compute the z-scores for 135 and 146, then compute the difference between the cumulative probabilities associated with these two z-scores.
a. 0.823
b. 0.133
c. 0.841
d. 0.957
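The recommended steps can be followed literally with the numbers as stated. This is a sketch, not an answer key: with n = 194 the standard error is about 1, so both 135 and 146 lie more than twelve standard errors above the mean and the resulting probability is effectively zero, which does not correspond to any of the listed options.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 122, 14

# Question 1: z-score of 150 and the share of the population above it.
z_150 = (150 - mu) / sigma
exact_tail = 1 - NormalDist().cdf(z_150)  # empirical rule gives about 2.5%

# Question 2: sampling distribution of the mean for n = 194.
n = 194
standard_error = sigma / sqrt(n)
z_low = (135 - mu) / standard_error
z_high = (146 - mu) / standard_error
prob_between = NormalDist().cdf(z_high) - NormalDist().cdf(z_low)

print(f"z(150) = {z_150:.1f}, P(BP > 150) = {exact_tail:.4f}")
print(f"SE = {standard_error:.3f}, z(135) = {z_low:.2f}, z(146) = {z_high:.2f}")
print(f"P(135 < sample mean < 146) = {prob_between:.6f}")
```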
In: Statistics and Probability
1.1 Let X ∼ Poisson(4). The R command dpois(0:9, 4) gives the probabilities P(X = k) for k = 0, 1, ..., 9.
1.1.1 Use the plot() function to plot these probabilities and to connect the points with lines.
1.1.2 Use the barplot() function to make a barplot of these probabilities.
1.2 The R command rpois(1000, 4) generates a sample of 1000 values from the Poisson(4) distribution. Use the barplot() function to plot the empirical probabilities for X = k resulting from this sample.
Please include the R code needed to generate the plots.
In: Statistics and Probability
Use a two-tailed independent sample t-test to answer this question. A new drug is being tested to see if it reduces the number of backaches. Do these samples of number of monthly backaches from eight volunteers who are taking the drug and eight who are not taking the drug show a significant reduction in the mean number of monthly backaches (α= 0.05)? Identify the hypothesis statements, test statistic, p-value, and your decision. Also, construct and report the 95% confidence interval for each group. Analyze the confidence intervals - Do you come to the same conclusion as the t-test results? Why or why not?
New Drug Group: 4 4 3 5 4 5 4 6
Control Group: 5 6 6 6 8 6 6 7
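The test statistic and the per-group intervals can be sketched as follows. The code assumes equal variances (a pooled t-test with 14 degrees of freedom) and takes the critical values 2.145 (df = 14) and 2.365 (df = 7) from a standard t-table; the exact p-value needs a t-distribution, which Python's standard library does not provide, so it is left out.

```python
from math import sqrt
from statistics import mean, variance

new_drug = [4, 4, 3, 5, 4, 5, 4, 6]
control = [5, 6, 6, 6, 8, 6, 6, 7]

n1, n2 = len(new_drug), len(control)
m1, m2 = mean(new_drug), mean(control)
v1, v2 = variance(new_drug), variance(control)

# Pooled-variance t statistic for H0: mu1 = mu2 vs H1: mu1 != mu2.
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_stat = (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

# Two-tailed critical values at alpha = 0.05 from a t-table (assumption).
T_CRIT_DF14 = 2.145  # for the two-sample test, df = n1 + n2 - 2
T_CRIT_DF7 = 2.365   # for each group's own interval, df = n - 1
reject_h0 = abs(t_stat) > T_CRIT_DF14


def mean_ci(sample, t_crit):
    """95% t-interval for one group's mean."""
    half_width = t_crit * sqrt(variance(sample) / len(sample))
    return mean(sample) - half_width, mean(sample) + half_width


ci_new = mean_ci(new_drug, T_CRIT_DF7)
ci_control = mean_ci(control, T_CRIT_DF7)

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"new drug 95% CI: ({ci_new[0]:.3f}, {ci_new[1]:.3f})")
print(f"control 95% CI: ({ci_control[0]:.3f}, {ci_control[1]:.3f})")
```

Non-overlapping intervals point to the same conclusion as a significant t-test, though overlapping intervals would not by themselves rule out a significant difference.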
In: Statistics and Probability
Many female undergraduates at four-year colleges switch from STEM majors into disciplines that are not science-based, thereby contributing to the underrepresentation of women in STEM fields. When female undergrads switch majors, are their reasons different from those of their male counterparts? This question was investigated in Science Education. A sample of 335 junior/senior undergraduates (172 females and 163 males) at two large research universities were identified as “switchers”; that is, they left a declared STEM major for a non-STEM major. Each student listed one or more factors that contributed to the switching decision.
(a) Of the 172 females in the sample, 74 listed lack or loss of interest in STEM (i.e., “turned off” by science) as a major factor, compared to 72 of the 163 males. Conduct a test (at α = .10) to determine whether the proportion of female switchers who give “lack of interest in STEM” as a major reason for switching differs from the corresponding proportion of males.
(b) Thirty-three of the 172 females in the sample indicated that they were discouraged or lost confidence because of low grades in STEM during their early years, compared to 44 of 163 males. Construct a 90% confidence interval for the difference between the proportions of female and male switchers who lost confidence due to low grades in STEM. Interpret the result.
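Both parts use large-sample normal methods for two proportions. In the sketch below, part (a) uses the pooled standard error (appropriate under H0: p₁ = p₂) and part (b) uses the unpooled standard error for the interval; the names are illustrative, and females are taken as group 1.

```python
from math import sqrt
from statistics import NormalDist

n_f, n_m = 172, 163
std_normal = NormalDist()

# (a) Two-sided z-test for "lack of interest": 74 of 172 vs 72 of 163.
p_f, p_m = 74 / n_f, 72 / n_m
p_pooled = (74 + 72) / (n_f + n_m)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_f + 1 / n_m))
z = (p_f - p_m) / se_pooled
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_h0 = p_value < 0.10

# (b) 90% CI for the difference in "low grades": 33 of 172 vs 44 of 163.
q_f, q_m = 33 / n_f, 44 / n_m
se_unpooled = sqrt(q_f * (1 - q_f) / n_f + q_m * (1 - q_m) / n_m)
margin = std_normal.inv_cdf(0.95) * se_unpooled
ci = (q_f - q_m - margin, q_f - q_m + margin)

print(f"(a) z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
print(f"(b) 90% CI for p_female - p_male: ({ci[0]:.4f}, {ci[1]:.4f})")
```

An interval that lies entirely below zero would suggest that a smaller proportion of female than male switchers cited low grades.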
In: Statistics and Probability
1. Independent random samples of n1 = 200 and n2 = 200 observations were randomly selected from binomial populations 1 and 2, respectively. Sample 1 had 116 successes, and sample 2 had 122 successes.
a) Calculate the standard error of the difference in the two sample proportions, (p̂1 − p̂2). Make sure to use the pooled estimate for the common value of p. (Round your answer to four decimal places.)
b) Critical value approach: Find the rejection region when α = 0.01. (Round your answer to two decimal places. If the test is one-tailed, enter NONE for the unused region.)
z <
z >
2. The meat department of a local supermarket chain packages ground beef in trays of two sizes. The smaller tray is intended to hold 1 kilogram (kg) of meat. A random sample of 30 packages in the smaller meat tray produced weight measurements with an average of 1.01 kg and a standard deviation of 20 grams.
p-value =
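For question 1, the pooled standard error in part (a) can be sketched as below. The hypotheses are not stated in the question as given, so the rejection region in part (b) is computed under the assumption of a two-tailed test; that assumption is flagged in the code.

```python
from math import sqrt
from statistics import NormalDist

n1, n2 = 200, 200
x1, x2 = 116, 122

# (a) Pooled estimate of the common p and the standard error of p̂1 − p̂2.
p_pooled = (x1 + x2) / (n1 + n2)
se_diff = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# (b) ASSUMPTION: two-tailed test, so alpha = 0.01 is split across both tails.
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"pooled p = {p_pooled:.3f}, SE = {se_diff:.4f}")
print(f"two-tailed rejection region: z < {-z_crit:.2f} or z > {z_crit:.2f}")
```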
In: Statistics and Probability
Out of 600 people sampled, 300 preferred Candidate A. Based on this, estimate what proportion of the voting population (p) prefers Candidate A. Use a 99% confidence level, and give your answers as decimals, to three places.
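A short sketch of the 99% interval, using the large-sample formula p̂ ± z·√(p̂(1 − p̂)/n) with p̂ = 300/600; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 600
p_hat = 300 / n

# 99% confidence: two-sided critical value z_{0.995}.
z = NormalDist().inv_cdf(0.995)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"{lower:.3f} < p < {upper:.3f}")
```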
In: Statistics and Probability