Questions

Research the role of ETL tools in providing clean and purposely transformed data as part of data mining processes. Then explain the role of ETL in data mining and statistical analysis.

In: Math


Use Minitab to answer the questions. Make sure to copy all output from Minitab:

1. The following table shows the previous 11 months of stock market returns.

Date         Monthly SP500 Return   Monthly DJIA Return
12/7/2007     -0.8628                -0.7994
1/8/2008      -6.1163                -4.6323
2/8/2008      -3.4761                -3.0352
3/8/2008      -0.5960                -0.0285
4/8/2008       4.7547                 4.5441
5/8/2008       1.0674                -1.4182
6/8/2008      -8.5962               -10.1937
7/8/2008      -0.9859                 0.2468
8/8/2008       1.2191                 1.4548
9/8/2008      -9.2054                -6.0024
10/8/2008    -16.8269                -4.8410

  1. Let the population mean of SP500 be µ₁ and that of DJIA be µ₂, with neither population variance known. Test the following hypotheses:

Ho : µ₁ = µ₂

Ha : µ₁ ≠ µ₂

  1. Let the population variance of SP500 be σ₁² and that of DJIA be σ₂², with neither of them known. Test the following hypotheses:

Ho : σ₁² = σ₂²

Ha : σ₁² ≠ σ₂²

6) Perform the following hypothesis test:

Ho : σ₁² ≤ σ₂²

Ha : σ₁² > σ₂²
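The question asks for Minitab output, but the test statistics can be cross-checked by hand. The sketch below is one possible check, not the Minitab procedure itself: it assumes a Welch (unequal-variance) two-sample t statistic for the means and a plain variance-ratio F statistic for the variances. It stops at the statistics and degrees of freedom, since the Python standard library has no t or F distribution; the p-values would still come from Minitab or a table.

```python
import math
import statistics

# Monthly returns copied from the table in the question.
sp500 = [-0.8628, -6.1163, -3.4761, -0.5960, 4.7547, 1.0674,
         -8.5962, -0.9859, 1.2191, -9.2054, -16.8269]
djia = [-0.7994, -4.6323, -3.0352, -0.0285, 4.5441, -1.4182,
        -10.1937, 0.2468, 1.4548, -6.0024, -4.8410]

n1, n2 = len(sp500), len(djia)
mean1, mean2 = statistics.mean(sp500), statistics.mean(djia)
var1, var2 = statistics.variance(sp500), statistics.variance(djia)  # sample variances

# Welch two-sample t statistic (assumption: variances not pooled).
se = math.sqrt(var1 / n1 + var2 / n2)
t_stat = (mean1 - mean2) / se

# Welch-Satterthwaite approximate degrees of freedom for the t statistic.
df = (var1 / n1 + var2 / n2) ** 2 / (
    (var1 / n1) ** 2 / (n1 - 1) + (var2 / n2) ** 2 / (n2 - 1)
)

# F statistic for the variance tests, with (n1 - 1, n2 - 1) degrees of freedom.
f_stat = var1 / var2

print(f"t = {t_stat:.3f}, df = {df:.1f}, F = {f_stat:.3f}")
```

Because the two series are observed in the same months, a paired analysis is another reasonable reading; the hypotheses as written treat them as two separate samples.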

In: Math


A recent study claimed that at least 15% of junior high students are overweight. In a sample of 160 students, 18 were found to be overweight. Test the claim, using α = 0.05.

(a) State the null and alternative hypotheses for the test. [3 marks]

(b) Calculate the value of the test statistic for this test. [2 marks]

(c) Calculate the p-value for this test. [2 marks]

(d) State the conclusion of this test. Give a reason for your answer.
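Parts (a) through (d) can be sketched as a one-sample z test for a proportion. This assumes the claim "at least 15%" is placed in the null hypothesis (H0: p ≥ 0.15, Ha: p < 0.15) and that the normal approximation is acceptable; both are assumptions of this sketch rather than something the question states.

```python
import math
from statistics import NormalDist

x, n = 18, 160          # overweight students, sample size
p0, alpha = 0.15, 0.05  # claimed proportion, significance level

p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
z = (p_hat - p0) / se               # test statistic
p_value = NormalDist().cdf(z)       # left-tailed p-value for Ha: p < p0
reject_h0 = p_value < alpha

print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```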

In: Math


Bags of whole coffee beans are filled automatically on a production line. A machine fills each bag so that the weight of coffee beans inside is normally distributed with a mean of 290 grams. The label on the bag, however, states that the weight of coffee beans inside is 283 grams.

a. What is the standard deviation of bags of coffee beans, if 13% of the bags have a weight below what is stated on the label?

b. New management wants to be more accurate to their customers and reduce the number of bags that are sent out under the weight of 283 grams listed on the label. They set a goal of sending no more than 1% of bags of coffee that are under the weight of 283 grams. To do this, the management ordered a new filling machine which decreased the standard deviation to 2.3151 grams. The weight of the bags of coffee beans will still be normally distributed. To what mean weight should the new equipment be set, with this new standard deviation and to meet their goal? The machine may only take a one decimal approximation.
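Both parts come down to the relation x = µ + zσ with a standard normal quantile z. A minimal sketch follows; rounding the machine setting up to the next tenth of a gram (so that the 1% goal is still met) is an assumption of this sketch.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()
label_weight = 283.0

# Part a: P(X < 283) = 0.13 with mean 290, so 283 = 290 + z_13 * sigma.
z_13 = std_normal.inv_cdf(0.13)
sigma = (label_weight - 290.0) / z_13

# Part b: P(X < 283) <= 0.01 with sigma = 2.3151, so mean >= 283 - z_01 * sigma.
z_01 = std_normal.inv_cdf(0.01)
min_mean = label_weight - z_01 * 2.3151
machine_setting = math.ceil(min_mean * 10) / 10  # round up to one decimal

print(f"sigma = {sigma:.4f} g, minimum mean = {min_mean:.4f} g, setting = {machine_setting} g")
```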

In: Math


Consider the following hypotheses:

H0: p ≥ 0.48

HA: p < 0.48

Compute the p-value based on the following sample information. (You may find it useful to reference the appropriate table: z table or t table) (Round "z" value to 2 decimal places. Round intermediate calculations to at least 4 decimal places and final answers to 4 decimal places.)

p-value
a. x = 50; n = 122
b. x = 118; n = 329
c. p̄ = 0.42; n = 41
d. p̄ = 0.42; n = 413
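The four cases share the same left-tailed z test, so they can be sketched with one helper. The function name `left_tail_p_value` is made up for this illustration; z is rounded to 2 decimals before the lookup because the question asks for that.

```python
import math
from statistics import NormalDist

def left_tail_p_value(p_hat, n, p0=0.48):
    """Return (z, p-value) for H0: p >= p0 versus HA: p < p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = round((p_hat - p0) / se, 2)   # rounded as the question instructs
    return z, NormalDist().cdf(z)

cases = {
    "a": (50 / 122, 122),
    "b": (118 / 329, 329),
    "c": (0.42, 41),
    "d": (0.42, 413),
}
results = {label: left_tail_p_value(p_hat, n) for label, (p_hat, n) in cases.items()}

for label, (z, p_value) in results.items():
    print(f"{label}. z = {z:.2f}, p-value = {p_value:.4f}")
```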

In: Math


A study was conducted on students from a particular high school over the last 8 years. The following information was found regarding standardized tests used for college admittance. Scores on the SAT test are normally distributed with a mean of 1057 and a standard deviation of 203. Scores on the ACT test are normally distributed with a mean of 22.6 and a standard deviation of 4.9. It is assumed that the two tests measure the same aptitude, but use different scales.

If a student gets an SAT score at the 34th percentile, find the actual SAT score.
SAT score =  
Round answer to a whole number.

What would be the equivalent ACT score for this student?
ACT score =  
Round answer to 1 decimal place.

If a student gets an SAT score of 1341, find the equivalent ACT score.
ACT score =  
Round answer to 1 decimal place.
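Since the two tests are assumed to measure the same aptitude, an "equivalent" score can be read as the score with the same z-value on the other scale. A minimal sketch under that reading:

```python
from statistics import NormalDist

sat = NormalDist(mu=1057, sigma=203)
act = NormalDist(mu=22.6, sigma=4.9)

# SAT score at the 34th percentile, then the ACT score with the same z-value.
sat_34 = sat.inv_cdf(0.34)
z_34 = (sat_34 - sat.mean) / sat.stdev
act_34 = act.mean + z_34 * act.stdev

# ACT score equivalent to an SAT score of 1341.
z_1341 = (1341 - sat.mean) / sat.stdev
act_1341 = act.mean + z_1341 * act.stdev

print(f"SAT at 34th percentile: {sat_34:.0f}")
print(f"Equivalent ACT: {act_34:.1f}")
print(f"ACT equivalent to SAT 1341: {act_1341:.1f}")
```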

In: Math


A large family-held department store had the business objective of improving its response to complaints. The variable of interest was defined as the number of days between when the complaint was made and when it was resolved. Data were collected from 40 complaints that were made in the last year. Use the data to complete parts (a) through (d) below.

47   13    8   27   43  159   17   49   20    3
105  19    3   64   88   22   30  120   49  102
2    15   28   16   63   29   46   66   10   29
20   48    2   25   25   16   28   86   18   41

a. Construct a 95% confidence interval estimate for the population mean number of days between the receipt of a complaint and the resolution of the complaint.

The 95% confidence interval estimate is from 28.7 days to 51.4 days. (Round to one decimal place as needed.)

b. What assumption must you make about the population distribution in order to construct the confidence interval estimate in (a)?

A. The number of complaints per day is normally distributed.
B. The number of days to resolve complaints follows the t distribution.
C. The number of days to resolve complaints is normally distributed. (Your answer is correct.)
D. The number of complaints per day follows the t distribution.

c. Do you think that the assumption needed in order to construct the confidence interval estimate in (a) is valid? Explain.

A. No, the data suggest the population distribution is skewed to the right. (This is the correct answer.)
B. Yes, the data suggest the population distribution approximately follows the t distribution.
C. Yes, the data suggest the population distribution is approximately normal. (Your answer is not correct.)
D. No, the data suggest the population distribution is skewed to the left.

d. What effect might your conclusion in (c) have on the validity of the results in (a)?
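The interval in part (a) can be reproduced with a one-sample t interval. In the sketch below the critical value 2.0227 (two-sided 95%, 39 degrees of freedom) is hard-coded from a standard t table, because the Python standard library has no t distribution; treat it as an assumption of this sketch.

```python
import math
import statistics

days = [47, 13, 8, 27, 43, 159, 17, 49, 20, 3,
        105, 19, 3, 64, 88, 22, 30, 120, 49, 102,
        2, 15, 28, 16, 63, 29, 46, 66, 10, 29,
        20, 48, 2, 25, 25, 16, 28, 86, 18, 41]

n = len(days)
mean = statistics.mean(days)
s = statistics.stdev(days)      # sample standard deviation
t_crit = 2.0227                 # t(0.975, df = 39), taken from a t table

margin = t_crit * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.3f}, s = {s:.3f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The sample mean sits well above the median of the data, which is in line with the right skew identified in part (c).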

In: Math


Individual Age Gender (Male=1) Family Size Cigarettes/Day Alcohol/Day Total Claims
1 25 1 2 0 3 1
2 28 0 7 0 1 0
3 20 0 3 10 0 9
4 53 1 1 0 0 5
5 48 0 3 5 2 12
6 54 1 3 0 0 2
7 56 1 4 0 0 0
8 35 1 3 10 1 10
9 50 0 6 0 0 6
10 39 1 6 0 0 1
11 56 1 7 0 3 4
12 29 0 5 0 2 4
13 60 1 1 0 1 3
14 29 1 5 0 1 1
15 21 1 2 20 1 17
16 23 1 4 0 1 1
17 38 1 4 0 1 8
18 62 1 4 0 0 7
19 24 0 1 0 3 7
20 53 0 7 0 1 6

ai) Find the mean number of claims made by the smokers and by the nonsmokers in the sample separately (i.e., mean for smokers, mean for nonsmokers).

ii) What is the standard deviation of family size for this population of workers? (Use the population standard deviation.) Standardize by converting your “X” values into “Z” values to see whether their historical values match up well with the new company. Use a Z table. Hint: use the (ai) and (aii) values along with the means and standard deviations you calculated.

b) First find the Z-value for smokers.

c) And now the Z for nonsmokers.

d) Using your Z-table, find the probability that a nonsmoker will make fewer than 6 claims.

e) Next, find the probability that a smoker will make more than 11 claims.

f) Final Recommendation: This firm will be more risky than the current customer risk pool. True or False
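Parts (ai) and (aii) are direct computations on the table. The sketch below assumes a "smoker" is anyone with Cigarettes/Day greater than zero; the Z-value parts depend on historical figures the question does not give, so they are left out.

```python
import statistics

# (individual, age, male, family_size, cigarettes_per_day, alcohol_per_day, total_claims)
rows = [
    (1, 25, 1, 2, 0, 3, 1), (2, 28, 0, 7, 0, 1, 0), (3, 20, 0, 3, 10, 0, 9),
    (4, 53, 1, 1, 0, 0, 5), (5, 48, 0, 3, 5, 2, 12), (6, 54, 1, 3, 0, 0, 2),
    (7, 56, 1, 4, 0, 0, 0), (8, 35, 1, 3, 10, 1, 10), (9, 50, 0, 6, 0, 0, 6),
    (10, 39, 1, 6, 0, 0, 1), (11, 56, 1, 7, 0, 3, 4), (12, 29, 0, 5, 0, 2, 4),
    (13, 60, 1, 1, 0, 1, 3), (14, 29, 1, 5, 0, 1, 1), (15, 21, 1, 2, 20, 1, 17),
    (16, 23, 1, 4, 0, 1, 1), (17, 38, 1, 4, 0, 1, 8), (18, 62, 1, 4, 0, 0, 7),
    (19, 24, 0, 1, 0, 3, 7), (20, 53, 0, 7, 0, 1, 6),
]

# Assumption: smoker means Cigarettes/Day > 0.
smoker_claims = [r[6] for r in rows if r[4] > 0]
nonsmoker_claims = [r[6] for r in rows if r[4] == 0]

mean_smokers = statistics.mean(smoker_claims)
mean_nonsmokers = statistics.mean(nonsmoker_claims)

# Population standard deviation of family size (divides by N, not N - 1).
family_size_sd = statistics.pstdev([r[3] for r in rows])

print(mean_smokers, mean_nonsmokers, round(family_size_sd, 4))
```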

In: Math


7. In an area of the Great Plains, records were kept on the relationship between the rainfall (in inches) and the yield of wheat (bushels per acre).

Rainfall (in inches), x   Yield (bushels per acre), y
10.5                      50.5
8.8                       46.2
13.4                      58.8
12.5                      59.0
7.0                       31.9
16.0                      78.8

7a. Using the linear regression feature on your calculator, find a linear equation that models the yield of wheat (bushels per acre) as the response variable (y) and the rainfall (in inches) as the explanatory variable (x). (Use 2 decimal places in the regression equation.)

7b. Using the line you obtained in 7a above, compute the sum of the squared residuals of the least squares line for the given data. (Use 2 decimal places and show your calculations, by hand!)
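A calculator result for 7a and 7b can be checked with the closed-form least-squares formulas. The sketch below treats yield as the response and rainfall as the explanatory variable, as the table labels suggest, and rounds the coefficients to 2 decimals before computing residuals, as the question asks.

```python
rainfall = [10.5, 8.8, 13.4, 12.5, 7.0, 16.0]     # x, inches
wheat_yield = [50.5, 46.2, 58.8, 59.0, 31.9, 78.8]  # y, bushels per acre

n = len(rainfall)
x_bar = sum(rainfall) / n
y_bar = sum(wheat_yield) / n

# Least-squares slope and intercept: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(rainfall, wheat_yield))
s_xx = sum((x - x_bar) ** 2 for x in rainfall)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# 7b: residuals from the line with coefficients rounded to 2 decimals.
b1, b0 = round(slope, 2), round(intercept, 2)
residuals = [y - (b0 + b1 * x) for x, y in zip(rainfall, wheat_yield)]
sse = sum(r ** 2 for r in residuals)

print(f"y = {b0:.2f} + {b1:.2f}x, sum of squared residuals = {sse:.2f}")
```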

In: Math


Jobs are sent to a server at a rate of 2 jobs per minute. We will model job arrivals using a (homogeneous) Poisson process. For each question, clearly specify the parameter value(s) of the distribution as well as its name.

(a) What is the probability of receiving more than 3 jobs in a period of one minute?

(b) What is the probability of receiving more than 30 jobs in a period of 10 minutes? (No need to simplify.)

(c) What are the expected value and the variance of inter-arrival times?

(d) Compute the probability that the next job does not arrive during the next 30 seconds.

(e) Compute the probability that the time until the fourth job arrives exceeds 40 seconds.
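Each part maps to a standard distribution tied to a Poisson process with rate 2 per minute. The sketch below works in minutes and uses the fact that the fourth arrival comes after time t exactly when at most 3 jobs arrive in [0, t]; the helper name `poisson_cdf` is made up for this illustration.

```python
import math

rate = 2.0  # jobs per minute

def poisson_cdf(k, lam):
    """P(N <= k) for N ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

# (a) N ~ Poisson(2): P(N > 3) in one minute.
p_a = 1 - poisson_cdf(3, rate * 1)

# (b) N ~ Poisson(20): P(N > 30) in ten minutes.
p_b = 1 - poisson_cdf(30, rate * 10)

# (c) Inter-arrival time T ~ Exponential(rate = 2): mean 1/rate, variance 1/rate^2 (minutes).
mean_gap = 1 / rate
var_gap = 1 / rate ** 2

# (d) P(T > 0.5 minutes) = exp(-rate * 0.5).
p_d = math.exp(-rate * 0.5)

# (e) Time to the 4th job ~ Gamma(shape 4, rate 2); P(T4 > 40 s) = P(N(2/3 min) <= 3).
p_e = poisson_cdf(3, rate * 40 / 60)

print(p_a, p_b, mean_gap, var_gap, p_d, p_e)
```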

In: Math


Twenty percent of the contestants in a scholarship competition come from Pylesville High School, 40% come from Millerville High School, and the remaining come from Lakeside High School. Two percent of the Pylesville students are among the scholarship winners; 3% of the Millerville contestants and 5% of the Lakeside contestants win.

a) If a winner is chosen at random, what is the probability that they are from Lakeside?

b) What percentage of the winners are from Pylesville?
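Both parts are applications of Bayes' rule after finding the overall win probability by the law of total probability. A short sketch with exact fractions, assuming the 2% figure for Pylesville refers to Pylesville contestants:

```python
from fractions import Fraction

# Share of contestants from each school; Lakeside is the remainder.
prior = {
    "Pylesville": Fraction(20, 100),
    "Millerville": Fraction(40, 100),
}
prior["Lakeside"] = 1 - sum(prior.values())

# P(win | school)
win_rate = {
    "Pylesville": Fraction(2, 100),
    "Millerville": Fraction(3, 100),
    "Lakeside": Fraction(5, 100),
}

# Law of total probability: P(win) = sum over schools of P(school) * P(win | school).
p_win = sum(prior[s] * win_rate[s] for s in prior)

# Bayes' rule: P(school | win) = P(school) * P(win | school) / P(win).
posterior = {s: prior[s] * win_rate[s] / p_win for s in prior}

print(f"a) P(Lakeside | winner) = {posterior['Lakeside']} = {float(posterior['Lakeside']):.4f}")
print(f"b) Winners from Pylesville = {float(posterior['Pylesville']) * 100:.2f}%")
```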

In: Math


How are exploratory data analysis (EDA) and hypothesis testing different? Explain why EDA could be preferred in data mining, and justify your explanation with a specific example.

In: Math


The scores of the top ten finishers in a recent Buick Open are listed below:

Scores: 65 66 67 66 67 67 70 71 68 70

Round all solutions to one decimal place. You must show ALL of your work to receive credit for these problems.

e) Find the variance of the data.

f) Find the standard deviation of the data.

g) Find quartile one, Q1, of the data. Interpret the first quartile in the context of the problem.

h) Find quartile three, Q3, of the data. Interpret the third quartile in the context of the problem.

i) Find the inter-quartile range, IQR, of the data. Interpret the IQR in the context of the problem.

j) List the Five Number Summary.

k) Construct a Box-and-Whisker Plot for the data set. (Don’t forget the title)
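Parts e) through j) can be checked numerically. Two assumptions in this sketch: the ten scores are treated as a sample (divisor n - 1), and quartiles use the default "exclusive" method of `statistics.quantiles`. Textbooks differ on quartile conventions, so a course that uses another rule may get slightly different values.

```python
import statistics

scores = [65, 66, 67, 66, 67, 67, 70, 71, 68, 70]

variance = statistics.variance(scores)   # sample variance
std_dev = statistics.stdev(scores)       # sample standard deviation

# Quartiles with the default 'exclusive' method.
q1, median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

five_number_summary = (min(scores), q1, median, q3, max(scores))

print(f"variance = {variance:.1f}, standard deviation = {std_dev:.1f}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"five-number summary = {five_number_summary}")
```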

In: Math


Since an instant replay system for tennis was introduced at a major tournament, men challenged 1389 referee calls, with the result that 427 of the calls were overturned. Women challenged 779 referee calls, and 227 of the calls were overturned. Use a 0.01 significance level to test the claim that men and women have equal success in challenging calls. Complete parts (a) through (c) below.

a. Test the claim using a hypothesis test. Consider the first sample to be the sample of male tennis players who challenged referee calls and the second sample to be the sample of female tennis players who challenged referee calls. What are the null and alternative hypotheses for the hypothesis test?

A. H0: p1 = p2; H1: p1 ≠ p2 (Your answer is correct.)
B. H0: p1 = p2; H1: p1 < p2
C. H0: p1 ≤ p2; H1: p1 ≠ p2
D. H0: p1 = p2; H1: p1 > p2
E. H0: p1 ≠ p2; H1: p1 = p2
F. H0: p1 ≥ p2; H1: p1 ≠ p2

Identify the test statistic. z = 0.78 (Round to two decimal places as needed.)

Identify the P-value. P-value = 0.435 (Round to three decimal places as needed.)

What is the conclusion based on the hypothesis test? The P-value is greater than the significance level of α = 0.01, so fail to reject the null hypothesis. There is not sufficient evidence to warrant rejection of the claim that women and men have equal success in challenging calls.

b. Test the claim by constructing an appropriate confidence interval. The 99% confidence interval is -0.199 < (p1 - p2) < -0.200. (Round to three decimal places as needed.)
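The test statistic and P-value quoted in part (a) can be reproduced with a pooled two-proportion z test, and part (b) with an unpooled interval for p1 - p2. This is a sketch of the usual textbook procedure, assumed here rather than stated in the question; carrying z at full precision gives a P-value that differs slightly in the third decimal from one computed with z rounded to 0.78.

```python
import math
from statistics import NormalDist

x1, n1 = 427, 1389   # men: overturned calls, challenges
x2, n2 = 227, 779    # women: overturned calls, challenges
alpha = 0.01

p1, p2 = x1 / n1, x2 / n2

# (a) Pooled two-proportion z test for H0: p1 = p2 against H1: p1 != p2.
p_pool = (x1 + x2) / (n1 + n2)
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se_pool
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_h0 = p_value < alpha

# (b) 99% confidence interval for p1 - p2 using the unpooled standard error.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
ci_lower = (p1 - p2) - z_crit * se_diff
ci_upper = (p1 - p2) + z_crit * se_diff

print(f"z = {z:.2f}, P-value = {p_value:.3f}, reject H0: {reject_h0}")
print(f"99% CI for p1 - p2: ({ci_lower:.3f}, {ci_upper:.3f})")
```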

In: Math


Joe, Ken and Ben live in a shared house. Each Sunday, each of the tenants chooses, uniformly at random and independently of the other tenants, one part of the house: kitchen (K), living room (L), bathroom (B), or garage (G), and cleans it during the week that follows (here a week means a period of 7 days starting on Sunday).

(a) What is the probability that the garage is not cleaned during one week?

(b) What is the probability that the kitchen is cleaned exactly once during one week?
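With only 4³ = 64 equally likely outcomes, both probabilities can be checked by brute-force enumeration. The sketch reads "cleaned exactly once" as "chosen by exactly one tenant", which is an interpretation rather than something the question spells out.

```python
from fractions import Fraction
from itertools import product

rooms = ["K", "L", "B", "G"]

# Every equally likely combination of choices by the three tenants.
outcomes = list(product(rooms, repeat=3))
total = len(outcomes)

# (a) Nobody picks the garage.
garage_not_cleaned = sum(1 for choice in outcomes if "G" not in choice)
p_a = Fraction(garage_not_cleaned, total)

# (b) Exactly one tenant picks the kitchen.
kitchen_once = sum(1 for choice in outcomes if choice.count("K") == 1)
p_b = Fraction(kitchen_once, total)

print(f"(a) {p_a} = {float(p_a):.4f}")
print(f"(b) {p_b} = {float(p_b):.4f}")
```

The enumeration agrees with the closed forms (3/4)³ for part (a) and 3 · (1/4) · (3/4)² for part (b).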

In: Math