Questions
General-purpose computers and domain-independent relational database systems have become a large market in the last several...

General-purpose computers and domain-independent relational database systems have become a large market in the last several decades. However, many people feel that generic data mining systems will not prevail in the data mining market. What do you think? For data mining, should we focus our efforts on developing domain-independent data mining tools or on developing domain-specific data mining solutions? Present your reasoning.

In: Math

What are the major differences among the three methods for increasing the accuracy of a classifier:...

What are the major differences among the three methods for increasing the accuracy of a classifier:

(a) bagging,

(b) boosting, and

(c) ensemble?

In: Math

Please as soon as possible Topic: How do CO2 emissions vary globally? Two specific questions: •...

Please as soon as possible

Topic: How do CO2 emissions vary globally?

Two specific questions:

• Do countries with higher CO2 emissions also have large agricultural sectors?

• Do CO2 emissions vary by geographic region?

This is an Excel file: https://drive.google.com/file/d/1R5pArnQ062_uNKHTHyJ0odioGrpQmnvn/view?usp=sharing

Read the dataset into R and extract the data that you will need for this analysis. You can extract the columns by referring to them by name or by number. Provide the code that you use to do this.

Example:

#Read CSV file

allglobal <- read.csv(file.choose())

#Extract chosen columns using names

global <- allglobal[, c("Country", "Region", "GDP", "TaxRevenue")]

#Extract chosen columns using column numbers

global <- allglobal[, c(1, 3, 12, 30)]

#Inspect data

head(global)

Explore your two questions using appropriate summary statistics and graphs. The following structure is recommended for each question:

• Clearly state the question being analysed (e.g. with a heading)

• Summary statistics and graphs. In the example above, you might use R idiom such as

– mean(global$GDP) or maybe median(global$GDP) (why the difference)?

– hist(global$GDP)

– boxplot(GDP~Region, data=global) . . .

although in this case you might be better off using logarithmic axes, boxplot(GDP~Region, data=global, log="y")

You should include at least 3 different types of graphs and 4 different summary statistics. Marks will only be awarded if the graphs, summary statistics and explanations are correct and appropriate.

Thank you so much!!

In: Math

Three statistics students are having a discussion about selecting the appropriate distribution for a data set....

Three statistics students are having a discussion about selecting the appropriate distribution for a data set. Explain why you agree or disagree with each student and give your own suggestion for the approach the students should take.

Maya: Maya argues that since the students don’t know what the population data looks like, they should simply use the sample probability mass distribution as their population mass distribution.

Greg: Greg says that the sample probability mass distribution is oddly shaped and will almost certainly not be the same as the population mass distribution function. He suggests that it’s best to find a match from the common probability mass functions that the students know about.

Jane: Jane argues that both Greg’s and Maya’s approaches could introduce unknown error into the analysis that they are performing. She reasons that as long as there is going to be error, the students should try both approaches and choose the one that produces the results that they would most like to see.

In: Math

Read the passage provided, and then consider the following scenario. A physician is trying to decide...

Read the passage provided, and then consider the following scenario. A physician is trying to decide whether to prescribe medication for cholesterol reduction in a 45-year-old female patient. The null hypothesis is that the patient’s cholesterol is less than the threshold of treatable hypercholesterolemia. However, a sample of readings over a 2-year time period shows considerable variation, usually below but sometimes above the threshold. Define Type I and Type II error. List the costs of each type of error (in general terms). Who bears the cost of each? How might the patient’s point of view differ from the HMO’s or doctor’s? In what sense is this a business problem? A societal problem? An individual problem?

In: Math

Suppose a statistician built a multiple regression model for predicting the total number of runs scored...

Suppose a statistician built a multiple regression model for predicting the total number of runs scored by a baseball team during a season. Using data for n=200 samples, the results below were obtained. Complete parts a through d.

Ind. Var.        β estimate   Standard Error
Intercept        3.88         17.03
Walks (X1)       0.37         0.05
Singles (X2)     0.51         0.05
Doubles (X3)     0.74         0.04
Triples (X4)     1.17         0.23
Home Runs (X5)   1.44         0.04

a. Write the least squares prediction equation for y= total number of runs scored by a team in a season.

y = 3.88 + 0.37 X1 + 0.51 X2 + 0.74 X3 + 1.17 X4 + 1.44 X5 (Type integers or decimals.) This answer was marked correct.

b. Interpret, practically, β0 and β1 in the model. Which statement below best interprets β0?

A. For a change of β0 in any variable, the runs scored increases by 1.

B. For a decrease of 1 in any variable, the runs scored changes by β0.

C. For an increase of 1 in any variable, the runs scored changes by β0.

D. For a change of β0 in any variable, the runs scored decreases by 1.

E. This parameter does not have a practical interpretation. Your answer is correct.

Which statement below best interprets β1?

A. For an increase of 1 in the number of walks, the runs scored changes by β1.

B. For a change of β1 in the number of walks, the runs scored increases by 1.

C. For a decrease of 1 in the number of walks, the runs scored changes by β1.

D. For a change of β1 in the number of walks, the runs scored decreases by 1.

E. This parameter does not have a practical interpretation.
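
The prediction equation from part (a) can be tried out numerically. The following is a minimal sketch: the coefficients come from the table in the question, while the function name `predict_runs` and the season totals passed to it are hypothetical values chosen only for illustration.

```python
# Coefficients copied from the regression table in the question.
COEFFS = {
    "intercept": 3.88,
    "walks": 0.37,
    "singles": 0.51,
    "doubles": 0.74,
    "triples": 1.17,
    "home_runs": 1.44,
}


def predict_runs(walks, singles, doubles, triples, home_runs):
    """Evaluate the least squares prediction equation for total runs."""
    return (
        COEFFS["intercept"]
        + COEFFS["walks"] * walks
        + COEFFS["singles"] * singles
        + COEFFS["doubles"] * doubles
        + COEFFS["triples"] * triples
        + COEFFS["home_runs"] * home_runs
    )


# Hypothetical season totals, not taken from the question.
base = predict_runs(500, 1000, 280, 30, 160)
one_more_walk = predict_runs(501, 1000, 280, 30, 160)

# Holding the other variables fixed, one extra walk changes the
# prediction by the walks coefficient.
print(round(one_more_walk - base, 2))
```

Setting every input to zero returns the intercept alone, which describes a team with no walks or hits at all; that is one way to see why the intercept has no practical interpretation here.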

In: Math

To try to determine whether the composition of the earth’s atmosphere has changed over time, scientists...

To try to determine whether the composition of the earth’s atmosphere has changed over time, scientists can examine the gas in bubbles trapped inside ancient amber.  (That’s the plot of Jurassic Park.) Assume that the following 9 measures are a random sample from the late Cretaceous era (75 to 95 million years ago). The data represent the percent of nitrogen in each sample.

63.4     65.0     64.4     63.3     54.8     64.5     60.8     49.1     51.0

You are asked to conduct a hypothesis test to determine whether the mean is less than 61.

1. Conduct a hypothesis test using a 95% confidence interval.

a. What value for t will you use?

b. What is the sample mean?

c. What is the sample standard deviation?

d. What is the standard error?

e. Calculate the confidence interval.

f. What conclusion will you draw about the null hypothesis, and why?

2. Conduct a hypothesis test using the traditional method.

a. Choose a level of significance (α).

b. Draw a t-diagram in which you place the mean at zero and t-value at which you will reject the null hypothesis. Clearly label the reject and do not reject region.

c. Calculate the test statistic using: t = (x̅ - μ) / SE, where SE = s / √n, s = sample standard deviation and n = sample size.

d. Place the value you get for t on your diagram. Does it fall in the reject or do not reject region?

e. What is your conclusion? State it in words in the context of this problem.

f. Calculate the p-value. Compare the p-value to α. What conclusion will you draw and why?

g. Your conclusion in e and f should be the same. If not look over your work.
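
The arithmetic behind parts 1b to 1d and 2c can be sketched with Python's standard library. This is only an illustration of the formulas, assuming the hypothesised mean of 61 from the question; the critical t value and the p-value still have to be looked up separately for n - 1 = 8 degrees of freedom.

```python
import math
import statistics

# Percent nitrogen in the nine amber samples from the question.
nitrogen = [63.4, 65.0, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51.0]
mu_0 = 61.0  # hypothesised mean from the question

n = len(nitrogen)
sample_mean = statistics.mean(nitrogen)
sample_sd = statistics.stdev(nitrogen)      # sample SD, n - 1 denominator
standard_error = sample_sd / math.sqrt(n)   # SE = s / sqrt(n)
t_stat = (sample_mean - mu_0) / standard_error
df = n - 1

print(f"mean = {sample_mean:.3f}")
print(f"sd   = {sample_sd:.3f}")
print(f"SE   = {standard_error:.3f}")
print(f"t    = {t_stat:.3f} with {df} degrees of freedom")
```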

                

            

In: Math

Show all work please A researcher is interested in understanding the predictors of why individuals bully...

Show all work please

A researcher is interested in understanding the predictors of why individuals bully other individuals.  She collects the following data.

Columns: ID = ID of Respondent; Friends = # of Friends who Bully; Victim = Respondent was a Bully Victim (0 = No; 1 = Yes); Gender (0 = Female; 1 = Male); Bullied = # of Times Respondent Bullied Others.

ID   Friends   Victim   Gender   Bullied
1    2         1        1        5
2    4         1        0        2
3    3         0        1        8
4    2         0        0        4
5    6         1        1        6
6    3         0        0        2
7    7         1        1        7
8    4         0        0        0
9    2         1        1        1
10   7         1        1        8

Test the hypothesis (at the .05 level of significance) that individuals who were bullied are more likely to bully others. Test the hypothesis (at the .05 level of significance) that individuals who were bullied committed more bullying than those who were not bullied.

                        

In: Math

Suppose we have a binomial experiment in which success is defined to be a particular quality...

Suppose we have a binomial experiment in which success is defined to be a particular quality or attribute that interests us.

(a) Suppose n = 45 and p = 0.24. (For each answer, enter a number. Use 2 decimal places.)

n·p =

n·q =

Can we approximate p̂ by a normal distribution? Why? (Fill in the blank. There are four answer blanks. A blank is represented by _____.)

_____, p̂ _____ be approximated by a normal random variable because _____ _____.

first blank: Yes / No

second blank: can / cannot

third blank: n·p exceeds / n·p and n·q do not exceed / both n·p and n·q exceed / n·p does not exceed / n·q exceeds / n·q does not exceed

fourth blank: (Enter an exact number.)

What are the values of μp̂ and σp̂? (For each answer, enter a number. Use 3 decimal places.)

μp̂ =

σp̂ =

(b) Suppose n = 25 and p = 0.15. Can we safely approximate p̂ by a normal distribution? Why or why not? (Fill in the blank. There are four answer blanks. A blank is represented by _____.)

_____, p̂ _____ be approximated by a normal random variable because _____ _____.

first blank: Yes / No

second blank: can / cannot

third blank: n·p exceeds / n·p and n·q do not exceed / both n·p and n·q exceed / n·p does not exceed / n·q exceeds / n·q does not exceed

fourth blank: (Enter an exact number.)

(c) Suppose n = 45 and p = 0.14. (For each answer, enter a number. Use 2 decimal places.)

n·p =

n·q =

Can we approximate p̂ by a normal distribution? Why? (Fill in the blank. There are four answer blanks. A blank is represented by _____.)

_____, p̂ _____ be approximated by a normal random variable because _____ _____.

first blank: Yes / No

second blank: can / cannot

third blank: n·p exceeds / n·p and n·q do not exceed / both n·p and n·q exceed / n·p does not exceed / n·q exceeds / n·q does not exceed

fourth blank: (Enter an exact number.)

What are the values of μp̂ and σp̂? (For each answer, enter a number. Use 3 decimal places.)

μp̂ =

σp̂ =
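
The quantities asked for in each part follow from n·p, n·q, μp̂ = p and σp̂ = √(p·q/n). The sketch below is an illustration only: the function name `p_hat_summary` is hypothetical, and the cutoff of 5 for both n·p and n·q is an assumption (a common textbook rule of thumb), since the question leaves that number as a blank to be filled in.

```python
import math


def p_hat_summary(n, p, threshold=5):
    """Summarise the sampling distribution of p-hat for a binomial experiment.

    threshold is an ASSUMED rule-of-thumb cutoff; check the textbook in use.
    """
    q = 1 - p
    n_p = n * p
    n_q = n * q
    return {
        "np": n_p,
        "nq": n_q,
        # Normal approximation is considered reasonable only if both exceed the cutoff.
        "normal_ok": n_p > threshold and n_q > threshold,
        "mu": p,                         # mean of p-hat
        "sigma": math.sqrt(p * q / n),   # standard deviation of p-hat
    }


for n, p in [(45, 0.24), (25, 0.15), (45, 0.14)]:
    s = p_hat_summary(n, p)
    print(f"n={n}, p={p}: np={s['np']:.2f}, nq={s['nq']:.2f}, "
          f"normal_ok={s['normal_ok']}, mu={s['mu']:.3f}, sigma={s['sigma']:.3f}")
```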

In: Math

A colleague of yours is completing a final report on the causes of the frequency of...

A colleague of yours is completing a final report on the causes of the frequency of cyberbullying.  In this report, she is asked to identify the causes that most strongly impacted the frequency of cyberbullying.  She conducts an OLS regression.  What statistic do you advise her to use in her discussion?  Why?

In: Math

The following sample observations were randomly selected. x: 6, 5, 4, 6, 10 y: 4, 6,...

The following sample observations were randomly selected. x: 6, 5, 4, 6, 10 y: 4, 6, 5, 2, 11. Determine the 0.95 confidence interval for the mean predicted when x=9. Determine the 0.95 prediction interval for an individual predicted when x=9.
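
One way to organise the calculation is sketched below with the standard simple-regression formulas: the interval for the mean response uses √(1/n + (x₀ − x̄)²/Sxx), and the prediction interval for an individual value adds 1 under the square root. The critical value `T_CRIT` is an assumption read from a t table (two-sided 95%, n − 2 = 3 degrees of freedom), not something given in the question.

```python
import math

x = [6, 5, 4, 6, 10]
y = [4, 6, 5, 2, 11]
x0 = 9
T_CRIT = 3.182  # ASSUMED: two-sided 95% t value for 3 degrees of freedom, from a t table

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope and intercept.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Standard error of estimate from the residuals.
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

y_hat = intercept + slope * x0
leverage = 1 / n + (x0 - x_bar) ** 2 / s_xx

ci_half = T_CRIT * s_e * math.sqrt(leverage)      # mean response at x0
pi_half = T_CRIT * s_e * math.sqrt(1 + leverage)  # individual response at x0

print(f"y_hat = {y_hat:.3f}")
print(f"confidence interval: {y_hat - ci_half:.3f} to {y_hat + ci_half:.3f}")
print(f"prediction interval: {y_hat - pi_half:.3f} to {y_hat + pi_half:.3f}")
```

The prediction interval is always the wider of the two, because it covers the scatter of a single observation as well as the uncertainty in the fitted line.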

In: Math

Per the textbook, to eliminate or reduce non-value-added work is a core step in improving profitability...

Per the textbook, to eliminate or reduce non-value-added work is a core step in improving profitability or efficiency of the business process. Give your opinion on whether or not you agree or disagree with this statement and include one (1) example of a business process which supports or criticizes the aforementioned statement to support your position. Determine at least two (2) challenges in identifying Opportunities for Improvement (OFIs). Suggest at least one (1) strategy that business management can use to mitigate the challenges in question. Provide a rationale to support your suggestion.

In: Math

A bag contains 7 red marbles, 5 white marbles, and 8 blue marbles. You draw 5 marbles...

A bag contains 7 red marbles, 5 white marbles, and 8 blue marbles. You draw 5 marbles out at random, without replacement. What is the probability that all the marbles are red?

The probability that all the marbles are red is _____.

What is the probability that exactly two of the marbles are red?


What is the probability that none of the marbles are red?
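
Drawing without replacement is a counting problem: choose k of the 7 red marbles and 5 − k of the 13 non-red marbles, then divide by the number of ways to choose any 5 of the 20. A minimal sketch of that calculation follows; the helper name `prob_red` is hypothetical.

```python
from fractions import Fraction
from math import comb

RED = 7
OTHER = 5 + 8   # white + blue marbles
DRAWS = 5


def prob_red(k):
    """Probability of exactly k red marbles in DRAWS draws without replacement."""
    ways = comb(RED, k) * comb(OTHER, DRAWS - k)
    return Fraction(ways, comb(RED + OTHER, DRAWS))


for k, label in [(5, "all red"), (2, "exactly two red"), (0, "none red")]:
    p = prob_red(k)
    print(f"{label}: {p} (about {float(p):.4f})")
```

Exact fractions are used so the three answers can be compared with hand calculations before rounding.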

In: Math

Bayesian analysis is a probabilistic classification method based on Bayes' theorem (postulated by an English Mathematician...

Bayesian analysis is a probabilistic classification method based on Bayes' theorem (postulated by an English mathematician named Thomas Bayes). In summary, the Bayesian theory rests on the belief that the evidence about the true state of a given problem can be expressed in terms of the degree of understanding of the underlying issues. That degree of understanding can in turn be expressed in terms of probability. And based on the observation, the theorem provides the relationship between the probabilities of two possible events and their conditional probabilities.

The formula for Bayes' theorem is given as: P(A|B) = P(B|A) * P(A)/P(B)
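
The formula can be illustrated with a small numeric sketch. All of the probabilities below are hypothetical values invented for the example (a rare event A and an imperfect signal B); none of them come from the question.

```python
def posterior(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical inputs, chosen only for illustration.
p_a = 0.01              # prior probability of A
p_b_given_a = 0.90      # probability of observing B when A is true
p_b_given_not_a = 0.05  # probability of observing B when A is false

# Total probability of observing B.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = posterior(p_b_given_a, p_a, p_b)
print(f"P(B) = {p_b:.4f}")
print(f"P(A|B) = {p_a_given_b:.4f}")
```

With these made-up numbers the updated probability of A stays well below the 0.90 signal rate because A is rare to begin with, which is the kind of prior-driven effect the discussion questions ask about.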

Question:

Discuss the reasons for using Bayesian analysis when faced with uncertainty in making decisions.

Discussion Requirements:

How would you describe Bayes' theorem?

Describe the assumptions of Bayesian analysis.

Provide an example of a problem where one can use Bayesian analysis in Big Data Analytics.

Describe the problems with Bayesian analysis.

In: Math

The company Dilmart has 8,000 pieces in inventory. The average dollar value of these pieces is...

The company Dilmart has 8,000 pieces in inventory. The average dollar value of these pieces is $10.79 with a standard deviation equal to $3.34. Suppose the inventory manager selected a random sample of n = 64 pieces of inventory and found a sample mean equal to $11.27. The probability of obtaining a sample mean of at least $11.27 is approximately 0.444. Select one: True False
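
The stated probability can be checked with the sampling distribution of the mean. The sketch below assumes the sample mean is approximately normal (central limit theorem, n = 64) and ignores the finite population correction, since 64 of 8,000 pieces is a small fraction; both are simplifying assumptions, not details given in the question.

```python
import math
from statistics import NormalDist

pop_mean = 10.79
pop_sd = 3.34
n = 64
sample_mean = 11.27

# Standard error of the mean (no finite population correction applied).
standard_error = pop_sd / math.sqrt(n)
z = (sample_mean - pop_mean) / standard_error

# Upper-tail probability P(sample mean >= 11.27) under the normal approximation.
p_at_least = 1 - NormalDist().cdf(z)

print(f"SE = {standard_error:.4f}")
print(f"z = {z:.3f}")
print(f"P(sample mean >= {sample_mean}) = {p_at_least:.4f}")
```

Comparing the printed tail probability with 0.444 answers the True/False question.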

In: Math