Question

Problem 6: R simulation. Use the following code to read the observed frequencies of how much the students smoke in the dataset "survey".

library(MASS) # if you have not installed the package, run install.packages("MASS") first.

survey = na.omit(survey) # remove data with missing values

smoke.freq=table(survey$Smoke) # the frequencies of smoke

(1) Suppose the campus smoking statistics are as below. Determine whether the sample data in survey support them at the 0.05 significance level. (Hint: use the command chisq.test.)

Heavy Never Occas Regul
0.045 0.795 0.085 0.075

(2) Describe the test associated with the command chisq.test(smoke.freq).

(3) Use the code table(survey$Sex, survey$Exer) to obtain a contingency table of the variables Sex (the sex of the student) and Exer (how often the student exercises). Test whether there is an association between the two variables.

Solutions

Expert Solution

R code to read the data (all statements starting with # are comments)

# if you have not installed the package, run install.packages("MASS") first.
library(MASS)
# remove rows with missing values
survey = na.omit(survey)
# create a table of frequencies
smoke.freq = table(survey$Smoke)
# print the table
print(smoke.freq)

# get the following output

1) The output above gives the observed frequencies in the sample. Suppose the probabilities of the different levels of smoking are as given below.

Frequency of smoking Probabilities
Heavy 0.045
Never 0.795
Occas 0.085
Regul 0.075

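We want to test whether the observed frequencies from the sample support these probabilities.

We want to test the following hypotheses:

H0: The population proportions of the smoking levels are Heavy 0.045, Never 0.795, Occas 0.085, Regul 0.075.
Ha: At least one proportion differs from its stated value.

We use the chi-square goodness-of-fit test to test these hypotheses.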
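For reference, the goodness-of-fit statistic can be computed by hand as the sum of (O - E)^2 / E over the categories, where E = n * p, with k - 1 degrees of freedom. The sketch below uses made-up counts (the `observed` vector is illustrative and is not the survey data) and a hypothetical helper named `gof_stat`, then checks the result against `chisq.test`.

```r
# Hypothetical helper: chi-square goodness-of-fit statistic computed by hand.
gof_stat <- function(observed, p) {
  expected <- sum(observed) * p            # expected counts under H0
  sum((observed - expected)^2 / expected)
}

# Illustrative counts only (NOT the survey data).
observed <- c(Heavy = 10, Never = 160, Occas = 15, Regul = 15)
prob <- c(0.045, 0.795, 0.085, 0.075)

stat <- gof_stat(observed, prob)
df <- length(observed) - 1                 # k - 1 degrees of freedom
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# chisq.test should report the same statistic and p-value.
res <- chisq.test(observed, p = prob)
print(c(by_hand = stat, chisq_test = unname(res$statistic)))
```

The agreement between the two numbers is the point of the sketch; the actual values depend on the illustrative counts.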

R code to get the test statistic and the p-value

#Set the campus smoking statistics
prob<-c(0.045, 0.795, 0.085, 0.075)
#perform the chi square test
chisq.test(smoke.freq,p=prob)

# output is

The sample chi-square statistic is 0.3132 and the p-value is 0.9575. Since the p-value is greater than the significance level alpha = 0.05, we cannot reject the null hypothesis.
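As a quick check, the reported p-value can be recovered from the reported statistic. This sketch assumes 3 degrees of freedom (four smoking levels minus one) and uses only the numbers quoted above.

```r
# Values reported in the solution: statistic 0.3132, four categories.
stat <- 0.3132
df <- 4 - 1
alpha <- 0.05

# Upper-tail area of the chi-square distribution beyond the statistic.
p_value <- pchisq(stat, df = df, lower.tail = FALSE)
reject_h0 <- p_value < alpha

print(round(p_value, 4))
print(reject_h0)   # FALSE: we fail to reject H0
```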

We conclude that there is not sufficient evidence to reject the claim that the campus smoking statistics describe the sample data in survey.

That is, the sample data are consistent with the probabilities given in the table.

2) When we use the command chisq.test(smoke.freq), we do not supply a proposed probability distribution to the function chisq.test() through the parameter "p=". The function therefore uses its default, which assumes that every level is equally likely, that is, that the levels are uniformly distributed. In effect we use the following probabilities for the smoking frequencies

Frequency of smoking Probabilities
Heavy 0.25
Never 0.25
Occas 0.25
Regul 0.25

and we test if the above uniform probability distribution is a good description of the sample data in survey.
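The default behaviour can be confirmed directly: omitting `p` gives the same result as passing equal probabilities explicitly. The counts below are made up for illustration and are not the survey data.

```r
# Illustrative counts only (NOT the survey data).
counts <- c(Heavy = 10, Never = 160, Occas = 15, Regul = 15)
k <- length(counts)

default_test <- chisq.test(counts)                     # p omitted
uniform_test <- chisq.test(counts, p = rep(1 / k, k))  # p given explicitly

# Both calls test the same uniform null, so the statistics agree.
same_stat <- isTRUE(all.equal(default_test$statistic, uniform_test$statistic))
print(same_stat)
print(default_test$expected)   # every expected count equals n / k
```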

The hypotheses have the same form as in part 1, with every hypothesized probability equal to 0.25.

R code

# chi-square test against the uniform distribution
chisq.test(smoke.freq)

# output is

The sample test statistic is 269.38 and the p-value is less than 2.2e-16. Since this p-value is less than the level of significance alpha = 0.05, we reject the null hypothesis.
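R reports very small p-values as "< 2.2e-16" because they fall below the machine epsilon, so the printed figure is a bound and not the exact value. A short sketch, assuming 3 degrees of freedom as in part 1:

```r
# Statistic reported in the solution; four categories give df = 3.
stat <- 269.38
df <- 4 - 1

p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# .Machine$double.eps is about 2.2e-16, the cutoff used when printing.
below_eps <- p_value < .Machine$double.eps

print(p_value)
print(below_eps)   # TRUE
```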

We conclude that there is sufficient evidence that the uniform distribution is not a good description of the sample data.

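3) We want to test whether there is an association between the variables Sex of the student and how often the student exercises.

We want to test the following hypotheses:

H0: Sex and Exer are independent (there is no association).
Ha: Sex and Exer are not independent (there is an association).

We use the chi-square test of independence to test the association.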
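For reference, the test of independence compares each observed cell with its expected count under independence, E_ij = (row total i) * (column total j) / n, and uses (r - 1)(c - 1) degrees of freedom. The sketch below uses a made-up 2 x 3 table (the counts are illustrative and are not the survey data) and checks the hand computation against `chisq.test`.

```r
# Illustrative 2 x 3 table (NOT the survey data).
tab <- matrix(c(30, 10, 40,
                25, 15, 45),
              nrow = 2, byrow = TRUE,
              dimnames = list(Sex = c("Female", "Male"),
                              Exer = c("Freq", "None", "Some")))

n <- sum(tab)
expected <- outer(rowSums(tab), colSums(tab)) / n   # E_ij under independence
stat <- sum((tab - expected)^2 / expected)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Yates' continuity correction applies only to 2 x 2 tables,
# so chisq.test gives the uncorrected statistic here.
res <- chisq.test(tab)
print(c(by_hand = stat, chisq_test = unname(res$statistic)))
```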

We use the following R Code

# get the contingency table of the variables Sex and Exer
tab <- table(survey$Sex, survey$Exer)
print(tab)
# perform the chi-square test
chisq.test(tab)

# get the output

The test statistic is 4.1585 and the p-value is 0.125.
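The reported p-value can be checked from the reported statistic. This sketch assumes a 2 x 3 table (two levels of Sex, three levels of Exer), which gives 2 degrees of freedom and is consistent with the reported p-value.

```r
# Statistic reported in the solution; assumed 2 x 3 table gives df = 2.
stat <- 4.1585
df <- (2 - 1) * (3 - 1)
alpha <- 0.05

p_value <- pchisq(stat, df = df, lower.tail = FALSE)

# Equivalent decision rule: compare the statistic with the critical value.
critical <- qchisq(1 - alpha, df = df)
reject_h0 <- stat > critical

print(round(p_value, 3))
print(reject_h0)   # FALSE: we fail to reject H0
```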

Since the p-value is greater than the level of significance alpha = 0.05, we cannot reject the null hypothesis.

That means there is not sufficient evidence to support the claim that there is an association between the Sex of the student and how often the student exercises.

