Problem 6: R simulation. Use the following code to read the observed frequencies of how much the students smoke in the dataset "survey".
library(MASS) # if you have not installed the package, run install.packages("MASS") first.
survey = na.omit(survey) # remove data with missing values
smoke.freq=table(survey$Smoke) # the frequencies of smoke
(1) Suppose the campus smoking statistics are as below. Determine whether the sample data in survey support them at the 0.05 significance level. (Hint: use the command chisq.test.)
Heavy | Never | Occas | Regul |
0.045 | 0.795 | 0.085 | 0.075 |
(2) Describe the test associated with the command chisq.test(smoke.freq).
(3) Use the code table(survey$Sex, survey$Exer) to obtain a contingency table of the variables Sex (the sex of the student) and Exer (how often the student exercises). Test whether there is an association between the two variables.
R code to read the data (all statements starting with # are comments)
# if you have not installed the package, run
# install.packages("MASS") first.
library(MASS)
# remove data with missing values
survey = na.omit(survey)
#create a table of frequency
smoke.freq=table(survey$Smoke)
#print the table
print(smoke.freq)
# get the following output
1) The output above gives the observed frequencies from the sample. Suppose the probabilities of the different levels of smoking are as given below.
Frequency of smoking | Probabilities |
Heavy | 0.045 |
Never | 0.795 |
Occas | 0.085 |
Regul | 0.075 |
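We want to test whether the observed frequencies from the sample support these probabilities.
We want to test the following hypotheses:
H0: the proportions of students in the four smoking categories equal the probabilities given above.
H1: at least one proportion differs from the given probability.
We use the chi-square goodness-of-fit test to test these hypotheses.
R code to get the test statistic and the p-value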
#Set the campus smoking statistics
prob<-c(0.045, 0.795, 0.085, 0.075)
#perform the chi square test
chisq.test(smoke.freq,p=prob)
# output is
The sample chi-square statistic is 0.3132 and the p-value is 0.9575. Since the p-value is greater than the significance level alpha = 0.05, we cannot reject the null hypothesis.
We conclude that there is not sufficient evidence to reject the claim that the campus smoking statistics describe the sample data in survey.
That is, the sample data are consistent with the probabilities given in the table. Failing to reject the null hypothesis does not prove that these probabilities are exactly correct.
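The statistic in part 1 can be checked by hand from the definition, the sum of (observed - expected)^2 / expected. The sketch below is only an illustration: the observed counts (7, 134, 13, 14, total 168) are an assumption, since the printed table is not reproduced in this answer. They were chosen because they give the reported statistics. The names observed, expected, stat and p_value are introduced here for the example.

```r
# ASSUMED observed counts (not shown in the answer above); chosen because
# they reproduce the reported statistics 0.3132 and 269.38.
observed <- c(Heavy = 7, Never = 134, Occas = 13, Regul = 14)

# Campus smoking statistics from the problem, in the same category order
prob <- c(0.045, 0.795, 0.085, 0.075)

# Expected counts under the null hypothesis
expected <- sum(observed) * prob

# Chi-square goodness-of-fit statistic
stat <- sum((observed - expected)^2 / expected)

# Degrees of freedom: number of categories minus 1
df <- length(observed) - 1

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

print(round(c(statistic = stat, df = df, p.value = p_value), 4))
```

With these assumed counts, the statistic and p-value should agree with the values reported by chisq.test(smoke.freq, p = prob) up to rounding.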
2) When we use the command chisq.test(smoke.freq), we do not supply a proposed probability distribution to the function chisq.test() through the argument "p". This means that the function uses its default probability distribution, which assumes that every class/group/level is equally likely, that is, the groups are uniformly distributed. In effect, we use the following probabilities for the smoking frequencies
Frequency of smoking | Probabilities |
Heavy | 0.25 |
Never | 0.25 |
Occas | 0.25 |
Regul | 0.25 |
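and we test whether the above uniform probability distribution is a good description of the sample data in survey.
The hypotheses have the same form as in part 1, with the uniform probabilities in place of the campus statistics. H0: each smoking category has probability 0.25. H1: at least one category has a probability different from 0.25.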
R code
#chisquare test against uniform distribution
chisq.test(smoke.freq)
# output is
The sample test statistic is 269.38 and the p-value is less than 2.2e-16. Since this p-value is less than the level of significance alpha = 0.05, we reject the null hypothesis.
We conclude that there is sufficient evidence that the uniform distribution is not a good description of the sample data.
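To illustrate the default behaviour described in part 2, the sketch below compares chisq.test() without the argument p against an explicit uniform p. As before, the counts are an assumption (the printed table is not reproduced in this answer), and the names smoke.counts, default.test and uniform.test are introduced here for the example.

```r
# ASSUMED observed counts (not shown in the answer above)
smoke.counts <- c(Heavy = 7, Never = 134, Occas = 13, Regul = 14)
k <- length(smoke.counts)

# Default call: no p supplied
default.test <- chisq.test(smoke.counts)

# Explicit uniform probabilities, 1/k for each category
uniform.test <- chisq.test(smoke.counts, p = rep(1 / k, k))

# The two calls should give the same statistic
print(unname(default.test$statistic))
print(all.equal(default.test$statistic, uniform.test$statistic))

# The p-value is far below 2.2e-16, which is why R prints "< 2.2e-16"
print(default.test$p.value < 2.2e-16)
```

The value 2.2e-16 is the machine epsilon for double precision, so it is a printing threshold and not the exact p-value.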
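3) We want to test whether there is an association between the variables Sex (the sex of the student) and Exer (how often the student exercises).
We want to test the following hypotheses:
H0: Sex and Exer are independent (there is no association).
H1: Sex and Exer are not independent (there is an association).
We use the chi-square test of independence to test the association.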
We use the following R Code
# get the contingency table of the variables Sex and Exer
tab<-table(survey$Sex, survey$Exer)
print(tab)
#perform the chi square test
chisq.test(tab)
# get the output
The test statistic is 4.1585 and the p-value is 0.125.
Since the p-value is greater than the level of significance alpha = 0.05, we cannot reject the null hypothesis.
We conclude that there is not sufficient evidence to support the claim that there is an association between the sex of the student and how often the student exercises.
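The p-value in part 3 can be recovered from the reported statistic alone. The sketch below assumes that Sex has 2 levels and Exer has 3 levels in the survey data, which gives (2 - 1) * (3 - 1) = 2 degrees of freedom. The names n.sex, n.exer, stat and p_value are introduced here for the example.

```r
# ASSUMED numbers of levels: Sex has 2, Exer has 3
n.sex <- 2
n.exer <- 3

# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
df <- (n.sex - 1) * (n.exer - 1)

# Test statistic reported in the answer above
stat <- 4.1585

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(stat, df = df, lower.tail = FALSE)

print(round(p_value, 3))  # 0.125
print(p_value > 0.05)     # TRUE, so the null hypothesis is not rejected
```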