Generate a simulated data set with 100 observations based on the following model. Each data point is a vector Z = (X, Y), where X describes the age of a machine (New, FiveYearsOld, or TenYearsOld) and Y describes whether the quality of the machine's output is Normal or Abnormal. The probabilities of a machine being in the three states are
P(X = New) = 1/4
P(X = FiveYearsOld) = 1/3
P(X = TenYearsOld) = 5/12
The probabilities of Normal output conditioned on machine age are
P(Y = Normal | X = New) = 8/10
P(Y = Normal | X = FiveYearsOld) = 8/10
P(Y = Normal | X = TenYearsOld) = 4/10
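One way to draw the 100 observations is a minimal base-R sketch like the following. It first samples each machine's age from the marginal probabilities with sample(), then draws a Normal/Abnormal outcome by comparing a uniform draw from runif() against that age's conditional probability. The seed and the helper names (ages, p_age, p_normal) are illustrative choices and not part of the exercise; any equivalent sampling scheme would do.

```r
# Minimal sketch: simulate n = 100 observations Z = (X, Y) from the stated model.
set.seed(1)  # arbitrary seed, used only so the run is reproducible

n <- 100
ages  <- c("New", "FiveYearsOld", "TenYearsOld")
p_age <- c(1/4, 1/3, 5/12)  # P(X = age); these sum to 1

# P(Y = Normal | X = age), indexed by age name (hypothetical helper vector)
p_normal <- c(New = 8/10, FiveYearsOld = 8/10, TenYearsOld = 4/10)

# Draw each machine's age from the marginal distribution of X
x <- sample(ages, size = n, replace = TRUE, prob = p_age)

# For each machine, output is Normal with its age-specific probability
y <- ifelse(runif(n) < unname(p_normal[x]), "Normal", "Abnormal")

# Both vectors are plain character vectors at this point
class(x)
class(y)
table(x, y)
```

Because Normal output is much less likely for TenYearsOld machines (4/10) than for the other two ages (8/10), X and Y are not independent under this model, and a large enough sample should tend to show that in the cross-tabulation.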
Your data should consist of two vectors, x and y, holding the X and Y values respectively, both of class character. Convert these to factors using the as.factor function. Analyze your simulated data using the chisq.test function with inputs x=x, y=y. Repeat the analysis with the same function, but with simulated p-values, using the inputs x=x, y=y, simulate.p.value=TRUE, B=10000. Would you trust the p-values from the asymptotic distribution or the simulated p-values more? What conclusions can you draw about your simulated data from this analysis?
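A self-contained sketch of the two analyses in base R is below. It regenerates the data (same assumed seed and sampling scheme as a simple simulation would use), converts both vectors with as.factor(), and calls chisq.test() once with the asymptotic chi-squared reference distribution and once with simulate.p.value = TRUE and B = 10000. Note that the argument is spelled simulate.p.value (singular); a misspelled argument name is rejected as unused. The variable names asym and sim are illustrative only.

```r
# Minimal sketch: simulate the data, convert to factors, and run both tests.
set.seed(1)  # arbitrary seed, for reproducibility only

n <- 100
ages <- c("New", "FiveYearsOld", "TenYearsOld")
p_normal <- c(New = 0.8, FiveYearsOld = 0.8, TenYearsOld = 0.4)
x <- sample(ages, n, replace = TRUE, prob = c(1/4, 1/3, 5/12))
y <- ifelse(runif(n) < unname(p_normal[x]), "Normal", "Abnormal")

# Convert the character vectors to factors, as the exercise asks
x <- as.factor(x)
y <- as.factor(y)

# Pearson chi-squared test of independence on the 3 x 2 table,
# referred to a chi-squared distribution with (3 - 1) * (2 - 1) = 2 df
asym <- chisq.test(x = x, y = y)

# Same statistic, but the p-value is estimated by Monte Carlo from
# B = 10000 random tables with the same row and column totals
sim <- chisq.test(x = x, y = y, simulate.p.value = TRUE, B = 10000)

asym
sim

# Expected counts under independence; a common rule of thumb is that the
# asymptotic approximation is reasonable when all of them are at least 5
asym$expected
```

When comparing the two p-values, the expected counts are the main thing to check. If any expected count is below 5, R warns that the chi-squared approximation may be incorrect, and the simulated p-value is usually the more trustworthy of the two. When all expected counts are comfortably large, the two p-values should be close, and a small p-value from either is evidence against independence of machine age and output quality.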