In: Statistics and Probability
FREE RESPONSE Questions
- This is a study guide for my statistics final. If anyone can show a few examples on these, or even just on some of them, it would be much appreciated! (:
Example: As an example of a comparison of three means,
consider a single factor experiment: The following coded results
were obtained from a single factor randomized experiment, in which
the outputs of three machines were compared. Determine if there is
a significant difference in the results (α = 0.05).
ΣX = 30, N = 15, Total DF = N – 1 = 15 – 1 = 14
GM = ΣX/N = 30/15 = 2.0
ΣX² = 222, CM = (ΣX)²/N = (30)²/15 = 60
Total SS = ΣX² – CM = 222 – 60 = 162
Σ(TCM) = 197.2
SST = Σ(TCM) – CM = 197.2 – 60 = 137.2
SSE = Total SS – SST = 162 – 137.2 = 24.8
The completed ANOVA table is:

Source                   SS      DF    MS      F
Machines (treatments)   137.2     2   68.6   33.2
Error                    24.8    12    2.07
Total                   162.0    14

The critical value is F(0.05, 2, 12) = 3.89.
Since the computed value of F (33.2) exceeds the
critical value of F, the null hypothesis is rejected. Thus, there
is evidence that a real difference exists among the machine
means.
σe is the pooled standard deviation of the within-treatment variation. It can also be considered the process capability sigma of individual measurements. It is the variation within measurements that would still remain if the differences among treatment means were eliminated.
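Below is a minimal Python sketch of the same single factor ANOVA bookkeeping (CM, Total SS, SST, SSE, F, and the pooled σe). The machine readings in it are made-up placeholder values, not the coded data from the example, so its printed numbers will differ from the table above.

```python
# Minimal sketch of the one-way (single factor) ANOVA hand calculation above.
# The machine data here is made up for illustration; it is NOT the original data.
import numpy as np
from scipy import stats

machines = {
    "M1": np.array([4, 5, 3, 4, 5], dtype=float),
    "M2": np.array([2, 1, 2, 3, 2], dtype=float),
    "M3": np.array([0, 1, 1, 0, 1], dtype=float),
}

all_x = np.concatenate(list(machines.values()))
N = all_x.size                              # total observations
k = len(machines)                           # number of treatments (machines)

CM = all_x.sum() ** 2 / N                   # correction for the mean, (ΣX)²/N
total_ss = (all_x ** 2).sum() - CM          # Total SS = ΣX² - CM
sst = sum(g.sum() ** 2 / g.size for g in machines.values()) - CM   # treatment SS
sse = total_ss - sst                        # error (within) SS

df_t, df_e = k - 1, N - k
mst, mse = sst / df_t, sse / df_e
F = mst / mse
sigma_e = np.sqrt(mse)                      # pooled within-treatment standard deviation

print(f"F = {F:.2f}, critical F(0.05, {df_t}, {df_e}) = {stats.f.ppf(0.95, df_t, df_e):.2f}")
print(f"pooled sigma_e = {sigma_e:.2f}")
```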
ANOVA Table for an A x B Factorial Experiment
In a factorial experiment involving factor A at a levels
and factor B at b levels, the total sum of squares can be
partitioned into:
Total SS = SS(A) + SS(B) + SS(AB) + SSE
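Here is a short numerical sketch of that partition for a balanced two-factor layout. The data is randomly generated for illustration, and the shape (3 levels of A, 2 levels of B, 4 replicates) is an arbitrary assumption.

```python
# Sketch of the A x B sum-of-squares partition for a balanced two-factor layout.
# y[i, j, r] = observation r at level i of factor A and level j of factor B.
# The values are arbitrary illustrative numbers, not data from the text.
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3, 2, 4                      # a levels of A, b levels of B, n replicates
y = rng.normal(10, 1, size=(a, b, n))

N = y.size
CM = y.sum() ** 2 / N
total_ss = (y ** 2).sum() - CM

ss_a = (y.sum(axis=(1, 2)) ** 2).sum() / (b * n) - CM      # factor A
ss_b = (y.sum(axis=(0, 2)) ** 2).sum() / (a * n) - CM      # factor B
ss_cells = (y.sum(axis=2) ** 2).sum() / n - CM             # A x B cell SS
ss_ab = ss_cells - ss_a - ss_b                             # interaction
sse = total_ss - ss_cells                                  # within-cell error

assert np.isclose(total_ss, ss_a + ss_b + ss_ab + sse)     # the partition holds
```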
ANOVA Table for a Randomized Block Design
The randomized block design implies the presence of two
independent variables, blocks and treatments. The total sum of
squares of the response measurements can be partitioned into three
parts, the sum of the squares for the blocks, treatments, and
error. The analysis of a randomized block design is less complex than that of an A x B factorial experiment.
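A comparable sketch for the randomized block partition, Total SS = SS(blocks) + SS(treatments) + SSE, using a small made-up response table (4 blocks by 3 treatments):

```python
# Sketch of the randomized block partition: Total SS = SS(blocks) + SS(treatments) + SSE.
# y[i, j] = response of treatment j in block i; one observation per block/treatment cell.
# Values are illustrative only.
import numpy as np

y = np.array([[9.0, 11.0, 10.0],
              [8.0, 12.0,  9.0],
              [7.0, 10.0,  9.0],
              [8.0, 11.0, 10.0]])        # 4 blocks (rows) x 3 treatments (columns)

b, t = y.shape
N = y.size
CM = y.sum() ** 2 / N
total_ss = (y ** 2).sum() - CM
ss_blocks = (y.sum(axis=1) ** 2).sum() / t - CM
ss_treat = (y.sum(axis=0) ** 2).sum() / b - CM
sse = total_ss - ss_blocks - ss_treat    # error SS by subtraction, DF = (b-1)(t-1)

F = (ss_treat / (t - 1)) / (sse / ((b - 1) * (t - 1)))
print(f"F for treatments = {F:.2f}")
```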
Goodness-of-Fit Tests
GOF (goodness-of-fit) tests are part of a class of procedures that are structured in cells. In each cell there is an observed frequency, Fo. From the nature of the problem, one either knows the expected or theoretical frequency, Fe, or can calculate it. Chi square (χ²) is then summed across all cells according to the formula:

χ² = Σ (Fo – Fe)² / Fe

The calculated chi square is then compared to the critical chi square value for the appropriate degrees of freedom. The degrees of freedom equal the number of cells minus one for the use of the total sample frequency and minus one more for each distribution parameter estimated from the sample data, as illustrated in the examples below.
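As a tiny illustration, that sum can be written as a one-line Python helper; the caller supplies matched lists of observed and expected cell frequencies.

```python
# Chi square GOF sum: chi2 = sum over cells of (Fo - Fe)^2 / Fe
def chi_square(observed, expected):
    """Observed and expected are matched sequences of cell frequencies."""
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# e.g. chi_square([12, 8, 10], [10, 10, 10]) returns 0.8
```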
Uniform Distribution (GOF):
Example: Is a game die balanced? The null hypothesis, H0, states the die is honest and balanced. When a die is rolled, the expectation is that each side should come up an equal number of times. It is obvious there will be random departures from this theoretical expectation if the die is honest. A die was tossed 48 times with the following results:
The calculated chi square is 8.75. The critical chi square is χ²(0.05, 5) = 11.07. The calculated chi square does not exceed the critical chi square; therefore, the hypothesis of an honest die cannot be rejected. The random departures from theoretical expectation could well be explained by chance cause.
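For reference, the same style of test can be run with scipy. The face counts below are placeholders that sum to 48 rolls; they are not the counts from the original table, so the statistic will not equal 8.75.

```python
# Sketch of the die test with scipy. The face counts below are made-up
# placeholders that sum to 48; they are not the counts from the original table.
from scipy import stats

observed = [5, 8, 9, 11, 6, 9]           # hypothetical counts for faces 1..6
chi2, p = stats.chisquare(observed)      # expected defaults to a uniform 48/6 = 8 per face
critical = stats.chi2.ppf(0.95, df=len(observed) - 1)
print(f"chi2 = {chi2:.2f}, critical = {critical:.2f}, p = {p:.3f}")
```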
Normal Distribution (GOF):
Example: The following data (105 observations) is taken from an X̄–R chart. There is sufficient data for ten cells; the alternative would be six cells, which is too few. Twelve integer cells fit the range of the data. The null hypothesis: the data was obtained from a normal distribution. X̄ = 15.4, sigma = 1.54, number of effective cells = 6, DF = 3, and χ²(0.05, 3) = 7.81.
One degree of freedom is lost because X̄ estimates μ. A second degree of freedom is lost because the sample standard deviation estimates sigma. A third degree of freedom is lost because the sample N represents the population.
Conclusion: Since the calculated chi square, 6.057, is less than the critical chi square, 7.81, we fail to reject the null hypothesis of normality and, therefore, conclude that the data is from a normal distribution.
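A sketch of the normal GOF mechanics in Python. Only the fitted mean (15.4), sigma (1.54), and n = 105 come from the example; the cell boundaries and observed counts are placeholder assumptions, so the statistic will not reproduce 6.057.

```python
# Sketch of the normal GOF mechanics. The cell boundaries and observed counts
# below are illustrative placeholders, not the original X-bar/R chart data;
# only the fitted mean (15.4), sigma (1.54) and n (105) come from the text.
import numpy as np
from scipy import stats

mean, sigma, n = 15.4, 1.54, 105
edges = np.array([-np.inf, 13.5, 14.5, 15.5, 16.5, 17.5, np.inf])   # hypothetical cells
observed = np.array([12, 20, 30, 25, 12, 6])                        # hypothetical counts, sum = 105

probs = np.diff(stats.norm.cdf(edges, loc=mean, scale=sigma))        # cell probabilities
expected = probs * n
chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 3        # lose 3 DF: mean, sigma, and total count come from the sample
print(f"chi2 = {chi2:.2f}, critical chi2(0.05, {df}) = {stats.chi2.ppf(0.95, df):.2f}")
```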
Poisson Distribution (GOF)
Example: The bead drum is an attribute variable, random
sample generating device, which was used to obtain the following
data. In this exercise red beads represent defects. Seventy-five
constant size samples were obtained. The goodness-of-fit test is
analyzed based on sample statistics. The null hypothesis is that
the bead drum samples represent a Poisson distribution.
N = 75
Sample average = 269/75 = 3.59 defects per sample
DF = 7 – 2 = 5
χ²(0.05, 5) = 11.07
One degree of freedom is lost because the sample average (3.59) estimates μ. A second degree of freedom is lost because N (the number of samples) represents the population.
Conclusion: Since the calculated chi square of 4.47 is less than the critical chi square value of 11.07 at the 95% confidence level, we fail to reject the null hypothesis that the bead drum samples represent a Poisson distribution.
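A sketch of the Poisson GOF mechanics. Only N = 75 samples and the mean of 3.59 defects per sample come from the example; the per-cell observed frequencies are placeholder assumptions, so the statistic will not reproduce 4.47.

```python
# Sketch of the Poisson GOF mechanics. The observed frequencies per defect
# count are illustrative placeholders; only N = 75 and the sample average
# of 3.59 defects per sample come from the text.
import numpy as np
from scipy import stats

n, lam = 75, 3.59
counts = np.arange(7)                                  # cells for 0..5 defects and "6 or more"
observed = np.array([2, 10, 16, 18, 14, 8, 7])         # hypothetical, sums to 75

pmf = stats.poisson.pmf(counts, mu=lam)
probs = np.append(pmf[:-1], 1 - pmf[:-1].sum())        # last cell pools the upper tail
expected = probs * n
chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 2                                 # lose 2 DF: lambda and the total count
print(f"chi2 = {chi2:.2f}, critical chi2(0.05, {df}) = {stats.chi2.ppf(0.95, df):.2f}")
```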
Binomial Distribution (GOF)
Example: The null hypothesis states that the following
industrial sample data comes from a binomial population of
defectives (N = 80). In this case, we will estimate the probability
of a defective from the sample data, p = 0.025625.
One degree of freedom is lost because the total sample frequency represents the population. A second degree of freedom is lost because the sample proportion defective, p = 0.025625, is used to estimate the population fraction defective.
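A sketch of the binomial GOF mechanics. Only N = 80 samples and p = 0.025625 come from the example; the per-sample size of 100 and the observed cell frequencies are placeholder assumptions.

```python
# Sketch of the binomial GOF mechanics. Only N = 80 samples and p = 0.025625
# come from the text; the per-sample size of 100 and the observed frequencies
# of 0, 1, 2, 3, and 4-or-more defectives are illustrative assumptions.
import numpy as np
from scipy import stats

n_samples, size, p = 80, 100, 0.025625
cells = np.arange(5)                                   # 0, 1, 2, 3, and "4 or more" defectives
observed = np.array([7, 21, 24, 17, 11])               # hypothetical, sums to 80

pmf = stats.binom.pmf(cells, n=size, p=p)
probs = np.append(pmf[:-1], 1 - pmf[:-1].sum())        # pool the upper tail into the last cell
expected = probs * n_samples
chi2 = ((observed - expected) ** 2 / expected).sum()
df = len(observed) - 2                                 # lose 2 DF: p-hat and the total frequency
print(f"chi2 = {chi2:.2f}, critical chi2(0.05, {df}) = {stats.chi2.ppf(0.95, df):.2f}")
```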
Contingency Tables
A two-way classification table (rows and columns)
containing original frequencies can be analyzed to determine
whether the two variables (classifications) are independent or have
significant association. R. A. Fisher determined that when the marginal totals (of rows and columns) are analyzed in a certain way, the chi square procedure will test whether there is dependency between the two classifications. In addition, a
contingency coefficient (correlation) can be calculated. If the chi
square test shows a significant dependency, the contingency
coefficient shows the strength of the correlation. Results obtained in samples do not always agree exactly with the theoretical results expected according to the rules of probability. A measure of the difference found between observed and expected frequencies is supplied by the statistic chi square, χ², where:

χ² = Σ (Fo – Fe)² / Fe

summed over all cells, with Fo the observed and Fe the expected frequency in each cell.
If χ² = 0, the observed and theoretical frequencies agree exactly. If χ² > 0, they do not agree exactly. The larger the value of χ², the greater the discrepancy between observed and theoretical frequencies. The chi square distribution is an appropriate reference distribution for critical values when the expected frequencies are at least 5.
Example: The calculation for the E (expected or theoretical)
frequency will be demonstrated in the following example. Five
hospitals tried a new drug to alleviate the symptoms of emphysema.
The results were classified at three levels: no change, slight
improvement, marked improvement. The percentage matrix is shown in the table below. While the results expressed as percentages do suggest differences among hospitals, ratios presented as percentages can be misleading.
A proper analysis requires that the original data be considered as frequency counts. The table below lists the original data on which the percentages are based. The calculation of expected, or theoretical, frequencies is based on the marginal totals. The marginal totals for the frequency data are the column totals, the row totals, and the grand total. The null hypothesis is that all hospitals have the same proportions over the three levels of classification. Calculating the expected frequencies for each of the 15 cells under the null hypothesis requires the manipulation of the marginal totals, as illustrated by the following calculation for one cell. Consider the count of 15 for the Hospital A / no change cell. The expected value, E, is:

E = (row total x column total) / grand total
The same procedure repeated for the other 14 cells yields the full table of expected frequencies. Each of these 15 cells makes a contribution to chi square (χ²). For the same selected (illustrative) cell, the contribution is (O – E)²/E. The calculated chi square is the sum of these contributions over all cells.
Assume alpha to be 0.01. The degrees of freedom for a contingency table are: d.f. = (rows – 1) x (columns – 1).
For this example: d.f. = (5 – 1) x (3 – 1) = 8
The critical chi square is χ²(0.01, 8) = 20.09.
The calculated chi square is larger than critical chi square.
Therefore, one rejects the null hypothesis of hospital equality of
results. The alternative hypothesis is that hospitals
differ.
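The whole contingency analysis (expected frequencies, chi square, and degrees of freedom) can be sketched with scipy. The 5 x 3 table below is invented for illustration (only the 15 in the Hospital A / no change cell echoes the text), so the printed chi square will not match the example's.

```python
# Sketch of the contingency-table test. The 5 x 3 table below is made up for
# illustration; it is not the hospital data from the text.
import numpy as np
from scipy import stats

table = np.array([[15, 20, 25],     # rows = hospitals A..E
                  [10, 18, 22],     # columns = no change, slight, marked improvement
                  [12, 15, 13],
                  [ 9, 11, 10],
                  [14, 16, 20]])

chi2, p, df, expected = stats.chi2_contingency(table, correction=False)
critical = stats.chi2.ppf(0.99, df)            # alpha = 0.01, df = (5-1)(3-1) = 8
print(f"chi2 = {chi2:.2f}, critical = {critical:.2f}, df = {df}")
print("expected frequencies:\n", expected.round(1))   # (row total x column total) / grand total
```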
Coefficient of Contingency (C)
The degree of relationship, association, or dependence of the classifications in a contingency table is given by:

C = √(χ² / (χ² + N))

where N equals the grand frequency total. For this example, the contingency coefficient is C = 0.38.
The maximum value of C is never greater than 1.0 and depends on the total number of rows and columns. For the example data, the maximum coefficient of contingency is:

C_max = √((k – 1) / k) = √((3 – 1) / 3) ≈ 0.82

where k = min(r, c), r = rows, and c = columns.
There is a Yates correction for continuity that can be applied when the contingency table has exactly two rows and two columns, that is, when the degrees of freedom equal 1.
Correlation of Attributes
Contingency table classifications often describe characteristics of objects or individuals. Thus, they are often referred to as attributes, and the degree of dependence, association, or relationship is called correlation of attributes. For square (k = r = c) tables, the correlation coefficient, φ, is defined as:

φ = √(χ² / (N(k – 1)))

The value of φ falls between 0 and 1. If the calculated value of chi square is significant, then φ is significant. In the above example, the numbers of rows and columns are not equal, so the correlation calculation is not applied.
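A small sketch of these association measures, assuming a chi square value and grand total N are already in hand. The numbers used are placeholders, not the hospital example's, and the φ formula mirrors the definition sketched above for square tables.

```python
# Association measures for a contingency table, computed from chi2 and N.
# The phi formula follows the square-table (k = r = c) definition given above.
import math

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + N)): degree of association."""
    return math.sqrt(chi2 / (chi2 + n))

def max_contingency_coefficient(rows, cols):
    """C_max = sqrt((k - 1) / k), where k = min(rows, cols)."""
    k = min(rows, cols)
    return math.sqrt((k - 1) / k)

def phi_coefficient(chi2, n, k):
    """phi = sqrt(chi2 / (N * (k - 1))) for a k x k table."""
    return math.sqrt(chi2 / (n * (k - 1)))

chi2, n = 25.0, 156                              # placeholder values
print(contingency_coefficient(chi2, n))          # degree of association, ~0.37 here
print(max_contingency_coefficient(5, 3))         # upper bound for a 5 x 3 table, ~0.82
```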