
Characteristics of an ANOVA (for both one-way and repeated measure)
- Finding df
- Finding sample size (n) from df
- Finding sample size (N) from df
- Characteristics of an F-distribution
- N vs n
- What affects the size of your F-ratio
- What is MS
Correlation
- meaning of
- calculation
- interpretation
- SP

FREE RESPONSE Questions

Calculating an ANOVA (such as finding df, SS, MS, etc.)
Interpreting SPSS output
MORE calculating an ANOVA
Correlation
APA formatting

- This is a study guide for my statistics final. I would much appreciate it if anyone could show a few examples for some or all of these! (:

Solutions

Expert Solution

ΣX² is called the crude sum of squares.
(ΣX)²/N is the CM (correction for the mean), or CF (correction factor).
ΣX² – (ΣX)²/N is termed SS (total sum of squares, or corrected SS).
The variance σ² is estimated by s² = SS/(N – 1) = [ΣX² – (ΣX)²/N]/(N – 1).
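A minimal sketch of the shortcut formulas above, using a small hypothetical data set (not from the original post). It checks that the "crude" route ΣX² – CM gives the same SS as summing squared deviations from the mean, then divides SS by N – 1 for the usual sample variance estimate.

```python
# Hypothetical measurements, used only to illustrate the formulas
data = [4.0, 7.0, 5.0, 6.0, 3.0]

n = len(data)
sum_x = sum(data)
crude_ss = sum(x * x for x in data)   # ΣX², the crude sum of squares
cm = sum_x ** 2 / n                   # CM = (ΣX)²/N, correction for the mean
ss = crude_ss - cm                    # corrected SS

mean = sum_x / n
ss_definitional = sum((x - mean) ** 2 for x in data)  # Σ(X – X̄)², same value

variance = ss / (n - 1)               # sample variance estimate, SS/(N – 1)
print(crude_ss, cm, ss, ss_definitional, variance)  # 135.0 125.0 10.0 10.0 2.5
```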
In the one-way ANOVA, the total variation in the data has two parts: the variation among treatment means and the variation within treatments.
The grand average GM = ΣX/N
The total sum of squares (Total SS) is then:
Total SS = Σ(Xᵢ – GM)², where Xᵢ is any individual measurement.
Total SS = SST + SSE, where SST is the treatment sum of squares and SSE is the experimental error sum of squares.
SST is the sum of the squared deviations of each treatment average from the grand average (grand mean), with each squared deviation weighted by the number of observations in that treatment.
SSE is the sum of the squared deviations of each individual observation within a treatment from that treatment's average.
For the ANOVA calculations:
Total treatment CM: Σ(TCM) = Σ(Tⱼ²/nⱼ), where Tⱼ is the total and nⱼ the number of observations for treatment j.
SST = Σ(TCM) – CM
SSE = Total SS – SST (Always obtained by difference)
Total DF = N – 1 (Total Degrees of Freedom)
TDF = K – 1 (Treatment DF = Number of treatments minus 1)
EDF = (N – 1) – (K – 1) = N – K (Error DF, always obtained by difference)
MST = SST/TDF = SST/(K – 1) (Mean Square Treatments)
MSE = SSE/EDF = SSE/(N – K) (Mean Square Error)
To test the null hypothesis:
H₀: μ₁ = μ₂ = μ₃ = … = μₖ
H₁: At least one mean is different
F = MST/MSE. When F > Fα (the critical value for TDF and EDF degrees of freedom), reject H₀.
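The computational steps above can be sketched as a small function. This is an illustration under stated assumptions, not the answer's own code: the function name `one_way_anova` and the sample groups are hypothetical, and the critical F value still has to come from an F table.

```python
def one_way_anova(groups):
    """One-way ANOVA via the CM shortcut; returns (SST, SSE, TDF, EDF, MST, MSE, F)."""
    all_x = [x for g in groups for x in g]
    n_total = len(all_x)
    k = len(groups)
    cm = sum(all_x) ** 2 / n_total                        # CM = (ΣX)²/N
    total_ss = sum(x * x for x in all_x) - cm             # Total SS = ΣX² – CM
    sum_tcm = sum(sum(g) ** 2 / len(g) for g in groups)   # Σ(TCM) = Σ(Tj²/nj)
    sst = sum_tcm - cm                                    # SST = Σ(TCM) – CM
    sse = total_ss - sst                                  # SSE obtained by difference
    tdf, edf = k - 1, n_total - k                         # TDF = K – 1, EDF = N – K
    mst, mse = sst / tdf, sse / edf
    return sst, sse, tdf, edf, mst, mse, mst / mse

# Hypothetical data: three treatments with three observations each
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # (54.0, 6.0, 2, 6, 27.0, 1.0, 27.0)
```

Computing SSE by difference mirrors the hand method described above; it saves a pass over the data but inherits any rounding in Total SS and SST.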
Example: Consider a comparison of three means in a single-factor experiment. The following coded results were obtained from a single-factor randomized experiment in which the outputs of three machines were compared. Determine whether there is a significant difference in the results (α = 0.05).
ΣX = 30, N = 15, Total DF = N – 1 = 15 – 1 = 14
GM = ΣX/N = 30/15 = 2.0
ΣX² = 222, CM = (ΣX)²/N = (30)²/15 = 60
Total SS = ΣX² – CM = 222 – 60 = 162
Σ(TCM) = 197.2
SST = Σ(TCM) – CM = 197.2 – 60 = 137.2
SSE = Total SS – SST = 162 – 137.2 = 24.8

The completed ANOVA table is:

Source        SS      DF   MS      F
Treatments    137.2   2    68.6    33.2
Error         24.8    12   2.067
Total         162.0   14

Since the computed value of F (33.2) exceeds the critical value F(0.05; 2, 12) = 3.89, the null hypothesis is rejected. Thus, there is evidence that a real difference exists among the machine means.
σe is the pooled standard deviation of the within-treatment variation: σe = √MSE = √(24.8/12) ≈ 1.44. It can also be considered the process capability sigma of individual measurements, that is, the variation among measurements that would still remain if the differences among treatment means were eliminated.
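The machine example can be checked from its summary statistics alone. The sketch below uses only the numbers quoted in the example (ΣX, N, ΣX², Σ(TCM), K = 3); the variable names are illustrative, and the raw machine data are not needed.

```python
import math

# Summary statistics quoted in the machine example
sum_x, n_total, k = 30, 15, 3
sum_x2 = 222
sum_tcm = 197.2                    # Σ(TCM), taken from the text

cm = sum_x ** 2 / n_total          # correction for the mean, 60
total_ss = sum_x2 - cm             # 162
sst = sum_tcm - cm                 # 137.2
sse = total_ss - sst               # 24.8, by difference
tdf, edf = k - 1, n_total - k      # 2 and 12
mst, mse = sst / tdf, sse / edf
f_ratio = mst / mse
sigma_e = math.sqrt(mse)           # pooled within-treatment standard deviation

print(f"{'Source':<11}{'SS':>8}{'DF':>5}{'MS':>9}{'F':>7}")
print(f"{'Treatments':<11}{sst:>8.1f}{tdf:>5}{mst:>9.3f}{f_ratio:>7.1f}")
print(f"{'Error':<11}{sse:>8.1f}{edf:>5}{mse:>9.3f}")
print(f"{'Total':<11}{total_ss:>8.1f}{n_total - 1:>5}")
print(f"sigma_e = {sigma_e:.2f}")
```

The computed F (about 33.2) is then compared with the tabulated critical F for 2 and 12 degrees of freedom.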

ANOVA Table for an A x B Factorial Experiment

In a factorial experiment involving factor A at a levels and factor B at b levels, the total sum of squares can be partitioned into:
Total SS = SS(A) + SS(B) + SS(AB) + SSE
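The partition can be sketched for a balanced design, assuming every A x B cell has the same number of replicates n. The function name, the nested-list layout `data[i][j]`, and the sample numbers are hypothetical; the degrees of freedom shown (a – 1, b – 1, (a – 1)(b – 1), ab(n – 1)) are the standard ones for a balanced factorial.

```python
def factorial_ss(data):
    """Partition Total SS for a balanced a x b factorial experiment.

    data[i][j] holds the n replicate observations at level i of factor A
    and level j of factor B (every cell must have the same n).
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    all_x = [x for row in data for cell in row for x in cell]
    cm = sum(all_x) ** 2 / (a * b * n)                  # correction for the mean
    total_ss = sum(x * x for x in all_x) - cm
    a_totals = [sum(x for cell in data[i] for x in cell) for i in range(a)]
    b_totals = [sum(x for i in range(a) for x in data[i][j]) for j in range(b)]
    cell_totals = [sum(cell) for row in data for cell in row]
    ss_a = sum(t * t for t in a_totals) / (b * n) - cm
    ss_b = sum(t * t for t in b_totals) / (a * n) - cm
    ss_cells = sum(t * t for t in cell_totals) / n - cm
    ss_ab = ss_cells - ss_a - ss_b                      # interaction, by difference
    sse = total_ss - ss_cells                           # within-cell (error) SS
    ss = {"A": ss_a, "B": ss_b, "AB": ss_ab, "Error": sse, "Total": total_ss}
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (n - 1)}
    return ss, df

# Hypothetical 2 x 2 experiment with two replicates per cell
ss, df = factorial_ss([[[1, 3], [5, 7]],
                       [[2, 4], [10, 12]]])
print(ss, df)
```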

ANOVA Table for a Randomized Block Design

The randomized block design implies the presence of two independent variables, blocks and treatments. The total sum of squares of the response measurements can be partitioned into three parts: the sums of squares for blocks, treatments, and error. The analysis of a randomized block design is less complex than that of an A x B factorial experiment.
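A minimal sketch of the three-way partition, assuming one observation per block-treatment combination. The function name, the layout `data[i][j]` (block i, treatment j), and the numbers are hypothetical.

```python
def randomized_block_ss(data):
    """Partition Total SS into blocks, treatments and error.

    data[i][j] is the single response for block i under treatment j.
    """
    b, t = len(data), len(data[0])
    all_x = [x for row in data for x in row]
    cm = sum(all_x) ** 2 / (b * t)                      # correction for the mean
    total_ss = sum(x * x for x in all_x) - cm
    ss_blocks = sum(sum(row) ** 2 for row in data) / t - cm
    ss_treat = sum(sum(data[i][j] for i in range(b)) ** 2 for j in range(t)) / b - cm
    sse = total_ss - ss_blocks - ss_treat               # error, by difference
    ss = {"Blocks": ss_blocks, "Treatments": ss_treat, "Error": sse, "Total": total_ss}
    df = {"Blocks": b - 1, "Treatments": t - 1, "Error": (b - 1) * (t - 1)}
    return ss, df

# Hypothetical data: 3 blocks x 3 treatments
ss, df = randomized_block_ss([[1, 2, 3],
                              [2, 3, 4],
                              [4, 6, 8]])
print(ss, df)
```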
Goodness-of-Fit Tests

GOF (goodness-of-fit) tests belong to a class of procedures that are structured in cells. Each cell has an observed frequency (Fo). From the nature of the problem, one either knows the expected or theoretical frequency (Fe) or can calculate it. Chi square (χ²) is then summed across all cells according to the formula:
χ² = Σ (Fo – Fe)² / Fe
The calculated chi square is then compared to the critical chi square value for the appropriate degrees of freedom, where k is the number of effective cells:
Uniform distribution: DF = k – 1
Poisson distribution: DF = k – 2
Binomial distribution: DF = k – 2
Normal distribution: DF = k – 3
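A minimal sketch of the chi-square sum and the degrees-of-freedom rule, assuming the number of parameters lost per distribution follows the examples in this answer (uniform 1, Poisson 2, binomial 2, normal 3). The helper names are hypothetical.

```python
def chi_square(observed, expected):
    """χ² = Σ (Fo – Fe)² / Fe, summed over all cells."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    return sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Degrees of freedom lost for each fitted distribution, as used in the examples
DF_LOST = {"uniform": 1, "poisson": 2, "binomial": 2, "normal": 3}

def gof_df(effective_cells, distribution):
    """DF = number of effective cells minus the degrees of freedom lost."""
    return effective_cells - DF_LOST[distribution]

print(chi_square([10, 6], [8, 8]))   # (2² + 2²)/8 = 1.0
print(gof_df(6, "normal"))           # 3
```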

Uniform Distribution (GOF):

Example: Is a game die balanced? The null hypothesis, H₀, states that the die is honest and balanced. When a die is rolled, the expectation is that each side should come up an equal number of times. Even if the die is honest, there will be random departures from this theoretical expectation. A die was tossed 48 times with the following results:

The calculated chi square is 8.75. The critical chi square is χ²(0.05, 5) = 11.07. The calculated chi square does not exceed the critical chi square. Therefore, the hypothesis of an honest die cannot be rejected: the random departures from theoretical expectation could well be explained by chance causes.
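The die test can be sketched as follows. The original table of 48 rolls was not preserved, so the counts below are hypothetical and give a different chi square than the 8.75 quoted above; the expected frequency (48/6 = 8 per face), the DF = 6 – 1 rule, and the critical value 11.07 come from the text.

```python
# Hypothetical counts for 48 rolls (not the original data)
observed = [10, 6, 9, 7, 5, 11]
n_rolls = sum(observed)                 # 48
expected = [n_rolls / 6] * 6            # 8 per face under H0 (honest die)

chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1                  # uniform GOF: cells – 1 = 5
critical = 11.07                        # χ²(0.05, 5), quoted in the text
print(f"chi-square = {chi_sq:.2f}, df = {df}, reject H0: {chi_sq > critical}")
# chi-square = 3.50, df = 5, reject H0: False
```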

Normal Distribution (GOF):

Example: The following data (105 observations) are taken from an X̄–R chart. There are sufficient data for ten cells; the alternative, six cells, would be too few. Twelve integer cells fit the range of the data. The null hypothesis is that the data were obtained from a normal distribution. X̄ = 15.4, SD = 1.54 (the estimate of sigma), number of effective cells = 6, DF = 3, and χ²(0.05, 3) = 7.81.

One degree of freedom is lost because X̄ estimates μ. A second degree of freedom is lost because SD estimates sigma. A third degree of freedom is lost because the sample N represents the population.
Col A: The cell boundaries are one half unit from the cell midpoint.
Col B: The cell middle values are integers.
Col C: The observed frequencies in each cell are Fo.
Col D: Distances from X̄ are measured from the cell boundaries.
Col E: Distances from X̄ are divided by SD to transform them into Z units.
Col F: Z units are converted into cumulative normal distribution probabilities.
Col G: The theoretical probability in each cell is obtained by taking the difference between cumulative probabilities in Column F. The top cell theoretical probability boundary is 1.0000.
Col H: The theoretical frequency in each cell is the product of N and Column G.
Col I: Each cell is required to have a theoretical frequency equal to or greater than four. Therefore, the top four cells must be added to the cell whose midpoint is 18, and the bottom three cells must be added to the cell whose midpoint is 13. Thus, there are six effective cells, all of which have a theoretical frequency equal to or greater than four.
Col J: The observed frequency cells must be pooled to match the theoretical frequency cells. It does not matter if the observed frequencies are less than four.
Col K: The contributions to chi square are obtained by squaring the difference between Column I and Column J and dividing by Column I.
Conclusion: Since the calculated chi square, 6.057, is less than the critical chi square, 7.81, we fail to reject the null hypothesis of normality; the data are consistent with a normal distribution.
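Columns D through H can be sketched for the six effective cells using the quoted X̄ = 15.4, SD = 1.54 and N = 105. This is an illustration, not the original worksheet: the observed frequencies were not preserved, the pooled tails follow the Col I description (everything below 13.5 in the 13 cell, everything above 17.5 in the 18 cell), and the normal CDF is computed with `math.erf`, so the values may differ slightly from a table-based calculation.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal probability, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

X_BAR, SD, N_OBS = 15.4, 1.54, 105

# Inner boundaries of the six effective cells (midpoints 13 through 18).
# The bottom cell runs down to -infinity and the top cell up to +infinity,
# which is what pooling the tail cells amounts to.
boundaries = [13.5, 14.5, 15.5, 16.5, 17.5]
z_values = [(b - X_BAR) / SD for b in boundaries]                     # Col E
cumulative = [0.0] + [normal_cdf(z) for z in z_values] + [1.0]        # Col F
cell_probs = [hi - lo for lo, hi in zip(cumulative, cumulative[1:])]  # Col G
expected = [N_OBS * p for p in cell_probs]                            # Col H

for midpoint, fe in zip(range(13, 19), expected):
    print(midpoint, round(fe, 2))
```

Every expected frequency here comes out above four, which is consistent with the six effective cells described in Col I.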

Poisson Distribution (GOF)

Example: The bead drum is a random sample generating device for attribute data, and it was used to obtain the following data. In this exercise, red beads represent defects. Seventy-five constant-size samples were obtained. The goodness-of-fit test is based on sample statistics. The null hypothesis is that the bead drum samples represent a Poisson distribution.
N = 75
Sample Avg = 269/75 = 3.59
DF = 7 – 2 = 5 (7 effective cells)
χ²(0.05, 5) = 11.07
One degree of freedom is lost because (sample average = 3.59) estimates μ. A second degree of freedom is lost because N (number of samples) estimates the population.
Col A: Values of c (number of defects per sample) that match the actual distribution of sample defects found.
Col B: The probability that c defects would occur given the average value of the samples.
Col C: The theoretical number of defects that would occur (N x Col B).
Col D: The observed frequency of each number of defects.
Col E: The required minimum frequency of four for each effective cell resulted in pooling at both tails of the theoretical Poisson distribution.
Col F: The observed frequency distribution of defects must also be pooled to match the effective theoretical distribution.
Col G: The contributions to chi square are obtained from squaring the difference between Fe and Fo and dividing the result by Fe.
Conclusion: Since the calculated chi square of 4.47 is less than the critical chi square value of 11.07 at the 95% confidence level, we fail to reject the null hypothesis that the bead drum samples represent a Poisson distribution.
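Columns B, C and E can be sketched from the quoted sample statistics (N = 75, average 269/75). The cut-off `MAX_C = 12` and the helper names are assumptions; the observed counts were not preserved, so Fo and the 4.47 total are not reproduced. The pooling helper only merges cells inward from the two tails, which is sufficient here because the middle cells are large.

```python
import math

N_SAMPLES = 75
C_BAR = 269 / N_SAMPLES   # sample average defects per sample, about 3.59

def poisson_pmf(c, mean):
    """P(exactly c defects) for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** c / math.factorial(c)

def pool_tails(expected, minimum=4.0):
    """Merge cells inward from each tail until both end cells have Fe >= minimum."""
    cells = list(expected)
    while len(cells) > 1 and cells[0] < minimum:
        first = cells.pop(0)
        cells[0] += first
    while len(cells) > 1 and cells[-1] < minimum:
        last = cells.pop()
        cells[-1] += last
    return cells

MAX_C = 12  # assumed table cut-off; the last cell holds P(c >= MAX_C)
probs = [poisson_pmf(c, C_BAR) for c in range(MAX_C)]   # Col B
probs.append(1.0 - sum(probs))
expected = [N_SAMPLES * p for p in probs]                # Col C: N x Col B
pooled = pool_tails(expected)                            # Col E
print(len(pooled), [round(fe, 2) for fe in pooled])
```

With this mean, pooling c = 0 into c = 1 and c ≥ 7 into one top cell leaves seven effective cells, in line with DF = 7 – 2 = 5.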

Binomial Distribution (GOF)

Example: The null hypothesis states that the following industrial sample data come from a binomial population of defectives (N = 80 samples). In this case, we estimate the probability of a defective, p = 0.025625, from the sample data.
One degree of freedom is lost because the total sample frequency represents the population. A second degree of freedom is lost because the sample estimate of p is used to estimate μ.
Col A: The range of defectives matching the observed sample data.
Col B: The binomial probability of each cell's defective count d, given the sample size and the estimated p.
Col C: The expected theoretical frequency, (cell probability) × N.
Col D: The observed cell frequency count from the 80 samples.
Col E: Theoretical frequency with cells pooled to meet the minimum expected frequency of 4.
Col F: Observed cell frequency pooled to match the pooled theoretical frequency cells.
Col G: Contributions to chi square, (Fe – Fo)²/Fe.
Col H: The count of defectives by cell, (d)(Fo).
Conclusion: The calculated chi square = 13.30. The critical chi square χ²(0.05, 4) = 9.49. Since the calculated value is greater than the critical value, the null hypothesis that the sample data represent the binomial distribution is rejected at the 95% confidence level.
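Columns B and C can be sketched with `math.comb`, using the quoted p = 0.025625 and N = 80 samples. The number of units per sample is not given in the post, so `SAMPLE_SIZE = 100` below is purely hypothetical and the resulting Fe values are not the original table's.

```python
import math

def binomial_expected(sample_size, p, n_samples):
    """Fe for d = 0..sample_size defectives: N x C(n, d) p^d (1 – p)^(n – d)."""
    return [
        n_samples * math.comb(sample_size, d) * p ** d * (1 - p) ** (sample_size - d)
        for d in range(sample_size + 1)
    ]

P_HAT = 0.025625      # estimated probability of a defective, from the text
N_SAMPLES = 80        # number of samples, from the text
SAMPLE_SIZE = 100     # hypothetical units per sample (not stated in the post)

expected = binomial_expected(SAMPLE_SIZE, P_HAT, N_SAMPLES)
for d, fe in enumerate(expected[:8]):
    print(d, round(fe, 2))
```

The same tail-pooling rule (minimum expected frequency of 4) would then be applied before computing chi square.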
Contingency Tables

A two-way classification table (rows and columns) containing original frequencies can be analyzed to determine whether the two variables (classifications) are independent or have a significant association. R. A. Fisher determined that when the marginal totals (of rows and columns) are analyzed in a certain way, the chi square procedure will test whether there is dependency between the two classifications. In addition, a contingency coefficient (correlation) can be calculated. If the chi square test shows a significant dependency, the contingency coefficient shows the strength of the correlation. Results obtained in samples often do not agree exactly with the theoretical results expected according to the rules of probability. A measure of the difference found between observed and expected frequencies is supplied by the statistic chi square, χ², where:
χ² = Σ (O – E)² / E
summed over all cells, with O the observed and E the expected frequency.
If χ² = 0, the observed and theoretical frequencies agree exactly. If χ² > 0, they do not agree exactly. The larger the value of χ², the greater the discrepancy between observed and theoretical frequencies. The chi square distribution is an appropriate reference distribution for critical values when the expected frequencies are at least 5.
Example: The calculation of the E (expected or theoretical) frequency will be demonstrated in the following example. Five hospitals tried a new drug to alleviate the symptoms of emphysema. The results were classified at three levels: no change, slight improvement, and marked improvement. The percentage matrix is shown in the table below. While the results expressed as percentages do suggest differences among hospitals, ratios presented as percentages can be misleading.
A proper analysis requires that the original data be considered as frequency counts. The table below lists the original data on which the percentages are based. The calculation of expected, or theoretical, frequencies is based on the marginal totals: the column totals, the row totals, and the grand total. The null hypothesis is that all hospitals have the same proportions over the three levels of classification. Calculating the expected frequencies for each of the 15 cells under the null hypothesis requires manipulating the marginal totals, as illustrated by the following calculation for one cell. Consider the count of 15 in the Hospital A, no change cell. The expected value, E, is:
E = (row total × column total) / grand total

Repeating the same procedure for the other 14 cells yields the expected frequencies for the whole table.

Each of these 15 cells makes a contribution to chi square (χ²). For the same illustrative cell, the contribution is (O – E)²/E, and chi square is the sum of these contributions over all cells.

Assume alpha to be 0.01. The degrees of freedom for a contingency table are: d.f. = (rows – 1) × (columns – 1).

For this example: d.f. = (5 – 1) × (3 – 1) = 8

The critical chi square: χ²(0.01, 8) = 20.09
The calculated chi square is larger than the critical chi square. Therefore, one rejects the null hypothesis of hospital equality of results. The alternative hypothesis is that the hospitals differ.
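The whole contingency procedure (expected frequencies from marginal totals, chi square over all cells, and d.f.) can be sketched in a few lines. The hospital counts were not preserved, so the 2 x 3 table below is hypothetical; the function name is also an assumption.

```python
def contingency_chi_square(table):
    """Return (chi_square, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = (row total x column total) / grand total, for every cell
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_sq, df, expected

# Hypothetical 2 x 3 table of observed counts
observed = [[10, 20, 30],
            [20, 20, 20]]
chi_sq, df, expected = contingency_chi_square(observed)
print(round(chi_sq, 3), df)   # 5.333 2
```

The result is then compared with the critical chi square for the table's d.f. at the chosen alpha, exactly as in the hospital example.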

Coefficient of Contingency (C)

The degree of relationship, association, or dependence of the classifications in a contingency table is given by:
C = √(χ² / (χ² + N))
where N equals the grand frequency total.
For this example, the contingency coefficient is C = 0.38.

The maximum value of C is always less than 1.0 and depends on the numbers of rows and columns. For the example data, the maximum coefficient of contingency is:
Cmax = √((k – 1)/k) = √(2/3) ≈ 0.816, where k = min(r, c), r = rows, c = columns (here k = min(5, 3) = 3).
A Yates correction for continuity can be applied when the contingency table has exactly two rows and two columns, that is, when the degrees of freedom equal 1.
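The two contingency-coefficient formulas can be sketched directly. The function names are hypothetical, and because the hospital chi square and grand total were not preserved, the second call uses made-up inputs; the first call evaluates Cmax for the 5 x 3 hospital table.

```python
import math

def contingency_coefficient(chi_sq, grand_total):
    """C = sqrt(χ² / (χ² + N))."""
    return math.sqrt(chi_sq / (chi_sq + grand_total))

def max_contingency_coefficient(rows, cols):
    """Cmax = sqrt((k – 1) / k), with k = min(rows, cols)."""
    k = min(rows, cols)
    return math.sqrt((k - 1) / k)

print(round(max_contingency_coefficient(5, 3), 3))   # 0.816 for a 5 x 3 table
print(round(contingency_coefficient(9.0, 16), 2))    # hypothetical inputs: 0.6
```

Comparing C with Cmax for the same table size puts a given C in context, since C cannot reach 1.0.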

Correlation of Attributes

Contingency table classifications often describe characteristics of objects or individuals. Thus, they are often referred to as attributes, and the degree of dependence, association, or relationship is called correlation of attributes. For k × k tables (k = r = c), the correlation coefficient φ is defined as:
φ = √(χ² / (N(k – 1)))
The value of φ falls between 0 and 1. If the calculated value of chi square is significant, then φ is significant. In the hospital example above, the numbers of rows and columns are not equal, so this correlation calculation is not applied.
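A minimal sketch of φ for a square table, assuming the form φ = √(χ² / (N(k – 1))), which reduces to √(χ²/N) for a 2 x 2 table. The 2 x 2 counts are hypothetical, and no Yates correction is applied here.

```python
import math

def phi_coefficient(chi_sq, grand_total, k):
    """φ = sqrt(χ² / (N (k – 1))) for a k x k contingency table."""
    return math.sqrt(chi_sq / (grand_total * (k - 1)))

# Hypothetical 2 x 2 table of observed counts
observed = [[30, 10],
            [10, 30]]
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
n = sum(rows)
# Expected count for each cell is (row total x column total) / N
chi_sq = sum(
    (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
    for i in range(2)
    for j in range(2)
)
print(chi_sq, phi_coefficient(chi_sq, n, k=2))   # 20.0 0.5
```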
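Col H (Poisson goodness-of-fit example): Total defects found in each cell, the product of the number of defects (c) and the observed frequency (Fo); these products sum to the 269 defects used for the sample average.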