What could happen if you are analyzing data to solve an engineering problem, but the data was not collected correctly and does not represent the population?
1. Nonparametric statistics can be used. If we have a basic knowledge of the underlying distribution of a variable, then we can make predictions about how, in repeated samples of equal size, a statistic computed from that variable (such as the sample mean) will "behave," that is, how it is distributed.
2. For example, if we draw 100 random samples of 100 adults each from the general population and compute the mean height in each sample, then the distribution of the standardized means across samples will likely approximate the normal distribution. Now imagine that we take an additional sample in a particular city ("Tallburg") where we suspect that people are taller than the average population. If the standardized mean height in that sample falls in the upper 5% tail of the t distribution (that is, above the 95th percentile), then we conclude that the people of Tallburg are indeed taller than the average population.
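As a rough illustration of the Tallburg example, the sketch below simulates the repeated samples and then runs a one-sided, one-sample t test. The population mean of 170 cm, the standard deviation of 10 cm, the 5 cm shift for Tallburg, and all function names are assumptions made up for this illustration; only the Python standard library is used.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD = 170.0, 10.0  # hypothetical population height in cm


def sample_heights(mean, sd, n):
    """Draw n simulated heights from a normal distribution."""
    return [random.gauss(mean, sd) for _ in range(n)]


# 100 samples of 100 adults each; keep the mean of every sample.
sample_means = [
    statistics.fmean(sample_heights(POP_MEAN, POP_SD, 100))
    for _ in range(100)
]

# The means cluster around POP_MEAN with a spread near sd / sqrt(n) = 1.0.
print(round(statistics.fmean(sample_means), 2))
print(round(statistics.stdev(sample_means), 2))


def one_sample_t(sample, mu0):
    """t statistic for H0: the sampled population has mean mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / se


# Hypothetical Tallburg sample whose true mean is 5 cm above the population.
tallburg = sample_heights(POP_MEAN + 5.0, POP_SD, 100)
t_stat = one_sample_t(tallburg, POP_MEAN)

# 1.660 is the approximate one-sided 5% critical value of t with 99 df.
T_CRIT = 1.660
print(t_stat > T_CRIT)  # True means "reject H0: Tallburg is not taller"
```

Note that this check only guards against sampling noise. If the Tallburg sample was collected badly (for example, measured only at a basketball club), the t test will still report a difference, but the conclusion about the city would not be trustworthy.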
3. Nonparametric methods were developed to be used in cases when the researcher knows nothing about the parameters of the variable of interest in the population (hence the name nonparametric). In more technical terms, nonparametric methods do not rely on the estimation of parameters (such as the mean or the standard deviation) describing the distribution of the variable of interest in the population. Therefore, these methods are also sometimes (and more appropriately) called parameter-free methods or distribution-free methods.
Nonparametric methods are most appropriate when the sample sizes are small. When the data set is large (e.g., n > 100) it often makes little sense to use nonparametric statistics at all.
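One common distribution-free method for comparing two small groups is the Mann-Whitney U (rank-sum) test. It is named here only as an example; the answer above does not single out a specific test. The sketch below computes U and an exact one-sided p-value by enumerating every way of relabeling the pooled data, which is only practical for small samples. The task times and function names are hypothetical.

```python
from itertools import combinations


def u_statistic(x, y):
    """Mann-Whitney U for x: count pairs with x_i > y_j (ties count 0.5)."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in x
        for b in y
    )


def exact_p_value(x, y):
    """One-sided exact p-value: share of relabelings with U >= observed."""
    pooled = list(x) + list(y)
    observed = u_statistic(x, y)
    hits = total = 0
    # Try every way of choosing which pooled values belong to group x.
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        hits += u_statistic(gx, gy) >= observed
        total += 1
    return hits / total


# Hypothetical task times in seconds for two small groups.
slow = [41, 45, 52, 58, 63]
fast = [30, 33, 38, 40, 44]

print(u_statistic(slow, fast))    # 24 of the 25 pairs favor "slow"
print(exact_p_value(slow, fast))  # 2 of 252 relabelings, about 0.008
```

No mean or standard deviation is estimated anywhere in this procedure: only the ordering of the observations matters, which is what makes it usable when nothing is known about the shape of the population distribution.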
4. Comparing Means: If your data is generally continuous (not binary), such as task time or rating scales, use the two sample t-test. It’s been shown to be accurate for small sample sizes.
Comparing Two Proportions: If your data is binary (pass/fail, yes/no), then use the N-1 Two Proportion Test. This is a variation on the better known Chi-Square test (it is algebraically equivalent to the N-1 Chi-Square test).
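The N-1 two-proportion test mentioned above can be sketched as the usual pooled z test scaled by sqrt((N-1)/N), where N is the combined sample size. The function name and the pass/fail counts below are hypothetical, and the p-value uses the normal approximation.

```python
import math


def n_minus_1_two_proportion(x1, n1, x2, n2):
    """Return (z, two-sided p) for the N-1 two-proportion test.

    x1, x2 are success counts; n1, n2 are group sizes.
    Assumes the pooled proportion is strictly between 0 and 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    total = n1 + n2
    pooled = (x1 + x2) / total
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # The sqrt((N - 1) / N) factor is the "N-1" adjustment.
    z = (p1 - p2) * math.sqrt((total - 1) / total) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical result: 40 of 50 users pass with design A, 30 of 50 with B.
z, p = n_minus_1_two_proportion(40, 50, 30, 50)
print(round(z, 3), round(p, 3))
```

Squaring z gives the ordinary Pearson chi-square statistic for the same 2x2 table multiplied by (N-1)/N, which is the algebraic equivalence noted above.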
5. Studies involving fMRI scans, which are expensive to run, tend to have limited sample sizes as well.
Collecting a larger sample than the minimum required generally increases the accuracy of the results.
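To make the sample-size point concrete, here is a minimal sketch of how the standard error of a sample mean shrinks as n grows, assuming independent observations and a hypothetical standard deviation of 10 units. The function name is made up for this illustration.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean for n independent observations."""
    return sd / math.sqrt(n)


# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(10.0, n))
```

This gain applies to random sampling error only. A larger sample does not correct data that was collected incorrectly or that fails to represent the population.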