When large samples are at hand one may make fewer a-priori
assumptions regarding the exact form of the distribution of the
measurement. General limit theorems, such as the Central Limit
Theorem, may be used in order to establish the validity of the
inference under general conditions. On the other hand, for small
sample sizes, one must make strong assumptions with respect to the
distribution of the observations in order to justify the validity
of the procedure.
It may be claimed that making statistical inferences when the
sample size is small is worthless. How can one trust
conclusions that depend on assumptions regarding the distribution
of the observations, assumptions that cannot be verified? What
is your opinion?
For illustration consider the construction of a confidence
interval. The confidence interval for the expectation is
implemented with a specific formula. The confidence level of the
interval is provable when the sample size is large or when the
sample size is small but the observations have a Normal
distribution. If the sample size is small and the observations have
a distribution different from the Normal then the nominal
confidence level may not coincide with the actual confidence
level.
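The gap between the nominal and the actual level can be checked by simulation. The following is a minimal sketch, not part of the original text. It assumes a sample size of 10, the usual t-based interval for the mean, and an Exponential distribution as the non-Normal example; the names `t_interval` and `coverage` are made up for illustration.

```python
import math
import random

N = 10
# Two-sided 95% critical value of Student's t with N - 1 = 9 degrees of
# freedom, taken from a standard t table.
T_CRIT_DF9 = 2.262


def t_interval(sample, t_crit):
    """Return the t-based confidence interval for the mean of `sample`."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    half_width = t_crit * math.sqrt(var / n)
    return mean - half_width, mean + half_width


def coverage(draw, true_mean, reps=10000, seed=1):
    """Estimate the fraction of intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(N)]
        lo, hi = t_interval(sample, T_CRIT_DF9)
        hits += lo <= true_mean <= hi
    return hits / reps


# Normal observations: the actual level should be close to the nominal 95%.
normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
# Exponential observations (assumed example of a skewed distribution).
expo_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)

print(f"Normal data:      {normal_cov:.3f}")
print(f"Exponential data: {expo_cov:.3f}")
```

With Normal data the estimated coverage should sit near 0.95, while with skewed data and only 10 observations it typically falls short of the nominal level.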
It is important to examine the distribution of the data when the sample size is small. As a common rule of thumb, if the sample size is less than 30, one should check whether the sample comes from a Normal population. On the other hand, if the sample size is large, there is usually no need to check the distribution of the data.
To find out whether the data are normally distributed, one can perform a test for normality such as the Kolmogorov-Smirnov test or the Shapiro-Wilk test.
The hypotheses are the same in both tests:
H0: The data are normally distributed
Ha: The data are not normally distributed
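If the p-value of the test is less than the level of significance (generally 0.05), we reject the null hypothesis and conclude that the data are not normally distributed. Otherwise we fail to reject the null hypothesis; this does not prove normality, especially in a small sample where the test has little power.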
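As a rough sketch of how such a check might look in practice, the snippet below runs both tests with SciPy. It assumes NumPy and SciPy are installed; the helper name `normality_report` and the simulated sample are illustrative only. Note that the Kolmogorov-Smirnov test here uses a mean and standard deviation estimated from the same data, which makes its p-value only approximate.

```python
import numpy as np
from scipy import stats


def normality_report(x, alpha=0.05):
    """Run Shapiro-Wilk and Kolmogorov-Smirnov tests of H0: data are Normal."""
    x = np.asarray(x, dtype=float)
    sw_stat, sw_p = stats.shapiro(x)
    # Compare against a Normal with the sample mean and standard deviation.
    # Estimating these from the data makes the KS p-value approximate.
    ks_stat, ks_p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    return {
        "shapiro_p": float(sw_p),
        "ks_p": float(ks_p),
        "reject_shapiro": bool(sw_p < alpha),
        "reject_ks": bool(ks_p < alpha),
    }


# Simulated small sample from a skewed distribution (not real data).
rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=25)
report = normality_report(sample)
print(report)
```

A `reject_*` value of `True` means the p-value fell below `alpha`, so the null hypothesis of normality is rejected for that test.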
If the data are not normal and fail to satisfy the assumptions of a parametric test, we choose a non-parametric method instead, such as the Wilcoxon rank-sum test, the Wilcoxon signed-rank test, or the Kruskal-Wallis test.
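The choice among these tests depends on the design: two paired samples, two independent samples, or three or more independent groups. The sketch below shows one possible way to dispatch to the matching SciPy function. The helper `compare_groups` and the numbers are made up for illustration, and SciPy provides the rank-sum test under the name `mannwhitneyu` (the Mann-Whitney U test, which is equivalent).

```python
from scipy import stats


def compare_groups(*groups, paired=False):
    """Pick a rank-based test from the design and return (name, p-value)."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    if paired and len(groups) != 2:
        raise ValueError("the paired test in this sketch needs exactly two groups")
    if paired:
        # Two related samples: Wilcoxon signed-rank test on the differences.
        result = stats.wilcoxon(groups[0], groups[1])
        return "Wilcoxon signed-rank", float(result.pvalue)
    if len(groups) == 2:
        # Two independent samples: rank-sum (Mann-Whitney U) test.
        result = stats.mannwhitneyu(groups[0], groups[1], alternative="two-sided")
        return "Wilcoxon rank-sum (Mann-Whitney U)", float(result.pvalue)
    # Three or more independent samples: Kruskal-Wallis test.
    result = stats.kruskal(*groups)
    return "Kruskal-Wallis", float(result.pvalue)


# Made-up measurements, for illustration only.
group_a = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1]
group_b = [6.2, 7.4, 6.9, 8.1, 7.0, 6.5]
group_c = [5.0, 5.9, 6.1, 4.9, 5.6, 6.4]

print(compare_groups(group_a, group_b))
print(compare_groups(group_a, group_b, paired=True))
print(compare_groups(group_a, group_b, group_c))
```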
Parametric tests are more powerful than non-parametric tests when their assumptions hold, because the non-parametric tests do not use the actual data values in the analysis. They use only the ranks of the data.
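To make the role of ranks concrete, here is a small sketch using `scipy.stats.rankdata` on made-up values. Replacing the largest observation with an extreme one changes the data but leaves the ranks untouched, which is the information a rank-based test gives up.

```python
from scipy.stats import rankdata

# Made-up values, for illustration only.
values = [3.1, 2.4, 5.0, 4.2]
with_outlier = [3.1, 2.4, 500.0, 4.2]  # same data, largest value made extreme

ranks = rankdata(values).tolist()
ranks_outlier = rankdata(with_outlier).tolist()

print(ranks)          # [2.0, 1.0, 4.0, 3.0]
print(ranks_outlier)  # [2.0, 1.0, 4.0, 3.0]
```

Because both samples have identical ranks, a rank-based test treats them the same, whereas a test based on the mean would be strongly affected by the extreme value.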