Explain what it means conceptually when we find that a correlation — or any other statistic that is testing a hypothesis — is significant. Refer to type I error, alpha, and confidence level.
Statistical Significance
When an experiment or study is carried out, a test of statistical significance checks whether the observed result is unlikely to have occurred by chance alone. To learn about the characteristics of a population, we look at sample data and ask how precise our estimate is.
The size of the sample dictates the amount of information we have and therefore, in part, determines the precision of our sample estimates and the confidence we can place in them. The larger the sample, the more information we have and the smaller the uncertainty.
The first step in conducting a test of statistical significance is to state the hypothesis.
Null Hypothesis - The claim tested by a statistical test is called the null hypothesis (H0). The test is designed to assess the strength of the evidence against the null hypothesis. Often the null hypothesis is a statement of “no difference.”
Alternative Hypothesis - The claim about the population that we are trying to find evidence for is the alternative hypothesis (Ha).
Example claim - The average mark of the class is 50.
Null Hypothesis (H0): μ = 50
Alternative Hypothesis (Ha): μ ≠ 50
When conducting a significance test, the goal is to see whether the data provide enough evidence to reject the null hypothesis. If the evidence is strong enough, the null hypothesis is rejected and the result is taken as support for the alternative hypothesis. If the evidence is not strong enough, researchers fail to reject the null hypothesis, which is not the same as proving it true.
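Test Statistic (z-score)
A z-score indicates how many standard deviations a value is from the mean. For a single value X from a population with mean μ and standard deviation σ, it is calculated as:
z = (X - μ) / σ
When the test is about a sample mean x̄ based on n observations, as in the example, the standard deviation is replaced by the standard error σ / √n:
z = (x̄ - μ) / (σ / √n)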
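As a rough illustration of the formula, the sketch below computes a z-score for a single value and a z test statistic for a sample mean. The numbers (a mark of 65, a sample mean of 53, σ = 10, n = 25) and the function names are made up for this example and are not part of the original question. For a sample mean, the spread used is the standard error σ / √n rather than σ itself.

```python
import math

def z_score(x, mu, sigma):
    """How many standard deviations a single value x lies from the mean mu."""
    return (x - mu) / sigma

def z_statistic(sample_mean, mu0, sigma, n):
    """z test statistic for a sample mean, using the standard error sigma / sqrt(n)."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

# Hypothetical numbers for the class-marks example (H0: mu = 50).
single = z_score(65.0, 50.0, 10.0)            # one student's mark
z = z_statistic(53.0, 50.0, 10.0, 25)         # mean of 25 students
print(single, z)  # 1.5 1.5
```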
P-Value
After computing the test statistic, the next step is to find the probability of obtaining a score at least this extreme when the null hypothesis is true. This probability is the p-value. The Normal curve lets researchers determine the proportion of a population located within a given interval, or above or below a given score. To use it, the score needs to be standardized. In the example, this was already done by computing z, the test statistic.
P-Value and Statistical Significance
It is important to know how small the p-value needs to be in order to reject the null hypothesis.
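To decide that, we use alpha (the significance level), which is the probability of rejecting H0 when H0 is actually true, that is, the probability of making a Type I error. Alpha is chosen before the test is run; common choices are 0.05 and 0.01.
When P-value > alpha --- Fail to reject the null hypothesis (results are not significant)
When P-value <= alpha --- Reject the null hypothesis (results are significant)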
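The decision rule above can be sketched in a few lines of Python. This is only an illustration: it assumes a two-sided test with a z statistic, uses the standard library's NormalDist for the Normal curve, and the function names and the z values (1.5 and 2.5) are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z at least this far from 0 (in either direction) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with alpha, the chosen significance level."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

p = two_sided_p_value(1.5)
print(round(p, 4), decide(p))  # 0.1336 fail to reject H0

p_strong = two_sided_p_value(2.5)
print(decide(p_strong))        # reject H0
```

Note that "fail to reject H0" is not a claim that H0 is true, only that the evidence against it was not strong enough at the chosen alpha.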
A confidence interval describes the amount of uncertainty associated with a sample estimate of a population parameter. A 90% confidence level means that, if the sampling were repeated many times, we would expect 90% of the resulting interval estimates to include the population parameter; a 95% confidence level means that 95% of the intervals would include the parameter; and so on. The confidence level is 1 - alpha, so a test at alpha = 0.05 corresponds to a 95% confidence level.
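To make the link between confidence level and alpha concrete, here is a minimal sketch of a confidence interval for a mean, assuming the population standard deviation is known so that the Normal curve applies. The sample figures (mean 53, σ = 10, n = 25) are invented for illustration.

```python
from statistics import NormalDist

# Hypothetical sample summary for the class-marks example.
x_bar, sigma, n = 53.0, 10.0, 25

confidence = 0.95
alpha = 1 - confidence                        # significance level paired with this confidence level
z_star = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for 95%
margin = z_star * sigma / n ** 0.5            # critical value times the standard error

ci_low, ci_high = x_bar - margin, x_bar + margin
print(f"{confidence:.0%} CI: ({ci_low:.2f}, {ci_high:.2f})")
```

With these numbers the interval contains 50, which agrees with a two-sided test at alpha = 0.05 failing to reject H0: μ = 50 for the same data.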
Type I and Type II errors
Type I error, also known as a "false positive": the error of rejecting a null hypothesis when it is actually true. The probability of making this error is alpha.
Type II error, also known as a "false negative": the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature.
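One way to see what alpha means is to simulate many studies in which the null hypothesis really is true and count how often the test rejects it anyway. The sketch below does this with a two-sided z-test; the population values (μ = 50, σ = 10), the sample size, the number of trials, and the function name are all assumptions chosen for the illustration.

```python
import random
from statistics import NormalDist, mean

def z_test_rejects(sample, mu0, sigma, alpha):
    """Two-sided z-test: True if H0 (mean == mu0) is rejected at level alpha."""
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

random.seed(0)                      # fixed seed so the run is repeatable
mu0, sigma, n, alpha = 50, 10, 25, 0.05
trials = 10_000

# Every sample is drawn with true mean mu0, so each rejection is a Type I error.
rejections = sum(
    z_test_rejects([random.gauss(mu0, sigma) for _ in range(n)], mu0, sigma, alpha)
    for _ in range(trials)
)
type_i_rate = rejections / trials
print(f"Type I error rate: {type_i_rate:.3f} (alpha = {alpha})")
```

The observed rejection rate should land close to alpha, which is what "a 5% chance of a false positive" means in practice.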
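We perform a hypothesis test of the significance of the correlation coefficient to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population. The test follows the same procedure described above: the null hypothesis is that the population correlation is zero, and a significant result means that a sample correlation this far from zero would be unlikely (probability at most alpha) if the null hypothesis were true.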
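As a hedged illustration of testing a correlation, the sketch below uses a permutation test instead of the usual t-test, because it needs only the Python standard library. Shuffling one variable breaks any real relationship, so the shuffled correlations show what chance alone produces. The data (hours studied and marks) and the function names are invented for this example.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(xs, ys, n_perm=5000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    r_obs = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)        # simulates H0: no association between x and y
        if abs(pearson_r(xs, shuffled)) >= r_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate away from exactly 0

# Hypothetical data: hours studied and marks for 8 students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
marks = [42, 48, 47, 55, 58, 57, 65, 68]

r = pearson_r(hours, marks)
p = permutation_p_value(hours, marks)
alpha = 0.05
print(f"r = {r:.3f}, significant at alpha = {alpha}: {p <= alpha}")
```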