An essay on the importance of hypothesis testing and confidence intervals in the ease of doing business
Business today is driven largely by data. Data touches every aspect of a business, and most key decisions are now taken on the basis of data analysis. A manager makes decisions every day that may affect the efficiency and success of his or her company. To monitor progress and stay competitive, most organizations collect and store large amounts of data. While collected data can be used to make important business decisions, storing data does not by itself guarantee better decision making. The data must be analyzed and interpreted correctly and critically before it is applied to improve business performance. Hence the validity of the data, the soundness of its interpretation, and the soundness of the decisions based on the resulting inferences are as important as the availability of the data itself. Wise business leaders conduct formal and informal research to inform their business decisions. Good research starts with a good hypothesis, which is simply a statement making a prediction based on a set of observations.
Hypothesis testing and confidence intervals are tools for assessing the significance of inferences drawn from data.
So what exactly is hypothesis testing, and what is the confidence interval associated with it?
Wikipedia defines hypothesis testing as a statistical test used to determine whether a hypothesis or assumption made about a sample of data holds true for the entire population. Simply put, a hypothesis is an assumption that is tested against data, for example to determine the relationship between two data sets.
Confidence intervals are closely related to hypothesis testing: a confidence interval is the range of values of the unknown parameter that the data would not lead us to reject at the chosen level.
Wikipedia defines a confidence interval as a type of interval estimate, computed from the statistics of the observed data, that might contain the true value of an unknown population parameter. The interval has an associated confidence level that, loosely speaking, quantifies the level of confidence that the parameter lies in the interval. More strictly speaking, the confidence level represents the frequency (i.e. the proportion) of possible confidence intervals that contain the true value of the unknown population parameter. In other words, if confidence intervals are constructed using a given confidence level from an infinite number of independent sample statistics, the proportion of those intervals that contain the true value of the parameter will be equal to the confidence level.
Confidence intervals consist of a range of potential values of the unknown population parameter. However, the interval computed from a particular sample does not necessarily include the true value of the parameter. Based on the (usually taken) assumption that observed data are random samples from a true population, the confidence interval obtained from the data is also random.
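The frequency interpretation described above can be checked with a small simulation. The following is a minimal sketch, not part of the original essay: it assumes a normally distributed population with a known standard deviation, and the function name coverage_rate and all of its default values are hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=50.0, sigma=10.0, n=40, level=0.95,
                  trials=2000, seed=1):
    """Fraction of simulated z-intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Known-sigma interval: sample mean plus or minus z * sigma / sqrt(n).
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(f"Observed coverage: {coverage_rate():.3f}")
```

Each simulated sample yields a different interval, and some of them miss the true mean. Over many repetitions the share of intervals that contain it should settle close to the chosen confidence level.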
The confidence level is designated prior to examining the data. Most commonly, the 95% confidence level is used.[4] However, other confidence levels can be used, for example, 90% and 99%.
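Each confidence level corresponds to a critical value of the standard normal distribution. As a small illustration, assuming a two-sided interval and using only the Python standard library (the helper name two_sided_critical_z is hypothetical):

```python
from statistics import NormalDist


def two_sided_critical_z(level):
    """Critical z value that leaves (1 - level) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z = {two_sided_critical_z(level):.3f}")
# 90% confidence -> z = 1.645
# 95% confidence -> z = 1.960
# 99% confidence -> z = 2.576
```

A higher confidence level gives a larger critical value and therefore a wider interval.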
Let us try to understand hypothesis testing with the help of an example. Consider a company that needs to investigate whether the carbon emission levels it maintains are in line with the global norms.
How will it go about this?
It first makes an assumption. This assumption is called a hypothesis or, to be precise, the null hypothesis. It is the claim that the company wants to check. Suppose it assumes that the carbon emission levels it maintains are equal to the global permissible norm.
The company may then take sample readings over a continuous period of, say, six months, one year or any other reasonable duration, and find its mean emission level.
It assumes that the true mean emission level, which this sample mean estimates, is equal to the global limit.
This is the null hypothesis.
Just as every assumption has a counter-assumption, the null hypothesis has a counterpart called the alternative hypothesis, which says that the null hypothesis is not true. It can take one of three forms: the two means are not equal, the first is greater than the second, or the second is greater than the first.
The company has now set up its test with a null and an alternative hypothesis.
The assumption now needs to be checked statistically. This is done by computing a test statistic, which locates the mean under consideration (the company’s mean carbon emission level) on the standard curve of the distribution, that is, it measures how many standard errors the sample mean lies from the hypothesized value. The test statistic is then compared with a critical value, which is predetermined by the chosen level of significance. If the company wants to check its claim with 95% confidence, it compares its test statistic with the critical value at the 5% significance level (obtained from a z table). Another approach is to compare the p-value (the probability of observing a result at least as extreme as the sample's if the null hypothesis were true) with the significance level, which is 0.05 at 95% confidence.
For a two-sided test, if the absolute value of the test statistic is lower than the critical value, or equivalently if the p-value is higher than the significance level, the null hypothesis cannot be rejected. Otherwise it is rejected in favour of the alternative hypothesis.
This is how a claim or assumption behind any belief, idea or understanding is verified statistically through hypothesis testing.
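The steps of the emission example can be sketched in a few lines of code. This is only an illustration under stated assumptions: the essay gives no figures for the company, so the norm (100 units), the sample mean (101.5), the standard deviation (12) and the number of readings (180) are all hypothetical, and the function name z_test is made up for this sketch. It assumes a two-sided test and a sample large enough for the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, hypothesized_mean, std_dev, n, alpha=0.05):
    """Two-sided one-sample z-test; returns (z, p_value, reject_null)."""
    standard_error = std_dev / sqrt(n)
    # Distance of the sample mean from the hypothesized value, in standard errors.
    z = (sample_mean - hypothesized_mean) / standard_error
    # Probability of a result at least this extreme if the null hypothesis is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > critical


# Hypothetical readings, not taken from the essay.
z, p_value, reject = z_test(sample_mean=101.5, hypothesized_mean=100.0,
                            std_dev=12.0, n=180)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject the null hypothesis" if reject
      else "Fail to reject the null hypothesis")
```

Comparing the absolute test statistic with the critical value and comparing the p-value with the significance level always lead to the same decision, so either check can be used.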
Another example, for a deeper understanding of the application and importance of hypothesis testing in business decisions
Hypothesis testing has many business applications. Let's take quality control for example. Suppose a company makes toys which must have a width of 150 mm, with a small tolerance. In this case the basic hypotheses might be:
H0: The mean toy size equals 150 mm.
HA: The mean toy size does not equal 150 mm.
For the statistics used in the test, the company will randomly sample toys and measure the average toy size over 30 production runs. They want a fairly strict assurance that the samples are close to the required value, so they use a 5% (0.05) significance level for evaluation.
They measure the average of the samples and their standard deviation. Assuming the sample averages are approximately normally distributed, it is easy to derive the Z-statistic for the data. This value reflects the number of standard errors by which the measured average differs from the target value.
Assume we derived the following values from our test:
Average = 150.12
Standard Deviation = 0.496
Z-statistic = 1.23
The Z-statistic of 1.23 falls well within the critical value for the 0.05 significance level, which for a two-sided test lies at 1.96 standard deviations (2.576 corresponds to the stricter 0.01 level). Thus, for the toys, the result is not statistically significant at the 5% level; in other words, a deviation of this size is relatively likely to occur as a matter of chance. Because the value falls inside the range, the null hypothesis cannot be rejected.
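The toy example can be reproduced as a short calculation that also shows the link between the test and a confidence interval. This is a sketch under one assumption: the essay does not state the sample size behind its Z-statistic, so n = 30 (one averaged measurement per production run) is assumed here. With that choice the formula gives a Z-statistic of roughly 1.3 rather than the quoted 1.23; the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

target = 150.0        # required toy width in mm (from the essay)
sample_mean = 150.12  # measured average (from the essay)
sample_sd = 0.496     # measured standard deviation (from the essay)
n = 30                # ASSUMPTION: one averaged measurement per production run
alpha = 0.05          # significance level (from the essay)

standard_error = sample_sd / sqrt(n)
z = (sample_mean - target) / standard_error
critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# 95% confidence interval for the mean toy size.
ci_low = sample_mean - critical * standard_error
ci_high = sample_mean + critical * standard_error

print(f"z = {z:.2f}, critical value = {critical:.2f}, p-value = {p_value:.3f}")
print(f"95% confidence interval: ({ci_low:.2f}, {ci_high:.2f}) mm")
print("Reject H0" if abs(z) > critical else "Fail to reject H0")
```

The target of 150 mm lies inside the 95% confidence interval, which is the same statement as failing to reject H0 at the 5% level.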
Let's visualize the concept using a normal distribution plot. We fail to reject the null hypothesis if the measured statistic falls between the vertical lines, which mark the critical values for the significance level we have chosen for the comparison. Outside those lines, we reject the null hypothesis.
Figure: Null Hypothesis Testing
In a nutshell, the benefits of hypothesis testing and confidence intervals for businesses can be summarized as follows.
To conclude, hypothesis testing and the associated confidence intervals help a business identify its status and performance (good or bad) against the competition or against established norms, and they provide defensible limits for setting performance targets. They give an organization a basis for taking a decision and standing by it. They indicate whether a course correction is needed or whether the good work should continue, depending on the scenario the business is in. A business that has not yet adopted hypothesis testing is likely to rely on guesses and speculation in its decisions. It is a must-have tool for any business that aims for information-based decision making.