List and explain four ways in which hypothesis testing using big data can improve competitive advantage and decision making for businesses.
Hypothesis testing using big data:
Business people make decisions every day that affect the efficiency and success of their companies. To monitor progress and stay competitive, most organizations collect and store large amounts of data. While the data collected can be used to make important business decisions, storing data does not by itself improve decision making. Rather, it is crucial to analyze and interpret the data correctly before acting on it to improve the business.
One important way of analyzing a set of data is through hypothesis testing. Hypothesis testing uses sample statistics to test a claim about a population parameter. For example, suppose a company wants to increase sales by funding a new marketing campaign. It runs the campaign in one region of the country and collects data to see whether the campaign has had its desired effect. The company will roll the campaign out nationally only if it believes sales will rise by more than 25%. To test the claim, it can perform a hypothesis test on the sample data collected from the test region; the result indicates whether the campaign really has an effect on sales.
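Below is a minimal Python sketch of such a test. The per-store lift numbers, the 25% threshold used as the null value, and the 5% significance level are assumptions made only for illustration, not figures from the example above.

# Illustrative sketch: one-sided test of H0: mean sales lift <= 25%
# vs H1: mean sales lift > 25%, using made-up per-store lift data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical percentage sales lifts observed in the test region's stores
lifts = rng.normal(loc=28.0, scale=6.0, size=200)

t_stat, p_value = stats.ttest_1samp(lifts, popmean=25.0, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: evidence that the campaign lifts sales by more than 25%")
else:
    print("Fail to reject H0: insufficient evidence of a lift above 25%")

A small p-value would support rolling the campaign out nationally; a large p-value would suggest the observed lift could be due to chance.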
Predictive analytics:
When building a predictive model based on a statistical hypothesis, a large amount of data (big data) lets us fit the model with higher accuracy. The predictions then become more accurate and reliable, which increases business insight and supports better decisions.
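As a rough illustration (synthetic data; the feature count, coefficients, and training sizes are arbitrary assumptions), the sketch below fits the same regression model on progressively larger training sets and scores it on held-out data; the held-out accuracy generally becomes more stable and reliable as more rows are used.

# Illustrative sketch: more training data generally gives a more reliable
# predictive model (all numbers here are synthetic).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
true_coef = np.array([1.5, -2.0, 0.5, 0.0, 3.0])

def fit_and_score(n_train):
    # Simulated training data of the requested size
    X = rng.normal(size=(n_train, 5))
    y = X @ true_coef + rng.normal(scale=2.0, size=n_train)
    # Large held-out test set for a stable accuracy estimate
    X_test = rng.normal(size=(10_000, 5))
    y_test = X_test @ true_coef + rng.normal(scale=2.0, size=10_000)
    model = LinearRegression().fit(X, y)
    return r2_score(y_test, model.predict(X_test))

for n in (50, 5_000, 500_000):
    print(f"training rows = {n:>7}: held-out R^2 = {fit_and_score(n):.3f}")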
Comparing means:
With two small data sets, it is easy to compute the mean of each group and draw a conclusion from the stated hypothesis. But with very large amounts of data (trillions of rows), computing each group's mean and deciding which group is better becomes difficult, so big-data processing is needed to carry out the comparison at scale.
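One practical pattern, sketched below in Python, is to accumulate summary statistics chunk by chunk (so the full data never has to fit in memory) and then run Welch's t-test from those summaries. The chunked data for the two hypothetical regions is made up for illustration.

# Illustrative sketch: comparing two group means when the data are too large
# to load at once -- accumulate counts, sums and sums of squares per chunk,
# then run Welch's t-test from the summary statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def summarize(chunks):
    n = total = total_sq = 0.0
    for chunk in chunks:              # each chunk could come from disk or a database
        n += chunk.size
        total += chunk.sum()
        total_sq += np.square(chunk).sum()
    mean = total / n
    var = total_sq / n - mean ** 2    # population variance
    std = np.sqrt(var * n / (n - 1))  # sample standard deviation
    return mean, std, int(n)

# Hypothetical chunked data for two groups (e.g., two sales regions)
group_a = (rng.normal(100.0, 15.0, 1_000_000) for _ in range(10))
group_b = (rng.normal(100.5, 15.0, 1_000_000) for _ in range(10))

m_a, s_a, n_a = summarize(group_a)
m_b, s_b, n_b = summarize(group_b)
t_stat, p_value = stats.ttest_ind_from_stats(m_a, s_a, n_a, m_b, s_b, n_b,
                                             equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")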
Machine Learning:
Various statistical and machine-learning techniques, such as regression models and random forests, are used for data analysis, and many classical statistical methods, including correlation, regression, t-tests, and analysis of variance, assume normality. By the central limit theorem, when the sample size is large (for example, 100 or more observations), violations of normality are not a major issue. So when we deal with big data, the sampling distribution of the mean is approximately normal and these methods remain reliable.
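A quick way to see the central limit theorem at work is to draw samples from a clearly non-normal (exponential) distribution and look at the distribution of their means. In the Python sketch below (sample sizes, replicate count, and seed are arbitrary choices), the skewness of the sample means shrinks toward zero, i.e. toward normality, as the sample size grows.

# Illustrative sketch of the central limit theorem: means of samples drawn
# from a skewed (exponential) distribution look increasingly normal as the
# sample size grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

for n in (5, 30, 100, 1_000):
    # 20,000 replicated samples of size n, reduced to their means
    sample_means = rng.exponential(scale=1.0, size=(20_000, n)).mean(axis=1)
    print(f"n = {n:>5}: skewness of sample means = {stats.skew(sample_means):+.3f}")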