Question:
Discuss the reasons for using Bayesian analysis when faced with uncertainty in
making decisions.
Discussion Requirements:
How would you describe Bayes' theorem?
Describe the assumptions of Bayesian analysis.
Provide an example of a problem where Bayesian analysis can be used in Big Data Analytics.
Describe the problems with Bayesian analysis.
Bayes' theorem describes the probability of an event based on other information that might be relevant. Essentially, you estimate a probability and then update that estimate based on other things that you know. This is something that you already do every day in real life. For instance, if your friend is supposed to pick you up to go out to dinner, you might have a mental estimate of whether she will be on time, 15 minutes late, or half an hour late. Those estimates are your starting probabilities. If you then look outside and see that there are 8 inches of new snow on the ground, you would update your probabilities to account for the new data. Bayes' theorem is a formal way of doing that.
Example of Bayesian analysis in Big Data Analytics
Bayesian statistics can be used in any application area where you have a lot of heterogeneous or noisy data, or where you need a clear understanding of your uncertainty. From discussions with experts, some of the areas that have seen early adoption are e-commerce, insurance, finance, and healthcare.
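For example, let A represent the proposition that it rained today, and let B represent the evidence that the sidewalk outside is wet. Bayes' theorem, p(A | B) = p(B | A) p(A) / p(B), then reads:
p(rain | wet) = p(wet | rain) p(rain) / p(wet)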
p(rain | wet) asks, "What is the probability that it rained given that it is wet outside?" To evaluate this question, let's walk through the right side of the equation. Before looking at the ground, what is the probability that it rained, p(rain)? Think of this as the plausibility of an assumption about the world. We then ask how likely the observation that it is wet outside is under that assumption: this is p(wet | rain). Multiplying the two and dividing by the overall probability of the evidence updates our initial belief about the proposition with the observation, yielding a final measure of the plausibility of rain given the evidence.
This procedure is the basis for Bayesian inference, where our initial beliefs are represented by the prior distribution p(rain), and our final beliefs are represented by the posterior distribution p(rain | wet). The denominator, p(wet), simply asks, "What is the total plausibility of the evidence?" To compute it we have to consider every assumption (here, both rain and no rain), which ensures that the posterior is a proper probability distribution.
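The rain example can be sketched in a few lines of Python. This is only an illustration: the function name `posterior` and all three probabilities are assumptions chosen for the example, not values from any data set.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """Return p(A | B) given p(A), p(B | A), and p(B | not A)."""
    # Denominator: total plausibility of the evidence, summed over
    # both assumptions (A is true, A is false).
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only for illustration.
p_rain = 0.2            # prior: p(rain)
p_wet_given_rain = 0.9  # likelihood: p(wet | rain)
p_wet_given_dry = 0.1   # p(wet | no rain), e.g. sprinklers

p_rain_given_wet = posterior(p_rain, p_wet_given_rain, p_wet_given_dry)
print(f"p(rain | wet) = {p_rain_given_wet:.3f}")
```

With these assumed inputs, seeing a wet sidewalk raises the plausibility of rain from 0.2 to roughly 0.69. Different priors or likelihoods would give a different posterior.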
Bayesians are uncertain about what is true (the value of a KPI, a regression coefficient, etc.), and use data as evidence that certain facts are more likely than others. Prior distributions reflect our beliefs before seeing any data, and posterior distributions reflect our beliefs after we have considered all the evidence.
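For a KPI such as a conversion rate, one common way to go from a prior to a posterior is the Beta-Binomial conjugate update. The sketch below assumes a uniform Beta(1, 1) prior and made-up visitor counts. The helper names `update_beta` and `beta_mean` are hypothetical, and this model is only one possible choice.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus binomial data."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Assumed prior: Beta(1, 1), i.e. every conversion rate equally plausible.
prior = (1.0, 1.0)

# Hypothetical evidence: 30 conversions out of 1000 visitors.
post = update_beta(*prior, successes=30, failures=970)
post_mean = beta_mean(*post)

print(f"prior mean     = {beta_mean(*prior):.3f}")
print(f"posterior mean = {post_mean:.4f}")
```

The prior mean is 0.5 because nothing is known yet. After the assumed data are taken into account, the posterior concentrates near the observed rate of about 3%.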
Problems with Bayesian analysis