Suppose we have a two-armed bandit. Our estimate of the payout rate of the first arm is 0.7, and our estimate of the payout rate of the second arm is 0.6. That is, ρ1 = 0.7 and ρ2 = 0.6, where ρ1 and ρ2 are our estimates of the true payout rates θ1 and θ2. Our 95% confidence intervals for θ1 and θ2 are (0.6, 0.8) and (0.3, 0.9), respectively. Suppose you used an ε-greedy strategy with ε = 0.5. How might you decide which arm to pull? Suppose you instead used an interval exploration strategy with 95% confidence intervals. Which arm would you pull?
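One way to make the two decision rules concrete is the minimal Python sketch below. It rests on assumptions the question does not state: that the exploration step of the ε-greedy strategy picks uniformly among both arms (some variants explore only among the non-greedy arms), and that "interval exploration" means pulling the arm whose confidence interval has the highest upper bound. The function names `epsilon_greedy` and `interval_exploration` are hypothetical, chosen only for illustration.

```python
import random

# Values taken from the question; index 0 is the first arm, index 1 the second.
estimates = [0.7, 0.6]
intervals = [(0.6, 0.8), (0.3, 0.9)]


def epsilon_greedy(estimates, epsilon, rng=random):
    """Return the index of the arm to pull under an epsilon-greedy rule.

    Assumption: with probability epsilon we explore by choosing uniformly
    among ALL arms; otherwise we exploit the arm with the highest estimate.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))
    return max(range(len(estimates)), key=lambda i: estimates[i])


def interval_exploration(intervals):
    """Return the index of the arm with the highest upper confidence bound.

    Assumption: "interval exploration" is read here as choosing the arm
    whose interval reaches highest, which favors uncertain arms.
    """
    return max(range(len(intervals)), key=lambda i: intervals[i][1])


if __name__ == "__main__":
    print("epsilon-greedy pick:", epsilon_greedy(estimates, epsilon=0.5))
    print("interval exploration pick:", interval_exploration(intervals))
```

Under these assumptions, the ε-greedy rule with ε = 0.5 pulls the first arm with probability 0.5 + 0.5 × 0.5 = 0.75 and the second arm with probability 0.25. The interval rule returns index 1, the second arm, because its upper bound of 0.9 exceeds the first arm's 0.8.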