Suppose the DNA bases in a gene sequence follow the distribution:
DNA base | Probability
A | 1/3
C | θ
G | 1/3
T | 1/3 - θ
In an experiment, the number of observed bases that are “A” or “C” in a gene sequence is x, and the number of observed bases that are “G” or “T” is y. The EM algorithm is used to find the maximum-likelihood estimate of the parameter θ, where 0 ≤ θ ≤ 1/3. Describe the Expectation step for computing the expected numbers of A, C, G, and T bases and the Maximization step for estimating θ. Give formulas for the estimates and show in detail how the formulas are obtained.
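
One possible way to make the setup concrete is the minimal Python sketch below. It is not a definitive solution. It assumes the usual EM treatment in which the individual counts of A, C, G, and T are the hidden data and only x and y are observed. The function names (e_step, m_step, em), the starting value, and the tolerance are illustrative choices that do not come from the question. The E-step splits x between A and C, and y between G and T, in proportion to the current probabilities. The M-step maximizes n_C log θ + n_T log(1/3 - θ), which gives θ = n_C / (3 (n_C + n_T)).

```python
def e_step(x, y, theta):
    """Expected counts of A, C, G, T given the observed totals and the current theta."""
    p_ac = 1 / 3 + theta   # P(A or C) = 1/3 + theta
    p_gt = 2 / 3 - theta   # P(G or T) = 1/3 + (1/3 - theta)
    n_a = x * (1 / 3) / p_ac
    n_c = x * theta / p_ac
    n_g = y * (1 / 3) / p_gt
    n_t = y * (1 / 3 - theta) / p_gt
    return n_a, n_c, n_g, n_t


def m_step(n_c, n_t):
    """Maximize n_c*log(theta) + n_t*log(1/3 - theta) over theta.

    Setting the derivative n_c/theta - n_t/(1/3 - theta) to zero
    gives theta = n_c / (3 * (n_c + n_t)).
    Assumes n_c + n_t > 0 (true when theta starts strictly inside (0, 1/3)).
    """
    return n_c / (3 * (n_c + n_t))


def em(x, y, theta=1 / 6, tol=1e-10, max_iter=1000):
    """Alternate E and M steps until theta changes by less than tol.

    The default start, tolerance, and iteration cap are assumptions.
    """
    for _ in range(max_iter):
        _, n_c, _, n_t = e_step(x, y, theta)
        new_theta = m_step(n_c, n_t)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta
```

As a sanity check, maximizing the observed-data likelihood (1/3 + θ)^x (2/3 - θ)^y directly gives θ = (2x - y) / (3 (x + y)) whenever that value lies in [0, 1/3]. A call such as em(60, 40, theta=0.1) should agree with this closed form up to the tolerance.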