Nucleotide Pairs
The human genome is composed of the four DNA nucleotides: A, T, G, and C.
Some regions of the human genome are extremely G–C rich (i.e., a high proportion of the DNA nucleotides there are guanine, G, and cytosine, C).
Other regions are relatively A–T rich (i.e., a high proportion of the DNA nucleotides there are adenine, A, and thymine, T).
Imagine that you want to compare nucleotide sequences from two regions of the genome.
Sixty percent of the nucleotides in the first region are G–C (30% each of guanine and cytosine) and 40% are A–T (20% each of adenine and thymine).
The second region has 25% of each of the four nucleotides.
If you choose a single nucleotide at random from each of the two regions, what is the probability that they are the same nucleotide? (Hint: where X is any of the four DNA nucleotides, calculate Pr(X, X), the probability of drawing X from the first region and X from the second, for all four nucleotides and sum.)
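One way to check the hint's arithmetic is a short sketch, assuming the two draws are independent so that Pr(X, X) = Pr1(X) × Pr2(X). The names `REGION1`, `REGION2`, and `prob_same_base` are illustrative, not part of the problem; exact fractions are used so the result is not blurred by floating-point rounding.

```python
from fractions import Fraction

# Nucleotide frequencies from the problem statement, kept as exact fractions.
REGION1 = {"G": Fraction(3, 10), "C": Fraction(3, 10), "A": Fraction(2, 10), "T": Fraction(2, 10)}
REGION2 = {base: Fraction(1, 4) for base in "ACGT"}


def prob_same_base(p1, p2):
    """Pr(same nucleotide) = sum over X of Pr1(X) * Pr2(X), assuming independent draws."""
    return sum(p1[x] * p2[x] for x in p1)


if __name__ == "__main__":
    print(prob_same_base(REGION1, REGION2))  # prints 1/4
```

Because the second region is uniform, every term is Pr1(X) × 1/4, so the sum is 1/4 whatever the composition of the first region.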
On a separate sheet, draw a probability tree (the first branch point will have four limbs).
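As a cross-check on a hand-drawn tree, the sketch below lists its 16 leaves, assuming the first branch is the draw from the first region and the second branch is the draw from the second region. Each leaf probability is the product of the two branch probabilities along its path. The helper name `tree_leaves` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Nucleotide frequencies from the problem statement.
REGION1 = {"G": Fraction(3, 10), "C": Fraction(3, 10), "A": Fraction(2, 10), "T": Fraction(2, 10)}
REGION2 = {base: Fraction(1, 4) for base in "ACGT"}


def tree_leaves(p1, p2):
    """Return (base1, base2, probability) for every path through the two-level tree."""
    # First level: 4 limbs for the region-1 draw; second level: 4 limbs for the region-2 draw.
    return [(b1, b2, p1[b1] * p2[b2]) for b1, b2 in product(p1, p2)]


if __name__ == "__main__":
    for b1, b2, p in tree_leaves(REGION1, REGION2):
        mark = "  <- same nucleotide" if b1 == b2 else ""
        print(f"{b1} -> {b2}: {p}{mark}")
```

The leaves marked as the same nucleotide are exactly the four Pr(X, X) terms from the hint, and all 16 leaf probabilities sum to 1.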
Assume that nucleotides along a single strand of DNA occur independently within regions, and that you randomly sample a two-nucleotide sequence from each of the two regions. List all the possible two-nucleotide sequences for each region and their probabilities (include pairs like XX, and treat XY as the same as YX):
First region pairs:
Second region pairs:
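The two lists can be generated with a small sketch, assuming the two positions are independent draws from the same region frequencies. Since XY and YX are treated as the same pair, Pr(XX) = p(X)², while for X ≠ Y the two orders combine to Pr(XY) = 2·p(X)·p(Y). The names `BASES` and `unordered_pair_probs` are illustrative only.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

BASES = "GCAT"
REGION1 = {"G": Fraction(3, 10), "C": Fraction(3, 10), "A": Fraction(2, 10), "T": Fraction(2, 10)}
REGION2 = {base: Fraction(1, 4) for base in BASES}


def unordered_pair_probs(p):
    """Probability of each unordered 2-nucleotide sequence (XY counted together with YX)."""
    probs = {}
    # combinations_with_replacement yields the 10 unordered pairs, including XX.
    for x, y in combinations_with_replacement(BASES, 2):
        # Same base: one ordering. Different bases: XY and YX both count.
        probs[x + y] = p[x] * p[y] if x == y else 2 * p[x] * p[y]
    return probs


if __name__ == "__main__":
    for name, region in (("First region", REGION1), ("Second region", REGION2)):
        print(f"{name} pairs:")
        for pair, prob in unordered_pair_probs(region).items():
            print(f"  {pair}: {prob}")
```

Each list has 10 entries and sums to 1; checking that sum is a quick way to catch a forgotten factor of 2 on the mixed pairs.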
What is the probability that the two pairs chosen from different regions are the same?
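A sketch of the final step, assuming the two pairs are sampled independently: sum Pr1(pair) × Pr2(pair) over the 10 unordered pairs, the same pattern as the single-nucleotide hint. The helper names are hypothetical, and the pair probabilities follow the XY = YX convention stated above.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

BASES = "GCAT"
REGION1 = {"G": Fraction(3, 10), "C": Fraction(3, 10), "A": Fraction(2, 10), "T": Fraction(2, 10)}
REGION2 = {base: Fraction(1, 4) for base in BASES}


def unordered_pair_probs(p):
    """Pr(XX) = p(X)^2; for X != Y, Pr(XY) = 2 * p(X) * p(Y) (XY and YX combined)."""
    return {
        x + y: p[x] * p[y] if x == y else 2 * p[x] * p[y]
        for x, y in combinations_with_replacement(BASES, 2)
    }


def prob_same_pair(p1, p2):
    """Sum over unordered pairs of Pr1(pair) * Pr2(pair), assuming independent sampling."""
    q1, q2 = unordered_pair_probs(p1), unordered_pair_probs(p2)
    return sum(q1[k] * q2[k] for k in q1)


if __name__ == "__main__":
    answer = prob_same_pair(REGION1, REGION2)
    print(answer, float(answer))  # prints 87/800 0.10875
```

Treating XY and YX as the same matters here: if ordered sequences had to match exactly, the uniform second region would make the answer 1/16 = 0.0625 regardless of the first region.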