Problem: Given the seven 8-mers listed below:
ATCGATAG
GGCCAATT
CGATATCG
AAGCAAGC
AGCGTACG
CCGCATTA
ATCCATCG
1) Create the profile matrix; 2) Derive the consensus; 3) Calculate the consensus score; 4) Calculate the total distance between the 8-mers and the consensus.
The solution is given below, step by step.
The profile matrix records, for each of the 8 positions (columns), how many of the seven 8-mers have A, C, G or T at that position. Each column therefore sums to 7.
The profile matrix for the seven 8-mers above (rows are nucleotides, columns are positions 1 to 8) is:
Position | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
A        | 4 | 1 | 1 | 0 | 6 | 3 | 1 | 1
C        | 2 | 1 | 4 | 4 | 0 | 0 | 3 | 1
G        | 1 | 3 | 2 | 2 | 0 | 0 | 1 | 4
T        | 0 | 2 | 0 | 1 | 1 | 4 | 2 | 1
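The counting step can be sketched in a few lines of Python. This is only an illustration, not part of the original solution: the names `KMERS`, `BASES` and `profile_matrix` are hypothetical, and the profile is kept as raw counts (not frequencies) to match the table above.

```python
from collections import Counter

# The seven 8-mers from the problem statement.
KMERS = [
    "ATCGATAG",
    "GGCCAATT",
    "CGATATCG",
    "AAGCAAGC",
    "AGCGTACG",
    "CCGCATTA",
    "ATCCATCG",
]
BASES = "ACGT"


def profile_matrix(kmers):
    """Map each base to its list of per-position counts across all k-mers."""
    # zip() silently truncates, so insist that every k-mer has the same length.
    if len({len(k) for k in kmers}) != 1:
        raise ValueError("all k-mers must have the same length")
    # zip(*kmers) yields one tuple per position (column).
    columns = [Counter(column) for column in zip(*kmers)]
    # Counter returns 0 for a base that never appears in a column.
    return {base: [column[base] for column in columns] for base in BASES}


profile = profile_matrix(KMERS)
for base in BASES:
    print(base, "|", " | ".join(str(n) for n in profile[base]))
```

The four printed rows reproduce the A, C, G and T rows of the table.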
The consensus is found by choosing, at each position, the nucleotide with the maximum count in that column of the profile matrix.
Position  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Consensus | A | G | C | C | A | T | C | G
Therefore, the consensus is AGCCATCG.
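A minimal sketch of the "take the column maximum" rule, assuming hypothetical names `KMERS` and `consensus`. The tie-breaking rule (A, C, G, T order) is an assumption added for the sketch; the seven 8-mers here have no ties at the maximum, so it does not affect the answer.

```python
from collections import Counter

# The seven 8-mers from the problem statement.
KMERS = [
    "ATCGATAG", "GGCCAATT", "CGATATCG", "AAGCAAGC",
    "AGCGTACG", "CCGCATTA", "ATCCATCG",
]


def consensus(kmers):
    """Return the string of most frequent bases, one per position.

    Ties are broken in A, C, G, T order (an assumption of this sketch).
    """
    letters = []
    for column in zip(*kmers):
        counts = Counter(column)
        # max() keeps the first maximal item, so "ACGT" order decides ties.
        letters.append(max("ACGT", key=lambda base: counts[base]))
    return "".join(letters)


print(consensus(KMERS))  # AGCCATCG
```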
The consensus score is the sum, over all 8 positions, of the count of the consensus nucleotide, i.e. the maximum count in each column of the profile matrix.
Score = 4 + 3 + 4 + 4 + 6 + 4 + 3 + 4 = 32
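The score is just the column maxima added up, which the following sketch computes directly. The helper names `column_max_counts` and `consensus_score` are hypothetical, chosen for illustration only.

```python
from collections import Counter

# The seven 8-mers from the problem statement.
KMERS = [
    "ATCGATAG", "GGCCAATT", "CGATATCG", "AAGCAAGC",
    "AGCGTACG", "CCGCATTA", "ATCCATCG",
]


def column_max_counts(kmers):
    """Count of the consensus (most frequent) base at each position."""
    return [max(Counter(column).values()) for column in zip(*kmers)]


def consensus_score(kmers):
    """Consensus score: sum of the per-position maximum counts."""
    return sum(column_max_counts(kmers))


print(column_max_counts(KMERS))  # [4, 3, 4, 4, 6, 4, 3, 4]
print(consensus_score(KMERS))    # 32
```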
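The total distance is the sum of the Hamming distances between each 8-mer and the consensus. Equivalently, at each position count the nucleotides that differ from the consensus nucleotide, then add these counts over all positions.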
Position   | 1     | 2       | 3     | 4     | 5 | 6 | 7       | 8
Mismatches | 2+1=3 | 1+1+2=4 | 1+2=3 | 2+1=3 | 1 | 3 | 1+1+2=4 | 1+1+1=3
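Total distance = 3 + 4 + 3 + 3 + 1 + 3 + 4 + 3 = 24

As a check, each column holds 7 nucleotides, so its mismatch count is 7 minus its consensus count; summed over the 8 columns this gives 7 × 8 - 32 = 56 - 32 = 24.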
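The same total can be reached row by row instead of column by column: sum the Hamming distance of each 8-mer to the consensus. The sketch below assumes hypothetical names `KMERS`, `CONSENSUS`, `hamming` and `total_distance`; the consensus string is hard-coded from the result above.

```python
# The seven 8-mers from the problem statement.
KMERS = [
    "ATCGATAG", "GGCCAATT", "CGATATCG", "AAGCAAGC",
    "AGCGTACG", "CCGCATTA", "ATCCATCG",
]
CONSENSUS = "AGCCATCG"  # consensus derived in step 2


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def total_distance(kmers, consensus):
    """Sum of Hamming distances from every k-mer to the consensus."""
    return sum(hamming(k, consensus) for k in kmers)


print([hamming(k, CONSENSUS) for k in KMERS])  # [3, 4, 3, 5, 3, 5, 1]
print(total_distance(KMERS, CONSENSUS))        # 24
```

Summing by rows (per 8-mer) or by columns (per position) counts the same mismatches, so both give 24.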