What are the advantages of using a protein sequence rather than a DNA sequence when searching a bioinformatics database?
There are several reasons why searching with a protein sequence is usually more sensible than searching with a DNA sequence. For starters, consider this: genes are stretches of DNA that code for proteins, and because the genetic code is degenerate (most amino acids are specified by more than one codon), many different DNA sequences can code for the same protein. A DNA search can easily miss some of them, which weakens the study.

The second reason is that the gene sequence does not always tell you which parts end up in the functional product and which parts are removed or cleaved later on (introns, for example). In that case it is more useful to work with the protein sequence.

Another major reason is that DNA is made of just 4 bases (A, T, G and C), so any two unrelated DNA sequences are expected to match at roughly 25% of positions by chance alone, which is misleading. Proteins are built from 20 standard amino acids, so chance matches are much rarer and a similarity between two protein sequences is far more meaningful.

Also, DNA sequence databases are very large, so random hits are more likely, and the data are sometimes error-prone. Weeding out all the irrelevant hits for a stretch of DNA is a huge task. By contrast, protein sequence databases are easier to work with.

Mutation is another factor that can mislead research. If a mutation changes the protein, then it is relevant to study that mutation. But some DNA changes are silent: they cause no change in the protein, yet the DNA sequence still differs from the others. Protein sequences usually remain better conserved, so they are easier to compare.

Finally, DNA searches generally score alignments with simple identity (match/mismatch) matrices, which are not very sensitive, whereas protein searches use the PAM and BLOSUM substitution matrices, which are much more sensitive because they also give credit to substitutions between similar amino acids.
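Two of the points above (silent DNA differences and chance matches) can be checked with a small Python sketch. It uses only the standard library. The two example "genes", the function names, and the use of uniformly random sequences are illustrative assumptions, not real database entries or a real search tool; real sequences do not have uniform composition, so the chance figures are only rough.

```python
import random
from itertools import product

# Standard genetic code, with codons listed in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS_BY_CODON = (
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS_BY_CODON)
}
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def translate(dna):
    """Translate a DNA coding sequence in frame 1, ignoring trailing bases."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def percent_identity(a, b):
    """Percentage of positions that match between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def chance_identity(alphabet, length=10000, seed=0):
    """Identity between two unrelated, uniformly random sequences (assumption:
    every letter of the alphabet is equally likely)."""
    rng = random.Random(seed)
    first = "".join(rng.choice(alphabet) for _ in range(length))
    second = "".join(rng.choice(alphabet) for _ in range(length))
    return percent_identity(first, second)


# Hypothetical example: two made-up coding sequences that use different
# codons for the same five amino acids (Met-Ala-Leu-Ser-Arg).
gene_a = "ATGGCTCTTTCTCGT"
gene_b = "ATGGCCTTAAGCAGA"

print("protein a:", translate(gene_a))  # MALSR
print("protein b:", translate(gene_b))  # MALSR
print("DNA identity:     %.1f%%" % percent_identity(gene_a, gene_b))
print("protein identity: %.1f%%"
      % percent_identity(translate(gene_a), translate(gene_b)))

# Background level of matching between unrelated random sequences.
print("chance DNA identity:     %.1f%%" % chance_identity("ACGT"))
print("chance protein identity: %.1f%%" % chance_identity(STANDARD_AMINO_ACIDS))
```

The two made-up genes differ at more than half of their bases yet translate to the same peptide, so a protein-level comparison sees a perfect match where a DNA-level comparison does not. The random sequences should come out near 25% identity for DNA and near 5% for protein, which is why a given percentage of identity means much more at the protein level.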