Question

In the original implementation of PSI-BLAST, the algorithm performed a multiple sequence alignment and deleted all but one copy of aligned sequence segments having ≥ 98% identity. In a recent modification, the program now purges segments having ≥ 94% identity. What do you think would happen if this percentage were adjusted to ≥ 75% identity? How could you test this idea in practice?
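To illustrate the purging step the question describes, here is a minimal Python sketch. It assumes the aligned segments are equal-length strings with `-` marking gaps, and it uses a simple greedy filter. The function names `percent_identity` and `purge_redundant` and the toy segments are hypothetical; this is not PSI-BLAST's actual code.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over columns where both aligned segments have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def purge_redundant(segments: list[str], threshold: float) -> list[str]:
    """Greedy filter: keep a segment only if its identity to every
    already-kept segment is below `threshold` (in percent)."""
    kept: list[str] = []
    for seg in segments:
        if all(percent_identity(seg, k) < threshold for k in kept):
            kept.append(seg)
    return kept


# Toy aligned segments (made up for illustration), each 20 residues long.
segments = [
    "ACDEFGHIKLMNPQRSTVWY",  # reference segment
    "ACDEFGHIKLMNPQRSTVWF",  # 95% identical to the reference
    "ACDEFGHIKLMNPQRSAAAA",  # 80% identical to the reference
    "AAAAAGHIKLMNPQRSTVWY",  # 80% identical to the reference
]

for threshold in (98, 94, 75):
    kept = purge_redundant(segments, threshold)
    print(f">= {threshold}% purge: {len(kept)} segment(s) kept")
```

On this toy input, lowering the threshold leaves fewer segments to build the profile from. That is the trade-off the question asks about: less redundancy, but also less sequence diversity retained.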

Solutions

Expert Solution

ANS) PSI-BLAST is an iterative program that searches a database for proteins with distant similarity to a query sequence. We investigated over a dozen modifications to the methods used in PSI-BLAST, with the goal of improving accuracy in finding true positive matches. To evaluate performance we used a set of 103 queries for which the true positives in yeast had been annotated by human experts, and a popular measure of retrieval accuracy (ROC) that can be normalized to take on values between 0 (worst) and 1 (best). The modifications we consider novel improve the ROC score from 0.758 ± 0.005 to 0.895 ± 0.003. This does not include the benefits from four modifications we included in the ‘baseline’ version, even though they were not implemented in PSI-BLAST version 2.0. The improvement in accuracy was confirmed on a small second test set. This test involved analyzing three protein families with curated lists of true positives from the non-redundant protein database. The modification that accounts for the majority of the improvement is the use, for each database sequence, of a position-specific scoring system tuned to that sequence’s amino acid composition. The use of composition-based statistics is particularly beneficial for large-scale automated applications of PSI-BLAST.
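The same kind of evaluation could, in principle, be used to test a 75% purge threshold: run the search at each threshold on queries with curated true positives and compare the normalized ROC scores. Below is a minimal Python sketch of one common truncated ROC formulation (often written ROC_n). The function name `roc_n` and the example rankings are hypothetical, and this is not necessarily the exact variant used in the evaluation described above.

```python
def roc_n(ranked_is_true: list[bool], total_true: int, n: int) -> float:
    """Truncated ROC score in [0, 1].

    ranked_is_true: hits ordered from best to worst score; True marks a
                    curated true positive, False a false positive.
    total_true:     number of true positives in the whole database.
    n:              number of false positives to consider.
    """
    if total_true == 0 or n == 0:
        return 0.0
    true_seen = 0
    false_seen = 0
    area = 0
    for is_true in ranked_is_true:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # true positives ranked above this false positive
            if false_seen == n:
                break
    # If fewer than n false positives were retrieved, treat the missing ones
    # as ranked below every retrieved hit.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)


# Hypothetical rankings for one query under two purge thresholds.
hits_at_94 = [True, True, True, False, False]
hits_at_75 = [True, True, False, True, False]

print(roc_n(hits_at_94, total_true=3, n=2))
print(roc_n(hits_at_75, total_true=3, n=2))
```

A score of 1 means every true positive outranks the first n false positives, and 0 means none do. Averaging this score over many queries at each threshold would show whether the more aggressive purge helps or hurts retrieval.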

