Use PopSet (population study data sets) in ENTREZ to retrieve the coding sequences (CDS) of the amylase-related gene (amyrel) generated for Drosophila yakuba by Cariou et al. (2001). You should retrieve 5 Drosophila yakuba sequences from different population strains. ***Make sure all your sequences are Drosophila yakuba.***
(a) Report the GenBank accession numbers.
(b) Align coding regions of the 5 sequences. Which alignment software did you use?
(c) Look at the first 200 bases of the alignment. Determine the number of segregating sites (K), the number of segregating sites per site (k), and the nucleotide diversity (π).
Go to https://www.ncbi.nlm.nih.gov/Web/Search/entrezfs.html and select ENTREZ.
From https://www.ncbi.nlm.nih.gov/gquery/ select PopSet.
Enter amyrel in the search box.
The record https://www.ncbi.nlm.nih.gov/popset/12620152 contains the Cariou et al. (2001) amyrel sequences for different strains of Drosophila yakuba.
(a)
1) GenBank: AF280878.1 : Drosophila yakuba strain LO4
2) GenBank: AF280877.1 : Drosophila yakuba strain LBV2 clone 4
3) GenBank: AF280876.1 : Drosophila yakuba strain LBV2 clone 1
4) GenBank: AF280875.1 : Drosophila yakuba strain SA3 clone 8
5) GenBank: AF280874.1 : Drosophila yakuba strain SA3 clone 6
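The five records listed in (a) could also be fetched programmatically rather than downloaded one by one. The following is a minimal sketch, assuming the NCBI E-utilities `efetch` endpoint with `db=nuccore` and `rettype=fasta`; the helper names `efetch_url` and `fetch_fasta` are hypothetical and not part of the original answer. Whether each record contains only the CDS or additional flanking sequence still has to be checked by hand.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumed to be reachable from your network).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers reported in part (a).
ACCESSIONS = [
    "AF280878.1",
    "AF280877.1",
    "AF280876.1",
    "AF280875.1",
    "AF280874.1",
]


def efetch_url(accessions):
    """Build an efetch URL that requests the records as plain-text FASTA."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions):
    """Download the FASTA text for the given accessions (needs network access)."""
    with urlopen(efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


# Only the URL is built here; call fetch_fasta(ACCESSIONS) to download.
url = efetch_url(ACCESSIONS)
print(url)
```

The text returned by `fetch_fasta(ACCESSIONS)` can be pasted directly into the alignment tool used in (b), after confirming that every header line names Drosophila yakuba.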
(b) Download the FASTA sequence of each strain and align the five sequences with the multiple sequence alignment tool Clustal Omega:
https://www.ebi.ac.uk/Tools/msa/clustalo/
Results:
CLUSTAL O(1.2.4) multiple sequence alignment

AF280878.1      ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG 60
AF280876.1      ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG 60
AF280877.1      ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG 60
AF280875.1      ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG 60
AF280874.1      ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG 60
                ************************************************************

AF280878.1      CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG 120
AF280876.1      CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG 120
AF280877.1      CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG 120
AF280875.1      CACAATCCCCATTGGTGGGGCAATCGAAACACCATCGTCCACTTGTTCGAGTGGAAGTGG 120
AF280874.1      CACAATCCCCATTGGTGGGGCAATCGAAACACCATCGTCCACTTGTTCGAGTGGAAGTGG 120
                ************************** *********************************

AF280878.1      TCGGACATTGCCCAGGAGTGTGAGAATTTTCTGGGACCCCGAGGATTCGCCGGCGTTCAA 180
AF280876.1      TCGGACATCGCCCAGGAGTGTGAGAATTTTCTGGGCCCACGAGGATTCGGCGGCGTTCAA 180
AF280877.1      TCGGACATTGCCCAGGAGTGTGAGAATTTTCTGGGACCCCGAGGATTCGCCGGCGTTCAA 180
AF280875.1      TCGGACATCGCCCAGGAGTGTGAGAATTTTCTGGGCCCACGAGGATTCGCCGGCGTTCAA 180
AF280874.1      TCGGACATCGCCCAGGAGTGCGAGAATTTTCTGGGCCCACGAGGATTCGCCGGCGTTCAA 180
                ******** *********** ************** ** ********** **********

AF280878.1      GTGAGCCCCGTGAATGAGAACATCATATCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA 240
AF280876.1      GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA 240
AF280877.1      GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA 240
AF280875.1      GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA 240
AF280874.1      GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA 240
                *************************** ********************************
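For part (c), the counts can be checked with a short script instead of by eye. The sketch below is not part of the original answer. It assumes that the first 240 alignment columns shown in the Clustal Omega output are transcribed correctly, that the window contains no gaps (none appear in this excerpt), and that π is taken as the mean number of pairwise differences divided by the number of sites compared. The function names are illustrative.

```python
from itertools import combinations

# First 240 alignment columns, copied from the Clustal Omega output in (b).
ALIGNMENT = {
    "AF280878.1": (
        "ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG"
        "CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG"
        "TCGGACATTGCCCAGGAGTGTGAGAATTTTCTGGGACCCCGAGGATTCGCCGGCGTTCAA"
        "GTGAGCCCCGTGAATGAGAACATCATATCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA"
    ),
    "AF280876.1": (
        "ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG"
        "CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG"
        "TCGGACATCGCCCAGGAGTGTGAGAATTTTCTGGGCCCACGAGGATTCGGCGGCGTTCAA"
        "GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA"
    ),
    "AF280877.1": (
        "ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG"
        "CACAATCCCCATTGGTGGGGCAATCGCAACACCATCGTCCACTTGTTCGAGTGGAAGTGG"
        "TCGGACATTGCCCAGGAGTGTGAGAATTTTCTGGGACCCCGAGGATTCGCCGGCGTTCAA"
        "GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA"
    ),
    "AF280875.1": (
        "ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG"
        "CACAATCCCCATTGGTGGGGCAATCGAAACACCATCGTCCACTTGTTCGAGTGGAAGTGG"
        "TCGGACATCGCCCAGGAGTGTGAGAATTTTCTGGGCCCACGAGGATTCGCCGGCGTTCAA"
        "GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA"
    ),
    "AF280874.1": (
        "ATGTTCAAGTTGGCTTTGACCCTGACACTCTGCTTGGCGGGCAGCCTCTCGCTGGCCCAG"
        "CACAATCCCCATTGGTGGGGCAATCGAAACACCATCGTCCACTTGTTCGAGTGGAAGTGG"
        "TCGGACATCGCCCAGGAGTGCGAGAATTTTCTGGGCCCACGAGGATTCGCCGGCGTTCAA"
        "GTGAGCCCCGTGAATGAGAACATCATAGCGGCGGGTCGTCCTTGGTGGGAGCGATACCAA"
    ),
}


def segregating_sites(seqs):
    """Return 1-based positions of columns with more than one nucleotide."""
    return [i + 1 for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


WINDOW = 200  # part (c) asks about the first 200 bases only
seqs = [seq[:WINDOW] for seq in ALIGNMENT.values()]

sites = segregating_sites(seqs)
K = len(sites)                                  # number of segregating sites
k = K / WINDOW                                  # segregating sites per site
pi = mean_pairwise_differences(seqs) / WINDOW   # nucleotide diversity per site

print("segregating positions:", sites)
print(f"K = {K}, k = {k:.4f}, pi = {pi:.4f}")
```

On this excerpt the script reports segregating positions 87, 129, 141, 156, 159 and 170, so K = 6, k = 6/200 = 0.03 and π = 3.2/200 = 0.016. The variable site at position 208 lies outside the 200-base window and is not counted.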