Use Python 2.7 (include a screenshot of the program with its output).
The task: write a program that takes a list of protein sequences as input, scans each sequence for nonpolar regions that may be transmembrane (TM) domains, and returns a list of results for each sequence in the list.
1. This code should call two other functions that you write. The first, regionProteinFind, takes in a protein sequence and should return a list of 10-amino-acid windows; if the sequence is shorter than 10 amino acids, it should just return that sequence. The first window covers amino acids 1-10, the next covers amino acids 2-11, and so on, for each sequence in the list.
Test code:
protein = 'MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR'
regionProteinFind returns:
['MKLVVRPWAG', 'KLVVRPWAGC', 'LVVRPWAGCW', 'VVRPWAGCWW', 'VRPWAGCWWS', 'RPWAGCWWST', 'PWAGCWWSTL', 'WAGCWWSTLG', 'AGCWWSTLGP', 'GCWWSTLGPR',
'CWWSTLGPRG', 'WWSTLGPRGS', 'WSTLGPRGSL', 'STLGPRGSLS', 'TLGPRGSLSP', 'LGPRGSLSPL', 'GPRGSLSPLG', 'PRGSLSPLGI', 'RGSLSPLGIC', 'GSLSPLGICP',
'SLSPLGICPL', 'LSPLGICPLL', 'SPLGICPLLM', 'PLGICPLLML', 'LGICPLLMLL', 'GICPLLMLLW', 'ICPLLMLLWA', 'CPLLMLLWAT', 'PLLMLLWATL', 'LLMLLWATLR']
Second test code:
protein = 'MP'
regionProteinFind should return: ['MP']
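The windowing described in step 1 can be sketched as a plain sliding window that advances one residue at a time. This is only a minimal sketch under a few assumptions: it is written as a standalone function rather than a method, the `window` parameter is a hypothetical addition (the task fixes the width at 10), and a short sequence is returned wrapped in a one-item list so the caller always gets a list back.

```python
def regionProteinFind(protein, window=10):
    """Return every `window`-residue slice of `protein`, one residue apart."""
    # A sequence no longer than one window is returned whole, as a one-item list.
    if len(protein) <= window:
        return [protein]
    # Start positions run from 0 to len(protein) - window, inclusive.
    return [protein[i:i + window] for i in range(len(protein) - window + 1)]


regions = regionProteinFind('MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR')
print(regions[0])    # MKLVVRPWAG
print(regions[-1])   # LLMLLWATLR
print(len(regions))  # 30
print(regionProteinFind('MP'))  # ['MP']
```

A protein of length n with n >= 10 yields n - 9 windows, which is why the 39-residue test sequence gives 30 of them.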
2. A second function, testForTM, should calculate and return the decimal fraction of residues in a ten-amino-acid window that are nonpolar, for each window of each sequence in the list. The nonpolar amino acids are A, V, L, I, P, M, F, and W. My code for this is:
def testForTM(AAWindow):
    totalNP = 0
    nonPolarList = ['A', 'V', 'L', 'I', 'P', 'M', 'F', 'W']
    for aa in AAWindow:
        if aa in nonPolarList:
            totalNP += 1
    # This should divide by len(AAWindow) so that it works for
    # sequences shorter than 10 residues, such as 'MP'.
    return totalNP/10.0
3. The last function, tmSCANNER (called TMFinder in the test code below), should call regionProteinFind and testForTM. Ultimately, the code should scan each protein sequence in the input list and generate, for each sequence, a list of numbers giving the fraction of nonpolar residues in each 10-amino-acid window. The window slides one amino acid at a time until it reaches the last window of the sequence, whatever the sequence length. The code should output what is displayed below.
#Test code for TMFinder
listOfProtein = ['MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR', 'MARKCSVPLVMAWLTWTTSRAPLPH', 'MPWPTSITXXXXXXSWSPEWLSSGLRSILGWEQPRVSHKGHSHEWHRRP']
tmValuesList = TMFinder(listOfProtein)
print 'The list of TM values are:', tmValuesList
As a result, it should print out this list:
["protein 1:'MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR'", 'TMValue:[0.7, 0.6, 0.7, 0.7, 0.6, 0.5, 0.6, 0.5, 0.5, 0.4, 0.4, 0.4, 0.4, 0.3, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.6, 0.7, 0.7, 0.8, 0.8, 0.8, 0.9, 0.8, 0.9, 0.8]',"protein2:'MARKCSVPLVMAWLTWTTSRAPLPH'", 'TMValue:[0.6, 0.6, 0.6, 0.7, 0.8, 0.8, 0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]',"protein3:'MPWPTSITXXXXXXSWSPEWLSSGLRSILGWEQPRVSHKGHSHEWHRRP'",'TMValue:[0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.2, 0.1, 0.2, 0.2, 0.3, 0.4, 0.4, 0.4, 0.4, 0.5, 0.4, 0.4, 0.4, 0.5, 0.4, 0.4, 0.4, 0.4, 0.5, 0.4, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2, 0.2, 0.1, 0.2, 0.1, 0.1, 0.1]']
This is time-sensitive. Thank you for the help!
Please save the following code in ProteinList.py and run it.
# Program starts here
# These imports let the print(..., end=...) calls and the true division
# below run under Python 2.7 as well as Python 3.
from __future__ import division, print_function


class ProteinList():
    def __init__(self, prot):
        self.protein = prot

    # It splits the protein into 10 amino acid windows.
    def regionProteinFind(self):
        # A short sequence is returned whole, as a one-item list, so the
        # caller can always loop over windows rather than single letters.
        if len(self.protein) <= 10:
            return [self.protein]
        region = ['' for i in range(len(self.protein) - 9)]
        for i in range(len(self.protein) - 9):
            region[i] = self.protein[i:i+10]
        # print(region)
        return region

    # calculate and return the decimal fraction of a ten amino acid
    # window which is nonpolar
    def testForTM(self, region):
        totalNP = 0
        nonPolarList = ['A', 'V', 'L', 'I', 'P', 'M', 'F', 'W']
        for i in range(len(region)):
            if region[i:i+1] in nonPolarList:
                totalNP += 1
        return totalNP / len(region)


if __name__ == "__main__":
    print('Starting... ')
    listOfProtein = ['MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR',
                     'MARKCSVPLVMAWLTWTTSRAPLPH',
                     'MPWPTSITXXXXXXSWSPEWLSSGLRSILGWEQPRVSHKGHSHEWHRRP']
    count = 1
    for protein in listOfProtein:
        finder = ProteinList(protein)
        region = finder.regionProteinFind()
        print('[Protein ' + str(count) + ' : ' + protein + ',')
        print('TM Value : [', end=' ')
        for amino in region:
            print(finder.testForTM(amino), end=',')
        print('],')
        count += 1
    print(']')
    print('Completed... ')
# Program ends here.
It will generate the expected output. My screenshot shows:
Starting... 
[Protein 1 : MKLVVRPWAGCWWSTLGPRGSLSPLGICPLLMLLWATLR,
TM Value : [ 0.7,0.6,0.7,0.7,0.6,0.5,0.6,0.5,0.5,0.4,0.4,0.4,0.4,0.3,0.4,0.5,0.4,0.5,0.4,0.5,0.6,0.7,0.7,0.8,0.8,0.8,0.9,0.8,0.9,0.8,],
[Protein 2 : MARKCSVPLVMAWLTWTTSRAPLPH,
TM Value : [ 0.6,0.6,0.6,0.7,0.8,0.8,0.9,0.8,0.7,0.6,0.5,0.5,0.5,0.5,0.5,0.5,],
[Protein 3 : MPWPTSITXXXXXXSWSPEWLSSGLRSILGWEQPRVSHKGHSHEWHRRP,
TM Value : [ 0.5,0.4,0.3,0.2,0.1,0.1,0.2,0.1,0.2,0.2,0.3,0.4,0.4,0.4,0.4,0.5,0.4,0.4,0.4,0.5,0.4,0.4,0.4,0.4,0.5,0.4,0.5,0.5,0.4,0.3,0.3,0.2,0.2,0.2,0.1,0.2,0.1,0.1,0.1,0.2,],
]
Completed... 