Complete a) a flowchart solution and b) a working Python solution. The program should compare the gene sequences, looking for differences (mutations) and keeping track of how many there are. It should produce clear two-part output: whether the ‘baby’ has the mutation for diabetes, and the identity between the siblings’ sequences.
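The compare-and-count step described above maps fairly directly onto flowchart symbols. You initialise a counter, loop over every position, decide whether the two bases differ, and add one when they do. The sketch below shows that loop. It assumes the two sequences are already aligned and the same length, and `count_differences` is a hypothetical helper name, not something the assignment specifies:

```python
def count_differences(seq1, seq2):
    """Count the positions where two equal-length DNA strings differ."""
    if len(seq1) != len(seq2):
        # Decision: the comparison only makes sense for aligned, equal-length sequences
        raise ValueError("sequences must have the same length")
    differences = 0                # Process: start the counter at zero
    for i in range(len(seq1)):     # Loop: visit every nucleotide position
        if seq1[i] != seq2[i]:     # Decision: do the two bases differ?
            differences += 1       # Process: count one more difference
    return differences             # Terminal: report the total


print(count_differences("ATGGCC", "ATGACC"))  # 1
```

Each comment names the flowchart symbol that line would roughly correspond to, which may help when drawing part a).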
BABY:
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCTGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG
BROTHER:
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG
Does the ‘baby’ carry the mutation associated with
diabetes?
Which (what number) nucleotide has mutated?
What is the ‘identity’ of the siblings’ sequences? In other words,
find the percentage, to one decimal place, describing how similar
those two are.
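The sketch below shows one way to produce the two-part output. The question implies, but does not state, that the brother's sequence is the unmutated reference. This sketch adopts that assumption, so any position where the baby's sequence differs counts as the diabetes-associated mutation. Positions are reported as 1-based nucleotide numbers. Identity is the share of identical positions, rounded to one decimal place. The function name `two_part_report` is hypothetical:

```python
def two_part_report(baby, reference):
    """Print mutation status (with nucleotide numbers) and percent identity.

    Assumption: `reference` (the brother's sequence) carries no mutation, so every
    position where `baby` differs is treated as the diabetes-associated mutation.
    """
    if not baby or len(baby) != len(reference):
        raise ValueError("sequences must be non-empty and the same length")
    # 1-based nucleotide numbers where the two sequences disagree
    mutated = [i + 1 for i, (b, r) in enumerate(zip(baby, reference)) if b != r]
    identity = (len(baby) - len(mutated)) / len(baby) * 100

    # Part 1: does the baby carry the mutation, and where?
    if mutated:
        print("Part 1: the baby carries the mutation at nucleotide(s):",
              ", ".join(str(p) for p in mutated))
    else:
        print("Part 1: the baby does not carry the mutation.")
    # Part 2: identity between the two sequences, to one decimal place
    print(f"Part 2: identity = {identity:.1f}%")
    return mutated, identity


two_part_report("ATGTGCTAG", "ATGCGCTAG")
```

For the assignment, call it with the baby's sequence first and the brother's second.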
--------------------------------------------------
So far I have this:
def find_mutation_location(nuc1, nuc2):
    n = len(nuc1)
    for i in range(0,n,3):
        if nuc1[i:i+3] != nuc2[i:i+3]:
            return (i+1)//3
n1 = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCTGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG"
n2 = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG"
print(find_mutation_location(n1,n2))
def calculate_similarity(nuc1, nuc2):
    n = len(nuc1) // 3
    different = 0
    for i in range(0,n,3):
        if nuc1[i:i+3] != nuc2[i:i+3]:
            different += 1
print(calculate_similarity(n1,n2))
I seem to have been able to locate the mutation at nucleotide number 54, but am having issues with calculating the similarity to a percentage/1 decimal point. Please advise, and also provide working flow-chart solution if possible. Thanks!
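About the 54: the loop steps `i` through the 0-based string indices 0, 3, 6, and so on. For those values, `(i+1)//3` equals `i//3`, which is the 0-based index of the codon rather than a nucleotide number. A result of 54 therefore means the loop stopped at `i = 162`. That is the 55th codon, spanning nucleotides 163 to 165 when counted from 1. The conversions can be sketched as follows (`describe_index` is a hypothetical name):

```python
def describe_index(i):
    """Convert a 0-based string index into 1-based biological numbering."""
    nucleotide_number = i + 1        # Python strings count from 0, nucleotides from 1
    codon_number = i // 3 + 1        # every 3 bases form one codon
    position_in_codon = i % 3 + 1    # 1st, 2nd or 3rd base of that codon
    return nucleotide_number, codon_number, position_in_codon


print(describe_index(162))  # (163, 55, 1)
```

At that codon, the baby's string reads TGC where the brother's reads CGC. The base that differs is therefore the first base of codon 55, which is nucleotide 163.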
Several changes have been made to the code; the comments explain each one.
CODE:
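def find_mutation_location(nuc1, nuc2):
    n = len(nuc1)
    pos = []  # list of 1-based numbers of every mutated nucleotide
    for i in range(0, n, 3):
        if nuc1[i:i+3] != nuc2[i:i+3]:  # this codon differs
            for j in range(i, min(i + 3, n)):  # find the exact base(s) inside the codon
                if nuc1[j] != nuc2[j]:
                    pos.append(j + 1)  # convert the 0-based index to a nucleotide number
    return pos  # returning the list of positions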
n1 = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCTGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG"
n2 = "ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGACCTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTGGAGAACTACTGCAACTAG"
print(*find_mutation_location(n1, n2))  # * unpacks the list so its items print separated by spaces
def calculate_similarity(nuc1, nuc2):
    n = len(nuc1)  # length of the DNA sequence
    total = 0  # number of codons compared
    similar = 0  # number of identical codons
    for i in range(0, n, 3):
        total = total + 1  # count every codon the loop visits
        if nuc1[i:i+3] == nuc2[i:i+3]:  # the two codons match
            similar = similar + 1  # count one more identical codon
    return similar / total * 100  # identity as a percentage
print("{:.1f}%".format(calculate_similarity(n1, n2)))  # {:.1f} formats the percentage to one decimal place
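The function above measures identity per codon: identical codons divided by codons compared. Sequence identity is more often reported per nucleotide: identical positions divided by aligned length. The two figures can differ. For example, a single substitution gives 110/111 ≈ 99.1% over 111 codons, but 332/333 ≈ 99.7% over 333 nucleotides. The sketch below computes either, assuming equal-length sequences. `percent_identity` and its `unit` parameter are hypothetical names:

```python
def percent_identity(seq1, seq2, unit=1):
    """Percentage of identical units between two equal-length sequences.

    unit=1 compares single nucleotides; unit=3 compares whole codons,
    mirroring the codon-based calculate_similarity approach.
    """
    if not seq1 or len(seq1) != len(seq2):
        raise ValueError("sequences must be non-empty and the same length")
    chunks1 = [seq1[i:i + unit] for i in range(0, len(seq1), unit)]
    chunks2 = [seq2[i:i + unit] for i in range(0, len(seq2), unit)]
    same = sum(1 for a, b in zip(chunks1, chunks2) if a == b)
    return same / len(chunks1) * 100


a, b = "ATGCGCTAG", "ATGTGCTAG"
print(f"{percent_identity(a, b):.1f}%")          # per nucleotide: 88.9%
print(f"{percent_identity(a, b, unit=3):.1f}%")  # per codon: 66.7%
```

The question does not say which definition it expects, so you may want to confirm before submitting.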
OUTPUT: (screenshot not included)
FLOWCHART: (image not included)