Question

This question is about the Hamming distance (HD).
Given a reference DNA sequence and a set of candidate sequences, find which candidate(s) have a Hamming distance from the reference that is less than or equal to a threshold k.
e.g.
Reference DNA: AATGCGC
Candidate 1: AATCCCC
Candidate 2: TTTGCTC
Candidate 3: AATAAAA
If the Hamming distance threshold k = 3:
HD1 = 2
HD2 = 3
HD3 = 4
Your result should be a list of all the qualifying candidate numbers and their Hamming distances, in the format:
[[candidate number, Hamming distance]].
For the result of the example above:
[[1, 2], [2, 3]]
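As a minimal sketch of how the distances in the example above are obtained, the snippet below counts the positions at which two equal-length strings differ. The helper names `hamming_distance` and `mismatch_positions` are illustrative and are not part of the original question.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_positions(a, b):
    """Return the 1-based positions where the two strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


reference = "AATGCGC"
example_candidates = ["AATCCCC", "TTTGCTC", "AATAAAA"]
for number, candidate in enumerate(example_candidates, start=1):
    # Print the candidate number, its distance, and where it differs.
    print(number, hamming_distance(reference, candidate),
          mismatch_positions(reference, candidate))
```

Run as written, this prints `1 2 [4, 6]`, `2 3 [1, 2, 6]` and `3 4 [4, 5, 6, 7]`, which matches HD1, HD2 and HD3 above.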
Here begins the question.
Reference sequence: AGAAACTCTCTGGCCTAAAG

Candidate 1: ACTTAGGTCTCTAAGCCCTC
Candidate 2: AGAAACGTTATGTGGACGTT
Candidate 3: AGTCTGACTCTGATCCAAAG
Candidate 4: AGTCTGCCTTGGCCATTAGC
Candidate 5: GTAAGCTAACCCCGCCAGCA
Candidate 6: AGAAACTCTCTTGTCTAAAG
Candidate 7: AGAAATGATGTCGCCTAAAG
Candidate 8: GCGAATGCGACTGGCGAGGT
Threshold = 11.

Submit the code you have written to solve the previous question (the one on Hamming distance).
If you are using a language other than Python3, please note which language it is.


Solutions

Expert Solution

  • Below is a Python3 solution to the problem above, with the code and its output.
  • The comments in the code explain each step.
  • CODE:

# Return a list of [candidate number, Hamming distance] pairs for every
# candidate whose Hamming distance from the reference is <= threshold.
# Assumes every candidate has the same length as the reference.
def find(reference, candidates, threshold):
    # answer list
    ans = []
    # candidate numbers start at 1
    number = 1
    # iterate over all candidates
    for s in candidates:
        # Hamming distance of this candidate
        hamming = 0
        # count the positions where the candidate differs from the reference
        for i in range(len(s)):
            if s[i] != reference[i]:
                hamming += 1
        # keep the candidate if its distance is less than or equal to the threshold
        if hamming <= threshold:
            ans.append([number, hamming])
        # move to the next candidate
        number += 1
    # return answer
    return ans
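One possible variant of the function above, offered as a sketch rather than as the original author's code: it pairs characters with `zip`, numbers the candidates with `enumerate`, and raises an error when a candidate's length differs from the reference. The length check and the name `find_within_threshold` are assumptions added here; the original solution simply assumes equal lengths.

```python
def find_within_threshold(reference, candidates, threshold):
    """Return [candidate number, distance] pairs with distance <= threshold."""
    results = []
    for number, candidate in enumerate(candidates, start=1):
        # Hamming distance is only defined for sequences of equal length.
        if len(candidate) != len(reference):
            raise ValueError(
                f"candidate {number} has length {len(candidate)}, "
                f"expected {len(reference)}"
            )
        # Count the positions at which the two sequences differ.
        distance = sum(r != c for r, c in zip(reference, candidate))
        if distance <= threshold:
            results.append([number, distance])
    return results


print(find_within_threshold("AATGCGC", ["AATCCCC", "TTTGCTC", "AATAAAA"], 3))
```

The explicit check matters because `zip` stops at the shorter input, so without it a truncated candidate would silently receive a distance that is too small.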

  • INPUT/OUTPUT:
  1. #test case 1
    reference="AATGCGC"
    candidates=["AATCCCC", "TTTGCTC", "AATAAAA"]
    threshold=3
    print(find(reference,candidates,threshold))
    Output: [[1, 2], [2, 3]]
  2. #test case 2
    reference="AGAAACTCTCTGGCCTAAAG"
    candidates=["ACTTAGGTCTCTAAGCCCTC", "AGAAACGTTATGTGGACGTT", "AGTCTGACTCTGATCCAAAG", "AGTCTGCCTTGGCCATTAGC", "GTAAGCTAACCCCGCCAGCA", "AGAAACTCTCTTGTCTAAAG", "AGAAATGATGTCGCCTAAAG", "GCGAATGCGACTGGCGAGGT"]
    threshold=11
    print(find(reference,candidates,threshold))
    Output: [[2, 11], [3, 8], [6, 2], [7, 5]]
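To see why candidates 1, 4, 5 and 8 are missing from the second result, it can help to print every distance before filtering. The following sketch does that; `all_distances` is a hypothetical helper and is not part of the solution above.

```python
def all_distances(reference, candidates):
    """Return [candidate number, Hamming distance] for every candidate."""
    return [
        [number, sum(r != c for r, c in zip(reference, candidate))]
        for number, candidate in enumerate(candidates, start=1)
    ]


reference = "AGAAACTCTCTGGCCTAAAG"
candidates = [
    "ACTTAGGTCTCTAAGCCCTC", "AGAAACGTTATGTGGACGTT",
    "AGTCTGACTCTGATCCAAAG", "AGTCTGCCTTGGCCATTAGC",
    "GTAAGCTAACCCCGCCAGCA", "AGAAACTCTCTTGTCTAAAG",
    "AGAAATGATGTCGCCTAAAG", "GCGAATGCGACTGGCGAGGT",
]
threshold = 11

for number, distance in all_distances(reference, candidates):
    # Label each candidate according to the threshold test used in find().
    status = "kept" if distance <= threshold else "rejected"
    print(number, distance, status)
```

Under this sketch, candidates 1, 4, 5 and 8 come out at distances 18, 12, 13 and 14, all above the threshold of 11, while candidate 2 sits exactly on the threshold and is kept because the comparison is `<=`.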

If you have any doubts about this solution, feel free to ask in the comment section below, and if it was helpful, please upvote it. Thank you.

