Question

In: Computer Science

Question no 1: The d-neighborhood Neighbors(Pattern, d) is the set of all k-mers whose Hamming distance...

Question no 1:
The d-neighborhood Neighbors(Pattern, d) is the set of all k-mers whose Hamming distance from Pattern does not exceed d.
Generate the d-Neighborhood of a String
Find all the neighbors of a pattern.
Given: A DNA string Pattern and an integer d.
Return: The collection of strings Neighbors(Pattern, d).
Sample Dataset
ACG
1
Sample Output
CCG
TCG
GCG
AAG
ATG
AGG
ACA
ACC
ACT
ACG

Question no 2:
We say that a k-mer Pattern appears as a substring of Text with at most d mismatches if there is some k-mer substring Pattern' of Text having d or fewer mismatches with Pattern, i.e., HammingDistance(Pattern, Pattern') ≤ d. Our observation that a DnaA box may appear with slight variations leads to the following generalization of the Pattern Matching Problem.
Approximate Pattern Matching Problem
Find all approximate occurrences of a pattern in a string.
Given: Strings Pattern and Text along with an integer d.
Return: All starting positions where Pattern appears as a substring of Text with at most d mismatches.
Sample Dataset
ATTCTGGA
CGCCCGAATCCAGAACGCATTCCCATATTTCGGGACCACTGGCCTCCACGGTACGGACGTCAATCAAATGCCTAGCGGCTTGTGGTTTCTCCTACGCTCC
3
Sample Output
6 7 26 27 78

Question no 3:
We defined a mismatch in “Compute the Hamming Distance Between Two Strings”. We now generalize “Find the Most Frequent Words in a String” to incorporate mismatches as well.
Given strings Text and Pattern as well as an integer d, we define Countd(Text, Pattern) as the total number of occurrences of Pattern in Text with at most d mismatches. For example, Count1(AACAAGCTGATAAACATTTAAAGAG, AAAAA) = 4 because AAAAA appears four times in this string with at most one mismatch: AACAA, ATAAA, AAACA, and AAAGA. Note that two of these occurrences overlap.
A most frequent k-mer with up to d mismatches in Text is simply a string Pattern maximizing Countd(Text, Pattern) among all k-mers. Note that Pattern does not need to actually appear as a substring of Text; for example, AAAAA is the most frequent 5-mer with 1 mismatch in AACAAGCTGATAAACATTTAAAGAG, even though AAAAA does not appear exactly in this string. Keep this in mind while solving the following problem.
Frequent Words with Mismatches Problem
Find the most frequent k-mers with mismatches in a string.
Given: A string Text as well as integers k and d.
Return: All most frequent k-mers with up to d mismatches in Text.
Sample Dataset
ACGTTGCATGTCGCATGATGCATGAGAGCT
4 1
Sample Output
GATG ATGC ATGT

Question no 4
We now extend “Find the Most Frequent Words with Mismatches in a String” to find frequent words with both mismatches and reverse complements. Recall that Pattern refers to the reverse complement of Pattern.
Frequent Words with Mismatches and Reverse Complements Problem
Find the most frequent k-mers (with mismatches and reverse complements) in a DNA string.
Given: A DNA string Text as well as integers k and d.
Return: All k-mers Pattern maximizing the sum Countd(Text, Pattern) + Countd(Text, Pattern) over all possible k-mers.
Sample Dataset
ACGTTGCATGTCGCATGATGCATGAGAGCT
4 1
Sample Output
ATGT ACAT

Solutions

Expert Solution

1.

def Suffix(Pattern):
    return Pattern[1:]

def FirstSymbol(Pattern):
    return Pattern[0]

def HammingDistance(str1, str2):
    hd = 0
    for (x, y) in zip(str1, str2):
        if x != y:
            hd = hd + 1
    return hd

def Neighbors(Pattern, d):
    nucleotide = ['A', 'C', 'G', 'T']
    if d==0:
        return [Pattern]
    if len(Pattern)==1:
        return ['A', 'C', 'G', 'T']
    Neighborhood = []
    SuffixNeighbors = Neighbors(Suffix(Pattern),d)
    for SuffixNeighbor in SuffixNeighbors:
        if HammingDistance(Suffix(Pattern), SuffixNeighbor)<d:
            for x in nucleotide:
                Neighborhood.append(x + SuffixNeighbor)
        else:
            Neighborhood.append(FirstSymbol(Pattern)+SuffixNeighbor)
    return Neighborhood

2.

def Text(DNA, i, n):
    return DNA[i:i + n]

def HammingDistance(str1, str2):
    hd = 0
    for (x, y) in zip(str1, str2):
        if x != y:
            hd = hd + 1
    return hd

def ApproximatePatternMatching(Genome, Pattern, d):
    positions = []

    for i in range(0, len(Genome) - len(Pattern) + 1):
        Patterndash = Text(Genome, i, len(Pattern))
        if (HammingDistance(Pattern, Patterndash) <= d):
            positions.append(i)

    return positions

3.

def NumberToSymbol(index):
    dict = {0: 'A', 1: 'C', 2: 'G', 3: 'T'}
    return dict[index]

def Quotient(m,n):
    return m//n

def Remainder(m,n):
    return m%n

def NumberToPattern(index, k):
    if k == 1:
        return NumberToSymbol(index)
    prefixIndex = Quotient(index, 4)
    r = Remainder(index,4)
    symbol = NumberToSymbol(r)
    PrefixPattern = NumberToPattern(prefixIndex, k-1)
    return PrefixPattern+symbol

def createKmers(k):
    kmers = []
    for i in range (0, pow(4,k)):
        kmer = NumberToPattern(i, k)
        kmers.append(kmer)

    return kmers

def Text(DNA, i, n):
    return DNA[i:i + n]

def HammingDistance(str1, str2):
    hd = 0
    for (x, y) in zip(str1, str2):
        if x != y:
            hd = hd + 1
    return hd

def ApproximatePatternCount(Genome, Pattern, d):
    positions = []

    for i in range(0, len(Genome) - len(Pattern) + 1):
        Patterndash = Text(Genome, i, len(Pattern))
        if (HammingDistance(Pattern, Patterndash) <= d):
            positions.append(i)

    return len(positions)

def FrequentWordsWithMismatches(pattern, k, d):
    kmers = createKmers(k)
    Count = []
    res = []

    for kmer in kmers:
        Count.append(ApproximatePatternCount(pattern, kmer, d))

    maxCount = max(Count)

    for i in range(0, len(Count)):
        if Count[i] == maxCount:
            res.append(kmers[i])

    return res

4.

def ReverseComplement(DNAstr):
    RevCompStr = ""
    for nucleotide in DNAstr:
        if nucleotide == 'a' or nucleotide == 'A':
            complement = 'T'
        elif nucleotide == 't' or nucleotide == 'T':
            complement = 'A'
        elif nucleotide == 'g' or nucleotide == 'G':
            complement = 'C'
        elif nucleotide == 'c' or nucleotide == 'C':
            complement = 'G'
        RevCompStr = RevCompStr + complement
    RevCompStr = RevCompStr[::-1]
    return RevCompStr

def NumberToSymbol(index):
    dict = {0: 'A', 1: 'C', 2: 'G', 3: 'T'}
    return dict[index]

def Quotient(m,n):
    return m//n

def Remainder(m,n):
    return m%n

def NumberToPattern(index, k):
    if k == 1:
        return NumberToSymbol(index)
    prefixIndex = Quotient(index, 4)
    r = Remainder(index,4)
    symbol = NumberToSymbol(r)
    PrefixPattern = NumberToPattern(prefixIndex, k-1)
    return PrefixPattern+symbol

def createKmers(k):
    kmers = []
    for i in range (0, pow(4,k)):
        kmer = NumberToPattern(i, k)
        kmers.append(kmer)

    return kmers

def createKmersRC(kmers):
    kmersRC = []
    for kmer in kmers:
        kmersRC.append(ReverseComplement(kmer))
    return kmersRC


def Text(DNA, i, n):
    return DNA[i:i + n]

def HammingDistance(str1, str2):
    hd = 0
    for (x, y) in zip(str1, str2):
        if x != y:
            hd = hd + 1
    return hd

def ApproximatePatternCount(Genome, Pattern, d):
    positions = []

    for i in range(0, len(Genome) - len(Pattern) + 1):
        Patterndash = Text(Genome, i, len(Pattern))
        if (HammingDistance(Pattern, Patterndash) <= d):
            positions.append(i)

    return len(positions)

def FrequentWordsWithMismatchesAndReverseComplements(text, k , d):
    kmers = createKmers(k)
    kmersRC = createKmersRC(kmers)

    Countkmers = []
    CountkmersRC = []
    CountTotal = []

    res = []

    for kmer in kmers:
        Countkmers.append(ApproximatePatternCount(text, kmer, d))
    for kmer in kmersRC:
        CountkmersRC.append(ApproximatePatternCount(text, kmer, d))

    for i in range (0, len(kmers)):
        CountTotal.insert(i, Countkmers[i] + CountkmersRC[i])

    maxCount = max(CountTotal)

    for i in range(0, len(CountTotal)):
        if CountTotal[i] == maxCount:
            res.append(kmers[i])

    return res

Related Solutions

1. What are the k-mers of length k = 21 for this sequence read in FASTQ...
1. What are the k-mers of length k = 21 for this sequence read in FASTQ format? @K000384:75:HM57CBBXX:1:1101:25530:1384 1:N:0:GTGGCC CTGGCACTGGGCTTCAAGCTGGGCTACCTTCTGTTT + AAFFFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ [please be detailed, thanks]
Python(please take a screen shot!): 1. hamming distance: write a function distance that take two bits...
Python(please take a screen shot!): 1. hamming distance: write a function distance that take two bits strings, you can assume each strings only contains 0's and 1's. (Note: the two strings might have the same length or not!) for example: hamming('010001001111', '01010100') should return 5(1 bit changed plus 4 bits "lost" from the end). 2. write a main function that ask user for two file names, open files and read the 1st line of each, and compares them using Hamming...
2. A spring whose coefficient is k is compressed by length d and a block of...
2. A spring whose coefficient is k is compressed by length d and a block of mass m is placed right next to the end of the spring. When the spring is released, the block acquires a velocity and keeps on moving along a horizontal smooth surface until it reaches an inclined plane whose angle is θ and whose coefficient of friction with the block is μ. It travels up the inclined plane for a while before reaching maximum height....
Let X be the set of all subsets of R whose complement is a finite set...
Let X be the set of all subsets of R whose complement is a finite set in R: X = {O ⊂ R | R − O is finite} ∪ {∅} a) Show that T is a topological structure no R. b) Prove that (R, X) is connected. c) Prove that (R, X) is compact.
Prove that if A is an enumerable set all of whose members are also enumerable sets,...
Prove that if A is an enumerable set all of whose members are also enumerable sets, then UA is also enumerable.
QUESTION 10 The sign on a church in your neighborhood reads “All are welcome at Sunday...
QUESTION 10 The sign on a church in your neighborhood reads “All are welcome at Sunday Service.” Because the church has limited seating and is usually full, the Sunday Service is a. a club good. b. a common resource. c. a public good. d. a private good. 4 points    QUESTION 11 It is commonly argued that national defense is a public good. Nevertheless, the weapons used by the U.S. military are produced by private firms. We can conclude that...
Consider a firm whose production is given by Q(K, L) = K^1/3L^1/3, where K and L...
Consider a firm whose production is given by Q(K, L) = K^1/3L^1/3, where K and L are, respectively, the quantities of capital and labour production inputs. Prices of capital and labour are both $1 per unit. (a) Derive and interpret the firm’s demand functions for capital and labour. (b) Derive and interpret the firm’s Long-Run Cost Function. (c) In the long-run, if the firm wished to produce 16 units of output, what quantities of capital and labour would it optimally...
A woman stands a distance d from a loud motor that emits sound uniformly in all...
A woman stands a distance d from a loud motor that emits sound uniformly in all directions. The sound intensity at her position is an uncomfortable 3.3×10-3 W/m2. At a distance 2.9 times as far from the motor, what are (a) the sound intensity and (b) the sound intensity level relative to the threshold of hearing?
Which of the following statements are true? Briefly explain your answer. 1. Training a k-nearest-neighbors classifier...
Which of the following statements are true? Briefly explain your answer. 1. Training a k-nearest-neighbors classifier takes less computational time than testing it. 2. The more training examples, the more accurate the prediction of a k-nearest-neighbors. 3. k-nearest-neighbors cannot be used for regression. 4. A k-nearest-neighbors is sensitive to the number of features.
If there are 3 quantum wells separated by the same distance in a 1-D array with...
If there are 3 quantum wells separated by the same distance in a 1-D array with wavefunction ψ1 , ψ2 , and ψ3 for these three quantum wells respectively. The electrons only acts on the neighboring electrons with perturbation energy H’ = -|ΔE|. If the original eigen energy of ψ1 , ψ2 , or ψ3 is E1, try to use degenerate perturbation theory to calculate and sketch the new eigen energies for this array with 3 quantum wells.
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT