Question

In: Computer Science

For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G,...

For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of ACGTACGTTT, there are 3 segments of length 8, so the corresponding (key, value) pairs are: (ACGTACGT, 0), (CGTACGTT, 1), and (GTACGTTT, 2). Use open hash table for this challenge. DNA segments with different keys may be hashed to the same slot (collision) and if it happens, all these DNA segments will be put in a linked list in the corresponding slot. If the keys are of length 20, there will be a total of different DNA segments of 20 letters. A hash table with these many slots can be very sparse (many slots won't be occupied). To avoid this problem, you will use a hash function that computes the index for each key only based on the first few letters, e.g., the first 4 letters in each key. There are a total of unique DNA segments with 4 letters: AAAA, CAAA, GAAA, TAAA, ACAA, CCAA, GCAA, .... and TTTT, so any function that maps each of these segments into a unique integer (in the range of [0,255]) will work well as a hash function for a hash table that stores the DNA segments. Given the length of DNA segments and the number of letters used for hashing, and a DNA sequence, your program needs to compute the number of slots needed for the hash table, and the number of slots that are filled (non-empty) after putting the segments from the given DNA into the table.

Sample Input 0

8 4
ACGTACGTAC

Sample Output 0

256 3

Explanation 0

In this case, n = 8 and k = 4. The input DNA is ACGTACGTAC so there are three (key, value) pairs: ("ACGTACGT", 0), ("CGTACGTA", 1), and ("GTACGTAC", 2). As the hash function computes the index based on the first four (k=4) letters of each key, and there are a total of  (256) diffent combinations of four letters, the hash table only needs 256 slots. Out of the 256 slots, only three are occupied.

In Java Please

Solutions

Expert Solution

n = int(input('Enter value for n: '))
k = int(input('Enter value for k: '))

DNA = input('Enter DNA: ')

numPairs = (len(DNA) - n + 1)
totalSlots = 4 ** k

print(totalSlots, numPairs)

Update:

import java.util.Scanner;

public class DNATest {

   public static void main(String[] args) {
      
       Scanner in = new Scanner(System.in);

       System.out.println("Enter value for n: ");
       int n = in.nextInt();
      

       System.out.println("Enter value for k: ");
       int k = in.nextInt();
       in.nextLine();
      
       System.out.println("Enter DNA: ");
       String dna = in.nextLine();
      
       int numPairs = (dna.length() - n + 1);
       int totalSlots = (int) Math.pow(4, k);

       System.out.println(totalSlots + " " + numPairs);
       in.close();
      
   }

}


**************************************************

Thanks for your question. We try our best to help you with detailed answers, But in any case, if you need any modification or have a query/issue with respect to above answer, Please ask that in the comment section. We will surely try to address your query ASAP and resolve the issue.

Please consider providing a thumbs up to this question if it helps you. by Doing that, You will help other students, who are facing similar issue.


Related Solutions

Java programming problem: For simplicity, you can consider DNA sequences as strings in the alphabet of...
Java programming problem: For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of...
We would like to align two DNA sequences: (v) C G A T A C T,...
We would like to align two DNA sequences: (v) C G A T A C T, and (w) G A T T C G T : i) s(i, j) = 1.5 if vi = wj (matches); ii) s(i, j) = -1.0 if vi != wj (mismatches); iii) d = 0.25 (indels: insertions or deletions). What would be the maximum alignment score? Explain how you get the result.
Find the number of N digit sequences from the alphabet a, b, c, d with an...
Find the number of N digit sequences from the alphabet a, b, c, d with an even number of a's and an odd number of b's
Problem 4 (Sets defined inductively) [30 marks] Consider the set S of strings over the alphabet...
Problem 4 (Sets defined inductively) [30 marks] Consider the set S of strings over the alphabet {a, b} defined inductively as follows: • Base case: the empty word λ and the word a belong to S • Inductive rule: if ω is a string of S then both ω b and ω b a belong to S as well. 1. Prove that if a string ω belongs to S, then ω does not have two or more consecutive a’s. 2....
Consider this reaction: A(g) + B(g)  C(g) at 298K. Consider the following conditions: What can be said...
Consider this reaction: A(g) + B(g)  C(g) at 298K. Consider the following conditions: What can be said about the spontaneity of this reaction compared to the same at standard conditions? A) It is less spontaneous, but still nonspontaneous overall B) It is more spontaneous, and is a spontaneous reaction at these conditions C) It is more spontaneous, but still not spontaneous overall D) It is less spontaneous, and it is a spontaneous reaction at these conditions E) It doesn't matter if...
What is the mRNA sequence of the following original DNA sequences? a) TAC b) GAG c)...
What is the mRNA sequence of the following original DNA sequences? a) TAC b) GAG c) CAT d) AAA e) GCC f) TTC g) CCG h) TGT
Using the pigeonhole theorem prove that any set of 220 10-character strings over the alphabet {a,b,c,d}...
Using the pigeonhole theorem prove that any set of 220 10-character strings over the alphabet {a,b,c,d} contains a pair of anagrams.
Explain how alteration in DNA sequences contribute to variation that can be subject to natural selection....
Explain how alteration in DNA sequences contribute to variation that can be subject to natural selection. (Please use these terms in your answer) Chromosome number Chromosomal human disorders: down syndrome, turner syndrome Natural selection Horizontal acquisitions   Naked DNA Viral transmission Conjugation Transposition Viruses recombination Meiosis(in various organisms)  
8. Epigenetic modifications to DNA sequences and resulting alterations in chromatin structure can be analyzed by...
8. Epigenetic modifications to DNA sequences and resulting alterations in chromatin structure can be analyzed by examining DNA methylation and histone modifications. To examine the methylation of a DNA sequence, you treat it with sodium bisulfite.   a. (2 points) If your original DNA sequence is: 5’-ACAGTCCGTCGGAGCCTGCCAGTCGATCGCACCT-3’ and your sequence after sodium bisulfite treatment reads: 5’-ACAGTTCGTCGGAGCTTGTTAGTCGATCGCACTT-3’, which positions on the original DNA sequence are methylated? (Indicate methylations by circling the methylated nucleotides.) b. (2 points) Draw the double-stranded DNA sequence after...
1.) Consider this exothermic reaction: A(g) + B(g) 2 C(g) + D(g) What can be said...
1.) Consider this exothermic reaction: A(g) + B(g) 2 C(g) + D(g) What can be said about the spontaneity of this reaction? A) It is never spontaneous B) Not enough info C) It is only spontaneous at high temperatures E) It is only spontaneous at low temperatures F) It is always spontaneous 2.) Consider this reaction: A(g) + B(g) C(g) What can be said about the entropy () of this reaction? A.) It is equal to zero B) It is...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT