In: Computer Science
For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of ACGTACGTTT, there are 3 segments of length 8, so the corresponding (key, value) pairs are: (ACGTACGT, 0), (CGTACGTT, 1), and (GTACGTTT, 2). Use open hash table for this challenge. DNA segments with different keys may be hashed to the same slot (collision) and if it happens, all these DNA segments will be put in a linked list in the corresponding slot. If the keys are of length 20, there will be a total of different DNA segments of 20 letters. A hash table with these many slots can be very sparse (many slots won't be occupied). To avoid this problem, you will use a hash function that computes the index for each key only based on the first few letters, e.g., the first 4 letters in each key. There are a total of unique DNA segments with 4 letters: AAAA, CAAA, GAAA, TAAA, ACAA, CCAA, GCAA, .... and TTTT, so any function that maps each of these segments into a unique integer (in the range of [0,255]) will work well as a hash function for a hash table that stores the DNA segments. Given the length of DNA segments and the number of letters used for hashing, and a DNA sequence, your program needs to compute the number of slots needed for the hash table, and the number of slots that are filled (non-empty) after putting the segments from the given DNA into the table.
Sample Input 0
8 4 ACGTACGTAC
Sample Output 0
256 3
Explanation 0
In this case, n = 8 and k = 4. The input DNA is ACGTACGTAC so there are three (key, value) pairs: ("ACGTACGT", 0), ("CGTACGTA", 1), and ("GTACGTAC", 2). As the hash function computes the index based on the first four (k=4) letters of each key, and there are a total of (256) diffent combinations of four letters, the hash table only needs 256 slots. Out of the 256 slots, only three are occupied.
In Java Please
n = int(input('Enter value for n: ')) k = int(input('Enter value for k: ')) DNA = input('Enter DNA: ') numPairs = (len(DNA) - n + 1) totalSlots = 4 ** k print(totalSlots, numPairs)
Update:
import java.util.Scanner;
public class DNATest {
public static void main(String[] args) {
Scanner in = new
Scanner(System.in);
System.out.println("Enter value
for n: ");
int n = in.nextInt();
System.out.println("Enter value
for k: ");
int k = in.nextInt();
in.nextLine();
System.out.println("Enter DNA:
");
String dna = in.nextLine();
int numPairs = (dna.length() - n +
1);
int totalSlots = (int) Math.pow(4,
k);
System.out.println(totalSlots +
" " + numPairs);
in.close();
}
}
************************************************** Thanks for your question. We try our best to help you with detailed answers, But in any case, if you need any modification or have a query/issue with respect to above answer, Please ask that in the comment section. We will surely try to address your query ASAP and resolve the issue.
Please consider providing a thumbs up to this question if it helps you. by Doing that, You will help other students, who are facing similar issue.