In: Computer Science
Java programming problem: For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of ACGTACGTTT, there are 3 segments of length 8, so the corresponding (key, value) pairs are: (ACGTACGT, 0), (CGTACGTT, 1), and (GTACGTTT, 2). Use open hash table for this challenge. DNA segments with different keys may be hashed to the same slot (collision) and if it happens, all these DNA segments will be put in a linked list in the corresponding slot. If the keys are of length 20, there will be a total of 420 different DNA segments of 20 letters. A hash table with these many slots can be very sparse (many slots won't be occupied). To avoid this problem, you will use a hash function that computes the index for each key only based on the first few letters, e.g., the first 4 letters in each key. There are a total of 44 unique DNA segments with 4 letters: AAAA, CAAA, GAAA, TAAA, ACAA, CCAA, GCAA, .... and TTTT, so any function that maps each of these segments into a unique integer (in the range of [0,255]) will work well as a hash function for a hash table that stores the DNA segments. Given the length of DNA segments and the number of letters used for hashing, and a DNA sequence, your java program needs to compute the number of slots needed for the hash table, and the number of slots that are filled (non-empty) after putting the segments from the given DNA into the table. Input Format Two lines. The first line contains two numbers: n and k, where n is the length of DNA segments to be used as the keys, and k indicates how many letters are used for computing the index in the hash function. The second line contains the sequence of a DNA sequence. Constraints: 20 <= n <= 30 2 <= k <= 8 k < n The length of the DNA sequence is > n Output Format: Two numbers in one line. The first number shows the total slots (filled & unfilled) in the hash table, and the second number shows the filled slots. Sample input: 8 4 ACGTACGTAC Sample output: 256 3 Explanation: In this case, n = 8 and k = 4. The input DNA is ACGTACGTAC so there are three (key, value) pairs: ("ACGTACGT", 0), ("CGTACGTA", 1), and ("GTACGTAC", 2). As the hash function computes the index based on the first four (k=4) letters of each key, and there are a total of (256) different combinations of four letters, the hash table only needs 256 slots. Out of the 256 slots, only three are occupied.
Summary-
input :
First line: n and k -> indicating how many letters are used for computing the index in hash function
Second line- DNA Sequence
Output:
Returning the output in the form of string with the total slots and the filled slots.
Code-
import java.util.*;
public class Bts {
public static void main(String[] args) {
//Scanner class to receive the inputs
Scanner sc=new Scanner(System.in);
// For storing the n and k values of the input
int n,k,j=0;
String firstline="8 4";
n=Character.getNumericValue(firstline.charAt(0));
//k will be at 2nd index because at index 1 there will
be a space
k=Character.getNumericValue(firstline.charAt(2));
String secondline=sc.next();
//Creating a hashtable
Hashtable<String,Integer> h = new
Hashtable<String,Integer>();
for(int i=0;i<secondline.length()-n;i++)
{
if(!(h.containsKey(secondline.substring(i, i+8))))
{
h.put(secondline.substring(i,i+8),j);
j++;
}
}
System.out.println(Math.pow(k, k) + h.size()+1+"
");
}
public static int fact(int n)
{
int ans=1;
while(n!=1)
{
ans =
ans*n;
n--;
}
return ans;
}
}