Question

In: Computer Science

Java programming problem: For simplicity, you can consider DNA sequences as strings in the alphabet of...

Java programming problem: For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of ACGTACGTTT, there are 3 segments of length 8, so the corresponding (key, value) pairs are: (ACGTACGT, 0), (CGTACGTT, 1), and (GTACGTTT, 2). Use open hash table for this challenge. DNA segments with different keys may be hashed to the same slot (collision) and if it happens, all these DNA segments will be put in a linked list in the corresponding slot. If the keys are of length 20, there will be a total of 420 different DNA segments of 20 letters. A hash table with these many slots can be very sparse (many slots won't be occupied). To avoid this problem, you will use a hash function that computes the index for each key only based on the first few letters, e.g., the first 4 letters in each key. There are a total of 44 unique DNA segments with 4 letters: AAAA, CAAA, GAAA, TAAA, ACAA, CCAA, GCAA, .... and TTTT, so any function that maps each of these segments into a unique integer (in the range of [0,255]) will work well as a hash function for a hash table that stores the DNA segments. Given the length of DNA segments and the number of letters used for hashing, and a DNA sequence, your java program needs to compute the number of slots needed for the hash table, and the number of slots that are filled (non-empty) after putting the segments from the given DNA into the table. Input Format Two lines. The first line contains two numbers: n and k, where n is the length of DNA segments to be used as the keys, and k indicates how many letters are used for computing the index in the hash function. The second line contains the sequence of a DNA sequence. Constraints: 20 <= n <= 30 2 <= k <= 8 k < n The length of the DNA sequence is > n Output Format: Two numbers in one line. The first number shows the total slots (filled & unfilled) in the hash table, and the second number shows the filled slots. Sample input: 8 4 ACGTACGTAC Sample output: 256 3 Explanation: In this case, n = 8 and k = 4. The input DNA is ACGTACGTAC so there are three (key, value) pairs: ("ACGTACGT", 0), ("CGTACGTA", 1), and ("GTACGTAC", 2). As the hash function computes the index based on the first four (k=4) letters of each key, and there are a total of (256) different combinations of four letters, the hash table only needs 256 slots. Out of the 256 slots, only three are occupied.

Solutions

Expert Solution

Summary-

input :

First line: n and k -> indicating how many letters are used for computing the index in hash function

Second line- DNA Sequence

Output:

Returning the output in the form of string with the total slots and the filled slots.

Code-

import java.util.*;

public class Bts {

  
   public static void main(String[] args) {
  
   //Scanner class to receive the inputs
   Scanner sc=new Scanner(System.in);
  
   // For storing the n and k values of the input
   int n,k,j=0;
  
   String firstline="8 4";
   n=Character.getNumericValue(firstline.charAt(0));
  
   //k will be at 2nd index because at index 1 there will be a space
   k=Character.getNumericValue(firstline.charAt(2));
  
   String secondline=sc.next();


//Creating a hashtable
   Hashtable<String,Integer> h = new Hashtable<String,Integer>();
  
   for(int i=0;i<secondline.length()-n;i++)
   {
       if(!(h.containsKey(secondline.substring(i, i+8))))
           {
           h.put(secondline.substring(i,i+8),j);
           j++;
           }
   }
  
   System.out.println(Math.pow(k, k) + h.size()+1+" ");
  
  
   }
  
   public static int fact(int n)
   {
       int ans=1;
       while(n!=1)
       {
           ans = ans*n;
           n--;
       }
       return ans;
   }
  
}


Related Solutions

For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G,...
For simplicity, you can consider DNA sequences as strings in the alphabet of {A, C, G, T}. Implement a special hash table to store all short DNA segments of certain length (e.g., 20) found in a long DNA sequence. Every DNA segment is stored as a (key, value) pair, where the key is the sequence of the segment, and the value is the position of the segment in the DNA. For example given a DNA sequence of ACGTACGTTT, there are...
Problem 4 (Sets defined inductively) [30 marks] Consider the set S of strings over the alphabet...
Problem 4 (Sets defined inductively) [30 marks] Consider the set S of strings over the alphabet {a, b} defined inductively as follows: • Base case: the empty word λ and the word a belong to S • Inductive rule: if ω is a string of S then both ω b and ω b a belong to S as well. 1. Prove that if a string ω belongs to S, then ω does not have two or more consecutive a’s. 2....
JAVA PROGRAMMING Implement a class Purse. A purse contains a collection of coins. For simplicity, we...
JAVA PROGRAMMING Implement a class Purse. A purse contains a collection of coins. For simplicity, we will only store the coin names in an ArrayList<String>. Supply a method void addCoin(String coinName). Add a method toString to the Purse class that prints the coins in the purse in the format Purse[Quarter,Dime,Nickel,Dime]. Write a method reverse that reverses the sequence of coins in a purse. Implement a TestPurse class with a main method in it. It will use the toString method to...
PROGRAMMING LANGUAGE : JAVA Problem specification. In this assignment, you will create a simulation for a...
PROGRAMMING LANGUAGE : JAVA Problem specification. In this assignment, you will create a simulation for a CPU scheduler. The number of CPU’s and the list of processes and their info will be read from a text file. The output, of your simulator will display the execution of the processes on the different available CPU’s. The simulator should also display: -   The given info of each process -   CPU utilization - The average wait time - Turnaround time for each process...
Runtime efficiency and code simplicity are often competing goals. How can you deal with this problem?...
Runtime efficiency and code simplicity are often competing goals. How can you deal with this problem? Is it possible to have code that is both simple and efficient?
Write a program in java that can perform encryption/decryption. In the following let the alphabet A...
Write a program in java that can perform encryption/decryption. In the following let the alphabet A be A={A, a, B, b, . . ., “ ”, “.”,“’ ”}. The encoding is A→0, a→1, B→2, b→4, . . ., Z→50, z→51, “ ”→52, “.”→53 and “’”→54.
Programming Problem Design a program in java that can perform three different arithmetic operations based on...
Programming Problem Design a program in java that can perform three different arithmetic operations based on the user’s input. The three operations are 1. Summation of integers from 1 to m 2. Factorial of a given number n (n!) 3. Finding the leftmost digit of a given integer (note: you have to use int) This program should prompt the user a menu including four options, ask the user for one option, and perform the corresponding computation. This process will repeat...
Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels,...
Build a Dynamic Programming algorithm in Python for the following problem. Given two sequences of vowels, find the length of longest subsequence present in both of them. NOTE: A subsequence is a sequence that appears in the same order, but not necessarily contiguous. For example, “aei”, “aeo”, “eio”, “aiu”, ... are subsequences of “aeiou”. Sample Input 1: aeiou aiu Sample Output 1: 3 Sample Input 2: aeaiueoiuaeeoeooaauoi aeuioauuuoaeieeaueuiouaiieuiuuuaoueueauaeiauuo Sample Output 2: 16 Sample Input 3: iioioieiaiauaoeoiioiiue iuuueauiaieooaoaaouaaaae Sample Output 3:...
Explain how alteration in DNA sequences contribute to variation that can be subject to natural selection....
Explain how alteration in DNA sequences contribute to variation that can be subject to natural selection. (Please use these terms in your answer) Chromosome number Chromosomal human disorders: down syndrome, turner syndrome Natural selection Horizontal acquisitions   Naked DNA Viral transmission Conjugation Transposition Viruses recombination Meiosis(in various organisms)  
8. Epigenetic modifications to DNA sequences and resulting alterations in chromatin structure can be analyzed by...
8. Epigenetic modifications to DNA sequences and resulting alterations in chromatin structure can be analyzed by examining DNA methylation and histone modifications. To examine the methylation of a DNA sequence, you treat it with sodium bisulfite.   a. (2 points) If your original DNA sequence is: 5’-ACAGTCCGTCGGAGCCTGCCAGTCGATCGCACCT-3’ and your sequence after sodium bisulfite treatment reads: 5’-ACAGTTCGTCGGAGCTTGTTAGTCGATCGCACTT-3’, which positions on the original DNA sequence are methylated? (Indicate methylations by circling the methylated nucleotides.) b. (2 points) Draw the double-stranded DNA sequence after...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT