Question

In: Computer Science

In java what is the best algorithm and data structure choice to search for all instances...

In java what is the best algorithm and data structure choice to search for all instances where a pattern is found some larger texts. Also please provide the worst-case complexity big o notation of this and why it is best.

Assuming the texts are all stored in a class instance that has a field contents. Which is then stored as a array list value in a linked hash map with the key being the username of the poster.

Solutions

Expert Solution

Theory:

The above problem can is String Matching a pattern and there exists multiple algorithms for the same..

String Matching Problem

I/p :

  1. text of size n (where pattern is to be found)
  2. pattern to search of size m

O/p : location where element is found

Analysis

The best algorithm to implement the above problem depend on many aspects:

Time Complexity : Both in terms of pattern and text.

Space Complexity: Both in terms of pattern and text.

The following algorithm are compared on how text and pattern are

  1. Boyer -Moore Algorithm
  2. Suffix Tree

==================================================================

Boyer Moore Algorithm  preprocesses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern) which is used to skip characters while matching.

Complexity

Preprcessing pattern: O(m)

  Searchng pattern in text: O(n) best case , O(mn) worst case

==================================================================

Suffix tree . A suffix tree on the other hand uses pre-processing of the text which s termed as suffix tree. After preprocessing text, searching a pattern requires O(m) time.

Complexity

Preprcessing text: O(n)

  Searchng pattern in text: O(m) best case (unique pattern among all words in the list) , O(m*k) where k is no of pattern present which is K< n/m. i.e. O(n)

=================================================================

The choice of algorithm now depends

  • if text is static then Suffix Tree is good choice as pattern can be found in  O(m).
  • if text containing many occurrence than it better to use Boyer-Moore Algorithm. In all other cases Suffix tree is preferred.
  • Suffix tree creates tree of all text which requires a considerable space.

Please use the one suited to Given problem specification.

Analyzing from the above problem it seems text is static and not many occurrence exist in text . Suffix tree will be a good choice . Still it completely dependent on your problem specification.

Code

import java.util.Iterator;
import java.util.TreeMap;
import java.util.ArrayList;

public class SuffixTree
{
        private static final int INF = Integer.MAX_VALUE-1;
        private int last;
        private String str;
        private Node root, sentinel;

        public class Edge
        {
                /** An inclusive range [a,b] of indices into str representing the edge label. */
                public int a, b;
                /** Node where the edge points to */
                public Node end;
                public Edge(int a, int b, Node end) { this.a = a; this.b = b; this.end = end; }
                /** Returns the first character on the edge. */
                public char getFirst() { return str.charAt(a); }
                /** Length of the string on the edge label. */
                public int length() { return Math.min(last,b) - a + 1; } 
                @Override
                public String toString() { return str.substring(a, Math.min(b,last)+1); }
        }
        /**
         * Node of the SuffixTree.
         */
        public class Node implements Iterable<Edge>
        {
                /** Maps first letters to edges leading to children */
                public TreeMap<Character,Edge> edges = new TreeMap<Character,Edge>();
                /** Suffix link used in linear tree building algorithm */
                private Node suffix;
                /** An inclusive range [a,b] of indices into str representing a prefix leading to this node */
                public int a, b;
                /** Adds an edge to the map */
                public void add(Edge e) { edges.put(e.getFirst(),e); }
                /** Gets the edge using the given first character */
                public Edge get(char c) { return edges.get(c); }
                /** Returns the number of children */
                public int numChildren() { return edges.size(); }
                @Override
                public Iterator<Edge> iterator() { return edges.values().iterator(); }
                public String toString() { return str.substring(a,Math.min(b,last)+1); }
        }
        public SuffixTree(String str)
        {
                this.str = str;
                buildTree();
                setPrefix(root,null,0);
        }
        @Override
        /** Pretty prints the tree.  For each node it prints the prefix leading to that node in quotes.
         * Then it prints each edge leaving the node indexed by first letter, the edge label, the inclusive range of indices,
         * and the length of the label.
         */
        public String toString() { return prettyFormat(new StringBuilder(), root, new StringBuilder()).toString(); }
        private int fix(int x) { return x == INF ? last : x; }
        private StringBuilder prettyFormat(StringBuilder sb, Node n, StringBuilder tab)
        {
                sb.append(tab).append('"').append(n).append('"').append('\n');
                for (Edge e : n)
                {
                        char c = e.getFirst();
                        sb.append(tab).append(c).append(" : ").append(e).append(" = ").append(e.a).append(',').append(fix(e.b)).append(',').append(e.length()).append('\n');
                        tab.append("  ");
                        prettyFormat(sb,e.end,tab);
                        tab.delete(tab.length()-2, tab.length());
                }
                return sb;
        }
        /** Returns the string used to make the suffix tree */
        public String getStr() { return str; }
        /** Returns the root node */
        public Node getRoot() { return root; }
        private void setPrefix(Node n, Edge e, int len)
        {
                for (Edge edge : n) setPrefix(edge.end,edge,len+Math.min(edge.b,last)-edge.a+1);
                if (e == null) { n.a = 0; n.b = -1; }
                else { n.b = Math.min(e.b,last); n.a = n.b - len+1; }
        }
        private void buildTree()
        {
                root = new Node();
                sentinel = new Node();
                root.suffix = sentinel;
                Node s = root;
                int[] k = {0};
                last = -1;
                for (int i = 0; i < str.length(); ++i)
                {
                        last++;
                        s = update(s,k,i);
                        s = canonize(s,k,i);
                }
        }
        private Node update(Node s, int[] k, int i)
        {
                Node oldr = root, r = testAndSplit(s,k[0],i-1,str.charAt(i));
                while (r != null)
                {
                        Node rp = new Node();
                        Edge e = new Edge(i,INF,rp);
                        r.add(e);
                        if (oldr != root) oldr.suffix = r;
                        oldr = r;
                        s = canonize(s.suffix,k,i-1);
                        r = testAndSplit(s,k[0],i-1,str.charAt(i));
                }
                if (oldr != root) oldr.suffix = s;
                return s;
        }
        private Node testAndSplit(Node s, int k, int p, char c)
        {
                if (k > p) return s == sentinel ? null : s.get(c) == null ? s : null;
                Edge e = s.get(str.charAt(k));
                if (c == str.charAt(e.a+p-k+1)) return null; //check if char after implicit node is c
                Node r = new Node();
                Edge re = new Edge(e.a+p-k+1,e.b,e.end);
                r.add(re);
                Edge se = new Edge(e.a,e.a+p-k,r);
                s.add(se);
                return r;
        }
        private Node canonize(Node s, int[] k, int p)
        {
                if (p < k[0]) return s;
                if (s == sentinel) { s = root; k[0]++; if (p < k[0]) return s; }
                Edge e = s.get(str.charAt(k[0]));
                while (e.b - e.a <= p - k[0])
                {
                        k[0] = k[0] + e.b - e.a + 1;
                        s = e.end;
                        if (k[0] <= p) e = s.get(str.charAt(k[0]));
                }
                return s;
        }
        
        public static void main(String[] args) 
        {

// The Function receiveData return data in Linked Hash Map object ihm. Please make function for the same.

LinkedHashMap<String, ArrayList> lhm=receiveData();

         
         ArrayList<String> arrlis=lhm.get("poster");                        
        StringBuffer strbuf=new StringBuffer();
        for (int i = 0; i < arrlis.size(); i++) {
                 strbuf.append(cars.get(i));
    }
                System.out.pritln(new SuffixTree(strbuf.toString()));
   
        }
}



Related Solutions

Binary Search Algorithm a.) Create a Java application that utilizes the "Binary Search Algorithm" presented in...
Binary Search Algorithm a.) Create a Java application that utilizes the "Binary Search Algorithm" presented in chapter 19 (NOT Java's pre-built binarySearch() method from imported Java library) to search for an integer in a random array of size 30 in the range of 0 to 1000 inclusive. You should use Java's random number generator to randomize the numbers in your array. b.) The application's main() method should display unsorted array and sorted array, prompt user for a search key, allow...
In java what is the full code of a BoyerMoore search algorithm when nested in another...
In java what is the full code of a BoyerMoore search algorithm when nested in another class, using only JCF and no non standard packages. The text searching through can only contain printable ASCII characters, and capitalization matters. Also please provide the worst-case time complexity of the search algorithm. The pattern is a String, and the contents searching is a String.
Assume you need to write a Java program that uses a binary search algorithm to search...
Assume you need to write a Java program that uses a binary search algorithm to search a sorted array for a given value. 1. Write a Java pseudocode that uses recursion to accomplish the task. Here is a hint. When you are searching for a particular value in an array, there are two possible outcomes. 1) The value is found and the array index of that value is returned. 2) The value is not found and we return -1. (5...
Write a Java program that implements the Depth-First Search (DFS) algorithm. Input format: This is a...
Write a Java program that implements the Depth-First Search (DFS) algorithm. Input format: This is a sample input from a user. 3 2 0 1 1 2 The first line (= 3 in the example) indicates that there are three vertices in the graph. You can assume that the first vertex starts from the number 0. The second line (= 2 in the example) represents the number of edges, and following two lines are the edge information. This is the...
Course: DSA Data Structure and Algorithm What is an AVL Tree, what are its different types...
Course: DSA Data Structure and Algorithm What is an AVL Tree, what are its different types of rotation, Illustrate with examples.
In a Uniprocessor scheduler, what is the best algorithm? Why
In a Uniprocessor scheduler, what is the best algorithm? Why
What is the greedy choice being made by Kruskal’s algorithm? Show that this choice results in...
What is the greedy choice being made by Kruskal’s algorithm? Show that this choice results in an optimal substructure.
Write a program in C++ to implement Binary Search Algorithm. Assume the data given in an...
Write a program in C++ to implement Binary Search Algorithm. Assume the data given in an array. Use number of data N at least more than 10. The function Binary Search should return the index of the search key V.
Class GroceryBag collects instances of class Item in an appropriate data structure.                       Class Item      &n
Class GroceryBag collects instances of class Item in an appropriate data structure.                       Class Item                 ---------------------------                -String description                -int cost   //price in cents                ----------------------------                + [constructor, getters, and a toString() method] •         (Assume class Item has already been coded)                       Class GroceryBag (Highlights)                ----------------------------------                  -bag   // an ArrayList of <Item>                ----------------------------------                   assume public methods: constructor, totalBag(), findItem(String it), listAll(), etc. •         The ONLY thing you need to do in the space below is to...
Develop a Java application to implement a binary tree data structure. A tree data structure starts...
Develop a Java application to implement a binary tree data structure. A tree data structure starts from the top node, called the root. Each node in the tree has a set of children, which are also nodes of the tree and can have children of their own, and so on. This keeps on going till we get to the bottom of the tree, to the “leaf” nodes. Each node in the tree, except for the root, has a parent. A...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT