In: Computer Science
In java what is the best algorithm and data structure choice to search for all instances where a pattern is found some larger texts. Also please provide the worst-case complexity big o notation of this and why it is best.
Assuming the texts are all stored in a class instance that has a field contents. Which is then stored as a array list value in a linked hash map with the key being the username of the poster.
Theory:
The above problem can is String Matching a pattern and there exists multiple algorithms for the same..
String Matching Problem
I/p :
O/p : location where element is found
Analysis
The best algorithm to implement the above problem depend on many aspects:
Time Complexity : Both in terms of pattern and text.
Space Complexity: Both in terms of pattern and text.
The following algorithm are compared on how text and pattern are
==================================================================
Boyer Moore Algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as size of pattern) which is used to skip characters while matching.
Complexity
Preprcessing pattern: O(m)
Searchng pattern in text: O(n) best case , O(mn) worst case
==================================================================
Suffix tree . A suffix tree on the other hand uses pre-processing of the text which s termed as suffix tree. After preprocessing text, searching a pattern requires O(m) time.
Complexity
Preprcessing text: O(n)
Searchng pattern in text: O(m) best case (unique pattern among all words in the list) , O(m*k) where k is no of pattern present which is K< n/m. i.e. O(n)
=================================================================
The choice of algorithm now depends
Please use the one suited to Given problem specification.
Analyzing from the above problem it seems text is static and not many occurrence exist in text . Suffix tree will be a good choice . Still it completely dependent on your problem specification.
Code
import java.util.Iterator; import java.util.TreeMap; import java.util.ArrayList;
public class SuffixTree { private static final int INF = Integer.MAX_VALUE-1; private int last; private String str; private Node root, sentinel; public class Edge { /** An inclusive range [a,b] of indices into str representing the edge label. */ public int a, b; /** Node where the edge points to */ public Node end; public Edge(int a, int b, Node end) { this.a = a; this.b = b; this.end = end; } /** Returns the first character on the edge. */ public char getFirst() { return str.charAt(a); } /** Length of the string on the edge label. */ public int length() { return Math.min(last,b) - a + 1; } @Override public String toString() { return str.substring(a, Math.min(b,last)+1); } } /** * Node of the SuffixTree. */ public class Node implements Iterable<Edge> { /** Maps first letters to edges leading to children */ public TreeMap<Character,Edge> edges = new TreeMap<Character,Edge>(); /** Suffix link used in linear tree building algorithm */ private Node suffix; /** An inclusive range [a,b] of indices into str representing a prefix leading to this node */ public int a, b; /** Adds an edge to the map */ public void add(Edge e) { edges.put(e.getFirst(),e); } /** Gets the edge using the given first character */ public Edge get(char c) { return edges.get(c); } /** Returns the number of children */ public int numChildren() { return edges.size(); } @Override public Iterator<Edge> iterator() { return edges.values().iterator(); } public String toString() { return str.substring(a,Math.min(b,last)+1); } } public SuffixTree(String str) { this.str = str; buildTree(); setPrefix(root,null,0); } @Override /** Pretty prints the tree. For each node it prints the prefix leading to that node in quotes. * Then it prints each edge leaving the node indexed by first letter, the edge label, the inclusive range of indices, * and the length of the label. */ public String toString() { return prettyFormat(new StringBuilder(), root, new StringBuilder()).toString(); } private int fix(int x) { return x == INF ? last : x; } private StringBuilder prettyFormat(StringBuilder sb, Node n, StringBuilder tab) { sb.append(tab).append('"').append(n).append('"').append('\n'); for (Edge e : n) { char c = e.getFirst(); sb.append(tab).append(c).append(" : ").append(e).append(" = ").append(e.a).append(',').append(fix(e.b)).append(',').append(e.length()).append('\n'); tab.append(" "); prettyFormat(sb,e.end,tab); tab.delete(tab.length()-2, tab.length()); } return sb; } /** Returns the string used to make the suffix tree */ public String getStr() { return str; } /** Returns the root node */ public Node getRoot() { return root; } private void setPrefix(Node n, Edge e, int len) { for (Edge edge : n) setPrefix(edge.end,edge,len+Math.min(edge.b,last)-edge.a+1); if (e == null) { n.a = 0; n.b = -1; } else { n.b = Math.min(e.b,last); n.a = n.b - len+1; } } private void buildTree() { root = new Node(); sentinel = new Node(); root.suffix = sentinel; Node s = root; int[] k = {0}; last = -1; for (int i = 0; i < str.length(); ++i) { last++; s = update(s,k,i); s = canonize(s,k,i); } } private Node update(Node s, int[] k, int i) { Node oldr = root, r = testAndSplit(s,k[0],i-1,str.charAt(i)); while (r != null) { Node rp = new Node(); Edge e = new Edge(i,INF,rp); r.add(e); if (oldr != root) oldr.suffix = r; oldr = r; s = canonize(s.suffix,k,i-1); r = testAndSplit(s,k[0],i-1,str.charAt(i)); } if (oldr != root) oldr.suffix = s; return s; } private Node testAndSplit(Node s, int k, int p, char c) { if (k > p) return s == sentinel ? null : s.get(c) == null ? s : null; Edge e = s.get(str.charAt(k)); if (c == str.charAt(e.a+p-k+1)) return null; //check if char after implicit node is c Node r = new Node(); Edge re = new Edge(e.a+p-k+1,e.b,e.end); r.add(re); Edge se = new Edge(e.a,e.a+p-k,r); s.add(se); return r; } private Node canonize(Node s, int[] k, int p) { if (p < k[0]) return s; if (s == sentinel) { s = root; k[0]++; if (p < k[0]) return s; } Edge e = s.get(str.charAt(k[0])); while (e.b - e.a <= p - k[0]) { k[0] = k[0] + e.b - e.a + 1; s = e.end; if (k[0] <= p) e = s.get(str.charAt(k[0])); } return s; } public static void main(String[] args) {
// The Function receiveData return data in Linked Hash Map object ihm. Please make function for the same.
LinkedHashMap<String, ArrayList> lhm=receiveData();
ArrayList<String> arrlis=lhm.get("poster"); StringBuffer strbuf=new StringBuffer(); for (int i = 0; i < arrlis.size(); i++) { strbuf.append(cars.get(i)); } System.out.pritln(new SuffixTree(strbuf.toString())); } }