Question

In: Computer Science

Purpose This project is meant to give you experience with sorting, binary searching, and Big-Oh complexity....

Purpose

This project is meant to give you experience with sorting, binary searching, and Big-Oh complexity.

Objective "Write in Java please"

Your goal is to take a book, as a large text file, and build a digital “concordance”. A concordance is like an index for a book in that it contains entries for words in the book, but it also includes passages from the book containing that word. For example, a query for the word Dormouse in Alice in Wonderland would produce results like:

14153

he poured a little hot tea upon its nose. The Dormouse shook its head impatiently, and said

14803

`Once upon a time there were three little sisters,' the Dormouse began in a great hurry; `and their names were Elsie,

where 14153 is the position of the word Dormouse in the file, and the passage includes the 10 words before and after Dormouse.

Approach

You should approach the project in four steps:

            1. Open the book as a text file, identify each word, and save its position

            2. Sort the words in the text file alphabetically using a merge sort

            3. Remove duplicate words, saving only their indices

            4. Allow the user to enter words, and return the passages where the words occur.

STEP 1: READ THE TEXT FILE AND NUMBER ALL THE WORDS

Consider a text file fox.txt containing:

the quick brown fox jumps over the lazy dog

Generate a new file fox.txt_words.txt that reads as follows:

9

the 0

quick 1

brown 2

fox 3

jumps 4

over 5

the 6

lazy 7

dog 8

where the first number, 9, is the total number of words, and what follows is each word and its placement in the file.

Suggested approach

1. Read the text file and count the words but don’t save them.

2. Make an array to hold the words.txt

3. Make a new FileReader and Scanner object, read the words from the text file into that array

4. Write the number of words to a new text file, then step through the array and write each word and its index to that file.

STEP 2: SORT THE WORDS ALPHABETICALLY

Starting with the text file fox.txt_words.txt, make a new text file fox.txt_sorted.txt . That file should contain the same words as the previous file, but this time the words are in alphabetical order. This is what the file should look like:

9

brown 2

dog 8

fox 3

jumps 4

lazy 7

over 5

quick 1

the 0

the 6

You may try this initially with a bubble sort. However, to get full credit for this step, you must use a merge sort to sort the words.

Approach:

1. Make a filereader/scanner for words.txt, read the number of words, and make two arrays: a string array to hold the words, and an int array to hold the indices.

2. Use the merge sort algorithm we learned in class to sort the string array. Make sure when you adjust the string array that you also adjust the int array, preserving the index of each word

3. Make a printwriter and write the words and indices to a new file fox.txt_sorted.txt.

STEP 3: REMOVE DUPLICATE WORDS

Now use fox.txt_sorted.txt to make a fourth file fox.txt_index.txt that reads like this:

brown 2

dog 8

fox 3

jumps 4

lazy 7

over 5

quick 1

the 0 6

Where 8 refers to the number of unique words, and each word is followed by one or more indices.

Approach:

1. Make a filereader and printwriter

2. Repeat the following for each word:

            3. Read a word and an index

            4. Check if that word is the same as a the previous word read from the input file

            5. If no, write that word and its index to the output file

            6. If yes, just write the index to the output file

STEP 4: QUERY AND SEARCH

Your program should now prompt the user for a word, find the word, and return its indices. For example:

Enter a word: the

the: 0 6

Enter a word: lazy

lazy: 7

Enter a word: dormouse

Can’t find dormouse in the text

You must use a binary search, like we did in class, to locate the word.

Your program should continue to query and search until the user closes the program.

Approach:

1. Make an filereader/scanner for fox.txt_index.txt

2. Read in each word and its indices. You may want to use the nextLine function to read all the indices as a single string, in which case you’ll have two arrays: string words[] and string indices[].

3. Prompt the user for a word.

4. Use a binary search to find the matching word.

5. Print out its indices if it exists or an error if it doesn’t

6. Repeat steps 3 and 4 forever

STEP 5: PRINT OUT THE SURROUNDING WORDS

Modify your code in step 4 so that, for each index, it prints out text from the original file. You should print out the previous two words, the query word, and the subsequent two words. For example:

Enter a word: the

0: the quick brown

6: jumps over the lazy dog

Enter a word: lazy

7: over the lazy dog

Enter a word: dormouse

Can’t find dormouse in the text

Approach:

Modify your code from Step 4:

1. After reading from fox.txt_index.txt, make a second array string originaltext[] and read in the original fox.txt word-by-word.

2. After querying the user and searching for the word, get its index entry and:

3. Make an istringstream object to parse the index string

4. Read a number index from that string: an index where that word occurs

5. Make a for loop, starting at i=index-2 and going to i<=index+2

6. Print originaltext[i] followed by a space

7. Repeat steps 4 – 7 until you have printed all the indices.

STEP 6: TIMING

1.

Go to Project Gutenberg and download three books: a short book, a medium book, and a long book. For example, Alice in Wonderland’s page is

https://www.gutenberg.org/ebooks/11

and you should download the Plain Text as alice.txt.

I recommend Alice in Wonderland or the Wizard of Oz as a short book, Great Expectations as a medium book, and War and Peace or the King James Bible as a long book.

2.

Use your STEP 1 code and record the total number of words in each book.

3.

Use System.currentTimeMillis to measure the elapsed time for STEP 1 on each book. Depending on the speed of your computer, you may or may not get non-zero values. If you do, do you see a O(n) runtime? (Use the approach below).

4.

Now measure the merge sort times in STEP 2. Unless your computer is really fast, you ought to see a measurable delay. Run the program a few times on each book, throw out any outliers, and average the runtimes for each book.

Since merge-sort is O(n log n), the runtime should be

            runtime = k * #words * log(#words)

Try computing k for each book:

            k = runtime / ( #words * log(#words))   the log base shouldn’t matter.

Are all three k’s approximately the same?

5.

If you have a really slow computer, you might be able to time the queries and see if they have a O(log n) time. Don’t bother doing this if they all have zero time.

Grading

Grading for this project will be as follows:

You complete STEP 1 correctly                                             50% (F+)

You complete STEP 2, but use a sort that isn’t merge sort    60% (D-)

You complete STEP 2 using merge sort                                 70% (C-)

You complete STEP 3                                                            75% (C)

You complete STEP 4                                                            85% (B)

You complete STEP 5                                                            95% (A)

You measure the timings in STEP 6                                       100% (A+)

No credit will be given for code that doesn’t compile. If you only partially complete a step, comment your work on the step out before submitting.

Basic Requirements

  • Your program must be written in Java
  • You may not use built in libraries for sorting or searching
  • Your program must compile without errors
  • You must submit your .java files to Blackboard
  • You must include a statement telling which steps you completed
  • You are expected to demonstrate your completed project to me
  • Your program must be your own work, must follow the approach listed above, and must produce the _words.txt, _sorted.txt, _index.txt text files.

Solutions

Expert Solution

Working code implemented in Java and appropriate comments provided for better understanding:

Source code for Main.java:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

public class Main {
   public static void main(String[] args) {
       Scanner in = new Scanner(System.in);
       boolean fileFound = false;
       String fileName = null;
      
       // prompt user for filename, will keep asking until user gives valid file name
       while (fileFound == false) {
           System.out.println("What is the name of the file?");
           fileName = in.nextLine();
           if ( new File(fileName).exists()) {
               fileFound = true;
           } else {
               System.out.println("File does not exist.");
           }
       }
      
       // open file now, also makes the word file
       int wordCount = makeWordsFile(fileName);

       // this sorts and created the sorted file
       alphaSort(fileName);

       // this removes the duplicates
       int numUniqueWords = removeDuplicates(fileName);

       // this does the search and print entries.
       queryAndSearch(numUniqueWords, fileName, wordCount);
       in.close();
   }

   public static int makeWordsFile(String fileName) {
       int wordCount = 0; // tracks word count
       // try catch in case file does not exist
       try {
           FileReader fReader = new FileReader(fileName); // open reader
           Scanner in = new Scanner(fReader);
           long time = System.currentTimeMillis(); // start time
           while (in.hasNext() == true) {
               // only counts word if it is not all punctuation
               if (in.next().replaceAll("\\p{Punct}", "").equals("") == false) {
                   wordCount++;
               }
           }
           in.close();
           time = System.currentTimeMillis() - time; // record elapsed time
           System.out.println("The elapsed time to count words is " + time + " ms.");

           System.out.println("The number of words is " + wordCount);
           try {
               fReader.close();
           } catch (IOException e) {
               e.printStackTrace();
           }
       } catch (FileNotFoundException ex) {
           ex.printStackTrace();
       }

       PrintWriter p;
      
       // keeps track of current word location
       int currentNum = 0;
       try {
           // create word file
           p = new PrintWriter(fileName.concat("_words.txt"));
           FileReader fReader = new FileReader(fileName);
           Scanner in = new Scanner(fReader);

           // puts # of words in the file
           p.println(wordCount);
           // continues if their is another token available
           while (in.hasNext() == true) {
             
               String current = in.next();
               // print word to file only if the word is not all punctuation
               if (!current.replaceAll("\\p{Punct}", "").toLowerCase().equals("")) {
                   p.println(current.replaceAll("\\p{Punct}", "").toLowerCase() + " " + currentNum);
                   currentNum++;
               }
           }

           p.close(); // close printer and scanner
           in.close();
       } catch (FileNotFoundException e) {
           // you don't want to do make this empty because if a error happens here you won't know what happened
           e.printStackTrace();
       }
       return currentNum;
   }

   // creates file and sorts words
   public static String alphaSort(String fileName) {
       int numWords = 0;
       String[] words = null;
      
       try {
           // opens word file
           Scanner fileScan = new Scanner(new FileReader(fileName.concat("_words.txt")));
           // receives the number of words, moves line cursor, parse string to int
           numWords = Integer.parseInt(fileScan.nextLine());
           words = new String[numWords];
          
           for (int i = 0; i < numWords; i++) { // collect entries
               words[i] = fileScan.nextLine();
           }
           fileScan.close();
       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }

       long time = System.currentTimeMillis(); // start time
       words = mergeSort(words); // sort entries
       time = System.currentTimeMillis() - time; // print elapsed time
       System.out.println("Elapsed time to organize words alphabetically is " + time + " ms");

       // write entries to file
       try {
           PrintWriter pr2 = new PrintWriter(fileName.concat("_sorted.txt"));
           // print # of words
           pr2.println(words.length);
           // print word + index to line
           for (int i = 0; i < numWords; i++) {
               pr2.println(words[i]);
           }

           pr2.close();
       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }

       return fileName.concat("_sorted.txt");
   }

   // merge sort
   public static String[] mergeSort(String[] strs) {
       // base case if array is size 1
       if (strs.length <=1)
           return strs;
      
       // splits array into 2 halves
       String[] left = new String[strs.length/2];
       String[] right = new String[strs.length - strs.length/2];
      
       // make two list and copy left and right half
       for (int i=0; i<strs.length/2; i++) {
           left[i] = strs[i];
       }
       for (int i=0; i< strs.length - strs.length/2; i++) {
           right[i] = strs[i + strs.length/2];
       }
      
       left = mergeSort(left); // assume merge sort gives assorted list
       right = mergeSort(right);
       // both left and right are sorted
      
       strs = merge(left, right);
      
       return strs;
   }
  
   public static String[] merge(String[] left, String[] right) {
       // make a place to hold the merged list
       String[] sorted = new String[left.length + right.length];
       int l = 0; // left index
       int r = 0; // right index
       int s = 0; // index in output list
      
       // keep going until one runs out
      
       while (l<left.length && r<right.length) {
           if (left[l].compareToIgnoreCase(right[r]) <= 0) { // if low element in left list is smaller
               sorted[s] = left[l]; // copy it down
               l++;
               s++;
           } else {
               sorted[s] = right[r]; // copy it down
               r++;
               s++;
           }
       }
       // empty out any remaining in left
       while (l<left.length) {
           sorted[s] = left[l];
           s++;
           l++;
       }
       // empty out any remaining in right
       while (r<right.length) {
           sorted[s] = right[r];
           s++;
           r++;
       }
       return sorted;
   }

   public static int removeDuplicates(String fileName) {
       int numUniqueWords = 0;
       // open file
       try {
           Scanner in = new Scanner(new FileReader(fileName.concat("_sorted.txt")));
           PrintWriter pr = new PrintWriter(fileName.concat("_index.txt"));
           // keeps track of last word
           String lastWord = "";

           int numWords = in.nextInt();
           // receive 1st string

           for (int i = 0; i < numWords; i++) {
               String word = in.next();
               int index = in.nextInt();

               // if this word is not the last word then make newline
               if (!lastWord.equals(word) && i > 0) {
                   pr.println();  
               }

               // if this word is the last word, then just print index
               if (lastWord.equals(word)) {
                   pr.print(" " + index);
               } else {
                   // print word, does not make new line mark
                   lastWord = word;
                   pr.print(word + " " + index);
                   // keeps track of unique words, increase unique words by one
                   numUniqueWords++;
               }
           }
           pr.close();
           in.close();
       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }

       return numUniqueWords;
   }

   public static void queryAndSearch(int numUniqueWords, String fileName, int numWords) {

       // receive entries
       String[] words = new String[numUniqueWords];
       String[] indexs = new String[numUniqueWords];
       // open file, and go through each line and store word and index into two separate arrays
       try {
           String storeLine;
           String[] brokenLine; // breaks line into array , holds word and index
           Scanner scan = new Scanner( new FileReader(fileName.concat("_index.txt")));
           for (int i = 0; i < numUniqueWords; i++) {
               storeLine = scan.nextLine();
               brokenLine = storeLine.split(" ", 2);
               // stores word and index into appropriate position in array
               words[i] = brokenLine[0];
               indexs[i] = brokenLine[1];
           }
           scan.close();
       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }

       Scanner in = new Scanner(System.in);

       boolean status = true;
       // keep asking forever until program is close
       while (status == true) {
           System.out.println("Enter the word. ");
           // does binary search, then receive result
           int result = binarySearch(words, in.nextLine());
           if (result == -1) {
               // if search is not in book then this happens
               System.out.println("Sorry this word is not in the concordance.");
           } else {
               // if search is in array then print results
               System.out.print(words[result] + ": ");
               // formats then results
               organizePrintLn(indexs[result]);
               // print all entries
               printEntries(fileName, indexs[result], numWords);
           }
       }
       in.close();
   }
  
   public static void organizePrintLn(String str) {
       String[] strings = str.split(" ");
       int currentLetters = 0;
       for (String s : strings) {
           currentLetters += s.length() + 1;
           // if current line has printed words is about 100 length then it will make next line
           // if not it will print on same line
           if (currentLetters < 100) {
               System.out.print(s + " ");
           } else {
               System.out.println(s + " ");
               currentLetters = 0;
           }
       }
       System.out.println();
   }

   public static int binarySearch(String[] array, String searchInput) {
       int leftBound = 0; // left index window
       int rightBound = array.length; // right index window

       while (leftBound <= rightBound) {
           int midPoint = leftBound + (rightBound - leftBound)/2; // midpoint calculate

           if (array[midPoint].compareToIgnoreCase(searchInput) == 0) {
               return midPoint;
               // if is alphabetically lower
           } else if (array[midPoint].compareToIgnoreCase(searchInput) < 0) {
               leftBound = midPoint +1;
               // if is alphabetically higher
           } else {
               rightBound = midPoint - 1;
           }  
       }
       // if not found
       return -1;
   }

   public static void printEntries(String fileName, String indexs, int numWords) {
       String[] originalText = new String[numWords];
       try {
           FileReader fr = new FileReader(fileName);
           Scanner in = new Scanner(fr);
           // will put word into array only if it is not all punctuation
           // will add all valid words to array
           for (int i = 0; i < numWords; i++) {
               boolean check = true;
               String currentWord;
               // will check until it finds valid word
               while (check == true) {
                   currentWord = in.next().replaceAll("\\p{Punct}", "");
                   if (!currentWord.equals("")) { // checks if current word is not all punctuation, if so add to array
                       originalText[i] = currentWord;
                       check = false;
                   }
               }
           }
           in.close();

       } catch (FileNotFoundException e) {
           e.printStackTrace();
       }

       // takes string indexs
       String[] stringIndex = indexs.split(" ");
       int numIndexs = stringIndex.length;
       int[] indexInt = new int[numIndexs];

       // turn index string to numbers
       for (int i = 0 ; i < numIndexs; i++) {
           indexInt[i] = Integer.parseInt(stringIndex[i]);
       }

       // for all indexs in the list
       for (int index : indexInt ) {
           int leftBound = index; // leftindex
           int rightBound = index; // rightindex

           // checks if left bound is too close to beginning, if so the leftbound is beginning of file
           if ((leftBound - 10) < 0) {
               leftBound = 0;
           } else {
               leftBound -= 10;
           }

           // checks is right bound is too close to end of text file. if so the rightbound is end of file
           if ((rightBound + 11) > originalText.length) {
               rightBound = originalText.length;
           } else {
               rightBound += 11;
           }


           System.out.print(index + ": ");

           // prints word from the leftbound to the rightbound
           for (int i = leftBound; i < rightBound; i++) {
               System.out.print(" " + originalText[i]);
           }

           System.out.println();
       }
   }
}

Sample Output Screenshots:

Hope it helps, if you like the answer give it a thumbs up. Thank you.


Related Solutions

Give the asymptotic (“big-Oh”) running time complexity of the following algorithm, show all the work you...
Give the asymptotic (“big-Oh”) running time complexity of the following algorithm, show all the work you have done. Algorithm: ArrayMangle(A[ ], int n) Input: an array A, an integer n x = 0; for (i=0; i<=n-1; i++) { for (j=i; j<=n-1; j++) { x = x + A[j]; } for (k=0; k<= n-1; k++) { for (j=0; j< =n-1; j++) { x = x + A[j]*A[k]; } } }
Project 3 Details The purpose of this project is to give you experience with creating and...
Project 3 Details The purpose of this project is to give you experience with creating and using custom objects. In it, you will make a couple custom classes that work in tandem with each other, and call them in your main function. For this project we'll be making something kind of like the Chatbot java file we made in class. You will create a Chatbot class that will contain a number of variables and functions. As you can imagine, a...
CVP Modeling project The purpose of this project is to give you experience creating a profitability...
CVP Modeling project The purpose of this project is to give you experience creating a profitability analysis that can be used to determine the effects of changing business conditions on a client's financial position. Managers often want to know the production level where profits earned from a product cover the cost of resources used to create it. Break-even analysis is how we determine this level. The point at which total sales revenues covers the costs of committed resources is called...
Question 2: Calculate the time complexity function T(n) and express it in terms of big-Oh for...
Question 2: Calculate the time complexity function T(n) and express it in terms of big-Oh for the following code: Part a       (hint: make use of sum of geometric progression): for (int i=1; i <= n ; i = i*2) {           for ( j = 1 ; j <= i ; j ++)             {                       cout<<”*”;             } } Part b      (hint: make use of sum of square of a number sequence): for (int i=1; i <= n ; i...
The purpose of this exercise is to give you experience in writing condition-controlled loops. Create a...
The purpose of this exercise is to give you experience in writing condition-controlled loops. Create a code in Python that will print monthly loan amortization schedule given the loan amount (P), APR i.e. Annual Percentage Rate (r=APR in %/12), loan terms in number of months (n). Please use these formulas: Monthly Payment (A) = P * ((r * (1 + r)**n)/(((1+r)**n)-1))            Monthly Interest Pmt (mIP)= r * Total Remaining Balance Monthly Principal Pmt (mPP) = A –...
Sorting with Binary Search Tree This assignment asks you to sort the lines of an input...
Sorting with Binary Search Tree This assignment asks you to sort the lines of an input file (or from standard input) and print the sorted lines to an output file (or standard output). Your program, called bstsort (binary search tree sort), will take the following command line arguments: % bstsort [-c] [-o output_file_name] [input_file_name] If -c is present, the program needs to compare the strings case sensitive; otherwise, it's case insensitive. If the output_file_name is given with the -o option,...
Write a program to compare those two searching algorithms and also compare two sorting algorithms. You...
Write a program to compare those two searching algorithms and also compare two sorting algorithms. You need to modify those codes in the book/slides to have some counters to count the number of comparisons and number of swaps. In the main function, you should have an ordered array of 120 integers in order to test searching algorithms, and the other two identical arrays of 120integers not in order to test sorting algorithms. Display all counters in the main functions. Counter...
For each of the following six program fragments give an analysis of the running time (Big-Oh...
For each of the following six program fragments give an analysis of the running time (Big-Oh will do). (25,very 5 mark) (a)sum = 0; for( i=0; i<n; i++ ) for( j=0; j<n; j++ ) sum++; (b) sum = 0; for( i=0; i<n; i++ ) for( j=0; j<n*n; j++ ) sum++; (c) sum = 0; for( i=0; i<n; i++ ) for( j=0; j<i; j++ ) sum++; (d) sum =0; for( i=1; i<n; i++ ) for( j=1; j<i*i; j++ ) if( j%1...
Purpose: This lab will give you experience modifying an existing ADT. Lab Main Task 1: Modify...
Purpose: This lab will give you experience modifying an existing ADT. Lab Main Task 1: Modify the ListInterface Java interface source code given below. Change the name of ListInterface to ComparableListInterface, and have ComparableListInterface inherit from the built-in Java interface Comparable. Note that Comparable has a method compareTo(). compareTo() must be used in programming logic you will write in a method called isInAscendingOrder() (more about isInAscendingOrder() is mentioned further down in the lab description). You can find a brief summary...
Purpose This assignment should give you experience in using file descriptors, open(), close(), write(), stat() and...
Purpose This assignment should give you experience in using file descriptors, open(), close(), write(), stat() and chmod(), perror(), and command line arguments. Program Write a C++ program that will allow you to add messages to a file that has NO permissions for any user. A Unix system has many files that have sensitive information in them. Permissions help keep these files secure. Some files can be publicly read, but can not be altered by a regular user (ex.: /etc/passwd). Other...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT