Question

Java program

In this assignment you are required to create a text parser in Java/C++. Given an input text file, you need to parse it and answer a set of frequency-related questions.

Technical Requirement of Solution:

You are required to do this ab initio (bare-bones, from scratch). This means your solution cannot use any library methods in Java except the ones listed below (or equivalent library functions in C++).

  • String.split() and other String operations can be used wherever required.
  • You can use any Regular Expression related facilities (java.util.regex) to match target words and phrases.
  • You are also allowed to use the different variants of array- and list-based built-in data structures, such as Array, List, ArrayList, Vector.
  • Standard file IO facilities for reading/writing, such as BufferedReader.

Create as many files and intermediate array-based data structures as you wish, and allocate as much heap memory from the JVM as you need. BUT you are allowed to read the input file EXACTLY ONCE to answer ALL the questions. After having read it EXACTLY ONCE, you can however use any internal array-based representation of the whole file to do multiple rounds of processing if needed.

The suggested programming language is Java. However, since this is the very first assignment, you can use C/C++ provided that similar constructs and rules are followed and no library functions that leverage hashes and maps are used.
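Since hash- and map-based library classes are ruled out, word counts have to live in array or list structures. The following is only a minimal sketch of one way to do that, using two parallel ArrayLists and a linear search; the class name WordTable and its methods are hypothetical and not part of the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a frequency table built from two parallel lists
// instead of a HashMap, to stay within the allowed data structures.
class WordTable {
    final List<String> words = new ArrayList<>();
    final List<Integer> counts = new ArrayList<>();

    // Linear search for the word; returns its index, or -1 if it is absent.
    int indexOf(String word) {
        for (int i = 0; i < words.size(); i++) {
            if (words.get(i).equals(word)) {
                return i;
            }
        }
        return -1;
    }

    // Increments the count of a known word, or appends a new entry with count 1.
    void add(String word) {
        int i = indexOf(word);
        if (i >= 0) {
            counts.set(i, counts.get(i) + 1);
        } else {
            words.add(word);
            counts.add(1);
        }
    }

    // Returns how often the word was added, or 0 if it was never seen.
    int countOf(String word) {
        int i = indexOf(word);
        return i >= 0 ? counts.get(i) : 0;
    }

    public static void main(String[] args) {
        WordTable table = new WordTable();
        for (String w : "the cat and the dog and the bird".split(" ")) {
            table.add(w);
        }
        System.out.println("the:" + table.countOf("the"));
    }
}
```

Each add costs a scan over the distinct words seen so far, so this is slower than a hash map on large inputs; sorting the word list and counting runs of equal words would be another array-only option.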

For the following questions, list all matching outputs if there are ties:

  1. List the most frequent word(s) in the whole file and its frequency.
  2. List the 3rd most frequent word(s) in the whole file and its frequency.
  3. List the word(s) with the highest frequency in a sentence across all sentences in the whole file, also print its frequency and the corresponding sentence.
  4. List sentence(s) with the maximum no. of occurrences of the word "the" in the entire file and also list the corresponding frequency.
  5. List sentence(s) with the maximum no. of occurrences of the word "of" in the entire file and also list the corresponding frequency.
  6. List sentence(s) with the maximum no. of occurrences of the word "was" in the entire file and also list the corresponding frequency.
  7. List sentence(s) with the maximum no. of occurrences of the phrase "but the" in the entire file and also list the corresponding frequency.
  8. List sentence(s) with the maximum no. of occurrences of the phrase "it was" in the entire file and also list the corresponding frequency.
  9. List sentence(s) with the maximum no. of occurrences of the phrase "in my" in the entire file and also list the corresponding frequency.
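Questions 1 and 2 both come down to picking a frequency rank and then reporting every word tied at that rank. The sketch below assumes that "3rd most frequent" means the third-highest distinct frequency, which the assignment text does not spell out; the class RankedFrequency and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for rank-based questions over parallel word/count arrays.
class RankedFrequency {
    // Returns the k-th largest distinct value in counts (k = 1 is the maximum),
    // or -1 if there are fewer than k distinct values. Counts are assumed positive.
    static int kthHighestCount(int[] counts, int k) {
        int bound = Integer.MAX_VALUE; // only values strictly below this are considered
        int best = -1;
        for (int round = 0; round < k; round++) {
            best = -1;
            for (int c : counts) {
                if (c < bound && c > best) {
                    best = c;
                }
            }
            if (best < 0) {
                return -1; // ran out of distinct values
            }
            bound = best;
        }
        return best;
    }

    // Collects "word:count" for every word whose count sits at the k-th rank,
    // so ties are all reported.
    static List<String> wordsAtRank(String[] words, int[] counts, int k) {
        List<String> result = new ArrayList<>();
        int target = kthHighestCount(counts, k);
        for (int i = 0; i < words.length; i++) {
            if (counts[i] == target) {
                result.add(words[i] + ":" + counts[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"the", "of", "and", "a", "to"};
        int[] counts = {10, 7, 7, 4, 4};
        System.out.println(wordsAtRank(words, counts, 1));
        System.out.println(wordsAtRank(words, counts, 3));
    }
}
```

If the grader instead ranks words by position in a sorted list (so that two words tied for first occupy ranks 1 and 2), the rank computation would need to change accordingly.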

Implementation Detail:

Inputs

The program has two arguments:

  • The first argument: path to the input text file.
  • The second argument: name prefix for the output files.

For example:

$ java HW1  "./input.txt" "output"

input file: A text document. Assume each newline (\n) defines a paragraph. Each period (.) defines the end of a sentence. If a sentence is the last in a paragraph and doesn’t have an explicit period (.), the newline serves as its end marker. Each space within a sentence (character code 32) is a word delimiter. The assignment is case insensitive, so you must convert the text to lower case and work in lower case.
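The input rules above can be sketched as a two-step split over lines that have already been read from the file. This is a minimal illustration, not a reference implementation: the class SentenceSplitter and its methods are hypothetical, and trimming whitespace and skipping empty pieces are assumptions the assignment does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns paragraphs (one per input line) into lowercase
// sentences, and sentences into words, following the stated delimiters.
class SentenceSplitter {
    // A period ends a sentence; the end of the paragraph ends the last one.
    static List<String> toSentences(List<String> paragraphs) {
        List<String> sentences = new ArrayList<>();
        for (String paragraph : paragraphs) {
            for (String piece : paragraph.toLowerCase().split("\\.")) {
                String sentence = piece.trim();
                if (!sentence.isEmpty()) { // assumption: empty sentences are dropped
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    // A single space is the word delimiter; empty tokens from repeated spaces are skipped.
    static List<String> toWords(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.split(" ")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> paragraphs = new ArrayList<>();
        paragraphs.add("It was cold. It was dark");
        paragraphs.add("The end.");
        for (String s : toSentences(paragraphs)) {
            System.out.println(s + " -> " + toWords(s));
        }
    }
}
```

Because the sentences are kept in a list after the single read, every question can be answered by iterating over that list as many times as needed.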

Outputs: As grading is automated, your output must conform to the following specifications. For each of the 9 questions you must create a single output file, so your program should produce 9 output files each time you run it. If a question has multiple outputs (multiple sentences/words/…), print each one on a new line; do not print them on the same line. The order of the sentences/words/phrases is not important. However, the numbering of the output file names must match the order of the questions. For example, given the prefix “output”, the output file for the first question should be output1.txt and for the second question output2.txt. The output format depends on the question type and must be:

  • For question 1 and 2:
    word:frequency e.g.
    the:10

  • For question 3:
    word:frequency:sentence e.g.
    the:9:you see watson he explained in the early hours of the morning...

  • For question 4-9:
    word:frequency:sentence e.g.
    was:2:then it was withdrawn as suddenly as it appeared...
    was:2:the 4 a week was a lure which must draw him and...
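The naming and line formats above could be produced along these lines. This is a sketch under the stated formats only; the class AnswerWriter and its methods are hypothetical, and stripping a trailing period in sentenceLine is included because the assignment asks for sentences to be printed without it.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.List;

// Hypothetical helper for building output file names and answer lines.
class AnswerWriter {
    // "output" and 1 give "output1.txt".
    static String fileNameFor(String prefix, int question) {
        return prefix + question + ".txt";
    }

    // Format for questions 1 and 2.
    static String wordLine(String word, int frequency) {
        return word + ":" + frequency;
    }

    // Format for questions 3 to 9; a trailing period is not printed.
    static String sentenceLine(String target, int frequency, String sentence) {
        String s = sentence.endsWith(".")
                ? sentence.substring(0, sentence.length() - 1)
                : sentence;
        return target + ":" + frequency + ":" + s;
    }

    // Writes one answer per line to the file for the given question.
    static void write(String prefix, int question, List<String> lines) throws IOException {
        try (PrintWriter out = new PrintWriter(fileNameFor(prefix, question))) {
            for (String line : lines) {
                out.println(line);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor("output", 1));
        System.out.println(wordLine("the", 10));
        System.out.println(sentenceLine("was", 2, "then it was withdrawn."));
    }
}
```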

Other details

  • For word-related questions, words are defined as whole words separated by a space. So “there” does not count towards the frequency of “the”. For phrases, however, you are required to consider substrings as well; for example, “within my” also counts towards the frequency of “in my”.
  • When printing the sentences, don’t print out the period (.) at the end.
  • When there is a tie, print all words/sentences with the maximum frequency.
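The difference between whole-word and substring matching can be illustrated as follows. This is a sketch with hypothetical names (OccurrenceCounter, countWord, countPhrase); counting overlapping phrase matches is an assumption, though it makes no difference for the phrases listed in the questions.

```java
// Hypothetical helper contrasting whole-word counting with substring counting.
class OccurrenceCounter {
    // Counts tokens that equal the word exactly, so "there" does not match "the".
    static int countWord(String sentence, String word) {
        int count = 0;
        for (String token : sentence.toLowerCase().split(" ")) {
            if (token.equals(word)) {
                count++;
            }
        }
        return count;
    }

    // Counts every position where the phrase occurs as a substring,
    // so "within my" contributes to "in my".
    static int countPhrase(String sentence, String phrase) {
        String text = sentence.toLowerCase();
        int count = 0;
        int from = text.indexOf(phrase);
        while (from >= 0) {
            count++;
            from = text.indexOf(phrase, from + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWord("There is the cat and the dog", "the"));
        System.out.println(countPhrase("Within my house in my room", "in my"));
    }
}
```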

Solutions

Expert Solution

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

public class FileSentenceStats {

    // Note: this solution prompts for the file names and counts words with a
    // HashMap, whereas the assignment statement asks for command-line arguments
    // and array/list-based structures only.
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        Scanner fileReader;
        ArrayList<String> sentences = new ArrayList<>();

        System.out.print("Enter the input file name: ");
        String inputFileName = sc.nextLine().trim();
        System.out.print("Enter the output file name (without file extension): ");
        String outputFileName = sc.nextLine().trim() + ".txt";

        // read the input file once, keeping every line in memory
        ArrayList<String> temp = new ArrayList<>();
        try {
            fileReader = new Scanner(new File(inputFileName));
            while (fileReader.hasNextLine()) {
                temp.add(fileReader.nextLine().trim());
            }
            fileReader.close();
        } catch (FileNotFoundException fnfe) {
            System.out.println("Could not locate " + inputFileName);
            System.exit(0);
        }

        // split each line into sentences after '.', '!' or '?', skipping empty pieces
        for (int i = 0; i < temp.size(); i++) {
            String[] info = temp.get(i).split("(?<=[.!?])\\s*");
            for (String sens : info) {
                if (!sens.isEmpty()) {
                    sentences.add(sens);
                }
            }
        }

        // write the results to the file (opened in append mode);
        // closing the PrintWriter also flushes and closes the FileWriter
        try (PrintWriter pw = new PrintWriter(new FileWriter(new File(outputFileName), true))) {
            pw.write("Word with the highest frequency: " + findWordWithHighestFrequency(sentences)
                    + System.lineSeparator());

            pw.write("Word with the 3rd highest frequency: " + findWordWithThirdHighestFrequency(sentences)
                    + System.lineSeparator());

            ArrayList<String> highest = findWordWithHighestFrequencyPerSentence(sentences);
            pw.write(System.lineSeparator() + "Highest words per sentences: " + System.lineSeparator());
            for (String s : highest) {
                pw.write(s + System.lineSeparator());
            }

            // the same report is produced for each target word or phrase
            String[] targets = {"the", "of", "was", "but the", "it was", "in my"};
            for (String target : targets) {
                pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \""
                        + target + "\": " + System.lineSeparator());
                for (String s : findMaxOccurringWordInASentence(sentences, target)) {
                    pw.write("*-* " + s + System.lineSeparator());
                }
            }
        } catch (IOException ex) {
            System.out.println(ex.getMessage());
        }
    }
  
    // Returns the most frequent word in the whole file as "word: count".
    // Words tied for the highest count are all listed, separated by ", ".
    private static String findWordWithHighestFrequency(ArrayList<String> sentences) {
        HashMap<String, Integer> hashmap = new HashMap<>();

        // count every space-separated word, ignoring case
        for (String sen : sentences) {
            for (String word : sen.toLowerCase().split(" ")) {
                if (hashmap.containsKey(word))
                    hashmap.put(word, hashmap.get(word) + 1);
                else
                    hashmap.put(word, 1);
            }
        }

        // first pass: find the highest count
        Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
        int freq = 0;
        for (Map.Entry<String, Integer> me : set) {
            if (me.getValue() > freq) {
                freq = me.getValue();
            }
        }

        // second pass: collect every word that reaches that count
        String res = "";
        for (Map.Entry<String, Integer> me : set) {
            if (me.getValue() == freq) {
                if (!res.isEmpty()) {
                    res += ", ";
                }
                res += me.getKey() + ": " + freq;
            }
        }
        return res.isEmpty() ? ": " + freq : res;
    }
  
    // Returns the word left with the highest count after the two most frequent
    // words have been removed, formatted as "word: count".
    private static String findWordWithThirdHighestFrequency(ArrayList<String> sentences) {
        HashMap<String, Integer> hashmap = new HashMap<>();

        // count every space-separated word, ignoring case
        for (String sen : sentences) {
            for (String word : sen.toLowerCase().split(" ")) {
                if (hashmap.containsKey(word))
                    hashmap.put(word, hashmap.get(word) + 1);
                else
                    hashmap.put(word, 1);
            }
        }

        // find the current maximum three times, removing the winner after
        // the 1st and 2nd rounds so the 3rd round yields the answer
        String key = "";
        int freq = 0;
        for (int rank = 1; rank <= 3; rank++) {
            Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
            freq = 0;
            for (Map.Entry<String, Integer> me : set) {
                if (me.getValue() > freq) {
                    freq = me.getValue();
                    key = me.getKey();
                }
            }
            if (rank < 3) {
                hashmap.remove(key);
            }
        }
        return key + ": " + freq;
    }
  
    // For each sentence, returns its most frequent word as "word: count: sentence".
    private static ArrayList<String> findWordWithHighestFrequencyPerSentence(ArrayList<String> sentences) {
        ArrayList<String> highestWords = new ArrayList<>();

        // iterate over all the sentences
        for (String sen : sentences) {
            // a fresh map per sentence keeps the counts from leaking between sentences
            HashMap<String, Integer> hashmap = new HashMap<>();
            String[] data = sen.toLowerCase().split(" ");
            for (int i = 0; i < data.length; i++) {
                if (hashmap.containsKey(data[i]))
                    hashmap.put(data[i], hashmap.get(data[i]) + 1);
                else
                    hashmap.put(data[i], 1);
            }

            Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
            String key = "";
            int freq = 0;

            for (Map.Entry<String, Integer> me : set) {
                if (me.getValue() > freq) {
                    freq = me.getValue();
                    key = me.getKey();
                }
            }

            highestWords.add(key + ": " + freq + ": " + sen);
        }

        return highestWords;
    }
  
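    // Returns every sentence tied for the most occurrences of the given word or
    // phrase, each formatted as "target: count: sentence", or "No result".
    private static ArrayList<String> findMaxOccurringWordInASentence(ArrayList<String> sentences, String word) {
        ArrayList<String> sens = new ArrayList<>();
        ArrayList<Integer> freq = new ArrayList<>();
        ArrayList<String> res = new ArrayList<>();
        boolean isPhrase = word.contains(" ");
        int max = 0;

        for (String sen : sentences) {
            int count = 0;
            if (isPhrase) {
                // phrases are matched as substrings, so "within my" counts for "in my"
                String lower = sen.toLowerCase();
                int from = lower.indexOf(word);
                while (from >= 0) {
                    count++;
                    from = lower.indexOf(word, from + 1);
                }
            } else {
                // single words must match a whole space-separated token
                String[] wordsPerSentence = sen.split(" ");
                for (int i = 0; i < wordsPerSentence.length; i++) {
                    if (wordsPerSentence[i].equalsIgnoreCase(word)) {
                        count++;
                    }
                }
            }
            if (count > 0) {
                sens.add(sen);
                freq.add(count);
                if (count > max) {
                    max = count;
                }
            }
        }

        if (freq.size() > 0) {
            // report every sentence that reaches the maximum, not just the first
            for (int i = 0; i < freq.size(); i++) {
                if (freq.get(i) == max) {
                    res.add(word + ": " + max + ": " + sens.get(i));
                }
            }
        } else {
            res.add("No result");
        }

        return res;
    }
}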

SCREENSHOT (program output; long lines are cut off at the right edge of the window)

run:
Enter the input file name: input.txt
Word with the highest frequency: the: 27
Word with the 3rd highest frequency: of: 22

Highest words per sentences:
the: 5: The play begins with the brief appearance of a trio of witches and then moves to a military camp, where the Scottish King Duncan hears the news that his generals, Macbeth and
their: 1: Following their pitched battle with these enemy forces, Macbeth and Banquo encounter the witches as they cross a moor.
of: 3: The Witches prophesy that Macbeth will be made thane (a rank of Scottish nobility) of Cawdor and eventually King of Scotland.
will: 2: They also prophesy that Macbeth's companion, Banquo, will beget a line of Scottish kings, although Banquo will never be king himself.
and: 3: The witches vanish, and Macbeth and Banquo treat their prophecies skeptically until some of King Duncan's men come to thank the two generals for their victories in battle and
the: 2: The previous thane betrayed Scotland by fighting for the Norwegians and Duncan has condemned him to death.
the: 3: Macbeth is intrigued by the possibility that the remainder of the witches' prophecy—that he will be crowned king-might be true, but he is uncertain what to expect.
macbeth's: 1: He visits with King Duncan, and they plan to dine together at Inverness, Macbeth's castle, that night.
telling: 1: Macbeth writes ahead to his wife, Lady Macbeth, telling her all that has happened.
none: 1: Lady Macbeth suffers none of her husband's uncertainty.
him: 2: She desires the kingship for him and wants him to murder Duncan in order to obtain it.
arrives: 1: When Macbeth arrives at Inverness, she overrides all of her husband's objections and persuades him to kill the king that very night.
will: 4: He and Lady Macbeth plan to get Duncan's two chamberlains drunk so they will black out; the next morning they will blame the murder on the chamberlains, who will be defensel
a: 3: While Duncan is asleep, Macbeth stabs him, despite his doubts and a number of supernatural portents, including a vision of a bloody dagger.
the: 3: When Duncan's death is discovered the next morning, Macbeth kills the chamberlains-ostensibly out of rage at their crime-and easily assumes the kingship.
and: 2: Duncan's sons Malcolm and Donalbain flee to England and Ireland, respectively, fearing that whoever killed Duncan desires their demise as well.
of: 2: Fearful of the witches' prophecy that Banquo's heirs will seize the throne, Macbeth hires a group of murderers to kill Banquo and his son Fleance!
they: 2: They ambush Banquo on his way to a royal feast, but they fail to kill Fleance, who escapes into the night.
as: 2: Macbeth becomes furious: as long as Fleance is alive, he fears that his power remains insecure.
banquo's: 1: At the feast that night, Banquo's ghost visits Macbeth.
the: 2: When he sees the ghost, Macbeth raves fearfully, startling his guests, who include most of the great Scottish nobility.
but: 1: Lady Macbeth tries to neutralize the damage, but Macbeth's kingship incites increasing resistance from his nobles and subjects.
their: 1: Frightened, Macbeth goes to visit the witches in their cavern.
of: 4: There, they show him a sequence of demons and spirits who present him with further prophecies: he must beware of Macduff, a Scottish nobleman who opposed Macbeth's accession t
that: 2: Macbeth is relieved and feels secure, because he knows that all men are born of women and that forests cannot move.
that: 3: When he learns that Macduff has fled to England to join Malcolm, Macbeth orders that Macduff's castle be seized and, most cruelly, that Lady Macduff and her children be murd

Sentence(s) with maximum occurrence of the word "the":
*-* the: 5: The play begins with the brief appearance of a trio of witches and then moves to a military camp, where the Scottish King Duncan hears the news that his generals, Macbeth

Sentence(s) with maximum occurrence of the word "of":
*-* of: 4: There, they show him a sequence of demons and spirits who present him with further prophecies: he must beware of Macduff, a Scottish nobleman who opposed Macbeth's accessi

Sentence(s) with maximum occurrence of the word "was":
*-* No result

Sentence(s) with maximum occurrence of the word "but the":
*-* No result

Sentence(s) with maximum occurrence of the word "it was":
*-* No result

Sentence(s) with maximum occurrence of the word "in my":
*-* No result

BUILD SUCCESSFUL (total time: 5 seconds)

