Java program
In this assignment you are required to create a text parser in Java/C++. Given an input text file, you need to parse it and answer a set of frequency-related questions.
Technical Requirement of Solution:
You are required to do this ab initio (bare-bones, from scratch). This means your solution cannot use any library methods in Java except the ones listed below (or the equivalent library functions in C++).
Create as many files and intermediate array-based data structures as you wish, and allocate as much heap memory from the JVM as you need. BUT you are allowed to read the input file EXACTLY ONCE to answer ALL the questions. You may, however, use any internal array-based representation of the whole file to do multiple rounds of processing after having read it EXACTLY ONCE.
The suggested programming language is Java. However, considering this is the very first assignment, you can use C/C++ provided similar constructs and rules are followed and no library functions that leverage hashes or maps are used.
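The array-only requirement above can be illustrated with a small sketch. It keeps two parallel arrays (distinct words and their counts) and uses a linear search instead of a hash map. The class and method names (ArrayWordCounter, add, linesWithCount) are hypothetical, and the sketch assumes that String.equals and plain arrays are among the permitted constructs, which the text above does not confirm.

```java
// Hypothetical helper: counts word frequencies with parallel arrays, no hash or map.
class ArrayWordCounter {
    private String[] words = new String[16]; // distinct words seen so far
    private int[] counts = new int[16];      // counts[i] is the frequency of words[i]
    private int size = 0;                    // number of distinct words stored

    // Linear search; returns the index of w, or -1 if w has not been seen.
    private int indexOf(String w) {
        for (int i = 0; i < size; i++) {
            if (words[i].equals(w)) return i;
        }
        return -1;
    }

    // Records one occurrence of w, doubling the arrays when they are full.
    void add(String w) {
        int i = indexOf(w);
        if (i >= 0) { counts[i]++; return; }
        if (size == words.length) {
            String[] nw = new String[size * 2];
            int[] nc = new int[size * 2];
            for (int j = 0; j < size; j++) { nw[j] = words[j]; nc[j] = counts[j]; }
            words = nw;
            counts = nc;
        }
        words[size] = w;
        counts[size] = 1;
        size++;
    }

    int count(String w) {
        int i = indexOf(w);
        return i < 0 ? 0 : counts[i];
    }

    int maxCount() {
        int max = 0;
        for (int i = 0; i < size; i++) if (counts[i] > max) max = counts[i];
        return max;
    }

    // Returns one "word:frequency" line for every word whose count equals c (ties included).
    String[] linesWithCount(int c) {
        int n = 0;
        for (int i = 0; i < size; i++) if (counts[i] == c) n++;
        String[] out = new String[n];
        int k = 0;
        for (int i = 0; i < size; i++) if (counts[i] == c) out[k++] = words[i] + ":" + counts[i];
        return out;
    }

    public static void main(String[] args) {
        ArrayWordCounter c = new ArrayWordCounter();
        String[] input = {"the", "cat", "the", "dog", "cat"};
        for (String w : input) c.add(w);
        for (String line : c.linesWithCount(c.maxCount())) System.out.println(line);
        // prints "the:2" then "cat:2"
    }
}
```

The linear search makes each insertion proportional to the number of distinct words; keeping the array sorted and using binary search would be one way to speed this up while staying array-based.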
For the following questions, list all matching outputs if there are ties.
Implementation Detail:
Inputs
The program takes two arguments: the path of the input file and the prefix for the output file names.
For example:
$ java HW1 "./input.txt" "output"
Input file: A text document. Assume each newline (\n) ends a paragraph and each period (.) ends a sentence. If the last sentence in a paragraph has no explicit period (.), the newline is its end marker. Each space (ASCII character 32) within a sentence is a word delimiter. The assignment is case insensitive, so you must convert the text to lower case and work in lower case.
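The delimiter rules above can be sketched as a hand-written splitter that scans the text character by character. The names TextSplitter, splitOn, sentences and words are hypothetical. The sketch assumes that charAt, substring, trim, isEmpty and toLowerCase are permitted, that the terminating period is not kept as part of the sentence, and that empty pieces (for example from blank lines or repeated spaces) are dropped; none of this is confirmed by the assignment text.

```java
// Hypothetical splitter that follows the stated rules without regex or collections.
class TextSplitter {
    // Splits s at every d1 or d2, trims each piece, and drops empty pieces.
    // Two passes: the first counts the pieces, the second fills a right-sized array.
    static String[] splitOn(String s, char d1, char d2) {
        int n = 0;
        int start = 0;
        for (int i = 0; i <= s.length(); i++) {
            if (i == s.length() || s.charAt(i) == d1 || s.charAt(i) == d2) {
                if (!s.substring(start, i).trim().isEmpty()) n++;
                start = i + 1;
            }
        }
        String[] out = new String[n];
        int k = 0;
        start = 0;
        for (int i = 0; i <= s.length(); i++) {
            if (i == s.length() || s.charAt(i) == d1 || s.charAt(i) == d2) {
                String piece = s.substring(start, i).trim();
                if (!piece.isEmpty()) out[k++] = piece;
                start = i + 1;
            }
        }
        return out;
    }

    // A sentence ends at a period or at a newline; everything is lower-cased first.
    static String[] sentences(String text) {
        return splitOn(text.toLowerCase(), '.', '\n');
    }

    // Words are delimited by the space character only.
    static String[] words(String sentence) {
        return splitOn(sentence, ' ', ' ');
    }

    public static void main(String[] args) {
        String text = "The cat sat. The dog ran\nNew paragraph here.";
        for (String s : sentences(text)) {
            System.out.println(s + " -> " + words(s).length + " words");
        }
    }
}
```

Other punctuation such as commas stays attached to the neighbouring word here, because the rules above name only the space as a word delimiter.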
Outputs: As grading is automated, your output must conform to the following specifications. For each of the 9 questions you must create exactly one output file, so your program should produce 9 output files each time it runs. If a question has multiple outputs (multiple sentences, words, ...), print each one on a new line; do not print them on the same line. The order of the sentences, words, or phrases within a file is not important. However, the numbering of the output files must match the order of the questions. For example, given the prefix “output”, the output file for the first question should be output1.txt and the one for the second question output2.txt. The output format depends on the question type and must be:
For question 1 and 2:
word:frequency e.g.
the:10
For question 3:
word:frequency:sentence e.g.
the:9:you see watson he explained in the early hours of the morning...
For question 4-9:
word:frequency:sentence e.g.
was:2:then it was withdrawn as suddenly as it appeared...
was:2:the 4 a week was a lure which must draw him and...
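The file naming and line formats above can be sketched as follows. OutputSketch, fileNameFor, line and write are hypothetical names, and the use of PrintWriter is an assumption, since the list of permitted library methods is not shown here.

```java
import java.io.IOException;
import java.io.PrintWriter;

// Hypothetical helpers for the output conventions described above.
class OutputSketch {
    // Question 1 with prefix "output" maps to "output1.txt", and so on.
    static String fileNameFor(String prefix, int question) {
        return prefix + question + ".txt";
    }

    // Format for questions 1 and 2: word:frequency
    static String line(String word, int frequency) {
        return word + ":" + frequency;
    }

    // Format for questions 3 to 9: word:frequency:sentence
    static String line(String word, int frequency, String sentence) {
        return word + ":" + frequency + ":" + sentence;
    }

    // Writes one result per line, so tied answers never share a line.
    static void write(String prefix, int question, String[] lines) throws IOException {
        try (PrintWriter pw = new PrintWriter(fileNameFor(prefix, question))) {
            for (String l : lines) pw.println(l);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed invocation: java OutputSketch output
        String prefix = args.length > 0 ? args[0] : "output";
        write(prefix, 1, new String[] { line("the", 10) });
    }
}
```

Opening each file in overwrite mode (the PrintWriter default) keeps repeated runs from accumulating old results, which matters when the files are compared automatically.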
Other details
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;
public class FileSentenceStats {
public static void main(String[]args)
{
Scanner sc = new Scanner(System.in);
Scanner fileReader;
ArrayList<String> sentences = new ArrayList<>();
System.out.print("Enter the input file name: ");
String inputFileName = sc.nextLine().trim();
System.out.print("Enter the output file name (without file extension): ");
String outputFileName = sc.nextLine().trim() + ".txt";
// read input file
ArrayList<String> temp = new ArrayList<>();
try
{
fileReader = new Scanner(new File(inputFileName));
while(fileReader.hasNextLine())
{
temp.add(fileReader.nextLine().trim());
}
fileReader.close();
}catch(FileNotFoundException fnfe){
System.out.println("Could not locate " + inputFileName);
System.exit(0);
}
for(int i = 0; i < temp.size(); i++)
{
String[] info = temp.get(i).split("(?<=[.!?])\\s*");
for(String sens : info)
{
sentences.add(sens);
}
}
// write the results to the file
FileWriter fw;
PrintWriter pw;
try {
fw = new FileWriter(new File(outputFileName), true); // open the file in append mode
pw = new PrintWriter(fw);
pw.write("Word with the highest frequency: " +
findWordWithHighestFrequency(sentences)
+ System.lineSeparator());
pw.write("Word with the 3rd highest frequency: " +
findWordWithThirdHighestFrequency(sentences)
+ System.lineSeparator());
ArrayList<String> highest =
findWordWithHighestFrequencyPerSentence(sentences);
pw.write(System.lineSeparator() + "Highest words per sentences: " +
System.lineSeparator());
for(String s : highest)
{
pw.write(s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"the\": " + System.lineSeparator());
ArrayList<String> maxThe =
findMaxOccurringWordInASentence(sentences, "the");
for(String s : maxThe)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"of\": " + System.lineSeparator());
ArrayList<String> maxOf =
findMaxOccurringWordInASentence(sentences, "of");
for(String s : maxOf)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"was\": " + System.lineSeparator());
ArrayList<String> maxWas =
findMaxOccurringWordInASentence(sentences, "was");
for(String s : maxWas)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"but the\": " + System.lineSeparator());
ArrayList<String> maxButThe =
findMaxOccurringWordInASentence(sentences, "but the");
for(String s : maxButThe)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"it was\": " + System.lineSeparator());
ArrayList<String> maxItWas =
findMaxOccurringWordInASentence(sentences, "it was");
for(String s : maxItWas)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.write(System.lineSeparator() + "Sentence(s) with maximum occurrence of the word \"in my\": " + System.lineSeparator());
ArrayList<String> maxInMy =
findMaxOccurringWordInASentence(sentences, "in my");
for(String s : maxInMy)
{
pw.write("*-* " + s + System.lineSeparator());
}
pw.flush();
fw.close();
pw.close();
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
}
private static String findWordWithHighestFrequency(ArrayList<String> sentences)
{
HashMap<String, Integer> hashmap = new HashMap<>();
// iterate over the array of words
ArrayList<String> words = new ArrayList<>();
for(String sen : sentences)
{
String[] data = sen.toLowerCase().split(" "); // lower-case first: counting is case insensitive
for(int j = 0; j < data.length; j++)
{
words.add(data[j]);
}
}
for(int i = 0; i < words.size(); i++)
{
if(hashmap.containsKey(words.get(i)))
hashmap.put(words.get(i), hashmap.get(words.get(i)) + 1);
else
hashmap.put(words.get(i), 1);
}
Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
String key = "";
int freq = 0;
for (Map.Entry<String, Integer> me : set)
{
if(me.getValue() > freq)
{
freq = me.getValue();
key = me.getKey();
}
}
String res = key + ": " + freq;
return res;
}
private static String findWordWithThirdHighestFrequency(ArrayList<String> sentences)
{
HashMap<String, Integer> hashmap = new HashMap<>();
// iterate over the array of words
ArrayList<String> words = new ArrayList<>();
for(String sen : sentences)
{
String[] data = sen.toLowerCase().split(" "); // lower-case first: counting is case insensitive
for(int j = 0; j < data.length; j++)
{
words.add(data[j]);
}
}
for(int i = 0; i < words.size(); i++)
{
if(hashmap.containsKey(words.get(i)))
hashmap.put(words.get(i), hashmap.get(words.get(i)) + 1);
else
hashmap.put(words.get(i), 1);
}
// 1st highest
Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
String key1 = "";
int freq1 = 0;
for (Map.Entry<String, Integer> me : set)
{
if(me.getValue() > freq1)
{
freq1 = me.getValue();
key1 = me.getKey();
}
}
hashmap.remove(key1);
// 2nd highest
set = hashmap.entrySet();
String key2 = key1;
int freq2 = 0;
for (Map.Entry<String, Integer> me : set)
{
if(me.getValue() > freq2)
{
freq2 = me.getValue();
key2 = me.getKey();
}
}
hashmap.remove(key2);
// 3rd highest
set = hashmap.entrySet();
String key3 = key2;
int freq3 = 0;
for (Map.Entry<String, Integer> me : set)
{
if(me.getValue() > freq3)
{
freq3 = me.getValue();
key3 = me.getKey();
}
}
String result = key3 + ": " + freq3;
return result;
}
private static ArrayList<String> findWordWithHighestFrequencyPerSentence(ArrayList<String> sentences)
{
HashMap<String, Integer> hashmap = new HashMap<>();
ArrayList<String> highestWords = new ArrayList<>();
// iterate over all the sentences
for(String sen : sentences)
{
String[] data = sen.toLowerCase().split(" ");
for(int i = 0; i < data.length; i++)
{
if(hashmap.containsKey(data[i]))
hashmap.put(data[i], hashmap.get(data[i]) + 1);
else
hashmap.put(data[i], 1);
}
Set<Map.Entry<String, Integer>> set = hashmap.entrySet();
String key = "";
int freq = 0;
for (Map.Entry<String, Integer> me : set)
{
if(me.getValue() > freq)
{
freq = me.getValue();
key = me.getKey();
}
}
highestWords.add(key + ": " + freq + ": " + sen);
set.clear();
}
return highestWords;
}
private static ArrayList<String> findMaxOccurringWordInASentence(ArrayList<String> sentences, String word)
{
ArrayList<String> sens = new ArrayList<>();
ArrayList<Integer> freq = new ArrayList<>();
ArrayList<String> res = new ArrayList<>();
for(String sen : sentences)
{
String[] wordsPerSentence = sen.split(" ");
int count = 0;
for(int i = 0; i < wordsPerSentence.length; i++)
{
if(wordsPerSentence[i].equalsIgnoreCase(word))
{
count++;
}
}
if(count > 0)
{
sens.add(sen);
freq.add(count);
}
}
String result;
if(freq.size() > 0)
{
int max = freq.get(0);
int index = 0;
String sentence = "";
for(int i = 0; i < freq.size(); i++)
{
if(freq.get(i) > max)
{
max = freq.get(i);
index = i;
}
}
sentence = sens.get(index);
result = word + ": " + max + ": " + sentence;
res.add(result);
}
else
{
result = "No result";
res.add(result);
}
return res;
}
}
Sample output:
run:
Enter the input file name: input.txt
Word with the highest frequency: the: 27
Word with the 3rd highest frequency: of: 22

Highest words per sentences:
the: 5: The play begins with the brief appearance of a trio of witches and then moves to a military camp, where the Scottish King Duncan hears the news that his generals, Macbeth and
their: 1: Following their pitched battle with these enemy forces, Macbeth and Banquo encounter the witches as they cross a moor.
of: 3: The Witches prophesy that Macbeth will be made thane (a rank of Scottish nobility) of Cawdor and eventually King of Scotland.
will: 2: They also prophesy that Macbeth's companion, Banquo, will beget a line of Scottish kings, although Banquo will never be king himself.
and: 3: The witches vanish, and Macbeth and Banquo treat their prophecies skeptically until some of King Duncan's men come to thank the two generals for their victories in battle and
the: 2: The previous thane betrayed Scotland by fighting for the Norwegians and Duncan has condemned him to death.
the: 3: Macbeth is intrigued by the possibility that the remainder of the witches' prophecy—that he will be crowned king-might be true, but he is uncertain what to expect.
macbeth's: 1: He visits with King Duncan, and they plan to dine together at Inverness, Macbeth's castle, that night.
telling: 1: Macbeth writes ahead to his wife, Lady Macbeth, telling her all that has happened.
none: 1: Lady Macbeth suffers none of her husband's uncertainty.
him: 2: She desires the kingship for him and wants him to murder Duncan in order to obtain it.
arrives: 1: When Macbeth arrives at Inverness, she overrides all of her husband's objections and persuades him to kill the king that very night.
will: 4: He and Lady Macbeth plan to get Duncan's two chamberlains drunk so they will black out; the next morning they will blame the murder on the chamberlains, who will be defensel
a: 3: While Duncan is asleep, Macbeth stabs him, despite his doubts and a number of supernatural portents, including a vision of a bloody dagger.
the: 3: When Duncan's death is discovered the next morning, Macbeth kills the chamberlains-ostensibly out of rage at their crime-and easily assumes the kingship.
and: 2: Duncan's sons Malcolm and Donalbain flee to England and Ireland, respectively, fearing that whoever killed Duncan desires their demise as well.
of: 2: Fearful of the witches' prophecy that Banquo's heirs will seize the throne, Macbeth hires a group of murderers to kill Banquo and his son Fleance!
they: 2: They ambush Banquo on his way to a royal feast, but they fail to kill Fleance, who escapes into the night.
as: 2: Macbeth becomes furious: as long as Fleance is alive, he fears that his power remains insecure.
banquo's: 1: At the feast that night, Banquo's ghost visits Macbeth.
the: 2: When he sees the ghost, Macbeth raves fearfully, startling his guests, who include most of the great Scottish nobility.
but: 1: Lady Macbeth tries to neutralize the damage, but Macbeth's kingship incites increasing resistance from his nobles and subjects.
their: 1: Frightened, Macbeth goes to visit the witches in their cavern.
of: 4: There, they show him a sequence of demons and spirits who present him with further prophecies: he must beware of Macduff, a Scottish nobleman who opposed Macbeth's accession t
that: 2: Macbeth is relieved and feels secure, because he knows that all men are born of women and that forests cannot move.
that: 3: When he learns that Macduff has fled to England to join Malcolm, Macbeth orders that Macduff's castle be seized and, most cruelly, that Lady Macduff and her children be murd

Sentence(s) with maximum occurrence of the word "the":
*-* the: 5: The play begins with the brief appearance of a trio of witches and then moves to a military camp, where the Scottish King Duncan hears the news that his generals, Macbeth

Sentence(s) with maximum occurrence of the word "of":
*-* of: 4: There, they show him a sequence of demons and spirits who present him with further prophecies: he must beware of Macduff, a Scottish nobleman who opposed Macbeth's accessi

Sentence(s) with maximum occurrence of the word "was":
*-* No result

Sentence(s) with maximum occurrence of the word "but the":
*-* No result

Sentence(s) with maximum occurrence of the word "it was":
*-* No result

Sentence(s) with maximum occurrence of the word "in my":
*-* No result
BUILD SUCCESSFUL (total time: 5 seconds)
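Note that findMaxOccurringWordInASentence compares each single token with the search string, so a two-word phrase such as "but the" can never match, and only the first sentence with the maximum count is returned even when several sentences tie. A possible way to handle both points is sketched below. PhraseCounter, countPhrase and sentencesWithMax are hypothetical names, the use of String.split and equalsIgnoreCase mirrors the code above rather than the assignment's permitted list, and the phrase:frequency:sentence output shape is an assumption modeled on the word:frequency:sentence format.

```java
// Hypothetical sketch: counts a one- or multi-word phrase per sentence and keeps all ties.
class PhraseCounter {
    // Counts positions where the phrase occurs as consecutive whole words.
    // Overlapping matches are counted, e.g. "a a" occurs twice in "a a a".
    static int countPhrase(String[] words, String[] phrase) {
        int count = 0;
        for (int i = 0; i + phrase.length <= words.length; i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!words[i + j].equalsIgnoreCase(phrase[j])) {
                    match = false;
                    break;
                }
            }
            if (match) count++;
        }
        return count;
    }

    // Returns one "phrase:count:sentence" line per sentence that reaches the maximum count,
    // or an empty array when the phrase does not occur at all.
    static String[] sentencesWithMax(String[] sentences, String phrase) {
        String[] p = phrase.split(" ");
        int[] counts = new int[sentences.length];
        int max = 0;
        for (int i = 0; i < sentences.length; i++) {
            counts[i] = countPhrase(sentences[i].split(" "), p);
            if (counts[i] > max) max = counts[i];
        }
        if (max == 0) return new String[0];
        int n = 0;
        for (int i = 0; i < counts.length; i++) if (counts[i] == max) n++;
        String[] out = new String[n];
        int k = 0;
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == max) out[k++] = phrase + ":" + max + ":" + sentences[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "it was cold and it was dark",
            "it was late",
            "but it was fine and it was ok"
        };
        for (String line : sentencesWithMax(sentences, "it was")) System.out.println(line);
        // prints "it was:2:it was cold and it was dark"
        // then   "it was:2:but it was fine and it was ok"
    }
}
```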