Concepts tested by this program
Hash Table,
Linked List,
hash code, buckets/chaining,
exception handling, read/write files (FileChooser)
A concordance lists every word that occurs in a document in alphabetical order, and for each word it gives the line number of every line in the document where the word occurs.
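To make the definition concrete, here is a minimal sketch that builds a tiny concordance with the standard library's `TreeMap` and `TreeSet`. It is not the hash-table design the assignment asks for, and it ignores the word-filtering rules; the class name `ConcordanceDemo` and the method `build` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical demo class: shows what a concordance looks like, using
// sorted standard-library collections instead of the required hash table.
class ConcordanceDemo {

    // Maps each lowercase word to the sorted set of line numbers it appears on,
    // then renders one "word: n, n, n" string per word in alphabetical order.
    static List<String> build(String text) {
        Map<String, TreeSet<Integer>> index = new TreeMap<>();
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            for (String word : lines[i].toLowerCase().split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // leading whitespace yields an empty token
                }
                index.computeIfAbsent(word, k -> new TreeSet<>()).add(i + 1);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, TreeSet<Integer>> e : index.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey()).append(": ");
            String sep = "";
            for (int n : e.getValue()) {
                sb.append(sep).append(n);
                sep = ", ";
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String entry : build("red fish blue fish\nblue sky")) {
            System.out.println(entry);
        }
    }
}
```

For the two-line input in `main`, "blue" is listed with lines 1 and 2, while "fish" is listed once with line 1 even though it occurs twice on that line.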
Write a program that creates a concordance. There will be two ways to create a concordance. The first requires a document to be read from an input file, and the concordance data is written to an output file. The second reads the input from a string and returns an ArrayList of strings that represent the concordance of the string.
Because they are so common, don't include the words "the" or "and" in your concordance. Also, do not include words with fewer than 3 characters. Strip out all punctuation except apostrophes that occur in the middle of a word, e.g. let's, we'd.
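The filtering rules above could be sketched as a single helper. This is only one possible reading: it assumes "punctuation" means anything other than ASCII letters, digits, and the straight apostrophe, and it lowercases the word before comparing. The class `WordCleaner` and method `clean` are hypothetical names, not part of the provided interfaces.

```java
import java.util.Locale;

// Hypothetical helper: normalizes one raw token and decides whether it
// belongs in the concordance.
class WordCleaner {

    // Returns the cleaned word, or null if the token should be skipped.
    static String clean(String token) {
        // Drop everything except ASCII letters, digits and straight apostrophes.
        String word = token.replaceAll("[^A-Za-z0-9']", "");
        // Drop apostrophes at either end, keeping those inside the word.
        word = word.replaceAll("^'+|'+$", "");
        word = word.toLowerCase(Locale.ROOT);
        // Skip short words and the two excluded words.
        if (word.length() < 3 || word.equals("the") || word.equals("and")) {
            return null;
        }
        return word;
    }
}
```

Cleaning before filtering matters: a token such as `The` or `"and",` only matches the excluded words after its case and punctuation have been normalized.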
Data Elements – ConcordanceDataElement, implements Comparable and consists of a String (the word) and a reference to a LinkedList (list of line numbers where word occurs). Follow the Javadoc provided for you.
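As a rough illustration of what implementing Comparable buys here, the sketch below orders entries by their word so that `Collections.sort` can arrange them alphabetically. `WordEntry` and `sortedWords` are stand-in names for illustration only; the provided Javadoc may specify the comparison differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a data element: a word plus its line numbers,
// ordered by the word.
class WordEntry implements Comparable<WordEntry> {
    final String word;
    final LinkedList<Integer> lines = new LinkedList<>();

    WordEntry(String word) {
        this.word = word;
    }

    @Override
    public int compareTo(WordEntry other) {
        return word.compareTo(other.word);
    }

    // Returns the words of the given entries in alphabetical order.
    static List<String> sortedWords(List<WordEntry> entries) {
        List<WordEntry> copy = new ArrayList<>(entries);
        Collections.sort(copy); // relies on compareTo
        List<String> words = new ArrayList<>();
        for (WordEntry e : copy) {
            words.add(e.word);
        }
        return words;
    }
}
```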
Data Structure – ConcordanceDataStructure
Implements the ConcordanceDataStructureInterface interface that is provided.
You will be implementing a hash table with buckets. It will be an array of linked lists of ConcordanceDataElements. The add method will take a word and a line number to be added to the data structure. If the word already exists, the line number will be added to the linked list for this word. If the line number for the word already exists, don't add it again to the linked list (e.g. if Sarah was on line 5 twice, the first line 5 would be added to the linked list for Sarah, the second one would not). If the word doesn't exist, create a ConcordanceDataElement and add it to the hash table. Two constructors will be required: one that takes in an integer that is the estimated number of words in the text, and another that is used for testing purposes. Look at the provided Javadoc.
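The add behaviour described above can be sketched in miniature as follows. This is an illustrative stand-alone version, not the required classes: `ChainedIndex`, `Entry`, and `linesFor` are hypothetical names, and `Math.floorMod` is used here as one way to keep the bucket index non-negative.

```java
import java.util.LinkedList;

// Hypothetical miniature of the bucket design: an array of linked lists,
// each holding the entries whose words hashed to the same index.
class ChainedIndex {

    // One word plus the distinct line numbers it occurs on.
    static class Entry {
        final String word;
        final LinkedList<Integer> lines = new LinkedList<>();

        Entry(String word) {
            this.word = word;
        }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedIndex(int size) {
        buckets = new LinkedList[size];
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    void add(String word, int lineNum) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(word.hashCode(), buckets.length);
        for (Entry e : buckets[index]) {
            if (e.word.equals(word)) {
                // Known word: record the line only if it is not there yet.
                if (!e.lines.contains(lineNum)) {
                    e.lines.addLast(lineNum);
                }
                return;
            }
        }
        // New word: create its entry and chain it into the bucket.
        Entry fresh = new Entry(word);
        fresh.lines.add(lineNum);
        buckets[index].add(fresh);
    }

    // Returns the line numbers for a word, or an empty list if absent.
    LinkedList<Integer> linesFor(String word) {
        int index = Math.floorMod(word.hashCode(), buckets.length);
        for (Entry e : buckets[index]) {
            if (e.word.equals(word)) {
                return e.lines;
            }
        }
        return new LinkedList<>();
    }
}
```

Adding "sarah" on line 5 twice and then on line 9 leaves the list for "sarah" as 5, 9, matching the Sarah example in the text.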
Data Manager – ConcordanceDataManager
Implements the ConcordanceDataManagerInterface interface that is provided.
The data manager allows the client (user) to create a concordance file or a concordance list (ArrayList of strings). The input is read (from a file or string) and is added to the data structure through the add method. The add method requires a word and a line number. The line number is incremented every time a newline appears in the file or the string.
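One way to track line numbers while reading is sketched below, assuming a `Scanner` is used for the input. The helper `LineTokens.tokensWithLines` is a hypothetical name; it simply pairs each whitespace-separated token with the 1-based number of the line it came from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper: pairs every whitespace-separated token with the
// 1-based number of the line it was read from.
class LineTokens {

    static List<String> tokensWithLines(String input) {
        List<String> out = new ArrayList<>();
        int lineNum = 0;
        // Scanner can also be constructed from a File, so the same loop
        // shape works for file input.
        try (Scanner in = new Scanner(input)) {
            while (in.hasNextLine()) {
                lineNum++; // one more line consumed, blank or not
                for (String token : in.nextLine().trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        out.add(lineNum + ":" + token);
                    }
                }
            }
        }
        return out;
    }
}
```

Blank lines produce no tokens but still advance the counter, so words after a blank line keep their true line numbers.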
ConcordanceDataElement.java
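import java.util.LinkedList;

public class ConcordanceDataElement implements Comparable<ConcordanceDataElement>
{
    private String concordanceWord;
    private LinkedList<Integer> pageNumbers;
    private int hashCodeNumber;

    /**
     * The constructor
     *
     * @param word the word for the concordance data element
     */
    public ConcordanceDataElement(String word)
    {
        concordanceWord = word;
        pageNumbers = new LinkedList<Integer>();
    }

    /**
     * Orders elements alphabetically by word, as Comparable requires.
     *
     * @param other the element to compare against
     * @return a negative, zero, or positive value, as in String.compareTo
     */
    @Override
    public int compareTo(ConcordanceDataElement other)
    {
        return concordanceWord.compareTo(other.concordanceWord);
    }
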
    /**
     * Returns the word followed by page numbers.
     *
     * @return a string in the following format: word: page num, page num
     *         Example: after: 2, 8, 15
     */
    @Override
    public String toString()
    {
        String display = concordanceWord + ": ";
        for (int i = 0; i < pageNumbers.size(); i++)
        {
            // Add ", " after every page number except the last one.
            if (i < pageNumbers.size() - 1)
            {
                display += pageNumbers.get(i) + ", ";
            }
            else
            {
                display += pageNumbers.get(i);
            }
        }
        return display;
    }

    /**
     * Return the word portion of the Concordance Data Element
     *
     * @return the word portion of the Concordance Data Element
     */
    public String getWord()
    {
        return concordanceWord;
    }

    /**
     * Returns the hashCode. You may use the String class hashCode method
     *
     * @return the hashCode.
     */
    @Override
    public int hashCode()
    {
        hashCodeNumber = concordanceWord.hashCode();
        return hashCodeNumber;
    }

    /**
     * Returns the linked list of integers that represent the line numbers
     *
     * @return the linked list of integers that represent the line numbers
     */
    public LinkedList<Integer> getList()
    {
        return pageNumbers;
    }

    /**
     * Add the page number if the number doesn't exist in the list
     *
     * @param lineNum the line number to add to the linked list
     */
    public void addPage(int lineNum)
    {
        if (!pageNumbers.contains(lineNum))
        {
            pageNumbers.addLast(lineNum);
        }
    }
}
ConcordanceDataManager.java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.Scanner;

public class ConcordanceDataManager implements ConcordanceDataManagerInterface
{
    private ConcordanceDataStructure x = new ConcordanceDataStructure();

    /**
     * Display the words in alphabetical order followed by a :,
     * followed by the line numbers in numerical order, followed by a newline.
     * Here's an example:
     * after: 129, 175
     * agree: 185
     * agreed: 37
     * all: 24, 93, 112, 175, 203
     * always: 90, 128
     *
     * @param input a String (usually consisting of several lines) to
     *              make a concordance of
     *
     * @return an ArrayList of Strings. Each string has one word,
     *         followed by a :, followed by the line numbers in numerical order,
     *         followed by a newline.
     */
    // Assumed method name: the signature is truncated in this listing.
    @Override
    public ArrayList<String> createConcordanceArray(String input)
    {
        // Start from an empty table so earlier calls do not leak into this result.
        x = new ConcordanceDataStructure();

        // Split the input into lines; element i of the array is line i + 1 of the text.
        String[] line = input.split("\n");
        for (int i = 0; i < line.length; i++)
        {
            int lineNum = i + 1; // keep track of the current line number

            // Split the current line into words on any run of whitespace.
            String[] word = line[i].split("\\s+");
            for (int j = 0; j < word.length; j++)
            {
                // Strip out all punctuation, except apostrophes that occur in the
                // middle of a word (e.g. let's, we'd), then make the word lowercase.
                String cleaned = word[j].replaceAll("[^A-Za-z0-9']", "");
                cleaned = cleaned.replaceAll("^'+|'+$", "").toLowerCase();

                // Don't include "the", "and", or words shorter than 3 characters.
                // The check runs after cleaning so that "The" and "and," are caught too.
                if (cleaned.length() >= 3 && !cleaned.equals("the") && !cleaned.equals("and"))
                {
                    x.add(cleaned, lineNum);
                }
            }
        }

        // showAll is likewise an assumed name for the data structure's listing method.
        ArrayList<String> concordance = x.showAll();
        return concordance;
    }

    /**
     * Creates a file that holds the concordance
     *
     * @param input the File to read from
     * @param output the File to write to
     *
     * Following is an example:
     *
     * about: 24, 210
     * abuse: 96
     * account: 79
     * acknowledged: 10
     *
     * @return true if the concordance file was created successfully.
     * @throws FileNotFoundException if file not found
     */
    @Override
    public boolean createConcordanceFile(File input, File output) throws FileNotFoundException
    {
        // An output file that does not exist yet is fine; PrintWriter creates it.
        if (!input.canRead() || (output.exists() && !output.canWrite()))
        {
            throw new FileNotFoundException();
        }

        // Start from an empty table so earlier calls do not leak into this file.
        x = new ConcordanceDataStructure();

        // Read the input file line by line into a String ArrayList.
        ArrayList<String> dataFile = new ArrayList<String>();
        Scanner inputFile = new Scanner(input);
        while (inputFile.hasNextLine())
        {
            dataFile.add(inputFile.nextLine());
        }
        inputFile.close();

        for (int i = 0; i < dataFile.size(); i++)
        {
            int lineNum = i + 1; // keep track of the current line number

            // Split the current line into words on any run of whitespace.
            String[] word = dataFile.get(i).split("\\s+");
            for (int j = 0; j < word.length; j++)
            {
                // Strip out all punctuation, except apostrophes that occur in the
                // middle of a word (e.g. let's, we'd), then make the word lowercase.
                String cleaned = word[j].replaceAll("[^A-Za-z0-9']", "");
                cleaned = cleaned.replaceAll("^'+|'+$", "").toLowerCase();

                // Don't include "the", "and", or words shorter than 3 characters.
                if (cleaned.length() >= 3 && !cleaned.equals("the") && !cleaned.equals("and"))
                {
                    x.add(cleaned, lineNum);
                }
            }
        }

        // Write the concordance once, after every line has been processed.
        // showAll is an assumed name for the data structure's listing method.
        ArrayList<String> concordanceOutputData = x.showAll();
        PrintWriter outFile = new PrintWriter(output);
        for (int k = 0; k < concordanceOutputData.size(); k++)
        {
            outFile.print(concordanceOutputData.get(k));
        }
        outFile.close();
        return true;
    }
}
ConcordanceDataManagerInterface.java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
public interface ConcordanceDataManagerInterface
{
    /**
     * Display the words in alphabetical order followed by a :,
     * followed by the line numbers in numerical order, followed by a newline.
     * Here's an example:
     * after: 129, 175
     * agree: 185
     * agreed: 37
     * all: 24, 93, 112, 175, 203
     * always: 90, 128
     *
     * @param input a String (usually consisting of several lines) to
     *              make a concordance of
     *
     * @return an ArrayList of Strings. Each string has one word,
     *         followed by a :, followed by the line numbers in numerical order,
     *         followed by a newline.
     */
    // Assumed method name: the signature is truncated in this listing.
    public ArrayList<String> createConcordanceArray(String input);

    /**
     * Creates a file that holds the concordance
     *
     * @param input the File to read from
     * @param output the File to write to
     *
     * Following is an example:
     *
     * about: 24, 210
     * abuse: 96
     * account: 79
     * acknowledged: 10
     *
     * @return true if the concordance file was created successfully.
     * @throws FileNotFoundException if file not found
     */
    public boolean createConcordanceFile(File input, File output) throws FileNotFoundException;
} // end interface ConcordanceDataManagerInterface
ConcordanceDataStructure.java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
public class ConcordanceDataStructure implements ConcordanceDataStructureInterface {
    // Since there are 500 words in the data file given to us for this assignment,
    // I chose the first prime number that's greater than 75% of 500.
    private static final int HASHTABLE_SIZE = 379;

    // The primary storage area: one linked list (bucket) per table index
    private LinkedList<ConcordanceDataElement>[] table;

    /**
     * Constructor which initializes the hash table.
     */
    @SuppressWarnings("unchecked")
    public ConcordanceDataStructure()
    {
        table = new LinkedList[HASHTABLE_SIZE];
        for (int i = 0; i < HASHTABLE_SIZE; i++) {
            table[i] = new LinkedList<ConcordanceDataElement>();
        }
    }

    /**
     * Use the hashcode of the ConcordanceDataElement to see if it is in the hashtable.
     *
     * If the word does not exist in the hashtable - Add the ConcordanceDataElement
     * to the hashtable. Put the line number in the linked list
     *
     * If the word already exists in the hashtable
     * 1. add the line number to the end of the linked list in the
     *    ConcordanceDataElement (if the line number is not currently there).
     *
     * @param word the word to be added/updated with a line number.
     * @param lineNum the line number where the word is found
     */
    @Override
    public void add(String word, int lineNum)
    {
        ConcordanceDataElement dataElement = new ConcordanceDataElement(word);
        dataElement.addPage(lineNum);

        // Use the hashcode for the word to find its bucket in the table.
        int index = Math.abs(dataElement.hashCode() % table.length);
        LinkedList<ConcordanceDataElement> current = table[index];

        // If the bucket already contains the word, add the line number to its
        // linked list (addPage skips line numbers that are already present).
        // Elements are matched by word, since the element class does not override equals.
        for (int i = 0; i < current.size(); i++)
        {
            ConcordanceDataElement oldElement = current.get(i);
            if (oldElement.getWord().equals(word))
            {
                oldElement.addPage(lineNum);
                return;
            }
        }

        // Otherwise the word is new: add its element to the bucket.
        current.add(dataElement);
    }

    /**
     * Display the words in alphabetical order followed by a :,
     * followed by the line numbers in numerical order, followed by a newline.
     * Here's an example:
     * after: 129, 175
     * agree: 185
     * agreed: 37
     * all: 24, 93, 112, 175, 203
     * always: 90, 128
     *
     * @return an ArrayList of Strings. Each string has one word,
     *         followed by a :, followed by the line numbers in numerical order,
     *         followed by a newline.
     */
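    // Assumed method name: the signature is truncated in this listing.
    @Override
    public ArrayList<String> showAll()
    {
        ArrayList<String> showArray = new ArrayList<String>();
        // Collect the display string of every element in every bucket.
        for (int i = 0; i < table.length; i++)
        {
            LinkedList<ConcordanceDataElement> row = table[i];
            for (int a = 0; a < row.size(); a++)
            {
                showArray.add(row.get(a).toString() + "\n");
            }
        }
        // Sort the entries alphabetically.
        Collections.sort(showArray);
        return showArray;
    }
}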
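The table-size comment in ConcordanceDataStructure (the first prime greater than 75% of the estimated word count) could be generalized along these lines for a constructor that takes an estimate. This is a sketch of that one sizing rule only; the provided Javadoc may prescribe a different rule, and the names `TableSize`, `isPrime`, and `forEstimate` are hypothetical.

```java
// Hypothetical helper: picks a table size as the first prime strictly
// greater than 75% of the estimated word count.
class TableSize {

    // Trial division; adequate for the small sizes involved here.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    static int forEstimate(int estimatedWords) {
        // floor(0.75 * n) + 1 is the smallest integer strictly above 0.75 * n.
        int candidate = (int) (estimatedWords * 0.75) + 1;
        while (!isPrime(candidate)) {
            candidate++;
        }
        return candidate;
    }
}
```

For an estimate of 500 words this yields 379, the constant used in the listing above.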
ConcordanceDataStructureInterface.java
import java.util.ArrayList;
public interface ConcordanceDataStructureInterface {
    /**
     * Use the hashcode of the ConcordanceDataElement to see if it is
     * in the hashtable.
     *
     * If the word does not exist in the hashtable - Add the ConcordanceDataElement
     * to the hashtable. Put the line number in the linked list
     *
     * If the word already exists in the hashtable
     * 1. add the line number to the end of the linked list in the ConcordanceDataElement
     *    (if the line number is not currently there).
     *
     * @param word the word to be added/updated with a line number.
     * @param lineNum the line number where the word is found
     */
    public void add(String word, int lineNum);

    /**
     * Display the words in alphabetical order followed by a :,
     * followed by the line numbers in numerical order, followed by a newline.
     * Here's an example:
     * after: 129, 175
     * agree: 185
     * agreed: 37
     * all: 24, 93, 112, 175, 203
     * always: 90, 128
     *
     * @return an ArrayList of Strings. Each string has one word,
     *         followed by a :, followed by the line numbers in numerical order,
     *         followed by a newline.
     */
    // Assumed method name: the signature is truncated in this listing.
    public ArrayList<String> showAll();
} // end of ConcordanceDataStructureInterface