Purpose
This project is meant to give you experience with sorting, binary searching, and Big-Oh complexity.
Objective (write the program in Java)
Your goal is to take a book, as a large text file, and build a digital “concordance”. A concordance is like an index for a book in that it contains entries for words in the book, but it also includes passages from the book containing that word. For example, a query for the word Dormouse in Alice in Wonderland would produce results like:
14153
he poured a little hot tea upon its nose. The Dormouse shook its head impatiently, and said
14803
`Once upon a time there were three little sisters,' the Dormouse began in a great hurry; `and their names were Elsie,
where 14153 and 14803 are the positions (word indices) of Dormouse in the file, and each passage includes up to 10 words before and after Dormouse.
Approach
You should approach the project in four steps:
1. Open the book as a text file, identify each word, and save its position
2. Sort the words in the text file alphabetically using a merge sort
3. Remove duplicate words, keeping one entry per word together with all of its indices
4. Allow the user to enter words, and return the passages where the words occur.
STEP 1: READ THE TEXT FILE AND NUMBER ALL THE WORDS
Consider a text file fox.txt containing:
the quick brown fox jumps over the lazy dog
Generate a new file fox.txt_words.txt that reads as follows:
9
the 0
quick 1
brown 2
fox 3
jumps 4
over 5
the 6
lazy 7
dog 8
where the first number, 9, is the total number of words, and what follows is each word and its placement in the file.
Suggested approach
1. Read the text file and count the words but don’t save them.
2. Make an array to hold the words.
3. Make a new FileReader and Scanner, and read the words from the text file into that array.
4. Write the number of words to a new text file, then step through the array and write each word and its index to that file.
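The suggested approach can be sketched in Java as follows. This is a minimal sketch under a few assumptions: a word is any whitespace-separated token returned by Scanner.next() (so punctuation stays attached, as in "nose."), the file name comes from the command line, and the class name WordNumberer is a hypothetical choice.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;
import java.util.Scanner;

// STEP 1 sketch: number every whitespace-separated word in a text file.
class WordNumberer {

    // Pass 1: count the words without saving them.
    static int countWords(Scanner in) {
        int count = 0;
        while (in.hasNext()) {
            in.next();
            count++;
        }
        return count;
    }

    // Pass 2: read up to n words into an array of exactly that size.
    static String[] readWords(Scanner in, int n) {
        String[] words = new String[n];
        for (int i = 0; i < n && in.hasNext(); i++) {
            words[i] = in.next();
        }
        return words;
    }

    // Write the word count, then one "word index" pair per line.
    static void writeWords(Writer out, String[] words) {
        PrintWriter pw = new PrintWriter(out);
        pw.println(words.length);
        for (int i = 0; i < words.length; i++) {
            pw.println(words[i] + " " + i);
        }
        pw.flush();
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "fox.txt"; // assumed default file name
        int n;
        try (Scanner in = new Scanner(new FileReader(name))) {
            n = countWords(in);
        }
        String[] words;
        try (Scanner in = new Scanner(new FileReader(name))) {
            words = readWords(in, n);
        }
        try (PrintWriter out = new PrintWriter(name + "_words.txt")) {
            writeWords(out, words);
        }
    }
}
```

Running java WordNumberer fox.txt should produce fox.txt_words.txt in the format shown above. Reading the file twice avoids guessing the array size up front; an ArrayList would allow a single pass instead.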
STEP 2: SORT THE WORDS ALPHABETICALLY
Starting with the text file fox.txt_words.txt, make a new text file fox.txt_sorted.txt. That file should contain the same words as the previous file, but this time in alphabetical order. This is what the file should look like:
9
brown 2
dog 8
fox 3
jumps 4
lazy 7
over 5
quick 1
the 0
the 6
You may try this initially with a bubble sort. However, to get full credit for this step, you must use a merge sort to sort the words.
Approach:
1. Make a FileReader/Scanner for fox.txt_words.txt, read the number of words, and make two arrays: a String array to hold the words, and an int array to hold the indices.
2. Use the merge sort algorithm we learned in class to sort the String array. Whenever you move a word in the String array, make the same move in the int array, so that each word keeps its original index.
3. Make a PrintWriter and write the number of words, then each word and its index, to a new file fox.txt_sorted.txt.
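Here is one way to write the merge sort with the index array carried along. It is a sketch, not necessarily the in-class version: the class name WordSorter is hypothetical, and words are compared with String.compareTo, which is case-sensitive (every uppercase letter sorts before every lowercase one). Taking ties from the left half keeps the sort stable, which is why "the 0" comes out before "the 6".

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Scanner;

// STEP 2 sketch: merge sort a word array, moving the index array in lockstep.
class WordSorter {

    // Sorts words[lo..hi] (inclusive) and applies the same moves to idx.
    static void mergeSort(String[] words, int[] idx, int lo, int hi) {
        if (lo >= hi) return;                 // 0 or 1 element: already sorted
        int mid = (lo + hi) / 2;
        mergeSort(words, idx, lo, mid);
        mergeSort(words, idx, mid + 1, hi);
        merge(words, idx, lo, mid, hi);
    }

    // Merges the sorted runs [lo..mid] and [mid+1..hi] through temporary arrays.
    static void merge(String[] words, int[] idx, int lo, int mid, int hi) {
        int n = hi - lo + 1;
        String[] tmpW = new String[n];
        int[] tmpI = new int[n];
        int i = lo, j = mid + 1, k = 0;
        while (i <= mid && j <= hi) {
            // "<=" takes ties from the left run, so equal words keep file order (stable)
            if (words[i].compareTo(words[j]) <= 0) {
                tmpW[k] = words[i];
                tmpI[k] = idx[i];
                i++;
            } else {
                tmpW[k] = words[j];
                tmpI[k] = idx[j];
                j++;
            }
            k++;
        }
        while (i <= mid) { tmpW[k] = words[i]; tmpI[k] = idx[i]; i++; k++; }
        while (j <= hi)  { tmpW[k] = words[j]; tmpI[k] = idx[j]; j++; k++; }
        for (k = 0; k < n; k++) {             // copy the merged run back in place
            words[lo + k] = tmpW[k];
            idx[lo + k] = tmpI[k];
        }
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "fox.txt"; // assumed default file name
        String[] words;
        int[] idx;
        try (Scanner in = new Scanner(new FileReader(name + "_words.txt"))) {
            int n = in.nextInt();
            words = new String[n];
            idx = new int[n];
            for (int i = 0; i < n; i++) {
                words[i] = in.next();
                idx[i] = in.nextInt();
            }
        }
        mergeSort(words, idx, 0, words.length - 1);
        try (PrintWriter out = new PrintWriter(name + "_sorted.txt")) {
            out.println(words.length);
            for (int i = 0; i < words.length; i++) {
                out.println(words[i] + " " + idx[i]);
            }
        }
    }
}
```

The top-level call is mergeSort(words, idx, 0, words.length - 1); the bounds are inclusive, so an empty array simply returns.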
STEP 3: REMOVE DUPLICATE WORDS
Now use fox.txt_sorted.txt to make a fourth file fox.txt_index.txt that reads like this:
8
brown 2
dog 8
fox 3
jumps 4
lazy 7
over 5
quick 1
the 0 6
where 8 is the number of unique words, and each word is followed by one or more indices.
Approach:
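1. Make a FileReader/Scanner for fox.txt_sorted.txt and a PrintWriter for fox.txt_index.txt. Because the number of unique words comes first in the output, you need to know it before writing the entries (for example, read the words and indices into arrays first and count the unique words).
2. Repeat the following for each word:
3. Read a word and an index.
4. Check whether that word is the same as the previous word read from the input file.
5. If not, start a new line and write that word and its index to the output file.
6. If so, just write the index to the output file, on the same line.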
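A sketch of STEP 3, taking the sorted file from STEP 2 as input. Because the unique-word count has to be written first, this version collects the output lines in a list before writing anything; the class name IndexBuilder and the helper buildIndexLines are hypothetical.

```java
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// STEP 3 sketch: collapse duplicate words in sorted order into "word i1 i2 ..." lines.
class IndexBuilder {

    static List<String> buildIndexLines(String[] words, int[] idx) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < words.length; i++) {
            if (i > 0 && words[i].equals(words[i - 1])) {
                current.append(' ').append(idx[i]);          // same word: add only the index
            } else {
                if (current != null) lines.add(current.toString());
                current = new StringBuilder(words[i]).append(' ').append(idx[i]); // new word
            }
        }
        if (current != null) lines.add(current.toString());
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "fox.txt"; // assumed default file name
        String[] words;
        int[] idx;
        try (Scanner in = new Scanner(new FileReader(name + "_sorted.txt"))) {
            int n = in.nextInt();
            words = new String[n];
            idx = new int[n];
            for (int i = 0; i < n; i++) {
                words[i] = in.next();
                idx[i] = in.nextInt();
            }
        }
        List<String> lines = buildIndexLines(words, idx);
        try (PrintWriter out = new PrintWriter(name + "_index.txt")) {
            out.println(lines.size());                       // unique-word count goes first
            for (String line : lines) {
                out.println(line);
            }
        }
    }
}
```

Comparing each word only with its predecessor works because the input is already sorted, so all copies of a word are adjacent.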
STEP 4: QUERY AND SEARCH
Your program should now prompt the user for a word, find the word, and return its indices. For example:
Enter a word: the
the: 0 6
Enter a word: lazy
lazy: 7
Enter a word: dormouse
Can’t find dormouse in the text
You must use a binary search, like we did in class, to locate the word.
Your program should continue to query and search until the user closes the program.
Approach:
1. Make a FileReader/Scanner for fox.txt_index.txt.
2. Read the number of unique words, then each word and its indices. You may want to use the nextLine method to read all the indices as a single string, in which case you’ll have two arrays: String[] words and String[] indices.
3. Prompt the user for a word.
4. Use a binary search to find the matching word.
5. Print out its indices if it exists, or an error message if it doesn’t.
6. Repeat steps 3 through 5 forever.
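A minimal sketch of the query loop with an iterative binary search. The names Concordance and binarySearch are illustrative, and the loop also stops cleanly when standard input is closed, which is one reasonable reading of "until the user closes the program".

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Scanner;

// STEP 4 sketch: load the index file, then answer queries with binary search.
class Concordance {

    // Iterative binary search over sorted words; returns the position of key, or -1.
    static int binarySearch(String[] words, String key) {
        int lo = 0, hi = words.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            int cmp = key.compareTo(words[mid]);
            if (cmp == 0) return mid;
            if (cmp < 0) hi = mid - 1;   // key sorts before words[mid]
            else lo = mid + 1;           // key sorts after words[mid]
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "fox.txt"; // assumed default file name
        String[] words, indices;
        try (Scanner in = new Scanner(new FileReader(name + "_index.txt"))) {
            int n = in.nextInt();
            words = new String[n];
            indices = new String[n];
            for (int i = 0; i < n; i++) {
                words[i] = in.next();
                indices[i] = in.nextLine().trim(); // rest of the line, e.g. "0 6"
            }
        }
        Scanner keyboard = new Scanner(System.in);
        while (true) {
            System.out.print("Enter a word: ");
            if (!keyboard.hasNext()) break;        // standard input was closed
            String query = keyboard.next();
            int pos = binarySearch(words, query);
            if (pos >= 0) {
                System.out.println(query + ": " + indices[pos]);
            } else {
                System.out.println("Can't find " + query + " in the text");
            }
        }
    }
}
```

Each lookup halves the search range, so a query over n unique words takes at most about log2(n) + 1 comparisons.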
STEP 5: PRINT OUT THE SURROUNDING WORDS
Modify your code in step 4 so that, for each index, it prints out text from the original file. You should print out the previous two words, the query word, and the subsequent two words. For example:
Enter a word: the
0: the quick brown
6: jumps over the lazy dog
Enter a word: lazy
7: over the lazy dog
Enter a word: dormouse
Can’t find dormouse in the text
Approach:
Modify your code from Step 4:
1. After reading from fox.txt_index.txt, make a second array, String[] originaltext, and read in the original fox.txt word by word.
2. After querying the user and searching for the word, get its index entry and:
3. Make a Scanner object on the index string to parse it (Java’s counterpart to C++’s istringstream).
4. Read the next number from that string: an index where the word occurs.
5. Print that index followed by a colon, then make a for loop starting at i = index-2 and going to i <= index+2, skipping any i that is less than 0 or past the end of originaltext.
6. Print originaltext[i] followed by a space.
7. Repeat steps 4 through 6 until you have printed all the indices.
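The passage-printing part of STEP 5 can be sketched like this. The helper names passage and printPassages are hypothetical, and main uses a hard-coded copy of fox.txt rather than reading the file; in the full program, originaltext would come from the book. Clamping the loop bounds with Math.max and Math.min handles indices near the start or end of the text, as in the "0:" and "7:" examples.

```java
import java.util.Scanner;

// STEP 5 sketch: print each occurrence with up to `radius` words on either side.
class ContextPrinter {

    // Builds "index: w[index-radius] ... w[index+radius]", clipped to the array bounds.
    static String passage(String[] text, int index, int radius) {
        StringBuilder sb = new StringBuilder();
        sb.append(index).append(':');
        int from = Math.max(0, index - radius);
        int to = Math.min(text.length - 1, index + radius);
        for (int i = from; i <= to; i++) {
            sb.append(' ').append(text[i]);
        }
        return sb.toString();
    }

    // Parses an index string such as "0 6" and prints one passage per index.
    static void printPassages(String[] text, String indexLine, int radius) {
        Scanner parser = new Scanner(indexLine);
        while (parser.hasNextInt()) {
            System.out.println(passage(text, parser.nextInt(), radius));
        }
    }

    public static void main(String[] args) {
        // Hard-coded stand-in for fox.txt, for illustration only.
        String[] text = "the quick brown fox jumps over the lazy dog".split(" ");
        printPassages(text, "0 6", 2);
        printPassages(text, "7", 2);
    }
}
```

Changing the radius argument to 10 gives the wider passages shown in the Dormouse example at the top.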
STEP 6: TIMING
1. Go to Project Gutenberg and download three books: a short book, a medium book, and a long book. For example, Alice in Wonderland’s page is https://www.gutenberg.org/ebooks/11 and you should download its Plain Text version as alice.txt.
I recommend Alice in Wonderland or the Wizard of Oz as a short book, Great Expectations as a medium book, and War and Peace or the King James Bible as a long book.
2. Use your STEP 1 code and record the total number of words in each book.
3. Use System.currentTimeMillis() to measure the elapsed time for STEP 1 on each book. Depending on the speed of your computer, you may or may not get non-zero values. If you do, do you see an O(n) runtime? (Use the approach below.)
4. Now measure the merge sort times in STEP 2. Unless your computer is really fast, you ought to see a measurable delay. Run the program a few times on each book, throw out any outliers, and average the runtimes for each book.
Since merge sort is O(n log n), the runtime should be approximately
runtime = k * #words * log(#words)
Try computing k for each book:
k = runtime / (#words * log(#words))
The log base doesn’t matter, since changing it only multiplies every k by the same constant.
Are all three k’s approximately the same?
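A sketch of a timing harness for this step, computing k with the natural log. Two substitutions are labeled in the code: the input is randomly generated words (a hypothetical stand-in for the three books), and the timed call is Java’s built-in Arrays.sort, which uses TimSort rather than your merge sort; swap in your STEP 2 mergeSort for the actual measurements. Dropping the fastest and slowest of five runs is one simple way to throw out outliers, not the only one.

```java
import java.util.Arrays;
import java.util.Random;

// STEP 6 sketch: time a sort at several sizes and estimate k = runtime / (n log n).
class SortTimer {

    // Natural log; any other base only multiplies every k by the same constant.
    static double estimateK(double runtimeMillis, int words) {
        return runtimeMillis / (words * Math.log(words));
    }

    // Average of the runs after dropping the fastest and slowest one (needs at least 3 runs).
    static double trimmedMean(long[] times) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        long sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return (double) sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        int[] sizes = {50_000, 200_000, 800_000};       // hypothetical "book" sizes
        Random rng = new Random(1);
        for (int n : sizes) {
            // Hypothetical stand-in for a book: random lowercase base-36 "words".
            String[] words = new String[n];
            for (int i = 0; i < n; i++) {
                words[i] = Integer.toString(rng.nextInt(50_000), 36);
            }
            long[] times = new long[5];
            for (int run = 0; run < times.length; run++) {
                String[] copy = words.clone();          // sort unsorted data on every run
                long start = System.currentTimeMillis();
                Arrays.sort(copy);                      // stand-in: call your STEP 2 mergeSort here
                times[run] = System.currentTimeMillis() - start;
            }
            double avg = trimmedMean(times);
            System.out.printf("n=%d  average=%.1f ms  k=%.3e%n", n, avg, estimateK(avg, n));
        }
    }
}
```

If the k values for your three books come out close to one another, the measurements are consistent with O(n log n) growth.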
5. If you have a really slow computer, you might be able to time the queries and see whether they take O(log n) time. Don’t bother doing this if they all measure as zero.
Grading
Grading for this project will be as follows:
You complete STEP 1 correctly: 50% (F+)
You complete STEP 2, but use a sort that isn’t merge sort: 60% (D-)
You complete STEP 2 using merge sort: 70% (C-)
You complete STEP 3: 75% (C)
You complete STEP 4: 85% (B)
You complete STEP 5: 95% (A)
You measure the timings in STEP 6: 100% (A+)
No credit will be given for code that doesn’t compile. If you only partially complete a step, comment out your work on that step before submitting.
Basic Requirements