Question

Write a Python program that processes the text file Gettysburg.txt, calculating the total number of words and printing the number of occurrences of each word in the file.

The program needs to open the file and process each line. You need to add each word to the dictionary with a frequency of 1 or update the word’s count by 1. You need to print the output from high to low frequency.

The program needs 4 functions.

The first function is called add_word; it adds each word to the dictionary. The parameters are the word and a dictionary. There is no return value.

The second function is called Process_line; it strips off various characters, splits out the words, and so on. The parameters are a line and a dictionary. It calls the add_word function with each processed word. There is no return value.

The third function is called Pretty_print; it is the printing function. The parameter is a dictionary. There is no return value.

The fourth function is main; it opens the file and calls Process_line on each line. When finished, it calls the Pretty_print function to print the dictionary.

Gettysburg.txt

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.

Abraham Lincoln
November 19, 1863

Solutions

Expert Solution

To identify the words in a line, first remove all special characters (such as " , . ! ~ > ?) from the line. Then call split() to divide the line into a list of words, and count the frequency of each word.
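As a minimal sketch of the cleanup step described above, the punctuation can also be removed with str.translate and a table built by str.maketrans. This is an alternative to a character-by-character loop, not the solution's own method. The helper name split_words is hypothetical, and string.punctuation is assumed as the character set, which differs slightly from a hand-written list.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: string.punctuation stands in for a hand-written list of
# special characters, so the exact set of removed characters may differ.
_DELETE_PUNCT = str.maketrans('', '', string.punctuation)

def split_words(line):
    """Hypothetical helper: remove punctuation from a line, then split it into words."""
    return line.translate(_DELETE_PUNCT).split()

print(split_words('We are met on a great battle-field of that war.'))
# ['We', 'are', 'met', 'on', 'a', 'great', 'battlefield', 'of', 'that', 'war']
```

Because split() with no arguments collapses runs of whitespace, the gaps left behind by removed symbols such as " -- " do not produce empty words.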

Code:

#add_word() function
def add_word(word, dic):
    #add the word with a count of 1, or increase its existing count by 1
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] += 1

#process_line()
def process_line(line, dic):
    #raw string so that the backslash is kept as a literal character
    punct = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    new_line = ''
    #remove the special characters from the line
    for ch in line.strip():
        if ch not in punct:
            new_line += ch
    #split the new_line into words and call add_word()
    words_list = new_line.split()
    for word in words_list:
        add_word(word, dic)

#print the total number of words and each word with its number of occurrences
def pretty_print(dic):
    total_words = sum(dic.values())
    print('Total words: {}'.format(total_words))
    for key, value in dic.items():
        print(key, value)

#main()
def main():
    dic = {} #dictionary
    #the with statement closes the file automatically, so no close() call is needed
    with open('Gettysburg.txt', 'r') as file:
        for line in file: #read each line
            process_line(line, dic)
    pretty_print(dic) #call pretty_print() to print the dictionary

main() #call main()
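
The question asks for the output to be printed from high to low frequency, whereas pretty_print as posted prints the words in the order they were first added to the dictionary. A minimal sketch of a sorted variant follows. The names by_frequency and pretty_print_sorted are hypothetical and not part of the posted solution.

```python
def by_frequency(dic):
    """Hypothetical helper: return (word, count) pairs from high to low count."""
    # sorted() is stable, even with reverse=True, so words with equal
    # counts keep the order in which they were added to the dictionary.
    return sorted(dic.items(), key=lambda item: item[1], reverse=True)

def pretty_print_sorted(dic):
    """Hypothetical variant of pretty_print that prints by descending frequency."""
    total_words = sum(dic.values())
    print('Total words: {}'.format(total_words))
    for word, count in by_frequency(dic):
        print(word, count)

# Small hand-made dictionary for illustration (not the full file).
pretty_print_sorted({'and': 6, 'that': 13, 'the': 9, 'score': 1})
```

Calling pretty_print_sorted(dic) in main() in place of pretty_print(dic) would list the most frequent words first.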

Output:

Total words: 276
Four 1
score 1
and 6
seven 1
years 1
ago 1
our 2
fathers 1
brought 1
forth 1
on 2
this 4
continent 1
a 7
new 2
nation 5
conceived 2
in 4
Liberty 1
dedicated 4
to 8
the 9
proposition 1
that 13
all 1
men 2
are 3
created 1
equal 1
Now 1
we 8
engaged 1
great 3
civil 1
war 2
testing 1
whether 1
or 2
any 1
so 3
can 5
long 2
endure 1
We 2
met 1
battlefield 1
of 5
have 5
come 1
dedicate 2
portion 1
field 1
as 1
final 1
resting 1
place 1
for 5
those 1
who 3
here 8
gave 2
their 1
lives 1
might 1
live 1
It 3
is 3
altogether 1
fitting 1
proper 1
should 1
do 1
But 1
larger 1
sense 1
not 5
consecrate 1
hallow 1
ground 1
The 2
brave 1
living 2
dead 3
struggled 1
consecrated 1
it 2
far 2
above 1
poor 1
power 1
add 1
detract 1
world 1
will 1
little 1
note 1
nor 1
remember 1
what 2
say 1
but 1
never 1
forget 1
they 3
did 1
us 3
rather 2
be 2
unfinished 1
work 1
which 2
fought 1
thus 1
nobly 1
advanced 1
task 1
remaining 1
before 1
from 2
these 2
honored 1
take 1
increased 1
devotion 2
cause 1
last 1
full 1
measure 1
highly 1
resolve 1
shall 3
died 1
vain 1
under 1
God 1
birth 1
freedom 1
government 1
people 3
by 1
perish 1
earth 1
Abraham 1
Lincoln 1
November 1
19 1
1863 1


