Write a Python program that processes the text file Gettysburg.txt by calculating the total number of words and printing the number of occurrences of each word in the file.
The program needs to open the file and process each line. Each word must either be added to a dictionary with a frequency of 1 or, if it is already present, have its count increased by 1. The output must be printed from high to low frequency.
The program needs 4 functions.
The first function is called add_word; it adds each word to the dictionary. The parameters are the word and a dictionary. There is no return value.
The second function is called Process_line; it strips off various characters, splits out the words, and so on. The parameters are a line and a dictionary. It calls the add_word function with each processed word. There is no return value.
The third function is called Pretty_print; it is the printing function. The parameter is a dictionary. There is no return value.
The fourth function is main; it opens the file and calls Process_line on each line. When finished, it calls the Pretty_print function to print the dictionary.
Gettysburg.txt
Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.
But, in a larger sense, we can not dedicate -- we can not consecrate -- we can not hallow -- this ground. The brave men, living and dead, who struggled here, have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember what we say here, but it can never forget what they did here. It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us -- that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion -- that we here highly resolve that these dead shall not have died in vain -- that this nation, under God, shall have a new birth of freedom -- and that government of the people, by the people, for the people, shall not perish from the earth.
Abraham Lincoln
November 19, 1863
To identify the words in a line, first remove the special characters (such as " , . ! ~ > ?) from the line. Then call split() to divide the line into a list of words, and count the frequency of each word.
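As a small illustration of the cleaning step described above, the sketch below strips punctuation from one sentence of the speech and splits it into words. It uses str.translate with string.punctuation, which is an assumption of this sketch and not the character-by-character loop used in the answer's code; the helper name clean_and_split is hypothetical.

```python
import string

def clean_and_split(line):
    """Remove ASCII punctuation from a line and return its words."""
    # Hypothetical helper: the table deletes every character in string.punctuation.
    table = str.maketrans('', '', string.punctuation)
    return line.translate(table).split()

sample = "We are met on a great battle-field of that war."
print(clean_and_split(sample))
# ['We', 'are', 'met', 'on', 'a', 'great', 'battlefield', 'of', 'that', 'war']
```

Because the hyphen is removed and not replaced by a space, "battle-field" becomes the single word "battlefield", and a standalone "--" disappears entirely.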
Code:
#add_word() function
def add_word(word, dic):
    #check whether the word is already in the dictionary
    if word not in dic:
        dic[word] = 1
    else:
        dic[word] += 1

#process_line()
def process_line(line, dic):
    #raw string, so the backslash is a literal character and not an escape
    punct = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    new_line = ''
    #remove the special characters from the line
    for ch in line.strip():
        if ch not in punct:
            new_line += ch
    #split the new_line into words and call add_word()
    words_list = new_line.split()
    for word in words_list:
        add_word(word, dic)

#print the total number of words and each word with its number of occurrences
def pretty_print(dic):
    total_words = sum(dic.values())
    print('Total words: {}'.format(total_words))
    for key, value in dic.items():
        print(key, value)

#main()
def main():
    dic = {} #dictionary
    with open('Gettysburg.txt', 'r') as file: #opening file; 'with' closes it automatically
        for line in file: #read each line
            process_line(line, dic)
    pretty_print(dic) #call pretty_print() to print the dictionary

main() #call main()
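Note that the assignment asks for output from high to low frequency, while pretty_print above prints the words in the order they were first seen. A minimal sketch of a sorted variant follows; the names sorted_counts and pretty_print_sorted are hypothetical and not part of the original answer.

```python
def sorted_counts(dic):
    """Return (word, count) pairs ordered from highest to lowest count."""
    # sorted() is stable, so words with equal counts keep their insertion order.
    return sorted(dic.items(), key=lambda item: item[1], reverse=True)

def pretty_print_sorted(dic):
    """Print the total word count, then each word from high to low frequency."""
    total_words = sum(dic.values())
    print('Total words: {}'.format(total_words))
    for key, value in sorted_counts(dic):
        print(key, value)

pretty_print_sorted({'nation': 5, 'that': 13, 'here': 8})
# Total words: 26
# that 13
# here 8
# nation 5
```

Calling pretty_print_sorted(dic) in main() in place of pretty_print(dic) would give the ordering the assignment describes; the counts themselves are unchanged.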
Output:
Total words: 276
Four 1
score 1
and 6
seven 1
years 1
ago 1
our 2
fathers 1
brought 1
forth 1
on 2
this 4
continent 1
a 7
new 2
nation 5
conceived 2
in 4
Liberty 1
dedicated 4
to 8
the 9
proposition 1
that 13
all 1
men 2
are 3
created 1
equal 1
Now 1
we 8
engaged 1
great 3
civil 1
war 2
testing 1
whether 1
or 2
any 1
so 3
can 5
long 2
endure 1
We 2
met 1
battlefield 1
of 5
have 5
come 1
dedicate 2
portion 1
field 1
as 1
final 1
resting 1
place 1
for 5
those 1
who 3
here 8
gave 2
their 1
lives 1
might 1
live 1
It 3
is 3
altogether 1
fitting 1
proper 1
should 1
do 1
But 1
larger 1
sense 1
not 5
consecrate 1
hallow 1
ground 1
The 2
brave 1
living 2
dead 3
struggled 1
consecrated 1
it 2
far 2
above 1
poor 1
power 1
add 1
detract 1
world 1
will 1
little 1
note 1
nor 1
remember 1
what 2
say 1
but 1
never 1
forget 1
they 3
did 1
us 3
rather 2
be 2
unfinished 1
work 1
which 2
fought 1
thus 1
nobly 1
advanced 1
task 1
remaining 1
before 1
from 2
these 2
honored 1
take 1
increased 1
devotion 2
cause 1
last 1
full 1
measure 1
highly 1
resolve 1
shall 3
died 1
vain 1
under 1
God 1
birth 1
freedom 1
government 1
people 3
by 1
perish 1
earth 1
Abraham 1
Lincoln 1
November 1
19 1
1863 1
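One way to cross-check counts like those listed above is collections.Counter from the standard library. This is a different technique from the four-function design the assignment requires, shown here only as a sketch on a short, already-cleaned sample phrase and not on the whole file.

```python
from collections import Counter

# Sample phrase from the speech with punctuation already removed.
text = "we can not dedicate we can not consecrate we can not hallow this ground"
counts = Counter(text.split())

print(sum(counts.values()))   # total number of words: 14
print(counts.most_common(3))  # [('we', 3), ('can', 3), ('not', 3)]
```

Like the answer's code, this count is case-sensitive, so "We" and "we" would be tallied as separate words.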