Write Python code to find the following in a text file (Letter):
(a) Find the 20 most common words.
(b) How many unique words are used?
(c) How many words are used at least 5 times?
(d) Write the 200 most common words, and their counts, to a file.
Text file:
Look in thy glass and tell the face thou viewest,
Now is the time that face should form another,
Whose fresh repair if now thou not renewest,
Thou dost beguile the world, unbless some mother.
For where is she so fair whose uneared womb
Disdains the tillage of thy husbandry?
Or who is he so fond will be the tomb,
Of his self-love to stop posterity?
Thou art thy mother's glass and she in thee
Calls back the lovely April of her prime,
So thou through windows of thine age shalt see,
Despite of wrinkles this thy golden time.
But if thou live remembered not to be,
Die single and thine image dies with thee.
4
Unthrifty loveliness why dost thou spend,
Upon thy self thy beauty's legacy?
Nature's bequest gives nothing but doth lend,
And being frank she lends to those are free:
Then beauteous niggard why dost thou abuse,
The bounteous largess given thee to give?
Profitless usurer why dost thou use
So great a sum of sums yet canst not live?
For having traffic with thy self alone,
Thou of thy self thy sweet self dost deceive,
Then how when nature calls thee to be gone,
What acceptable audit canst thou leave?
Thy unused beauty must be tombed with thee,
Which used lives th' executor to be.
5
Those hours that with gentle work did frame
The lovely gaze where every eye doth dwell
Will play the tyrants to the very same,
And that unfair which fairly doth excel:
For never-resting time leads summer on
To hideous winter and confounds him there,
Sap checked with frost and lusty leaves quite gone,
Beauty o'er-snowed and bareness every where:
Then were not summer's distillation left
A liquid prisoner pent in walls of glass,
Beauty's effect with beauty were bereft,
Nor it nor no remembrance what it was.
But flowers distilled though they with winter meet,
Leese but their show, their substance still lives sweet.
import collections

# Read the whole text; file1.txt is the text file shown above.
with open('file1.txt', encoding="utf8") as file:
    b = file.read()

unique = 0  # number of words that occur exactly once
time5 = 0   # number of words that occur at least 5 times
wc = {}     # word -> count

for w in b.lower().split():
    # Strip punctuation. The original also stripped two characters that
    # are unreadable in this copy (probably curly quotation marks).
    for ch in '.,:"!*':
        w = w.replace(ch, "")
    if w not in wc:
        wc[w] = 1
    else:
        wc[w] += 1

word_counter = collections.Counter(wc)

# (a) The n most common words (enter 20 for part (a)).
n_print = int(input("How many most appeared words to print: "))
print("\nOK. The {} most counted words in the txt file are as follows\n".format(n_print))
for w, c in word_counter.most_common(n_print):
    print(w, ": ", c)

# (b) Words that occur exactly once. If "unique" is read as "distinct",
# the answer is len(word_counter) instead.
print("\nOK. all unique words are as follows\n")
for w, c in word_counter.most_common():
    if c == 1:
        print(w, ": ", c)
        unique = unique + 1
print("\nOK. Total number of unique words are ", unique)

# (c) Words used at least 5 times (the test is c >= 5).
print("\nOK. all words with count more than 5 are as follows\n")
for w, c in word_counter.most_common():
    if c >= 5:
        print(w, ": ", c)
        time5 = time5 + 1
print("\nOK. Total number of words appeared equal or more than 5 times are ", time5)

# (d) Write the 200 most common words and their counts to com200.txt.
print("\nOK. The 200 most common words are in the file com200.txt\n")
with open("com200.txt", "w") as f1:
    for w, c in word_counter.most_common(200):
        f1.write("{} {}\n".format(w, c))
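A more compact variant, offered as a sketch rather than as this answer's own method: tokenize with a regular expression so that all punctuation is dropped while apostrophes and hyphens inside words are kept, then query a Counter directly. The function name count_words, the regular expression, and the sample sentence are assumptions made for illustration. Here "unique" is read as "distinct", which may not be the reading used in the code above.

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercase words; apostrophes and hyphens inside a word are kept."""
    # Hypothetical tokenizer: runs of letters, optionally joined by ' or -.
    return Counter(re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()))

# Illustrative sample, not the full text file.
sample = "Thou art thy mother's glass and she in thee; thou, thy self-love."
counts = count_words(sample)

top_two = counts.most_common(2)                          # most common words
distinct = len(counts)                                   # number of distinct words
at_least_two = sum(1 for c in counts.values() if c >= 2) # words used >= 2 times

print(top_two)       # [('thou', 2), ('thy', 2)]
print(distinct)      # 10
print(at_least_two)  # 2
```

Because this tokenizer also drops characters such as ? and ;, its counts on the full text can differ from the output recorded for the code above.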
Output:
How many most appeared words to print: 20
OK. The 20 most counted words in the txt file are as follows
thou : 11
thy : 10
the : 10
and : 8
of : 8
to : 8
with : 7
dost : 5
be : 5
thee : 5
not : 4
so : 4
but : 4
self : 4
in : 3
glass : 3
is : 3
time : 3
that : 3
for : 3
OK. all unique words are as follows
ok : 1
tell : 1
viewest : 1
should : 1
form : 1
another : 1
fresh : 1
repair : 1
renewest : 1
beguile : 1
world : 1
.
..........
OK. Total number of unique words are 150
OK. all words with count more than 5 are as follows
thou : 11
thy : 10
the : 10
and : 8
of : 8
to : 8
with : 7
dost : 5
be : 5
thee : 5
OK. Total number of words appeared equal or more than 5 times are 10
OK. The 200 most common words are in the file:com200.txt
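Part (d) asks for the counts as well as the words, so it can help to read the file back and check that each line carries both. The following is a minimal sketch under stated assumptions: the helper names write_top_words and read_top_words, the tab-separated layout, and the temporary directory are all hypothetical choices, and the three counts are taken from the output above only as sample data.

```python
import os
import tempfile
from collections import Counter

def write_top_words(counts, path, n=200):
    """Write the n most common words, one 'word<TAB>count' pair per line."""
    with open(path, "w", encoding="utf8") as out:
        for word, count in counts.most_common(n):
            out.write("{}\t{}\n".format(word, count))

def read_top_words(path):
    """Read the file back into a list of (word, count) tuples."""
    pairs = []
    with open(path, encoding="utf8") as src:
        for line in src:
            word, count = line.rstrip("\n").split("\t")
            pairs.append((word, int(count)))
    return pairs

# Sample counts only; a real run would pass the Counter built from the text.
counts = Counter({"thou": 11, "thy": 10, "with": 7})
path = os.path.join(tempfile.mkdtemp(), "com200.txt")
write_top_words(counts, path)
pairs = read_top_words(path)
print(pairs)  # [('thou', 11), ('thy', 10), ('with', 7)]
```

A tab is used as the separator so that a word containing an apostrophe or hyphen still splits cleanly from its count.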