Write a Python program for the following:
A given author will use roughly the same proportion of, say, four-letter words in something she writes this year as she did in whatever she wrote last year. The same holds true for words of any length. But the proportion of four-letter words that Author A consistently uses will very likely be different from the proportion of four-letter words that Author B uses. Theoretically, then, authorship controversies can sometimes be resolved by computing the proportions of 1-letter, 2-letter, 3-letter, ..., 13-letter words in a piece of writing and comparing them with the same statistics from known authors.
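The task stops at computing the statistics, but the comparison step it mentions can also be sketched. This is a minimal sketch, assuming each author's profile is a list of 13 percentages (1- through 12-letter words, then 13 or more letters) and using the sum of absolute differences as the similarity measure; that measure is an illustrative choice, not something the task specifies. The names `profile_distance` and `closest_author`, and all profile numbers, are hypothetical.

```python
# Hypothetical comparison step: each profile is a list of 13 percentages
# (1- through 12-letter words, then 13 or more letters).

def profile_distance(p, q):
    """Sum of absolute differences between two word-length profiles.

    A smaller value means the two texts use word lengths more similarly.
    This metric is an illustrative choice, not one the task prescribes.
    """
    if len(p) != len(q):
        raise ValueError("profiles must have the same number of buckets")
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(unknown, known):
    """Return the name of the known author whose profile is nearest."""
    return min(known, key=lambda name: profile_distance(unknown, known[name]))


# Made-up profiles purely for illustration (not real author data).
known = {
    "Author A": [5, 16, 20, 24, 15, 8, 5, 4, 2, 1, 0, 0, 0],
    "Author B": [3, 12, 18, 20, 17, 11, 8, 5, 3, 2, 1, 0, 0],
}
unknown = [5, 15, 21, 23, 15, 8, 6, 4, 2, 1, 0, 0, 0]
print(closest_author(unknown, known))  # Author A
```

With these made-up numbers the unknown profile is 4 points from Author A and 22 points from Author B, so Author A is reported. Real attribution would need profiles computed from substantial known texts.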
Your task is to write a Python program that computes the above statistics from any text file. Note that apostrophes do not count toward word length. For example, "he's" is a three-letter word. Hyphenated words such as "hard-hearted" should be split into two words with a space between them ("hard hearted").
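The two word rules can be checked in isolation before writing the full program. A minimal sketch of just these rules (other punctuation is left to the hint later in the task); the function name `split_words` is hypothetical, and treating the curly apostrophe (U+2019) like a straight one is an assumption about the input text.

```python
def split_words(text):
    """Split text into words following the two rules above.

    Apostrophes are deleted (so "he's" becomes "hes", a three-letter word);
    hyphens become spaces (so "hard-hearted" becomes "hard hearted").
    Removing the curly apostrophe U+2019 as well is an assumption.
    """
    for apostrophe in "'\u2019":
        text = text.replace(apostrophe, "")
    text = text.replace("-", " ")
    return text.split()


words = split_words("he's hard-hearted")
print(words)                    # ['hes', 'hard', 'hearted']
print([len(w) for w in words])  # [3, 4, 7]
```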
Sample run:
Name of input file: romeo_and_juliet.txt
Proportion of 1- letter words: 4.8% (1231 words)
Proportion of 2- letter words: 16.1% (4177 words)
Proportion of 3- letter words: 20.3% (5261 words)
Proportion of 4- letter words: 24.3% (6295 words)
Proportion of 5- letter words: 15.0% (3889 words)
Proportion of 6- letter words: 7.9% (2048 words)
Proportion of 7- letter words: 5.2% (1352 words)
Proportion of 8- letter words: 3.7% (953 words)
Proportion of 9- letter words: 1.5% (378 words)
Proportion of 10- letter words: 0.7% (190 words)
Proportion of 11- letter words: 0.3% (71 words)
Proportion of 12- letter words: 0.1% (20 words)
Proportion of 13- (or more) letter words: 0.0% (12 words)
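Each proportion in the sample run is that bucket's count divided by the total number of words, times 100, rounded to one decimal place. The total is the sum of all 13 counts, 25877, so for example 1231 / 25877 × 100 ≈ 4.76, printed as 4.8%. The same arithmetic explains why the last bucket shows 0.0% even though it holds 12 words: 12 / 25877 × 100 ≈ 0.046. The short check below recomputes every percentage from the sample counts:

```python
# Counts copied from the sample run above (index 0 = 1-letter words,
# index 12 = 13 or more letters).
counts = [1231, 4177, 5261, 6295, 3889, 2048, 1352, 953, 378, 190, 71, 20, 12]

total = sum(counts)  # 25877 words in all
print("Total words:", total)

for length, count in enumerate(counts, start=1):
    percent = count / total * 100
    print(f"{length}: {count} words -> {percent:.1f}%")
```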
Here the program is tested on the full text of Romeo and Juliet, but it should work for any file. The sample run above shows the actual proportion and count of words of each length in the file. Hint: make sure to replace each character in ",.!?;:][-\"" in the text with a space before doing any splitting.
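The hint can be followed with one `str.replace` call per character, but Python's `str.maketrans` and `str.translate` do all the substitutions in a single pass. A minimal sketch; the names `PUNCTUATION`, `TABLE`, and `clean` are hypothetical, and also deleting apostrophes in the same table (via the third argument of `str.maketrans`) is an addition that applies the task's apostrophe rule.

```python
# Characters the hint says to turn into spaces before splitting.
PUNCTUATION = ",.!?;:][-\""

# str.maketrans maps each punctuation character to a space; its third
# argument lists characters to delete, here the straight apostrophe.
TABLE = str.maketrans(PUNCTUATION, " " * len(PUNCTUATION), "'")


def clean(text):
    """Return text with punctuation replaced by spaces and apostrophes removed."""
    return text.translate(TABLE)


print(clean("He's gone; hard-hearted [exit]!").split())
# ['Hes', 'gone', 'hard', 'hearted', 'exit']
```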
PROGRAM:
# Prompt for the file name, as in the sample run
filename = input("Name of input file: ")
# Reading as UTF-8 is an assumption about the file's encoding
with open(filename, encoding="utf-8") as infile:
    text = infile.read()  # read the whole file into one string

# Apostrophes do not count toward word length: "he's" becomes "hes"
# (the curly apostrophe U+2019 is treated the same way)
for ch in "'\u2019":
    text = text.replace(ch, "")

# Replace punctuation (including hyphens) with spaces before splitting,
# so "hard-hearted" becomes the two words "hard" and "hearted"
for ch in ",.!?;:][-\"":
    text = text.replace(ch, " ")

words = text.split()
total = len(words)

# counts[k] holds the number of k-letter words for k = 1..12;
# counts[13] holds every word with 13 or more letters (counts[0] is unused)
counts = [0] * 14
for word in words:
    counts[min(len(word), 13)] += 1

# Print each proportion to one decimal place, matching the sample run
for length in range(1, 14):
    label = "13- (or more)" if length == 13 else f"{length}-"
    percent = counts[length] / total * 100 if total else 0.0
    print(f"Proportion of {label} letter words: {percent:.1f}% ({counts[length]} words)")
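Separating the counting from the file handling makes the logic easy to check on a short string whose answer can be worked out by hand. A minimal sketch using `collections.Counter`; the function name `length_counts` and its `max_len` parameter are hypothetical, and the cleaning step follows the task's apostrophe, hyphen, and punctuation rules.

```python
from collections import Counter


def length_counts(text, max_len=13):
    """Count words by length; all lengths >= max_len share one bucket.

    Apostrophes are deleted and the hint's punctuation characters
    (including hyphens) are replaced with spaces before splitting.
    """
    punctuation = ",.!?;:][-\""
    table = str.maketrans(punctuation, " " * len(punctuation), "'")
    words = text.translate(table).split()
    # min() caps long words so they all land in the "13 or more" bucket
    return Counter(min(len(w), max_len) for w in words)


counts = length_counts("He's a hard-hearted, incomprehensible man!")
for length in sorted(counts):
    print(length, counts[length])
```

Here "Hes" and "man" give two 3-letter words, "a" one 1-letter word, "hard" one 4-letter word, "hearted" one 7-letter word, and the 16-letter "incomprehensible" is counted in the 13-or-more bucket.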