In: Computer Science
1 Introduction
You will write a program to find the top k most repeated words in a
file, where k will be an input integer provided by the user. It is
preferred, but not necessary, that your algorithm is as efficient as
possible, both in time to process the input as well as optimizing
memory management.
2 Input
The input is one text file, with at most 1000 different words,
separated by spaces or punctuation symbols. Repeated spaces should
be considered a sone space.
Youcanassumethatwordswillbeatmost100characters long. The file may
also contain number that you can ignore. Example of input and
result file:
input.txt
-----------------------------------------------------------The
Hungry, hungry caterpillar a Eric Williams. Yu Xu Z ABC abc
hand-made Abc Denon xyz 123456 the end
-----------------------------------------------------------
3 Program and input specification
The main program should be called ”topkword” (exactly like this, in
lower case, avoid other names). Call syntax (assuming your main
program is topkword.py):
python3 topkword "input=input.txt;k=2;output=topwords.txt"
A word is regular expression of letters: letter+, where letters can
be lower or upper case. Therefore, numbers and strings mixing
letters + numbers can be ignored.
Noticethatwordscanbefollowedbypunctuationsignsandthatsomeofthemcanbecapitalized.
This should not interfere with your results: a capitalized
word is the same as a non-capitalized word for the purposes of this
homework. Uppercase letters can be anywhere in the word, for ex.:
”Hungry”, ”hungry” and ”hUngry” should be counted as the same word.
Punctuation can also be ignored. Notice that the file name will not
necessarily be the same every time. Therefore ,your program will
have to parse the input file name. The output should be sent to the
output file exactly in the format below with each word and its
frequency, one per line. Avoid changing the output format or adding
any other messages since this will prevent automated testing.
Output example with k = 2:
abc 3 hungry 2
4 Requirements • Programming language: Python, version 3 (any)
meeting the requirements below. • Recursive functions are required.
It is unacceptable to have loops (while/for) to process the
list(s). While loops are acceptable only to read the input file, but
recursive functions are encouraged. • Lists (dynamically sized) are
required to store the list of unique words. A program using arrays
to store words will receive a low grade. However, (small) arrays
for input parameters or other auxiliary variables are acceptable. •
The output should always be in lowercase, regardless of how the
words appear in the text. Therefore, your program should internally
convert words to lower case. • Correctness is the most important
requirement: TEST your program with many expressions. Your program
should not crash or produce exceptions. • The list needs to be
passed as function argument and/or function return value, but
cannot be a global variable. The program must be prepared to handle
zeroes or simple errors like spaces or missing numbers extra credit
for developing both forward and backward recursion starting from
either side of the list
import sys
import re
import string
def getListFromStr(str):
xs = [x.lower() for x in
re.findall(r"[\w']+|[.,!?;]", str)]
return [s for s in xs if s not in
string.punctuation]
#Read input from console
inputCmd = str(sys.argv[1])
inputDict = dict(map(lambda x: x.split('='),
inputCmd.split(';')))
inputPath = inputDict['input']
k = int(inputDict['k'])
outputPath = inputDict['output']
#Read input from file
inputFile = open(inputPath, "r")
content = str(inputFile.read())
strList = sorted(getListFromStr(content))
#recursive function
def recursiveCount(ls, result={}):
if ls:
elem = ls.pop()
if elem in result.keys():
result[elem] =
result[elem] + 1
else:
result[elem] =
1
return recursiveCount(ls,
result)
return sorted(result.items(), key=lambda x: (-x[1],x[0]))[:k]
wordWithCount = recursiveCount(strList)
#Write result to file
result = " ".join( k + " "+ str(v) for (k, v) in
wordWithCount)
print(result)
outputFile = open(outputPath,"w+")
outputFile.write(result)
outputFile.close()