Question

In: Computer Science

1 Introduction You will write a program to find the top k most repeated words in...

1 Introduction
You will write a program to find the top k most repeated words in a file, where k will be an input integer provided by the user. It is preferred, but not necessary, that your algorithm is as efficient as possible, both in time to process the input as well as optimizing memory management.
2 Input
The input is one text file, with at most 1000 different words, separated by spaces or punctuation symbols. Repeated spaces should be considered a sone space. Youcanassumethatwordswillbeatmost100characters long. The file may also contain number that you can ignore. Example of input and result file:
input.txt -----------------------------------------------------------The Hungry, hungry caterpillar a Eric Williams. Yu Xu Z ABC abc hand-made Abc Denon xyz 123456 the end -----------------------------------------------------------
3 Program and input specification
The main program should be called ”topkword” (exactly like this, in lower case, avoid other names). Call syntax (assuming your main program is topkword.py):
python3 topkword "input=input.txt;k=2;output=topwords.txt"
A word is regular expression of letters: letter+, where letters can be lower or upper case. Therefore, numbers and strings mixing letters + numbers can be ignored. Noticethatwordscanbefollowedbypunctuationsignsandthatsomeofthemcanbecapitalized. This should not interfere with your results: a capitalized
word is the same as a non-capitalized word for the purposes of this homework. Uppercase letters can be anywhere in the word, for ex.: ”Hungry”, ”hungry” and ”hUngry” should be counted as the same word. Punctuation can also be ignored. Notice that the file name will not necessarily be the same every time. Therefore ,your program will have to parse the input file name. The output should be sent to the output file exactly in the format below with each word and its frequency, one per line. Avoid changing the output format or adding any other messages since this will prevent automated testing. Output example with k = 2:
abc 3 hungry 2
4 Requirements • Programming language: Python, version 3 (any) meeting the requirements below. • Recursive functions are required. It is unacceptable to have loops (while/for) to process the list(s). While loops are acceptable only to read the input file, but recursive functions are encouraged. • Lists (dynamically sized) are required to store the list of unique words. A program using arrays to store words will receive a low grade. However, (small) arrays for input parameters or other auxiliary variables are acceptable. • The output should always be in lowercase, regardless of how the words appear in the text. Therefore, your program should internally convert words to lower case. • Correctness is the most important requirement: TEST your program with many expressions. Your program should not crash or produce exceptions. • The list needs to be passed as function argument and/or function return value, but cannot be a global variable. The program must be prepared to handle zeroes or simple errors like spaces or missing numbers extra credit for developing both forward and backward recursion starting from either side of the list

Solutions

Expert Solution

import sys
import re
import string

def getListFromStr(str):
    xs = [x.lower() for x in re.findall(r"[\w']+|[.,!?;]", str)]
    return [s for s in xs if s not in string.punctuation]


#Read input from console
inputCmd = str(sys.argv[1])
inputDict = dict(map(lambda x: x.split('='), inputCmd.split(';')))
inputPath = inputDict['input']
k = int(inputDict['k'])
outputPath = inputDict['output']

#Read input from file
inputFile = open(inputPath, "r")
content = str(inputFile.read())
strList = sorted(getListFromStr(content))

#recursive function
def recursiveCount(ls, result={}):
   if ls:
       elem = ls.pop()
       if elem in result.keys():
           result[elem] = result[elem] + 1
       else:
           result[elem] = 1
       return recursiveCount(ls, result)  

   return sorted(result.items(), key=lambda x: (-x[1],x[0]))[:k]

wordWithCount = recursiveCount(strList)

#Write result to file
result = " ".join( k + " "+ str(v) for (k, v) in wordWithCount)
print(result)
outputFile = open(outputPath,"w+")
outputFile.write(result)
outputFile.close()


Related Solutions

1. INTRODUCTION The goal of this programming assignment is for students to write a Python program...
1. INTRODUCTION The goal of this programming assignment is for students to write a Python program that uses repetition (i.e. “loops”) and decision structures to solve a problem. 2. PROBLEM DEFINITION  Write a Python program that performs simple math operations. It will present the user with a menu and prompt the user for an option to be selected, such as: (1) addition (2) subtraction (3) multiplication (4) division (5) quit Please select an option (1 – 5) from the...
Write a C program that counts the number of repeated characters in a phrase entered by...
Write a C program that counts the number of repeated characters in a phrase entered by the user and prints them. If none of the characters are repeated, then print “No character is repeated” For example: If the phrase is “full proof” then the output will be Number of characters repeated: 3 Characters repeated: f, l, o Note: Assume the length of the string is 10. ###Note: the output should print exactly as it is stated in the example if...
Introduction Write in C++ at the Linux command line a program that is the same as...
Introduction Write in C++ at the Linux command line a program that is the same as the previous collection app project but now uses a class to store the items and also can save the items to a file that can be read back into the array by the user when the program is re-started. You can use your project 1 submission as a starting point or you can do something new as long as it meets the listed requirements....
Statistics exercises Friedman’s K/One-way repeated measures ANOVA 1. Suppose you are interested in learning if practice...
Statistics exercises Friedman’s K/One-way repeated measures ANOVA 1. Suppose you are interested in learning if practice on the ACT improves test scores. You sample a random group of 10 people and ask them to take the ACT 1 time per week for 3 consecutive weeks. Use the data below to determine if practice improves test scores. Participant Test 1 Test 2 Test 3 1 18 23 24 2 20 22 26 3 21 24 23 4 19 25 28 5...
1. His introduction is a maximum of 500 words. The introduction is very important because it...
1. His introduction is a maximum of 500 words. The introduction is very important because it defines the context of the report. Provide a short introduction and background on the water supply system design. Presentation of water flow concepts in the piping system focusing on the energy equation (Bernoulli equation), head losses and momentum. Support all information with references. Submit goals at the end of the section. Writing 500 words
JAVA Write a program that checks the spelling of words in a document. This program uses...
JAVA Write a program that checks the spelling of words in a document. This program uses two text files: A dictionary file containing all known correctly spelled words, and A document to be spell-checked against the dictionary. As the document to be spell checked is read, each of its words is checked against the dictionary words. The program determines whether each word is misspelled. Misspelled words are recorded. is spelled correctly. Correctly spelled words are recorded and their frequency counted....
Write a program in python that corrects misspelled words. The input for the program is either...
Write a program in python that corrects misspelled words. The input for the program is either a string or a list of strings. The output should be a string or a list depending on the input and should have the spellings corrected. The speller should be able to correct at least the words listed below. You are free to use any data structures we have covered so far including lists and dictionaries. Your program should be in autocorrect.py. Here is...
Write a C program that does the following. (a) Start with comment statements at the top...
Write a C program that does the following. (a) Start with comment statements at the top (before your includes) with the following info: Name, Date (b) Write the following functions: void setupRandArray(int n, int x[], int min, int max) void printArray(int n, int x[], char label[]) float getAverage(int n, int x[]) int getMaximum(int n, int x[]) int getCountInRange(int n, int x[], int min, int max) Note: This function returns the number elements of x that are between min and max...
In your own words, write an introduction and conclusion, in regrds to customer privacy in business?
In your own words, write an introduction and conclusion, in regrds to customer privacy in business?
1.You are required to form a business and to write a brief introduction about it (name,...
1.You are required to form a business and to write a brief introduction about it (name, services or products, legal form, merchandising or manufacturing, retail or wholesale, size, industry, etc…) 2. In deciding on the legal form for your business, discuss the reasons for your choice. 3. Discuss the different financing options to set up your business and the reasons for your choice (s). 4. Identify the users of accounting information for your business and briefly discuss on how each...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT