Lesson Assignment
This lesson has little content beyond creating the table of words and counts (which will be a list of tuples). The words are already parsed out for you, as in the previous lesson.
Build the following three functions:
def clean(words):
def build_table(words):
def top_n(table, n):
Notes:
The function top_n does not have to worry about the order of words that have the same count. Keeping such tied items in the same relative order they had before the sort is called stable sorting (more discussion in the extra credit). You can use collections.Counter to help you with this lesson, but do NOT rely on it for a stable order: only since Python 3.7 does Counter.most_common() list equal counts in first-encountered order.
Be sure to test your pipeline on multiple texts. Each 'run' should not affect others:
v1 = list(pipeline(['a','b','c'], 5))
v2 = list(pipeline(['a','b','c'], 5))
print(v1 == v2)
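As a small illustration of the stable sorting mentioned in the notes, the sketch below sorts a made-up table by count. The sample words and counts are assumptions for demonstration only and are not part of the lesson data.

```python
# Hypothetical sample data: (word, count) pairs in first-seen order.
table = [("the", 3), ("cat", 1), ("sat", 1), ("on", 3), ("mat", 1)]

# Python's sorted() is stable, even with reverse=True: pairs with equal
# counts keep their original relative order, so "the" stays ahead of "on"
# and "cat" stays ahead of "sat" and "mat".
by_count = sorted(table, key=lambda pair: pair[1], reverse=True)

print(by_count)
# [('the', 3), ('on', 3), ('cat', 1), ('sat', 1), ('mat', 1)]
```

Because the sort is stable, running it twice on the same input gives the same order for tied words, which is what the v1 == v2 check relies on.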
Comments are given on every line explaining the code. Below is the code to copy:
#CODE STARTS HERE----------------
def clean(words):
    # Converts each word to lowercase
    return [x.lower() for x in words]

def build_table(words):
    # Use a dictionary to count words
    count = dict()
    for i in words:  # Loop through every word
        # Increment the counter by 1 if the word is already present,
        # or add the new word to the dict
        count[i] = count.get(i, 0) + 1
    return count

def top_n(table, n):
    # Sorts the table dict and returns the top 'n' words
    list_of_tup = []  # Used to store the list of tuples
    counter = 0  # Counter to filter the top 'n' words
    # Sort the dict using sorted() and loop through its key, value pairs
    for k, v in sorted(table.items(), key=lambda item: item[1], reverse=True):
        if counter >= n:
            break
        list_of_tup.append((k, v))  # Append to the list of tuples
        counter += 1
    return list_of_tup

def pipeline(words, n):
    # Custom pipeline created to run the test case given in the question
    # Calling all 3 functions
    cleaned = clean(words)
    counter_dict = build_table(cleaned)
    top_tup = top_n(counter_dict, n)
    return top_tup  # Returns the list of top tuples

v1 = list(pipeline(['a', 'b', 'c', 'A'], 2))  # I have added "A" to the list for testing
v2 = list(pipeline(['a', 'b', 'c'], 2))
print(v1)  # Prints v1
print(v2)  # Prints v2
print(v1 == v2)
#CODE ENDS HERE------------------
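Since the assignment allows collections.Counter and describes the table as a list of tuples, a shorter variant might look like the sketch below. It is one possible arrangement, not the required solution: the pipeline helper is the same custom wrapper used above, and the tie order assumes Python 3.7 or later, where Counter keeps insertion order.

```python
from collections import Counter

def clean(words):
    # Lowercase every word so 'A' and 'a' are counted together
    return [w.lower() for w in words]

def build_table(words):
    # Counter tallies each word; items() yields (word, count) pairs
    # in first-seen order (assumes Python 3.7+)
    return list(Counter(words).items())

def top_n(table, n):
    # sorted() is stable, so tied counts keep their table order;
    # slicing is safe even when n exceeds the table length
    return sorted(table, key=lambda pair: pair[1], reverse=True)[:n]

def pipeline(words, n):
    # Hypothetical wrapper chaining the three functions
    return top_n(build_table(clean(words)), n)

# Each run builds a fresh Counter, so runs do not affect each other
v1 = pipeline(['a', 'b', 'c'], 5)
v2 = pipeline(['a', 'b', 'c'], 5)
print(v1 == v2)  # True
```

No state is shared between calls here, so repeating a run with the same input returns an equal list.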