Lesson Assignment
There's not much to this lesson beyond creating the table of words and their counts (a list of tuples). As in the previous lesson, the words are already parsed for you.
Build the following three functions:
def clean(words):
def build_table(words):
def top_n(table, n):
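One possible shape for these three functions is sketched below using collections.Counter, which the notes allow. It is a sketch, not the required solution, and it assumes Python 3.7 or later, where Counter (a dict subclass) keeps keys in first-seen order. The table is a plain list of (word, count) tuples, and top_n relies on sorted() being stable for tied counts.

```python
from collections import Counter

def clean(words):
    # Normalize case so "The" and "the" count as the same word
    return [w.lower() for w in words]

def build_table(words):
    # Counter tallies each word; items() yields (word, count) pairs
    # in first-seen order, converted here to a list of tuples
    return list(Counter(words).items())

def top_n(table, n):
    # Sort by count, highest first; sorted() is stable, so tied words
    # keep their table order. [:n] keeps at most n tuples.
    return sorted(table, key=lambda pair: pair[1], reverse=True)[:n]

words = ['the', 'cat', 'The', 'hat', 'cat', 'the']
table = build_table(clean(words))
print(table)            # [('the', 3), ('cat', 2), ('hat', 1)]
print(top_n(table, 2))  # [('the', 3), ('cat', 2)]
```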
Notes:
The function top_n does not have to worry about the order of words that have the same count. A sort that keeps tied items in the same relative order they had before sorting is called a stable sort (more discussion in the extra credit). You can use collections.Counter to help you with this lesson, but do not rely on it for a particular order among ties: only Python 3.7 and later document that Counter.most_common() lists equal counts in the order they were first encountered.
Be sure to test your pipeline on multiple texts. Each run should not affect the others; for example, running it twice on the same input should give the same result:
v1 = list(pipeline(['a','b','c'], 5))
v2 = list(pipeline(['a','b','c'], 5))
print(v1 == v2)
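The tie-ordering note above can be seen with a small table. Sorting only by count keeps tied words in their input order (Python's sorted() is stable even with reverse=True), so the result depends on the order the words first appeared. Adding the word itself as a secondary key, assumed here to be alphabetical order, gives the same result no matter how the table was ordered. Both function names below are hypothetical.

```python
def top_n_stable(table, n):
    # Sort by count only; tied words keep their original relative order
    return sorted(table, key=lambda pair: pair[1], reverse=True)[:n]

def top_n_alpha(table, n):
    # Sort by count descending, then by word ascending to break ties
    return sorted(table, key=lambda pair: (-pair[1], pair[0]))[:n]

t1 = [('b', 1), ('a', 2), ('c', 1)]
t2 = [('c', 1), ('a', 2), ('b', 1)]  # same counts, different order

print(top_n_stable(t1, 3))  # [('a', 2), ('b', 1), ('c', 1)]
print(top_n_stable(t2, 3))  # [('a', 2), ('c', 1), ('b', 1)]
print(top_n_alpha(t1, 3))   # [('a', 2), ('b', 1), ('c', 1)]
print(top_n_alpha(t2, 3))   # [('a', 2), ('b', 1), ('c', 1)]
```

The stable version is enough for this lesson; the explicit tie-break is one option if a fully deterministic order is wanted.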
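One common way for one run to affect the next is keeping the counts in a module-level (global) dictionary instead of creating a fresh one on every call. The sketch below uses the hypothetical names shared_counts, build_table_leaky, and build_table_fresh to show how the same input then gives a different table on the second run, which is exactly what a v1 == v2 check catches.

```python
shared_counts = {}  # module-level state: survives between calls

def build_table_leaky(words):
    # BUG: keeps adding to the same dictionary on every call
    for w in words:
        shared_counts[w] = shared_counts.get(w, 0) + 1
    return list(shared_counts.items())

def build_table_fresh(words):
    # A new dictionary per call, so each run starts from zero
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return list(counts.items())

v1 = build_table_leaky(['a', 'b', 'c'])
v2 = build_table_leaky(['a', 'b', 'c'])
print(v1 == v2)  # False: v2 is [('a', 2), ('b', 2), ('c', 2)]

v1 = build_table_fresh(['a', 'b', 'c'])
v2 = build_table_fresh(['a', 'b', 'c'])
print(v1 == v2)  # True
```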
The solution is below; comments in the code explain each step.
Below is the code to copy:
#CODE STARTS HERE----------------
def clean(words):  # Convert each word to lowercase
    return [x.lower() for x in words]

def build_table(words):  # Count words; return a list of (word, count) tuples
    count = dict()
    for i in words:  # Loop through every word
        # Increment the count by 1 if the word is already present,
        # otherwise add the new word with a count of 1
        count[i] = count.get(i, 0) + 1
    return list(count.items())  # The lesson asks for a list of tuples

def top_n(table, n):  # Sort the table by count and return the top 'n' tuples
    # sorted() is stable, so words with equal counts keep their first-seen order;
    # slicing with [:n] keeps at most n tuples
    return sorted(table, key=lambda item: item[1], reverse=True)[:n]

def pipeline(words, n):  # Custom pipeline to run the test case given in the question
    # Call all 3 functions in order
    cleaned = clean(words)
    table = build_table(cleaned)
    top_tup = top_n(table, n)
    return top_tup  # Returns the list of the top 'n' tuples
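# The lesson's check: two runs on the same input must give the same result
v1 = list(pipeline(['a', 'b', 'c'], 5))
v2 = list(pipeline(['a', 'b', 'c'], 5))
print(v1 == v2)  # True
# "A" is added to test clean(): it is lowercased and counted as "a"
v3 = list(pipeline(['a', 'b', 'c', 'A'], 2))
print(v3)  # [('a', 2), ('b', 1)]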
#CODE ENDS HERE------------------