Question

Python3

The final step in our authorship attribution system will be to perform authorship attribution based on a selection of sample documents from a range of authors, and a document of unknown origin.

You will be given a selection of sample documents from a range of authors (from which we will learn our word frequency dictionaries), and a document of unknown origin. Given these, you need to return a list of authors in ascending order of out-of-place distance between the document of unknown origin and the combined set of documents from each of the authors. You should do this according to the following steps:

1. compute a single dictionary of word frequencies for each author based on the combined set of documents from that author (provided in the form of a list of strings)
2. compute a dictionary of word frequencies for the document of unknown origin
3. compare the document of unknown origin with the combined works of each author, based on the out-of-place distance metric
4. calculate and return a ranking of authors, from most similar (smallest distance) to least similar (greatest distance), resolving any ties in the ranking based on an alphabetic sort

You have been provided with reference implementations of the functions authattr_worddict and authattr_oop from the preceding questions in order to complete this question, and should make use of these in your solution. These are provided via the "from hidden_lib import authattr_worddict, authattr_oop" statement, which must not be removed from the header of your code for these functions to work.

Write a function authattr_authorpred(authordict, unknown, maxrank) that takes three arguments:

- authordict: a dictionary of authors (each of which is a str), each associated with a non-empty list of documents (each of which is a str)
- unknown: a str containing the document of unknown origin
- maxrank: the positive int value to set maxrank to in the call to authattr_oop

and returns a list of (author, oop) tuples, where author is the name of an author from authordict, and oop is the out-of-place distance between unknown and the combined works of author, in the form of a float.

For example:

>>> authattr_authorpred({'tim': ['One One was a racehorse; Two Two was one too', 'How much wood could a woodchuck chuck'], 'einstein': ['Unthinking respect for authority is the greatest enemy of truth.', 'Not everything that can be counted counts, and not everything that counts can be counted.']}, 'She sells sea shells on the seashore', 20)
[('tim', 287.0), ('einstein', 290.0)]
>>> authattr_authorpred({'Beatles': ['Hey Jude', 'The Fool on the Hill', "A Hard Day's Night", "Yesterday"], 'Rolling Stones': ["(I Can't Get No) Satisfation", 'Ruby Tuesday', 'Paint it Black']}, 'Eleanor Rigby', 15)
[('Beatles', 129.0), ('Rolling Stones', 129.0)]
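
The second example above is a tie (both distances are 129.0), so the order is decided alphabetically. A minimal sketch of that ranking step, using a tuple sort key; the helper name rank_authors is hypothetical and the distances are simply copied from the examples rather than computed:

```python
def rank_authors(pairs):
    """Sort (author, distance) pairs by distance, then by author name."""
    # A tuple key compares the distance first and falls back to the
    # author name only when two distances are equal.
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


# Distances copied from the second example; input order deliberately reversed
tied = rank_authors([('Rolling Stones', 129.0), ('Beatles', 129.0)])
print(tied)  # [('Beatles', 129.0), ('Rolling Stones', 129.0)]
```

Sorting on the author name alone would happen to give the right answer for this tie, but it would put 'einstein' ahead of 'tim' in the first example even though tim's distance is smaller.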

Solutions

Expert Solution

Below is your answer:

# import pre-implemented versions of Q3 and Q4 functions
from hidden_lib import authattr_worddict, authattr_oop


def authattr_authorpred(authordict, unknown, maxrank):
    # Word frequencies for the document of unknown origin
    unknown_dict = authattr_worddict(unknown)

    ranking = []
    for author, documents in authordict.items():
        # Combine the author's documents into a single string. Joining with
        # a space assumes authattr_worddict takes one str; calling str() on
        # the list instead would count the brackets and quotes as text.
        author_dict = authattr_worddict(' '.join(documents))

        # Out-of-place distance between the author's combined works and
        # the unknown document
        oop = authattr_oop(author_dict, unknown_dict, maxrank)
        ranking.append((author, float(oop)))

    # Smallest distance first; ties are resolved alphabetically by author
    return sorted(ranking, key=lambda pair: (pair[1], pair[0]))
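
When combining an author's documents, passing str() of the list to authattr_worddict is a likely source of wrong counts, because the list's brackets, quotes and commas become part of the text. The sketch below illustrates the difference; how authattr_worddict actually tokenises is not shown in the question, so str.split() is used here purely as a stand-in:

```python
documents = ['Hey Jude', 'Yesterday']

# str() gives the list's printed form, punctuation included
as_repr = str(documents)       # "['Hey Jude', 'Yesterday']"

# ' '.join() gives only the text of the documents
as_text = ' '.join(documents)  # 'Hey Jude Yesterday'

print(as_repr.split())  # ["['Hey", "Jude',", "'Yesterday']"]
print(as_text.split())  # ['Hey', 'Jude', 'Yesterday']
```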
