Question

Please complete in Python and neatly explain and format code. Use snake case style when defining variables. Write a program named wordhistogram.py which takes one file as an argument. The file is a plain text file (make your own) which shall be analyzed by the program. Upon completing the analysis, the program shall output a report detailing the shortest word(s), the longest word(s), the most frequently used word(s), and a histogram of all the words used in the input file.

If there is a tie, then all words that are tied for that classification (longest, shortest, most frequent) are displayed as part of that class.
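
The tie rule can be illustrated with a minimal standalone sketch. The helper name tied_words and the sample sentence are assumptions made for illustration only; in the actual assignment this logic has to live inside the WordHistogram class.

```python
from collections import Counter


def tied_words(words):
    """Return (shortest, longest, most_frequent), each a sorted list of tied words."""
    counts = Counter(words)
    unique_words = sorted(counts)
    min_length = min(len(word) for word in unique_words)
    max_length = max(len(word) for word in unique_words)
    max_count = max(counts.values())
    # Keep every word that matches the extreme value, not just the first one found
    shortest = [word for word in unique_words if len(word) == min_length]
    longest = [word for word in unique_words if len(word) == max_length]
    most_frequent = [word for word in unique_words if counts[word] == max_count]
    return shortest, longest, most_frequent


sample_words = "the cat and the dog and a bird".split()
shortest, longest, most_frequent = tied_words(sample_words)
print(shortest)       # ['a']
print(longest)        # ['bird']
print(most_frequent)  # ['and', 'the']
```

Here "the" and "and" both appear twice, so both are reported as most frequent.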

A word histogram shows an approximate representation of the distribution of words used in the input file. An example text, The_Jungle_Upton_Sinclair.txt, is provided as a starting point for analyzing natural language. Draw your histogram by listing the word first and then print up to 65 * characters to represent the frequency of the word.

Since there is limited space on a terminal, ensure that your histogram does not wrap along the right edge of the terminal. Assume that the width of the histogram cannot be wider than 65 characters. In calculating your histogram, map the highest frequency to 65 characters. For example, if the text has a word that appears 2000 times and it is the most frequently used word, then divide 2000 by 65 to approximate that each * character represents 30 occurrences of the word in question in the text. Thus if a word appears fewer than 30 times, it receives zero * characters, and a word that appears 125 times receives 4 * characters (0-30, 31-60, 61-90, 91-120, 121-150).
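
The scaling described above can be sketched as follows. The function name star_count and the constant HISTOGRAM_WIDTH are assumptions made for this illustration. Integer arithmetic is used so that the most frequent word always maps to exactly 65 stars; the exact value per star is 2000 / 65, about 30.8, which the text above rounds to 30.

```python
HISTOGRAM_WIDTH = 65  # maximum number of * characters per line


def star_count(count, max_count, width=HISTOGRAM_WIDTH):
    """Scale count so that max_count maps to exactly `width` stars."""
    return count * width // max_count


# Counts taken from the example in the text, with 2000 as the highest frequency
for count in (2000, 125, 31, 29):
    print(f"{count:>4} {'*' * star_count(count, 2000)}")
```

With a maximum of 2000, a count of 125 yields 4 stars and a count of 29 yields none, which matches the example in the question.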

Print the order of the histogram from most frequent to least frequent.

The program must have a class named WordHistogram. This class must be the data structure that keeps track of the words that appear in the input text file and can return the histogram as a single string. The main function is responsible for opening and reading the input text file.

Make sure your WordHistogram class has data members that are correctly named (use the underscore character!), has an initializer, and any necessary methods and data members.

In your main, use the given function to read from the file handle.

def word_iterator(file_handle):
  """ This iterates through all the words of a given file handle. """
  for line in file_handle:
    for word in line.split():
      yield word

DO NOT COMPUTE OR STORE THE HISTOGRAM OUTSIDE OF AN OBJECT named WordHistogram

Example Output

$ ./wordhistogram.py Candide_Voltaire.txt
Word Histogram Report
the (2179) ******************************************************************
of (1233) *************************************
to (1130) **********************************
and (1127) **********************************
a (863) **************************
in (623) ******************
i (446) *************
was (434) *************
that (414) ************
he (410) ************
with (395) ***********
is (348) **********
his (333) **********
you (317) *********
said (302) *********
not (276) ********
...
$

Solutions

Expert Solution

First we get all the words from the file using the given function and store them in a list called all_words:

import sys


def word_iterator(file_handle):
    """ This iterates through all the words of a given file handle. """
    for line in file_handle:
        for word in line.split():
            yield word


if __name__ == '__main__':
    file_name = sys.argv[1]  # get the name of the file from the command line

    all_words = []
    with open(file_name) as file_handle:  # the with block closes the file when done
        for word in word_iterator(file_handle):
            all_words.append(word.lower())  # store all the words in the all_words list
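
Note that word_iterator splits on whitespace only, so a token such as "dog," keeps its comma and would be counted separately from "dog". The question does not say whether punctuation should be stripped, so the following is only a sketch under the assumption that it should be; normalize_word is a hypothetical helper, not part of the given code.

```python
import string


def normalize_word(raw_word):
    """Lowercase a token and strip punctuation from both ends (hypothetical helper)."""
    return raw_word.lower().strip(string.punctuation)


tokens = "The dog, the DOG! and -- the end.".split()
cleaned = [normalize_word(token) for token in tokens]
cleaned = [word for word in cleaned if word]  # drop tokens that were only punctuation
print(cleaned)  # ['the', 'dog', 'the', 'dog', 'and', 'the', 'end']
```

Punctuation inside a word, such as the apostrophe in "don't", is left untouched because strip only removes characters from the two ends.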

Next we define the class and its instance attributes:

class WordHistogram:
    def __init__(self, all_words):
        self.all_words = all_words  # stores all the words from the file
        self.shortest_words = []  # we use a list because there could be more than one shortest word
        self.longest_words = []  # we use a list because there could be more than one longest word
        self.freq = {}  # to store the frequency of all the words

Then we define the class methods:

    def find_shortest_words(self):
        """
        Function to calculate the shortest word(s)
        """
        shortest_length = len(self.all_words[0])  # set the shortest word length as the length of the first word

        for word in self.all_words[1:]:  # loop over all the words except the first word
            shortest_length = min(shortest_length, len(word))

        # as we found the shortest word length we find all the words equal to that length
        for word in self.all_words:
            if len(word) == shortest_length:
                self.shortest_words.append(word)

    def find_longest_words(self):
        """
        Function to calculate the longest word(s)
        """
        longest_length = 0  # start at 0, since every word has a length of at least 1

        for word in self.all_words:  # loop over all the words to find the longest word
            longest_length = max(longest_length, len(word))

        # as we found the longest word length we find all the words equal to that length
        for word in self.all_words:
            if len(word) == longest_length:
                self.longest_words.append(word)

    def calculate_word_frequency(self):
        """
        Function to calculate the word frequency
        """

        # We loop over all the words and add each one to the dictionary. If a word is already present in the
        # dictionary, we increment its count. If it is not present in freq, then we get a KeyError.
        # In that case, we set the count to 1.
        for word in self.all_words:
            try:
                self.freq[word] += 1
            except KeyError:
                self.freq[word] = 1

        # We sort the dictionary by word frequency count in descending order, as required
        # by the question (dictionaries keep insertion order in Python 3.7+)
        self.freq = dict(sorted(self.freq.items(), key=lambda item: item[1], reverse=True))

    def print_histogram(self):
        """
        Function to print the Word Histogram Report
        """

        print("Word Histogram Report")
        print("---------------------")

        print("Shortest word(s):")
        print(", ".join(sorted(set(self.shortest_words))))
        print()

        print("Longest word(s):")
        print(", ".join(sorted(set(self.longest_words))))
        print()

        max_count = max(self.freq.values())  # frequency of the most frequently used word

        print("Most frequent word(s):")
        print(", ".join(word for word, count in self.freq.items() if count == max_count))
        print()

        print("Word frequency:")
        for key in self.freq:
            count = self.freq[key]
            number_of_stars = count * 65 // max_count  # scale so the most frequent word gets 65 stars
            print(f"{key} ({count}) " + "*" * number_of_stars)
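
As an alternative to the try/except counting in calculate_word_frequency, the standard library's collections.Counter can do both the counting and the descending sort. This is only a sketch of that swapped-in technique with made-up sample words; inside the class, the Counter would be stored in self.freq.

```python
from collections import Counter

sample_words = ["the", "cat", "the", "hat", "the", "cat"]
freq = Counter(sample_words)  # maps each word to its number of occurrences

# most_common() returns (word, count) pairs sorted from most to least frequent
for word, count in freq.most_common():
    print(f"{word} ({count})")
```

Counter is a dict subclass, so lookups such as freq["the"] work the same way as with the plain dictionary used above.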

Now we need to get the words from the file and call the class methods from the main block:

if __name__ == '__main__':
    file_name = sys.argv[1]  # get the name of the file from the command line

    all_words = []
    with open(file_name) as file_handle:  # the with block closes the file when done
        for word in word_iterator(file_handle):
            # store all the words in the all_words list
            all_words.append(word.lower())  # since case doesn't matter, we convert the words to lower case

    histogram = WordHistogram(all_words)  # pass all the words to the WordHistogram class
    histogram.find_longest_words()
    histogram.find_shortest_words()
    histogram.calculate_word_frequency()
    histogram.print_histogram()
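
The main block reads sys.argv[1] directly, which raises an IndexError when the program is started without a file name. One possible guard is sketched below; get_file_name, main, and the usage message are hypothetical names chosen for this illustration.

```python
import sys


def get_file_name(argv):
    """Return the input file name, or None when the argument count is wrong."""
    if len(argv) != 2:
        return None
    return argv[1]


def main(argv):
    """Return an exit status: 0 on success, 1 when the arguments are wrong."""
    file_name = get_file_name(argv)
    if file_name is None:
        print(f"usage: {argv[0]} <input_file>", file=sys.stderr)
        return 1
    print(f"would analyze {file_name}")  # placeholder for the real analysis
    return 0


# Simulated command lines instead of the real sys.argv
print(main(["wordhistogram.py"]))
print(main(["wordhistogram.py", "sample.txt"]))
```

In the real program the result would typically be passed to sys.exit(main(sys.argv)) so that the shell sees a non-zero status on misuse.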

The complete code is below:

#!/usr/bin/env python3
import sys


class WordHistogram:
    def __init__(self, all_words):
        self.all_words = all_words  # stores all the words from the file
        self.shortest_words = []  # we use a list because there could be more than one shortest word
        self.longest_words = []  # we use a list because there could be more than one longest word
        self.freq = {}  # to store the frequency of all the words

    def find_shortest_words(self):
        """
        Function to calculate the shortest word(s)
        """
        shortest_length = len(self.all_words[0])  # set the shortest word length as the length of the first word

        for word in self.all_words[1:]:  # loop over all the words except the first word
            shortest_length = min(shortest_length, len(word))

        # as we found the shortest word length we find all the words equal to that length
        for word in self.all_words:
            if len(word) == shortest_length:
                self.shortest_words.append(word)

    def find_longest_words(self):
        """
        Function to calculate the longest word(s)
        """
        longest_length = 0  # start at 0, since every word has a length of at least 1

        for word in self.all_words:  # loop over all the words to find the longest word
            longest_length = max(longest_length, len(word))

        # as we found the longest word length we find all the words equal to that length
        for word in self.all_words:
            if len(word) == longest_length:
                self.longest_words.append(word)

    def calculate_word_frequency(self):
        """
        Function to calculate the word frequency
        """

        # We loop over all the words and add each one to the dictionary. If a word is already present in the
        # dictionary, we increment its count. If it is not present in freq, then we get a KeyError.
        # In that case, we set the count to 1.
        for word in self.all_words:
            try:
                self.freq[word] += 1
            except KeyError:
                self.freq[word] = 1

        # We sort the dictionary by word frequency count in descending order, as required
        # by the question (dictionaries keep insertion order in Python 3.7+)
        self.freq = dict(sorted(self.freq.items(), key=lambda item: item[1], reverse=True))

    def print_histogram(self):
        """
        Function to print the Word Histogram Report
        """

        print("Word Histogram Report")
        print("---------------------")

        print("Shortest word(s):")
        print(", ".join(sorted(set(self.shortest_words))))
        print()

        print("Longest word(s):")
        print(", ".join(sorted(set(self.longest_words))))
        print()

        max_count = max(self.freq.values())  # frequency of the most frequently used word

        print("Most frequent word(s):")
        print(", ".join(word for word, count in self.freq.items() if count == max_count))
        print()

        print("Word frequency:")
        for key in self.freq:
            count = self.freq[key]
            number_of_stars = count * 65 // max_count  # scale so the most frequent word gets 65 stars
            print(f"{key} ({count}) " + "*" * number_of_stars)


def word_iterator(file_handle):
    """ This iterates through all the words of a given file handle. """
    for line in file_handle:
        for word in line.split():
            yield word


if __name__ == '__main__':
    file_name = sys.argv[1]  # get the name of the file from the command line

    all_words = []
    with open(file_name) as file_handle:  # the with block closes the file when done
        for word in word_iterator(file_handle):
            # store all the words in the all_words list
            all_words.append(word.lower())  # since case doesn't matter, we convert the words to lower case

    histogram = WordHistogram(all_words)  # pass all the words to the WordHistogram class
    histogram.find_longest_words()
    histogram.find_shortest_words()
    histogram.calculate_word_frequency()
    histogram.print_histogram()

If you want to change the way the report is displayed, you can edit the print_histogram function.
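
The question also asks that the class be able to return the histogram as a single string, whereas print_histogram writes straight to the terminal. One way to meet that requirement is sketched below. WordHistogramSketch, add_word, and histogram_string are hypothetical names for a trimmed-down variant and not the solution's actual class; the sketch also uses underscore-prefixed data members, as the question suggests.

```python
class WordHistogramSketch:
    """Hypothetical, trimmed-down variant that returns the histogram as a string."""

    def __init__(self, width=65):
        self._width = width      # maximum number of * characters per line
        self._word_counts = {}   # maps each word to its number of occurrences

    def add_word(self, word):
        """Count one occurrence of word."""
        self._word_counts[word] = self._word_counts.get(word, 0) + 1

    def histogram_string(self):
        """Return one line per word, ordered from most to least frequent."""
        if not self._word_counts:
            return ""
        max_count = max(self._word_counts.values())
        ordered = sorted(self._word_counts.items(), key=lambda item: item[1], reverse=True)
        lines = []
        for word, count in ordered:
            stars = "*" * (count * self._width // max_count)
            lines.append(f"{word} ({count}) {stars}")
        return "\n".join(lines)


sketch = WordHistogramSketch(width=10)  # narrow width to keep the demo short
for word in "a b a c a b".split():
    sketch.add_word(word)
print(sketch.histogram_string())
```

Because the words are added one at a time, main can feed the class directly from word_iterator without building an intermediate list, and the empty-input case returns an empty string instead of raising an error.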

