Question

Solve using PYTHON PROGRAMMING

9. Write a script that reads the file “ai_trends.txt” into a list of words, removes from that list the words found in the file “stopwords_en.txt”, and then:

a. Calculates the average occurrence of the words. The occurrence of a word is the number of times it appears in the text.

b. Finds the longest word.

c. Calculates the average word length. This is based on the unique words: each word counts once.

d. Creates a bar chart of the 10 most frequent words.
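
To make the definitions in (a) and (c) concrete, here is a small worked sketch. The word list is hypothetical (it is not taken from “ai_trends.txt”) and stands in for the text after stop-word removal. It assumes "average occurrence" means the total number of words divided by the number of unique words.

```python
from collections import Counter

# Hypothetical word list, standing in for the text after stop-word removal.
words = ["data", "model", "data", "learning", "model", "data"]

# Count how many times each word appears: data=3, model=2, learning=1
counts = Counter(words)

# (a) average occurrence: total word count divided by number of unique words
average_occurrence = sum(counts.values()) / len(counts)   # 6 / 3

# (c) average word length over the unique words only: (4 + 5 + 8) / 3
average_length = sum(len(w) for w in counts) / len(counts)

print("Average occurrence:", average_occurrence)
print("Average word length of unique words:", average_length)
```

Note that in (c) "data" contributes its length once, even though it appears three times.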

Solutions

Expert Solution

import re
import operator
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

stopwords = set()
# Read stop words from file
filename = "stopwords_en.txt"
with open(filename, 'r') as file:
    # Read line by line and split on commas
    for aLine in file:
        for word in aLine.split(","):
            # Strip whitespace (including the trailing newline) BEFORE storing,
            # so that " the" and "the\n" are both stored as "the"
            word = word.strip()
            if word:
                stopwords.add(word)
#print(stopwords)

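textwords = []
# Read words from text file
filename = "ai_trends.txt"
with open(filename, 'r') as file:
    for aLine in file:
        # split() with no argument splits on any whitespace, so newlines
        # and tabs do not end up inside the words
        for word in aLine.split():
            # Remove punctuation using a regex and lowercase the word so that
            # "The" and "the" count as the same word (this assumes the stop
            # words file is written in lowercase)
            res = re.sub(r'[^\w\s]', '', word).lower()
            textwords.append(res)
#print(textwords)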

#Remove stop words from textwords
temp = []
for word in textwords:
    if word not in stopwords:
        temp.append(word)
textwords = temp

#Create dictionary of words with frequency
freq = dict()
for word in textwords:
    if len(word)==0:
        continue
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1
      
#print(freq)

#Calculate average occurrence: total word count / number of unique words
sum_ = 0
count = 0
for value in freq.values():
    sum_ += value
    count += 1
print("Average occurrence:", sum_/count)

#Find the longest word (the first one found wins if several share the length)
maximum = 0
max_word = ""
for key in freq.keys():
    if len(key)>maximum:
        maximum = len(key)
        max_word = key
print("Longest word:", max_word, "with length", maximum)

#Average length of unique words after stopwords removal
sum_ = 0
count = 0
for key in freq.keys():
    sum_ += len(key)
    count += 1
print("Average word length of unique words:", str(sum_/count))

#Sort the dictionary by frequency, highest first
newDict = dict(sorted(freq.items(), key=operator.itemgetter(1), reverse=True))
print(newDict)

#Collect the 10 most frequent words and their counts for the bar chart
word = []
frequency = []
for key, value in list(newDict.items())[:10]:
    word.append(key)
    frequency.append(value)

#Set figure size
figure(num=None, figsize=(15,7), dpi=80)
plt.bar(word, frequency, color="orange")
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.show()
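
As a more compact alternative, the same statistics can be sketched with collections.Counter from the standard library. This is not part of the original answer. The helper names tokenize and summarize are hypothetical, the sample text and stop words are made up so the sketch runs without the two input files, and the regex \w+ is an assumed tokenization rule that splits words at apostrophes and hyphens (so it can differ slightly from the punctuation stripping used above).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of word characters."""
    return re.findall(r"\w+", text.lower())


def summarize(words, stopwords):
    """Return the requested statistics for the words not in stopwords."""
    counts = Counter(w for w in words if w not in stopwords)
    if not counts:
        raise ValueError("no words left after stop-word removal")
    return {
        # total number of words divided by number of unique words
        "average_occurrence": sum(counts.values()) / len(counts),
        # first unique word with the maximum length
        "longest_word": max(counts, key=len),
        # each unique word contributes its length once
        "average_length": sum(len(w) for w in counts) / len(counts),
        # (word, count) pairs, most frequent first
        "top10": counts.most_common(10),
    }


# Hypothetical sample text and stop words, used here instead of the files.
sample = "AI is changing the world. The world of AI is data, data, data."
stats = summarize(tokenize(sample), {"is", "the", "of"})
print(stats)
```

To use it on the real input, the text of “ai_trends.txt” would be passed to tokenize, and the words and counts in stats["top10"] could be given to plt.bar in the same way as the word and frequency lists above.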

