Question



Solve using PYTHON PROGRAMMING

9. Write a script that reads the file “ai_trends.txt” into a list of words, removes from that list the words found in the file “stopwords_en.txt”, and then:

a. Calculates the average occurrence of the words. Occurrence is the number of times a word appears in the text.

b. Finds the longest word.

c. Calculates the average word length. This is based on the unique words: each word counts as one.

d. Creates a bar chart of the 10 most frequent words.
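Parts a-d can all be computed from a single word-frequency table. The sketch below shows one way to do that with `collections.Counter`. It assumes the words have already been read, cleaned, and filtered against the stopwords. The function name `word_statistics` and the sample words are illustrative only, not part of the assignment.

```python
from collections import Counter


def word_statistics(words, top_n=10):
    """Compute the values asked for in parts a-d from a list of words
    that has already been cleaned and stripped of stopwords."""
    counts = Counter(w for w in words if w)  # ignore empty strings
    if not counts:
        raise ValueError("no words left after stopword removal")
    # a. average occurrence: total number of words / number of distinct words
    average_occurrence = sum(counts.values()) / len(counts)
    # b. longest word (max() returns the first one found if several tie)
    longest = max(counts, key=len)
    # c. average length of the unique words, each counted once
    average_length = sum(len(w) for w in counts) / len(counts)
    # d. data for the bar chart: the top_n most frequent (word, count) pairs
    top = counts.most_common(top_n)
    return average_occurrence, longest, average_length, top
```

For example, `word_statistics(["model", "data", "model", "learning", "data", "model"])` gives an average occurrence of 2.0 (6 words / 3 distinct words), the longest word `"learning"`, an average unique-word length of 17/3 ≈ 5.67, and the top list `[("model", 3), ("data", 2), ("learning", 1)]`. `Counter.most_common` lists words with equal counts in the order they were first seen, so the top 10 is deterministic.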


Solutions

Expert Solution

import re
import operator
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

#Read stop words from file (one word per line, or several words separated by commas)
#A set is used so that duplicates are ignored and lookups are fast
stopwords = set()
filename = "stopwords_en.txt"
with open(filename, 'r') as file:
    for aLine in file:
        #Split on commas, strip spaces and the newline, lowercase, and skip empty entries
        for word in aLine.split(","):
            word = word.strip().lower()
            if word:
                stopwords.add(word)

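textwords = []
#Read words from text file
filename = "ai_trends.txt"
with open(filename, 'r') as file:
    for aLine in file:
        #split() with no argument splits on any whitespace and drops the trailing newline,
        #so the last word of a line is not stored as "word\n"
        for word in aLine.split():
            #Remove punctuation from the word using regex and lowercase it,
            #so that "The" and "the" count as the same word and match the stop word list
            res = re.sub(r'[^\w\s]', '', word).lower()
            textwords.append(res)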

#Remove stop words from textwords
temp = []
for word in textwords:
    if word not in stopwords:
        temp.append(word)
textwords = temp

#Create dictionary of words with frequency
freq = dict()
for word in textwords:
    if len(word)==0:
        continue
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1
      
#print(freq)

#a. Average occurrence: total number of (non-stop) words divided by the number of distinct words
total = sum(freq.values())
average_occurrence = total / len(freq)
print("Average occurrence:", average_occurrence)

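#b. Longest word (if several words have the same length, the first one found is kept)
maximum = 0
max_word = ""
for key in freq.keys():
    if len(key) > maximum:
        maximum = len(key)
        max_word = key
print("Longest word:", max_word, "(" + str(maximum) + " characters)")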

#Average length of unique words after stopwords removal
sum_ = 0
count = 0
for key in freq.keys():
    sum_ += len(key)
    count += 1
print("Average word length of unique words:", str(sum_/count))

#d. Sort the (word, count) pairs by count, highest first, and keep the 10 most frequent
top10 = sorted(freq.items(), key=operator.itemgetter(1), reverse=True)[:10]
print("10 most frequent words:", top10)

#Separate the words and their counts for the bar chart
word = [key for key, value in top10]
frequency = [value for key, value in top10]

#Set figure size
figure(num=None, figsize=(15,7), dpi=80)
plt.bar(word, frequency, color="orange")
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.show()
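To test the reading and filtering steps without the real files, they can be rewritten as functions that accept any iterable of lines, so either an open file object or a plain list of strings works. This is a hedged sketch: `load_stopwords` and `clean_words` are hypothetical helper names, and it assumes, as the script above does, that the stopword file holds one word per line or comma-separated words.

```python
import re


def load_stopwords(lines):
    """Build a set of lowercase stopwords from lines that hold one word
    each or several comma-separated words."""
    stopwords = set()
    for line in lines:
        for word in line.split(","):
            word = word.strip().lower()
            if word:  # skip blanks left by trailing commas or empty lines
                stopwords.add(word)
    return stopwords


def clean_words(lines, stopwords):
    """Split lines on any whitespace (dropping trailing newlines), strip
    punctuation, lowercase, and remove stopwords and empty tokens."""
    result = []
    for line in lines:
        for token in line.split():
            word = re.sub(r"[^\w]", "", token).lower()
            if word and word not in stopwords:
                result.append(word)
    return result
```

With the real files, pass the open file objects directly, e.g. `with open("stopwords_en.txt") as f: stopwords = load_stopwords(f)`. Note that removing punctuation inside a token turns "don't" into "dont" rather than splitting it into two words.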


Related Solutions

Solve using PYTHON PROGRAMMING Write a script that reads a file “cars.csv”, into a pandas structure and then print a. the first 3 rows and the last 3 of the dataset b. the 3 cars with the lowest average-mileage c. the 3 cars with the highest average-mileage. Solve using PYTHON PROGRAMMING
1) Write a Python program that opens a file, reads all of the lines into a list of strings, and closes the file. Use the readlines() method. Test your program using the names.txt file provided. 2) Convert the program into a function called loadFile that receives the file name as a parameter and returns a list of strings. 3) Write a main routine that calls loadFile three times to load the three data files given into three lists. Then choose a...
This is a python file Reads information from a text file into a list of sublists. Be sure to ask the user to enter the file name and end the program if the file doesn’t exist. Text file format will be as shown, where each item is separated by a comma and a space: ID, firstName, lastName, birthDate, hireDate, salary Store the information into a list of sublists called empRoster. EmpRoster will be a list of sublists, where each sublist...
Implement in Python a script that does the following: 1) reads input from a supposed file called firstnames_2.txt. 2) processes the input and writes and saves the output to a file. NOTE: Please make sure that the names are written in the outfile with one name on each line no comma ( , ) after the name in the output
Using Python The script is to open a given file. The user is to be asked what the name of the file is. The script will then open the file for processing and when done, close that file. The script will produce an output file based on the name of the input file. The output file will have the same name as the input file except that it will begin with "Analysis-". This file will be opened and closed by...
Write a Python program in a file called consonants.py, to solve the following problem using a nested loop. For each input word, replace each consonant in the word with a question mark (?). Your program should print the original word and a count of the number of consonants replaced. Assume that the number of words to be processed is not known, hence a sentinel value (maybe "zzz") should be used. Sample input/output: Please enter a word or zzz to quit:...
Using Python 3, write a function that reads the file called simpleinterest.txt. Each row in simpleinterest.txt is a comma-separated list of values in the order PV, FV, n, r. In each row, one value is missing. Write an output file that fills in the missing value for each row.
Write a python script to solve the 4-queens problem using. The code should allow for random starting, and for placed starting. "The 4-Queens Problem[1] consists in placing four queens on a 4 x 4 chessboard so that no two queens can capture each other. That is, no two queens are allowed to be placed on the same row, the same column or the same diagonal." Display the iterations until the final solution Hill Climbing (your choice of variant)
Write and compile a Python script to solve the 4-queens problem using the Forward Checking algorithm. The code should allow for random starting and for placed starting. Random starting means placing a queen at a random position on the chessboard. Placed starting means asking the user for input to place a queen on the chessboard. Display the iterations until the final solution. "The 4-Queens Problem[1] consists in placing four queens on a 4 x 4 chessboard so that no two queens...
Write and Compile a python script to solve the 4-queens problem using. The code should allow for random starting, and for placed starting using numpy "The 4-Queens Problem[1] consists in placing four queens on a 4 x 4 chessboard so that no two queens can capture each other. That is, no two queens are allowed to be placed on the same row, the same column or the same diagonal." Display the iterations until the final solution Arc consistency