Question

Solve using PYTHON PROGRAMMING

9. Write a script that reads the file “ai_trends.txt” into a list of words, removes from that list the words found in the file “stopwords_en.txt”, and then:

a. Calculates the average occurrence of the words. The occurrence of a word is the number of times it appears in the text.

b. Finds the longest word.

c. Calculates the average word length. This is based on the unique words: each word counts once.

d. Creates a bar chart of the 10 most frequent words.

Solutions

Expert Solution

import re
import operator
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure

stopwords = []
#Read stop words from file
filename = "stopwords_en.txt"
with open(filename, 'r') as file:
    content = file.readlines()
    #Read line by line, split using comma, strip extra whitespace, add each new word to the list
    for aLine in content:
        words = aLine.split(",")
        for word in words:
            #Strip before the duplicate check so that " the" and "the\n" are recognised as "the"
            word = word.strip()
            if word and word not in stopwords:
                stopwords.append(word)
#print(stopwords)

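textwords = []
#Read words from text file
filename = "ai_trends.txt"
with open(filename, 'r') as file:
    content = file.readlines()
    for aLine in content:
        #split() without an argument splits on any whitespace,
        #so newlines and repeated spaces do not end up inside the words
        words = aLine.split()
        for word in words:
            #Remove punctuation from word using regex, then lowercase it so that
            #"AI" and "ai" count as one word (assumes the stop word list is lowercase)
            res = re.sub(r'[^\w\s]', '', word).lower()
            textwords.append(res)
#print(textwords)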

#Remove stop words from textwords
temp = []
for word in textwords:
    if word not in stopwords:
        temp.append(word)
textwords = temp

#Create dictionary of words with frequency
freq = dict()
for word in textwords:
    if len(word)==0:
        continue
    if word in freq:
        freq[word] += 1
    else:
        freq[word] = 1
      
#print(freq)

#Calculate average occurrence (total word count divided by number of unique words)
sum_ = 0
count = 0
for key, value in freq.items():
    sum_ += value
    count += 1
print("Average occurrence:", sum_/count)

#Find the longest word (the first one found wins if several share the maximum length)
maximum = 0
max_word = ""
for key in freq.keys():
    if len(key)>maximum:
        maximum = len(key)
        max_word = key
print("Longest word:", max_word, "with length", maximum)

#Average length of unique words after stopwords removal
sum_ = 0
count = 0
for key in freq.keys():
    sum_ += len(key)
    count += 1
print("Average word length of unique words:", str(sum_/count))

#Sort the dictionary to find 10 most frequent words
newDict = dict( sorted(freq.items(), key=operator.itemgetter(1), reverse=True))
print(newDict)

#Create bar chart
#Collect the first 10 entries of the sorted dictionary (the 10 most frequent words)
i = 0
word = []
frequency = []
for key, value in newDict.items():
    if i==10:
        break
    word.append(key)
    frequency.append(value)
    i += 1

#Set figure size
figure(num=None, figsize=(15,7), dpi=80)
plt.bar(word, frequency, color="orange")
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.show()
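
As a more compact alternative, parts a to c and the top-10 selection can be sketched with collections.Counter from the standard library. This is only a sketch under stated assumptions, not the required solution: the function names tokenize and summarize and the sample text are hypothetical, the words are lowercased, and the regex \w+ splits on punctuation (so "don't" becomes "don" and "t"), which differs slightly from the script above.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words, with punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def summarize(words, stopwords):
    """Return the statistics for parts a-c plus the 10 most frequent words."""
    stop_set = set(stopwords)  # a set makes each membership test fast
    counts = Counter(w for w in words if w not in stop_set)
    if not counts:
        raise ValueError("no words left after removing stop words")
    unique = list(counts)  # each distinct word appears once here
    return {
        "average_occurrence": sum(counts.values()) / len(unique),
        "longest_word": max(unique, key=len),
        "average_length": sum(len(w) for w in unique) / len(unique),
        "top_10": counts.most_common(10),
    }


# Hypothetical sample text, used here instead of the real input files
sample = "AI trends: the AI models and the data. Data drives AI!"
stats = summarize(tokenize(sample), ["the", "and"])
print(stats)
```

With the real files, the text would come from open("ai_trends.txt").read() and the stop words from "stopwords_en.txt". The (word, count) pairs in stats["top_10"] can be unzipped with zip(*stats["top_10"]) and passed to plt.bar in the same way as the word and frequency lists above.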

