Please write in Python (topic: word frequencies)
Method/Function: List<Token> tokenize(TextFilePath)
Write a method/function that reads in a text file and returns a
list of the tokens in that file. For the purposes of this project,
a token is a sequence of alphanumeric characters, independent of
capitalization (so Apple, apple, aPpLe are the
same token). You are allowed to use regular expressions if you wish
to (and you can use some regexp engine, no need to write it from
scratch), but you are not allowed to import a tokenizer (e.g. from
NLTK), since you are being asked to write a tokenizer.
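One possible sketch of tokenize follows. It assumes that "alphanumeric" means ASCII letters and digits and that the file is UTF-8 text; both are assumptions, since the assignment does not specify them. The name text_file_path is only illustrative.

```python
import re


def tokenize(text_file_path):
    """Return a list of lowercase alphanumeric tokens read from a text file."""
    tokens = []
    # Assumption: the file is UTF-8; bytes that cannot be decoded are replaced
    with open(text_file_path, 'r', encoding='utf-8', errors='replace') as text_file:
        # Read line by line so a large file is never held in memory at once
        for line in text_file:
            # Assumption: a token is a run of ASCII letters and digits
            for match in re.findall(r'[A-Za-z0-9]+', line):
                # Lowercase so that Apple, apple and aPpLe become the same token
                tokens.append(match.lower())
    return tokens
```

Everything that is not a letter or digit (spaces, punctuation, underscores) acts as a separator, so a line such as "foo_bar 42" would give the tokens foo, bar and 42.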
Method: Map<Token,Count> computeWordFrequencies(List<Token>)
Write another method/function that counts the number of occurrences
of each token in the token list. Remember that you should write
this assignment yourself from scratch so you are not allowed to
import a counter when the assignment asks you to write that
method.
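A minimal sketch of the counting step, using only a plain dictionary so that no counter is imported. The snake_case name compute_word_frequencies is a Python-style stand-in for computeWordFrequencies.

```python
def compute_word_frequencies(tokens):
    """Count how many times each token occurs in the token list."""
    frequencies = {}
    for token in tokens:
        # get() returns 0 the first time a token is seen
        frequencies[token] = frequencies.get(token, 0) + 1
    return frequencies
```

For example, the token list ['apple', 'pear', 'apple'] would map apple to 2 and pear to 1.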
Method: void print(Frequencies<Token, Count>)
Finally, write a method that prints the word frequency counts to
the screen. The printout should be ordered by decreasing
frequency (so the highest-frequency words come first).
Print the output in this format:
<token> -> <freq>
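A possible sketch of the printing step. The function is called print_frequencies here to avoid shadowing Python's built-in print, and ties are broken alphabetically; the assignment does not say how ties should be ordered, so that part is an assumption.

```python
def print_frequencies(frequencies):
    """Print each token and its count, highest count first."""
    # Sort by decreasing count; assumption: equal counts are ordered alphabetically
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    for token, count in ordered:
        print(f'{token} -> {count}')
```

Negating the count in the sort key gives the decreasing order while still letting the token itself sort in ascending order.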
Please add some notes (comments) explaining the code. Thanks!
# 1) Reads the text file one character at a time, using the
#    commas to separate the individual tokens.

# Create a list to store the numbers read from the file
numbers = list()
# Open the input text file for reading
dataFile = open('numbers.txt', 'r')
# Loop through each line of the input data file
for eachLine in dataFile:
    # Set up a temporary variable that collects the digits of the current number
    tmpStr = ''
    # Loop through each character in the line
    for char in eachLine:
        # Check whether the char is a digit
        if char.isdigit():
            # If it is a digit, add it to tmpStr
            tmpStr += char
        # If a comma is found and tmpStr has a value,
        # append that value to the numbers list
        elif char == ',' and tmpStr != '':
            numbers.append(int(tmpStr))
            tmpStr = ''
    # If tmpStr still contains a number at the end of the line,
    # add it to the numbers list
    if tmpStr.isdigit():
        numbers.append(int(tmpStr))
# Print the numbers list (print is a function in Python 3)
print(numbers)
# Close the input data file
dataFile.close()
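The same character-by-character idea carries over to the word tokenizer the question asks for, without any regular expressions. The sketch below is only an illustration: tokenize_line is a hypothetical helper name, and str.isalnum() also accepts non-ASCII letters and digits, which may or may not be what the assignment intends.

```python
def tokenize_line(line):
    """Split one line into lowercase alphanumeric tokens, one character at a time."""
    tokens = []
    current = ''
    for char in line:
        if char.isalnum():
            # Letters and digits extend the current token (stored in lowercase)
            current += char.lower()
        elif current != '':
            # Any other character ends the current token
            tokens.append(current)
            current = ''
    # Keep the last token if the line does not end with a separator
    if current != '':
        tokens.append(current)
    return tokens


print(tokenize_line('Apple, apple; aPpLe 42!'))
# prints ['apple', 'apple', 'apple', '42']
```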
# 2) Uses the string method split() to break each line from the
#    file into a list of substrings.
numbers = list()
dataFile = open('C:\\PythonCourse\\unit3\\numbers.txt', 'r')
for eachLine in dataFile:
    # Simplify the script by using Python's built-in
    # split() method to separate the tokens
    substrs = eachLine.split(',')
    # Iterate through the output and check that the items
    # are numbers before adding them to the numbers list
    for strVar in substrs:
        # Strip whitespace first; otherwise the trailing newline makes
        # isdigit() fail for the last item on each line
        strVar = strVar.strip()
        if strVar.isdigit():
            numbers.append(int(strVar))
print(numbers)
dataFile.close()