In Python:
1- Modify your mapper to count words after removing punctuation marks during mapping.
Practice the given tasks in a Jupyter notebook before running them on AWS. If your program fails, check the stderr log file for information about the error.
import sys
sys.path.append('.')
for line in sys.stdin:
    line = line.strip()  # trim spaces from beginning and end
    keys = line.split()  # split line by space
    for key in keys:
        value = 1
        print("%s\t%d" % (key, value))  # for each word generate 'word TAB 1' line
If you have any doubts, please leave a comment.
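#!/usr/bin/env python3
# The line above tells the shell to interpret this file with Python 3 (str.maketrans requires it).
# This mapper reads lines of text and outputs <word, 1> pairs.
import sys
import string
sys.path.append('.')
# Translation table that deletes all ASCII punctuation characters
table = str.maketrans('', '', string.punctuation)
count = 0
for line in sys.stdin:
    line = line.strip()  # trim spaces from beginning and end
    keys = line.split()  # split line by space
    for key in keys:
        value = 1
        key = key.translate(table)  # remove punctuation marks
        if not key:  # skip tokens that were only punctuation, e.g. '--'
            continue
        print("%s\t%d" % (key, value))  # emit 'word TAB 1'
        count += value
# Report the total on stderr so it does not get mixed into the mapper output
print("count of words: " + str(count), file=sys.stderr)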
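To check the punctuation-stripping step in a Jupyter notebook before running on AWS, the mapping logic can be wrapped in a function and fed a few sample lines instead of sys.stdin. This is only a sketch: the names map_words, PUNCT_TABLE and sample are made up for illustration and are not part of the assignment.

```python
import string

# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def map_words(lines):
    """Yield (word, 1) pairs, skipping tokens that are empty after stripping."""
    for line in lines:
        for token in line.strip().split():
            word = token.translate(PUNCT_TABLE)
            if word:  # a token such as '--' becomes empty and is dropped
                yield word, 1

# Hypothetical sample input standing in for sys.stdin
sample = ["Hello, world!", "It's a test -- really."]
for word, value in map_words(sample):
    print("%s\t%d" % (word, value))
```

Note that string.punctuation covers ASCII punctuation only, and that the apostrophe is removed too, so "It's" is counted as "Its".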
Let me know if it doesn't match your output on AWS, and I will help you.