I need to modify my mapper to count the number of occurrences of each character (including punctuation marks) in the file.
Code below:
#!/usr/bin/env python
# The line above indicates that Python should be used to interpret this file.
# This mapper reads lines of text and outputs <word, 1> pairs.
import sys
sys.path.append('.')
for line in sys.stdin:
    line = line.strip()  # trim whitespace from beginning and end
    keys = line.split()  # split line on whitespace
    for key in keys:
        value = 1
        print("%s\t%d" % (key, value))  # for each word, generate a 'word TAB 1' line
Assuming you need the number of occurrences of each character (not each word), you can use Python's dictionary data structure to hold the unique characters and their counts. Iterate through each character in the line: if the character is not yet in the dictionary, add it with a count of 1; if it is already present, increment its count by 1. At the end of the loop, the dictionary holds the total number of occurrences of every unique character in that line.
count_dict = {}  # Initialize an empty dictionary. If you need to process multiple lines, do this inside the loop
for char in line:  # Iterate through each character in the line
    if char in count_dict:  # If the character is already present in the dictionary, increment its value by 1
        count_dict[char] = count_dict[char] + 1
    else:  # Otherwise it is a new character, so add it to the dictionary with value 1
        count_dict[char] = 1
# Iterate through the dictionary and print the key-value pairs. The character is the key and the count is the value.
for key, value in count_dict.items():
    print("%s\t%d" % (key, value))
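For reference, here is one way the dictionary approach could be folded back into the original mapper so that it runs end to end. This is only a sketch under a few assumptions: the helper name map_characters is made up for illustration, the dictionary is reset for every input line, and the line is still stripped as in the original mapper, so leading and trailing whitespace is not counted while spaces inside the line are.

```python
#!/usr/bin/env python
import sys


def map_characters(lines):
    """Yield 'char TAB count' records, one per distinct character in each line.

    map_characters is a hypothetical helper name, not part of the original mapper.
    """
    for line in lines:
        line = line.strip()  # as in the original mapper: drop leading/trailing whitespace
        count_dict = {}      # fresh dictionary for every line
        for char in line:
            # get() returns 0 for a character that has not been seen yet
            count_dict[char] = count_dict.get(char, 0) + 1
        for key, value in count_dict.items():
            yield "%s\t%d" % (key, value)


if __name__ == "__main__":
    # Read from standard input, like the original mapper
    for record in map_characters(sys.stdin):
        print(record)
```

Because the counts are per line, the same character can be emitted more than once across the file. This assumes a reducer that sums the values for each key, as with the original <word, 1> output, to produce the file-wide totals.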