Question

In: Computer Science

Instructions: 1. Write a MapReduce program to find the frequency of each letter, case insensitive, in...

Instructions:


1. Write a MapReduce program to find the frequency of each letter, case insensitive, in some given input. For example, "The quick brown fox jumps over the lazy dog" as input should generate the following (letter,count) pairs: (T, 2), (H, 1), (E, 3), etc.

2. Test your program against the 3 attached input files: HadoopFile0.txt, HadoopFile1.txt, and HadoopFile2.txt.

3. The input and output must be read/written from/into HDFS.

4. Please submit only the Java source file(s) on .

5. I've attached the WordCount.java as a sample MapReduce program. You might find it useful.

WordCount.java ::

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import   org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
public class WordCount
{
public static void main(String[] args)
throws Exception {
if (args.length != 2) {
System.err.println("Usage: WordCount <input path> <output path>");
System.exit(-1); }
Job job = Job.getInstance(); job.setJarByClass(WordCount.class); job.setJobName("Word Count");
//job.setNumReduceTasks(2); FileInputFormat.addInputPath(job, new Path(args[0])); FileOutputFormat.setOutputPath(job, new Path(args[1])); job.setMapperClass(WordMapper.class); job.setReducerClass(WordReducer.class); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); System.exit(job.waitForCompletion(true) ? 0 : 1);
}
public static class WordMapper
extends Mapper<LongWritable, Text, Text, IntWritable> {
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
String line = value.toString().toLowerCase(); String[] tokens = line.split("\\W+");
for (String token : tokens) {
if (token.length() > 0)
context.write(new Text(token), new IntWritable(1));
}
}
}
public static class WordReducer
extends Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int count = 0;
for (IntWritable value : values) {
count += value.get();
}
context.write(key, new IntWritable(count));
}
}
}

Hadoopfile0.txt::
Hadoop is the Elephant King!
A yellow and elegant thing.
He never forgets
Useful data, or lets
An extraneous element cling!

Hadoopfile1.txt::
A wonderful king is Hadoop.
The elephant plays well with Sqoop.
But what helps him to thrive
Are Impala, and Hive,
And HDFS in the group.

Hadoopfile2.txt::
Hadoop is an elegant fellow.
An elephant gentle and mellow.
He never gets mad, Or does anything bad,
Because, at his core, he is yellow.

Solutions

Expert Solution

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
  // Map function
  public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable>{
    private Text word = new Text();
    public void map(LongWritable key, Text value, Context context) 
           throws IOException, InterruptedException {
      // Splitting the line on spaces
      String[] stringArr = value.toString().split("\\s+");
      for (String str : stringArr) {
        word.set(str);
        context.write(word, new IntWritable(1));
      }           
    }
  }
    
  // Reduce function
  public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable>{        
    private IntWritable result = new IntWritable();
    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
            throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }
  public static void main(String[] args)  throws Exception{
    Configuration conf = new Configuration();

    Job job = Job.getInstance(conf, "WC");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(MyMapper.class);    
    job.setReducerClass(MyReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Related Solutions

Write a MapReduce program in hadoop to find the words that occurs more than 2000 times...
Write a MapReduce program in hadoop to find the words that occurs more than 2000 times in book.txt file. To count the occurrences of the words and filter the Filterwords in book.txt, convert all words into lower case. Also filter the digits (0-9) and punctuation. The class made to filter the word are below. Filterwords.isOneOfThem(String in) returns true if in is a Filterword. class Filterwords { public static String [] myFilterWordsArray = { "a", "an", "the", "am", "are", "is","at"}; public...
use C++ Write a program to calculate the frequency of every letter in an input string...
use C++ Write a program to calculate the frequency of every letter in an input string s. Your program should be able to input a string. Calculate the frequency of each element in the string and store the results Your program should be able to print each letter and its respective frequency. Identify the letter with the highest frequency in your message. Your message should be constructed from at least 50 letters. Example of Output: letters frequency a 10 b...
Write a program that uses a dictionary to assign “codes” to each letter of the alphabet....
Write a program that uses a dictionary to assign “codes” to each letter of the alphabet. For example:      codes = {'A':'%', 'a':'9', 'B':'@', 'b':'#', etc....} Using this example, the letter “A” would be assigned the symbol %, the letter “a” would be assigned the number 9, the letter “B” would be assigned the symbol “@” and so forth. The program should open a specified text file, read its contents, and then use the dictionary to write an encrypted version of...
Write a program that translates a letter grade into a number grade. Letter grades are A,...
Write a program that translates a letter grade into a number grade. Letter grades are A, B, C, D, and F, possibly followed by + or -. Their numeric values are 4, 3, 2, 1 and 0. There is no F+ or F-. A “+” increases the numeric value by 0.3, a “–“ decreases it by 0.3. However, an A+ has value 4.0.4 pts Enter a Letter grade: B- The numeric value is 2.7 Your code with comments A screenshot...
write a Java program Write a program to assign a letter grade according to the following...
write a Java program Write a program to assign a letter grade according to the following scheme: A: total score >= 90 B: 80 <= total score < 90 C: 70 <= total score < 80 D: 60 <= total score < 70 F: total score < 60 Output - the student's total score (float, 2 decimals) and letter grade Testing - test your program with the following input data: test1 = 95; test2 = 80; final = 90; assignments...
C LANGUAGE ONLY Write a C program to count the frequency of each element in an...
C LANGUAGE ONLY Write a C program to count the frequency of each element in an array. Enter the number of elements to be stored in the array: 3 Input 3 elements of the array: element [0]: 25 element [1]: 12 element [2]: 43 Expected output: The frequency of all elements of an array: 25 occurs 1 times 12 occurs 1 times 3 occurs 1 times
C++ Write a program that reads a line of text, changes each uppercase letter to lowercase,...
C++ Write a program that reads a line of text, changes each uppercase letter to lowercase, and places each letter both in a queue and onto a stack. The program should then verify whether the line of text is a palindrome (a set of letters or numbers that is the same whether read forward or backward). Please use a Queue Class and Stack class.
3. Consider the string:”34456754”. Please find the frequency of each number. With a java program.
3. Consider the string:”34456754”. Please find the frequency of each number. With a java program.
1. Write a Formal Letter: SOLICITED LETTER: Write a cover letter for an advertised job, or...
1. Write a Formal Letter: SOLICITED LETTER: Write a cover letter for an advertised job, or a job about which you have specific knowledge (perhaps a new opening at your current place of employment).
write C++ program using functions (separate function for each bottom) Write a program to find if...
write C++ program using functions (separate function for each bottom) Write a program to find if a number is large word for two given bottom base - bottom1 and bottom2. You can predict that a number, when converted to any given base shall not exceed 10 digits. . the program should ask from user to enter a number that it should ask to enter the base ranging from 2 to 16 after that it should check if the number is...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT