Question

Write a MapReduce program in Hadoop to find the words that occur more than 2000 times in the file book.txt. Before counting, convert all words to lower case, strip digits (0-9) and punctuation, and drop the filter words. The class used to filter words is given below: Filterwords.isOneOfThem(String in) returns true if in is a filter word.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class Filterwords {

    public static String[] myFilterWordsArray = { "a", "an", "the", "am", "are", "is", "at" };

    public static Set<String> myFilterWords = new HashSet<>(Arrays.asList(myFilterWordsArray));

    public static boolean isOneOfThem(String in) {
        return myFilterWords.contains(in);
    }

}
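
The preprocessing the question asks for can be sketched in plain Java, independent of Hadoop. The class name TokenNormalizer and the method normalize are illustrative assumptions, not part of the assignment. The sketch also assumes that digits and ASCII punctuation inside a token are simply deleted (so "don't" becomes "dont"), which the question does not specify.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of the assignment): normalizes one raw token.
class TokenNormalizer {

    // Same filter words as Filterwords.myFilterWordsArray in the question.
    private static final Set<String> FILTER_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "am", "are", "is", "at"));

    // Returns the cleaned token, or null if nothing countable remains.
    static String normalize(String raw) {
        // Lower-case first so that "The" matches the filter word "the".
        String cleaned = raw.toLowerCase()
                // Assumption: digits and ASCII punctuation are deleted in place.
                .replaceAll("[0-9\\p{Punct}]", "");
        if (cleaned.isEmpty() || FILTER_WORDS.contains(cleaned)) {
            return null;
        }
        return cleaned;
    }

    public static void main(String[] args) {
        for (String raw : new String[] { "The", "Whale,", "1851", "don't", "chapter1" }) {
            System.out.println(raw + " -> " + normalize(raw));
        }
    }
}
```

The order of the steps matters: checking the filter list before lower-casing would let "The" through, because the list contains only lower-case entries.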

Solutions

Expert Solution

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
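    // Mapper: emits (word, 1) for every normalized word that is not a filter word.
    public static class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Lower-case the line, then split it on whitespace.
            String[] stringArr = value.toString().toLowerCase().split("\\s+");
            for (String str : stringArr) {
                // Strip digits (0-9) and punctuation. Assumption: punctuation inside
                // a token is deleted rather than treated as a word separator.
                str = str.replaceAll("[0-9\\p{Punct}]", "");
                // Skip tokens that are now empty and words in the question's Filterwords list.
                if (str.isEmpty() || Filterwords.isOneOfThem(str)) {
                    continue;
                }
                word.set(str);
                context.write(word, one);
            }
        }
    }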
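    // Reducer: sums the counts for each word and emits only words seen more than 2000 times.
    public static class CountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private static final int THRESHOLD = 2000;
        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            // Keep only words that occur more than 2000 times.
            if (sum > THRESHOLD) {
                result.set(sum);
                context.write(key, result);
            }
        }
    }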

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordMapper.class);
        job.setReducerClass(CountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
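
To sanity-check the counting and threshold logic on a small sample before submitting a job to a cluster, the map, group, and reduce steps can be mimicked in memory. This is only a sketch: the class LocalWordCount and the method countAbove are hypothetical names, no Hadoop classes are involved, and the demo uses a threshold of 2 instead of 2000 so that the tiny sample produces output.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the job; no Hadoop classes are used.
class LocalWordCount {

    // Same filter words as Filterwords.myFilterWordsArray in the question.
    private static final Set<String> FILTER_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "the", "am", "are", "is", "at"));

    // Returns the words whose count is strictly greater than threshold.
    static Map<String, Integer> countAbove(List<String> lines, int threshold) {
        // "Map" and "shuffle": normalize each token and group counts by word.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                String word = token.replaceAll("[0-9\\p{Punct}]", "");
                if (word.isEmpty() || FILTER_WORDS.contains(word)) {
                    continue;
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        // "Reduce": drop every word that does not exceed the threshold.
        counts.values().removeIf(count -> count <= threshold);
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "The whale, the WHALE! A whale is at sea.",
                "Sea 42 sea; the sea. Ship ship");
        // Threshold of 2 instead of 2000 so the tiny sample produces output.
        System.out.println(countAbove(sample, 2)); // prints {sea=4, whale=3}
    }
}
```

In the sample, "ship" occurs exactly twice and is therefore excluded, which mirrors the strict "more than" comparison in the question.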

