In your own words, explain the core principle of the MapReduce algorithm (6 pts)
MapReduce is a processing technique and a programming model for distributed computing based on Java. The MapReduce algorithm contains two important tasks, namely Map and Reduce. Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs). The Reduce task then takes the output from a map as input and combines those data tuples into a smaller set of tuples. As the order of the name MapReduce implies, the Reduce task is always performed after the Map job.
The major advantage of MapReduce is that it is easy to scale data processing over multiple computing nodes. Under the MapReduce model, the data processing primitives are called mappers and reducers. Decomposing a data processing application into mappers and reducers is sometimes nontrivial. But once we write an application in the MapReduce form, scaling it to run over hundreds, thousands, or even tens of thousands of machines in a cluster is merely a configuration change. This simple scalability is what has attracted many programmers to the MapReduce model.
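To make the Map and Reduce tasks described above concrete, here is a minimal sketch of the classic word-count example using Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class names WordCountMapper and WordCountReducer are illustrative only, not part of any library; the mapper turns each input line into (word, 1) tuples, and the reducer combines the tuples for each word into a smaller set of (word, count) tuples.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map task: breaks each input line into (word, 1) tuples.
// (In a real project this class would live in its own file, WordCountMapper.java.)
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The framework passes the input data to the mapper line by line.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit a (key, value) tuple
        }
    }
}

// Reduce task: combines all tuples for a given word into a single count.
// (Would live in its own file, WordCountReducer.java.)
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable count : values) {
            sum += count.get();
        }
        context.write(key, new IntWritable(sum)); // smaller set of tuples
    }
}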
Core principle of the MapReduce algorithm-
1- The MapReduce paradigm is generally based on sending the computation to where the data resides.
2- A MapReduce program executes in three stages, namely the map stage, the shuffle stage, and the reduce stage (a minimal driver that wires these stages together is sketched after this list).
Map stage: The map or mapper's job is to process the input data. Generally the input data is in the form of a file or directory and is stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data and creates several small chunks of data.
Reduce stage: This stage is the combination of the shuffle stage and the reduce stage. The reducer's job is to process the data that comes from the mapper. After processing, it produces a new set of output, which is stored in the HDFS.
3- During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
4- The framework manages all the details of data passing, such as issuing tasks, verifying task completion, and copying data around the cluster between the nodes.
5- Most of the computing takes place on nodes with the data on local disks, which reduces the network traffic.
6- After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the Hadoop server.
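As a rough sketch of how the stages above are wired together and submitted to the cluster, the driver below uses the standard Hadoop Job API and assumes the illustrative WordCountMapper and WordCountReducer classes from the earlier example; the input and output paths are command-line arguments referring to locations in HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);

        // Wire the map and reduce stages; the shuffle stage between them
        // is handled by the framework itself.
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input and output live in HDFS; Hadoop ships the tasks to the
        // nodes that hold the data blocks rather than moving the data.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Submit the job and wait; the framework issues tasks, verifies
        // their completion, and copies intermediate data between nodes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a job would typically be packaged into a jar and launched with something like "hadoop jar wordcount.jar WordCountDriver /input/path /output/path", where the jar name and paths are placeholders.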