Question

Suppose there are two sets of letters D1 and D2 stored on two nodes A1 and A2:
• D1 = {“a”, “b”, “c”, “b”, “c”, “d”, “a”} on Node A1, and
• D2 = {“a”, “a”, “a”, “d”, “d”, “c”} on Node A2.
There is a MapReduce job running to process D1 and D2. The job is a “get average” application, namely, computing the average number of copies for each letter. In this job, two Map tasks run on Node A1 and Node A2, and one Reduce task runs on another node, say, Node A3. Describe the key-value data transferred from Node A1 and from Node A2 to Node A3, both with and without a Combiner. Explain how a Combiner can improve a MapReduce job in this example.

Solutions

Expert Solution

When a MapReduce job runs on very large data sets (D1 and D2 stand in for such data in this example), the Mappers produce large volumes of intermediate output that must be sent to the Reducer, and this transfer can congest the network. To improve efficiency, a Combiner can be specified, typically as a Reducer class, to aggregate the intermediate output locally on each Map node, which reduces the amount of data transferred from the Mappers to the Reducer. A Combiner acts like a mini-reducer: it takes the output of a Mapper and aggregates it locally before it is sent to the Reducer.

Example:

D1 = {“a”, “b”, “c”, “b”, “c”, “d”, “a”}

D2 = {“a”, “a”, “a”, “d”, “d”, “c”}


The primary function of a Mapper is to turn its input into key-value pairs. Assuming each Map task emits one (letter, 1) pair per input letter, as in a word count, Node A1 emits 7 pairs for D1: (a,1), (b,1), (c,1), (b,1), (c,1), (d,1), (a,1). Node A2 emits 6 pairs for D2: (a,1), (a,1), (a,1), (d,1), (d,1), (c,1). Without a Combiner, all 13 key-value pairs have to be copied over the network to the Reducer on Node A3, which then collapses them into one result per letter. This intermediate data is what a Combiner is meant to shrink.
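The Map output without a Combiner can be sketched in plain Python. This is only an illustration, not Hadoop code: the function name `map_task` is hypothetical, and it assumes each Map task emits one (letter, 1) pair per input letter.

```python
# Illustrative sketch only; map_task is a hypothetical name, not a Hadoop API.
D1 = ["a", "b", "c", "b", "c", "d", "a"]  # stored on Node A1
D2 = ["a", "a", "a", "d", "d", "c"]       # stored on Node A2

def map_task(letters):
    """Emit one (letter, 1) pair per input letter (assumed Map logic)."""
    return [(letter, 1) for letter in letters]

a1_pairs = map_task(D1)  # what Node A1 sends to Node A3 without a Combiner
a2_pairs = map_task(D2)  # what Node A2 sends to Node A3 without a Combiner

print(a1_pairs)
print(a2_pairs)
print(len(a1_pairs), len(a2_pairs), len(a1_pairs) + len(a2_pairs))  # 7 6 13
```

Every input letter becomes one pair on the wire, so the traffic to Node A3 grows linearly with the input size.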

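With a Combiner, each Map node sums the counts for each letter locally before anything is sent, so far fewer key-value pairs are written to disk and transferred than without a Combiner. Node A1 sends 4 pairs: (a,2), (b,2), (c,2), (d,1). Node A2 sends 3 pairs: (a,3), (d,2), (c,1). Node A3 therefore receives 7 pairs instead of one pair per input letter (7 + 6 = 13).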
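The local aggregation step might look like the following sketch. The name `combine` is hypothetical, and the Combiner is assumed to sum the 1s per letter on each Map node before the data leaves that node.

```python
from collections import Counter

# Illustrative sketch only; combine is a hypothetical name, not a Hadoop API.
D1 = ["a", "b", "c", "b", "c", "d", "a"]  # stored on Node A1
D2 = ["a", "a", "a", "d", "d", "c"]       # stored on Node A2

def combine(map_output):
    """Sum the counts per letter locally on one Map node (assumed Combiner logic)."""
    totals = Counter()
    for letter, count in map_output:
        totals[letter] += count
    return sorted(totals.items())

a1_combined = combine([(letter, 1) for letter in D1])
a2_combined = combine([(letter, 1) for letter in D2])

print(a1_combined)  # [('a', 2), ('b', 2), ('c', 2), ('d', 1)]
print(a2_combined)  # [('a', 3), ('c', 1), ('d', 2)]
print(len(a1_combined) + len(a2_combined))  # 7
```

The number of pairs sent per node is now bounded by the number of distinct letters on that node, not by the number of input letters.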

A Combiner improves the MapReduce job in this example in three ways:
1. It reduces the time taken to transfer data between the Mappers and the Reducer.
2. It decreases the amount of data that the Reducer has to process.
3. As a result, the overall efficiency of the Reducer improves.
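To check that the Combiner changes only the traffic and not the result, the whole job can be simulated end to end. This sketch makes a loud assumption: "average number of copies for each letter" is read as the total count of a letter divided by the number of Map tasks (2). The question does not pin this down, and all function names are hypothetical. Note that the Combiner forwards partial counts and not partial averages, since averages of averages are not correct in general.

```python
from collections import defaultdict

# Illustrative simulation only; all names are hypothetical, not Hadoop APIs.
D1 = ["a", "b", "c", "b", "c", "d", "a"]  # stored on Node A1
D2 = ["a", "a", "a", "d", "d", "c"]       # stored on Node A2
NUM_MAP_TASKS = 2  # assumption: the average is taken over the two Map tasks

def map_task(letters):
    """Emit one (letter, 1) pair per input letter."""
    return [(letter, 1) for letter in letters]

def combine(pairs):
    """Sum counts per letter locally; emits partial counts, not averages."""
    totals = defaultdict(int)
    for letter, count in pairs:
        totals[letter] += count
    return sorted(totals.items())

def reduce_task(pairs, num_tasks):
    """Sum all counts per letter on Node A3, then divide by the task count."""
    totals = defaultdict(int)
    for letter, count in pairs:
        totals[letter] += count
    return {letter: totals[letter] / num_tasks for letter in sorted(totals)}

sent_plain = map_task(D1) + map_task(D2)
sent_combined = combine(map_task(D1)) + combine(map_task(D2))

avg_plain = reduce_task(sent_plain, NUM_MAP_TASKS)
avg_combined = reduce_task(sent_combined, NUM_MAP_TASKS)

print(len(sent_plain), len(sent_combined))  # 13 7
print(avg_plain)                            # {'a': 2.5, 'b': 1.0, 'c': 1.5, 'd': 1.5}
print(avg_plain == avg_combined)            # True
```

Under this reading, Node A3 computes the same averages from 7 pairs as it does from 13, which is the saving the list above describes.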

