In: Computer Science
In 150 words or less, define key value pair and, from a high level, explain why the use of KVPs is important to MapReduce.
Solution :
Define key value pairs :
Key-value pair is two values usually connected in such a way that the value is accessed using the key. They are commonly used in various data-structures to provide fast access to values.
Why the use of KVPs is important to MapReduce :
MapReduce is the foundational framework for processing data at
scale because of its ability to break a large problem into any
smaller ones
⬢ Mappers read data in the form of KVPs and each call to a Mapper
is for a single KVP; it can return 0..m KVPs
⬢ The framework shuffles & sorts the Mappers’ outputted KVPs
with the guarantee that only one Reducer will be asked to process a
given Key’s data
⬢ Reducers are given a list of Values for a specific Key; they can
return 0..m KVPs
MapReduce Example –
Map Phase
Input to Mapper
(8675, ‘I will not eat green eggs and ham’)
(8709, ‘I will not eat them Sam I am’)
...
Output from Mapper
(‘I’, 1), (‘will’, 1), (‘not’, 1), (‘eat’, 1),(‘green’, 1),(‘eggs’, 1), (‘and’, 1)(‘ham’, 1), (‘I’, 1), (‘will’, 1),(‘not’, 1), (‘eat’, 1),(‘them’, 1), (‘Sam’, 1),(‘I’, 1), (‘am’, 1)
Reduce Phase
Input to Reducer
(‘I’, [1, 1, 1]) (‘Sam’, [1]) (‘am’, [1]) (‘and’, [1]) (‘eat’, [1,
1]) (‘eggs’, [1]) (‘green’, [1]) (‘ham’, [1]) (‘not’, [1, 1])
(‘them’, [1]) (‘will’, [1, 1])
Output from Reducer
(‘I’, 3)(‘Sam’, 1)(‘am’, 1) (‘and’, 1) (‘eat’, 2) (‘eggs’, 1) (‘green’, 1) (‘ham’, 1) (‘not’, 2) (‘them’, 1) (‘will’, 2)