Hadoop implements the MapReduce paradigm as mentioned in the original Google MapReduce Paper in terms of key/value pairs. The output types of the Map should match the input types of the Reduce as shown below.
(K1,V1) -> Map -> (K2,V2)
(K2, V2) -> Reduce -> (K3,V3)
The big question is why use key/value pairs?
MapReduce is derived from the concepts of functional programming. Here is a tutorial on the map/fold primitives in the functional programming which are used in the MapReduce paradigm and there is no mention of key/value pairs anywhere.
According to the Google MapReduce Paper
We realized that most of our computations involved applying a map operation to each logical “record” in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.
The only reason I could see why Hadoop used key/value pairs is that the Google Paper had key/value pairs to meet their requirements and the same had been implemented in Hadoop. And everyone is trying to fit the the problem space in the key/value pairs.
Here is an interesting article on using tuples in MapReduce.
(K1,V1) -> Map -> (K2,V2)
(K2, V2) -> Reduce -> (K3,V3)
The big question is why use key/value pairs?
MapReduce is derived from the concepts of functional programming. Here is a tutorial on the map/fold primitives in the functional programming which are used in the MapReduce paradigm and there is no mention of key/value pairs anywhere.
According to the Google MapReduce Paper
We realized that most of our computations involved applying a map operation to each logical “record” in our input in order to compute a set of intermediate key/value pairs, and then applying a reduce operation to all the values that shared the same key, in order to combine the derived data appropriately.
The only reason I could see why Hadoop used key/value pairs is that the Google Paper had key/value pairs to meet their requirements and the same had been implemented in Hadoop. And everyone is trying to fit the the problem space in the key/value pairs.
Here is an interesting article on using tuples in MapReduce.
No comments:
Post a Comment