Developing algorithms to process data on top of Distributing computing because of it's inherent distributed nature is difficult. So there are models like PRAM, MapReduce, BSP (Bulk Synchronous Parallel), MPI (Message Passing Interface) and others. Apache Hadoop, Apache Hama, Open MPI and other frameworks/software make it easier to develop algorithms on top of the earlier mentioned distributed computing models.
Coming back to our favorite topic MapReduce, this model is not something new. Map and Reduce had been known as Map and Fold which are higher order functions and had been used in functional programming languages like Lisp and ML. The different between a regular function and a higher order function is that while a higher order function can take multiple functions as input and return a function as output, a regular function doesn't.
As mentioned earlier Map and Fold had been there for ages and you might have heard about them if you had your education in Computer Science or used any of the functional programming. But, Google with it's paper on MapReduce made the functional programming and higher order functions popular.
Also, along the same lines BSP had been there for quite some time, but Google made BSP model popular with it's Large-scale graph computing at Google paper.
Some models suit well for some type of algorithms, while some won't. To mention, iterative algorithms fit better with the BSP model than the MR model, but still Mahout choose to use MR model than BSP model because Apache Hadoop (MR implementation) is much more mature and has a huge ecosystem when compared to Apache Hama (BSP implementation).
All the models which have been discussed need a different way of thinking to solve a particular problem. Here are the resources containing the pseudo code/logic for implementing some of the algorithms on top of MapReduce.
- Hadoop - The Definitive Guide (Join in Page 283)
- MapReduce Patterns
- Data Intensive Text Processing with MapReduce
Would be following with another blog with patterns around the BSP model.
Coming back to our favorite topic MapReduce, this model is not something new. Map and Reduce had been known as Map and Fold which are higher order functions and had been used in functional programming languages like Lisp and ML. The different between a regular function and a higher order function is that while a higher order function can take multiple functions as input and return a function as output, a regular function doesn't.
As mentioned earlier Map and Fold had been there for ages and you might have heard about them if you had your education in Computer Science or used any of the functional programming. But, Google with it's paper on MapReduce made the functional programming and higher order functions popular.
Also, along the same lines BSP had been there for quite some time, but Google made BSP model popular with it's Large-scale graph computing at Google paper.
Some models suit well for some type of algorithms, while some won't. To mention, iterative algorithms fit better with the BSP model than the MR model, but still Mahout choose to use MR model than BSP model because Apache Hadoop (MR implementation) is much more mature and has a huge ecosystem when compared to Apache Hama (BSP implementation).
All the models which have been discussed need a different way of thinking to solve a particular problem. Here are the resources containing the pseudo code/logic for implementing some of the algorithms on top of MapReduce.
- Hadoop - The Definitive Guide (Join in Page 283)
- MapReduce Patterns
- Data Intensive Text Processing with MapReduce
Would be following with another blog with patterns around the BSP model.
No comments:
Post a Comment