Friday, March 28, 2014

Mahout and MR

There has been an active discussion (1, 2, 3) on the Mahout dev mailing list about the goals for Mahout 1.0 and about moving the underlying computation engine from MR to Spark or H2O. But as mentioned in the GigaOM article `Apache Mahout, Hadoop’s original machine learning project, is moving on from MapReduce`, the community hasn't decided yet.

As mentioned in earlier posts here, MR is batch oriented by nature and is poorly suited to iterative processing, and therefore to implementing many Machine Learning algorithms, because each step of the iteration involves reads from and writes to HDFS. Mahout is pretty much tied to MR. Rewriting the underlying MR algorithms is not impossible, but it is not an easy task either. Moving to a non-MR platform would be the right direction for the Mahout project, and the sooner the better.
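To make the point about per-iteration reads and writes concrete, here is a rough toy sketch in Python. It is not Hadoop, Spark or Mahout code: local JSON files stand in for HDFS, and the names `step`, `run_mr_style`, `run_in_memory` and `compare` are made up for this illustration. It only counts how often each style touches storage for the same iterative computation.

```python
import json
import os
import tempfile


def step(values):
    """One toy iteration: move every value halfway toward the mean."""
    mean = sum(values) / len(values)
    return [(v + mean) / 2 for v in values]


def run_mr_style(input_path, iterations, workdir):
    """Chain of separate jobs (assumes iterations >= 1).

    Every iteration reads its input from storage and writes its
    output back, the way chained MR jobs hand data to each other.
    """
    io_ops = 0
    path = input_path
    current = None
    for i in range(iterations):
        with open(path) as f:          # read the previous job's output
            current = json.load(f)
        io_ops += 1
        current = step(current)
        path = os.path.join(workdir, f"mr_iter_{i + 1}.json")
        with open(path, "w") as f:     # write this job's output
            json.dump(current, f)
        io_ops += 1
    return current, io_ops


def run_in_memory(input_path, iterations, workdir):
    """Read once, iterate in memory, write once."""
    io_ops = 0
    with open(input_path) as f:
        current = json.load(f)
    io_ops += 1
    for _ in range(iterations):
        current = step(current)        # no storage access inside the loop
    with open(os.path.join(workdir, "mem_result.json"), "w") as f:
        json.dump(current, f)
    io_ops += 1
    return current, io_ops


def compare(values, iterations):
    """Run both styles on the same input in a temporary directory."""
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, "input.json")
        with open(input_path, "w") as f:
            json.dump(values, f)
        mr = run_mr_style(input_path, iterations, workdir)
        mem = run_in_memory(input_path, iterations, workdir)
    return mr, mem
```

In this sketch the MR style performs two storage operations per iteration, while the in-memory style performs two in total, whatever the number of iterations. Real engines are far more involved, but this is the gap that makes iterative algorithms expensive on MR.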

With the announcement of Oryx from Cloudera, we can expect quick progress in distributed Machine Learning frameworks.
Directions by MShades From Flickr under CC
