Sunday, July 8, 2012

Is Hadoop a square peg in a round hole?

There was an article on GigaOm about Hadoop's days being numbered. I agree with some of the author's points, but not all of them.

Because of the hype, many people are building something or other around Hadoop, and so its ecosystem, support (commercial support, forums, etc.), production use, and documentation are huge. Even so, trying to fit everything into Hadoop is not the right solution. Alternate paradigms have to be considered, based on the requirements, while architecting a system.

In the context of graph processing with Pregel, the author mentions:

At the time of writing, the only viable option in the open source world is Giraph, an early Apache incubator project that leverages HDFS and Zookeeper. There’s another project called Golden Orb available on GitHub.

Besides Apache Giraph, there is also Apache Hama for graph processing based on Pregel. Also, both Apache Giraph and Hama have moved from the incubator to Apache TLPs (Top Level Projects). While Giraph can only be used for graph processing, Hama is a pure BSP engine which can be used for a lot of other things besides graph processing. That said, there was a blog entry mentioning that Giraph can be used to process other models as well.

Then there are also GraphLab and Golden Orb. While there has been some work going on in GraphLab, Golden Orb has been dormant for more than a year.

For those interested, here is a paper comparing MapReduce with Bulk Synchronous Parallel (BSP). The paper states that MR algorithms can be implemented in BSP and vice versa. However, some algorithms can be implemented more effectively in BSP and others in MR.
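
To make the Bulk Synchronous Parallel model a bit more concrete, here is a minimal single-process sketch in Python of Pregel-style supersteps, propagating the maximum value through a small graph. It is only an illustration under my own assumptions: the function name `bsp_max_propagation` and the data layout are hypothetical, and this is not the Giraph or Hama API. Each pass of the loop stands for one superstep: vertices read the messages sent in the previous superstep, update their state, send new messages, and then everyone waits at a barrier.

```python
from collections import defaultdict


def bsp_max_propagation(graph, values):
    """Propagate the largest vertex value along the edges of a graph.

    graph:  dict mapping each vertex to the list of vertices it sends to
    values: dict mapping each vertex to its initial value
    Returns (final_values, number_of_supersteps).
    """
    values = dict(values)  # work on a copy, leave the input untouched

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = defaultdict(list)
    for vertex, neighbours in graph.items():
        for neighbour in neighbours:
            inbox[neighbour].append(values[vertex])
    supersteps = 1

    # Keep running supersteps while any message is still in flight.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            # A vertex only sends again if its value changed;
            # otherwise it stays silent (it "votes to halt").
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in graph.get(vertex, []):
                    outbox[neighbour].append(best)
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    initial = {"a": 3, "b": 6, "c": 2}
    final, steps = bsp_max_propagation(graph, initial)
    print(final, steps)
```

The vertex state stays in memory between supersteps and only messages move around. Expressing the same iteration in MapReduce would typically mean chaining one job per iteration, which is one reason iterative graph algorithms tend to fit BSP more naturally.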

Once again, I would like to reiterate: consider alternate paradigms/frameworks besides Hadoop/MapReduce while architecting a solution around big data.
