Wednesday, November 2, 2011

Hadoop Jar Hell

It's just not possible to download the latest releases of the Hadoop-related projects from Apache and use them together, because of the interoperability issues among the different Hadoop projects and their independent release cycles.

That's the reason BigTop, an Apache Incubator project, came about: to address the interoperability issues across the different Hadoop projects by providing a test suite. Companies like Cloudera also provide their own distributions, which bundle the different Hadoop projects based on the Apache releases, with proper testing and support.

Now HortonWorks, which was spun off from Yahoo, has joined the same ranks. Their initial goal was to make the Apache downloads a source from which anyone could download the jars and use them without any issues. But they have moved away from this with the recent announcement of the HortonWorks Data Platform, which is again based on the Apache distribution, similar to what Cloudera has done with its CDH distributions. Although HortonWorks and Cloudera have their own distributions, they will continue to contribute actively to the Apache Hadoop ecosystem.

As BigTop matures, it should become possible to download the different Hadoop-related jar files from Apache and use them directly, instead of depending on the distributions from HortonWorks and Cloudera.

As mentioned in the GigaOm article, such distributions make it easier for HortonWorks and Cloudera to support their customers, since they only have to support a limited number of Hadoop versions and already know the potential issues with those versions.
