Wednesday, October 3, 2012

Is Java a prerequisite for getting started with Hadoop?

The query I get often from those who want to get started with Hadoop is if knowledge of Java is a prerequisite or not. The answer is both a Yes and a No, depends on the individual persons interest on what they would like to do with Hadoop.

Why No?

MapReduce provides Map and Reduce primitives which had been as Map and Fold primitives in the Functional Programming world in language like Lisp from quite some time. Hadoop provides interfaces to code in Java against those primitives. But, any language supporting read/write to STDIO like Perl, Python, PHP and others can also be used using Hadoop streaming feature.

Also, there are high level abstractions provided by Apache frameworks like Pig and Hive for which familiarity of Java is not required. Pig can be programmed in Pig Latin and Hive can be programmed using  HiveQL. Both of these programs will be automatically converted to MapReduce programs in Java.

According to the Cloudera blog entry `What Do Real-Life Hadoop Workloads Look Like?` - Pig and Hive constitute a majority of the workloads in a Hadoop cluster. Below is the histogram from the mentioned Cloudera blog.

Why Yes?

Hadoop and the ecosystem can be easily extended for additional functionality like developing custom Input and OutputFormats, UDF (User Defined Functions) and others. For customizing Hadoop knowledge of Java is mandatory.

Also, many times it's required to get deep into Hadoop code as to why something is behaving a particular way or to know more about the functionality of a particular module. Again knowledge of Java comes handy here.

Hadoop projects come with a lot of different roles like Architect, Developer, Tester, Linux/Network/Hardware Administrator and some of which require explicit knowledge of Java and some don't. My suggestion is if you are genuinely interested in Big Data and think that Big Data will make a difference then deep dive into Big Data technologies irrespective of knowledge about Java.

Happy Hadooping !!!!


  1. Does all new custom implementations in Hadoop required java skills?

    1. Yes, because Hadoop is developed in Java and to customize it you have to know Java.

    2. HOPE we can do from python as well...

  2. Any Site to learn about Hadoop Testing???? Please suggest.....