Distributed applications are by nature difficult to debug, and Hadoop is no exception. This blog entry explains how to set breakpoints and debug a user-defined Java MapReduce program in Eclipse.
Hadoop supports executing a MapReduce job in Standalone, Pseudo-Distributed and Fully-Distributed mode. As we move from one mode to the next in that order, debugging becomes harder and new bugs surface along the way. Standalone mode with the default Hadoop configuration properties (local file system, job run in a single JVM) allows MapReduce programs to be debugged in Eclipse.
Step 1: Create a Java Project in Eclipse.
Step 2: For the Java project created in the previous step, add the following JARs to the build path in Eclipse: commons-configuration-1.6.jar, commons-httpclient-3.0.1.jar, commons-lang-2.4.jar, commons-logging-1.1.1.jar, commons-logging-api-1.0.4.jar, hadoop-core-1.0.3.jar, jackson-core-asl-1.8.8.jar, jackson-mapper-asl-1.8.8.jar and log4j-1.2.15.jar. They are available by downloading and extracting a Hadoop release.
Step 3: Copy MaxTemperature.java, MaxTemperatureMapper.java, MaxTemperatureReducer.java, MaxTemperatureWithCombiner.java and NewMaxTemperature.java to the src folder under the project. The Sample.txt file, which contains the input data, should be copied to the input folder. The project folder structure should look like the one below, without any compilation errors.
Step 4: Add the input and output folders as program arguments for MaxTemperature.java in its Eclipse run configuration.
Step 5: Execute MaxTemperature.java from Eclipse. There should be no exceptions or errors in the console. On refreshing the project after the MapReduce job completes successfully, an output folder should appear as shown below. To rerun the program, the output folder has to be deleted first.
Step 6: As with any Java program, breakpoints can be set in the MapReduce driver, mapper and reducer code, and the job can then be debugged.
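Steps 4 and 5 come down to two small checks before each run: the program needs exactly two arguments, and the output folder must not exist yet. Here is a minimal sketch of those checks in plain Java, with no Hadoop classes involved. The class name LocalRunPrep and its two methods are made up for this post; they are not part of Hadoop or of the MaxTemperature sources.

```java
import java.io.File;

class LocalRunPrep {

    /** Returns null when the two arguments look usable, otherwise a description of the problem. */
    static String checkArgs(String[] args) {
        if (args.length != 2) {
            return "Expected two arguments: <input folder> <output folder>";
        }
        if (!new File(args[0]).exists()) {
            return "Input folder does not exist: " + args[0];
        }
        if (new File(args[1]).exists()) {
            return "Output folder already exists: " + args[1];
        }
        return null;
    }

    /** Deletes a file or folder tree; returns false if nothing was there or the delete failed. */
    static boolean deleteRecursively(File target) {
        if (!target.exists()) {
            return false;
        }
        File[] children = target.listFiles(); // null when target is a regular file
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        return target.delete();
    }

    public static void main(String[] args) {
        // Clear a stale output folder first, so that a rerun does not trip over it.
        if (args.length == 2 && deleteRecursively(new File(args[1]))) {
            System.out.println("Removed old output folder: " + args[1]);
        }
        String problem = checkArgs(args);
        System.out.println(problem == null ? "Ready to run the job" : problem);
    }
}
```

Run with the same two program arguments as the job, it removes a stale output folder and then reports whether the arguments look fine. Be careful with the second argument, since whatever is found there gets deleted.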
In an upcoming post, we will see how to include, compile and debug the Hadoop code itself in Eclipse, along with the user-defined driver, mapper and reducer code.
Happy Hadooping !!!!
Note (5th March, 2013): The above instructions have been tried on Ubuntu 12.04, which has utilities like chmod that Hadoop uses internally. These tools are not available by default on Windows, and you might get an error, as mentioned in this thread, when trying the steps from this blog on a Windows machine.
One alternative is to install Cygwin on Windows as mentioned in this tutorial. This might or might not work smoothly.
Microsoft is working very aggressively to port Hadoop to the Windows platform and has released HDInsight recently. Check this and this for more details. This is the best bet for all the Windows fans. Download the HDInsight Server on a Windows machine and try out Hadoop.
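The Windows stack trace posted in the comments below goes through FileUtil.setPermission and FileUtil.checkReturnValue, which suggests that a call to set permissions reported failure. A rough way to see how your own platform behaves is to try owner-only permissions (like 0700) on a scratch file with java.io.File. This is only a probe, assuming that a failure of these calls is a reasonable stand-in for what Hadoop runs into. It is not Hadoop's own code, and the class name PermissionProbe is made up.

```java
import java.io.File;
import java.io.IOException;

class PermissionProbe {

    /** Returns true if owner-only read/write/execute can be applied to a scratch file. */
    static boolean canRestrictToOwner() {
        File probe;
        try {
            probe = File.createTempFile("perm-probe", ".tmp");
        } catch (IOException e) {
            throw new IllegalStateException("Could not create a scratch file", e);
        }
        try {
            // For each permission: clear it for everyone, then grant it back to the owner only.
            boolean ok = probe.setReadable(false, false) && probe.setReadable(true, true);
            ok = ok && probe.setWritable(false, false) && probe.setWritable(true, true);
            ok = ok && probe.setExecutable(false, false) && probe.setExecutable(true, true);
            return ok;
        } finally {
            probe.delete();
        }
    }

    public static void main(String[] args) {
        System.out.println("Owner-only permissions supported: " + canRestrictToOwner());
    }
}
```

If this prints false on your machine, the permission error described in the note is a likely outcome there as well.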

Thanks for the tutorial, simple and accurate.
I think the source code files you're pointing to are missing,
but anyway pretty helpful stuff.
Thanks for tip - Fixed it - I am always in a hurry to delete files :)
How do I write the output dir to an HDFS server using the above process? What is the code?
Is there a way to switch between the standalone and pseudo-distributed modes?
I have tried Eclipse with standalone mode and not with other modes.
Praveen
I followed the steps on Ubuntu 12.04, which has utilities like chmod and others, while these utilities are not there in Windows. So, Cygwin has to be installed on Windows. Check the below thread from the Hadoop forums:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201105.mbox/%3CBANLkTin-8+z8uYBTdmaa4cvxz4JzM14VfA@mail.gmail.com%3E
Thanks for the tip, I will update the blog accordingly.
Here is a tutorial on installing Cygwin/Hadoop on Windows:
http://v-lad.org/Tutorials/Hadoop/03%20-%20Prerequistes.html
But, I wouldn't bet on this approach since Microsoft is aggressively working to make Hadoop work on Windows.
Thanks for the reply......
I just removed my emp id from the log and am posting the question here again to help others know what the question is.
Pardon me if this is getting confusing.
13/03/06 19:37:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/03/06 19:37:23 ERROR security.UserGroupInformation: PriviledgedActionException as:cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-\mapred\staging\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-madhu\mapred\staging\275\.staging to 0700
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:662)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:856)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at maxtemp.MaxTemperature.main(MaxTemperature.java:36)
Waiting for your tip on running Hadoop MapReduce on Windows + Eclipse.
Thanks in advance...
Thank You for your posts. I am doing my Master Thesis now and your web entries are very helpful.
@Madhu Reddy
Why to drill someone's head about windows? Better try to run it on linux. Try to prepare ready-made virtual linux image with pre-installed everything what's needed or try to find one (Virtual Box). It will save your time and you could use this experience in the future when you'll run hadoop for some real work.
Also, Microsoft recently announced HDInsight (http://goo.gl/1LoHk) for running Hadoop on Windows Server and Azure. So, Hadoop will be natively supported on Windows by Microsoft.
But I am not sure how many instances of Hadoop on Windows we will see. Hadoop clusters run into 10s/100s/1000s of nodes, and a Windows license has to be bought for every machine.
I followed your tutorial, I thought to the letter. I am running into some pretty heavy errors - below. Any thoughts?
13/04/21 22:09:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/04/21 22:09:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/04/21 22:09:27 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
13/04/21 22:09:27 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-hduser/mapred/staging/hduser1546474700/.staging/job_local_0001
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: file://home/hduser/workspace/output, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:381)
at org.apache.hadoop.fs.FileSystem.makeQualified(FileSystem.java:294)
at org.apache.hadoop.fs.FilterFileSystem.makeQualified(FilterFileSystem.java:85)
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:112)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:824)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1261)
at MaxTemperature.main(MaxTemperature.java:33)
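
A possible reading of the Wrong FS error above: the output path was given as file://home/hduser/workspace/output, with two slashes after file:. In URI syntax, whatever follows // is the authority (host), so home is taken as a host name and the path becomes /hduser/workspace/output, while the local file system expects file:/// with no authority. The sketch below uses only java.net.URI to show how the two spellings parse. It assumes that Hadoop's check compares scheme and authority in a similar way, and the class name FileUriCheck is made up.

```java
import java.net.URI;

class FileUriCheck {

    /** Returns the authority (host part) parsed from the URI, or null if there is none. */
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    /** Returns the path component parsed from the URI. */
    static String pathOf(String uri) {
        return URI.create(uri).getPath();
    }

    public static void main(String[] args) {
        String twoSlashes = "file://home/hduser/workspace/output";
        String threeSlashes = "file:///home/hduser/workspace/output";

        // Two slashes: "home" is parsed as the authority and dropped from the path.
        System.out.println(authorityOf(twoSlashes) + " | " + pathOf(twoSlashes));

        // Three slashes: no authority, and the full path is kept.
        System.out.println(authorityOf(threeSlashes) + " | " + pathOf(threeSlashes));
    }
}
```

If that is the cause, passing file:///home/hduser/workspace/output, or a plain path such as /home/hduser/workspace/output, as the output argument should avoid the mismatch.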