Friday, December 13, 2013

Unit Testing MapReduce with the MRUnit Framework

Programs written for distributed computing models like MapReduce are difficult to debug, so it's better to find and fix bugs in the early stages of development. In an earlier post we looked into debugging a MapReduce program in Eclipse, and in another post we looked at unit testing Hadoop R Streaming programs. In this blog entry, we will look into how to unit test MapReduce programs using Apache MRUnit. None of these approaches requires starting the HDFS and MapReduce daemons.

MRUnit is an Apache Top Level Project and has been around for some time. It is not in the limelight and doesn't get as much attention as other projects, but it is just as important for getting a proper product out on time.

So here are the steps:

- Download the following jar files:

mrunit-1.0.0-hadoop1.jar from http://www.apache.org/dyn/closer.cgi/mrunit
mockito-all-1.9.5.jar from http://code.google.com/p/mockito/
junit-4.11.jar from http://junit.org/

- Copy the hadoop-core-1.2.1.jar, commons-cli-1.2.jar and commons-logging-1.1.1.jar files from the Hadoop installation folder.

- Create a project in Eclipse and add all of the above jar files as dependencies.
- Copy the WordCount.java and WordCountMapperReducerTest.java files into the project and make sure they compile without errors.
- Run WordCountMapperReducerTest.java as a JUnit test. All the tests should pass.
- WordCountMapperReducerTest.java specifies the input and the expected output for both the mapper and the reducer. Try modifying the input/output KV pairs for the mapper/reducer in WordCountMapperReducerTest.java and watch some of the tests fail.
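MRUnit's drivers (for the newer org.apache.hadoop.mapreduce API, MapDriver, ReduceDriver and MapReduceDriver in the org.apache.hadoop.mrunit.mapreduce package) take input KV pairs plus the expected output KV pairs, run the mapper and/or reducer in-process and fail the test when the actual output differs. The plain-Java sketch below is not MRUnit and uses no Hadoop classes; it only mimics that idea for word count so the expected pairs are easy to see and experiment with. The class and method names (WordCountSketch, pair, map, reduce, mapReduce, check) are hypothetical, and map/reduce are assumed to behave like the tokenizing mapper and summing reducer of the Hadoop WordCount example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of what an MRUnit-style word count test checks.
// This is NOT the MRUnit API and NOT the Hadoop example classes.
class WordCountSketch {

    // Build one key-value pair (stands in for a (Text, IntWritable) pair).
    static Map.Entry<String, Integer> pair(String key, int value) {
        return new AbstractMap.SimpleEntry<String, Integer>(key, value);
    }

    // Assumed to behave like the tokenizing mapper: emit (word, 1) per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(pair(itr.nextToken(), 1));
        }
        return out;
    }

    // Assumed to behave like the summing reducer: emit (word, sum of its counts).
    static Map.Entry<String, Integer> reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return pair(word, sum);
    }

    // Map every line, group values by key in sorted key order (like the shuffle), then reduce each group.
    static List<Map.Entry<String, Integer>> mapReduce(List<String> lines) {
        TreeMap<String, List<Integer>> groups = new TreeMap<String, List<Integer>>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                List<Integer> values = groups.get(kv.getKey());
                if (values == null) {
                    values = new ArrayList<Integer>();
                    groups.put(kv.getKey(), values);
                }
                values.add(kv.getValue());
            }
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<Map.Entry<String, Integer>>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            out.add(reduce(group.getKey(), group.getValue()));
        }
        return out;
    }

    // Fail, like a failing test, when the actual pairs differ from the expected ones (order included).
    static void check(List<Map.Entry<String, Integer>> expected, List<Map.Entry<String, Integer>> actual) {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // Mapper check: one input line gives one (word, 1) pair per token.
        check(Arrays.asList(pair("cat", 1), pair("cat", 1), pair("dog", 1)), map("cat cat dog"));
        // Reducer check: one key with its grouped values gives the summed pair.
        check(Arrays.asList(pair("cat", 2)), Arrays.asList(reduce("cat", Arrays.asList(1, 1))));
        // Full map -> group -> reduce check over two input lines.
        check(Arrays.asList(pair("cat", 2), pair("dog", 1)), mapReduce(Arrays.asList("cat dog", "cat")));
        System.out.println("all checks passed");
    }
}
```

Changing one expected pair in main, for example to pair("dog", 2), makes check throw an AssertionError, which is the same kind of failure you see after editing the KV pairs in WordCountMapperReducerTest.java. In a real MRUnit test, the same expectations would be expressed through the drivers against the actual mapper and reducer classes.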

1 comment:

  1. Hi, it helps a lot. However, you are using different packages/classes for the testing instead of the WordCount example. Could you explain how you got
    import org.apache.hadoop.examples.WordCount.IntSumReducer;
    import org.apache.hadoop.examples.WordCount.TokenizerMapper;
    Where did you get these from?

    Thanks
